# Multi-Task Learning Architecture - Quick Reference

## Current System (Baseline)

```
┌─────────────────────────────────────────────────────────────────┐
│                          CURRENT STATE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Browser Events → Next.js → FastAPI → Kafka (user-interactions) │
│                                ↓                                │
│                      Airflow (every 15min)                      │
│                                ↓                                │
│                  [Messy SessionState Pipeline]                  │
│                                ↓                                │
│                   Simple Rule-Based Pricing:                    │
│                   - Surge (if demand > 10)                      │
│                   - Elasticity formula                          │
│                   - Velocity threshold for agents               │
│                                ↓                                │
│                         Redis (prices)                          │
│                                ↓                                │
│                      Pricing Provider API                       │
│                                                                 │
│  ISSUES:                                                        │
│    ✗ O(n²) feature extraction                                   │
│    ✗ No supervised ML for agent detection                       │
│    ✗ Simple heuristics (velocity > 5 → agent)                   │
│    ✗ No learning from data                                      │
│    ✗ Margin leakage not effectively addressed                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

## Proposed System (Multi-Task Learning)

```
┌──────────────────────────────────────────────────────────────────────────┐
│                          PHASE 1: DATA PIPELINE                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Kafka (user-interactions)                                               │
│          ↓                                                               │
│  ┌─────────────────────────────────────┐                                 │
│  │     VECTORIZED FEATURE PIPELINE     │                                 │
│  ├─────────────────────────────────────┤                                 │
│  │ 1. TemporalFeatureExtractor         │ → 8 features (velocity, etc.)   │
│  │ 2. BehavioralFeatureExtractor       │ → 10 features (carts, hovers)   │
│  │ 3. ProductFeatureExtractor          │ → 8 features (prices, depth)    │
│  │ 4. UserAgentParser                  │ → 3 features (browser type)     │
│  │ 5. SessionAggregator                │ → Session-level matrix          │
│  │ 6. ExperimentLabelJoiner            │ → Join with xp_human_only       │
│  └─────────────────────────────────────┘                                 │
│          ↓                                                               │
│  Feature Matrix: [sessionId, 29 features, 3 labels]                      │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                   PHASE 2: SUPERVISED AGENT CLASSIFIER                   │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Feature Matrix (29 features)                                            │
│          ↓                                                               │
│  ┌────────────────────┐                                                  │
│  │   XGBoost Model    │                                                  │
│  ├────────────────────┤                                                  │
│  │ Input: 29 dims     │                                                  │
│  │ Output: P(agent)   │                                                  │
│  │ Loss: BCE          │                                                  │
│  └────────────────────┘                                                  │
│          ↓                                                               │
│  Target: ROC-AUC > 0.90                                                  │
│                                                                          │
│  DEPLOYMENT:                                                             │
│    - Real-time inference in Pricing Provider                             │
│    - Dynamic markup: P(agent) > 0.7 → 1.3x price                         │
│    - Retrain daily via Airflow                                           │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                    PHASE 3: MULTI-TASK LEARNING MODEL                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Input: Session Features (29) + Product Features (10) + Current Price    │
│          ↓                                                               │
│  ┌────────────────────────────────────────────────────────────┐          │
│  │                 MULTI-TASK NEURAL NETWORK                  │          │
│  ├────────────────────────────────────────────────────────────┤          │
│  │                                                            │          │
│  │  ┌──────────────────────┐                                  │          │
│  │  │   Session Encoder    │  (Shared)                        │          │
│  │  │ [29] → [128] → [64]  │                                  │          │
│  │  └──────────┬───────────┘                                  │          │
│  │             │                                              │          │
│  │             ├──────────────┬────────────────┐              │          │
│  │             ↓              ↓                ↓              │          │
│  │        ┌─────────┐    ┌─────────┐    ┌─────────────┐       │          │
│  │        │ Task A  │    │ Product │    │   Task B    │       │          │
│  │        │  Agent  │    │ Encoder │    │  Purchase   │       │          │
│  │        │  Head   │    │ [10]→16 │    │  Prob Head  │       │          │
│  │        └────┬────┘    └────┬────┘    └──────┬──────┘       │          │
│  │             ↓              └───────┬────────┘              │          │
│  │         P(agent)                   ↓                       │          │
│  │                            P(purchase|price)               │          │
│  │                                                            │          │
│  │   Loss = α·BCE(agent) + β·BCE(purchase)                    │          │
│  │   α=1.0, β=2.0 (tune these weights)                        │          │
│  └────────────────────────────────────────────────────────────┘          │
│          ↓                                                               │
│  OUTPUTS:                                                                │
│    1. Agent probability (like Phase 2)                                   │
│    2. Purchase probability given price                                   │
│    3. Session embedding (for knowledge distillation)                     │
│                                                                          │
│  USE CASE:                                                               │
│    Optimal Price = argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ]     │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                      KNOWLEDGE DISTILLATION BRANCH                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Multi-Task Model (teacher)                                              │
│          ↓                                                               │
│  Generate predictions on validation set                                  │
│          ↓                                                               │
│  ┌──────────────────────────────────────┐                                │
│  │  Distill to Decision Tree (student)  │                                │
│  ├──────────────────────────────────────┤                                │
│  │ Input: 29 session features           │                                │
│  │ Output: Optimal markup multiplier    │                                │
│  │ Max depth: 5 (interpretable)         │                                │
│  └──────────────────────────────────────┘                                │
│          ↓                                                               │
│  Extract Human-Readable Rules:                                           │
│                                                                          │
│    IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1:            │
│        markup = 1.3 (likely agent reconnaissance)                        │
│    ELIF unique_products_viewed < 3 AND session_duration > 300:           │
│        markup = 0.9 (engaged human, offer discount)                      │
│    ELSE:                                                                 │
│        markup = 1.0 (baseline)                                           │
│                                                                          │
│  Also: SHAP values for feature importance analysis                       │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│               PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR               │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PURPOSE: Fast experimentation without real users                        │
│                                                                          │
│  ┌──────────────────────────────────────────────────────┐                │
│  │            DynamicPricingEnv (Gymnasium)             │                │
│  ├──────────────────────────────────────────────────────┤                │
│  │                                                      │                │
│  │ State: [demand, inventory, hour, agent_frac,         │                │
│  │         avg_velocity]                                │                │
│  │                                                      │                │
│  │ Action: price_multiplier ∈ [0.7, 1.5]                │                │
│  │                                                      │                │
│  │ Dynamics:                                            │                │
│  │   - Simulate user arrivals (Poisson)                 │                │
│  │   - Split into humans (30%) vs agents (70%)          │                │
│  │   - Purchase probability:                            │                │
│  │     P_human(buy) = logistic(price, sensitivity=2)    │                │
│  │     P_agent(buy) = logistic(price, sensitivity=5)    │                │
│  │                                                      │                │
│  │ Reward: revenue - 0.5 * margin_leakage               │                │
│  │   where margin_leakage =                             │                │
│  │     (oracle_price - actual_price) × agent_purchases  │                │
│  └──────────────────────────────────────────────────────┘                │
│          ↓                                                               │
│  ┌────────────────────────────────────────┐                              │
│  │          Train RL Agent (PPO)          │                              │
│  ├────────────────────────────────────────┤                              │
│  │ Learn policy: State → Optimal Price    │                              │
│  │ 100k timesteps training                │                              │
│  └────────────────────────────────────────┘                              │
│          ↓                                                               │
│  BENCHMARK vs Baselines:                                                 │
│    - Fixed pricing: 1.0x always                                          │
│    - Simple surge: 1.2x if demand > 10, else 0.9x                        │
│    - Elasticity-based: formula                                           │
│    - RL policy: learned                                                  │
│    - Multi-task + RL: Use MT model predictions as state features         │
│                                                                          │
│  VALIDATION:                                                             │
│    - Calibrate simulator from historical data                            │
│    - Run counterfactuals ("what if agent_frac=0.8?")                     │
│    - A/B test winner on real traffic                                     │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
```
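
A minimal PyTorch sketch of the Phase 3 network, using the layer sizes from the diagram (a shared [29] → [128] → [64] session encoder and a [10] → 16 product encoder) and the loss weights α = 1.0, β = 2.0. The diagram does not pin down how the product embedding and the current price enter Task B, so this sketch assumes they are concatenated with the session embedding; the 32-unit hidden layer in the purchase head is likewise an assumption.

```python
import torch
from torch import nn


class MultiTaskPricingModel(nn.Module):
    """Shared session encoder feeding an agent head and a price-conditioned purchase head."""

    def __init__(self, n_session: int = 29, n_product: int = 10):
        super().__init__()
        # Shared session encoder: [29] -> [128] -> [64], as in the diagram.
        self.session_encoder = nn.Sequential(
            nn.Linear(n_session, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        # Product encoder: [10] -> 16.
        self.product_encoder = nn.Sequential(nn.Linear(n_product, 16), nn.ReLU())
        # Task A: agent logit from the session embedding alone.
        self.agent_head = nn.Linear(64, 1)
        # Task B (assumed wiring): session embedding + product embedding + price.
        self.purchase_head = nn.Sequential(
            nn.Linear(64 + 16 + 1, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, session_x: torch.Tensor, product_x: torch.Tensor, price: torch.Tensor):
        z = self.session_encoder(session_x)  # session embedding, reusable for distillation
        agent_logit = self.agent_head(z).squeeze(-1)
        h = torch.cat([z, self.product_encoder(product_x), price.unsqueeze(-1)], dim=-1)
        purchase_logit = self.purchase_head(h).squeeze(-1)
        return agent_logit, purchase_logit, z


def multitask_loss(agent_logit, purchase_logit, y_agent, y_purchase, alpha=1.0, beta=2.0):
    """Loss = alpha * BCE(agent) + beta * BCE(purchase), computed on logits."""
    bce = nn.functional.binary_cross_entropy_with_logits
    return alpha * bce(agent_logit, y_agent) + beta * bce(purchase_logit, y_purchase)
```

Computing BCE on logits with `binary_cross_entropy_with_logits` avoids a separate sigmoid layer and is numerically more stable; `torch.sigmoid` on the two logits then gives P(agent) and P(purchase|price) at inference time.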
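
The USE CASE objective can be evaluated with a simple grid search over candidate prices. As written, the factor (1 + λ·P(agent)) does not depend on p, so it rescales the objective without moving the argmax: the agent signal only changes the chosen price through P(purchase|p), which the Phase 3 model conditions on session features, unless the markup is applied to the price itself. The sketch below makes this visible; `optimal_price`, the default λ = 0.5, and reusing the simulator's 0.7 to 1.5 multiplier range as the search grid are assumptions.

```python
from typing import Callable

import numpy as np

# Candidate multipliers reuse the simulator's action range [0.7, 1.5] (an assumption).
DEFAULT_MULTIPLIERS = np.linspace(0.7, 1.5, 81)


def optimal_price(
    purchase_prob: Callable[[np.ndarray], np.ndarray],
    base_price: float,
    p_agent: float,
    lam: float = 0.5,
    multipliers: np.ndarray = DEFAULT_MULTIPLIERS,
) -> float:
    """Grid-search argmax_p [ p * P(purchase|p) * (1 + lam * P(agent)) ].

    purchase_prob maps an array of candidate prices to purchase probabilities,
    e.g. the Task B head evaluated for one session/product at each price.
    """
    prices = base_price * multipliers
    objective = prices * purchase_prob(prices) * (1.0 + lam * p_agent)
    return float(prices[np.argmax(objective)])
```

For example, with the linear demand curve `lambda p: np.clip(1 - p / 200, 0, 1)` and a base price of 100, the function returns a price of about 100 whether `p_agent` is 0 or 1.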
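
A sketch of the distillation step with scikit-learn, assuming the teacher's recommended multiplier has already been computed for each validation-set session (for instance, optimal price divided by base price; that definition is an assumption). `distill_markup_tree` is a hypothetical helper; `max_depth=5` follows the diagram, while `min_samples_leaf=20` is an illustrative guard against rules backed by very few sessions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text


def distill_markup_tree(X_val, teacher_markup, feature_names, max_depth=5, min_samples_leaf=20):
    """Fit a shallow regression tree (student) that imitates the teacher's markup.

    X_val: validation-set session features, shape (n_sessions, n_features).
    teacher_markup: the multi-task model's recommended multiplier per session.
    """
    student = DecisionTreeRegressor(
        max_depth=max_depth,                # depth 5 keeps the rule set readable
        min_samples_leaf=min_samples_leaf,  # avoid rules backed by a handful of sessions
        random_state=0,
    )
    student.fit(X_val, teacher_markup)
    fidelity = student.score(X_val, teacher_markup)  # R^2 of student vs. teacher outputs
    rules = export_text(student, feature_names=list(feature_names))
    return student, fidelity, rules
```

`export_text` prints the fitted tree as nested threshold rules that can be transcribed into the IF/ELIF/ELSE form shown in the diagram, and `fidelity` indicates how much of the teacher's behavior those rules capture.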
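
A dependency-free sketch of `DynamicPricingEnv` that follows Gymnasium's `reset()`/`step()` return conventions (`obs, info` from reset; `obs, reward, terminated, truncated, info` from step). It implements the state vector, the [0.7, 1.5] action range, Poisson arrivals, the 30/70 human/agent split, sensitivities 2 and 5, and the reward from the box. Everything else is an assumption: `logistic(price, sensitivity)` is read as P(buy) = 1 / (1 + exp(k·(multiplier - 1))), `oracle_price` is taken as base price × 1.5 (the top of the action range), and the velocities, arrival rate, inventory and horizon are placeholders rather than calibrated values.

```python
import numpy as np


class DynamicPricingEnv:
    """Dependency-free sketch following Gymnasium's reset()/step() conventions."""

    def __init__(self, base_price=100.0, arrival_rate=20.0, agent_frac=0.7,
                 inventory=500, horizon=24, oracle_multiplier=1.5, seed=None):
        self.base_price = base_price
        self.arrival_rate = arrival_rate            # mean Poisson arrivals per step
        self.agent_frac = agent_frac                # 70% agents / 30% humans by default
        self.initial_inventory = inventory
        self.horizon = horizon                      # steps (hours) per episode
        self.oracle_multiplier = oracle_multiplier  # assumed price agents "should" pay
        self.rng = np.random.default_rng(seed)

    def _obs(self):
        # avg_velocity: hypothetical per-type means (agents 8.0, humans 1.5 events/s).
        avg_velocity = self.agent_frac * 8.0 + (1.0 - self.agent_frac) * 1.5
        return np.array([self.demand, self.inventory, self.hour % 24,
                         self.agent_frac, avg_velocity], dtype=np.float32)

    def reset(self, seed=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.demand, self.inventory, self.hour = 0, self.initial_inventory, 0
        return self._obs(), {}

    @staticmethod
    def _buy_prob(multiplier, sensitivity):
        # One reading of logistic(price, sensitivity): 50% at the base price,
        # falling faster with price for higher sensitivity.
        return 1.0 / (1.0 + np.exp(sensitivity * (multiplier - 1.0)))

    def step(self, action):
        m = float(np.clip(np.ravel(action)[0], 0.7, 1.5))  # price_multiplier ∈ [0.7, 1.5]
        arrivals = int(self.rng.poisson(self.arrival_rate))
        n_agents = int(self.rng.binomial(arrivals, self.agent_frac))
        human_buys = int(self.rng.binomial(arrivals - n_agents, self._buy_prob(m, 2.0)))
        agent_buys = int(self.rng.binomial(n_agents, self._buy_prob(m, 5.0)))
        total = human_buys + agent_buys
        if total > self.inventory:  # stock-out: scale both groups down to what is left
            agent_buys = agent_buys * self.inventory // total
            human_buys = self.inventory - agent_buys
        price = self.base_price * m
        oracle_price = self.base_price * self.oracle_multiplier
        revenue = price * (human_buys + agent_buys)
        margin_leakage = (oracle_price - price) * agent_buys
        reward = revenue - 0.5 * margin_leakage
        self.inventory -= human_buys + agent_buys
        self.demand, self.hour = arrivals, self.hour + 1
        terminated = self.inventory <= 0
        truncated = self.hour >= self.horizon
        info = {"revenue": revenue, "margin_leakage": margin_leakage}
        return self._obs(), float(reward), terminated, truncated, info
```

A counterfactual such as "what if agent_frac=0.8?" is then `DynamicPricingEnv(agent_frac=0.8)`, and the fixed-pricing baseline is a loop that always calls `step(1.0)`. To train PPO with a library such as Stable-Baselines3, the class would need to subclass `gymnasium.Env` and declare `observation_space` and `action_space` (for example with `gymnasium.spaces.Box`).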
## Data Flow (Production)

```
┌─────────────┐
│   Browser   │
│ (User/Agent)│
└──────┬──────┘
       │ POST /api/ingest (events + experimentId)
       ↓
┌──────────────┐
│ Next.js API  │
└──────┬───────┘
       │ Forward events
       ↓
┌──────────────┐
│   FastAPI    │
│  /api/kafka  │
│   /ingest    │
└──────┬───────┘
       │ Publish
       ↓
┌─────────────────────────┐
│          Kafka          │
│ Topic: user-interactions│
└──────┬──────────────────┘
       │
       ├──────────────────┬────────────────────┐
       ↓                  ↓                    ↓
┌──────────────┐   ┌──────────────┐   ┌──────────────────┐
│   Airflow    │   │  Real-Time   │   │  Kafka Streams   │
│   (Batch)    │   │  Inference   │   │ (Feature Cache)  │
│              │   │              │   │                  │
│ Daily:       │   │ On Price     │   │ Rolling window   │
│ - Retrain    │   │ Request:     │   │ compute session  │
│   classifier │   │ - Get session│   │ features, push   │
│ - Retrain MT │   │   features   │   │ to Redis         │
│   model      │   │ - Predict    │   │                  │
│ - Publish to │   │   P(agent)   │   │ TTL: 1 hour      │
│   registry   │   │ - Predict    │   │                  │
│              │   │   P(purchase)│   │                  │
│              │   │ - Compute    │   │                  │
│              │   │   optimal_p  │   │                  │
└──────┬───────┘   └──────┬───────┘   └────────┬─────────┘
       │                  │                    │
       ↓                  ↓                    ↓
┌────────────────────────────────────────────────────────┐
│                 Redis (Model Registry)                 │
├────────────────────────────────────────────────────────┤
│ Keys:                                                  │
│ - classifier:agent_detector:latest (pickle)            │
│ - multitask_model:latest (state_dict)                  │
│ - session_features:{sessionId} (json, TTL)             │
│ - prices:latest (DataFrame)                            │
│ - elasticity:latest (DataFrame)                        │
└──────────────────┬─────────────────────────────────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │  Pricing Provider   │
         │ /api/{mode}/price/  │
         │     {productId}     │
         │                     │
         │ GET sessionId       │
         │ → Load features     │
         │ → Load models       │
         │ → Predict           │
         │ → Return price      │
         └─────────┬───────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │      Frontend       │
         │   (Display price)   │
         └─────────────────────┘
```

## Key Metrics

### Model Performance

| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
| Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
| Pricing Latency (p99) | <100ms | ~50ms | All |
| Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |

### Business Impact

| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Margin Leakage Reduction | -30% | Baseline | Phase 2-4 |
| Human Conversion Rate | No change | Baseline | All |
| Agent Detection Rate | >85% precision | ~60% (velocity) | Phase 2 |
| Revenue Uplift | +10% | Baseline | Phase 3-4 |
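
To hit the ">85% precision" target rather than relying on a fixed cut-off, the decision threshold can be chosen on a validation set. A sketch using scikit-learn's `precision_recall_curve`; `threshold_for_precision` is a hypothetical helper, and `p_agent` would come from the Phase 2 classifier's `predict_proba`.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_precision(y_true, p_agent, target_precision=0.85):
    """Smallest threshold whose validation precision meets the target.

    Recall can only fall as the threshold rises, so the smallest qualifying
    threshold keeps the most recall. Returns (threshold, precision, recall),
    or None if no threshold reaches the target.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, p_agent)
    # precision/recall have one more entry than thresholds; drop the final (1, 0) point.
    ok = np.flatnonzero(precision[:-1] >= target_precision)
    if ok.size == 0:
        return None
    i = ok[0]  # thresholds are increasing, so the first hit is the lowest threshold
    return float(thresholds[i]), float(precision[i]), float(recall[i])
```

The recall returned alongside the threshold shows how many agents the chosen cut-off still catches, which is the trade-off behind the precision target.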

## File Structure (New)

```
experiments/
  ml/
    __init__.py

    # Phase 1: Features
    features/
      __init__.py
      temporal.py        # TemporalFeatureExtractor
      behavioral.py      # BehavioralFeatureExtractor
      product.py         # ProductFeatureExtractor
      useragent.py       # UserAgentParser
      aggregator.py      # SessionAggregator

    pipeline.py          # build_feature_pipeline()
    datasets.py          # load_events_from_kafka(), etc.

    # Phase 2: Classifier
    train_classifier.py  # XGBoost training script

    # Phase 3: Multi-Task
    models/
      __init__.py
      multitask.py       # MultiTaskPricingModel (PyTorch)

    train_multitask.py   # Multi-task training script
    distill.py           # Knowledge distillation

    # Phase 4: Simulator
    simulator/
      __init__.py
      env.py             # DynamicPricingEnv (Gymnasium)
      agents.py          # HumanUser, AgentUser
      train_rl.py        # PPO training

    # Inference
    inference/
      __init__.py
      pricing_service.py # gRPC service (optional)
      feature_cache.py   # Redis feature store client

    # Notebooks
    notebooks/
      01_eda.ipynb
      02_feature_analysis.ipynb
      03_model_evaluation.ipynb
      04_simulator_calibration.ipynb
```

## Critical Code Changes

### 1. Replace Messy SessionState

**Before:** `experiments/procesing/steps/session.py` (O(n²) loops)
**After:** `experiments/ml/pipeline.py` (vectorized pipeline)
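
A minimal sketch of the vectorized approach: one `groupby` pass over all events replaces per-session nested loops. It assumes events arrive as a pandas DataFrame with hypothetical columns `sessionId`, `timestamp` (seconds), `eventType` (with values such as `"view"` and `"add_to_cart"`), and `productId`; the feature names echo the distilled rules above but are illustrative, not the 29-feature schema from the gameplan.

```python
import pandas as pd


def build_session_features(events: pd.DataFrame) -> pd.DataFrame:
    """One row per session, computed with a single vectorized groupby pass."""
    df = events.assign(
        is_view=events["eventType"].eq("view"),
        is_cart=events["eventType"].eq("add_to_cart"),
    )
    feats = df.groupby("sessionId").agg(
        n_events=("eventType", "count"),
        first_ts=("timestamp", "min"),
        last_ts=("timestamp", "max"),
        n_views=("is_view", "sum"),
        n_carts=("is_cart", "sum"),
        unique_products=("productId", "nunique"),
    )
    feats["session_duration"] = feats["last_ts"] - feats["first_ts"]
    # Events per second; duration is clipped to 1s so single-event sessions stay finite.
    feats["interaction_velocity"] = feats["n_events"] / feats["session_duration"].clip(lower=1)
    # Sessions with no views get a ratio of 0.0 instead of inf/NaN.
    feats["cart_to_view_ratio"] = (
        feats["n_carts"] / feats["n_views"].where(feats["n_views"] > 0)
    ).fillna(0.0)
    return feats.drop(columns=["first_ts", "last_ts"])
```

Because every feature is a column-wise aggregate, the cost grows roughly linearly with the number of events rather than quadratically.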

### 2. Upgrade Pricing Provider

**Before:** Simple velocity threshold
**After:** ML model inference with agent probability
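
A minimal sketch of the provider's decision step under the Phase 2 deployment rule (P(agent) > 0.7 → 1.3x price). `quote_price` and `AgentClassifier` are hypothetical names; the model is assumed to expose a scikit-learn-style `predict_proba` whose second column is the agent class, as a fitted `XGBClassifier` trained with 1 = agent does.

```python
from typing import Protocol, Sequence

AGENT_THRESHOLD = 0.7  # Phase 2 deployment rule: P(agent) > 0.7 ...
AGENT_MARKUP = 1.3     # ... -> 1.3x price


class AgentClassifier(Protocol):
    """Any model with a scikit-learn-style predict_proba (column 1 = agent)."""

    def predict_proba(self, X: Sequence[Sequence[float]]) -> Sequence[Sequence[float]]: ...


def quote_price(base_price: float, session_features: Sequence[float], model: AgentClassifier) -> float:
    """Price for one session: the base price, marked up when the model flags a likely agent."""
    p_agent = float(model.predict_proba([list(session_features)])[0][1])
    multiplier = AGENT_MARKUP if p_agent > AGENT_THRESHOLD else 1.0
    return round(base_price * multiplier, 2)
```

Typing the model as a protocol keeps the provider independent of the training library, so a retrained model from the registry can be dropped in without code changes.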

### 3. Add Real-Time Feature Store

**Before:** No feature caching
**After:** Kafka Streams → Redis (session features)
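
The data-flow diagram stores session features in Redis under `session_features:{sessionId}` with a 1-hour TTL. Kafka Streams itself runs on the JVM, so the sketch below only covers the Python side the Pricing Provider would use to read (and, in a prototype, write) those entries. `SessionFeatureCache` is a hypothetical name, and the client is assumed to be a `redis.Redis` instance, of which only `setex` and `get` are used.

```python
import json
from typing import Optional, Protocol

SESSION_TTL_SECONDS = 3600  # "TTL: 1 hour" in the data-flow diagram


class KeyValueClient(Protocol):
    """Subset of the redis-py client API used here (redis.Redis satisfies it)."""

    def setex(self, name: str, time: int, value: str): ...

    def get(self, name: str): ...


class SessionFeatureCache:
    def __init__(self, client: KeyValueClient, ttl: int = SESSION_TTL_SECONDS):
        self.client = client
        self.ttl = ttl

    @staticmethod
    def key(session_id: str) -> str:
        return f"session_features:{session_id}"

    def put(self, session_id: str, features: dict) -> None:
        # Overwrite the session's features and reset the expiry on every update.
        self.client.setex(self.key(session_id), self.ttl, json.dumps(features))

    def get(self, session_id: str) -> Optional[dict]:
        raw = self.client.get(self.key(session_id))  # bytes or None with redis-py
        return None if raw is None else json.loads(raw)
```

Refreshing the TTL on every write means features expire only after an hour of session inactivity, and a `None` from `get` tells the provider to fall back to the baseline price.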

### 4. Airflow DAG Upgrades

**Before:** `surge_pricing_pipeline` (rule-based)
**After:** Add `agent_classifier_training_pipeline` (daily retrain)

## Next Actions (Start Here)

1. ✅ **Read gameplan**: See `docs/GAMEPLAN_MULTITASK_PRICING.md`

2. **Create directory structure**:
   ```bash
   mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
   ```

3. **Pull sample data**:
   ```python
   # experiments/ml/notebooks/01_eda.ipynb
   from kafka import KafkaConsumer

   # Pull 1 week of events, join with experiments table
   # Analyze label distribution, feature correlations
   ```

4. **Prototype first feature extractor**:
   ```python
   # experiments/ml/features/temporal.py
   # Start with TemporalFeatureExtractor
   # Test on 10k events, validate output schema
   ```

5. **Review with team**: Discuss tradeoffs, priorities, timeline
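
For the join in step 3, a minimal pandas sketch of the `ExperimentLabelJoiner` idea. It assumes a `sessions` table with `sessionId` and `experimentId` columns and a feature matrix indexed by `sessionId`; it also assumes every session outside `xp_human_only` counts as an agent, a placeholder rule to confirm against the gameplan (see the label-quality question below).

```python
import pandas as pd


def join_agent_labels(features: pd.DataFrame, sessions: pd.DataFrame,
                      human_experiment: str = "xp_human_only") -> pd.DataFrame:
    """Attach a binary is_agent label to a session-level feature matrix.

    Sessions without an experiment assignment are dropped (inner join).
    """
    labels = sessions.drop_duplicates("sessionId").set_index("sessionId")["experimentId"]
    out = features.join(labels, how="inner")
    # Placeholder labeling rule: anything outside the human-only experiment is an agent.
    out["is_agent"] = (out["experimentId"] != human_experiment).astype(int)
    return out.drop(columns="experimentId")
```

`labeled["is_agent"].value_counts(normalize=True)` then gives the label distribution the EDA notebook asks for.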

## Questions to Resolve

1. **Label Quality**: How confident are we in `xp_human_only` labels? Should we add manual verification?

2. **Compute Budget**: Do we have GPU access for PyTorch training? (Phase 3)

3. **Latency Requirements**: Is 100ms p99 acceptable for pricing API?

4. **A/B Testing**: Do we have infrastructure for traffic splitting? (Deployment)

5. **Monitoring**: Who owns the Grafana dashboards? What alerting thresholds?

---

**For detailed implementation, see:** `docs/GAMEPLAN_MULTITASK_PRICING.md`