Created detailed documentation for implementing multi-task learning system to improve agent detection and dynamic pricing:
- GAMEPLAN_MULTITASK_PRICING.md: Complete 50+ page technical specification including feature engineering, supervised learning, multi-task neural networks, synthetic simulator, and knowledge distillation approach
- ARCHITECTURE_OVERVIEW.md: Quick reference with visual diagrams comparing current rule-based system to proposed ML architecture, metrics, and implementation phases

Key improvements proposed:
- Replace O(n²) SessionState pipeline with vectorized feature extraction
- Train XGBoost classifier on experimentId labels (ROC-AUC >0.90 target)
- Multi-task neural network for joint agent detection + purchase prediction
- Gymnasium-based synthetic pricing environment for safe experimentation
- Knowledge distillation to extract interpretable pricing heuristics

Addresses margin leakage concerns with learned pricing strategies instead of simple velocity thresholds.
Multi-Task Learning Architecture - Quick Reference
Current System (Baseline)
┌─────────────────────────────────────────────────────────────────┐
│ CURRENT STATE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Browser Events → Next.js → FastAPI → Kafka (user-interactions) │
│ ↓ │
│ Airflow (every 15min) │
│ ↓ │
│ [Messy SessionState Pipeline] │
│ ↓ │
│ Simple Rule-Based Pricing: │
│ - Surge (if demand > 10) │
│ - Elasticity formula │
│ - Velocity threshold for agents │
│ ↓ │
│ Redis (prices) │
│ ↓ │
│ Pricing Provider API │
│ │
│ ISSUES: │
│ ✗ O(n²) feature extraction │
│ ✗ No supervised ML for agent detection │
│ ✗ Simple heuristics (velocity > 5 → agent) │
│ ✗ No learning from data │
│ ✗ Margin leakage not effectively addressed │
└─────────────────────────────────────────────────────────────────┘
Proposed System (Multi-Task Learning)
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: DATA PIPELINE │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Kafka (user-interactions) │
│ ↓ │
│ ┌─────────────────────────────────────┐ │
│ │ VECTORIZED FEATURE PIPELINE │ │
│ ├─────────────────────────────────────┤ │
│ │ 1. TemporalFeatureExtractor │ → 8 features (velocity, etc.) │
│ │ 2. BehavioralFeatureExtractor │ → 10 features (carts, hovers) │
│ │ 3. ProductFeatureExtractor │ → 8 features (prices, depth) │
│ │ 4. UserAgentParser │ → 3 features (browser type) │
│ │ 5. SessionAggregator │ → Session-level matrix │
│ │ 6. ExperimentLabelJoiner │ → Join with xp_human_only │
│ └─────────────────────────────────────┘ │
│ ↓ │
│ Feature Matrix: [sessionId, 29 features, 3 labels] │
│ │
└──────────────────────────────────────────────────────────────────────────┘
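The Phase 1 aggregation step can be sketched as a single pass over events that yields one feature row per session. This is a minimal, standard-library-only illustration rather than the planned extractors: the event fields (`sessionId`, `ts`, `type`, `productId`), the event type names, and the definition of velocity as events per second are all assumptions, and a production version would more likely be a vectorized groupby over a DataFrame.

```python
from collections import defaultdict


def session_features(events):
    """Aggregate raw events into one feature row per session in O(n)."""
    acc = defaultdict(lambda: {"n": 0, "views": 0, "carts": 0,
                               "first": None, "last": None, "products": set()})
    for e in events:
        s = acc[e["sessionId"]]
        s["n"] += 1
        # Event type names are hypothetical placeholders.
        s["views"] += e["type"] == "view"
        s["carts"] += e["type"] == "add_to_cart"
        s["first"] = e["ts"] if s["first"] is None else min(s["first"], e["ts"])
        s["last"] = e["ts"] if s["last"] is None else max(s["last"], e["ts"])
        s["products"].add(e["productId"])

    rows = {}
    for sid, s in acc.items():
        duration = s["last"] - s["first"]  # seconds (assumed unit of `ts`)
        rows[sid] = {
            "session_duration": duration,
            # Assumed definition: events per second, with a 1-second floor.
            "interaction_velocity": s["n"] / max(duration, 1.0),
            "cart_to_view_ratio": s["carts"] / max(s["views"], 1),
            "unique_products_viewed": len(s["products"]),
        }
    return rows
```

Each event is touched once, so cost grows linearly with the number of events instead of quadratically with session size.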
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: SUPERVISED AGENT CLASSIFIER │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Feature Matrix (29 features) │
│ ↓ │
│ ┌────────────────────┐ │
│ │ XGBoost Model │ │
│ ├────────────────────┤ │
│ │ Input: 29 dims │ │
│ │ Output: P(agent) │ │
│ │ Loss: BCE │ │
│ └────────────────────┘ │
│ ↓ │
│ Target: ROC-AUC > 0.90 │
│ │
│ DEPLOYMENT: │
│ - Real-time inference in Pricing Provider │
│ - Dynamic markup: P(agent) > 0.7 → 1.3x price │
│ - Retrain daily via Airflow │
│ │
└──────────────────────────────────────────────────────────────────────────┘
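The Phase 2 deployment rule (P(agent) > 0.7 → 1.3x price) can be written as a small pure function. The function names and the rounding to two decimals are illustrative assumptions; only the threshold and multiplier come from the diagram above.

```python
def agent_markup(p_agent, threshold=0.7, markup=1.3):
    """Return the price multiplier for a session given its agent probability."""
    if not 0.0 <= p_agent <= 1.0:
        raise ValueError("p_agent must be a probability in [0, 1]")
    # Strict inequality, matching "P(agent) > 0.7" in the diagram.
    return markup if p_agent > threshold else 1.0


def quoted_price(base_price, p_agent):
    """Apply the markup to a base price (rounding is an assumption)."""
    return round(base_price * agent_markup(p_agent), 2)
```

For example, `quoted_price(100.0, 0.9)` applies the 1.3x markup, while a session scored at exactly 0.7 keeps the baseline price.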
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: MULTI-TASK LEARNING MODEL │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Input: Session Features (29) + Product Features (10) + Current Price │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MULTI-TASK NEURAL NETWORK │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Session Encoder │ (Shared) │ │
│ │ │ [29] → [128] → [64] │ │ │
│ │ └──────────┬───────────┘ │ │
│ │ │ │ │
│ │ ├────────────┬───────────────┐ │ │
│ │ ↓ ↓ ↓ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────────┐ │ │
│ │ │ Task A │ │ Product │ │ Task B │ │ │
│ │ │ Agent │ │ Encoder │ │ Purchase │ │ │
│ │ │ Head │ │ [10]→16 │ │ Prob Head │ │ │
│ │ └────┬────┘ └────┬────┘ └──────┬──────┘ │ │
│ │ ↓ └────┬────────────┘ │ │
│ │ P(agent) ↓ │ │
│ │ P(purchase|price) │ │
│ │ │ │
│ │ Loss = α·BCE(agent) + β·BCE(purchase) │ │
│ │ α=1.0, β=2.0 (tune these weights) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ OUTPUTS: │
│ 1. Agent probability (like Phase 2) │
│ 2. Purchase probability given price │
│ 3. Session embedding (for knowledge distillation) │
│ │
│ USE CASE: │
│ Optimal Price = argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ] │
│ │
└──────────────────────────────────────────────────────────────────────────┘
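The two formulas in the Phase 3 box can be made concrete without a deep learning framework. The sketch below computes the weighted loss α·BCE(agent) + β·BCE(purchase) on plain lists and picks a price by grid search over the stated objective. The candidate grid, the `p_purchase` callable, and the default λ = 0.5 are hypothetical; in the proposed system the purchase probability would come from the Task B head.

```python
import math


def bce(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def multitask_loss(agent_y, agent_p, buy_y, buy_p, alpha=1.0, beta=2.0):
    """Loss = alpha * BCE(agent) + beta * BCE(purchase)."""
    return alpha * bce(agent_y, agent_p) + beta * bce(buy_y, buy_p)


def optimal_price(candidates, p_purchase, p_agent, lam=0.5):
    """argmax over candidate prices of p * P(purchase|p) * (1 + lam * P(agent))."""
    def objective(p):
        return p * p_purchase(p) * (1.0 + lam * p_agent)
    return max(candidates, key=objective)
```

Because P(agent) is fixed for a given session in this sketch, the factor (1 + λ·P(agent)) scales the objective without moving the argmax; it only matters when comparing expected value across sessions.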
┌──────────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE DISTILLATION BRANCH │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Multi-Task Model (teacher) │
│ ↓ │
│ Generate predictions on validation set │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Distill to Decision Tree (student) │ │
│ ├──────────────────────────────────────┤ │
│ │ Input: 29 session features │ │
│ │ Output: Optimal markup multiplier │ │
│ │ Max depth: 5 (interpretable) │ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ Extract Human-Readable Rules: │
│ │
│ IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1: │
│ markup = 1.3 (likely agent reconnaissance) │
│ ELIF unique_products_viewed < 3 AND session_duration > 300: │
│ markup = 0.9 (engaged human, offer discount) │
│ ELSE: │
│ markup = 1.0 (baseline) │
│ │
│ Also: SHAP values for feature importance analysis │
│ │
└──────────────────────────────────────────────────────────────────────────┘
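The example rules in the box above translate directly into a function, which is the form a distilled depth-limited tree would take once exported. The thresholds are the illustrative ones from the diagram, not learned values, and the `fidelity` helper (the share of sessions where the student reproduces the teacher's markup) is an added assumption about how the distillation could be checked.

```python
def distilled_markup(f):
    """Apply the example rules to a dict of session features."""
    if f["interaction_velocity"] > 10 and f["cart_to_view_ratio"] < 0.1:
        return 1.3  # likely agent reconnaissance
    if f["unique_products_viewed"] < 3 and f["session_duration"] > 300:
        return 0.9  # engaged human, offer discount
    return 1.0  # baseline


def fidelity(teacher_markups, student_markups, tol=1e-9):
    """Fraction of sessions where student and teacher markups agree."""
    pairs = list(zip(teacher_markups, student_markups))
    return sum(abs(t - s) <= tol for t, s in pairs) / len(pairs)
```

A low fidelity score on the validation set would suggest the depth-5 limit is too restrictive for the teacher's pricing behavior.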
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ PURPOSE: Fast experimentation without real users │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ DynamicPricingEnv (Gymnasium) │ │
│ ├────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ State: [demand, inventory, hour, agent_frac, │ │
│ │ avg_velocity] │ │
│ │ │ │
│ │ Action: price_multiplier ∈ [0.7, 1.5] │ │
│ │ │ │
│ │ Dynamics: │ │
│ │ - Simulate user arrivals (Poisson) │ │
│ │ - Split into humans (30%) vs agents (70%) │ │
│ │ - Purchase probability: │ │
│ │ P_human(buy) = logistic(price, sensitivity=2) │ │
│ │ P_agent(buy) = logistic(price, sensitivity=5) │ │
│ │ │ │
│ │ Reward: revenue - 0.5 * margin_leakage │ │
│ │ where margin_leakage = (oracle_price - │ │
│ │ actual_price) × │ │
│ │ agent_purchases │ │
│ └────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────┐ │
│ │ Train RL Agent (PPO) │ │
│ ├────────────────────────────────────────┤ │
│ │ Learn policy: State → Optimal Price │ │
│ │ 100k timesteps training │ │
│ └────────────────────────────────────────┘ │
│ ↓ │
│ BENCHMARK vs Baselines: │
│ - Fixed pricing: 1.0x always │
│ - Simple surge: 1.2x if demand > 10, else 0.9x │
│ - Elasticity-based: formula │
│ - RL policy: learned │
│ - Multi-task + RL: Use MT model predictions as state features │
│ │
│ VALIDATION: │
│ - Calibrate simulator from historical data │
│ - Run counterfactuals ("what if agent_frac=0.8?") │
│ - A/B test winner on real traffic │
│ │
└──────────────────────────────────────────────────────────────────────────┘
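A stripped-down version of the simulator dynamics is sketched below using only the standard library, so it does not subclass the Gymnasium environment class and omits the observation vector. Several pieces are assumptions made for illustration: the logistic form `1 / (1 + exp(sensitivity * (multiplier - 1)))`, the base price, the arrival rate, and an oracle price fixed at 1.3x the base price. The 70% agent share, the sensitivities 2 and 5, the action range, and the reward formula follow the diagram.

```python
import math
import random


class PricingEnvSketch:
    """One-step pricing dynamics: Poisson arrivals, logistic purchases."""

    def __init__(self, base_price=100.0, arrival_rate=20.0, agent_frac=0.7,
                 oracle_multiplier=1.3, seed=0):
        self.base_price = base_price
        self.arrival_rate = arrival_rate
        self.agent_frac = agent_frac
        self.oracle_price = base_price * oracle_multiplier  # assumed oracle
        self.rng = random.Random(seed)

    def _poisson(self, lam):
        # Knuth's multiplication method for sampling a Poisson count.
        limit, k, prod = math.exp(-lam), 0, 1.0
        while True:
            prod *= self.rng.random()
            if prod <= limit:
                return k
            k += 1

    @staticmethod
    def buy_prob(multiplier, sensitivity):
        # Assumed logistic form: 0.5 at baseline price, falling as price rises.
        return 1.0 / (1.0 + math.exp(sensitivity * (multiplier - 1.0)))

    def step(self, multiplier):
        multiplier = min(max(multiplier, 0.7), 1.5)  # action range [0.7, 1.5]
        price = self.base_price * multiplier
        revenue, agent_purchases = 0.0, 0
        for _ in range(self._poisson(self.arrival_rate)):
            is_agent = self.rng.random() < self.agent_frac
            sensitivity = 5.0 if is_agent else 2.0
            if self.rng.random() < self.buy_prob(multiplier, sensitivity):
                revenue += price
                agent_purchases += is_agent
        leakage = (self.oracle_price - price) * agent_purchases
        reward = revenue - 0.5 * leakage
        info = {"revenue": revenue, "margin_leakage": leakage,
                "agent_purchases": agent_purchases}
        return reward, info
```

Seeding the generator makes runs reproducible, which helps when comparing the fixed, surge, and learned policies on identical arrival sequences.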
Data Flow (Production)
┌─────────────┐
│ Browser │
│ (User/Agent)│
└──────┬──────┘
│ POST /api/ingest (events + experimentId)
↓
┌──────────────┐
│ Next.js API │
└──────┬───────┘
│ Forward events
↓
┌──────────────┐
│ FastAPI │
│ /api/kafka │
│ /ingest │
└──────┬───────┘
│ Publish
↓
┌─────────────────────────┐
│ Kafka │
│ Topic: user-interactions│
└──────┬──────────────────┘
│
├──────────────────┬──────────────────┐
↓ ↓ ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
│ Airflow │ │ Real-Time │ │ Kafka Streams │
│ (Batch) │ │ Inference │ │ (Feature Cache) │
│ │ │ │ │ │
│ Daily: │ │ On Price │ │ Rolling window │
│ - Retrain │ │ Request: │ │ compute session │
│ classifier │ │ - Get session│ │ features, push │
│ - Retrain MT │ │ features │ │ to Redis │
│ model │ │ - Predict │ │ │
│ - Publish to │ │ P(agent) │ │ TTL: 1 hour │
│ registry │ │ - Predict │ │ │
│ │ │ P(purchase)│ │ │
│ │ │ - Compute │ │ │
│ │ │ optimal_p │ │ │
└──────┬───────┘ └──────┬───────┘ └────────┬─────────┘
│ │ │
↓ ↓ ↓
┌──────────────────────────────────────────────┐
│ Redis (Model Registry) │
├──────────────────────────────────────────────┤
│ Keys: │
│ - classifier:agent_detector:latest (pickle) │
│ - multitask_model:latest (state_dict) │
│ - session_features:{sessionId} (json, TTL) │
│ - prices:latest (DataFrame) │
│ - elasticity:latest (DataFrame) │
└──────────────────┬───────────────────────────┘
│
↓
┌─────────────────────┐
│ Pricing Provider │
│ /api/{mode}/price/ │
│ {productId} │
│ │
│ GET sessionId │
│ → Load features │
│ → Load models │
│ → Predict │
│ → Return price │
└─────────┬───────────┘
│
↓
┌─────────────────────┐
│ Frontend │
│ (Display price) │
└─────────────────────┘
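The feature-cache contract in the flow above (JSON values under `session_features:{sessionId}` with a 1-hour TTL) can be illustrated with an in-memory stand-in. This sketch deliberately avoids a Redis client, so the class name and its methods are hypothetical; it only shows the key scheme and expiry behavior the Pricing Provider would rely on.

```python
import json
import time


class FeatureCacheSketch:
    """In-memory stand-in for the Redis session feature store."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._store = {}

    @staticmethod
    def key(session_id):
        return f"session_features:{session_id}"

    def put(self, session_id, features):
        expires_at = self._clock() + self._ttl
        self._store[self.key(session_id)] = (expires_at, json.dumps(features))

    def get(self, session_id):
        """Return the cached features, or None if missing or expired."""
        entry = self._store.get(self.key(session_id))
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._store[self.key(session_id)]
            return None
        return json.loads(payload)
```

A `None` result is the case the Pricing Provider has to handle explicitly, for example by falling back to the baseline price for sessions with no cached features.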
Key Metrics
Model Performance
| Metric | Target | Current | Phase |
|---|---|---|---|
| Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
| Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
| Pricing Latency (p99) | <100ms | ~50ms | All |
| Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |
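For reference, the ROC-AUC targets in the table can be checked by hand on a small sample: the metric equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The pairwise loop below is a sketch for small validation samples only; a library implementation would be the usual choice for the real evaluation.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```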
Business Impact
| Metric | Target | Current | Phase |
|---|---|---|---|
| Margin Leakage | -30% | Baseline | Phase 2-4 |
| Human Conversion Rate | No change | Baseline | All |
| Agent Detection Precision | >85% | ~60% (velocity) | Phase 2 |
| Revenue Uplift | +10% | Baseline | Phase 3-4 |
File Structure (New)
experiments/
ml/
__init__.py
# Phase 1: Features
features/
__init__.py
temporal.py # TemporalFeatureExtractor
behavioral.py # BehavioralFeatureExtractor
product.py # ProductFeatureExtractor
useragent.py # UserAgentParser
aggregator.py # SessionAggregator
pipeline.py # build_feature_pipeline()
datasets.py # load_events_from_kafka(), etc.
# Phase 2: Classifier
train_classifier.py # XGBoost training script
# Phase 3: Multi-Task
models/
__init__.py
multitask.py # MultiTaskPricingModel (PyTorch)
train_multitask.py # Multi-task training script
distill.py # Knowledge distillation
# Phase 4: Simulator
simulator/
__init__.py
env.py # DynamicPricingEnv (Gymnasium)
agents.py # HumanUser, AgentUser
train_rl.py # PPO training
# Inference
inference/
__init__.py
pricing_service.py # gRPC service (optional)
feature_cache.py # Redis feature store client
# Notebooks
notebooks/
01_eda.ipynb
02_feature_analysis.ipynb
03_model_evaluation.ipynb
04_simulator_calibration.ipynb
Critical Code Changes
1. Replace Messy SessionState
Before: experiments/procesing/steps/session.py (O(n²) loops)
After: experiments/ml/pipeline.py (vectorized pipeline)
2. Upgrade Pricing Provider
Before: Simple velocity threshold
After: ML model inference with agent probability
3. Add Real-Time Feature Store
Before: No feature caching
After: Kafka Streams → Redis (session features)
4. Airflow DAG Upgrades
Before: surge_pricing_pipeline (rule-based)
After: Add agent_classifier_training_pipeline (daily retrain)
Next Actions (Start Here)
- ✅ Read gameplan: See /home/user/PHANTOM/docs/GAMEPLAN_MULTITASK_PRICING.md
- Create directory structure:
    mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
- Pull sample data:
    # experiments/ml/notebooks/01_eda.ipynb
    from kafka import KafkaConsumer
    # Pull 1 week of events, join with experiments table
    # Analyze label distribution, feature correlations
- Prototype first feature extractor:
    # experiments/ml/features/temporal.py
    # Start with TemporalFeatureExtractor
    # Test on 10k events, validate output schema
- Review with team: Discuss tradeoffs, priorities, timeline
Questions to Resolve
- Label Quality: How confident are we in xp_human_only labels? Should we add manual verification?
- Compute Budget: Do we have GPU access for PyTorch training? (Phase 3)
- Latency Requirements: Is 100ms p99 acceptable for pricing API?
- A/B Testing: Do we have infrastructure for traffic splitting? (Deployment)
- Monitoring: Who owns the Grafana dashboards? What alerting thresholds?
For detailed implementation, see: /home/user/PHANTOM/docs/GAMEPLAN_MULTITASK_PRICING.md