docs: Add comprehensive multi-task learning architecture and gameplan

Created detailed documentation for implementing a multi-task learning system to improve agent detection and dynamic pricing:

- GAMEPLAN_MULTITASK_PRICING.md: Complete 50+ page technical specification covering feature engineering, supervised learning, multi-task neural networks, a synthetic simulator, and a knowledge distillation approach
- ARCHITECTURE_OVERVIEW.md: Quick reference with visual diagrams comparing the current rule-based system to the proposed ML architecture, plus metrics and implementation phases

Key improvements proposed:

- Replace the O(n²) SessionState pipeline with vectorized feature extraction
- Train an XGBoost classifier on experimentId labels (target ROC-AUC > 0.90)
- Multi-task neural network for joint agent detection and purchase prediction
- Gymnasium-based synthetic pricing environment for safe experimentation
- Knowledge distillation to extract interpretable pricing heuristics

Addresses margin leakage concerns with learned pricing strategies instead of simple velocity thresholds.
2026-07-15 17:43:36 +00:00 · 2025-12-11 09:51:41 +00:00
parent d45b344264
commit aab54ea7c0
2 changed files with 1999 additions and 0 deletions
--- a/docs/ARCHITECTURE_OVERVIEW.md
+++ b/docs/ARCHITECTURE_OVERVIEW.md
@@ -0,0 +1,403 @@
 # Multi-Task Learning Architecture - Quick Reference
 ## Current System (Baseline)
 ```
 ┌─────────────────────────────────────────────────────────────────┐
 │                        CURRENT STATE                             │
 ├─────────────────────────────────────────────────────────────────┤
 │                                                                   │
 │  Browser Events → Next.js → FastAPI → Kafka (user-interactions)  │
 │                                           ↓                       │
 │                                      Airflow (every 15min)       │
 │                                           ↓                       │
 │                              [Messy SessionState Pipeline]       │
 │                                           ↓                       │
 │                        Simple Rule-Based Pricing:                │
 │                        - Surge (if demand > 10)                  │
 │                        - Elasticity formula                      │
 │                        - Velocity threshold for agents           │
 │                                           ↓                       │
 │                                    Redis (prices)                │
 │                                           ↓                       │
 │                                  Pricing Provider API            │
 │                                                                   │
 │  ISSUES:                                                          │
 │  ✗ O(n²) feature extraction                                      │
 │  ✗ No supervised ML for agent detection                          │
 │  ✗ Simple heuristics (velocity > 5 → agent)                      │
 │  ✗ No learning from data                                         │
 │  ✗ Margin leakage not effectively addressed                      │
 └─────────────────────────────────────────────────────────────────┘
 ```
 ## Proposed System (Multi-Task Learning)
 ```
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                           PHASE 1: DATA PIPELINE                          │
 ├──────────────────────────────────────────────────────────────────────────┤
 │                                                                            │
 │  Kafka (user-interactions)                                                │
 │         ↓                                                                 │
 │  ┌─────────────────────────────────────┐                                 │
 │  │  VECTORIZED FEATURE PIPELINE        │                                 │
 │  ├─────────────────────────────────────┤                                 │
 │  │  1. TemporalFeatureExtractor        │  → 8 features (velocity, etc.)  │
 │  │  2. BehavioralFeatureExtractor      │  → 10 features (carts, hovers)  │
 │  │  3. ProductFeatureExtractor         │  → 8 features (prices, depth)   │
 │  │  4. UserAgentParser                 │  → 3 features (browser type)    │
 │  │  5. SessionAggregator               │  → Session-level matrix         │
 │  │  6. ExperimentLabelJoiner           │  → Join with xp_human_only      │
 │  └─────────────────────────────────────┘                                 │
 │         ↓                                                                 │
 │  Feature Matrix: [sessionId, 29 features, 3 labels]                      │
 │                                                                            │
 └──────────────────────────────────────────────────────────────────────────┘
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                    PHASE 2: SUPERVISED AGENT CLASSIFIER                   │
 ├──────────────────────────────────────────────────────────────────────────┤
 │                                                                            │
 │  Feature Matrix (29 features)                                             │
 │         ↓                                                                 │
 │  ┌────────────────────┐                                                   │
 │  │   XGBoost Model    │                                                   │
 │  ├────────────────────┤                                                   │
 │  │  Input: 29 dims    │                                                   │
 │  │  Output: P(agent)  │                                                   │
 │  │  Loss: BCE         │                                                   │
 │  └────────────────────┘                                                   │
 │         ↓                                                                 │
 │  Target: ROC-AUC > 0.90                                                   │
 │                                                                            │
 │  DEPLOYMENT:                                                              │
 │  - Real-time inference in Pricing Provider                                │
 │  - Dynamic markup: P(agent) > 0.7 → 1.3x price                           │
 │  - Retrain daily via Airflow                                              │
 │                                                                            │
 └──────────────────────────────────────────────────────────────────────────┘
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                  PHASE 3: MULTI-TASK LEARNING MODEL                       │
 ├──────────────────────────────────────────────────────────────────────────┤
 │                                                                            │
 │  Input: Session Features (29) + Product Features (10) + Current Price    │
 │         ↓                                                                 │
 │  ┌───────────────────────────────────────────────────────────┐           │
 │  │              MULTI-TASK NEURAL NETWORK                    │           │
 │  ├───────────────────────────────────────────────────────────┤           │
 │  │                                                             │           │
 │  │   ┌──────────────────────┐                                │           │
 │  │   │  Session Encoder     │  (Shared)                      │           │
 │  │   │  [29] → [128] → [64] │                                │           │
 │  │   └──────────┬───────────┘                                │           │
 │  │              │                                             │           │
 │  │              ├────────────┬───────────────┐                │           │
 │  │              ↓            ↓               ↓                │           │
 │  │       ┌─────────┐   ┌─────────┐   ┌─────────────┐        │           │
 │  │       │ Task A  │   │ Product │   │   Task B    │        │           │
 │  │       │ Agent   │   │ Encoder │   │  Purchase   │        │           │
 │  │       │ Head    │   │ [10]→16 │   │  Prob Head  │        │           │
 │  │       └────┬────┘   └────┬────┘   └──────┬──────┘        │           │
 │  │            ↓             └────┬────────────┘               │           │
 │  │       P(agent)               ↓                            │           │
 │  │                          P(purchase|price)                 │           │
 │  │                                                             │           │
 │  │   Loss = α·BCE(agent) + β·BCE(purchase)                   │           │
 │  │   α=1.0, β=2.0 (tune these weights)                       │           │
 │  └───────────────────────────────────────────────────────────┘           │
 │         ↓                                                                 │
 │  OUTPUTS:                                                                 │
 │  1. Agent probability (like Phase 2)                                     │
 │  2. Purchase probability given price                                      │
 │  3. Session embedding (for knowledge distillation)                        │
 │                                                                            │
 │  USE CASE:                                                                │
 │  Optimal Price = (1 + λ·P(agent)) · argmax_p [ p · P(purchase|p) ]      │
 │                                                                            │
 └──────────────────────────────────────────────────────────────────────────┘
 ┌──────────────────────────────────────────────────────────────────────────┐
 │                     KNOWLEDGE DISTILLATION BRANCH                         │
 ├──────────────────────────────────────────────────────────────────────────┤
 │                                                                            │
 │  Multi-Task Model (teacher)                                               │
 │         ↓                                                                 │
 │  Generate predictions on validation set                                   │
 │         ↓                                                                 │
 │  ┌──────────────────────────────────────┐                                │
 │  │  Distill to Decision Tree (student)  │                                │
 │  ├──────────────────────────────────────┤                                │
 │  │  Input: 29 session features          │                                │
 │  │  Output: Optimal markup multiplier   │                                │
 │  │  Max depth: 5 (interpretable)        │                                │
 │  └──────────────────────────────────────┘                                │
 │         ↓                                                                 │
 │  Extract Human-Readable Rules:                                            │
 │                                                                            │
 │  IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1:              │
 │      markup = 1.3  (likely agent reconnaissance)                          │
 │  ELIF unique_products_viewed < 3 AND session_duration > 300:             │
 │      markup = 0.9  (engaged human, offer discount)                       │
 │  ELSE:                                                                    │
 │      markup = 1.0  (baseline)                                             │
 │                                                                            │
 │  Also: SHAP values for feature importance analysis                        │
 │                                                                            │
 └──────────────────────────────────────────────────────────────────────────┘
 ┌──────────────────────────────────────────────────────────────────────────┐
 │              PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR                 │
 ├──────────────────────────────────────────────────────────────────────────┤
 │                                                                            │
 │  PURPOSE: Fast experimentation without real users                         │
 │                                                                            │
 │  ┌────────────────────────────────────────────────────┐                  │
 │  │         DynamicPricingEnv (Gymnasium)              │                  │
 │  ├────────────────────────────────────────────────────┤                  │
 │  │                                                      │                  │
 │  │  State:   [demand, inventory, hour, agent_frac,    │                  │
 │  │            avg_velocity]                            │                  │
 │  │                                                      │                  │
 │  │  Action:  price_multiplier ∈ [0.7, 1.5]           │                  │
 │  │                                                      │                  │
 │  │  Dynamics:                                          │                  │
 │  │  - Simulate user arrivals (Poisson)                │                  │
 │  │  - Split into humans (30%) vs agents (70%)         │                  │
 │  │  - Purchase probability:                            │                  │
 │  │      P_human(buy) = logistic(price, sensitivity=2) │                  │
 │  │      P_agent(buy) = logistic(price, sensitivity=5) │                  │
 │  │                                                      │                  │
 │  │  Reward:  revenue - 0.5 * margin_leakage           │                  │
 │  │           where margin_leakage = (oracle_price -   │                  │
 │  │                                   actual_price) ×   │                  │
 │  │                                   agent_purchases   │                  │
 │  └────────────────────────────────────────────────────┘                  │
 │         ↓                                                                 │
 │  ┌────────────────────────────────────────┐                              │
 │  │  Train RL Agent (PPO)                  │                              │
 │  ├────────────────────────────────────────┤                              │
 │  │  Learn policy: State → Optimal Price   │                              │
 │  │  100k timesteps training                │                              │
 │  └────────────────────────────────────────┘                              │
 │         ↓                                                                 │
 │  BENCHMARK vs Baselines:                                                  │
 │  - Fixed pricing: 1.0x always                                             │
 │  - Simple surge: 1.2x if demand > 10, else 0.9x                         │
 │  - Elasticity-based: formula                                              │
 │  - RL policy: learned                                                     │
 │  - Multi-task + RL: Use MT model predictions as state features           │
 │                                                                            │
 │  VALIDATION:                                                              │
 │  - Calibrate simulator from historical data                               │
 │  - Run counterfactuals ("what if agent_frac=0.8?")                       │
 │  - A/B test winner on real traffic                                        │
 │                                                                            │
 └──────────────────────────────────────────────────────────────────────────┘
 ```
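The Phase 4 dynamics are easy to prototype in plain Python before wrapping them in a Gymnasium `Env`. The sketch below takes the 30/70 human/agent split, the sensitivities 2 and 5, the [0.7, 1.5] action range, and the reward `revenue - 0.5 * margin_leakage` from the diagram. Everything else is an assumption: the logistic curve centered on the base price, the Knuth Poisson sampler, and the hypothetical `base_price`, `arrival_rate`, and `oracle_multiplier` defaults, which would come from simulator calibration.

```python
import math
import random
from dataclasses import dataclass

HUMAN_SHARE = 0.30                          # humans 30%, agents 70%
SENSITIVITY = {"human": 2.0, "agent": 5.0}  # agents react more strongly to price
MIN_MULT, MAX_MULT = 0.7, 1.5               # action space bounds


def purchase_prob(multiplier: float, sensitivity: float) -> float:
    """Assumed logistic demand: 0.5 at multiplier 1.0, falling as price rises."""
    return 1.0 / (1.0 + math.exp(sensitivity * (multiplier - 1.0)))


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's algorithm; adequate for the small arrival rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


@dataclass
class StepResult:
    revenue: float
    margin_leakage: float
    reward: float


def simulate_step(multiplier: float, rng: random.Random,
                  base_price: float = 100.0,       # hypothetical
                  arrival_rate: float = 20.0,      # hypothetical mean arrivals per step
                  oracle_multiplier: float = 1.3,  # hypothetical "oracle" agent price
                  leakage_weight: float = 0.5) -> StepResult:
    """Simulate one pricing period at the given price multiplier."""
    multiplier = min(max(multiplier, MIN_MULT), MAX_MULT)  # clip to action space
    price = base_price * multiplier
    oracle_price = base_price * oracle_multiplier
    revenue, agent_purchases = 0.0, 0
    for _ in range(sample_poisson(rng, arrival_rate)):
        kind = "human" if rng.random() < HUMAN_SHARE else "agent"
        if rng.random() < purchase_prob(multiplier, SENSITIVITY[kind]):
            revenue += price
            if kind == "agent":
                agent_purchases += 1
    # As in the diagram: (oracle_price - actual_price) x agent_purchases.
    # Note this goes negative when the price exceeds the oracle price.
    leakage = (oracle_price - price) * agent_purchases
    return StepResult(revenue, leakage, revenue - leakage_weight * leakage)
```

A `DynamicPricingEnv.step()` could call `simulate_step` with the clipped action and return its `reward` alongside the next state. Keeping the dynamics in a pure function makes them easy to unit-test and to calibrate separately from the PPO training loop.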
 ## Data Flow (Production)
 ```
 ┌─────────────┐
 │   Browser   │
 │ (User/Agent)│
 └──────┬──────┘
       │ POST /api/ingest (events + experimentId)
       ↓
 ┌──────────────┐
 │  Next.js API │
 └──────┬───────┘
       │ Forward events
       ↓
 ┌──────────────┐
 │ FastAPI      │
 │ /api/kafka   │
 │ /ingest      │
 └──────┬───────┘
       │ Publish
       ↓
 ┌─────────────────────────┐
 │ Kafka                   │
 │ Topic: user-interactions│
 └──────┬──────────────────┘
       │
       ├──────────────────┬──────────────────┐
       ↓                  ↓                  ↓
 ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐
 │ Airflow      │  │ Real-Time    │  │ Kafka Streams    │
 │ (Batch)      │  │ Inference    │  │ (Feature Cache)  │
 │              │  │              │  │                  │
 │ Daily:       │  │ On Price     │  │ Rolling window   │
 │ - Retrain    │  │ Request:     │  │ compute session  │
 │   classifier │  │ - Get session│  │ features, push   │
 │ - Retrain MT │  │   features   │  │ to Redis         │
 │   model      │  │ - Predict    │  │                  │
 │ - Publish to │  │   P(agent)   │  │ TTL: 1 hour      │
 │   registry   │  │ - Predict    │  │                  │
 │              │  │   P(purchase)│  │                  │
 │              │  │ - Compute    │  │                  │
 │              │  │   optimal_p  │  │                  │
 └──────┬───────┘  └──────┬───────┘  └────────┬─────────┘
       │                 │                   │
       ↓                 ↓                   ↓
 ┌──────────────────────────────────────────────┐
 │          Redis (Model Registry)              │
 ├──────────────────────────────────────────────┤
 │ Keys:                                         │
 │ - classifier:agent_detector:latest (pickle)  │
 │ - multitask_model:latest (state_dict)        │
 │ - session_features:{sessionId} (json, TTL)   │
 │ - prices:latest (DataFrame)                  │
 │ - elasticity:latest (DataFrame)              │
 └──────────────────┬───────────────────────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │ Pricing Provider    │
         │ /api/{mode}/price/  │
         │ {productId}         │
         │                     │
         │ GET sessionId       │
         │ → Load features     │
         │ → Load models       │
         │ → Predict           │
         │ → Return price      │
         └─────────┬───────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │   Frontend          │
         │   (Display price)   │
         └─────────────────────┘
 ```
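The "Compute optimal_p" step in real-time inference can be sketched as a grid search. Because P(agent) comes from session features and does not vary with the candidate price, a factor like (1 + λ·P(agent)) cannot change which price maximizes expected revenue; this sketch therefore runs the argmax over p · P(purchase|p) alone and applies the agent term as a multiplicative markup on the winning price. `purchase_prob` stands in for the multi-task model's purchase head, and λ = 0.3 is a hypothetical default.

```python
from typing import Callable, Sequence


def optimal_price(candidate_prices: Sequence[float],
                  purchase_prob: Callable[[float], float],
                  p_agent: float,
                  lam: float = 0.3) -> float:
    """Return argmax_p p * P(purchase | p), scaled by (1 + lam * p_agent).

    purchase_prob is a stand-in for the multi-task model's purchase head;
    lam (λ) = 0.3 is a hypothetical default to be tuned.
    """
    if not candidate_prices:
        raise ValueError("need at least one candidate price")
    if not 0.0 <= p_agent <= 1.0:
        raise ValueError("p_agent must be a probability")
    # Expected revenue per impression at each candidate price
    best = max(candidate_prices, key=lambda p: p * purchase_prob(p))
    # P(agent) does not depend on p, so it scales the chosen price, not the objective
    return best * (1.0 + lam * p_agent)
```

With a toy linear demand curve P(purchase|p) = 1 - p/200 on a 70..150 grid, the revenue-maximizing price is 100, so a session with P(agent) = 0.5 would be quoted about 115 under λ = 0.3. A finer grid or a bounded scalar optimizer could replace the grid search once the purchase head is smooth and calibrated.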
 ## Key Metrics
 ### Model Performance
 | Metric | Target | Current | Phase |
 |--------|--------|---------|-------|
 | Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
 | Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
 | Pricing Latency (p99) | <100ms | ~50ms | All |
 | Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |
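The two model-side targets in this table can be checked without extra dependencies. The sketch below computes ROC-AUC through its rank-sum (Mann-Whitney U) identity, averaging ranks over tied scores, and p99 latency by the nearest-rank method. In practice `sklearn.metrics.roc_auc_score` gives the same AUC, while `numpy.percentile` interpolates linearly by default rather than using nearest rank.

```python
import math
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney U statistic, with average ranks for ties."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores starting at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile of observed latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]
```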
 ### Business Impact
 | Metric | Target | Current | Phase |
 |--------|--------|---------|-------|
 | Margin Leakage | -30% vs. baseline | Baseline | Phase 2-4 |
 | Human Conversion Rate | No change | Baseline | All |
 | Agent Detection Precision | >85% | ~60% (velocity rule) | Phase 2 |
 | Revenue Uplift | +10% | Baseline | Phase 3-4 |
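The table does not say how margin leakage is measured on live traffic. One reading, reusing the simulator's definition (oracle_price - actual_price) × agent_purchases, is sketched below together with the relative change against the baseline that the -30% target refers to. How `oracle_price` would be obtained outside the simulator is an open assumption here, and the helper names are hypothetical.

```python
from typing import Iterable, Tuple

# (oracle_price, actual_price, agent_purchases) per product or pricing period
AgentSales = Iterable[Tuple[float, float, int]]


def margin_leakage(rows: AgentSales) -> float:
    """Sum of (oracle_price - actual_price) * agent_purchases."""
    return sum((oracle - actual) * qty for oracle, actual, qty in rows)


def relative_change(value: float, baseline: float) -> float:
    """Fractional change vs. baseline, e.g. -0.30 for a 30% reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline
```

The same `relative_change` helper covers the +10% revenue-uplift row.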
 ## File Structure (New)
 ```
 experiments/
  ml/
    __init__.py
    # Phase 1: Features
    features/
      __init__.py
      temporal.py           # TemporalFeatureExtractor
      behavioral.py         # BehavioralFeatureExtractor
      product.py            # ProductFeatureExtractor
      useragent.py          # UserAgentParser
      aggregator.py         # SessionAggregator
    pipeline.py             # build_feature_pipeline()
    datasets.py             # load_events_from_kafka(), etc.
    # Phase 2: Classifier
    train_classifier.py     # XGBoost training script
    # Phase 3: Multi-Task
    models/
      __init__.py
      multitask.py          # MultiTaskPricingModel (PyTorch)
    train_multitask.py      # Multi-task training script
    distill.py              # Knowledge distillation
    # Phase 4: Simulator
    simulator/
      __init__.py
      env.py                # DynamicPricingEnv (Gymnasium)
      agents.py             # HumanUser, AgentUser
      train_rl.py           # PPO training
    # Inference
    inference/
      __init__.py
      pricing_service.py    # gRPC service (optional)
      feature_cache.py      # Redis feature store client
    # Notebooks
    notebooks/
      01_eda.ipynb
      02_feature_analysis.ipynb
      03_model_evaluation.ipynb
      04_simulator_calibration.ipynb
 ```
 ## Critical Code Changes
 ### 1. Replace Messy SessionState
 **Before:** `experiments/procesing/steps/session.py` (O(n²) loops)
 **After:** `experiments/ml/pipeline.py` (vectorized pipeline)
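A vectorized replacement for the per-session loops can lean on a single pandas `groupby` with named aggregations, so the work is a few passes over the events instead of a quadratic loop. This is a minimal sketch: the input column names (`sessionId`, `timestamp`, `eventType`, `productId`, `experimentId`), the event type strings `view` and `add_to_cart`, the events-per-second unit of velocity, and treating `experimentId == "xp_human_only"` as the human label are all assumptions, and it covers only a handful of the 29 features.

```python
import pandas as pd


def session_features(events: pd.DataFrame, experiments: pd.DataFrame) -> pd.DataFrame:
    """Per-session features from one groupby pass, joined to experiment labels.

    Assumed (hypothetical) schemas:
      events:      sessionId, timestamp (datetime64), eventType, productId
      experiments: sessionId, experimentId (one row per session)
    """
    ev = events.assign(
        is_view=events["eventType"].eq("view"),
        is_cart=events["eventType"].eq("add_to_cart"),
    )
    feats = ev.groupby("sessionId").agg(
        n_events=("eventType", "count"),
        n_views=("is_view", "sum"),
        n_carts=("is_cart", "sum"),
        unique_products=("productId", "nunique"),
        first_ts=("timestamp", "min"),
        last_ts=("timestamp", "max"),
    )
    duration_s = (feats["last_ts"] - feats["first_ts"]).dt.total_seconds()
    feats["session_duration"] = duration_s
    # Events per second; zero-duration sessions get NaN instead of dividing by zero
    feats["interaction_velocity"] = feats["n_events"] / duration_s.where(duration_s > 0)
    feats["cart_to_view_ratio"] = feats["n_carts"] / feats["n_views"].where(feats["n_views"] > 0)
    feats = feats.drop(columns=["first_ts", "last_ts"]).reset_index()
    # Label: True for sessions in the human-only experiment arm
    labels = experiments.assign(is_human=experiments["experimentId"].eq("xp_human_only"))
    return feats.merge(labels[["sessionId", "is_human"]], on="sessionId", how="left")
```

Sessions with a single event (zero duration) or no views get `NaN` velocity and ratio rather than a divide-by-zero. XGBoost treats `NaN` as missing by default, which is one reason to leave them unfilled.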
 ### 2. Upgrade Pricing Provider
 **Before:** Simple velocity threshold
 **After:** ML model inference with agent probability
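The before/after decision logic fits in a few lines, using the thresholds quoted in this document: velocity > 5 for the current rule, and P(agent) > 0.7 → 1.3x from the Phase 2 deployment notes. The function and constant names are hypothetical.

```python
VELOCITY_THRESHOLD = 5.0    # current rule: velocity > 5 -> agent
AGENT_PROB_THRESHOLD = 0.7  # Phase 2: P(agent) > 0.7 -> markup
AGENT_MARKUP = 1.3          # Phase 2: 1.3x price


def legacy_is_agent(interaction_velocity: float) -> bool:
    """Before: one hand-picked velocity cut-off decides agent vs. human."""
    return interaction_velocity > VELOCITY_THRESHOLD


def ml_price(base_price: float, p_agent: float) -> float:
    """After: mark up only when the classifier's P(agent) clears the threshold."""
    if not 0.0 <= p_agent <= 1.0:
        raise ValueError("p_agent must be in [0, 1]")
    if p_agent > AGENT_PROB_THRESHOLD:
        return base_price * AGENT_MARKUP
    return base_price
```

For an `xgboost.XGBClassifier`, `p_agent` would be the positive-class column of `predict_proba`, e.g. `clf.predict_proba(X)[:, 1]`. A hard threshold keeps prices predictable; the Phase 3 objective replaces it with a continuous markup.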
 ### 3. Add Real-Time Feature Store
 **Before:** No feature caching
 **After:** Kafka Streams → Redis (session features)
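The cache contract from the data-flow diagram (key `session_features:{sessionId}`, JSON value, 1-hour TTL) can be pinned down with a small read/write helper. The sketch assumes a redis-py style client and uses only its `set(name, value, ex=...)` and `get(name)` calls, so a `redis.Redis` instance can be passed in directly. The helper names are hypothetical, and the rolling-window computation itself (Kafka Streams in the diagram) is out of scope.

```python
import json
from typing import Optional, Protocol

SESSION_TTL_SECONDS = 3600  # "TTL: 1 hour" in the data-flow diagram


class KeyValueClient(Protocol):
    """The two redis-py client methods this helper relies on."""

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...

    def get(self, name: str) -> Optional[bytes]: ...


def feature_key(session_id: str) -> str:
    return f"session_features:{session_id}"


def write_session_features(client: KeyValueClient, session_id: str, features: dict) -> None:
    # Passing ex= on every write restarts the one-hour expiry
    client.set(feature_key(session_id), json.dumps(features), ex=SESSION_TTL_SECONDS)


def read_session_features(client: KeyValueClient, session_id: str) -> Optional[dict]:
    raw = client.get(feature_key(session_id))  # bytes, str, or None
    return None if raw is None else json.loads(raw)
```

Because every write passes `ex`, keys for active sessions persist while abandoned sessions age out. A `None` read is the case the Pricing Provider has to handle explicitly, for example by skipping the agent markup.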
 ### 4. Airflow DAG Upgrades
 **Before:** `surge_pricing_pipeline` (rule-based)
 **After:** Add `agent_classifier_training_pipeline` (daily retrain)
 ## Next Actions (Start Here)
 1. ✅ **Read the gameplan**: See `docs/GAMEPLAN_MULTITASK_PRICING.md`
 2. **Create directory structure**:
    ```bash
    mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
    ```
 3. **Pull sample data**:
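    ```python
    # experiments/ml/notebooks/01_eda.ipynb
    import json
    from kafka import KafkaConsumer  # kafka-python client
    consumer = KafkaConsumer(
        "user-interactions", bootstrap_servers="localhost:9092",  # assumed broker
        auto_offset_reset="earliest", consumer_timeout_ms=10_000,  # stop after 10 s idle
        value_deserializer=json.loads)
    events = [msg.value for msg in consumer]
    # TODO: keep 1 week of events, join with experiments table
    # Analyze label distribution, feature correlations
    ```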
 4. **Prototype first feature extractor**:
    ```python
    # experiments/ml/features/temporal.py
    # Start with TemporalFeatureExtractor
    # Test on 10k events, validate output schema
    ```
 5. **Review with team**: Discuss tradeoffs, priorities, timeline
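For step 4, a first prototype might look like the sketch below. The inter-event gap statistics and start hour are illustrative placeholders, not the eight temporal features the gameplan specifies, and the `sessionId`/`timestamp` column names are assumed; `validate` is a minimal version of the "validate output schema" check.

```python
import pandas as pd


class TemporalFeatureExtractor:
    """Hypothetical prototype: inter-event timing features per session."""

    REQUIRED_INPUT = ("sessionId", "timestamp")
    OUTPUT_COLUMNS = ("sessionId", "mean_gap_s", "median_gap_s", "min_gap_s", "start_hour")

    def transform(self, events: pd.DataFrame) -> pd.DataFrame:
        missing = set(self.REQUIRED_INPUT) - set(events.columns)
        if missing:
            raise KeyError(f"missing input columns: {sorted(missing)}")
        ev = events.sort_values(["sessionId", "timestamp"])
        # Gap to the previous event in the same session (NaN for the first event)
        gaps = ev.groupby("sessionId")["timestamp"].diff().dt.total_seconds()
        ev = ev.assign(gap_s=gaps)
        out = ev.groupby("sessionId").agg(
            mean_gap_s=("gap_s", "mean"),
            median_gap_s=("gap_s", "median"),
            min_gap_s=("gap_s", "min"),
            start_ts=("timestamp", "min"),
        )
        out["start_hour"] = out["start_ts"].dt.hour
        out = out.drop(columns="start_ts").reset_index()
        self.validate(out)
        return out

    def validate(self, features: pd.DataFrame) -> None:
        """Fail fast if the output drifts from the agreed schema."""
        if tuple(features.columns) != self.OUTPUT_COLUMNS:
            raise ValueError(f"unexpected columns: {list(features.columns)}")
        if features["sessionId"].duplicated().any():
            raise ValueError("expected one row per session")
```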
 ## Questions to Resolve
 1. **Label Quality**: How confident are we in `xp_human_only` labels? Should we add manual verification?
 2. **Compute Budget**: Do we have GPU access for PyTorch training? (Phase 3)
 3. **Latency Requirements**: Is 100ms p99 acceptable for pricing API?
 4. **A/B Testing**: Do we have infrastructure for traffic splitting? (Deployment)
 5. **Monitoring**: Who owns the Grafana dashboards, and what alerting thresholds should we use?
 ---
 **For detailed implementation, see:** `docs/GAMEPLAN_MULTITASK_PRICING.md`
--- a/docs/GAMEPLAN_MULTITASK_PRICING.md
+++ b/docs/GAMEPLAN_MULTITASK_PRICING.md