PHANTOM/docs/ARCHITECTURE_OVERVIEW.md
Claude aab54ea7c0 docs: Add comprehensive multi-task learning architecture and gameplan
Created detailed documentation for implementing multi-task learning system
to improve agent detection and dynamic pricing:

- GAMEPLAN_MULTITASK_PRICING.md: Complete 50+ page technical specification
  including feature engineering, supervised learning, multi-task neural
  networks, synthetic simulator, and knowledge distillation approach

- ARCHITECTURE_OVERVIEW.md: Quick reference with visual diagrams comparing
  current rule-based system to proposed ML architecture, metrics, and
  implementation phases

Key improvements proposed:
- Replace O(n²) SessionState pipeline with vectorized feature extraction
- Train XGBoost classifier on experimentId labels (ROC-AUC >0.90 target)
- Multi-task neural network for joint agent detection + purchase prediction
- Gymnasium-based synthetic pricing environment for safe experimentation
- Knowledge distillation to extract interpretable pricing heuristics

Addresses margin leakage concerns with learned pricing strategies instead
of simple velocity thresholds.
2025-12-11 09:51:41 +00:00


Multi-Task Learning Architecture - Quick Reference

Current System (Baseline)

┌─────────────────────────────────────────────────────────────────┐
│                        CURRENT STATE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  Browser Events → Next.js → FastAPI → Kafka (user-interactions)  │
│                                           ↓                       │
│                                      Airflow (every 15min)       │
│                                           ↓                       │
│                              [Messy SessionState Pipeline]       │
│                                           ↓                       │
│                        Simple Rule-Based Pricing:                │
│                        - Surge (if demand > 10)                  │
│                        - Elasticity formula                      │
│                        - Velocity threshold for agents           │
│                                           ↓                       │
│                                    Redis (prices)                │
│                                           ↓                       │
│                                  Pricing Provider API            │
│                                                                   │
│  ISSUES:                                                          │
│  ✗ O(n²) feature extraction                                      │
│  ✗ No supervised ML for agent detection                          │
│  ✗ Simple heuristics (velocity > 5 → agent)                      │
│  ✗ No learning from data                                         │
│  ✗ Margin leakage not effectively addressed                      │
└─────────────────────────────────────────────────────────────────┘
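
For reference, the baseline rules in the box above amount to roughly the following. This is a minimal sketch, not the production code: the function names are hypothetical, the 1.2x/0.9x multipliers are borrowed from the "Simple surge" baseline listed under Phase 4, and the elasticity formula is left out because this overview does not state it.

```python
def surge_multiplier(demand: float, threshold: float = 10.0) -> float:
    """Surge rule: raise the price when demand exceeds the threshold.

    The 1.2x / 0.9x values are taken from the Phase 4 "Simple surge" baseline.
    """
    return 1.2 if demand > threshold else 0.9


def is_agent_by_velocity(velocity: float, threshold: float = 5.0) -> bool:
    """Velocity heuristic: any session faster than the threshold is flagged."""
    return velocity > threshold
```

Both rules are hard cut-offs on a single signal, which is the behavior the proposed phases aim to replace with learned probabilities.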

Proposed System (Multi-Task Learning)

┌──────────────────────────────────────────────────────────────────────────┐
│                           PHASE 1: DATA PIPELINE                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Kafka (user-interactions)                                                │
│         ↓                                                                 │
│  ┌─────────────────────────────────────┐                                 │
│  │  VECTORIZED FEATURE PIPELINE        │                                 │
│  ├─────────────────────────────────────┤                                 │
│  │  1. TemporalFeatureExtractor        │  → 8 features (velocity, etc.)  │
│  │  2. BehavioralFeatureExtractor      │  → 10 features (carts, hovers)  │
│  │  3. ProductFeatureExtractor         │  → 8 features (prices, depth)   │
│  │  4. UserAgentParser                 │  → 3 features (browser type)    │
│  │  5. SessionAggregator               │  → Session-level matrix         │
│  │  6. ExperimentLabelJoiner           │  → Join with xp_human_only      │
│  └─────────────────────────────────────┘                                 │
│         ↓                                                                 │
│  Feature Matrix: [sessionId, 29 features, 3 labels]                      │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
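
The vectorized pipeline above could look something like the pandas sketch below, which computes a handful of session-level features with a single groupby instead of per-session loops. The column names `timestamp` and `eventType`, the event type strings `view` and `add_to_cart`, and the definition of velocity as events per second are all assumptions made for illustration; the feature names are the ones used in the distilled rules later in this document.

```python
import pandas as pd

FEATURES = [
    "session_duration",
    "interaction_velocity",
    "cart_to_view_ratio",
    "unique_products_viewed",
]


def session_features(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one row per sessionId without Python loops."""
    flagged = events.assign(
        is_view=events["eventType"].eq("view"),         # assumed event name
        is_cart=events["eventType"].eq("add_to_cart"),  # assumed event name
    )
    out = flagged.groupby("sessionId").agg(
        n_events=("timestamp", "size"),
        first_ts=("timestamp", "min"),
        last_ts=("timestamp", "max"),
        views=("is_view", "sum"),
        carts=("is_cart", "sum"),
    )
    # Distinct products among view events; sessions with no views get 0.
    viewed = flagged.loc[flagged["is_view"]].groupby("sessionId")["productId"].nunique()
    out["unique_products_viewed"] = viewed.reindex(out.index, fill_value=0)
    out["session_duration"] = out["last_ts"] - out["first_ts"]
    # Clip denominators so single-event sessions do not divide by zero.
    out["interaction_velocity"] = out["n_events"] / out["session_duration"].clip(lower=1.0)
    out["cart_to_view_ratio"] = out["carts"] / out["views"].clip(lower=1)
    return out[FEATURES]
```

The result is indexed by `sessionId`, so joining experiment labels afterwards is a plain index join.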

┌──────────────────────────────────────────────────────────────────────────┐
│                    PHASE 2: SUPERVISED AGENT CLASSIFIER                   │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Feature Matrix (29 features)                                             │
│         ↓                                                                 │
│  ┌────────────────────┐                                                   │
│  │   XGBoost Model    │                                                   │
│  ├────────────────────┤                                                   │
│  │  Input: 29 dims    │                                                   │
│  │  Output: P(agent)  │                                                   │
│  │  Loss: BCE         │                                                   │
│  └────────────────────┘                                                   │
│         ↓                                                                 │
│  Target: ROC-AUC > 0.90                                                   │
│                                                                            │
│  DEPLOYMENT:                                                              │
│  - Real-time inference in Pricing Provider                                │
│  - Dynamic markup: P(agent) > 0.7 → 1.3x price                           │
│  - Retrain daily via Airflow                                              │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
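
Two pieces of Phase 2 are easy to pin down independently of the model itself: how the ROC-AUC target is measured and how the deployment markup is applied. The sketch below uses only the standard library; the function names are hypothetical, and the pairwise ROC-AUC is written for clarity rather than speed (a library implementation would be the practical choice on real data).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random agent session outscores a random human one.

    Ties count as half a win. Runs in O(positives * negatives).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def apply_agent_markup(
    base_price: float, p_agent: float, threshold: float = 0.7, markup: float = 1.3
) -> float:
    """Deployment rule from the box above: P(agent) > 0.7 -> 1.3x price."""
    return base_price * markup if p_agent > threshold else base_price
```

Unlike the velocity heuristic, the threshold here is applied to a calibrated-looking probability, so it can be tuned against precision on held-out sessions.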

┌──────────────────────────────────────────────────────────────────────────┐
│                  PHASE 3: MULTI-TASK LEARNING MODEL                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Input: Session Features (29) + Product Features (10) + Current Price    │
│         ↓                                                                 │
│  ┌───────────────────────────────────────────────────────────┐           │
│  │              MULTI-TASK NEURAL NETWORK                    │           │
│  ├───────────────────────────────────────────────────────────┤           │
│  │                                                             │           │
│  │   ┌──────────────────────┐                                │           │
│  │   │  Session Encoder     │  (Shared)                      │           │
│  │   │  [29] → [128] → [64] │                                │           │
│  │   └──────────┬───────────┘                                │           │
│  │              │                                             │           │
│  │              ├────────────┬───────────────┐                │           │
│  │              ↓            ↓               ↓                │           │
│  │       ┌─────────┐   ┌─────────┐   ┌─────────────┐        │           │
│  │       │ Task A  │   │ Product │   │   Task B    │        │           │
│  │       │ Agent   │   │ Encoder │   │  Purchase   │        │           │
│  │       │ Head    │   │ [10]→16 │   │  Prob Head  │        │           │
│  │       └────┬────┘   └────┬────┘   └──────┬──────┘        │           │
│  │            ↓             └────┬────────────┘               │           │
│  │       P(agent)               ↓                            │           │
│  │                          P(purchase|price)                 │           │
│  │                                                             │           │
│  │   Loss = α·BCE(agent) + β·BCE(purchase)                   │           │
│  │   α=1.0, β=2.0 (tune these weights)                       │           │
│  └───────────────────────────────────────────────────────────┘           │
│         ↓                                                                 │
│  OUTPUTS:                                                                 │
│  1. Agent probability (like Phase 2)                                     │
│  2. Purchase probability given price                                      │
│  3. Session embedding (for knowledge distillation)                        │
│                                                                            │
│  USE CASE:                                                                │
│  Optimal Price = argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ]      │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
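
The loss and the price-selection rule in the box above can be written out in a few lines. This is a standard-library sketch under stated assumptions: the function names are hypothetical, predictions are passed in as plain lists rather than tensors, the price search is a grid over candidate prices, and `purchase_prob` stands in for the model's P(purchase|price) head.

```python
import math
from typing import Callable, Sequence


def bce(y: float, p: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one example, with clipping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def multitask_loss(
    agent_true: Sequence[float], agent_pred: Sequence[float],
    purchase_true: Sequence[float], purchase_pred: Sequence[float],
    alpha: float = 1.0, beta: float = 2.0,
) -> float:
    """Loss = alpha * BCE(agent) + beta * BCE(purchase), each averaged over the batch."""
    n = len(agent_true)
    agent_term = sum(bce(y, p) for y, p in zip(agent_true, agent_pred)) / n
    purchase_term = sum(bce(y, p) for y, p in zip(purchase_true, purchase_pred)) / n
    return alpha * agent_term + beta * purchase_term


def optimal_price(
    candidates: Sequence[float],
    purchase_prob: Callable[[float], float],
    p_agent: float,
    lam: float = 0.5,  # lambda in the formula; this default is an assumption
) -> float:
    """Grid search for argmax_p [ p * P(purchase|p) * (1 + lam * P(agent)) ]."""
    return max(candidates, key=lambda p: p * purchase_prob(p) * (1.0 + lam * p_agent))
```

One point worth checking when implementing this: as the formula is written, the factor (1 + λ·P(agent)) does not depend on p, so it rescales the objective without moving the argmax. If the agent probability is meant to shift the chosen price, it would need to enter through P(purchase|p) or through a term that varies with p.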

┌──────────────────────────────────────────────────────────────────────────┐
│                     KNOWLEDGE DISTILLATION BRANCH                         │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Multi-Task Model (teacher)                                               │
│         ↓                                                                 │
│  Generate predictions on validation set                                   │
│         ↓                                                                 │
│  ┌──────────────────────────────────────┐                                │
│  │  Distill to Decision Tree (student)  │                                │
│  ├──────────────────────────────────────┤                                │
│  │  Input: 29 session features          │                                │
│  │  Output: Optimal markup multiplier   │                                │
│  │  Max depth: 5 (interpretable)        │                                │
│  └──────────────────────────────────────┘                                │
│         ↓                                                                 │
│  Extract Human-Readable Rules:                                            │
│                                                                            │
│  IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1:              │
│      markup = 1.3  (likely agent reconnaissance)                          │
│  ELIF unique_products_viewed < 3 AND session_duration > 300:             │
│      markup = 0.9  (engaged human, offer discount)                       │
│  ELSE:                                                                    │
│      markup = 1.0  (baseline)                                             │
│                                                                            │
│  Also: SHAP values for feature importance analysis                        │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
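
Once extracted, rules like the ones in the box above can be served as an ordinary function with no model dependency. The sketch below transcribes the example rules directly; the function name is hypothetical, and the thresholds are the illustrative ones shown here, not values from a trained tree.

```python
def markup_from_rules(features: dict) -> float:
    """Apply the example distilled rules to one session's features."""
    if features["interaction_velocity"] > 10 and features["cart_to_view_ratio"] < 0.1:
        return 1.3  # likely agent reconnaissance
    if features["unique_products_viewed"] < 3 and features["session_duration"] > 300:
        return 0.9  # engaged human, offer discount
    return 1.0  # baseline
```

Because the rules are evaluated in order, a session matching both conditions gets the agent markup, which mirrors the IF/ELIF ordering above.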

┌──────────────────────────────────────────────────────────────────────────┐
│              PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR                 │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  PURPOSE: Fast experimentation without real users                         │
│                                                                            │
│  ┌────────────────────────────────────────────────────┐                  │
│  │         DynamicPricingEnv (Gymnasium)              │                  │
│  ├────────────────────────────────────────────────────┤                  │
│  │                                                      │                  │
│  │  State:   [demand, inventory, hour, agent_frac,    │                  │
│  │            avg_velocity]                            │                  │
│  │                                                      │                  │
│  │  Action:  price_multiplier ∈ [0.7, 1.5]           │                  │
│  │                                                      │                  │
│  │  Dynamics:                                          │                  │
│  │  - Simulate user arrivals (Poisson)                │                  │
│  │  - Split into humans (30%) vs agents (70%)         │                  │
│  │  - Purchase probability:                            │                  │
│  │      P_human(buy) = logistic(price, sensitivity=2) │                  │
│  │      P_agent(buy) = logistic(price, sensitivity=5) │                  │
│  │                                                      │                  │
│  │  Reward:  revenue - 0.5 * margin_leakage           │                  │
│  │           where margin_leakage = (oracle_price -   │                  │
│  │                                   actual_price) ×   │                  │
│  │                                   agent_purchases   │                  │
│  └────────────────────────────────────────────────────┘                  │
│         ↓                                                                 │
│  ┌────────────────────────────────────────┐                              │
│  │  Train RL Agent (PPO)                  │                              │
│  ├────────────────────────────────────────┤                              │
│  │  Learn policy: State → Optimal Price   │                              │
│  │  100k timesteps training                │                              │
│  └────────────────────────────────────────┘                              │
│         ↓                                                                 │
│  BENCHMARK vs Baselines:                                                  │
│  - Fixed pricing: 1.0x always                                             │
│  - Simple surge: 1.2x if demand > 10, else 0.9x                         │
│  - Elasticity-based: formula                                              │
│  - RL policy: learned                                                     │
│  - Multi-task + RL: Use MT model predictions as state features           │
│                                                                            │
│  VALIDATION:                                                              │
│  - Calibrate simulator from historical data                               │
│  - Run counterfactuals ("what if agent_frac=0.8?")                       │
│  - A/B test winner on real traffic                                        │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
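
The simulator dynamics and reward can be prototyped before wiring them into a Gymnasium environment. The sketch below uses NumPy only and covers a single step; it is not the `DynamicPricingEnv` itself. The class name, base price, oracle multiplier, arrival rate, and the choice of a logistic centred at a multiplier of 1.0 are all assumptions made for illustration.

```python
import math

import numpy as np


def buy_prob(multiplier: float, sensitivity: float) -> float:
    """Logistic purchase probability: 0.5 at multiplier 1.0, falling as price rises."""
    return 1.0 / (1.0 + math.exp(sensitivity * (multiplier - 1.0)))


class PricingSimulator:
    """One-step dynamics: Poisson arrivals, human/agent split, logistic purchases."""

    def __init__(self, base_price=100.0, oracle_multiplier=1.2,
                 arrival_rate=20.0, agent_frac=0.7, seed=0):
        self.base_price = base_price
        self.oracle_price = base_price * oracle_multiplier  # assumed oracle
        self.arrival_rate = arrival_rate
        self.agent_frac = agent_frac
        self.rng = np.random.default_rng(seed)

    def step(self, multiplier: float):
        m = min(max(multiplier, 0.7), 1.5)  # action range from the box above
        arrivals = int(self.rng.poisson(self.arrival_rate))
        agents = int(self.rng.binomial(arrivals, self.agent_frac))
        humans = arrivals - agents
        human_buys = int(self.rng.binomial(humans, buy_prob(m, 2.0)))
        agent_buys = int(self.rng.binomial(agents, buy_prob(m, 5.0)))
        price = self.base_price * m
        revenue = price * (human_buys + agent_buys)
        # As specified above: (oracle_price - actual_price) x agent_purchases.
        margin_leakage = (self.oracle_price - price) * agent_buys
        reward = revenue - 0.5 * margin_leakage
        info = {"price": price, "arrivals": arrivals, "human_buys": human_buys,
                "agent_buys": agent_buys, "revenue": revenue,
                "margin_leakage": margin_leakage}
        return reward, info
```

Note that with the leakage term as specified, pricing above the oracle price makes the leakage negative and therefore adds to the reward; whether that is intended is a calibration decision for the simulator.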

Data Flow (Production)

┌─────────────┐
│   Browser   │
│ (User/Agent)│
└──────┬──────┘
       │ POST /api/ingest (events + experimentId)
       ↓
┌──────────────┐
│  Next.js API │
└──────┬───────┘
       │ Forward events
       ↓
┌──────────────┐
│ FastAPI      │
│ /api/kafka   │
│ /ingest      │
└──────┬───────┘
       │ Publish
       ↓
┌─────────────────────────┐
│ Kafka                   │
│ Topic: user-interactions│
└──────┬──────────────────┘
       │
       ├──────────────────┬──────────────────┐
       ↓                  ↓                  ↓
┌──────────────┐  ┌──────────────┐  ┌──────────────────┐
│ Airflow      │  │ Real-Time    │  │ Kafka Streams    │
│ (Batch)      │  │ Inference    │  │ (Feature Cache)  │
│              │  │              │  │                  │
│ Daily:       │  │ On Price     │  │ Rolling window   │
│ - Retrain    │  │ Request:     │  │ compute session  │
│   classifier │  │ - Get session│  │ features, push   │
│ - Retrain MT │  │   features   │  │ to Redis         │
│   model      │  │ - Predict    │  │                  │
│ - Publish to │  │   P(agent)   │  │ TTL: 1 hour      │
│   registry   │  │ - Predict    │  │                  │
│              │  │   P(purchase)│  │                  │
│              │  │ - Compute    │  │                  │
│              │  │   optimal_p  │  │                  │
└──────┬───────┘  └──────┬───────┘  └────────┬─────────┘
       │                 │                   │
       ↓                 ↓                   ↓
┌──────────────────────────────────────────────┐
│          Redis (Model Registry)              │
├──────────────────────────────────────────────┤
│ Keys:                                         │
│ - classifier:agent_detector:latest (pickle)  │
│ - multitask_model:latest (state_dict)        │
│ - session_features:{sessionId} (json, TTL)   │
│ - prices:latest (DataFrame)                  │
│ - elasticity:latest (DataFrame)              │
└──────────────────┬───────────────────────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │ Pricing Provider    │
         │ /api/{mode}/price/  │
         │ {productId}         │
         │                     │
         │ GET sessionId       │
         │ → Load features     │
         │ → Load models       │
         │ → Predict           │
         │ → Return price      │
         └─────────┬───────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │   Frontend          │
         │   (Display price)   │
         └─────────────────────┘
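
The Pricing Provider's request path (load features, predict, return price) can be sketched as below. A plain mapping stands in for Redis so the example runs on its own; the function name, the fallback to the base price when cached features have expired, and the injected `predict_agent` callable are assumptions. The key format `session_features:{sessionId}` and the 0.7 / 1.3x rule come from the diagrams above.

```python
import json
from typing import Callable, Mapping


def quote_price(
    session_id: str,
    base_price: float,
    store: Mapping[str, str],
    predict_agent: Callable[[dict], float],
    threshold: float = 0.7,
    markup: float = 1.3,
) -> float:
    """Return the price for one request, falling back to base_price on a cache miss."""
    raw = store.get(f"session_features:{session_id}")
    if raw is None:
        # Features missing or expired (TTL): do not guess, serve the base price.
        return base_price
    features = json.loads(raw)
    p_agent = predict_agent(features)
    return base_price * markup if p_agent > threshold else base_price
```

With a real Redis client the lookup would be a `get` call that returns `None` for a missing key, so the cache-miss branch carries over unchanged.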

Key Metrics

Model Performance

| Metric | Target | Current | Phase |
|---|---|---|---|
| Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
| Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
| Pricing Latency (p99) | <100ms | ~50ms | All |
| Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |

Business Impact

| Metric | Target | Current | Phase |
|---|---|---|---|
| Margin Leakage Reduction | -30% | Baseline | Phase 2-4 |
| Human Conversion Rate | No change | Baseline | All |
| Agent Detection Rate | >85% precision | ~60% (velocity) | Phase 2 |
| Revenue Uplift | +10% | Baseline | Phase 3-4 |

File Structure (New)

experiments/
  ml/
    __init__.py

    # Phase 1: Features
    features/
      __init__.py
      temporal.py           # TemporalFeatureExtractor
      behavioral.py         # BehavioralFeatureExtractor
      product.py            # ProductFeatureExtractor
      useragent.py          # UserAgentParser
      aggregator.py         # SessionAggregator

    pipeline.py             # build_feature_pipeline()
    datasets.py             # load_events_from_kafka(), etc.

    # Phase 2: Classifier
    train_classifier.py     # XGBoost training script

    # Phase 3: Multi-Task
    models/
      __init__.py
      multitask.py          # MultiTaskPricingModel (PyTorch)

    train_multitask.py      # Multi-task training script
    distill.py              # Knowledge distillation

    # Phase 4: Simulator
    simulator/
      __init__.py
      env.py                # DynamicPricingEnv (Gymnasium)
      agents.py             # HumanUser, AgentUser
      train_rl.py           # PPO training

    # Inference
    inference/
      __init__.py
      pricing_service.py    # gRPC service (optional)
      feature_cache.py      # Redis feature store client

    # Notebooks
    notebooks/
      01_eda.ipynb
      02_feature_analysis.ipynb
      03_model_evaluation.ipynb
      04_simulator_calibration.ipynb

Critical Code Changes

1. Replace Messy SessionState

Before: experiments/procesing/steps/session.py (O(n²) loops)
After: experiments/ml/pipeline.py (vectorized pipeline)

2. Upgrade Pricing Provider

Before: Simple velocity threshold
After: ML model inference with agent probability

3. Add Real-Time Feature Store

Before: No feature caching
After: Kafka Streams → Redis (session features)

4. Airflow DAG Upgrades

Before: surge_pricing_pipeline (rule-based)
After: Add agent_classifier_training_pipeline (daily retrain)

Next Actions (Start Here)

  1. Read gameplan: See docs/GAMEPLAN_MULTITASK_PRICING.md

  2. Create directory structure:

    mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
    
  3. Pull sample data:

    # experiments/ml/notebooks/01_eda.ipynb
    from kafka import KafkaConsumer
    # Pull 1 week of events, join with experiments table
    # Analyze label distribution, feature correlations
    
  4. Prototype first feature extractor:

    # experiments/ml/features/temporal.py
    # Start with TemporalFeatureExtractor
    # Test on 10k events, validate output schema
    
  5. Review with team: Discuss tradeoffs, priorities, timeline

Questions to Resolve

  1. Label Quality: How confident are we in xp_human_only labels? Should we add manual verification?

  2. Compute Budget: Do we have GPU access for PyTorch training? (Phase 3)

  3. Latency Requirements: Is 100ms p99 acceptable for pricing API?

  4. A/B Testing: Do we have infrastructure for traffic splitting? (Deployment)

  5. Monitoring: Who owns the Grafana dashboards? What alerting thresholds?


For detailed implementation, see: docs/GAMEPLAN_MULTITASK_PRICING.md