# Multi-Task Learning Architecture - Quick Reference

## Current System (Baseline)

```
┌─────────────────────────────────────────────────────────────────┐
│                        CURRENT STATE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  Browser Events → Next.js → FastAPI → Kafka (user-interactions)  │
│                                           ↓                       │
│                                      Airflow (every 15min)       │
│                                           ↓                       │
│                              [Messy SessionState Pipeline]       │
│                                           ↓                       │
│                        Simple Rule-Based Pricing:                │
│                        - Surge (if demand > 10)                  │
│                        - Elasticity formula                      │
│                        - Velocity threshold for agents           │
│                                           ↓                       │
│                                    Redis (prices)                │
│                                           ↓                       │
│                                  Pricing Provider API            │
│                                                                   │
│  ISSUES:                                                          │
│  ✗ O(n²) feature extraction                                      │
│  ✗ No supervised ML for agent detection                          │
│  ✗ Simple heuristics (velocity > 5 → agent)                      │
│  ✗ No learning from data                                         │
│  ✗ Margin leakage not effectively addressed                      │
└─────────────────────────────────────────────────────────────────┘
```
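
For reference, the rule-based baseline can be summarized in a few lines. This is a minimal sketch, not the production code: the thresholds (`demand > 10`, `velocity > 5`) come from the diagram above, the 1.2x/0.9x multipliers are borrowed from the "Simple surge" baseline listed under Phase 4, and all function names are hypothetical. The elasticity formula is omitted because it is not specified here.

```python
SURGE_DEMAND_THRESHOLD = 10    # diagram: "Surge (if demand > 10)"
AGENT_VELOCITY_THRESHOLD = 5   # diagram: "velocity > 5 → agent"


def surge_multiplier(demand: float) -> float:
    """Simple surge rule: 1.2x above the demand threshold, else 0.9x."""
    return 1.2 if demand > SURGE_DEMAND_THRESHOLD else 0.9


def is_agent(interaction_velocity: float) -> bool:
    """Heuristic agent flag: a hard threshold on interaction velocity."""
    return interaction_velocity > AGENT_VELOCITY_THRESHOLD


def baseline_price(base_price: float, demand: float) -> float:
    """Apply the surge rule to a base price (hypothetical helper)."""
    return round(base_price * surge_multiplier(demand), 2)
```

Both rules are hard cutoffs with no learned parameters, which is what the ISSUES list above refers to.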

## Proposed System (Multi-Task Learning)

```
┌──────────────────────────────────────────────────────────────────────────┐
│                           PHASE 1: DATA PIPELINE                          │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Kafka (user-interactions)                                                │
│         ↓                                                                 │
│  ┌─────────────────────────────────────┐                                 │
│  │  VECTORIZED FEATURE PIPELINE        │                                 │
│  ├─────────────────────────────────────┤                                 │
│  │  1. TemporalFeatureExtractor        │  → 8 features (velocity, etc.)  │
│  │  2. BehavioralFeatureExtractor      │  → 10 features (carts, hovers)  │
│  │  3. ProductFeatureExtractor         │  → 8 features (prices, depth)   │
│  │  4. UserAgentParser                 │  → 3 features (browser type)    │
│  │  5. SessionAggregator               │  → Session-level matrix         │
│  │  6. ExperimentLabelJoiner           │  → Join with xp_human_only      │
│  └─────────────────────────────────────┘                                 │
│         ↓                                                                 │
│  Feature Matrix: [sessionId, 29 features, 3 labels]                      │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                    PHASE 2: SUPERVISED AGENT CLASSIFIER                   │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Feature Matrix (29 features)                                             │
│         ↓                                                                 │
│  ┌────────────────────┐                                                   │
│  │   XGBoost Model    │                                                   │
│  ├────────────────────┤                                                   │
│  │  Input: 29 dims    │                                                   │
│  │  Output: P(agent)  │                                                   │
│  │  Loss: BCE         │                                                   │
│  └────────────────────┘                                                   │
│         ↓                                                                 │
│  Target: ROC-AUC > 0.90                                                   │
│                                                                            │
│  DEPLOYMENT:                                                              │
│  - Real-time inference in Pricing Provider                                │
│  - Dynamic markup: P(agent) > 0.7 → 1.3x price                           │
│  - Retrain daily via Airflow                                              │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                  PHASE 3: MULTI-TASK LEARNING MODEL                       │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Input: Session Features (29) + Product Features (10) + Current Price    │
│         ↓                                                                 │
│  ┌───────────────────────────────────────────────────────────┐           │
│  │              MULTI-TASK NEURAL NETWORK                    │           │
│  ├───────────────────────────────────────────────────────────┤           │
│  │                                                             │           │
│  │   ┌──────────────────────┐                                │           │
│  │   │  Session Encoder     │  (Shared)                      │           │
│  │   │  [29] → [128] → [64] │                                │           │
│  │   └──────────┬───────────┘                                │           │
│  │              │                                             │           │
│  │              ├────────────┬───────────────┐                │           │
│  │              ↓            ↓               ↓                │           │
│  │       ┌─────────┐   ┌─────────┐   ┌─────────────┐        │           │
│  │       │ Task A  │   │ Product │   │   Task B    │        │           │
│  │       │ Agent   │   │ Encoder │   │  Purchase   │        │           │
│  │       │ Head    │   │ [10]→16 │   │  Prob Head  │        │           │
│  │       └────┬────┘   └────┬────┘   └──────┬──────┘        │           │
│  │            ↓             └────┬────────────┘               │           │
│  │       P(agent)               ↓                            │           │
│  │                          P(purchase|price)                 │           │
│  │                                                             │           │
│  │   Loss = α·BCE(agent) + β·BCE(purchase)                   │           │
│  │   α=1.0, β=2.0 (tune these weights)                       │           │
│  └───────────────────────────────────────────────────────────┘           │
│         ↓                                                                 │
│  OUTPUTS:                                                                 │
│  1. Agent probability (like Phase 2)                                     │
│  2. Purchase probability given price                                      │
│  3. Session embedding (for knowledge distillation)                        │
│                                                                            │
│  USE CASE:                                                                │
│  Optimal Price = argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ]      │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│                     KNOWLEDGE DISTILLATION BRANCH                         │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  Multi-Task Model (teacher)                                               │
│         ↓                                                                 │
│  Generate predictions on validation set                                   │
│         ↓                                                                 │
│  ┌──────────────────────────────────────┐                                │
│  │  Distill to Decision Tree (student)  │                                │
│  ├──────────────────────────────────────┤                                │
│  │  Input: 29 session features          │                                │
│  │  Output: Optimal markup multiplier   │                                │
│  │  Max depth: 5 (interpretable)        │                                │
│  └──────────────────────────────────────┘                                │
│         ↓                                                                 │
│  Extract Human-Readable Rules:                                            │
│                                                                            │
│  IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1:              │
│      markup = 1.3  (likely agent reconnaissance)                          │
│  ELIF unique_products_viewed < 3 AND session_duration > 300:             │
│      markup = 0.9  (engaged human, offer discount)                       │
│  ELSE:                                                                    │
│      markup = 1.0  (baseline)                                             │
│                                                                            │
│  Also: SHAP values for feature importance analysis                        │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────┐
│              PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR                 │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  PURPOSE: Fast experimentation without real users                         │
│                                                                            │
│  ┌────────────────────────────────────────────────────┐                  │
│  │         DynamicPricingEnv (Gymnasium)              │                  │
│  ├────────────────────────────────────────────────────┤                  │
│  │                                                      │                  │
│  │  State:   [demand, inventory, hour, agent_frac,    │                  │
│  │            avg_velocity]                            │                  │
│  │                                                      │                  │
│  │  Action:  price_multiplier ∈ [0.7, 1.5]           │                  │
│  │                                                      │                  │
│  │  Dynamics:                                          │                  │
│  │  - Simulate user arrivals (Poisson)                │                  │
│  │  - Split into humans (30%) vs agents (70%)         │                  │
│  │  - Purchase probability:                            │                  │
│  │      P_human(buy) = logistic(price, sensitivity=2) │                  │
│  │      P_agent(buy) = logistic(price, sensitivity=5) │                  │
│  │                                                      │                  │
│  │  Reward:  revenue - 0.5 * margin_leakage           │                  │
│  │           where margin_leakage = (oracle_price -   │                  │
│  │                                   actual_price) ×   │                  │
│  │                                   agent_purchases   │                  │
│  └────────────────────────────────────────────────────┘                  │
│         ↓                                                                 │
│  ┌────────────────────────────────────────┐                              │
│  │  Train RL Agent (PPO)                  │                              │
│  ├────────────────────────────────────────┤                              │
│  │  Learn policy: State → Optimal Price   │                              │
│  │  100k timesteps training                │                              │
│  └────────────────────────────────────────┘                              │
│         ↓                                                                 │
│  BENCHMARK vs Baselines:                                                  │
│  - Fixed pricing: 1.0x always                                             │
│  - Simple surge: 1.2x if demand > 10, else 0.9x                         │
│  - Elasticity-based: formula                                              │
│  - RL policy: learned                                                     │
│  - Multi-task + RL: Use MT model predictions as state features           │
│                                                                            │
│  VALIDATION:                                                              │
│  - Calibrate simulator from historical data                               │
│  - Run counterfactuals ("what if agent_frac=0.8?")                       │
│  - A/B test winner on real traffic                                        │
│                                                                            │
└──────────────────────────────────────────────────────────────────────────┘
```
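
The Phase 4 dynamics and reward can be sketched as a single simulation step. This is an illustration under stated assumptions, not the planned `DynamicPricingEnv`: the exact logistic form, the base price, the arrival rate, and the choice of the oracle price as the top of the action range are all assumptions, and the function names are hypothetical. A real environment would wrap this logic in a `gymnasium.Env` subclass with `reset` and `step`.

```python
import math
import random
from dataclasses import dataclass

AGENT_FRACTION = 0.7            # diagram: humans 30%, agents 70%
HUMAN_SENSITIVITY = 2.0         # diagram: sensitivity=2
AGENT_SENSITIVITY = 5.0         # diagram: sensitivity=5
LEAKAGE_WEIGHT = 0.5            # diagram: revenue - 0.5 * margin_leakage
MIN_MULT, MAX_MULT = 0.7, 1.5   # diagram: action range


def buy_probability(multiplier: float, sensitivity: float) -> float:
    """ASSUMED logistic form: 0.5 at multiplier 1.0, falling as price rises."""
    return 1.0 / (1.0 + math.exp(sensitivity * (multiplier - 1.0)))


def sample_poisson(rng: random.Random, rate: float) -> int:
    """Knuth's multiplication method; adequate for small arrival rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


@dataclass(frozen=True)
class StepResult:
    revenue: float
    margin_leakage: float
    reward: float


def simulate_step(
    multiplier: float,
    rng: random.Random,
    base_price: float = 100.0,              # assumption
    arrival_rate: float = 20.0,             # assumption
    oracle_multiplier: float = MAX_MULT,    # assumption: oracle charges the cap
) -> StepResult:
    """Simulate one pricing period and return revenue, leakage, and reward."""
    multiplier = min(max(multiplier, MIN_MULT), MAX_MULT)
    price = base_price * multiplier
    human_purchases = agent_purchases = 0
    for _ in range(sample_poisson(rng, arrival_rate)):
        if rng.random() < AGENT_FRACTION:
            if rng.random() < buy_probability(multiplier, AGENT_SENSITIVITY):
                agent_purchases += 1
        elif rng.random() < buy_probability(multiplier, HUMAN_SENSITIVITY):
            human_purchases += 1
    revenue = price * (human_purchases + agent_purchases)
    margin_leakage = (base_price * oracle_multiplier - price) * agent_purchases
    reward = revenue - LEAKAGE_WEIGHT * margin_leakage
    return StepResult(revenue, margin_leakage, reward)
```

Passing a seeded `random.Random` makes runs reproducible, which helps when comparing the baselines listed under BENCHMARK on identical arrival sequences.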

## Data Flow (Production)

```
┌─────────────┐
│   Browser   │
│ (User/Agent)│
└──────┬──────┘
       │ POST /api/ingest (events + experimentId)
       ↓
┌──────────────┐
│  Next.js API │
└──────┬───────┘
       │ Forward events
       ↓
┌──────────────┐
│ FastAPI      │
│ /api/kafka   │
│ /ingest      │
└──────┬───────┘
       │ Publish
       ↓
┌─────────────────────────┐
│ Kafka                   │
│ Topic: user-interactions│
└──────┬──────────────────┘
       │
       ├──────────────────┬──────────────────┐
       ↓                  ↓                  ↓
┌──────────────┐  ┌──────────────┐  ┌──────────────────┐
│ Airflow      │  │ Real-Time    │  │ Kafka Streams    │
│ (Batch)      │  │ Inference    │  │ (Feature Cache)  │
│              │  │              │  │                  │
│ Daily:       │  │ On Price     │  │ Rolling window   │
│ - Retrain    │  │ Request:     │  │ compute session  │
│   classifier │  │ - Get session│  │ features, push   │
│ - Retrain MT │  │   features   │  │ to Redis         │
│   model      │  │ - Predict    │  │                  │
│ - Publish to │  │   P(agent)   │  │ TTL: 1 hour      │
│   registry   │  │ - Predict    │  │                  │
│              │  │   P(purchase)│  │                  │
│              │  │ - Compute    │  │                  │
│              │  │   optimal_p  │  │                  │
└──────┬───────┘  └──────┬───────┘  └────────┬─────────┘
       │                 │                   │
       ↓                 ↓                   ↓
┌──────────────────────────────────────────────┐
│        Redis (Model Registry + Cache)        │
├──────────────────────────────────────────────┤
│ Keys:                                         │
│ - classifier:agent_detector:latest (pickle)  │
│ - multitask_model:latest (state_dict)        │
│ - session_features:{sessionId} (json, TTL)   │
│ - prices:latest (DataFrame)                  │
│ - elasticity:latest (DataFrame)              │
└──────────────────┬───────────────────────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │ Pricing Provider    │
         │ /api/{mode}/price/  │
         │ {productId}         │
         │                     │
         │ GET sessionId       │
         │ → Load features     │
         │ → Load models       │
         │ → Predict           │
         │ → Return price      │
         └─────────┬───────────┘
                   │
                   ↓
         ┌─────────────────────┐
         │   Frontend          │
         │   (Display price)   │
         └─────────────────────┘
```
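
The "Compute optimal_p" step in the real-time path can be sketched as a grid search over candidate prices using the Phase 3 objective `p · P(purchase|p) · (1 + λ·P(agent))`. This is a minimal sketch: the candidate grid, the default `lam`, and the stand-in demand curve `demo_purchase_prob` are assumptions, and in production `purchase_prob` would presumably call the multi-task model. Note that, as written, the factor `(1 + λ·P(agent))` does not depend on `p`, so it scales the objective without moving the argmax; the sketch keeps it for fidelity to the formula.

```python
import math
from typing import Callable, Sequence


def optimal_price(
    candidates: Sequence[float],
    purchase_prob: Callable[[float], float],
    p_agent: float,
    lam: float = 0.5,  # hypothetical default for λ
) -> float:
    """Return the candidate price that maximizes the Phase 3 objective."""

    def objective(price: float) -> float:
        return price * purchase_prob(price) * (1.0 + lam * p_agent)

    return max(candidates, key=objective)


def demo_purchase_prob(price: float, reference: float = 100.0,
                       sensitivity: float = 4.0) -> float:
    """Stand-in demand curve (assumption): logistic in relative price."""
    return 1.0 / (1.0 + math.exp(sensitivity * (price / reference - 1.0)))


# Candidate prices from 70 to 150 in steps of 5 (0.7x to 1.5x of a 100 base price)
candidates = [70.0 + 5.0 * i for i in range(17)]
best = optimal_price(candidates, demo_purchase_prob, p_agent=0.8)
```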

## Key Metrics

### Model Performance
| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
| Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
| Pricing Latency (p99) | <100ms | ~50ms | All |
| Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |
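
Both model targets above are stated as ROC-AUC. As a reminder of what that number measures, here is a dependency-free sketch based on the rank-sum (Mann-Whitney) formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. In practice a library routine such as `sklearn.metrics.roc_auc_score` would be the usual choice; the function below is illustrative only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC from average ranks; labels are 0/1, higher score = more positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative")

    ranked = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    start = 0
    while start < len(ranked):
        # Find the run of tied scores and give each item the average rank
        end = start
        while end < len(ranked) and ranked[end][0] == ranked[start][0]:
            end += 1
        avg_rank = (start + 1 + end) / 2.0  # mean of 1-based ranks start+1..end
        rank_sum_pos += avg_rank * sum(label for _, label in ranked[start:end])
        start = end

    # Mann-Whitney U statistic for the positives, normalized to [0, 1]
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```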

### Business Impact
| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Margin Leakage Reduction | 30% | Baseline | Phase 2-4 |
| Human Conversion Rate | No change | Baseline | All |
| Agent Detection Precision | >85% | ~60% (velocity rule) | Phase 2 |
| Revenue Uplift | +10% | Baseline | Phase 3-4 |

## File Structure (New)

```
experiments/
  ml/
    __init__.py

    # Phase 1: Features
    features/
      __init__.py
      temporal.py           # TemporalFeatureExtractor
      behavioral.py         # BehavioralFeatureExtractor
      product.py            # ProductFeatureExtractor
      useragent.py          # UserAgentParser
      aggregator.py         # SessionAggregator

    pipeline.py             # build_feature_pipeline()
    datasets.py             # load_events_from_kafka(), etc.

    # Phase 2: Classifier
    train_classifier.py     # XGBoost training script

    # Phase 3: Multi-Task
    models/
      __init__.py
      multitask.py          # MultiTaskPricingModel (PyTorch)

    train_multitask.py      # Multi-task training script
    distill.py              # Knowledge distillation

    # Phase 4: Simulator
    simulator/
      __init__.py
      env.py                # DynamicPricingEnv (Gymnasium)
      agents.py             # HumanUser, AgentUser
      train_rl.py           # PPO training

    # Inference
    inference/
      __init__.py
      pricing_service.py    # gRPC service (optional)
      feature_cache.py      # Redis feature store client

    # Notebooks
    notebooks/
      01_eda.ipynb
      02_feature_analysis.ipynb
      03_model_evaluation.ipynb
      04_simulator_calibration.ipynb
```

## Critical Code Changes

### 1. Replace Messy SessionState
- **Before:** `experiments/procesing/steps/session.py` (O(n²) loops)
- **After:** `experiments/ml/pipeline.py` (vectorized pipeline)
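
To illustrate the shape of the change, the sketch below replaces nested per-session loops with one pass over the events and per-session accumulators. It is a stdlib-only illustration, not the planned pipeline (which would presumably use pandas/NumPy group-bys). The event fields `ts` and `type`, the event type names, and the velocity definition (events per second, with a 1-second floor) are assumptions; the feature names match those used in the distilled rules above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def aggregate_sessions(events: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Compute a few session-level features in a single pass over events."""
    acc: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"n": 0, "first": None, "last": None, "views": 0, "carts": 0}
    )
    for event in events:
        s = acc[event["sessionId"]]
        ts = event["ts"]  # assumed: seconds since epoch
        s["n"] += 1
        s["first"] = ts if s["first"] is None else min(s["first"], ts)
        s["last"] = ts if s["last"] is None else max(s["last"], ts)
        if event["type"] == "view":
            s["views"] += 1
        elif event["type"] == "add_to_cart":
            s["carts"] += 1

    features: Dict[str, Dict[str, float]] = {}
    for session_id, s in acc.items():
        duration = float(s["last"] - s["first"])
        features[session_id] = {
            "session_duration": duration,
            # Floor the denominator so single-event sessions stay finite
            "interaction_velocity": s["n"] / max(duration, 1.0),
            "cart_to_view_ratio": s["carts"] / s["views"] if s["views"] else 0.0,
        }
    return features
```

The work is linear in the number of events, since each event touches only its own session's accumulator.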

### 2. Upgrade Pricing Provider
- **Before:** Simple velocity threshold
- **After:** ML model inference with agent probability
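
A minimal sketch of the upgraded decision, using the Phase 2 rule `P(agent) > 0.7 → 1.3x`. The function name, the injected `predict_agent_proba` callable, and the fall-back to the unmodified price when no features are cached are assumptions for illustration; in production the callable might wrap the classifier's `predict_proba`.

```python
from typing import Callable, Mapping, Optional

AGENT_THRESHOLD = 0.7   # Phase 2: P(agent) > 0.7
AGENT_MARKUP = 1.3      # Phase 2: 1.3x price


def quote_price(
    base_price: float,
    features: Optional[Mapping[str, float]],
    predict_agent_proba: Callable[[Mapping[str, float]], float],
) -> float:
    """Return the price to quote for a session (hypothetical helper)."""
    if features is None:
        # Assumption: with no cached features, quote the unmodified price
        return base_price
    p_agent = predict_agent_proba(features)
    multiplier = AGENT_MARKUP if p_agent > AGENT_THRESHOLD else 1.0
    return round(base_price * multiplier, 2)
```

Injecting the predictor keeps the pricing rule testable without loading a model.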

### 3. Add Real-Time Feature Store
- **Before:** No feature caching
- **After:** Kafka Streams → Redis (session features)
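
On the Python side, reading and writing the cached features could look like the sketch below. It uses the `session_features:{sessionId}` key and 1-hour TTL from the data-flow diagram. The function names are hypothetical, and `client` is assumed to be a redis-py `redis.Redis` instance or any object exposing `setex(name, time, value)` and `get(name)`.

```python
import json
from typing import Any, Dict, Mapping, Optional

SESSION_TTL_SECONDS = 3600  # diagram: "TTL: 1 hour"


def _session_key(session_id: str) -> str:
    """Build the Redis key listed in the data-flow diagram."""
    return f"session_features:{session_id}"


def put_session_features(client: Any, session_id: str,
                         features: Mapping[str, float]) -> None:
    """Store features as JSON; the key expires after SESSION_TTL_SECONDS."""
    client.setex(_session_key(session_id), SESSION_TTL_SECONDS, json.dumps(dict(features)))


def get_session_features(client: Any, session_id: str) -> Optional[Dict[str, float]]:
    """Return cached features, or None on a cache miss or expired key."""
    raw = client.get(_session_key(session_id))
    return None if raw is None else json.loads(raw)
```

Because the client is passed in, the same functions work against a real Redis connection or an in-memory fake in tests.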

### 4. Airflow DAG Upgrades
- **Before:** `surge_pricing_pipeline` (rule-based)
- **After:** Add `agent_classifier_training_pipeline` (daily retrain)

## Next Actions (Start Here)

1. ✅ **Read gameplan**: See `/home/user/PHANTOM/docs/GAMEPLAN_MULTITASK_PRICING.md`

2. **Create directory structure**:
   ```bash
   mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
   ```

3. **Pull sample data**:
   ```python
   # experiments/ml/notebooks/01_eda.ipynb
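   from kafka import KafkaConsumer
   # Broker address is an assumption; iteration stops after 10 s without messages
   consumer = KafkaConsumer("user-interactions", bootstrap_servers="localhost:9092",
                            auto_offset_reset="earliest", consumer_timeout_ms=10_000)
   events = [msg.value for msg in consumer]  # raw bytes until deserialized
   # Next: keep 1 week of events, join with experiments table,
   # then analyze label distribution and feature correlations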
   ```

4. **Prototype first feature extractor**:
   ```python
   # experiments/ml/features/temporal.py
   # Start with TemporalFeatureExtractor
   # Test on 10k events, validate output schema
   ```

5. **Review with team**: Discuss tradeoffs, priorities, timeline

## Questions to Resolve

1. **Label Quality**: How confident are we in `xp_human_only` labels? Should we add manual verification?

2. **Compute Budget**: Do we have GPU access for PyTorch training? (Phase 3)

3. **Latency Requirements**: Is a p99 of 100ms acceptable for the pricing API?

4. **A/B Testing**: Do we have infrastructure for traffic splitting? (Deployment)

5. **Monitoring**: Who owns the Grafana dashboards? What alerting thresholds should we set?

---

**For detailed implementation, see:** `/home/user/PHANTOM/docs/GAMEPLAN_MULTITASK_PRICING.md`