PHANTOM/docs/ARCHITECTURE_OVERVIEW.md
Claude aab54ea7c0 docs: Add comprehensive multi-task learning architecture and gameplan
Created detailed documentation for implementing multi-task learning system
to improve agent detection and dynamic pricing:

- GAMEPLAN_MULTITASK_PRICING.md: Complete 50+ page technical specification
  including feature engineering, supervised learning, multi-task neural
  networks, synthetic simulator, and knowledge distillation approach

- ARCHITECTURE_OVERVIEW.md: Quick reference with visual diagrams comparing
  current rule-based system to proposed ML architecture, metrics, and
  implementation phases

Key improvements proposed:
- Replace O(n²) SessionState pipeline with vectorized feature extraction
- Train XGBoost classifier on experimentId labels (ROC-AUC >0.90 target)
- Multi-task neural network for joint agent detection + purchase prediction
- Gymnasium-based synthetic pricing environment for safe experimentation
- Knowledge distillation to extract interpretable pricing heuristics

Addresses margin leakage concerns with learned pricing strategies instead
of simple velocity thresholds.
2025-12-11 09:51:41 +00:00


# Multi-Task Learning Architecture - Quick Reference
## Current System (Baseline)
```
┌─────────────────────────────────────────────────────────────────┐
│                          CURRENT STATE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Browser Events → Next.js → FastAPI → Kafka (user-interactions) │
│                                ↓                                │
│                      Airflow (every 15min)                      │
│                                ↓                                │
│                  [Messy SessionState Pipeline]                  │
│                                ↓                                │
│                    Simple Rule-Based Pricing:                   │
│                      - Surge (if demand > 10)                   │
│                      - Elasticity formula                       │
│                      - Velocity threshold for agents            │
│                                ↓                                │
│                         Redis (prices)                          │
│                                ↓                                │
│                      Pricing Provider API                       │
│                                                                 │
│  ISSUES:                                                        │
│  ✗ O(n²) feature extraction                                     │
│  ✗ No supervised ML for agent detection                         │
│  ✗ Simple heuristics (velocity > 5 → agent)                     │
│  ✗ No learning from data                                        │
│  ✗ Margin leakage not effectively addressed                     │
└─────────────────────────────────────────────────────────────────┘
```
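For reference, the baseline rules in the diagram fit in a few lines. The sketch below is illustrative only: the thresholds (`demand > 10`, `velocity > 5`) come from the diagram, the `1.2`/`0.9` multipliers are borrowed from the "Simple surge" baseline listed under Phase 4 and are assumed (not confirmed) to match the live pipeline, and `RuleDecision` and `rule_based_decision` are hypothetical names.

```python
from dataclasses import dataclass

SURGE_DEMAND_THRESHOLD = 10    # diagram: "Surge (if demand > 10)"
AGENT_VELOCITY_THRESHOLD = 5   # diagram: "velocity > 5 -> agent"


@dataclass(frozen=True)
class RuleDecision:
    """Output of the rule-based baseline (hypothetical container)."""
    price_multiplier: float
    flagged_as_agent: bool


def rule_based_decision(demand: float, velocity: float) -> RuleDecision:
    """Apply the two hard thresholds of the current system.

    The 1.2 / 0.9 multipliers are an assumption taken from the
    "Simple surge" benchmark baseline, not from the production code.
    """
    multiplier = 1.2 if demand > SURGE_DEMAND_THRESHOLD else 0.9
    is_agent = velocity > AGENT_VELOCITY_THRESHOLD
    return RuleDecision(price_multiplier=multiplier, flagged_as_agent=is_agent)
```

A session at velocity 5.0 and one at 5.1 receive opposite labels under this rule, which is the brittleness the supervised classifier in Phase 2 is meant to address.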
## Proposed System (Multi-Task Learning)
```
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 1: DATA PIPELINE │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Kafka (user-interactions) │
│ ↓ │
│ ┌─────────────────────────────────────┐ │
│ │ VECTORIZED FEATURE PIPELINE │ │
│ ├─────────────────────────────────────┤ │
│ │ 1. TemporalFeatureExtractor │ → 8 features (velocity, etc.) │
│ │ 2. BehavioralFeatureExtractor │ → 10 features (carts, hovers) │
│ │ 3. ProductFeatureExtractor │ → 8 features (prices, depth) │
│ │ 4. UserAgentParser │ → 3 features (browser type) │
│ │ 5. SessionAggregator │ → Session-level matrix │
│ │ 6. ExperimentLabelJoiner │ → Join with xp_human_only │
│ └─────────────────────────────────────┘ │
│ ↓ │
│ Feature Matrix: [sessionId, 29 features, 3 labels] │
│ │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 2: SUPERVISED AGENT CLASSIFIER │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Feature Matrix (29 features) │
│ ↓ │
│ ┌────────────────────┐ │
│ │ XGBoost Model │ │
│ ├────────────────────┤ │
│ │ Input: 29 dims │ │
│ │ Output: P(agent) │ │
│ │ Loss: BCE │ │
│ └────────────────────┘ │
│ ↓ │
│ Target: ROC-AUC > 0.90 │
│ │
│ DEPLOYMENT: │
│ - Real-time inference in Pricing Provider │
│ - Dynamic markup: P(agent) > 0.7 → 1.3x price │
│ - Retrain daily via Airflow │
│ │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 3: MULTI-TASK LEARNING MODEL │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Input: Session Features (29) + Product Features (10) + Current Price │
│ ↓ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ MULTI-TASK NEURAL NETWORK │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌──────────────────────┐ │ │
│ │ │ Session Encoder │ (Shared) │ │
│ │ │ [29] → [128] → [64] │ │ │
│ │ └──────────┬───────────┘ │ │
│ │ │ │ │
│ │ ├────────────┬───────────────┐ │ │
│ │ ↓ ↓ ↓ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────────┐ │ │
│ │ │ Task A │ │ Product │ │ Task B │ │ │
│ │ │ Agent │ │ Encoder │ │ Purchase │ │ │
│ │ │ Head │ │ [10]→16 │ │ Prob Head │ │ │
│ │ └────┬────┘ └────┬────┘ └──────┬──────┘ │ │
│ │ ↓ └────┬────────────┘ │ │
│ │ P(agent) ↓ │ │
│ │ P(purchase|price) │ │
│ │ │ │
│ │ Loss = α·BCE(agent) + β·BCE(purchase) │ │
│ │ α=1.0, β=2.0 (tune these weights) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ ↓ │
│ OUTPUTS: │
│ 1. Agent probability (like Phase 2) │
│ 2. Purchase probability given price │
│ 3. Session embedding (for knowledge distillation) │
│ │
│ USE CASE: │
│ Optimal Price = argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ] │
│ │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE DISTILLATION BRANCH │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ Multi-Task Model (teacher) │
│ ↓ │
│ Generate predictions on validation set │
│ ↓ │
│ ┌──────────────────────────────────────┐ │
│ │ Distill to Decision Tree (student) │ │
│ ├──────────────────────────────────────┤ │
│ │ Input: 29 session features │ │
│ │ Output: Optimal markup multiplier │ │
│ │ Max depth: 5 (interpretable) │ │
│ └──────────────────────────────────────┘ │
│ ↓ │
│ Extract Human-Readable Rules: │
│ │
│ IF interaction_velocity > 10 AND cart_to_view_ratio < 0.1: │
│ markup = 1.3 (likely agent reconnaissance) │
│ ELIF unique_products_viewed < 3 AND session_duration > 300: │
│ markup = 0.9 (engaged human, offer discount) │
│ ELSE: │
│ markup = 1.0 (baseline) │
│ │
│ Also: SHAP values for feature importance analysis │
│ │
└──────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────┐
│ PHASE 4: SYNTHETIC DYNAMIC PRICING SIMULATOR │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ PURPOSE: Fast experimentation without real users │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ DynamicPricingEnv (Gymnasium) │ │
│ ├────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ State: [demand, inventory, hour, agent_frac, │ │
│ │ avg_velocity] │ │
│ │ │ │
│ │ Action: price_multiplier ∈ [0.7, 1.5] │ │
│ │ │ │
│ │ Dynamics: │ │
│ │ - Simulate user arrivals (Poisson) │ │
│ │ - Split into humans (30%) vs agents (70%) │ │
│ │ - Purchase probability: │ │
│ │ P_human(buy) = logistic(price, sensitivity=2) │ │
│ │ P_agent(buy) = logistic(price, sensitivity=5) │ │
│ │ │ │
│ │ Reward: revenue - 0.5 * margin_leakage │ │
│ │ where margin_leakage = (oracle_price - │ │
│ │ actual_price) × │ │
│ │ agent_purchases │ │
│ └────────────────────────────────────────────────────┘ │
│ ↓ │
│ ┌────────────────────────────────────────┐ │
│ │ Train RL Agent (PPO) │ │
│ ├────────────────────────────────────────┤ │
│ │ Learn policy: State → Optimal Price │ │
│ │ 100k timesteps training │ │
│ └────────────────────────────────────────┘ │
│ ↓ │
│ BENCHMARK vs Baselines: │
│ - Fixed pricing: 1.0x always │
│ - Simple surge: 1.2x if demand > 10, else 0.9x │
│ - Elasticity-based: formula │
│ - RL policy: learned │
│ - Multi-task + RL: Use MT model predictions as state features │
│ │
│ VALIDATION: │
│ - Calibrate simulator from historical data │
│ - Run counterfactuals ("what if agent_frac=0.8?") │
│ - A/B test winner on real traffic │
│ │
└──────────────────────────────────────────────────────────────────────────┘
```
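The Phase 3 use case, `argmax_p [ p · P(purchase|p) · (1 + λ·P(agent)) ]`, can be approximated with a grid search over the simulator's multiplier range `[0.7, 1.5]`. The sketch below is a minimal illustration under stated assumptions: `logistic_purchase_probability` is a hypothetical stand-in for the multi-task model's purchase head, and `lam=0.5`, the 0.05 grid step, and all function names are placeholders rather than values from the gameplan.

```python
import math
from typing import Callable, Optional, Sequence


def logistic_purchase_probability(price: float, reference_price: float,
                                  sensitivity: float = 2.0) -> float:
    """Hypothetical stand-in for the purchase head: P(purchase | price)."""
    relative_markup = price / reference_price - 1.0
    return 1.0 / (1.0 + math.exp(sensitivity * relative_markup))


def optimal_price(reference_price: float,
                  p_agent: float,
                  purchase_probability: Callable[[float], float],
                  lam: float = 0.5,
                  multipliers: Optional[Sequence[float]] = None) -> float:
    """Grid-search argmax_p [ p * P(purchase|p) * (1 + lam * P(agent)) ]."""
    if multipliers is None:
        # 0.70, 0.75, ..., 1.50 (assumed grid over the Phase 4 action range)
        multipliers = [round(0.70 + 0.05 * i, 2) for i in range(17)]

    def objective(price: float) -> float:
        return price * purchase_probability(price) * (1.0 + lam * p_agent)

    candidates = [reference_price * m for m in multipliers]
    return max(candidates, key=objective)


# Usage: score candidate prices around a reference price of 100.0
best = optimal_price(
    reference_price=100.0,
    p_agent=0.2,
    purchase_probability=lambda p: logistic_purchase_probability(p, 100.0),
)
```

One thing the sketch makes visible: as written, the factor `(1 + λ·P(agent))` does not depend on `p`, so it rescales the objective without moving the argmax. Agent probability influences the chosen price only if `P(purchase|p)` is itself conditioned on the session, which is presumably what the shared session encoder provides.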
## Data Flow (Production)
```
┌─────────────┐
│ Browser │
│ (User/Agent)│
└──────┬──────┘
│ POST /api/ingest (events + experimentId)
┌──────────────┐
│ Next.js API │
└──────┬───────┘
│ Forward events
┌──────────────┐
│ FastAPI │
│ /api/kafka │
│ /ingest │
└──────┬───────┘
│ Publish
┌─────────────────────────┐
│ Kafka │
│ Topic: user-interactions│
└──────┬──────────────────┘
├──────────────────┬──────────────────┐
↓ ↓ ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────────┐
│ Airflow │ │ Real-Time │ │ Kafka Streams │
│ (Batch) │ │ Inference │ │ (Feature Cache) │
│ │ │ │ │ │
│ Daily: │ │ On Price │ │ Rolling window │
│ - Retrain │ │ Request: │ │ compute session │
│ classifier │ │ - Get session│ │ features, push │
│ - Retrain MT │ │ features │ │ to Redis │
│ model │ │ - Predict │ │ │
│ - Publish to │ │ P(agent) │ │ TTL: 1 hour │
│ registry │ │ - Predict │ │ │
│ │ │ P(purchase)│ │ │
│ │ │ - Compute │ │ │
│ │ │ optimal_p │ │ │
└──────┬───────┘ └──────┬───────┘ └────────┬─────────┘
│ │ │
↓ ↓ ↓
┌──────────────────────────────────────────────┐
│ Redis (Model Registry) │
├──────────────────────────────────────────────┤
│ Keys: │
│ - classifier:agent_detector:latest (pickle) │
│ - multitask_model:latest (state_dict) │
│ - session_features:{sessionId} (json, TTL) │
│ - prices:latest (DataFrame) │
│ - elasticity:latest (DataFrame) │
└──────────────────┬───────────────────────────┘
┌─────────────────────┐
│ Pricing Provider │
│ /api/{mode}/price/ │
│ {productId} │
│ │
│ GET sessionId │
│ → Load features │
│ → Load models │
│ → Predict │
│ → Return price │
└─────────┬───────────┘
┌─────────────────────┐
│ Frontend │
│ (Display price) │
└─────────────────────┘
```
## Key Metrics
### Model Performance
| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Agent Classifier ROC-AUC | >0.90 | N/A (rule-based) | Phase 2 |
| Purchase Predictor ROC-AUC | >0.75 | N/A | Phase 3 |
| Pricing Latency (p99) | <100ms | ~50ms | All |
| Retraining Frequency | Daily | Every 15min (rules) | Phase 2+ |
### Business Impact
| Metric | Target | Current | Phase |
|--------|--------|---------|-------|
| Margin Leakage | 30% reduction | Baseline | Phase 2-4 |
| Human Conversion Rate | No change | Baseline | All |
| Agent Detection Rate | >85% precision | ~60% (velocity) | Phase 2 |
| Revenue Uplift | +10% | Baseline | Phase 3-4 |
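Margin leakage is defined in the Phase 4 reward as `(oracle_price - actual_price) × agent_purchases`, and the reward itself as `revenue - 0.5 * margin_leakage`. The arithmetic can be sketched as follows. This overview does not define how `oracle_price` is obtained, so it is simply taken as an input here, and the function names are hypothetical.

```python
def margin_leakage(oracle_price: float, actual_price: float,
                   agent_purchases: int) -> float:
    """Leakage per the Phase 4 definition:
    (oracle_price - actual_price) * agent_purchases."""
    return (oracle_price - actual_price) * agent_purchases


def step_reward(revenue: float, oracle_price: float, actual_price: float,
                agent_purchases: int, leakage_weight: float = 0.5) -> float:
    """Reward per the Phase 4 diagram: revenue - 0.5 * margin_leakage."""
    leakage = margin_leakage(oracle_price, actual_price, agent_purchases)
    return revenue - leakage_weight * leakage


# Example: 4 agent purchases at 10.0 when the oracle price was 13.0
leak = margin_leakage(oracle_price=13.0, actual_price=10.0, agent_purchases=4)
reward = step_reward(revenue=100.0, oracle_price=13.0, actual_price=10.0,
                     agent_purchases=4)
```

Tracking the same quantity on historical traffic would give the baseline against which the margin leakage target can be measured.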
## File Structure (New)
```
experiments/
  ml/
    __init__.py

    # Phase 1: Features
    features/
      __init__.py
      temporal.py           # TemporalFeatureExtractor
      behavioral.py         # BehavioralFeatureExtractor
      product.py            # ProductFeatureExtractor
      useragent.py          # UserAgentParser
      aggregator.py         # SessionAggregator
    pipeline.py             # build_feature_pipeline()
    datasets.py             # load_events_from_kafka(), etc.

    # Phase 2: Classifier
    train_classifier.py     # XGBoost training script

    # Phase 3: Multi-Task
    models/
      __init__.py
      multitask.py          # MultiTaskPricingModel (PyTorch)
    train_multitask.py      # Multi-task training script
    distill.py              # Knowledge distillation

    # Phase 4: Simulator
    simulator/
      __init__.py
      env.py                # DynamicPricingEnv (Gymnasium)
      agents.py             # HumanUser, AgentUser
    train_rl.py             # PPO training

    # Inference
    inference/
      __init__.py
      pricing_service.py    # gRPC service (optional)
      feature_cache.py      # Redis feature store client

    # Notebooks
    notebooks/
      01_eda.ipynb
      02_feature_analysis.ipynb
      03_model_evaluation.ipynb
      04_simulator_calibration.ipynb
```
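As a rough idea of what `train_multitask.py` would minimize, the Phase 3 loss `α·BCE(agent) + β·BCE(purchase)` with `α=1.0, β=2.0` is sketched below in dependency-free Python. This is only an illustration of the weighting; a PyTorch training script would more likely use the framework's built-in binary cross-entropy, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def multitask_loss(agent_labels: Sequence[int], agent_probs: Sequence[float],
                   purchase_labels: Sequence[int], purchase_probs: Sequence[float],
                   alpha: float = 1.0, beta: float = 2.0) -> float:
    """Weighted sum of the two task losses: alpha*BCE(agent) + beta*BCE(purchase)."""
    agent_term = binary_cross_entropy(agent_labels, agent_probs)
    purchase_term = binary_cross_entropy(purchase_labels, purchase_probs)
    return alpha * agent_term + beta * purchase_term
```

With `β` twice `α`, errors on the purchase head contribute twice as much to the gradient as errors on the agent head, which is why the diagram flags these weights for tuning.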
## Critical Code Changes
### 1. Replace Messy SessionState
- **Before:** `experiments/procesing/steps/session.py` (O(n²) loops)
- **After:** `experiments/ml/pipeline.py` (vectorized pipeline)
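As an illustration of what "vectorized" could mean here, the sketch below computes per-session temporal features with a single pandas `groupby` instead of nested Python loops. The column names (`sessionId`, `timestamp` in seconds), the feature definitions, and the function name are assumptions for the example; the actual extractor and its 8 temporal features are specified in the gameplan.

```python
import pandas as pd


def extract_temporal_features(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one row of temporal features per session.

    Assumes `events` has columns `sessionId` and `timestamp` (seconds).
    """
    per_session = events.groupby("sessionId")["timestamp"].agg(
        event_count="count", first_ts="min", last_ts="max"
    )
    duration = per_session["last_ts"] - per_session["first_ts"]
    # Mask zero-length sessions so the division yields NaN rather than inf,
    # then report a velocity of 0.0 for them.
    velocity = (per_session["event_count"] / duration.where(duration > 0)).fillna(0.0)
    return pd.DataFrame({
        "event_count": per_session["event_count"],
        "session_duration": duration,        # seconds
        "interaction_velocity": velocity,    # events per second (assumed unit)
    })


# Usage on a toy event log with two sessions
events = pd.DataFrame({
    "sessionId": ["a", "a", "a", "b"],
    "timestamp": [0.0, 30.0, 60.0, 5.0],
})
features = extract_temporal_features(events)
```

The grouping is a single pass over the events, so the cost grows with the number of events rather than with pairs of events.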
### 2. Upgrade Pricing Provider
- **Before:** Simple velocity threshold
- **After:** ML model inference with agent probability
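The Phase 2 deployment rule (`P(agent) > 0.7 → 1.3x price`) is small enough to sketch directly. The threshold and markup come from the Phase 2 diagram; the function name, the input validation, and rounding to two decimals are assumptions.

```python
AGENT_PROBABILITY_THRESHOLD = 0.7  # Phase 2 diagram: P(agent) > 0.7
AGENT_MARKUP = 1.3                 # Phase 2 diagram: 1.3x price


def apply_agent_markup(base_price: float, p_agent: float) -> float:
    """Return the price after the Phase 2 markup rule.

    `p_agent` is the classifier's predicted agent probability in [0, 1].
    """
    if not 0.0 <= p_agent <= 1.0:
        raise ValueError(f"p_agent must be in [0, 1], got {p_agent}")
    multiplier = AGENT_MARKUP if p_agent > AGENT_PROBABILITY_THRESHOLD else 1.0
    # Rounding to cents is an assumption about how prices are displayed.
    return round(base_price * multiplier, 2)
```

Unlike the velocity threshold, the cut-off here applies to a calibrated model output, so it can be tuned against the precision target without touching feature logic.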
### 3. Add Real-Time Feature Store
- **Before:** No feature caching
- **After:** Kafka Streams → Redis (session features)
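A possible shape for the client side of the feature store is sketched below. The key pattern `session_features:{sessionId}`, the JSON encoding, and the 1-hour TTL come from the data-flow diagram; the helper names are hypothetical, and the client is duck-typed so that any object exposing `setex(name, time, value)` and `get(name)` (as redis-py's `Redis` client does) can be passed in.

```python
import json
from typing import Any, Dict, Optional, Protocol

SESSION_FEATURE_TTL_SECONDS = 3600  # diagram: "TTL: 1 hour"


class KeyValueClient(Protocol):
    """Minimal interface needed from the cache client (assumed)."""
    def setex(self, name: str, time: int, value: str) -> Any: ...
    def get(self, name: str) -> Any: ...


def feature_key(session_id: str) -> str:
    """Build the Redis key used in the data-flow diagram."""
    return f"session_features:{session_id}"


def cache_session_features(client: KeyValueClient, session_id: str,
                           features: Dict[str, float],
                           ttl_seconds: int = SESSION_FEATURE_TTL_SECONDS) -> None:
    """Store features as JSON with an expiry so stale sessions age out."""
    client.setex(feature_key(session_id), ttl_seconds, json.dumps(features))


def load_session_features(client: KeyValueClient,
                          session_id: str) -> Optional[Dict[str, float]]:
    """Return cached features, or None if the session is unknown or expired."""
    raw = client.get(feature_key(session_id))
    return None if raw is None else json.loads(raw)
```

Returning `None` on a cache miss leaves the fallback decision (for example, serving the baseline price) to the Pricing Provider.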
### 4. Airflow DAG Upgrades
- **Before:** `surge_pricing_pipeline` (rule-based)
- **After:** Add `agent_classifier_training_pipeline` (daily retrain)
## Next Actions (Start Here)
1. **Read gameplan**: See `docs/GAMEPLAN_MULTITASK_PRICING.md`
2. **Create directory structure**:
   ```bash
   mkdir -p experiments/ml/{features,models,simulator,inference,notebooks}
   ```
3. **Pull sample data**:
   ```python
   # experiments/ml/notebooks/01_eda.ipynb
   from kafka import KafkaConsumer
   # Pull 1 week of events, join with experiments table
   # Analyze label distribution, feature correlations
   ```
4. **Prototype first feature extractor**:
   ```python
   # experiments/ml/features/temporal.py
   # Start with TemporalFeatureExtractor
   # Test on 10k events, validate output schema
   ```
5. **Review with team**: Discuss tradeoffs, priorities, timeline
## Questions to Resolve
1. **Label Quality**: How confident are we in `xp_human_only` labels? Should we add manual verification?
2. **Compute Budget**: Do we have GPU access for PyTorch training? (Phase 3)
3. **Latency Requirements**: Is 100ms p99 acceptable for pricing API?
4. **A/B Testing**: Do we have infrastructure for traffic splitting? (Deployment)
5. **Monitoring**: Who owns the Grafana dashboards? What alerting thresholds?
---
**For detailed implementation, see:** `docs/GAMEPLAN_MULTITASK_PRICING.md`