From b0a164795618199c7f6c4c5306e029b1ad5e7942 Mon Sep 17 00:00:00 2001
From: Daniel Rosel <daniel@alves.world>
Date: Fri, 23 Jan 2026 12:52:58 +0100
Subject: [PATCH] docs

---
 lab/README.md                | 74 +++++++++++++++++++++++++++
 lab/docs/index.rst           |  1 +
 lab/docs/system_overview.rst | 97 ++++++++++++++++++++++++++++++++++++
 3 files changed, 172 insertions(+)
 create mode 100644 lab/docs/system_overview.rst

diff --git a/lab/README.md b/lab/README.md
index c4db76a..b5226aa 100644
--- a/lab/README.md
+++ b/lab/README.md
@@ -1 +1,75 @@
 # MOS (Money Operating System)
+
+A research-grade quote-control simulator for studying dynamic pricing and market-making policies.
+The system models pricing as a closed loop of **Quote → Arrival → Execution → Position**, enabling
+controlled experimentation with demand models, inventory constraints, and reward shaping.
+
+## Core Loop
+
+1. **Quote** – the policy posts prices (one-sided or two-sided depending on the mechanism).
+2. **Arrival** – a population model generates purchase opportunities or market orders.
+3. **Execution** – an execution model decides whether an arrival converts at the quoted price.
+4. **Position** – inventory/position limits censor fills and generate holding/shortage costs.
+5. **Observation & Reward** – censored fills and aggregate metrics are exposed to the agent, while
+   objectives turn metrics into a scalar reward.
+
+Each stage is pluggable via lightweight protocols, so you can swap in alternative mechanisms,
+demand models, or objectives without rewriting the rest of the simulator.
+
+## Package Layout
+
+| Module            | Purpose |
+|-------------------|---------|
+| `lab.outlet`      | Core simulation engine, domain types, pricing mechanisms, objectives. |
+| `lab.population`  | Demand arrival models, execution probability models, competitor/market dynamics. |
+| `lab.experiments` | Rollout utilities, baseline policies, and off-policy evaluation helpers. |
+| `lab.config`      | Convenience factories for preconfigured retail and market-making environments. |
+
+## Preconfigured Scenarios
+
+### Retail Dynamic Pricing
+- Mechanism: posted prices with margin and delta constraints.
+- Arrivals: browsing sessions with contamination support (scrapers).
+- Execution: elasticity model with competitor cross-effects.
+- Position: inventory tracking with holding and shortage costs.
+- Market: reactive competitor that can trigger price wars.
+- Objective: PnL minus volatility, holding-cost, and lost-opportunity penalties.
+
+```python
+from lab.config import make_retail_platform
+from lab.experiments import rollout, fixed_price_policy
+
+platform = make_retail_platform()
+policy = fixed_price_policy(platform.instruments.refs)
+result = rollout(platform, policy, n_steps=100)
+print(result.total_pnl)
+```
+
+### Market Making
+- Mechanism: two-sided quoting with bid/ask spreads.
+- Arrivals: Hawkes order flow for clustered demand.
+- Execution: Avellaneda–Stoikov style intensity model.
+- Position: inventory risk limits.
+- Market: geometric Brownian motion mid-price process.
+- Objective: PnL plus spread capture minus a quadratic inventory-risk penalty.
+
+```python
+from lab.config import make_market_making_platform
+from lab.experiments import rollout
+
+platform = make_market_making_platform()
+mm_policy = lambda obs, t: (platform.instruments.refs, 1.0)
+result = rollout(platform, mm_policy, n_steps=200, seed=42)
+print(result.total_pnl)
+```
+
+## Extending the Simulator
+
+- Implement `lab.outlet.protocols.Mechanism` or `lab.outlet.protocols.ArrivalModel` to
+  introduce new pricing domains or demand processes.
+- Compose objectives with `lab.outlet.objectives.factory.make_composite` to study alternative
+  reward formulations.
+- Use `lab.experiments.compare_policies` to benchmark candidate policies across multiple
+  random seeds.
+
+Comprehensive API documentation lives in `lab/docs` (build with `make html`).
diff --git a/lab/docs/index.rst b/lab/docs/index.rst
index b53fbba..bd36ecd 100644
--- a/lab/docs/index.rst
+++ b/lab/docs/index.rst
@@ -28,6 +28,7 @@ Quick Start
    :maxdepth: 2
    :caption: Contents:
 
+   system_overview
    modules/outlet
    modules/population
    modules/experiments
diff --git a/lab/docs/system_overview.rst b/lab/docs/system_overview.rst
new file mode 100644
index 0000000..3fda8ad
--- /dev/null
+++ b/lab/docs/system_overview.rst
@@ -0,0 +1,97 @@
+System Overview
+===============
+
+The simulator organises dynamic pricing and market-making experiments as a
+closed loop with the following stages:
+
+* **Quote** – a policy or agent emits a :class:`lab.outlet.types.Quote`. The
+  quote is normalised and validated by a concrete
+  :class:`lab.outlet.protocols.Mechanism` implementation
+  (posted-price, two-sided, auction).
+* **Arrival** – a :class:`lab.outlet.protocols.ArrivalModel` samples a stream of
+  :class:`lab.outlet.types.Opportunity` objects given the current time,
+  instrument catalogue, and market state.
+* **Execution** – the :class:`lab.outlet.protocols.ExecutionModel` converts an
+  opportunity into a probabilistic fill using the active quote, optional
+  competitor prices, and demand-side context.
+* **Position** – a :class:`lab.outlet.protocols.PositionModel` enforces
+  inventory or position constraints, censors oversized fills, and accrues
+  holding and shortage costs.
+* **Observation & Reward** – the
+  :class:`lab.outlet.protocols.ObservationBuilder` constructs the censored view
+  exposed to the agent, while a :class:`lab.outlet.protocols.Objective`
+  transforms :class:`lab.outlet.types.StepMetrics` into a scalar reward with an
+  optional breakdown per term.
+
+These components are orchestrated by :class:`lab.outlet.platform.Platform`,
+which manages internal hidden state, deterministic seeding, and logging.
+
+Component Matrix
+----------------
+
+===============================  ==============================================
+Layer                            Responsibilities / Examples
+===============================  ==============================================
+Mechanisms                       Quote normalisation, execution semantics
+                                 (``posted_price``, ``two_sided``, ``auction``).
+Population models                Arrivals (:mod:`lab.population.arrivals`),
+                                 execution probability models
+                                 (:mod:`lab.population.execution`), and
+                                 competitor or market dynamics
+                                 (:mod:`lab.population.competitors`).
+Position management              Inventory limits, replenishment, holding and
+                                 shortage costs (:mod:`lab.outlet.stock`).
+Observation & logging            Censored observations and optional event logs
+                                 (:mod:`lab.outlet.observation`).
+Objectives                       Reward composition utilities
+                                 (:mod:`lab.outlet.objectives`).
+Experiments                      Rollout helpers, baseline policies, off-policy
+                                 evaluation (:mod:`lab.experiments.eval`).
+===============================  ==============================================
+
+Preconfigured Platforms
+-----------------------
+
+Two high-level factories in :mod:`lab.config` wire common combinations of the
+building blocks:
+
+* **Retail dynamic pricing** – posted-price mechanism, session arrivals with
+  contamination, elasticity-based executions, reactive competitor model, and a
+  composite objective that penalises volatility, holding costs, and lost
+  opportunities.
+* **Market making** – two-sided quoting, Hawkes order flow, intensity-based
+  executions, geometric Brownian motion mid-prices, and an objective combining
+  PnL, spread capture, and quadratic inventory risk.
+
+State & Reset Behaviour
+-----------------------
+
+When you call :meth:`lab.outlet.platform.Platform.reset`, the platform resets
+instrument positions, quotes, and hidden state, but component implementations
+may maintain internal buffers that can persist across resets. For reproducible experiments:
+
+* Use freshly instantiated arrival/market models for each episode, or add explicit
+  ``reset`` methods if the model keeps history (for example,
+  :class:`lab.population.arrivals.HawkesArrivalModel` maintains an event
+  history, while :class:`lab.population.competitors.ReactiveCompetitorModel`
+  tracks prior competitor quotes).
+* Seed randomness through the factory configuration (``RetailConfig.seed`` or
+  ``MarketMakingConfig.seed``) or pass a seed to ``Platform.reset`` for
+  deterministic rollouts.
+
+Extending the Platform
+----------------------
+
+To support a new domain:
+
+1. Create custom Mechanism/Arrival/Execution/Market/Observation components by
+   implementing the respective protocol in :mod:`lab.outlet.protocols`.
+2. Compose a new objective with
+   :func:`lab.outlet.objectives.factory.make_composite` or write a bespoke
+   :class:`lab.outlet.objectives.base.BaseObjective`.
+3. Wire everything together via :class:`lab.outlet.platform.Platform` directly
+   or expose a helper factory in :mod:`lab.config`.
+
+Use :func:`lab.experiments.rollout` and
+:func:`lab.experiments.compare_policies` to benchmark candidate policies under
+multiple random seeds, collecting per-step logs for analysis or off-policy evaluation (OPE).