Technical reference
Simulator & Validation Reference
The deep companion to the methodology: how the simulator is built, the gate every scenario must pass, the historical bands we validate against, the character of each validated regime, and how assets move together in a crisis.
1 · What the simulator is
A regime-switching stochastic-volatility model: neither a replay of history nor random noise. A market moves through four regimes (BULL · SIDEWAYS · BEAR · CRISIS), and each imposes a characteristic drift and volatility level. The path is not locked to one regime: it moves between regimes according to a transition matrix estimated from real index history (1999–2025). That matrix reproduces the historical regime distribution (roughly 40 / 39 / 11 / 10 %), so the alternation of calm and stressed stretches matches how real markets behave.
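A minimal sketch of the regime layer described above, assuming a hypothetical daily transition matrix. The matrix values, the function names and the seed are illustrative placeholders: they are not the calibrated matrix and are not tuned to reproduce the 40 / 39 / 11 / 10 % mix. The sketch shows how a row-stochastic matrix drives the regime path and how long-run regime shares follow from it.

```python
import random

REGIMES = ["BULL", "SIDEWAYS", "BEAR", "CRISIS"]

# Hypothetical daily transition matrix (each row sums to 1).
# Illustrative only: these are NOT the calibrated values.
P = [
    [0.97, 0.02, 0.008, 0.002],
    [0.02, 0.97, 0.008, 0.002],
    [0.03, 0.03, 0.92, 0.02],
    [0.02, 0.03, 0.05, 0.90],
]

def simulate_regimes(P, n_days, start=0, seed=0):
    """Draw a path of regime indices from a row-stochastic matrix P."""
    rng = random.Random(seed)
    path, state = [], start
    for _ in range(n_days):
        path.append(state)
        # The next regime is sampled from the row of the current regime.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return path

def stationary_distribution(P, iters=5000):
    """Long-run regime shares via power iteration on pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

path = simulate_regimes(P, n_days=2520)
shares = stationary_distribution(P)
for name, share in zip(REGIMES, shares):
    print(f"{name:9s} {share:.3f}")
```

Comparing the stationary shares of an estimated matrix with the observed regime frequencies is one simple way to check that the matrix reproduces the historical mix.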
Volatility clustering is produced by a persistent volatility process layered on that regime oscillation: a turbulent day raises the volatility state for the days that follow (an AR(1)-like memory, roughly a four-day half-life), so large moves arrive in clusters rather than spread evenly. This is the single most universal property of real returns, and it is the model’s load-bearing feature — the admissibility gate in §3 enforces it.
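For an AR(1) state with persistence coefficient phi, a shock decays to half its size after h days when phi^h = 0.5, so a four-day half-life corresponds to phi of about 0.84. The sketch below works that arithmetic and plugs it into a toy log-volatility process. The process form, `base_vol` and `vol_of_vol` are assumptions made for illustration; in particular, the volatility state here is driven by its own noise term rather than by the size of the previous day's move, and none of this is the shipped engine.

```python
import math
import random

def persistence_from_half_life(half_life_days):
    """AR(1) coefficient phi such that a shock halves after half_life_days."""
    return 0.5 ** (1.0 / half_life_days)

def simulate_returns(n_days, base_vol=0.01, half_life_days=4.0,
                     vol_of_vol=0.25, seed=0):
    """Daily returns whose log-volatility deviation follows an AR(1).

    base_vol and vol_of_vol are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    phi = persistence_from_half_life(half_life_days)
    x = 0.0  # log-volatility deviation from its base level
    returns = []
    for _ in range(n_days):
        x = phi * x + vol_of_vol * rng.gauss(0.0, 1.0)
        returns.append(base_vol * math.exp(x) * rng.gauss(0.0, 1.0))
    return returns

phi = persistence_from_half_life(4.0)
print(round(phi, 4))  # 0.8409
rets = simulate_returns(252)
```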
Per-asset character comes from calibrating each asset to its own price history — its volatility level, tail thickness, drawdown profile and regime mix — not from hand-set knobs. The per-asset targets are the reference bands in §4.
2 · Per-asset calibration & coverage
Every asset is calibrated to its own historical statistics, and a simulated path is admissible only when its measured properties fall inside that asset’s bands (§4) — not a generic default. Clustering, volatility and tail thickness differ markedly across assets, so a "low" clustering figure on gold or crypto can be asset-appropriate rather than a defect.
Coverage. The example strategy reports span six assets: SPY, QQQ, GOLD, WTI, BTC and ETH. A seventh, VIX, is held back (see §9). The live portfolio-stress substrate covers four (SPY, TLT, GOLD, BTC), the set for which the cross-asset coupling (§6) is calibrated.
3 · The admissibility gate: volatility clustering
A synthetic regime is a valid market test only if it reproduces the most fundamental stylized fact: positive volatility clustering, measured here as the lag-5 autocorrelation of absolute returns ("acf5"; Cont, 2001). A scenario with zero or negative clustering is an anti-market: a strategy "optimised" against it is optimising against a simulation artifact. So acf5 is a hard gate. Every deployable regime must clear it, measured against the asset’s own historical band (clustering is asset-specific; see §4).
This gate is load-bearing: clearing it is the difference between a plausible market and noise. Every regime in the catalog below passes it, measured against its own asset’s band.
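The gate statistic can be sketched in a few lines. This is a minimal illustration of the lag-5 autocorrelation of absolute returns and of the sign check (acf5 > 0) only. The function names and the two toy series are hypothetical, and the comparison against the per-asset band is left out.

```python
def acf_abs(returns, lag=5):
    """Sample autocorrelation of absolute returns at the given lag."""
    a = [abs(r) for r in returns]
    n = len(a)
    mean = sum(a) / n
    var = sum((v - mean) ** 2 for v in a)
    if var == 0.0:
        return 0.0  # constant magnitude: no clustering signal either way
    cov = sum((a[t] - mean) * (a[t + lag] - mean) for t in range(n - lag))
    return cov / var

def passes_gate(returns, lag=5):
    """Hard gate: clustering must be strictly positive."""
    return acf_abs(returns, lag) > 0.0

# Toy series: a calm stretch followed by a turbulent stretch (clustered).
clustered = [0.001, -0.001] * 50 + [0.03, -0.03] * 50

# Toy anti-market: large and small moves swap every five days, so a big
# move is always followed five days later by a small one.
anti = ([0.03] * 5 + [0.001] * 5) * 20

print(passes_gate(clustered), passes_gate(anti))
```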
4 · Per-asset reference bands
The historical targets we validate against — medians over rolling 252-day windows. Clustering is structurally lower on GOLD / BTC than on the equity indices, so a "low" acf5 there can be asset-appropriate, not a bug.
| asset | acf5 | vol (ann.) | kurtosis | skew |
|---|---|---|---|---|
| SPY | +0.106 | 0.164 | 1.51 | −0.27 |
| QQQ | +0.111 | 0.200 | 1.51 | −0.24 |
| GOLD | +0.057 | 0.159 | 1.77 | −0.23 |
| WTI | +0.093 | 0.320 | 1.12 | −0.16 |
| BTC | +0.062 | 0.548 | 2.91 | −0.04 |
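A hedged sketch of how medians over rolling 252-day windows could be computed for the four columns above. Several details are assumptions, because the text does not state them: the window step of 21 days, the use of population moments, and the reading of "kurtosis" as excess kurtosis. The input series is random placeholder data, not real prices.

```python
import math
import random
import statistics

def window_stats(r):
    """acf5 of |r|, annualised vol, kurtosis and skew for one window."""
    n = len(r)
    mu = sum(r) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in r) / n)
    z = [(x - mu) / sd for x in r]
    a = [abs(x) for x in r]
    am = sum(a) / n
    acf5 = (sum((a[t] - am) * (a[t + 5] - am) for t in range(n - 5))
            / sum((v - am) ** 2 for v in a))
    return {
        "acf5": acf5,
        "vol": sd * math.sqrt(252),  # annualised from daily returns
        "kurtosis": sum(v ** 4 for v in z) / n - 3.0,  # assumption: excess kurtosis
        "skew": sum(v ** 3 for v in z) / n,
    }

def rolling_medians(returns, window=252, step=21):
    """Median of each statistic across rolling windows (step is an assumption)."""
    rows = [window_stats(returns[i:i + window])
            for i in range(0, len(returns) - window + 1, step)]
    return {k: statistics.median(row[k] for row in rows) for k in rows[0]}

rng = random.Random(1)
demo = [rng.gauss(0.0, 0.01) for _ in range(1260)]  # placeholder data, not real prices
bands = rolling_medians(demo)
print({k: round(v, 3) for k, v in bands.items()})
```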
5 · The validated synthetic regime catalog
Thirty-two deployable regimes, validated at n = 50 replicas. The figures are the character of the regime itself — the market the strategy is tested in — not a strategy outcome or a forecast. ret is the median return over the stress window, acf5 the median clustering, vol annualised. All clear the gate (acf5 > 0).
s = sustained (the regime evolves over the full window) · f = sharp-event (the crash window is held, which is why acf5 is high by design). Harsh magnitudes (slow-crash, liquidity-stress) are intentionally adversarial / event-appropriate.
| regime | asset | ret | acf5 | vol | kind |
|---|---|---|---|---|---|
| slow_crash_no_recovery | SPY | −47% | +0.115 | 0.21 | s |
| slow_crash_no_recovery | QQQ | −50% | +0.095 | 0.22 | s |
| slow_crash_no_recovery | GOLD | −53% | +0.086 | 0.21 | s |
| slow_stagflation | SPY | −27% | +0.043 | 0.26 | s |
| slow_stagflation | QQQ | −20% | +0.060 | 0.27 | s |
| slow_decline | SPY | −15% | +0.076 | 0.18 | s |
| slow_decline | QQQ | −15% | +0.065 | 0.18 | s |
| slow_decline | GOLD | −24% | +0.091 | 0.18 | s |
| demand_destruction | WTI | −13% | +0.093 | 0.26 | s |
| liquidity_stress | SPY | −26% | +0.051 | 0.35 | s |
| liquidity_stress | WTI | −27% | +0.043 | 0.46 | s |
| liquidity_stress | BTC | −15% | +0.055 | 0.39 | s |
| vol_expansion | SPY | +10% | +0.163 | 0.24 | s |
| vol_expansion | QQQ | +6% | +0.162 | 0.27 | s |
| vol_expansion | GOLD | −1% | +0.159 | 0.27 | s |
| vol_expansion | BTC | +25% | +0.109 | 0.25 | s |
| whipsaw | SPY | +2% | +0.050 | 0.17 | s |
| whipsaw | QQQ | +5% | +0.035 | 0.15 | s |
| whipsaw | BTC | +7% | +0.043 | 0.16 | s |
| whipsaw | ETH | +11% | +0.049 | 0.17 | s |
| low_vol_grind | SPY | +11% | +0.077 | 0.10 | s |
| low_vol_grind | QQQ | +14% | +0.068 | 0.11 | s |
| low_vol_grind | GOLD | +5% | +0.015 | 0.10 | s |
| low_vol_grind | WTI | −0% | +0.074 | 0.11 | s |
| hyperinflation_boost | GOLD | +29% | +0.042 | 0.17 | s |
| sharp_crash | SPY | −4% | +0.333 | 0.29 | f |
| sharp_crash | QQQ | −12% | +0.321 | 0.29 | f |
| sharp_crash | BTC | −5% | +0.306 | 0.29 | f |
| v_recovery | SPY | +7% | +0.233 | 0.23 | f |
| v_recovery | QQQ | +11% | +0.255 | 0.23 | f |
| v_recovery | GOLD | +11% | +0.208 | 0.22 | f |
| v_recovery | BTC | −3% | +0.207 | 0.22 | f |
6 · Cross-asset coupling & the stock-bond hedge
For portfolio stress, several assets are simulated jointly and coupled by a three-part factor channel: a regime-gated equity-risk factor that strengthens correlations as markets fall, a persistent duration factor shared by Treasuries and gold, and an episodic rate-shock factor. The result reproduces the property a portfolio stress test exists to surface: a stock-bond hedge that holds in calm markets can break when both legs fall together. The table compares each pair’s calm-to-crisis correlation shift, real history versus the coupled model.
| pair | calm (real / model) | crisis (real / model) |
|---|---|---|
| SPY–TLT | −0.16 / −0.04 | −0.42 / −0.42 |
| SPY–BTC | +0.08 / +0.07 | +0.45 / +0.30 |
| TLT–GOLD | +0.19 / +0.32 | +0.13 / +0.15 |
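The calm and crisis columns are conditional correlations. A minimal sketch of that measurement follows, assuming each day carries a regime label and that "crisis" means days labelled CRISIS while "calm" means all other days. That split is an assumption (the text does not define it), and the numbers are toy values.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def conditional_correlation(x, y, regimes, crisis_label="CRISIS"):
    """Correlation of x and y on crisis days and on all other days."""
    crisis = [(a, b) for a, b, g in zip(x, y, regimes) if g == crisis_label]
    calm = [(a, b) for a, b, g in zip(x, y, regimes) if g != crisis_label]
    return {
        "calm": pearson(*zip(*calm)),
        "crisis": pearson(*zip(*crisis)),
    }

# Toy daily returns in percent: unrelated in calm, falling together in crisis.
asset_a = [1.0, -1.0, 1.0, -1.0, -3.0, -2.0, -4.0, -1.0]
asset_b = [1.0, 1.0, -1.0, -1.0, -2.0, -1.5, -3.5, -1.0]
labels = ["BULL"] * 4 + ["CRISIS"] * 4

result = conditional_correlation(asset_a, asset_b, labels)
print({k: round(v, 2) for k, v in result.items()})
```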
The 2022 hedge break is modelled the way it actually happened: not as a spike in day-to-day correlation, but as a shared drawdown. On a rate shock, equities and long Treasuries lose ground together (a 60/40 portfolio takes roughly a −14 % worst-episode drawdown in that scenario, versus the −10 % a drawdown-stop would have held it to). That is the "60/40 loses both legs" failure a stress tool must show. Honest boundary: the model adds value in the regime-conditional correlation and the joint-crash tail dependence. Single-asset daily marginals are at parity with cheap baselines, and the SPY–BTC crisis magnitude is undershot (+0.30 modelled versus +0.45 in real history). The full numeric comparison against a Gaussian-copula baseline is on the verifiability page.
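The distinction between day-to-day correlation and a shared drawdown can be shown with a toy example. In the sketch below the two legs are perfectly negatively correlated from one day to the next, yet both drift down, so the 60/40 blend still loses. All numbers, and the daily-rebalancing assumption in `blend`, are illustrative and unrelated to the calibrated scenario.

```python
def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded path, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst

def blend(equity_returns, bond_returns, equity_weight=0.6):
    """Two-asset portfolio returns, assuming daily rebalancing to fixed weights."""
    return [equity_weight * e + (1.0 - equity_weight) * b
            for e, b in zip(equity_returns, bond_returns)]

days = 100
# Offsetting daily noise: when one leg zigs, the other zags.
noise = [0.005 if t % 2 == 0 else -0.005 for t in range(days)]
# Shared downward drift of 0.1 % per day on both legs (toy rate shock).
equity_leg = [-0.001 + s for s in noise]
bond_leg = [-0.001 - s for s in noise]
portfolio = blend(equity_leg, bond_leg)

for name, series in [("equity", equity_leg), ("bond", bond_leg), ("60/40", portfolio)]:
    print(f"{name:7s} {max_drawdown(series):.3f}")
```

The daily noise cancels almost entirely in the blend, but the common drift does not, which is why a correlation figure alone can miss this failure.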
7 · Backtest correctness
OHLC bars hide the intra-bar order of events: when price touches both a stop-loss and a take-profit within the same bar, the execution order is ambiguous. Many engines resolve that optimistically. This engine applies the methodology of Löw, Maier-Paape & Platen (2015), defaulting to worst-case execution (the unfavourable order is assumed). Best-case and ignore modes are available as explicit bounds. The conservative default yields lower-bound performance estimates rather than optimistic ones.
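A minimal sketch of the ambiguity and its three resolutions, for a long position with one stop-loss and one take-profit. The function name, the signature, the treatment of "ignore" as "skip the ambiguous bar", and the simplification that fills happen exactly at the level (no gaps) are assumptions for illustration, not the engine's API.

```python
def resolve_exit(bar_low, bar_high, stop_loss, take_profit, mode="worst"):
    """Exit price for a long position within one OHLC bar.

    Returns None when no exit is taken on this bar.
    Assumes fills occur exactly at the level (gaps are not modelled).
    """
    hit_stop = bar_low <= stop_loss
    hit_target = bar_high >= take_profit
    if hit_stop and hit_target:
        # Both levels were touched; the bar does not reveal which came first.
        if mode == "worst":
            return stop_loss      # assume the unfavourable order
        if mode == "best":
            return take_profit    # assume the favourable order
        if mode == "ignore":
            return None           # assumption: skip the ambiguous bar
        raise ValueError(f"unknown mode: {mode}")
    if hit_stop:
        return stop_loss
    if hit_target:
        return take_profit
    return None

# A bar ranging 95..106 touches both the stop at 96 and the target at 105.
print(resolve_exit(95, 106, stop_loss=96, take_profit=105))               # 96
print(resolve_exit(95, 106, stop_loss=96, take_profit=105, mode="best"))  # 105
```

Running a backtest once in worst mode and once in best mode brackets the result between a lower and an upper bound.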
8 · How we validate
- Stylized-fact validation. The model’s daily stylized facts are measured over 200 random 252-day windows of the simulated paths and compared, fact by fact, to bands derived identically from real history. SPY and TLT match all eighteen; GOLD and BTC match seventeen of eighteen (kurtosis below band — see §9).
- Value vs a cheap baseline. The cross-asset model is compared numerically to a Gaussian copula. It adds two things the copula cannot reproduce: regime-conditional correlation that strengthens in a crisis, and joint-crash tail dependence. Single-asset marginals are at parity with cheap baselines (no unique edge claimed there). Stated, then checked, on /verifiability.
- Anchored bands. Every claimed sign and magnitude is checked against the historical reference band at the point of use, so an out-of-band figure is caught before it propagates.
- Pre-registration. Hypotheses and falsification thresholds are registered before a run, to prevent reading success into noise.
- Statistical power. A single passing KS-test at small n is not "indistinguishable from historical". Magnitude is validated at the deploy sample size, not a cheap screening size — small samples are optimistically biased for high-variance regimes.
- Claim–evidence alignment. "Solved" has to hold across the full shape, not one favourable descriptor.
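The stylized-fact check in the first bullet can be sketched for a single fact (annualised volatility). The band definition used here, a 5th to 95th percentile range across windows, is an assumption, since the text only says the bands are derived identically from real history. Both the "history" and the simulated paths below are random placeholders.

```python
import math
import random
import statistics

def annualised_vol(r):
    """Annualised volatility of a window of daily returns."""
    return statistics.pstdev(r) * math.sqrt(252)

def window_values(series, stat, n_windows=200, window=252, seed=0):
    """Evaluate stat on n_windows randomly placed windows of the series."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(series) - window + 1) for _ in range(n_windows)]
    return [stat(series[s:s + window]) for s in starts]

def band_from(values, lo_pct=5, hi_pct=95):
    """Assumed band definition: a percentile range across windows."""
    cuts = statistics.quantiles(values, n=100)  # 99 percentile cut points
    return cuts[lo_pct - 1], cuts[hi_pct - 1]

def in_band(sim_values, band):
    """A fact matches when the simulated median falls inside the band."""
    lo, hi = band
    return lo <= statistics.median(sim_values) <= hi

def gaussian_path(sd, n=5040, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, sd) for _ in range(n)]

history = gaussian_path(0.01, seed=1)    # placeholder for real history
sim_close = gaussian_path(0.01, seed=2)  # placeholder simulated path
sim_hot = gaussian_path(0.03, seed=3)    # deliberately three times too volatile

real_vals = window_values(history, annualised_vol)
band = band_from(real_vals)
print(in_band(window_values(sim_close, annualised_vol), band))
print(in_band(window_values(sim_hot, annualised_vol), band))
```

Repeating the same check for each fact and each asset gives a match count of the kind quoted above.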
9 · Documented limits
What the model does not yet do well:
Short-horizon shape
Daily lag-1 autocorrelation runs slightly higher than the historical reference (over-clustered at the shortest lag). A cross-scale refinement closes most of the gap; it is not yet in the shipped engine.
Tail thickness on GOLD / BTC
The synthetic daily tails for gold and crypto are thinner than the historical reference — kurtosis comes out below the asset band. Disclosed as a residual; it is the one stylized fact (of eighteen) those two assets miss.
Below-band clustering
A few regimes (liquidity-stress at extreme volatility, the calm grind profiles) clear the gate with positive acf5 but sit below their asset’s typical clustering band. Disclosed; inherent to extreme-vol or calm-bull paths, not a defect.
Intraday structure
The engine is built for daily bars; intraday volatility is uniform (no open-spike seasonality). Irrelevant at daily resolution; multi-resolution would need a separate time-of-day lever.
Multi-episode crashes
A crash with a genuine relief rally followed by a second decline is not yet generated reliably, and is not part of the published scenario set.
VIX
VIX is not currently offered. The simulator cannot yet generate reliable synthetic volatility-index dynamics — its tail behaviour comes out too thin — so VIX is held back until a future iteration handles it.
10 · References
- Hamilton, J. D. (1989) "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle" Econometrica 57(2), 357–384. The Markov regime-switching framework the simulator’s regime layer is built on.
- Cont, R. (2001) "Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues" Quantitative Finance 1(2), 223–236. The empirical regularities (vol-clustering, fat tails, regime persistence) the synthetic paths are validated against.
- Löw, R. W., Maier-Paape, S. & Platen, A. (2015) "Correctness of Backtest Engines" arXiv:1509.08248. Handling ambiguous intra-bar events; the engine implements their worst-case execution model.
Operated under German jurisdiction (BaFin / WpHG framework). Model-based scenario simulation — descriptive, not advisory; scenario-based, not a forecast. Not investment advice.