Technical reference

Simulator & Validation Reference

The deep companion to the methodology: how the simulator is built, the gate every scenario must pass, the historical bands we validate against, the character of each validated regime, and how assets move together in a crisis.

1 · What the simulator is

A regime-switching stochastic-volatility model — not a replay of history and not random noise. A market moves through four regimes — BULL · SIDEWAYS · BEAR · CRISIS — and each imposes a characteristic drift and volatility level. The path is not locked to one regime: it transitions between them along a matrix estimated from real index history (1999–2025), which reproduces the historical regime distribution (roughly 40 / 39 / 11 / 10 %), so the alternation of calm and stressed stretches matches how real markets behave.
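The regime layer can be sketched as a four-state Markov chain. The estimated transition matrix is not reproduced on this page, so the sketch below uses a hypothetical matrix (`toy_transition_matrix`, with an assumed daily persistence of 0.95) constructed so that its stationary distribution equals the quoted 40 / 39 / 11 / 10 % mix. All function names are illustrative, not the engine's API.

```python
import numpy as np

REGIMES = ["BULL", "SIDEWAYS", "BEAR", "CRISIS"]
TARGET_MIX = np.array([0.40, 0.39, 0.11, 0.10])  # regime shares quoted in the text

def toy_transition_matrix(mix, stay=0.95):
    """Hypothetical matrix: keep the regime with prob. `stay`, else redraw from `mix`."""
    k = len(mix)
    return stay * np.eye(k) + (1.0 - stay) * np.tile(mix, (k, 1))

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def simulate_regimes(P, n_days, rng, start=0):
    """Sample a regime path: each day's state is drawn from the current row of P."""
    path = np.empty(n_days, dtype=int)
    state = start
    for t in range(n_days):
        path[t] = state
        state = rng.choice(len(P), p=P[state])
    return path

P = toy_transition_matrix(TARGET_MIX)
pi = stationary_distribution(P)                      # long-run regime shares
path = simulate_regimes(P, 20_000, np.random.default_rng(0))
shares = np.bincount(path, minlength=len(REGIMES)) / len(path)
```

The long-run shares depend only on the matrix, not on the starting regime, which is why a matrix estimated from history can reproduce the historical mix of calm and stressed stretches.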

Volatility clustering is produced by a persistent volatility process layered on that regime oscillation: a turbulent day raises the volatility state for the days that follow (an AR(1)-like memory, roughly a four-day half-life), so large moves arrive in clusters rather than spread evenly. This is the single most universal property of real returns, and it is the model’s load-bearing feature — the admissibility gate in §3 enforces it.
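For intuition on the "four-day half-life": in a plain AR(1) process, a shock decays by a factor φ per day, so a half-life of h days corresponds to φ = 0.5^(1/h). The sketch below assumes a pure AR(1) (the text only says "AR(1)-like"), and the function names are illustrative.

```python
def ar1_coefficient(half_life_days):
    """Persistence phi such that a shock halves after `half_life_days` steps."""
    return 0.5 ** (1.0 / half_life_days)

def decay_path(phi, shock=1.0, n_days=8):
    """Deviation of the volatility state from its mean after a one-off shock."""
    return [shock * phi ** t for t in range(n_days + 1)]

phi = ar1_coefficient(4.0)   # roughly 0.84 for a four-day half-life
path = decay_path(phi)       # path[4] is half the initial shock
```

Under this assumption, a turbulent day still contributes about a quarter of its initial impact to the volatility state eight days later.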

Per-asset character comes from calibrating each asset to its own price history — its volatility level, tail thickness, drawdown profile and regime mix — not from hand-set knobs. The per-asset targets are the reference bands in §4.

2 · Per-asset calibration & coverage

Every asset is calibrated to its own historical statistics, and a simulated path is admissible only when its measured properties fall inside that asset’s bands (§4) — not a generic default. Clustering, volatility and tail thickness differ markedly across assets, so a "low" clustering figure on gold or crypto can be asset-appropriate rather than a defect.

Coverage. The example strategy reports cover six assets: SPY, QQQ, GOLD, WTI, BTC and ETH. A seventh, VIX, is held back (see §9). The live portfolio-stress substrate covers four (SPY, TLT, GOLD, BTC), the set for which the cross-asset coupling (§6) is calibrated.

3 · The admissibility gate: volatility clustering

A synthetic regime is only a valid market test if it reproduces the most fundamental stylized fact: positive volatility clustering (the lag-5 autocorrelation of absolute returns, "acf5", per Cont, 2001). A scenario with zero or negative clustering is an anti-market — a strategy "optimised" against it is optimising against a simulation artifact. So acf5 is a hard gate: every deployable regime must clear it, against the asset’s own historical band (clustering is asset-specific — see §4).

This gate is load-bearing: clearing it is the difference between a plausible market and noise. Every regime in the catalog below passes it, measured against its own asset’s band.
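A minimal sketch of the acf5 measurement, assuming the standard sample autocorrelation of absolute returns at lag 5. The gate shown here checks only the sign; the comparison against the asset's own band is omitted because the band widths are not stated on this page. The two toy series are synthetic illustrations, not engine output.

```python
import numpy as np

def acf_abs(returns, lag=5):
    """Sample autocorrelation of |returns| at the given lag."""
    a = np.abs(np.asarray(returns, dtype=float))
    a = a - a.mean()
    denom = np.dot(a, a)
    if denom == 0.0:
        return 0.0
    return float(np.dot(a[:-lag], a[lag:]) / denom)

def passes_gate(returns, lag=5):
    """Sign-only version of the gate: clustering must be strictly positive."""
    return acf_abs(returns, lag) > 0.0

rng = np.random.default_rng(1)
z = rng.standard_normal(1000)
days = np.arange(1000)

# Volatility alternates in 50-day calm / stressed blocks: large moves cluster.
clustered = np.where((days // 50) % 2 == 0, 0.005, 0.03) * z
# Volatility flips every 5 days: a large move is followed, 5 days on, by a small one.
anti_market = np.where((days // 5) % 2 == 0, 0.005, 0.03) * z

print(passes_gate(clustered), passes_gate(anti_market))
```

The second series is an example of the "anti-market" case: its volatility is high on average, but its lag-5 clustering is negative, so it fails the gate.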

4 · Per-asset reference bands

The historical targets we validate against — medians over rolling 252-day windows. Clustering is structurally lower on GOLD / BTC than on the equity indices, so a "low" acf5 there can be asset-appropriate, not a bug.

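| asset | acf5 | vol (ann.) | kurtosis | skew |
| --- | --- | --- | --- | --- |
| SPY | +0.106 | 0.164 | 1.51 | −0.27 |
| QQQ | +0.111 | 0.200 | 1.51 | −0.24 |
| GOLD | +0.057 | 0.159 | 1.77 | −0.23 |
| WTI | +0.093 | 0.320 | 1.12 | −0.16 |
| BTC | +0.062 | 0.548 | 2.91 | −0.04 |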
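The "medians over rolling 252-day windows" can be sketched as follows. This is an illustration under stated assumptions: kurtosis is taken to be excess kurtosis, volatility is annualised with a factor of 252 trading days, and the `step` parameter and function names are ours. The input here is synthetic Gaussian noise, not real price history.

```python
import numpy as np

def window_stats(r, lag=5, periods_per_year=252):
    """The four band statistics for a single window of daily returns."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r) - np.abs(r).mean()
    d = r - r.mean()
    sd = d.std()
    return {
        "acf5": float(np.dot(a[:-lag], a[lag:]) / np.dot(a, a)),
        "vol": float(r.std(ddof=1) * np.sqrt(periods_per_year)),
        "kurtosis": float(np.mean(d ** 4) / sd ** 4 - 3.0),  # excess kurtosis (assumed)
        "skew": float(np.mean(d ** 3) / sd ** 3),
    }

def rolling_medians(returns, window=252, step=1):
    """Median of each statistic across all rolling windows."""
    r = np.asarray(returns, dtype=float)
    rows = [window_stats(r[i:i + window])
            for i in range(0, len(r) - window + 1, step)]
    return {k: float(np.median([row[k] for row in rows])) for k in rows[0]}

# Synthetic stand-in for a price history: i.i.d. Gaussian returns at 16.4% annual vol.
toy = np.random.default_rng(2).normal(0.0, 0.164 / np.sqrt(252), 1500)
medians = rolling_medians(toy, step=21)
```

On this Gaussian toy series the volatility median lands near the chosen 0.164, while acf5 and kurtosis sit near zero, which is the contrast with real assets that the bands above capture.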

5 · The validated synthetic regime catalog

Thirty-two deployable regimes, validated at n = 50 replicas. The figures are the character of the regime itself — the market the strategy is tested in — not a strategy outcome or a forecast. ret is the median return over the stress window, acf5 the median clustering, vol annualised. All clear the gate (acf5 > 0).

s = sustained (the regime evolves over the full window) · f = sharp-event (the crash window is held, which is why acf5 is high by design). Harsh magnitudes (slow-crash, liquidity-stress) are intentionally adversarial / event-appropriate.

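| regime | asset | ret | acf5 | vol | kind |
| --- | --- | --- | --- | --- | --- |
| slow_crash_no_recovery | SPY | −47% | +0.115 | 0.21 | s |
| slow_crash_no_recovery | QQQ | −50% | +0.095 | 0.22 | s |
| slow_crash_no_recovery | GOLD | −53% | +0.086 | 0.21 | s |
| slow_stagflation | SPY | −27% | +0.043 | 0.26 | s |
| slow_stagflation | QQQ | −20% | +0.060 | 0.27 | s |
| slow_decline | SPY | −15% | +0.076 | 0.18 | s |
| slow_decline | QQQ | −15% | +0.065 | 0.18 | s |
| slow_decline | GOLD | −24% | +0.091 | 0.18 | s |
| demand_destruction | WTI | −13% | +0.093 | 0.26 | s |
| liquidity_stress | SPY | −26% | +0.051 | 0.35 | s |
| liquidity_stress | WTI | −27% | +0.043 | 0.46 | s |
| liquidity_stress | BTC | −15% | +0.055 | 0.39 | s |
| vol_expansion | SPY | +10% | +0.163 | 0.24 | s |
| vol_expansion | QQQ | +6% | +0.162 | 0.27 | s |
| vol_expansion | GOLD | −1% | +0.159 | 0.27 | s |
| vol_expansion | BTC | +25% | +0.109 | 0.25 | s |
| whipsaw | SPY | +2% | +0.050 | 0.17 | s |
| whipsaw | QQQ | +5% | +0.035 | 0.15 | s |
| whipsaw | BTC | +7% | +0.043 | 0.16 | s |
| whipsaw | ETH | +11% | +0.049 | 0.17 | s |
| low_vol_grind | SPY | +11% | +0.077 | 0.10 | s |
| low_vol_grind | QQQ | +14% | +0.068 | 0.11 | s |
| low_vol_grind | GOLD | +5% | +0.015 | 0.10 | s |
| low_vol_grind | WTI | −0% | +0.074 | 0.11 | s |
| hyperinflation_boost | GOLD | +29% | +0.042 | 0.17 | s |
| sharp_crash | SPY | −4% | +0.333 | 0.29 | f |
| sharp_crash | QQQ | −12% | +0.321 | 0.29 | f |
| sharp_crash | BTC | −5% | +0.306 | 0.29 | f |
| v_recovery | SPY | +7% | +0.233 | 0.23 | f |
| v_recovery | QQQ | +11% | +0.255 | 0.23 | f |
| v_recovery | GOLD | +11% | +0.208 | 0.22 | f |
| v_recovery | BTC | −3% | +0.207 | 0.22 | f |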

6 · Cross-asset coupling & the stock-bond hedge

For portfolio stress, several assets are simulated jointly and coupled by a three-part factor channel: a regime-gated equity-risk factor that strengthens correlations as markets fall, a persistent duration factor shared by Treasuries and gold, and an episodic rate-shock factor. The result reproduces the property a portfolio stress test exists to surface: a stock-bond hedge that holds in calm markets can break when both legs fall together. The table compares each pair’s calm-to-crisis correlation shift, real history versus the coupled model.

| pair | calm (real / model) | crisis (real / model) |
| --- | --- | --- |
| SPY–TLT | −0.16 / −0.04 | −0.42 / −0.42 |
| SPY–BTC | +0.08 / +0.07 | +0.45 / +0.30 |
| TLT–GOLD | +0.19 / +0.32 | +0.13 / +0.15 |
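A calm-versus-crisis correlation of the kind tabulated above can be measured by splitting the sample on a regime label. The sketch below is an assumption about the measurement, not the engine's code: it presumes a boolean crisis mask is available, and the toy data are generated with target correlations chosen to echo the real SPY–BTC row (+0.08 calm, +0.45 crisis).

```python
import numpy as np

def conditional_corr(x, y, crisis_mask):
    """Pearson correlation of x and y, computed separately on calm and crisis days."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(crisis_mask, dtype=bool)
    calm = float(np.corrcoef(x[~m], y[~m])[0, 1])
    crisis = float(np.corrcoef(x[m], y[m])[0, 1])
    return calm, crisis

# Toy data: the first 2000 of 10000 days are labelled crisis (hypothetical labels).
rng = np.random.default_rng(4)
n = 10_000
mask = np.arange(n) < 2000
rho = np.where(mask, 0.45, 0.08)          # target correlation per day
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
x = z1
y = rho * z1 + np.sqrt(1.0 - rho ** 2) * z2

calm_corr, crisis_corr = conditional_corr(x, y, mask)
```

A single full-sample correlation would average the two states together and hide the shift that the table is designed to expose.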

The 2022 hedge break is modelled the way it actually happened: not as a spike in day-to-day correlation, but as a shared drawdown. On a rate shock, equities and long Treasuries lose ground together (a 60/40 portfolio takes roughly a −14 % worst-episode drawdown in that scenario, versus the −10 % a drawdown-stop would have held it to). That is the "60/40 loses both legs" failure a stress tool must show. Honest boundary: the model adds value in the regime-conditional correlation and the joint-crash tail dependence; single-asset daily marginals are at parity with cheap baselines, and the SPY–BTC crisis magnitude is undershot (+0.30 modelled versus +0.45 real). The full numeric comparison against a Gaussian-copula baseline is on the verifiability page.
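The drawdown figures quoted here are peak-to-trough losses on the compounded portfolio path. A minimal sketch of that measurement follows, assuming daily rebalancing to fixed 60/40 weights. The return series are made-up toy numbers, not the scenario's output, and the drawdown-stop is not modelled.

```python
import numpy as np

def max_drawdown(returns):
    """Worst peak-to-trough loss of the compounded path, as a negative fraction."""
    wealth = np.cumprod(1.0 + np.asarray(returns, dtype=float))
    # Include the starting wealth of 1.0 among the candidate peaks.
    peak = np.maximum.accumulate(np.concatenate(([1.0], wealth)))[1:]
    return float(np.min(wealth / peak - 1.0))

def portfolio_returns(equity, bonds, w_equity=0.6):
    """Daily-rebalanced two-asset portfolio (rebalancing rule is an assumption)."""
    equity = np.asarray(equity, dtype=float)
    bonds = np.asarray(bonds, dtype=float)
    return w_equity * equity + (1.0 - w_equity) * bonds

# Toy rate-shock days: both legs fall together, so the bond leg offers no cushion.
equity = [-0.05, -0.04, 0.01]
bonds = [-0.03, -0.05, 0.02]
dd_6040 = max_drawdown(portfolio_returns(equity, bonds))
dd_equity = max_drawdown(equity)
```

When both legs fall on the same days, the 60/40 drawdown stays close to the equity-only drawdown, which is the failure mode described above even if the measured daily correlation barely moves.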

7 · Backtest correctness

OHLC bars hide the intra-bar order of events: when price touches both a stop-loss and a take-profit within the same bar, the execution order is ambiguous. Many engines resolve that optimistically. This engine applies the methodology of Löw, Maier-Paape & Platen (2015), defaulting to worst-case execution (the unfavourable order is assumed). Best-case and ignore modes are available as explicit bounds. The conservative default yields lower-bound performance estimates rather than optimistic ones.
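The ambiguity and its worst-case resolution can be illustrated for a long position with one stop-loss and one take-profit level. This is a simplified sketch under our own assumptions, not the engine's implementation: the function name and signature are hypothetical, gaps through a level at the open are ignored, and the ignore mode is left out because its exact semantics are not described here.

```python
def resolve_exit(low, high, stop, take, mode="worst"):
    """Exit price for a LONG position within one OHLC bar, or None if no level is hit.

    `stop` lies below the entry price and `take` above it. When the bar's range
    covers both levels, the true order of events is unknown and `mode` decides.
    """
    hit_stop = low <= stop
    hit_take = high >= take
    if hit_stop and hit_take:        # ambiguous bar
        if mode == "worst":
            return stop              # assume the unfavourable order: stopped out first
        if mode == "best":
            return take              # optimistic bound: target reached first
        raise ValueError(f"unsupported mode: {mode!r}")
    if hit_stop:
        return stop
    if hit_take:
        return take
    return None

# Bar ranges from 95 to 106; stop at 96 and target at 105 are both touched.
worst = resolve_exit(low=95, high=106, stop=96, take=105)               # 96
best = resolve_exit(low=95, high=106, stop=96, take=105, mode="best")   # 105
```

Running a backtest in both modes brackets the true result: the gap between the two is the part of reported performance that depends purely on unobservable intra-bar ordering.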

8 · How we validate

  • Stylized-fact validation. The model’s daily stylized facts are measured over 200 random 252-day windows of the simulated paths and compared, fact by fact, to bands derived identically from real history. SPY and TLT match all eighteen; GOLD and BTC match seventeen of eighteen (kurtosis below band — see §9).
  • Value vs a cheap baseline. The cross-asset model is compared numerically to a Gaussian copula: it adds regime-conditional correlation that strengthens in a crisis, plus joint-crash tail dependence that the copula cannot reproduce. Single-asset marginals are at parity with cheap baselines (no unique edge claimed there). Stated, then checked, on /verifiability.
  • Anchored bands. Every claimed sign and magnitude is checked against the historical reference band at the point of use, so an out-of-band figure is caught before it propagates.
  • Pre-registration. Hypotheses and falsification thresholds are registered before a run, to prevent reading success into noise.
  • Statistical power. A single passing KS-test at small n is not "indistinguishable from historical". Magnitude is validated at the deploy sample size, not a cheap screening size — small samples are optimistically biased for high-variance regimes.
  • Claim–evidence alignment. "Solved" has to hold across the full shape, not one favourable descriptor.
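The stylized-fact check in the first bullet can be sketched as two steps: measure each fact as a median over 200 random 252-day windows, then count how many land inside their bands. Everything specific below is an assumption for illustration: the band limits in `HYPOTHETICAL_BANDS` are invented, only two of the eighteen facts are shown, and the simulated path is plain Gaussian noise.

```python
import numpy as np

def random_window_median(returns, stat, n_windows=200, window=252, seed=0):
    """Median of `stat` over randomly placed windows of the given length."""
    r = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(r) - window + 1, size=n_windows)
    return float(np.median([stat(r[s:s + window]) for s in starts]))

def match_report(measured, bands):
    """Return (number of facts inside their band, names of the facts that miss)."""
    misses = [k for k, (lo, hi) in bands.items() if not lo <= measured[k] <= hi]
    return len(bands) - len(misses), misses

def ann_vol(r):
    return float(np.std(r, ddof=1) * np.sqrt(252))

def excess_kurtosis(r):
    d = r - r.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2 - 3.0)

HYPOTHETICAL_BANDS = {"vol": (0.12, 0.20), "kurtosis": (1.0, 3.0)}  # invented limits

simulated = np.random.default_rng(3).normal(0.0, 0.01, 3000)  # thin-tailed toy path
measured = {
    "vol": random_window_median(simulated, ann_vol),
    "kurtosis": random_window_median(simulated, excess_kurtosis),
}
matched, misses = match_report(measured, HYPOTHETICAL_BANDS)
```

The Gaussian toy path matches on volatility but misses on kurtosis, the same kind of below-band tail residual that §9 discloses for GOLD and BTC.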

9 · Documented limits

What the model does not yet do well:

Short-horizon shape

Daily lag-1 autocorrelation runs slightly higher than the historical reference (over-clustered at the shortest lag). A cross-scale refinement closes most of the gap; it is not yet in the shipped engine.

Tail thickness on GOLD / BTC

The synthetic daily tails for gold and crypto are thinner than the historical reference — kurtosis comes out below the asset band. Disclosed as a residual; it is the one stylized fact (of eighteen) those two assets miss.

Below-band clustering

A few regimes (liquidity-stress at extreme volatility, the calm grind profiles) clear the gate with a positive acf5 but sit below their asset’s typical clustering band. Disclosed; inherent to extreme-vol or calm-bull paths, not a defect.

Intraday structure

The engine is built for daily bars; intraday volatility is uniform (no open-spike seasonality). Irrelevant at daily resolution; multi-resolution would need a separate time-of-day lever.

Multi-episode crashes

A crash with a genuine relief rally followed by a second decline is not yet generated reliably, and is not part of the published scenario set.

VIX

VIX is not currently offered. The simulator cannot yet generate reliable synthetic volatility-index dynamics — its tail behaviour comes out too thin — so VIX is held back until a future iteration handles it.

10 · References

  • Hamilton, J. D. (1989) "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle", Econometrica 57(2), 357–384. The Markov regime-switching framework the simulator’s regime layer is built on.
  • Cont, R. (2001) "Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues", Quantitative Finance 1(2), 223–236. The empirical regularities (vol-clustering, fat tails, regime persistence) the synthetic paths are validated against.
  • Löw, R. W., Maier-Paape, S. & Platen, E. (2015) "Correctness of Backtest Engines", arXiv:1509.08248. Handling ambiguous intra-bar events; the engine implements their worst-case execution model.

Operated under German jurisdiction (BaFin / WpHG framework). Model-based scenario simulation — descriptive, not advisory; scenario-based, not a forecast. Not investment advice.