Scientific Methodology
Built on published academic foundations underlying our market simulator, backtest-engine correctness, and stress-case framework.
Core Approach: Targeted Stress Testing
Traditional backtesting produces a single result: "This strategy returned +47%." That result is conditional on one specific sequence of market events that will never repeat exactly — and tells you nothing about how the strategy would behave when the market does something different.
Our approach runs the strategy against a curated catalog of stress cases: real historical market slices (e.g. Lehman 2008, COVID 2020, Luna 2022) combined with targeted synthetic probes for failure modes that history rarely produces cleanly (sustained whipsaw, low-vol grind, slow-decline-without-rebound). Each case isolates a specific way a strategy can break.
"The point is not to predict the future — it is to find out how the strategy fails before the market does."
Hybrid Field-Agent Simulator (Synthetic Probes)
For the synthetic stress probes — and only for those — we use an agent-based market simulator. The goal is controllable structural realism: we want data that behaves like a market (microstructure, herding, regime persistence) while letting us dial in specific failure-mode parameters that real history did not deliver cleanly.
Belief Field Layer
A continuous field representing market-wide sentiment, evolving via diffusion dynamics. Drives the smooth, persistent regime structure that distinguishes real markets from random walks.
Observer Layer
Heterogeneous agents (asset managers, trend followers, dealers, mean reverters, retail) observe the field and submit orders. Their relative weights and net biases come from real CoT data — see next section.
This architecture is a test generator, not a market replica. It produces structurally plausible price action that we can shape to match a specific stress scenario (e.g. "a 6-month bear without a rebound spike"). The realism check is the case catalog itself: empirical anchors carry the actual realism, and the simulator fills the gaps.
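To make the two layers concrete, here is a toy sketch. It assumes a one-dimensional periodic belief field updated by an explicit diffusion step plus Gaussian noise, five observer types with placeholder weights and biases (the real ones come from CoT data), and a linear price-impact rule. Every name and number in it is hypothetical; it illustrates the architecture, not the production simulator.

```python
import random

def diffuse(field, rng, d=0.2, noise=0.05):
    """One explicit diffusion step on a periodic 1-D belief field, plus Gaussian noise.

    The explicit scheme stays stable for d <= 0.5.
    """
    n = len(field)
    return [
        field[i]
        + d * (field[i - 1] - 2 * field[i] + field[(i + 1) % n])
        + rng.gauss(0.0, noise)
        for i in range(n)
    ]

# Hypothetical observer mix: (weight, net bias, sensitivity to mean belief).
# In the method described above these would be calibrated from CoT data;
# the numbers here are placeholders.
OBSERVERS = {
    "asset_manager": (0.35, 0.10, 1.0),
    "trend_follower": (0.25, 0.00, 2.0),
    "dealer": (0.20, -0.05, -0.5),
    "mean_reverter": (0.10, 0.00, -1.5),
    "retail": (0.10, 0.02, 1.2),
}

def net_order_flow(field):
    """Weighted, signed order flow of all observers reading the field's mean belief."""
    belief = sum(field) / len(field)
    return sum(w * (bias + k * belief) for w, bias, k in OBSERVERS.values())

def simulate(steps=250, size=32, impact=0.01, seed=7):
    """Price path driven by net order flow through a hypothetical linear impact rule."""
    rng = random.Random(seed)
    field = [0.0] * size
    price, path = 100.0, []
    for _ in range(steps):
        field = diffuse(field, rng)
        price *= 1.0 + impact * net_order_flow(field)
        path.append(price)
    return path
```

Because diffusion on a periodic grid conserves the field's mean while the noise accumulates, the mean belief wanders persistently, so the order flow it drives has regime-like autocorrelation rather than independent day-to-day shocks.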
State Initialization Regime
Each synthetic stress case is preceded by a 250-day antecedent pre-period, followed by a short 5–10 day transition window into the stress regime. The pre-period exists so that long-lookback indicators (SMA-200, EMA-200, ATR-200, etc.) have stable values when the stress phase begins. Performance attribution starts only after the transition; the pre-period itself is excluded from trade counts, returns, and drawdowns.
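A minimal sketch of that windowing, assuming a list of daily closes and a 200-day SMA as the representative long-lookback indicator. The helper names (split_case, stress_metrics) are hypothetical. The indicator is warmed up on the pre-period and transition, while return and drawdown are computed on the stress phase only.

```python
def split_case(prices, pre_days=250, transition_days=5):
    """Split a synthetic case into antecedent, transition, and scored stress phases."""
    if not 5 <= transition_days <= 10:
        raise ValueError("transition window must be 5-10 days")
    start = pre_days + transition_days
    if len(prices) <= start:
        raise ValueError("series too short for pre-period plus transition")
    return prices[:pre_days], prices[pre_days:start], prices[start:]

def sma(values, n):
    """Simple moving average of the last n values."""
    if len(values) < n:
        raise ValueError("not enough history for the lookback")
    return sum(values[-n:]) / n

def max_drawdown(prices):
    """Largest peak-to-trough decline as a negative fraction."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst

def stress_metrics(prices, pre_days=250, transition_days=5):
    """Indicators warm up on the full history; returns and drawdowns use the stress phase only."""
    pre, transition, stress = split_case(prices, pre_days, transition_days)
    return {
        "sma200_at_stress_start": sma(pre + transition, 200),
        "stress_return": stress[-1] / stress[0] - 1.0,
        "stress_max_drawdown": max_drawdown(stress),
    }
```

A crash inside the pre-period shifts the warmed-up SMA but leaves the scored drawdown untouched, which is the point of excluding the antecedent from attribution.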
The antecedent regime is not the stress regime. It is drawn deterministically from the simulation seed, using an asset-class-typical distribution chosen so that indicators enter the stress phase from a representative, rather than artificial, prior state.
| Asset class | Antecedent distribution |
|---|---|
| Equity (SPY, QQQ) | 40% bull-trend · 35% sideways-low-vol · 25% weak-bear |
| Volatility (VIX) | 45% suppressed-vol · 40% elevated-transition · 15% panic-decay |
| Crypto (BTC, ETH) | 35% parabolic-bull · 35% high-vol-sideways · 30% unstable-decay |
| Metal (GOLD) | 50% range · 30% inflationary-upcycle · 20% demand-weakness |
| Energy (WTI) | 50% range-high-vol · 30% demand-weakness · 20% inflationary-upcycle |
Calibration of the antecedent regimes was validated against 750 simulated replicas (50 per profile-asset combination). Realized pre-period statistics fall within the specified target bands for 7 of 10 antecedent types. The remaining three (high_vol_sideways, parabolic_bull, and unstable_decay, all crypto regimes) show systematically lower micro-volatility than their target bands. This is a known limit of the current observer calibration for crypto markets and is reflected in the simulator code as an explicit disclosure.
Antecedent regimes are deterministically assigned per simulation seed, ensuring reproducibility across runs. The chosen distributions are a modeling assumption: results may depend on which antecedent regime is drawn for a given case, and the distribution itself reflects an assumption about typical pre-stress conditions for each asset class. This is a methodological choice that should be considered when interpreting results.
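One way to make the draw reproducible is to give each case its own seeded generator and sample from the table's weights. This sketch assumes integer seeds and the asset-class keys shown; the function name draw_antecedent is hypothetical, and the production mapping from seed to regime may differ.

```python
import random

# Weights copied from the antecedent-distribution table above.
ANTECEDENTS = {
    "equity": [("bull-trend", 0.40), ("sideways-low-vol", 0.35), ("weak-bear", 0.25)],
    "volatility": [("suppressed-vol", 0.45), ("elevated-transition", 0.40), ("panic-decay", 0.15)],
    "crypto": [("parabolic-bull", 0.35), ("high-vol-sideways", 0.35), ("unstable-decay", 0.30)],
    "metal": [("range", 0.50), ("inflationary-upcycle", 0.30), ("demand-weakness", 0.20)],
    "energy": [("range-high-vol", 0.50), ("demand-weakness", 0.30), ("inflationary-upcycle", 0.20)],
}

def draw_antecedent(asset_class, seed):
    """Deterministically pick an antecedent regime for one case from its integer seed."""
    rng = random.Random(seed)  # private generator: same seed, same regime, every run
    names, weights = zip(*ANTECEDENTS[asset_class])
    return rng.choices(names, weights=weights, k=1)[0]
```

Using a dedicated random.Random per case, rather than the module-level generator, keeps one case's draw independent of how many other cases ran before it.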
Empirical historical cases carry their natural pre-period from the surrounding real market data and do not require this construction.
CoT-Calibrated Agents
The agents in our simulator are not arbitrary. Their relative weights and net positioning biases are derived from real CFTC Commitment of Traders (CoT) reports — weekly disclosures of how each trader category (asset managers, hedge funds, dealers, retail, etc.) is positioned in major futures markets.
Each of the seven supported assets has its own calibration, drawn from the corresponding CoT report type and historical depth:
| Asset | CoT Report Type | Weekly Reports | Period |
|---|---|---|---|
| SPY | TFF (Financial Futures) | 342 | 2010–2016 |
| QQQ | TFF (Financial Futures) | 342 | 2010–2016 |
| VIX | TFF (Financial Futures) | 510 | 2006–2016 |
| GOLD | DCOT (Disaggregated) | 551 | 2006–2016 |
| WTI | DCOT (Disaggregated) | 551 | 2006–2016 |
| BTC | TFF (Financial Futures) | 420 | 2017–2026 |
| ETH | TFF (Financial Futures) | 264 | 2021–2026 |
| Total | | 2,980 | |
TFF — Traders in Financial Futures
CoT report variant for financial markets (equity indices, FX, crypto futures). Categories: Dealer Intermediaries, Asset Managers, Leveraged Money, Other Reportables, Non-Reportables.
DCOT — Disaggregated CoT
CoT report variant for commodities (metals, energy, agriculture). Categories: Producers/Merchants, Swap Dealers, Managed Money, Other Reportables, Non-Reportables.
This means every simulated agent's tendency to go long or short is calibrated against what the corresponding real-market participants actually did, averaged over hundreds of weekly snapshots. A simulated “Producer/Merchant” agent in the GOLD market carries the empirical net-short hedging pressure observed across 551 weeks of COMEX positioning data — not an arbitrary assumption.
Source: CFTC Commitments of Traders Historical Compressed reports, cftc.gov/MarketReports/CommitmentsofTraders
Note: BTC futures launched on CME in December 2017; ETH cash-settled futures in February 2021 — those calibration windows start from contract inception. The CFTC TFF report format was published from June 2010 with backfilled coverage to 2006 for selected markets including VIX; SPY and QQQ data are available from 2010 onward. GOLD and WTI use the DCOT (Disaggregated CoT) report, which spans the full 2006–2016 window.
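The normalization behind the calibrated biases isn't spelled out above. This sketch assumes net bias is the weekly (long − short) / (long + short) ratio averaged over all weeks, and that relative weights are each category's share of gross positioning. The CotWeek schema and function names are hypothetical, not CFTC field names.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class CotWeek:
    """One weekly CoT snapshot for one trader category (hypothetical schema)."""
    long: int
    short: int

def net_bias(weeks):
    """Average normalized net position in [-1, 1]: +1 all long, -1 all short."""
    ratios = [(w.long - w.short) / (w.long + w.short) for w in weeks if w.long + w.short > 0]
    return mean(ratios)

def relative_weights(gross_by_category):
    """Each category's share of total gross (long + short) positioning."""
    total = sum(gross_by_category.values())
    return {name: gross / total for name, gross in gross_by_category.items()}

# Illustrative numbers only: a persistently net-short hedger.
producer_merchant = [CotWeek(long=20_000, short=80_000), CotWeek(long=30_000, short=70_000)]
bias = net_bias(producer_merchant)  # negative, i.e. net-short pressure
```

A category that is persistently short, like the Producer/Merchant example above, comes out with a negative bias; a balanced book comes out near zero.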
Failure-Mode Taxonomy & Case Catalog
Strategies don't fail the same way in every bad market; they have specific vulnerabilities. We organize stress tests around nine failure-mode axes, each isolating one class of strategy weakness. V-Recovery is tracked alongside them as a diagnostic path pattern: a composite of a SHARP_CRASH down-leg and a TREND_UP recovery, and no longer a separate score bucket since 2026-05.
Each case is tagged with the failure mode(s) it stresses. A single case can carry multiple tags — Lehman 2008 contributes to TREND_DOWN, SHARP_CRASH and LIQUIDITY_STRESS simultaneously, because that historical episode genuinely combined all three. The robustness score aggregates strategy performance across the entire catalog.
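The aggregation rule is not specified here. Assuming each case yields a score in [0, 1], a simple version averages case scores into every failure-mode bucket the case is tagged with, then averages across buckets. The function and case names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def aggregate(case_scores, case_tags):
    """Average case scores into every failure-mode bucket a case is tagged with,
    then average across buckets for one overall score."""
    buckets = defaultdict(list)
    for case, score in case_scores.items():
        for mode in case_tags[case]:
            buckets[mode].append(score)
    per_mode = {mode: mean(scores) for mode, scores in buckets.items()}
    return per_mode, mean(per_mode.values())

# Hypothetical scores in [0, 1]; Lehman 2008 carries three tags.
scores = {"lehman_2008": 0.2, "covid_2020": 0.6, "spy_bull_2017": 0.9}
tags = {
    "lehman_2008": ["TREND_DOWN", "SHARP_CRASH", "LIQUIDITY_STRESS"],
    "covid_2020": ["SHARP_CRASH"],
    "spy_bull_2017": ["TREND_UP", "VOL_COMPRESSION"],
}
per_mode, overall = aggregate(scores, tags)
```

Under this rule a multi-tag case such as Lehman 2008 influences three buckets at once, which is the intended effect of tagging.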
Detailed Failure-Mode Definitions
Each of the 10 patterns (the nine failure modes plus the V-Recovery path pattern) is defined with operational thresholds (verified against our case catalog), historical examples, common strategy failure patterns, strategy archetypes, and distinctions from similar regimes. The Confidence tag in each header reflects the depth of empirical anchoring in our catalog.
Trend Up
TREND_UP: Trend Up describes a market regime characterized by persistent positive directional movement with shallow drawdowns over extended windows.
- Total return: typically +8% to +25% over 4–6 months (annualized roughly +15% to +50%)
- Maximum drawdown: shallow, typically less than 10%, almost never exceeding 15%
- Realized annualized volatility: 8–25% for equities and commodities; crypto runs significantly higher (60–100%+)
- Persistence: at least 3 trading months of dominant directional movement, no decisive mid-case reversal
- SPY Bull Run 2017 H1 — +7.5%, max DD −3%
- QQQ Tech Rally 2017 — +13.9%, max DD −5%
- Gold Post-GFC Inflation Hedge 2010 — +7.9%, max DD −7.8%
- WTI Pre-GFC Oil Boom — +23.3%, max DD −10.6%
- BTC 2017 Parabolic ATH — +216%, max DD −34.9% (crypto outlier, broadens upper bound)
Common failure patterns:
- Mean-reversion entries fading against persistent direction
- Counter-trend stops triggered repeatedly
- Cash-allocation systems lagging benchmark performance
- Volatility-targeting underweighting during low-vol bull runs
- Short-bias structures bleeding continuously
Archetypes that tend to hold up:
- Long-bias trend-following systems
- Momentum allocation models
- Buy-and-hold equity exposure
Archetypes that tend to struggle:
- Short-bias structures
- Naive mean-reversion against trend
- Volatility-shorting in directionally ambiguous setups
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. V-Recovery: Trend Up starts from neutral ground; V-Recovery starts after a drawdown of 15% or more.
- vs. Vol Compression: Vol Compression is about low realized volatility regardless of direction; Trend Up is about direction regardless of volatility.
Trend Down
TREND_DOWN: Trend Down describes a market regime characterized by persistent negative directional movement that sustains or deepens across the case window without full recovery.
- Total return: typically −20% to −60% over 4–6 months (annualized roughly −40% to −95%)
- Maximum drawdown: −25% to −70%, with median around −50%
- Realized annualized volatility: elevated, typically 25–90%
- Persistence: dominant downward trajectory across the full window — drawdown deepens or sustains, no full V-recovery
- Gold Taper Tantrum 2013 — −19.4%, max DD −25%
- SPY Lehman GFC 2008 — −36.4%, max DD −42%
- WTI Saudi Oil War 2014 — −48%, max DD −53%
- BTC Crypto Winter 2018 — −52.5%, max DD −66%
- QQQ Dotcom Crash 2000 — −58.5%, max DD −67%
Common failure patterns:
- Buy-the-dip systems re-entering into deepening drawdowns
- Long-only momentum systems failing to exit fast enough
- Dollar-cost-averaging extending drawdown duration without recovery
- Stop-loss systems confirming exits below average entry price
- Long-equity allocation systems persistently underperforming cash
Archetypes that tend to hold up:
- Trend-following systems with short-side capability
- Defensive allocation rotations
- Cash-heavy or low-net-exposure systems
Archetypes that tend to struggle:
- Long-only buy-the-dip mean-reversion
- Leveraged long-equity positioning
- Bottom-fishing oscillator strategies
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Slow Bear: Trend Down may include sharp acceleration phases (e.g., late stages of Dotcom); Slow Bear specifically excludes them — it is the grinding-down version.
- vs. Sharp Crash: Trend Down persists across months; Sharp Crash is concentrated in days to weeks.
Sideways
SIDEWAYS: Sideways describes a market regime characterized by extended range-bound oscillation without decisive directional resolution.
- Total return: roughly −10% to +25% (around zero with mild dispersion)
- Maximum drawdown: typically −10% to −20% — pullbacks happen within the range
- Realized annualized volatility: moderate, 13–30%
- Persistence: at least 4 months without decisive directional resolution; price oscillates within bounded range
- Direction-to-vol ratio: net direction must be small relative to volatility — direction dominated by noise
- SPY China Sideways 2015 H2 — −7.7%, max DD −12%
- BTC Sideways 2023 — −1.6%, max DD −20%
- Gold Sideways 2014 — +8.2%, max DD −10%
- WTI Sideways 2018 — +23.9%, max DD −11% (upper-bound case, mild trend mixed in)
Common failure patterns:
- Trend-following systems generating frequent false breakouts
- Wide stops repeatedly triggered around range edges
- Time-decay eroding options-based directional structures
- Trailing-stop momentum systems exiting at range floors
- Breakout-confirmation systems whipsawed in both directions
Archetypes that tend to hold up:
- Mean-reversion oscillator systems
- Range-trading and channel-bound structures
- Options-selling within bounded ranges
Archetypes that tend to struggle:
- Breakout systems
- Fast-trend-following without regime filter
- Momentum-confirmation strategies
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Whipsaw: Sideways captures the regime shape (range-bound); Whipsaw emphasizes the signal-noise problem for trend-followers (frequent reversals through any signal threshold).
- vs. Vol Compression: Vol Compression requires visibly suppressed realized volatility; Sideways allows moderate vol within the range.
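The direction-to-vol criterion in the Sideways thresholds can be made concrete as a ratio of net log return to the volatility expected over the same horizon. This is an assumed formulation: the definition gives no formula or cutoff, and the function name is hypothetical.

```python
import math

def direction_to_vol(prices):
    """|net log return| divided by (daily log-return stdev * sqrt(number of days))."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    n = len(rets)
    mu = sum(rets) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in rets) / (n - 1))
    return abs(sum(rets)) / (sd * math.sqrt(n))
```

Under pure zero-drift noise the ratio behaves like the absolute value of a standard normal, so values well below 1 mean the net move is within what noise alone would produce over the window; a trending case scores far higher.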
Vol Expansion
VOL_EXPANSION: Vol Expansion describes a market regime characterized by sustained elevation of realized volatility relative to the asset's recent baseline, decoupled from directional bias.
- Gating (required): Median realized volatility over the case window ≥ 1.5× the asset's normal-window vol
- Gating (required): Persistence — at least 2 distinct (non-overlapping) 30-day windows with vol ≥ 1.5× baseline; differentiates VOL_EXPANSION from a single concentrated SHARP_CRASH window
- Descriptive (not gating): count of rolling 30-day windows with peak vol ≥ 3× baseline; vol-of-vol level; max-DD-window characteristics. These are reported alongside and used for sub-classification ("sustained-elevated" vs "sustained-with-spikes"), but not for failure-mode tag determination
- Direction: decoupled from regime — return can be sharply negative, sharply positive, or near zero
- VIX-specific: observable as VIX index level above its 90th percentile (separate criterion from realized-vol-of-VIX-returns)
Definition history: an earlier version required 2–5× baseline as a 6-month median plus ≥3 distinct windows with peaks ≥3×. Systematic anchor validation (2026-05) revealed that conjunction effectively required Lehman-class duration AND COVID-class peaks combined — rare even among real episodes. The current version separates the persistent-state core (gating) from spike behaviour (descriptive). Spike-character regimes are conceptually a separate axis (VOL_INSTABILITY, planned future failure-mode) rather than a sub-criterion of VOL_EXPANSION. Anchor re-tagging (Volmageddon → SHARP_CRASH; BTC ATH 2017 / DeFi Summer Bull → TREND_UP only) corrected mis-classifications where event-character did not match case-window measurement.
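The two gating conditions can be sketched as below. Assumptions not stated in the definition: returns are daily simple returns, vol is annualized with 252 trading days, the case-window median is taken over rolling 30-day vols, and "distinct" windows are consecutive non-overlapping 30-day blocks. Function names are hypothetical.

```python
import math
from statistics import median

TRADING_DAYS = 252  # annualization factor (assumed)

def rolling_vol(returns, window=30):
    """Annualized realized vol of every trailing `window`-day block of daily returns."""
    vols = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mu = sum(chunk) / window
        var = sum((r - mu) ** 2 for r in chunk) / (window - 1)
        vols.append(math.sqrt(var * TRADING_DAYS))
    return vols

def vol_expansion_gates(returns, baseline_vol, ratio=1.5, window=30):
    """True if both gates hold: elevated median vol and >= 2 distinct elevated windows."""
    vols = rolling_vol(returns, window)
    median_ok = median(vols) >= ratio * baseline_vol
    # vols[k] covers returns[k:k + window], so stepping k by `window` gives non-overlapping blocks.
    distinct = sum(1 for k in range(0, len(vols), window) if vols[k] >= ratio * baseline_vol)
    return median_ok and distinct >= 2
```

The persistence count is what separates a sustained regime from a single concentrated window: one elevated block cannot pass on its own.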
- SPY Vol Shock Feb 2018 — realized vol 17%
- QQQ Vol Shock Feb 2018 — realized vol 21%
- QQQ Tech Bear 2022 — realized vol 34%
- ETH Vol Shock 2018 — realized vol 105%
- BTC 2017 Parabolic ATH — realized vol 92%
- VIX Volmageddon 2018 — realized vol 289%
- VIX COVID 2020 — realized vol 187%
Common failure patterns:
- Fixed-stop systems gapped through unfavorable prints
- Position-sizing models underestimating realized risk
- Mean-reversion entries fading against expanding moves
- Short-volatility carry structures facing uncapped losses
- Risk-parity allocations rebalancing into concentrated positions
Archetypes that tend to hold up:
- Long-volatility structures and tail-hedge overlays
- Volatility-targeted position sizing
- Low-leverage allocation models
Archetypes that tend to struggle:
- Short-volatility carry structures
- Fixed-leverage models
- Naive mean-reversion at fixed thresholds
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Sharp Crash: Sharp Crash is direction (down) plus speed; Vol Expansion is direction-agnostic.
- vs. Whipsaw: Whipsaw emphasizes oscillation around levels; Vol Expansion emphasizes magnitude regardless of pattern.
Vol Compression
VOL_COMPRESSION: Vol Compression describes a market regime characterized by sustained suppression of realized volatility well below the asset's recent baseline.
- Realized annualized volatility: 0.4–0.7× the asset's baseline (equities 7–12% vs. baseline 16%; gold 8–15%; crypto 30–50% vs. baseline 80%+)
- Persistence: typically >2 months of sustained low realized vol; brief calm windows do not qualify
- Direction: slightly positive or neutral for most assets; for VIX, the regime is defined by a low index level rather than by direction
- Drawdowns: shallow, typically <8% for non-VIX assets
- SPY Bull Run 2017 H1 — realized vol 7.1%, max DD −3%
- QQQ Tech Rally 2017 — realized vol 10.1%, max DD −5%
- VIX Low Vol 2017 — VIX index level persistently below 12 (regime defined by the index level itself, not by the realized volatility of VIX returns — which remains structurally high)
Only two robust empirical anchors for non-VIX assets (SPY and QQQ 2017); the regime is otherwise inferred from synthetic profiles. Bullish low-vol periods cluster naturally — this category overlaps with Trend Up by construction.
Common failure patterns:
- Breakout strategies generating few signals, mostly false
- Time-stop systems exiting before signal materializes
- Short-volatility systems accumulating without warning of regime change
- Momentum signals weak relative to noise floor
- Trend-followers underperforming passive allocation
Archetypes that tend to hold up:
- Short-volatility carry structures
- Slow-trend systems with volatility filters
- Allocation-based portfolios with rebalancing
Archetypes that tend to struggle:
- Breakout systems without regime filter
- Fast-trend-following with tight stops
- Long-volatility / options-buying structures
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Sideways: Sideways allows moderate volatility within the range; Vol Compression requires visibly suppressed vol.
- vs. Trend Up: Trend Up is about return direction; many Trend Up regimes are also Vol Compression (low-vol bull) but not all.
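A sketch of the Vol Compression band and persistence check, assuming monthly realized-vol readings and reading ">2 months" as a run of at least three consecutive in-band months. The band is applied literally as 0.4–0.7× baseline; names are hypothetical.

```python
def longest_run(flags):
    """Length of the longest streak of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def is_vol_compression(monthly_vols, baseline_vol, lo=0.4, hi=0.7, min_months=2):
    """True if realized vol stays in lo..hi x baseline for more than `min_months` consecutive months."""
    in_band = [lo <= v / baseline_vol <= hi for v in monthly_vols]
    return longest_run(in_band) > min_months
```

Requiring a consecutive run, rather than a count of in-band months, is what excludes brief calm windows separated by spikes.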
Sharp Crash
SHARP_CRASH: Sharp Crash describes a market regime characterized by accelerated downside dislocation over compressed trading windows.
- Gating (required): Peak-to-trough drawdown ≥ 20% in any rolling 30-day window. DD computed as max over all overlapping 30-day windows. Multiple ≥20% windows within 60 days consolidate into a single crash event.
- Gating (required): Realized volatility during the crash window ≥ 1.5× the asset's normal-window vol (asset-agnostic).
- Sub-classification (descriptive, not gating): "tail-intensified" (excess kurtosis ≥ 1.5 OR daily-return 1st percentile ≤ −3%) versus "broad-distribution" (otherwise).
- Reported alongside: vol percentile against asset's own rolling-vol distribution; min-window DD (5–10 days) for flash-event detection.
- Near-crash band: 15–20% rolling-30d DD is reported as "near-crash dynamics observed" without the SHARP_CRASH tag.
- Recovery within the case window is permitted — SHARP_CRASH classifies the dislocation event itself, not the post-event trajectory.
The definition captures observable price dynamics. Underlying microstructure effects (liquidity, gap-structure, order-book stress) are not directly modeled here — see LIQUIDITY_STRESS for that dimension. Anchor validation (Lehman 2008 rolling-30d DD −32%, COVID 2020 −34%, Luna 2022 −40%, FTX 2022 −26%) confirmed all four listed historical SHARP_CRASH anchors satisfy the ≥20% DD-component threshold with margin. Note: while the DD component is met by all four, the full gating (DD ≥20% AND crash-window vol ≥1.5× baseline) is asset-asymmetric — for high-baseline assets like BTC the vol component does not always activate even in clear stress events (e.g. Luna 2022 crash-window vol 0.92×, FTX 2022 0.85×). See /verifiability for the empirical-identifiability table. Definition revised 2026-05.
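The gating logic can be sketched as follows. For brevity it takes the crash-window vol as an input rather than computing it, and it omits the 60-day consolidation of multiple crash windows and the tail-intensity sub-classification. Function names and labels are hypothetical.

```python
def max_rolling_drawdown(prices, window=30):
    """Worst peak-to-trough drawdown inside any rolling `window`-day span (a negative fraction)."""
    worst = 0.0
    for start in range(max(1, len(prices) - window + 1)):
        peak = prices[start]
        for p in prices[start:start + window]:
            peak = max(peak, p)
            worst = min(worst, p / peak - 1.0)
    return worst

def classify_crash(prices, crash_window_vol, baseline_vol):
    """Apply the two SHARP_CRASH gates and the descriptive near-crash band."""
    dd = max_rolling_drawdown(prices)
    dd_gate = dd <= -0.20
    vol_gate = crash_window_vol >= 1.5 * baseline_vol
    if dd_gate and vol_gate:
        label = "SHARP_CRASH"
    elif -0.20 < dd <= -0.15:
        label = "near-crash"
    else:
        label = "untagged"  # includes a >= 20% drawdown whose vol gate is not met
    return dd, label
```

Under this sketch, a drawdown that clears the 20% gate still comes back untagged when the vol ratio is 0.92×, as reported above for Luna 2022, which mirrors the asset-asymmetry note.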
- Lehman Brothers collapse — Sep–Oct 2008
- COVID-19 onset — Feb–Mar 2020
- Luna / UST collapse — May 2022
- FTX collapse — Nov 2022 (secondary, exchange-failure shock)
Common failure patterns:
- Signal lag exceeding downside acceleration
- Oversold re-entry during unresolved downside continuation
- Liquidity-gap stop execution at unfavorable prints
- Fixed volatility thresholds destabilized by expanding realized volatility
- Position-sizing assumptions invalidated by realized risk
Archetypes that tend to hold up:
- Long-horizon trend-following systems
- Volatility-scaled allocation models
- Reduced-exposure tactical systems
Archetypes that tend to struggle:
- Fixed-threshold mean-reversion systems
- High-leverage volatility compression strategies
- Short-volatility carry structures
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Slow Bear: Sharp Crash compresses the drawdown into days or weeks; Slow Bear extends comparable losses over months without the same acceleration.
- vs. Vol Expansion: Vol Expansion can be directionally neutral; Sharp Crash combines directional dislocation with vol expansion.
- vs. Liquidity Stress: Liquidity Stress can occur without large directional moves; Sharp Crash usually carries liquidity stress as a co-symptom.
Slow Bear
SLOW_BEAR: Slow Bear describes a market regime characterized by gradually accumulating downside losses without a single concentrated crash event.
- Total return: typically −25% to −60% over ~6 months (annualized roughly −50% to −95%)
- Maximum drawdown: −25% to −70%, accumulating gradually rather than concentrated
- Realized annualized volatility: 20–90% — usually elevated but without sharp single-day shocks dominating
- Persistence: drawdown sustains or deepens over 3+ months without recovery
- Distinguishing trait: the drawdown grows roughly monotonically, not punctuated by a single crash event
- SPY Fed-Bear 2022 H2 — 0% return, max DD −17% (slow profile)
- QQQ Tech Bear 2022 — −29.8%, max DD −32%
- WTI Saudi Oil War 2014 — −48%, max DD −53%
- BTC Crypto Winter 2018 — −52.5%, max DD −66%
- QQQ Dotcom Crash 2000 — −58.5%, max DD −67% (also Trend Down + Sharp Crash due to severity)
Common failure patterns:
- Long-equity allocation underperforming cash for extended periods
- Trend-followers cycling in/out with each minor rally
- Mean-reversion bottom-fishing entries repeatedly stopping out
- Drawdown psychology favoring capitulation near terminal lows
- Volatility-targeting under-allocating during deep drawdowns when vol stays moderate
Archetypes that tend to hold up:
- Trend-following systems with short-side capability
- Defensive sector-rotation strategies
- Volatility-targeted allocation with cash buffer
Archetypes that tend to struggle:
- Long-only equity exposure
- Dollar-cost-averaging without exit logic
- Mean-reversion bottom-fishing
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Trend Down: Trend Down may include sharp acceleration phases (e.g., late stages of Dotcom); Slow Bear specifically excludes them.
- vs. Sharp Crash: Sharp Crash concentrates loss in days to weeks; Slow Bear distributes loss across 3+ months.
V-Recovery
V_RECOVERY: diagnostic path pattern (not a failure-mode bucket)
As of 2026-05, V-Recovery is treated as a composite path pattern, not a failure-mode score bucket. The V-shape combines two orthogonal phases: strategies fail on the down-leg (scored under SHARP_CRASH) and miss the up-leg (scored under TREND_UP). Decomposing the V into its constituent failure modes avoids the bimodal-classification problem that intrinsic-conditional outcomes produce. The V-Recovery label is retained as a descriptive annotation on cases that exhibit the path pattern; for score aggregation, replicas are classified into SHARP_CRASH and TREND_UP buckets according to the empirical phase decomposition.
V-Recovery describes a path pattern characterized by a meaningful drawdown immediately followed by a rapid retracement to near or above pre-drawdown levels.
- Maximum drawdown during the case: −15% to −35% (peak-to-trough)
- Recovery from low: at least 80% retracement of the drawdown by case-end (price returns to within ~5% of pre-crash peak)
- Time-from-low to substantial recovery: typically 2–4 months (sharp rebound, not slow)
- Final case-window return: typically flat to +25% (the V-shape lands somewhere near or above starting price)
- Realized annualized volatility: 20–50% during the V
- SPY COVID Crash 2020 — +0.7%, max DD −34%
- QQQ COVID + Tech V-Recovery — +19.5%, max DD −29%
- Gold COVID + ATH 2020 — +25%, max DD −12.5%
All three anchors are COVID 2020 across asset classes — this limits regime-diversity validation.
Common failure patterns:
- Trend-following systems exiting near lows, missing the recovery rally
- Drawdown-based de-risking reducing exposure before snap-back
- Cash-on-the-sidelines systems waiting for confirmation that arrives late
- Risk-parity rebalancing dampening recovery participation
- Stop-loss execution at terminal lows preceding the rebound
Archetypes that tend to hold up:
- Persistent long-equity exposure
- Regime-agnostic strategic allocation
- Simple buy-and-hold
Archetypes that tend to struggle:
- Drawdown-based de-risking systems
- Trend-following exits without re-entry logic
- Cash-on-the-sidelines systems with late confirmation
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Sharp Crash: Sharp Crash is the down-leg only; V-Recovery is the down-leg plus the rapid up-leg, annotated as a single path pattern.
- vs. Trend Up: V-Recovery starts after a meaningful drawdown; Trend Up starts from neutral ground.
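A sketch of the V-Recovery thresholds plus a phase split, assuming the split happens at the trough of the deepest drawdown. How the empirical phase decomposition actually works is not spelled out, so split_v is a hypothetical illustration; the 2–4 month time-from-low check is omitted.

```python
def v_recovery(prices, dd_lo=0.15, dd_hi=0.35, min_retrace=0.80):
    """Check the drawdown band and retracement; return (is_v, trough_index)."""
    peak_i = 0
    best_dd, best_peak, best_trough = 0.0, 0, 0
    for i, p in enumerate(prices):
        if p > prices[peak_i]:
            peak_i = i
        dd = 1.0 - p / prices[peak_i]
        if dd > best_dd:
            best_dd, best_peak, best_trough = dd, peak_i, i
    if not dd_lo <= best_dd <= dd_hi:
        return False, best_trough
    peak, trough = prices[best_peak], prices[best_trough]
    retrace = (prices[-1] - trough) / (peak - trough)  # share of the drawdown recovered
    return retrace >= min_retrace, best_trough

def split_v(prices, trough_index):
    """Hypothetical phase decomposition: down-leg up to the trough, up-leg from it."""
    return {"SHARP_CRASH": prices[: trough_index + 1], "TREND_UP": prices[trough_index:]}
```

The trough price belongs to both legs so that each sub-series keeps its own start and end point for scoring.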
Whipsaw
WHIPSAW: Whipsaw describes a market regime characterized by repeated directional reversals at signal-relevant levels, generating false confirmation signals for trend-following systems.
- Multiple directional reversals: at least 3–5 sign-changes of >5% magnitude over 3–6 months
- Net case return: near zero, typically ±10% (no decisive direction)
- Maximum drawdown: typically −10% to −20% — pullbacks happen but recover
- Realized annualized volatility: moderate, 15–30%
- Distinguishing trait: false signals — strategies dependent on directional confirmation are stopped out repeatedly
- SPY China Sideways 2015 H2 — −7.7%, max DD −12%
- BTC Sideways 2023 — −1.6%, max DD −20%
- Gold Sideways 2014 — +8.2%, max DD −10%
- WTI Sideways 2018 — +23.9%, max DD −11%
Whipsaw shares its anchor set with Sideways by design — these modes are not differentiated by separate cases but by perspective: Sideways describes the regime shape, Whipsaw describes the failure of trend-following signals within that shape. Sign-change counts per case are not currently measured in our pipeline.
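Since sign changes are not yet measured, here is one way they could be counted: a zigzag filter that registers a reversal whenever price moves more than 5% against the running extreme of the current leg. This is an assumed operationalization of the threshold, and the function name is hypothetical.

```python
def count_reversals(prices, threshold=0.05):
    """Count direction changes in which price moves more than `threshold` off the leg's extreme."""
    start = prices[0]
    direction = 0       # +1 rising leg, -1 falling leg, 0 not yet established
    extreme = start     # highest price of a rising leg or lowest price of a falling leg
    reversals = 0
    for p in prices[1:]:
        if direction == 0:
            if p > start * (1 + threshold):
                direction, extreme = 1, p
            elif p < start * (1 - threshold):
                direction, extreme = -1, p
        elif direction == 1:
            if p > extreme:
                extreme = p
            elif p < extreme * (1 - threshold):
                direction, extreme, reversals = -1, p, reversals + 1
        else:
            if p < extreme:
                extreme = p
            elif p > extreme * (1 + threshold):
                direction, extreme, reversals = 1, p, reversals + 1
    return reversals
```

Measuring each swing from the leg's extreme rather than from the previous close means small oscillations inside a 5% band never register as reversals.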
Common failure patterns:
- Moving-average crossovers generating repeated false signals
- Breakout-confirmation systems stopped at range boundaries
- Trailing stops triggered at noise levels
- Mean-reversion at extremes works, but with high signal cost
- Time-stop systems exiting just before signal materializes
Archetypes that tend to hold up:
- Long-horizon allocation strategies
- Options-selling within bounded ranges
- Regime-filtered trend-followers
Archetypes that tend to struggle:
- Short-horizon trend-following
- Breakout-confirmation systems
- Trailing-stop momentum
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Sideways: Sideways emphasizes the bounded range; Whipsaw emphasizes the failure of trend-following signals within that range.
- vs. Vol Expansion: Whipsaw has moderate volatility with reversals; Vol Expansion can be uni-directional with high volatility.
Liquidity Stress
LIQUIDITY_STRESS: Liquidity Stress describes a market regime characterized by deteriorating execution conditions (widening spreads, gap-down opens, and reduced fill quality), typically co-occurring with significant directional drawdowns.
- Bid/ask spread widening: ≥3× the asset's baseline spread during stress windows
- Gap-down opens: 2 or more days with overnight gaps >3% within the case window
- Volatility: typically co-occurs with Vol Expansion (realized vol 50%+)
- Drawdown: typically deep in our anchor set — −25% to −80%
- Distinguishing trait: execution-slippage matters — strategies relying on fill-quality assumptions underperform their model
Spread and gap-open thresholds are indicative microstructure heuristics drawn from market-microstructure literature; they are not directly measured in our case pipeline. Liquidity-stress identification in our catalog relies on historical co-occurrence of these conditions during the named episodes.
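If OHLC and quote data were available, the two heuristics could be checked as sketched here. This is a hypothetical illustration of the thresholds, not part of the current pipeline, which as noted does not measure them directly; function names are assumptions.

```python
def count_gap_downs(opens, closes, threshold=0.03):
    """Days whose open is more than `threshold` below the previous day's close."""
    return sum(1 for prev_close, o in zip(closes, opens[1:]) if o < prev_close * (1 - threshold))

def spread_widening(spreads, baseline_spread, factor=3.0):
    """True if any observed bid/ask spread reaches `factor` x the asset's baseline spread."""
    return max(spreads) >= factor * baseline_spread
```

Pairing each open with the prior close (zip of closes with opens shifted by one day) is what makes this an overnight-gap measure rather than an intraday range.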
- Gold Taper Tantrum 2013 — max DD −25%
- BTC FTX Collapse 2022 — max DD −26%
- SPY Lehman GFC 2008 — max DD −42%
- BTC Luna Collapse 2022 — max DD −55%
- QQQ Dotcom Crash 2000 — max DD −67%
- WTI Negative Oil 2020 — max DD −78.5% (canonical extreme: futures settled negative)
Common failure patterns:
- Limit-order entries missing execution at expected prices
- Stop-loss orders executing at prints far from trigger level
- Scaling-in plans facing partial fills with worse cost basis
- Daily-rebalanced systems facing widening transaction costs
- Multi-leg structures facing execution slippage that breaks the model
Archetypes that tend to hold up:
- Low-frequency systems with daily or longer rebalancing
- Market-on-close order structures
- Single-leg directional strategies
Archetypes that tend to struggle:
- High-frequency execution-dependent systems
- Multi-leg arbitrage and spread structures
- Limit-order-dependent mean-reversion
Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.
- vs. Sharp Crash: Sharp Crash is about price magnitude; Liquidity Stress is about execution conditions — they often co-occur but are diagnostically different.
- vs. Vol Expansion: Vol Expansion is realized-volatility focused; Liquidity Stress focuses on the microstructure breakdown that accompanies it.
Case Catalog Composition
- 31 empirical anchors: real OHLC slices from 2000–2025 (Lehman 2008, Dotcom 2000, COVID 2020, Luna 2022, Taper Tantrum 2013, etc.). One run per case, since the realism is built in.
- 32 synthetic stress probes: 50 Monte-Carlo replicas each, used only for failure modes real history rarely produced cleanly — sustained low-vol grind, controlled whipsaw, slow-stagflation, hyperinflation surge, demand-destruction bear, plus the May-2026 setup-profile family (sharp-crash, vol-expansion, liquidity-stress, v-recovery) and two intentionally distinct slow-decline hardness levels (mid-cycle bear correction vs adversarial no-rebound).
- Coverage by failure mode: All 9 failure-mode score buckets (TREND_UP, TREND_DOWN, SIDEWAYS, VOL_EXPANSION, VOL_COMPRESSION, SHARP_CRASH, SLOW_BEAR, WHIPSAW, LIQUIDITY_STRESS) are tested by both empirical anchors and synthetic setup-profiles as of 2026-05. V-Recovery is treated as a diagnostic path pattern (composite, not a score bucket); replicas exhibiting the V-shape are decomposed into SHARP_CRASH + TREND_UP for scoring. LIQUIDITY_STRESS microstructure dimensions (spread, gap-down) remain covered by empirical anchors only (microstructure effects are not yet directly modeled in the synthetic generator) — see the case-catalog table for per-asset coverage.
VIX architectural note: directional regimes (TREND_UP/DOWN, SLOW_BEAR, V_RECOVERY, WHIPSAW, LIQUIDITY_STRESS) don't have an interpretable analogue for a volatility index. VIX strategies are tested only on the regimes that conceptually apply (vol expansion, vol compression, sharp crash). The previously included vix_persistent_high_synthetic profile was disabled in April 2026 because the simulator's VIX-return excess kurtosis (≈ 1.3) is well below the empirical level (≈ 5.5), making synthetic VIX dynamics unreliable. VIX failure modes are validated through the three empirical anchor cases (Low Vol 2017, Volmageddon 2018, COVID 2020) only.
Backtest Engine Correctness
Backtest engines that work from OHLC bars can produce subtly incorrect results because candles hide the intra-bar order of events. When price touches both your stop-loss and your take-profit within the same candle, which one executed first?
The Ambiguous Candle Problem
Open: $100 · High: $108 · Low: $94 · Close: $102
Stop-Loss at $95 and Take-Profit at $107 — which hit first?
Our engine implements the methodology from Löw, Maier-Paape & Platen (2015) — a rigorous academic treatment of this problem. We default to worst-case execution, which assumes the least favorable outcome when order of events is ambiguous.
Worst-Case (Default)
Assumes stop-loss hits before take-profit. Conservative and realistic.
Best-Case
Assumes take-profit hits first. Optimistic bound.
Ignore
Skips ambiguous trades entirely. Strictest interpretation.
Because the default is conservative, reported performance is a lower-bound estimate with respect to intra-bar ordering: strategy results reflect realistic execution uncertainty rather than best-case assumptions.
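The three modes can be sketched for a single long position with a stop-loss below and a take-profit above the entry. This is a minimal illustration, not the engine's code: the names `Policy`, `Outcome`, and `resolve_long_exit` are hypothetical, and the sketch ignores gap fills at the open, slippage, and short positions.

```python
from enum import Enum


class Policy(Enum):
    WORST_CASE = "worst_case"  # stop-loss assumed to hit first (default)
    BEST_CASE = "best_case"    # take-profit assumed to hit first
    IGNORE = "ignore"          # ambiguous trades are excluded


class Outcome(Enum):
    STOP_LOSS = "stop_loss"
    TAKE_PROFIT = "take_profit"
    HOLD = "hold"        # neither level touched in this bar
    SKIPPED = "skipped"  # ambiguous bar under Policy.IGNORE


def resolve_long_exit(high: float, low: float, stop: float, take_profit: float,
                      policy: Policy = Policy.WORST_CASE) -> Outcome:
    """Decide how a long position's bracket orders resolve within one OHLC bar."""
    if not stop < take_profit:
        raise ValueError("a long bracket needs stop < take_profit")
    stop_hit = low <= stop
    tp_hit = high >= take_profit
    if stop_hit and tp_hit:
        # Both levels lie inside the bar's range: the true order is unknowable from OHLC.
        if policy is Policy.WORST_CASE:
            return Outcome.STOP_LOSS
        if policy is Policy.BEST_CASE:
            return Outcome.TAKE_PROFIT
        return Outcome.SKIPPED
    if stop_hit:
        return Outcome.STOP_LOSS
    if tp_hit:
        return Outcome.TAKE_PROFIT
    return Outcome.HOLD


# The ambiguous candle above: H=108, L=94, stop 95, take-profit 107.
for p in Policy:
    print(p.value, resolve_long_exit(108, 94, 95, 107, p).value)
```

Only bars that touch both levels depend on the policy. Bars that touch one level, or neither, resolve identically under all three modes, which is why the worst-case default shifts results only as far as intra-bar ambiguity allows.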
Verifiable Claims
Synthetic stress profiles are accompanied by measurable statistical claims. For each profile-asset combination we publish the claimed properties (drawdown ranges, realized-volatility bounds, sign-change frequency, etc.) alongside the measured aggregate over 50 Monte Carlo replicas. This makes the simulator behavior falsifiable rather than asserted.
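A claimed-vs-measured check of this kind might look like the sketch below: compute a metric per replica, aggregate across replicas, and report how far the aggregate lies outside the claimed range. Several details are assumptions rather than the published definitions: the mean as the aggregate, 252 periods per year for annualisation, sign-change frequency measured on consecutive non-zero returns, and the helper names themselves.

```python
import math
import statistics
from typing import Callable, List, Sequence, Tuple


def max_drawdown(prices: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.25)."""
    peak, worst = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        worst = min(worst, p / peak - 1.0)
    return worst


def realized_vol(prices: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualised standard deviation of log returns (needs >= 3 prices)."""
    r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(r) * math.sqrt(periods_per_year)


def sign_change_frequency(prices: Sequence[float]) -> float:
    """Share of consecutive non-zero returns whose sign flips."""
    r = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
    r = [x for x in r if x != 0.0]
    if len(r) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(r, r[1:]) if (a > 0) != (b > 0))
    return flips / (len(r) - 1)


def check_claim(replicas: List[Sequence[float]],
                metric: Callable[[Sequence[float]], float],
                claimed: Tuple[float, float]) -> dict:
    """Aggregate a metric over replicas and compare it with a claimed [lo, hi] range."""
    lo, hi = claimed
    measured = statistics.mean(metric(p) for p in replicas)
    if measured > hi:
        deviation = measured - hi
    elif measured < lo:
        deviation = measured - lo
    else:
        deviation = 0.0
    return {"measured": measured, "within": deviation == 0.0, "deviation": deviation}


# Two hypothetical replica paths checked against a claimed drawdown range of -30% to -15%.
replicas = [[100, 120, 90, 130], [100, 80, 100]]
print(check_claim(replicas, max_drawdown, (-0.30, -0.15)))
```

Reporting a signed deviation rather than a pass/fail flag lets a reader see both the direction and the size of any miss.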
Conformance-distribution disclosure
A profile name (e.g. sharp_crash_synthetic) describes the initial conditions imposed on the agent-based simulator — not a guarantee that every replica realises the strict gating threshold of the associated failure-mode definition. Agent-based market dynamics produce a spectrum of outcomes under fixed conditional setups, just as the listed historical SHARP_CRASH anchors themselves spread from −10% (Volmageddon 2018) to −32% (Lehman 2008) over comparable windows.
For each synthetic profile we therefore publish the per-replica conformance distribution — how many of the 50 replicas qualify under the gating definition, how many fall in the descriptive near-band, and how many are sub-threshold. This replaces a binary “profile satisfies definition” claim with the more honest characterisation of emergent variability.
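The three-way split can be sketched as a simple classifier applied to each replica's gating metric. The gate direction (lower values are more extreme, as for drawdowns), the `near_band` margin, and the example threshold of −20% are hypothetical placeholders rather than the published failure-mode definitions.

```python
from collections import Counter
from typing import Dict, Iterable

BUCKETS = ("qualifies", "near_band", "sub_threshold")


def classify(value: float, threshold: float, near_band: float) -> str:
    """Classify one replica's metric for a gate where lower values are more extreme.

    qualifies:     value <= threshold (meets the gating definition)
    near_band:     misses the gate by at most `near_band`
    sub_threshold: everything else
    """
    if value <= threshold:
        return "qualifies"
    if value <= threshold + near_band:
        return "near_band"
    return "sub_threshold"


def conformance_distribution(values: Iterable[float], threshold: float,
                             near_band: float) -> Dict[str, int]:
    """Count replicas per bucket; missing buckets are reported as 0."""
    counts = Counter(classify(v, threshold, near_band) for v in values)
    return {bucket: counts[bucket] for bucket in BUCKETS}


# Hypothetical drawdowns from four replicas, gate at -20%, near-band of 5 points.
print(conformance_distribution([-0.32, -0.22, -0.18, -0.10], -0.20, 0.05))
```

Applied to all 50 replicas of a profile, the three counts sum to 50 and form the published distribution, instead of a single yes/no verdict.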
Known limits & off-band calibrations
Not every synthetic probe falls within its methodology-expected conformance band. Some deviations are sampling-marginal at n=50; others are structural. For example, the BTC failure-mode definitions do not identify the asset's empirical stress regime: applying the relative-baseline definition to an 80% annualised baseline does not produce a separable threshold.
We disclose all of this transparently rather than smoothing it away in a calibration loop. The full breakdown — empirical-identifiability verification for BTC, structural vs sampling-marginal off-band classification with Wilson 95% confidence intervals, resolution path, and the full list of methodology limitations including score definition-dependence — lives on the dedicated verifiability page so that the methodology overview here stays focused on the “what” and “why”.
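For reference, the Wilson 95% score interval for a conformance rate of k qualifying replicas out of n can be computed as below. The `overlaps` helper reflects one plausible reading of "sampling-marginal": the expected band still intersects the interval. That heuristic and both function names are assumptions, not the exact classification rule on the verifiability page.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1.0 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def overlaps(interval: Tuple[float, float], band: Tuple[float, float]) -> bool:
    """True if the confidence interval intersects the expected conformance band."""
    return interval[0] <= band[1] and band[0] <= interval[1]


# Hypothetical profile: 25 of 50 replicas qualify, expected band 60-80%.
ci = wilson_interval(25, 50)
print(ci, overlaps(ci, (0.60, 0.80)))
```

The Wilson interval is preferred over the plain normal approximation at n=50 because it stays inside [0, 1] and behaves sensibly when very few or almost all replicas qualify.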
Where to read what:
- /verifiability: off-band tables, Wilson CIs, BTC identifiability verification, 9-point methodology-limitations list, per-dataset claim validation, raw verifiability_snapshot.json
- This page (/methodology): conceptual overview, failure-mode taxonomy, operational definitions, scope of claims
A snapshot of claimed-vs-measured properties across all 32 synthetic stress datasets — including the per-replica conformance distribution and the off-band disclosures above — is available as a browsable page and as raw JSON:
- /verifiability — browsable per-dataset claim validation with aggregated metrics
- /verifiability_snapshot.json — raw machine-readable snapshot (CC-BY 4.0)
Where measured values lie outside the claimed operational range, the snapshot reports the deviation transparently. Disclosing where the simulator's behavior is more or less pronounced than its operational thresholds builds a kind of trust that asserted claims cannot.
Scope of Claims
Honest about what this is — and what it isn't:
What We Do
- Replay strategies across real historical OHLC slices from selected high-stress periods
- Generate controllable synthetic stress probes for failure modes history didn't deliver cleanly
- Calibrate simulator agents from actual CFTC CoT data per asset
- Aggregate across cases to produce a robustness score with explicit failure-mode breakdown
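An aggregation with an explicit failure-mode breakdown might be structured as in the sketch below. It assumes per-case scores already normalised to [0, 1] and weights each failure mode equally, so that modes with many cases do not dominate. Both the weighting and the name `robustness_score` are hypothetical and may differ from the platform's actual formula.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def robustness_score(case_scores: Iterable[Tuple[str, float]]) -> Tuple[float, Dict[str, float]]:
    """Return (overall score, per-failure-mode mean) from (mode, score) pairs."""
    by_mode: Dict[str, List[float]] = defaultdict(list)
    for mode, score in case_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {mode}: {score}")
        by_mode[mode].append(score)
    if not by_mode:
        raise ValueError("no cases")
    # Average within each failure mode first, then weight modes equally.
    breakdown = {mode: mean(scores) for mode, scores in sorted(by_mode.items())}
    overall = mean(breakdown.values())
    return overall, breakdown


# Hypothetical results: two crash cases and one trend case.
overall, breakdown = robustness_score(
    [("SHARP_CRASH", 0.2), ("SHARP_CRASH", 0.4), ("TREND_UP", 0.9)]
)
print(round(overall, 3), breakdown)
```

Keeping the breakdown next to the headline number means a weak failure mode stays visible even when strong results elsewhere would hide it in the average.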
What We Don't Claim
- To replicate the full statistical distribution of any real instrument
- To predict future returns or future market behavior
- That synthetic probes match every empirical stylized fact (kurtosis, leverage effect, volume correlation) — they're targeted tools, not market replicas
- That out-of-sample walk-forward validation has been performed (yet)
Synthetic probes are deliberately narrow. A "low-vol grind" simulation reproduces the regime shape (sustained low realised volatility with shallow drift) but is not meant to be statistically indistinguishable from real-market low-vol periods on every higher-order moment. The empirical anchors carry the broad-distribution realism; the synthetic probes carry the controllable-stress dimension.
Academic References
Löw, R.W., Maier-Paape, S. & Platen, E. (2015)
"Correctness of Backtest Engines"
arXiv:1509.08248. Foundational paper on handling ambiguous intra-bar events in backtesting. Our engine implements their worst-case execution model.
Cont, R. (2001)
"Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues"
Quantitative Finance, 1(2), 223-236. Catalog of empirical regularities that inform how we shape synthetic stress probes (volatility clustering, fat tails, regime persistence) — without claiming to reproduce them all to publication standard.
Farmer, J.D. & Foley, D. (2009)
"The Economy Needs Agent-Based Modelling"
Nature, 460(7256), 685-686. Makes the case for agent-based market simulation, which can generate emergent regime transitions and crisis dynamics.
LeBaron, B. (2006)
"Agent-based Computational Finance"
Handbook of Computational Economics, Vol. 2. Survey of heterogeneous agent models that inform our market participant observer architecture.
Want to know how to use the platform?
See the Documentation for practical guidance on interpreting results, understanding metrics, and known limitations.
Regulatory Note: All outputs are analytical simulations for research purposes only. They do not constitute investment advice, financial planning, or performance projections.