Scientific Methodology

Our market simulator, backtest-engine correctness checks, and stress-case framework are built on published academic foundations.

Core Approach: Targeted Stress Testing

Traditional backtesting produces a single result: "This strategy returned +47%." That result is conditional on one specific sequence of market events that will never repeat exactly — and tells you nothing about how the strategy would behave when the market does something different.

Our approach runs the strategy against a curated catalog of stress cases: real historical market slices (e.g. Lehman 2008, COVID 2020, Luna 2022) combined with targeted synthetic probes for failure modes that history rarely produces cleanly (sustained whipsaw, low-vol grind, slow-decline-without-rebound). Each case isolates a specific way a strategy can break.

"The point is not to predict the future — it is to find out how the strategy fails before the market does."

Hybrid Field-Agent Simulator (Synthetic Probes)

For the synthetic stress probes — and only for those — we use an agent-based market simulator. The goal is controllable structural realism: we want data that behaves like a market (microstructure, herding, regime persistence) while letting us dial in specific failure-mode parameters that real history did not deliver cleanly.

Belief Field Layer

A continuous field representing market-wide sentiment, evolving via diffusion dynamics. Drives the smooth, persistent regime structure that distinguishes real markets from random walks.

Observer Layer

Heterogeneous agents (asset managers, trend followers, dealers, mean reverters, retail) observe the field and submit orders. Their relative weights and net biases come from real CoT data (see CoT-Calibrated Agents below).

This architecture is a test generator, not a market replica. It produces structurally plausible price action that we can shape to match a specific stress scenario (e.g. "a 6-month bear without a rebound spike"). The realism check is the case catalog itself — empirical anchors carry the actual realism, the simulator fills the gaps.

State Initialization Regime

Each synthetic stress case is preceded by a 250-day antecedent pre-period, followed by a short 5–10 day transition window into the stress regime. The pre-period exists so that long-lookback indicators (SMA-200, EMA-200, ATR-200, etc.) have stable values when the stress phase begins. Performance attribution starts only after the transition; the pre-period itself is excluded from trade counts, returns, and drawdowns.
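A minimal sketch of the attribution boundary described above, assuming daily returns held in a plain list. The helper names `attribution_slice` and `total_return` are hypothetical and not taken from our engine; the 250-day pre-period comes from the text, and the transition length is passed in because it varies between 5 and 10 days.

```python
PRE_PERIOD_DAYS = 250  # antecedent pre-period length stated in the text


def attribution_slice(daily_returns, transition_days):
    """Return only the returns that count toward performance attribution.

    Hypothetical helper: drops the pre-period and the transition window.
    """
    if not 5 <= transition_days <= 10:
        raise ValueError("transition window is described as 5-10 days")
    start = PRE_PERIOD_DAYS + transition_days
    return daily_returns[start:]


def total_return(daily_returns):
    """Compound simple daily returns into one period return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Illustrative series: 250 warm-up days, 5 transition days, 3 stress days.
series = [0.001] * 250 + [0.0] * 5 + [-0.02, -0.03, 0.01]
stress = attribution_slice(series, transition_days=5)
print(len(stress), round(total_return(stress), 6))
```

Only the three stress-phase returns enter the compounded result; the warm-up gains never reach the reported figure.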

The antecedent regime is not the stress regime. It is drawn deterministically from the simulation seed, using an asset-class-typical distribution chosen so that indicators enter the stress phase from a representative, rather than artificial, prior state.

Asset class	Antecedent distribution
Equity (SPY, QQQ)	40% bull-trend · 35% sideways-low-vol · 25% weak-bear
Volatility (VIX)	45% suppressed-vol · 40% elevated-transition · 15% panic-decay
Crypto (BTC, ETH)	35% parabolic-bull · 35% high-vol-sideways · 30% unstable-decay
Metal (GOLD)	50% range · 30% inflationary-upcycle · 20% demand-weakness
Energy (WTI)	50% range-high-vol · 30% demand-weakness · 20% inflationary-upcycle

Calibration of the antecedent regimes was validated against 750 simulated replicas (50 per profile-asset combination). Realized pre-period statistics fall within the specified target bands for 7 of 10 antecedent types; the remaining three (high_vol_sideways, parabolic_bull, and unstable_decay, all crypto regimes) show systematically lower micro-volatility than their target bands. This is a known limit of the current observer calibration for crypto markets and is reflected in the simulator code as an explicit disclosure.

Antecedent regimes are deterministically assigned per simulation seed, ensuring reproducibility across runs. The chosen distributions are a modeling assumption: results may depend on which antecedent regime is drawn for a given case, and the distribution itself reflects an assumption about typical pre-stress conditions for each asset class. This is a methodological choice that should be considered when interpreting results.
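One way to picture a seed-deterministic draw is sketched below. It assumes Python's standard `random.Random` as the generator and a hypothetical `draw_antecedent` helper; the simulator's actual seeding scheme is not described here, so this is an illustration only. The equity weights are copied from the table above.

```python
import random

# Equity mix copied from the table; other asset classes follow the same shape.
EQUITY_ANTECEDENTS = [
    ("bull-trend", 0.40),
    ("sideways-low-vol", 0.35),
    ("weak-bear", 0.25),
]


def draw_antecedent(seed, distribution):
    """Pick one antecedent regime, reproducibly, from a weighted distribution.

    Hypothetical helper. A dedicated random.Random instance keeps the draw
    independent of any other random-number use in the program.
    """
    rng = random.Random(seed)
    names = [name for name, _ in distribution]
    weights = [weight for _, weight in distribution]
    return rng.choices(names, weights=weights, k=1)[0]


# The same seed always yields the same regime.
first = draw_antecedent(42, EQUITY_ANTECEDENTS)
second = draw_antecedent(42, EQUITY_ANTECEDENTS)
print(first == second)
```

Because the regime depends only on the seed and the distribution, rerunning a case reproduces its pre-period, while different seeds spread cases across the assumed mix.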

Empirical historical cases carry their natural pre-period from the surrounding real market data and do not require this construction.

CoT-Calibrated Agents

The agents in our simulator are not arbitrary. Their relative weights and net positioning biases are derived from real CFTC Commitments of Traders (CoT) reports: weekly disclosures of how each trader category (asset managers, hedge funds, dealers, retail, etc.) is positioned in major futures markets.

Each of the seven supported assets has its own calibration, drawn from the corresponding CoT report type and historical depth:

Asset	CoT Report Type	Weekly Reports	Period
SPY	TFF (Financial Futures)	342	2010–2016
QQQ	TFF (Financial Futures)	342	2010–2016
VIX	TFF (Financial Futures)	510	2006–2016
GOLD	DCOT (Disaggregated)	551	2006–2016
WTI	DCOT (Disaggregated)	551	2006–2016
BTC	TFF (Financial Futures)	420	2017–2026
ETH	TFF (Financial Futures)	264	2021–2026
Total		2,980

TFF — Traders in Financial Futures

CoT report variant for financial markets (equity indices, FX, crypto futures). Categories: Dealer Intermediaries, Asset Managers, Leveraged Money, Other Reportables, Non-Reportables.

DCOT — Disaggregated CoT

CoT report variant for commodities (metals, energy, agriculture). Categories: Producers/Merchants, Swap Dealers, Managed Money, Other Reportables, Non-Reportables.

This means every simulated agent's tendency to go long or short is calibrated against what the corresponding real-market participants actually did, averaged over hundreds of weekly snapshots. A simulated “Producer/Merchant” agent in the GOLD market carries the empirical net-short hedging pressure observed across 551 weeks of COMEX positioning data — not an arbitrary assumption.
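The averaging step can be sketched as follows. This assumes net bias is measured as (long minus short) divided by gross positions per weekly snapshot; that normalization and the `net_bias` name are assumptions made for illustration, and the numbers are invented rather than taken from CFTC data.

```python
from statistics import mean


def net_bias(weekly_snapshots):
    """Average net positioning as a fraction of gross positions.

    Each snapshot is a (long_contracts, short_contracts) pair for one
    trader category in one weekly report. The result lies in [-1, 1];
    negative values mean net short on average.
    """
    ratios = []
    for long_pos, short_pos in weekly_snapshots:
        gross = long_pos + short_pos
        if gross > 0:  # skip weeks with no reported positions
            ratios.append((long_pos - short_pos) / gross)
    if not ratios:
        raise ValueError("no usable snapshots")
    return mean(ratios)


# Illustrative numbers only, not actual CFTC data.
producer_merchant = [(20_000, 80_000), (25_000, 75_000), (30_000, 90_000)]
print(round(net_bias(producer_merchant), 4))
```

A persistently negative value for a hedger category would translate into a net-short tilt for the corresponding simulated agent.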

Source: CFTC Commitments of Traders Historical Compressed reports. cftc.gov/MarketReports/CommitmentsofTraders

Note: BTC futures launched on CME in December 2017; ETH cash-settled futures in February 2021 — those calibration windows start from contract inception. The CFTC TFF report format was published from June 2010 with backfilled coverage to 2006 for selected markets including VIX; SPY and QQQ data are available from 2010 onward. GOLD and WTI use the DCOT (Disaggregated CoT) report, which spans the full 2006–2016 window.

Failure-Mode Taxonomy & Case Catalog

Strategies don't fail the same way in every bad market; they have specific vulnerabilities. We organize stress tests around nine orthogonal failure-mode axes, each isolating one class of strategy weakness. V-Recovery accompanies them as a diagnostic path pattern: a composite of a SHARP_CRASH down-leg and a TREND_UP recovery that has not been a separate score bucket since 2026-05. The modes are:


Trend Up (Bull): momentum wins, mean-reversion loses
Trend Down (Bear): momentum loses
Sideways (Range): mean-reversion wins, momentum loses
Vol Expansion: Vol-traders win, fixed stops lose
Vol Compression: Trend-traders stagnate, whipsaws hurt
Sharp Crash: Drawdown / stop-loss test
Slow Bear: Patience / persistence test
V-Recovery: Re-entry timing test
Whipsaw: False-signal robustness
Liquidity Stress: Spread / slippage assumptions

Each case is tagged with the failure mode(s) it stresses. A single case can carry multiple tags — Lehman 2008 contributes to TREND_DOWN, SHARP_CRASH and LIQUIDITY_STRESS simultaneously, because that historical episode genuinely combined all three. The robustness score aggregates strategy performance across the entire catalog.

Detailed Failure-Mode Definitions

Each of the 10 modes is defined with operational thresholds (verified against our case catalog), historical examples, common strategy failure patterns, strategy archetypes, and distinctions from similar regimes. The Confidence tag in each header reflects the depth of empirical anchoring in our catalog.

Trend Up

TREND_UP

Definition

Trend Up describes a market regime characterized by persistent positive directional movement with shallow drawdowns over extended windows.

Operational Definition

Total return: typically +8% to +25% over 4–6 months (annualized roughly +15% to +50%)
Maximum drawdown: shallow, typically less than 10%, almost never exceeding 15%
Realized annualized volatility: 8–25% for equities and commodities; crypto runs significantly higher (60–100%+)
Persistence: at least 3 trading months of dominant directional movement, no decisive mid-case reversal
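The relationship between a case-window return and its annualized equivalent can be checked with a short sketch. It assumes geometric compounding and a hypothetical `annualize` helper; the bands in the list are rounded, so compounded figures land near them rather than exactly on them.

```python
def annualize(period_return, months):
    """Convert a return earned over `months` into a compounded annual rate."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (1.0 + period_return) ** (12.0 / months) - 1.0


# Lower and upper ends of the Trend Up band over a 6-month window.
low = annualize(0.08, 6)
high = annualize(0.25, 6)
print(f"{low:.1%} to {high:.1%}")
```

Shorter windows annualize to higher figures for the same period return, which is why the same band is wider when expressed per year.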

Historical Examples

SPY Bull Run 2017 H1 — +7.5%, max DD −3%
QQQ Tech Rally 2017 — +13.9%, max DD −5%
Gold Post-GFC Inflation Hedge 2010 — +7.9%, max DD −7.8%
WTI Pre-GFC Oil Boom — +23.3%, max DD −10.6%
BTC 2017 Parabolic ATH — +216%, max DD −34.9% (crypto outlier, broadens upper bound)

Common Strategy Failure Patterns

Mean-reversion entries fading against persistent direction
Counter-trend stops triggered repeatedly
Cash-allocation systems lagging benchmark performance
Volatility-targeting underweighting during low-vol bull runs
Short-bias structures bleeding continuously

Historically More Stable Archetypes

Long-bias trend-following systems
Momentum allocation models
Buy-and-hold equity exposure

Historically Fragile Archetypes

Short-bias structures
Naive mean-reversion against trend
Volatility-shorting in directionally ambiguous setups

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. V-Recovery: Trend Up starts from neutral ground; V-Recovery starts after a drawdown of 15% or more.
vs. Vol Compression: Vol Compression is about low realized volatility regardless of direction; Trend Up is about direction regardless of volatility.


Trend Down

TREND_DOWN

Definition

Trend Down describes a market regime characterized by persistent negative directional movement that sustains or deepens across the case window without full recovery.

Operational Definition

Total return: typically −20% to −60% over 4–6 months (annualized roughly −40% to −95%)
Maximum drawdown: −25% to −70%, with median around −50%
Realized annualized volatility: elevated, typically 25–90%
Persistence: dominant downward trajectory across the full window — drawdown deepens or sustains, no full V-recovery

Historical Examples

Gold Taper Tantrum 2013 — −19.4%, max DD −25%
SPY Lehman GFC 2008 — −36.4%, max DD −42%
WTI Saudi Oil War 2014 — −48%, max DD −53%
BTC Crypto Winter 2018 — −52.5%, max DD −66%
QQQ Dotcom Crash 2000 — −58.5%, max DD −67%

Common Strategy Failure Patterns

Buy-the-dip systems re-entering into deepening drawdowns
Long-only momentum systems failing to exit fast enough
Dollar-cost-averaging extending drawdown duration without recovery
Stop-loss systems confirming exits below average entry price
Long-equity allocation systems persistently underperforming cash

Historically More Stable Archetypes

Trend-following systems with short-side capability
Defensive allocation rotations
Cash-heavy or low-net-exposure systems

Historically Fragile Archetypes

Long-only buy-the-dip mean-reversion
Leveraged long-equity positioning
Bottom-fishing oscillator strategies

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Slow Bear: Trend Down may include sharp acceleration phases (e.g., late stages of Dotcom); Slow Bear specifically excludes them — it is the grinding-down version.
vs. Sharp Crash: Trend Down persists across months; Sharp Crash is concentrated in days to weeks.


Sideways

SIDEWAYS

Definition

Sideways describes a market regime characterized by extended range-bound oscillation without decisive directional resolution.

Operational Definition

Total return: roughly −10% to +25% (around zero with mild dispersion)
Maximum drawdown: typically −10% to −20% — pullbacks happen within the range
Realized annualized volatility: moderate, 13–30%
Persistence: at least 4 months without decisive directional resolution; price oscillates within bounded range
Direction-to-vol ratio: net direction must be small relative to volatility — direction dominated by noise
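A direction-to-vol ratio of this kind could be computed as sketched below. The definition gives no formula or cutoff, so the choice of annualized drift over annualized volatility, the 252-day convention, and the `direction_to_vol` name are all assumptions made for illustration.

```python
import math
from statistics import stdev

TRADING_DAYS = 252  # common annualization convention, assumed here


def direction_to_vol(daily_returns):
    """Absolute annualized drift divided by annualized volatility."""
    n = len(daily_returns)
    if n < 2:
        raise ValueError("need at least two returns")
    total = math.prod(1.0 + r for r in daily_returns) - 1.0
    annual_drift = (1.0 + total) ** (TRADING_DAYS / n) - 1.0
    annual_vol = stdev(daily_returns) * math.sqrt(TRADING_DAYS)
    if annual_vol == 0:
        raise ValueError("volatility is zero; ratio undefined")
    return abs(annual_drift) / annual_vol


# Illustrative paths: one oscillating around a level, one drifting upward.
choppy = [0.01, -0.01] * 42
trending = [0.01, 0.0] * 42
print(direction_to_vol(choppy) < direction_to_vol(trending))
```

A range-bound path yields a ratio well below 1, while a persistent drift pushes it far above; where exactly to draw the line is a calibration choice this sketch does not settle.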

Historical Examples

SPY China Sideways 2015 H2 — −7.7%, max DD −12%
BTC Sideways 2023 — −1.6%, max DD −20%
Gold Sideways 2014 — +8.2%, max DD −10%
WTI Sideways 2018 — +23.9%, max DD −11% (upper-bound case, mild trend mixed in)

Common Strategy Failure Patterns

Trend-following systems generating frequent false breakouts
Wide stops repeatedly triggered around range edges
Time-decay eroding options-based directional structures
Trailing-stop momentum systems exiting at range floors
Breakout-confirmation systems whipsawed in both directions

Historically More Stable Archetypes

Mean-reversion oscillator systems
Range-trading and channel-bound structures
Options-selling within bounded ranges

Historically Fragile Archetypes

Breakout systems
Fast-trend-following without regime filter
Momentum-confirmation strategies

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Whipsaw: Sideways captures the regime shape (range-bound); Whipsaw emphasizes the signal-noise problem for trend-followers (frequent reversals through any signal threshold).
vs. Vol Compression: Vol Compression requires visibly suppressed realized volatility; Sideways allows moderate vol within the range.


Vol Expansion

VOL_EXPANSION

Definition

Vol Expansion describes a market regime characterized by sustained elevation of realized volatility relative to the asset's recent baseline, decoupled from directional bias.

Operational Definition

Gating (required): Median realized volatility over the case window ≥ 1.5× the asset's normal-window vol
Gating (required): Persistence — at least 2 distinct (non-overlapping) 30-day windows with vol ≥ 1.5× baseline; differentiates VOL_EXPANSION from a single concentrated SHARP_CRASH window
Descriptive (not gating): peak rolling-30-day vol ≥ 3× baseline count; vol-of-vol level; max-DD-window characteristics — reported alongside, used for sub-classification ("sustained-elevated" vs "sustained-with-spikes") but not for FM-tag determination
Direction: decoupled from regime — return can be sharply negative, sharply positive, or near zero
VIX-specific: observable as VIX index level above its 90th percentile (separate criterion from realized-vol-of-VIX-returns)
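The two gating criteria above can be sketched in a few lines. This assumes the median is taken over rolling 30-day annualized volatility and that "distinct" windows are fixed consecutive 30-day blocks; both readings, the 252-day convention, and the `vol_expansion_gate` name are illustrative assumptions rather than our pipeline's code.

```python
import math
from statistics import median, stdev

TRADING_DAYS = 252  # annualization convention, assumed
WINDOW = 30         # window length from the definition
MULTIPLE = 1.5      # gating multiple of baseline vol


def annualized_vol(returns):
    """Sample standard deviation of daily returns, scaled to one year."""
    return stdev(returns) * math.sqrt(TRADING_DAYS)


def vol_expansion_gate(daily_returns, baseline_vol):
    """Check both gating criteria against a given baseline volatility."""
    threshold = MULTIPLE * baseline_vol
    # Criterion 1: median of rolling 30-day vol over the case window.
    rolling = [
        annualized_vol(daily_returns[i:i + WINDOW])
        for i in range(len(daily_returns) - WINDOW + 1)
    ]
    median_ok = bool(rolling) and median(rolling) >= threshold
    # Criterion 2: at least two non-overlapping 30-day blocks above threshold.
    blocks = [
        daily_returns[i:i + WINDOW]
        for i in range(0, len(daily_returns) - WINDOW + 1, WINDOW)
    ]
    elevated = sum(1 for b in blocks if annualized_vol(b) >= threshold)
    return median_ok and elevated >= 2


# Illustrative paths against a 16% baseline.
calm = [0.005, -0.005] * 60
stressed = [0.03, -0.03] * 60
one_spike = [0.03, -0.03] * 15 + [0.005, -0.005] * 45
print(vol_expansion_gate(stressed, 0.16), vol_expansion_gate(one_spike, 0.16))
```

The single-spike path fails both criteria, which is the behaviour that separates a persistent elevated state from one concentrated shock.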

Definition history: an earlier version required 2–5× baseline as a 6-month median plus ≥3 distinct windows with peaks ≥3×. Systematic anchor validation (2026-05) revealed that conjunction effectively required Lehman-class duration AND COVID-class peaks combined — rare even among real episodes. The current version separates the persistent-state core (gating) from spike behaviour (descriptive). Spike-character regimes are conceptually a separate axis (VOL_INSTABILITY, planned future failure-mode) rather than a sub-criterion of VOL_EXPANSION. Anchor re-tagging (Volmageddon → SHARP_CRASH; BTC ATH 2017 / DeFi Summer Bull → TREND_UP only) corrected mis-classifications where event-character did not match case-window measurement.

Historical Examples

SPY Vol Shock Feb 2018 — realized vol 17%
QQQ Vol Shock Feb 2018 — realized vol 21%
QQQ Tech Bear 2022 — realized vol 34%
ETH Vol Shock 2018 — realized vol 105%
BTC 2017 Parabolic ATH — realized vol 92%
VIX Volmageddon 2018 — realized vol 289%
VIX COVID 2020 — realized vol 187%

Common Strategy Failure Patterns

Fixed-stop systems gapped through unfavorable prints
Position-sizing models underestimating realized risk
Mean-reversion entries fading against expanding moves
Short-volatility carry structures facing uncapped losses
Risk-parity allocations rebalancing into concentrated positions

Historically More Stable Archetypes

Long-volatility structures and tail-hedge overlays
Volatility-targeted position sizing
Low-leverage allocation models

Historically Fragile Archetypes

Short-volatility carry structures
Fixed-leverage models
Naive mean-reversion at fixed thresholds

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Sharp Crash: Sharp Crash is direction (down) plus speed; Vol Expansion is direction-agnostic.
vs. Whipsaw: Whipsaw emphasizes oscillation around levels; Vol Expansion emphasizes magnitude regardless of pattern.


Vol Compression

VOL_COMPRESSION

Definition

Vol Compression describes a market regime characterized by sustained suppression of realized volatility well below the asset's recent baseline.

Operational Definition

Realized annualized volatility: 0.4–0.7× the asset's baseline (equities 7–12% vs. baseline 16%; gold 8–15%; crypto 30–50% vs. baseline 80%+)
Persistence: typically >2 months of sustained low realized vol; brief calm windows do not qualify
Direction: slightly positive or neutral for price assets; for VIX, the low index level itself is the regime
Drawdowns: shallow, typically <8% for non-VIX assets

Historical Examples

SPY Bull Run 2017 H1 — realized vol 7.1%, max DD −3%
QQQ Tech Rally 2017 — realized vol 10.1%, max DD −5%
VIX Low Vol 2017 — VIX index level persistently below 12 (regime defined by the index level itself, not by the realized volatility of VIX returns — which remains structurally high)

Only two robust empirical anchors for non-VIX assets (SPY and QQQ 2017); the regime is otherwise inferred from synthetic profiles. Bullish low-vol periods cluster naturally — this category overlaps with Trend Up by construction.

Common Strategy Failure Patterns

Breakout strategies generating few signals, mostly false
Time-stop systems exiting before signal materializes
Short-volatility systems accumulating without warning of regime change
Momentum signals weak relative to noise floor
Trend-followers underperforming passive allocation

Historically More Stable Archetypes

Short-volatility carry structures
Slow-trend systems with volatility filters
Allocation-based portfolios with rebalancing

Historically Fragile Archetypes

Breakout systems without regime filter
Fast-trend-following with tight stops
Long-volatility / options-buying structures

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Sideways: Sideways allows moderate volatility within the range; Vol Compression requires visibly suppressed vol.
vs. Trend Up: Trend Up is about return direction; many Trend Up regimes are also Vol Compression (low-vol bull) but not all.


Sharp Crash

SHARP_CRASH

Definition

Sharp Crash describes a market regime characterized by accelerated downside dislocation over compressed trading windows.

Operational Definition

Gating (required): Peak-to-trough drawdown ≥ 20% in any rolling 30-day window. DD computed as max over all overlapping 30-day windows. Multiple ≥20% windows within 60 days consolidate into a single crash event.
Gating (required): Realized volatility during the crash window ≥ 1.5× the asset's normal-window vol (asset-agnostic).
Sub-classification (descriptive, not gating): "tail-intensified" (excess kurtosis ≥ 1.5 OR daily-return 1st percentile ≤ −3%) versus "broad-distribution" (otherwise).
Reported alongside: vol percentile against asset's own rolling-vol distribution; min-window DD (5–10 days) for flash-event detection.
Near-crash band: 15–20% rolling-30d DD is reported as "near-crash dynamics observed" without the SHARP_CRASH tag.
Recovery within the case window is permitted — SHARP_CRASH classifies the dislocation event itself, not the post-event trajectory.

The definition captures observable price dynamics. Underlying microstructure effects (liquidity, gap-structure, order-book stress) are not directly modeled here — see LIQUIDITY_STRESS for that dimension. Anchor validation (Lehman 2008 rolling-30d DD −32%, COVID 2020 −34%, Luna 2022 −40%, FTX 2022 −26%) confirmed all four listed historical SHARP_CRASH anchors satisfy the ≥20% DD-component threshold with margin. Note: while the DD component is met by all four, the full gating (DD ≥20% AND crash-window vol ≥1.5× baseline) is asset-asymmetric — for high-baseline assets like BTC the vol component does not always activate even in clear stress events (e.g. Luna 2022 crash-window vol 0.92×, FTX 2022 0.85×). See /verifiability for the empirical-identifiability table. Definition revised 2026-05.
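The drawdown component of the gating, including the near-crash band, can be sketched as follows. It assumes a 30-day window means 30 consecutive daily closes and covers only the drawdown test; the volatility criterion and the consolidation of multiple windows into one event are left out. Function names and return labels are hypothetical.

```python
WINDOW = 30           # rolling window length in trading days
CRASH_DD = 0.20       # gating threshold from the definition
NEAR_CRASH_DD = 0.15  # lower edge of the near-crash band


def max_drawdown(prices):
    """Largest peak-to-trough decline in a price sequence, as a positive fraction."""
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)
        worst = max(worst, (peak - p) / peak)
    return worst


def worst_rolling_drawdown(prices, window=WINDOW):
    """Maximum drawdown over all overlapping windows of `window` prices."""
    if len(prices) <= window:
        return max_drawdown(prices)
    return max(
        max_drawdown(prices[i:i + window])
        for i in range(len(prices) - window + 1)
    )


def classify_drawdown(prices):
    """Apply only the drawdown component; the vol criterion is checked separately."""
    dd = worst_rolling_drawdown(prices)
    if dd >= CRASH_DD:
        return "dd-component-met"
    if dd >= NEAR_CRASH_DD:
        return "near-crash"
    return "none"


# Illustrative paths: a 25% drop in 10 days versus 25% spread over 120 days.
fast = [100.0] * 20 + [100 - 2.5 * i for i in range(1, 11)] + [75.0] * 20
slow = [100 * 0.75 ** (i / 120) for i in range(121)]
print(classify_drawdown(fast), classify_drawdown(slow))
```

The same total loss qualifies or not depending on how quickly it accumulates, which is the distinction between a crash and a grinding decline.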

Historical Examples

Lehman Brothers collapse — Sep–Oct 2008
COVID-19 onset — Feb–Mar 2020
Luna / UST collapse — May 2022
FTX collapse — Nov 2022 (secondary, exchange-failure shock)

Common Strategy Failure Patterns

Signal lag exceeding downside acceleration
Oversold re-entry during unresolved downside continuation
Liquidity-gap stop execution at unfavorable prints
Fixed volatility thresholds destabilized by expanding realized volatility
Position-sizing assumptions invalidated by realized risk

Historically More Stable Archetypes

Long-horizon trend-following systems
Volatility-scaled allocation models
Reduced-exposure tactical systems

Historically Fragile Archetypes

Fixed-threshold mean-reversion systems
High-leverage volatility compression strategies
Short-volatility carry structures

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Slow Bear: Sharp Crash compresses the drawdown into days or weeks; Slow Bear extends comparable losses over months without the same acceleration.
vs. Vol Expansion: Vol Expansion can be directionally neutral; Sharp Crash combines directional dislocation with vol expansion.
vs. Liquidity Stress: Liquidity Stress can occur without large directional moves; Sharp Crash usually carries liquidity stress as a co-symptom.


Slow Bear

SLOW_BEAR

Definition

Slow Bear describes a market regime characterized by gradually accumulating downside losses without a single concentrated crash event.

Operational Definition

Total return: typically −25% to −60% over ~6 months (annualized roughly −50% to −95%)
Maximum drawdown: −25% to −70%, accumulating gradually rather than concentrated
Realized annualized volatility: 20–90% — usually elevated but without sharp single-day shocks dominating
Persistence: drawdown sustains or deepens over 3+ months without recovery
Distinguishing trait: drawdown growth is close to monotonic, not punctuated by a single crash event

Historical Examples

SPY Fed-Bear 2022 H2 — 0% return, max DD −17% (slow profile)
QQQ Tech Bear 2022 — −29.8%, max DD −32%
WTI Saudi Oil War 2014 — −48%, max DD −53%
BTC Crypto Winter 2018 — −52.5%, max DD −66%
QQQ Dotcom Crash 2000 — −58.5%, max DD −67% (also Trend Down + Sharp Crash due to severity)

Common Strategy Failure Patterns

Long-equity allocation underperforming cash for extended periods
Trend-followers cycling in/out with each minor rally
Mean-reversion bottom-fishing entries repeatedly stopping out
Drawdown psychology favoring capitulation near terminal lows
Volatility-targeting under-allocating during deep drawdowns when vol stays moderate

Historically More Stable Archetypes

Trend-following systems with short-side capability
Defensive sector-rotation strategies
Volatility-targeted allocation with cash buffer

Historically Fragile Archetypes

Long-only equity exposure
Dollar-cost-averaging without exit logic
Mean-reversion bottom-fishing

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Trend Down: Trend Down may include sharp acceleration phases (e.g., late stages of Dotcom); Slow Bear specifically excludes them.
vs. Sharp Crash: Sharp Crash concentrates loss in days to weeks; Slow Bear distributes loss across 3+ months.


V-Recovery

V_RECOVERY

Status: Diagnostic Path Pattern (not a Failure-Mode bucket)

As of 2026-05, V-Recovery is treated as a composite path pattern, not a Failure-Mode score bucket. The V-shape combines two orthogonal phases: strategies actually fail under the down-leg (which scores under SHARP_CRASH) and miss the up-leg (which scores under TREND_UP). Decomposing the V into its constituent FMs avoids the bimodal-classification problem that intrinsic-conditional outcomes produce. The V-Recovery label is retained as descriptive annotation on cases that exhibit the path pattern; for score aggregation, replicas are classified into SHARP_CRASH + TREND_UP buckets per the empirical phase-decomposition.

Definition

V-Recovery describes a path pattern characterized by a meaningful drawdown immediately followed by a rapid retracement to near or above pre-drawdown levels.

Operational Pattern Definition (diagnostic)

Maximum drawdown during the case: −15% to −35% (peak-to-trough)
Recovery from low: at least 80% retracement of the drawdown by case-end (price returns to within ~5% of pre-crash peak)
Time-from-low to substantial recovery: typically 2–4 months (sharp rebound, not slow)
Final case-window return: typically flat to +25% (the V-shape lands somewhere near or above starting price)
Realized annualized volatility: 20–50% during the V
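The drawdown and retracement criteria above can be expressed as a short check. This is a sketch under stated assumptions: retracement is measured from the trough of the deepest drawdown to the final price, the timing and volatility criteria are omitted, and the names `retracement` and `is_v_shape` are hypothetical.

```python
def retracement(prices):
    """Return (max drawdown, fraction of that drawdown recovered at case end).

    Finds the deepest peak-to-trough decline, then measures how much of
    the drop the last price has regained (1.0 means back at the prior peak).
    """
    peak = prices[0]
    best = (0.0, prices[0], prices[0])  # (drawdown, peak, trough)
    for p in prices:
        peak = max(peak, p)
        dd = (peak - p) / peak
        if dd > best[0]:
            best = (dd, peak, p)
    dd, dd_peak, trough = best
    if dd == 0.0:
        return dd, 1.0
    recovered = (prices[-1] - trough) / (dd_peak - trough)
    return dd, recovered


def is_v_shape(prices):
    """Drawdown within 15-35% and at least 80% of it retraced by case end."""
    dd, recovered = retracement(prices)
    return 0.15 <= dd <= 0.35 and recovered >= 0.80


# Illustrative paths: a full rebound versus a partial one.
rebound = [100, 90, 70, 80, 95, 98]
partial = [100, 90, 70, 75, 80]
print(is_v_shape(rebound), is_v_shape(partial))
```

Both paths share the same 30% drawdown; only the depth of the recovery separates the V-shape from an unresolved decline.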

Historical Examples

SPY COVID Crash 2020 — +0.7%, max DD −34%
QQQ COVID + Tech V-Recovery — +19.5%, max DD −29%
Gold COVID + ATH 2020 — +25%, max DD −12.5%

All three anchors are COVID 2020 across asset classes — this limits regime-diversity validation.

Common Strategy Failure Patterns

Trend-following systems exiting near lows, missing the recovery rally
Drawdown-based de-risking reducing exposure before snap-back
Cash-on-the-sidelines systems waiting for confirmation that arrives late
Risk-parity rebalancing dampening recovery participation
Stop-loss execution at terminal lows preceding the rebound

Historically More Stable Archetypes

Persistent long-equity exposure
Regime-agnostic strategic allocation
Simple buy-and-hold

Historically Fragile Archetypes

Drawdown-based de-risking systems
Trend-following exits without re-entry logic
Cash-on-the-sidelines systems with late confirmation

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Sharp Crash: Sharp Crash is the down-leg only; V-Recovery is the down-leg plus the rapid up-leg, read as a single path pattern.
vs. Trend Up: V-Recovery starts after a meaningful drawdown; Trend Up starts from neutral ground.


Whipsaw

WHIPSAW

Definition

Whipsaw describes a market regime characterized by repeated directional reversals at signal-relevant levels, generating false confirmation signals for trend-following systems.

Operational Definition

Multiple directional reversals: at least 3 sign-changes (typically 3–5), each of >5% magnitude, over 3–6 months
Net case return: near zero, typically ±10% (no decisive direction)
Maximum drawdown: typically −10% to −20% — pullbacks happen but recover
Realized annualized volatility: moderate, 15–30%
Distinguishing trait: false signals — strategies dependent on directional confirmation are stopped out repeatedly
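One possible way to count such reversals is a zigzag-style scan, sketched below. This is an assumption about how a sign-change count could be defined, not a description of an existing measurement, and `count_reversals` is a hypothetical name.

```python
def count_reversals(prices, threshold=0.05):
    """Count direction changes between swings larger than `threshold`.

    Zigzag-style scan: a reversal is confirmed once price moves more than
    `threshold` away from the running extreme in the opposite direction.
    """
    pivot = prices[0]  # running extreme of the current swing
    direction = 0      # +1 up-swing, -1 down-swing, 0 not yet established
    reversals = 0
    for p in prices[1:]:
        move = (p - pivot) / pivot
        if direction == 0:
            if abs(move) > threshold:
                direction = 1 if move > 0 else -1
                pivot = p
        elif direction == 1:
            if p > pivot:
                pivot = p  # new high extends the up-swing
            elif move < -threshold:
                direction = -1
                reversals += 1
                pivot = p
        else:
            if p < pivot:
                pivot = p  # new low extends the down-swing
            elif move > threshold:
                direction = 1
                reversals += 1
                pivot = p
    return reversals


# Illustrative paths: an oscillating range versus a steady climb.
oscillating = [100, 106, 100, 106, 100, 106]
climbing = [100, 106, 112, 120, 130]
print(count_reversals(oscillating), count_reversals(climbing))
```

The first swing only establishes a direction, so reversals are counted from the second swing onward.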

Historical Examples

SPY China Sideways 2015 H2 — −7.7%, max DD −12%
BTC Sideways 2023 — −1.6%, max DD −20%
Gold Sideways 2014 — +8.2%, max DD −10%
WTI Sideways 2018 — +23.9%, max DD −11%

Whipsaw shares its anchor set with Sideways by design — these modes are not differentiated by separate cases but by perspective: Sideways describes the regime shape, Whipsaw describes the failure of trend-following signals within that shape. Sign-change counts per case are not currently measured in our pipeline.

Common Strategy Failure Patterns

Moving-average crossovers generating repeated false signals
Breakout-confirmation systems stopped at range boundaries
Trailing-stops triggered at noise levels
Mean-reversion at extremes works, but with high signal cost
Time-stop systems exiting just before signal materializes

Historically More Stable Archetypes

Long-horizon allocation strategies
Options-selling within bounded ranges
Regime-filtered trend-followers

Historically Fragile Archetypes

Short-horizon trend-following
Breakout-confirmation systems
Trailing-stop momentum

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Sideways: Sideways emphasizes the bounded range; Whipsaw emphasizes the failure of trend-following signals within that range.
vs. Vol Expansion: Whipsaw has moderate volatility with reversals; Vol Expansion can be uni-directional with high volatility.


Liquidity Stress

LIQUIDITY_STRESS

Definition

Liquidity Stress describes a market regime characterized by deteriorating execution conditions — widening spreads, gap-down opens, and reduced fill quality — typically co-occurring with significant directional drawdowns.

Operational Definition

Bid/ask spread widening: ≥3× the asset's baseline spread during stress windows
Gap-down opens: 2 or more days with overnight gaps >3% within the case window
Volatility: typically co-occurs with Vol Expansion (realized vol 50%+)
Drawdown: typically deep in our anchor set — −25% to −80%
Distinguishing trait: execution-slippage matters — strategies relying on fill-quality assumptions underperform their model

Spread and gap-open thresholds are indicative microstructure heuristics drawn from market-microstructure literature; they are not directly measured in our case pipeline. Liquidity-stress identification in our catalog relies on historical co-occurrence of these conditions during the named episodes.
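If the gap-open heuristic were measured from daily bars, a count along these lines would be one option. It is a sketch only: the bar layout as (open, close) pairs and the names `count_gap_downs` and `meets_gap_criterion` are assumptions, and the prices are invented.

```python
def count_gap_downs(bars, threshold=0.03):
    """Count sessions opening more than `threshold` below the prior close.

    `bars` is a list of (open, close) pairs in chronological order.
    """
    count = 0
    for (_, prev_close), (open_price, _) in zip(bars, bars[1:]):
        gap = (open_price - prev_close) / prev_close
        if gap < -threshold:
            count += 1
    return count


def meets_gap_criterion(bars):
    """Two or more qualifying gap-down opens within the case window."""
    return count_gap_downs(bars) >= 2


# Illustrative bars: two opens more than 3% below the previous close.
bars = [(100, 100), (96, 95), (95, 94), (90, 91)]
print(count_gap_downs(bars), meets_gap_criterion(bars))
```

Spread widening cannot be derived from OHLC bars at all, so a check like this would cover only the gap half of the heuristic.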

Historical Examples

Gold Taper Tantrum 2013 — max DD −25%
BTC FTX Collapse 2022 — max DD −26%
SPY Lehman GFC 2008 — max DD −42%
BTC Luna Collapse 2022 — max DD −55%
QQQ Dotcom Crash 2000 — max DD −67%
WTI Negative Oil 2020 — max DD −78.5% (canonical extreme: futures settled negative)

Common Strategy Failure Patterns

Limit-order entries missing execution at expected prices
Stop-loss orders executing at prints far from trigger level
Scaling-in plans facing partial fills with worse cost basis
Daily-rebalanced systems facing widening transaction costs
Multi-leg structures facing execution slippage that breaks the model

Historically More Stable Archetypes

Low-frequency systems with daily or longer rebalancing
Market-on-close order structures
Single-leg directional strategies

Historically Fragile Archetypes

High-frequency execution-dependent systems
Multi-leg arbitrage and spread structures
Limit-order-dependent mean-reversion

Archetype groupings reflect academic and industry observation — case-specific results are reported on individual strategy pages.

Distinction from Similar Modes

vs. Sharp Crash: Sharp Crash is about price magnitude; Liquidity Stress is about execution conditions — they often co-occur but are diagnostically different.
vs. Vol Expansion: Vol Expansion is realized-volatility focused; Liquidity Stress focuses on the microstructure breakdown that accompanies it.


Case Catalog Composition

31 empirical anchors: real OHLC slices from 2006–2025 (Lehman 2008, Dotcom 2000, COVID 2020, Luna 2022, Taper Tantrum 2013, etc.). One run per case — the realism comes built-in.
32 synthetic stress probes: 50 Monte-Carlo replicas each, used only for failure modes real history rarely produced cleanly — sustained low-vol grind, controlled whipsaw, slow-stagflation, hyperinflation surge, demand-destruction bear, plus the May-2026 setup-profile family (sharp-crash, vol-expansion, liquidity-stress, v-recovery) and two intentionally distinct slow-decline hardness levels (mid-cycle bear correction vs adversarial no-rebound).
Coverage by failure mode: All 9 failure-mode score buckets (TREND_UP, TREND_DOWN, SIDEWAYS, VOL_EXPANSION, VOL_COMPRESSION, SHARP_CRASH, SLOW_BEAR, WHIPSAW, LIQUIDITY_STRESS) are tested by both empirical anchors and synthetic setup-profiles as of 2026-05. V-Recovery is treated as a diagnostic path pattern (composite, not a score bucket); replicas exhibiting the V-shape are decomposed into SHARP_CRASH + TREND_UP for scoring. LIQUIDITY_STRESS microstructure dimensions (spread, gap-down) remain covered by empirical anchors only (microstructure effects are not yet directly modeled in the synthetic generator) — see the case-catalog table for per-asset coverage.
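One way to picture the V-Recovery decomposition is to split a path at its trough, so that the leg into the trough can be scored as SHARP_CRASH and the leg out of it as TREND_UP. The sketch below assumes the split point is the global minimum; the function name and that splitting rule are hypothetical, since the production scoring logic is not described on this page.

```python
from typing import Dict, List, Sequence


def split_v_recovery(prices: Sequence[float]) -> Dict[str, List[float]]:
    """Split a V-shaped path at its lowest point into a crash leg and a recovery leg."""
    if len(prices) < 3:
        raise ValueError("need at least three prices to form a V shape")
    trough = min(range(len(prices)), key=lambda i: prices[i])
    if trough == 0 or trough == len(prices) - 1:
        raise ValueError("trough at the edge: path is not V-shaped")
    return {
        "SHARP_CRASH": list(prices[: trough + 1]),  # start down to and including the trough
        "TREND_UP": list(prices[trough:]),          # trough up to the end
    }


# Illustrative replica only (not simulator output).
path = [100.0, 92.0, 78.0, 70.0, 76.0, 85.0, 97.0]
legs = split_v_recovery(path)
print(legs["SHARP_CRASH"], legs["TREND_UP"])
```

Both legs share the trough price, so each can be evaluated as a complete move.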

VIX architectural note: directional regimes (TREND_UP/DOWN, SLOW_BEAR, V_RECOVERY, WHIPSAW, LIQUIDITY_STRESS) don't have an interpretable analogue for a volatility index. VIX strategies are tested only on the regimes that conceptually apply (vol expansion, vol compression, sharp crash). The previously included vix_persistent_high_synthetic profile was disabled in April 2026 because the simulator's VIX-return excess kurtosis (≈ 1.3) is well below the empirical level (≈ 5.5), making synthetic VIX dynamics unreliable. VIX failure modes are validated through the three empirical anchor cases (Low Vol 2017, Volmageddon 2018, COVID 2020) only.
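The kurtosis comparison above refers to excess kurtosis, which is 0 for a normal distribution and grows with tail heaviness. A minimal sketch of the population estimator follows; whether the pipeline uses this estimator or a bias-corrected variant is not stated here, and the return series are made up for illustration.

```python
from typing import Sequence


def excess_kurtosis(returns: Sequence[float]) -> float:
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n  # second central moment
    m4 = sum((r - mean) ** 4 for r in returns) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("constant series has undefined kurtosis")
    return m4 / m2**2 - 3.0


# Illustrative series only.
calm = [-0.01, 0.01] * 10             # symmetric two-point series: thin tails
spiky = [-0.01, 0.01] * 10 + [0.20]   # same series plus one large jump
print(excess_kurtosis(calm), excess_kurtosis(spiky))
```

A single large jump is enough to push the statistic far above the thin-tailed case, which is why a simulated series with too little kurtosis understates spike risk.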

Backtest Engine Correctness

Many backtest engines can produce subtly incorrect results because of a fundamental problem: OHLC candles hide the intra-bar order of events. When price hits both your stop-loss and your take-profit within the same candle, which executed first?

The Ambiguous Candle Problem

Open: $100, High: $108, Low: $94, Close: $102

Stop-Loss at $95 and Take-Profit at $107 — which hit first?

Our engine implements the methodology from Löw, Maier-Paape & Platen (2015), a rigorous academic treatment of this problem. We default to worst-case execution, which assumes the least favorable outcome when the order of events is ambiguous.

Worst-Case (Default)

Assumes stop-loss hits before take-profit. Conservative and realistic.

Best-Case

Assumes take-profit hits first. Optimistic bound.

Ignore

Skips ambiguous trades entirely. Strictest interpretation.

Our conservative default yields lower-bound performance estimates whenever a candle is ambiguous, so reported strategy results do not depend on best-case fill assumptions.
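The three modes above can be sketched as a small resolution function. This is a simplified illustration for a long position, not the engine's actual code or the paper's full procedure: the names AmbiguityPolicy, Candle and resolve_long_exit are hypothetical, and gaps, slippage and partial fills are ignored.

```python
from dataclasses import dataclass
from enum import Enum


class AmbiguityPolicy(Enum):
    WORST_CASE = "worst_case"  # stop-loss assumed to fill first
    BEST_CASE = "best_case"    # take-profit assumed to fill first
    IGNORE = "ignore"          # ambiguous trade is dropped


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float


def resolve_long_exit(
    candle: Candle,
    stop_loss: float,
    take_profit: float,
    policy: AmbiguityPolicy = AmbiguityPolicy.WORST_CASE,
) -> str:
    """Classify what happens to a long position during one candle.

    Returns "stop_loss", "take_profit", "skipped" (ambiguous trade dropped)
    or "open" (neither level was touched).
    """
    hit_stop = candle.low <= stop_loss
    hit_target = candle.high >= take_profit

    if hit_stop and hit_target:
        # OHLC data cannot tell us which level traded first.
        if policy is AmbiguityPolicy.WORST_CASE:
            return "stop_loss"
        if policy is AmbiguityPolicy.BEST_CASE:
            return "take_profit"
        return "skipped"
    if hit_stop:
        return "stop_loss"
    if hit_target:
        return "take_profit"
    return "open"


# The ambiguous candle from the example: both $95 and $107 lie inside its range.
candle = Candle(open=100.0, high=108.0, low=94.0, close=102.0)
for policy in AmbiguityPolicy:
    print(policy.value, "->", resolve_long_exit(candle, 95.0, 107.0, policy))
```

Only the branch where both levels are touched depends on the policy; unambiguous candles resolve the same way in all three modes.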

Verifiable Claims

Synthetic stress profiles are accompanied by measurable statistical claims. For each profile-asset combination we publish the claimed properties (drawdown ranges, realized-volatility bounds, sign-change frequency, etc.) alongside the measured aggregate over 50 Monte Carlo replicas. This makes the simulator behavior falsifiable rather than asserted.
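A claimed-vs-measured check of this kind might look like the sketch below. It assumes the median as the aggregate across replicas, 252 periods per year for annualisation, and a made-up claimed range; none of these are confirmed details of the published snapshot, and the function names are hypothetical.

```python
import math
from statistics import median, stdev
from typing import Dict, List, Sequence, Tuple


def log_returns(prices: Sequence[float]) -> List[float]:
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_vol(prices: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized sample standard deviation of log returns (252 periods is an assumption)."""
    return stdev(log_returns(prices)) * math.sqrt(periods_per_year)


def sign_change_frequency(prices: Sequence[float]) -> float:
    """Share of consecutive return pairs whose signs differ (zero returns are skipped)."""
    signs = [1 if r > 0 else -1 for r in log_returns(prices) if r != 0]
    if len(signs) < 2:
        return 0.0
    flips = sum(a != b for a, b in zip(signs, signs[1:]))
    return flips / (len(signs) - 1)


def check_claim(values: Sequence[float], claimed: Tuple[float, float]) -> Dict[str, object]:
    """Compare the median of per-replica values with a claimed (low, high) range."""
    low, high = claimed
    measured = median(values)
    return {"claimed": claimed, "measured": measured, "within_claim": low <= measured <= high}


# Three short illustrative replicas (a real profile would use 50 full-length paths).
replicas = [
    [100.0, 102.0, 99.0, 103.0, 98.0, 104.0],
    [100.0, 101.0, 103.0, 100.0, 102.0, 99.0],
    [100.0, 98.0, 97.0, 99.0, 96.0, 98.0],
]
report = check_claim([sign_change_frequency(p) for p in replicas], claimed=(0.6, 1.0))
print(report)
print(f"realized vol of first replica: {realized_vol(replicas[0]):.2f}")
```

Reporting the measured value next to the claimed range, rather than only a pass flag, is what lets a reader see how far off a failing claim is.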

Conformance-distribution disclosure

A profile name (e.g. sharp_crash_synthetic) describes the initial conditions imposed on the agent-based simulator — not a guarantee that every replica realises the strict gating threshold of the associated failure-mode definition. Agent-based market dynamics produce a spectrum of outcomes under fixed conditional setups, just as the listed historical SHARP_CRASH anchors themselves spread from −10% (Volmageddon 2018) to −32% (Lehman 2008) over comparable windows.

For each synthetic profile we therefore publish the per-replica conformance distribution — how many of the 50 replicas qualify under the gating definition, how many fall in the descriptive near-band, and how many are sub-threshold. This replaces a binary “profile satisfies definition” claim with the more honest characterisation of emergent variability.
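As a rough sketch of how a per-replica conformance distribution could be tallied from maximum drawdowns: the thresholds below (a gate at −20% and a near-band edge at −15%) and the function name are hypothetical placeholders, not the published definitions, and the ten sample values stand in for the 50 replicas.

```python
from collections import Counter
from typing import Dict, Sequence


def conformance_distribution(
    drawdowns: Sequence[float], gate: float, near_band: float
) -> Dict[str, int]:
    """Count replicas per conformance class.

    drawdowns, gate and near_band are negative fractions; gate is the
    stricter (more negative) threshold.
    """
    if not gate <= near_band <= 0:
        raise ValueError("expected gate <= near_band <= 0")
    counts: Counter = Counter()
    for dd in drawdowns:
        if dd <= gate:
            counts["qualifying"] += 1      # meets the gating definition
        elif dd <= near_band:
            counts["near_band"] += 1       # close to the gate, but short of it
        else:
            counts["sub_threshold"] += 1   # clearly milder than the definition
    return {k: counts[k] for k in ("qualifying", "near_band", "sub_threshold")}


# Illustrative per-replica max drawdowns (not simulator output).
replica_dds = [-0.31, -0.22, -0.20, -0.18, -0.16, -0.15, -0.12, -0.25, -0.09, -0.21]
print(conformance_distribution(replica_dds, gate=-0.20, near_band=-0.15))
```

The three counts always sum to the number of replicas, so the distribution can be read directly as shares of the profile.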

Known limits & off-band calibrations

Not every synthetic probe falls within its methodology-expected conformance band. Some deviations are sampling-marginal at n=50; others are structural. For example, the BTC failure-mode definitions do not identify the asset's empirical stress regime, because the relative-baseline definition applied to an 80% annualised baseline does not produce a separable threshold.

We disclose these deviations rather than smoothing them away in a calibration loop. The full breakdown lives on the dedicated verifiability page, so that the methodology overview here stays focused on the “what” and “why”. It covers the empirical-identifiability verification for BTC, the structural vs sampling-marginal off-band classification with Wilson 95% confidence intervals, the resolution path, and the full list of methodology limitations, including score definition-dependence.
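For readers unfamiliar with the Wilson interval, the sketch below shows the standard score-interval formula applied to a conformance count out of 50 replicas. The function name and the example count of 30 are illustrative assumptions; the published intervals are on the verifiability page.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 30 of 50 replicas qualify under the gating definition.
low, high = wilson_interval(30, 50)
print(f"observed 60%, Wilson 95% CI = [{low:.3f}, {high:.3f}]")
```

At n=50 the interval spans roughly 0.46 to 0.72 around an observed 60%, which illustrates why a modest shortfall against a band can be sampling-marginal rather than structural.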

Where to read what:

/verifiability — off-band tables, Wilson CIs, BTC identifiability verification, 9-point methodology-limitations list, per-dataset claim validation, raw verifiability_snapshot.json
This page (/methodology) — conceptual overview, failure-mode taxonomy, operational definitions, scope of claims

A snapshot of claimed-vs-measured properties across all 32 synthetic stress datasets — including the per-replica conformance distribution and the off-band disclosures above — is available as a browsable page and as raw JSON:

/verifiability — browsable per-dataset claim validation with aggregated metrics
/verifiability_snapshot.json — raw machine-readable snapshot (CC-BY 4.0)

Where measured values lie outside the claimed operational range, the snapshot reports the deviation transparently. Honest disclosure of where the simulator's behavior is more or less pronounced than its operational thresholds builds the kind of trust that asserted claims cannot.

Scope of Claims

Honest about what this is — and what it isn't:

What We Do

Replay strategies across real historical OHLC slices from selected high-stress periods
Generate controllable synthetic stress probes for failure modes history didn't deliver cleanly
Calibrate simulator agents from actual CFTC CoT data per asset
Aggregate across cases to produce a robustness score with explicit failure-mode breakdown

What We Don't Claim

To replicate the full statistical distribution of any real instrument
To predict future returns or future market behavior
That synthetic probes match every empirical stylized fact (kurtosis, leverage effect, volume correlation) — they're targeted tools, not market replicas
That out-of-sample walk-forward validation has been performed (yet)

Synthetic probes are deliberately narrow. A "low-vol grind" simulation reproduces the regime shape (sustained low realised volatility with shallow drift) but is not meant to be statistically indistinguishable from real-market low-vol periods on every higher-order moment. The empirical anchors carry the broad-distribution realism; the synthetic probes carry the controllable-stress dimension.

Academic References

Löw, R.W., Maier-Paape, S. & Platen, A. (2015)

"Correctness of Backtest Engines"

arXiv:1509.08248

Foundational paper on handling ambiguous intra-bar events in backtesting. Our engine implements their worst-case execution model.

Cont, R. (2001)

"Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues"

Quantitative Finance, 1(2), 223-236. Catalog of empirical regularities that inform how we shape synthetic stress probes (volatility clustering, fat tails, regime persistence) — without claiming to reproduce them all to publication standard.

Farmer, J.D. & Foley, D. (2009)

"The Economy Needs Agent-Based Modelling"

Nature, 460(7256), 685-686. Theoretical foundation for agent-based market simulation that generates emergent regime transitions and crisis dynamics.

LeBaron, B. (2006)

"Agent-based Computational Finance"

Handbook of Computational Economics, Vol. 2. Survey of heterogeneous agent models that inform our market participant observer architecture.

Want to know how to use the platform?

See the Documentation for practical guidance on interpreting results, understanding metrics, and known limitations.

Regulatory Note: All outputs are analytical simulations for research purposes only. They do not constitute investment advice, financial planning, or performance projections.