# CrashTestYourStrategy

> Stress-regime intelligence layer for trading strategies.
> Tests strategy behavior against a curated catalog of historical and synthetic stress cases,
> classified along an explicit 10-axis failure-mode taxonomy.

## Overview

CrashTestYourStrategy is a research-grade backtest framework that evaluates how trading
strategies behave under specific market stress conditions. Rather than producing a single
backtest number conditional on one historical path, it runs each strategy against a curated
catalog of stress cases — real historical OHLC slices combined with controllable synthetic
probes — and reports performance by failure mode.

This is NOT a financial advisory tool. It does not make forecasts, recommendations, or
predictions. All outputs are model-based simulations under simplified assumptions.

## Architecture

- **Case catalog**: 63 stress cases — 31 empirical anchors (real historical OHLC: Lehman
  2008, Dotcom 2000, COVID 2020, Luna 2022, Volmageddon 2018, Taper Tantrum 2013, etc.)
  plus 32 synthetic stress probes (50 Monte-Carlo replicas each, across profile families:
  low-vol grind, controlled whipsaw, slow-stagflation, hyperinflation, demand-destruction,
  sharp-crash setup, vol-expansion setup, liquidity-stress setup, v-recovery setup,
  slow-decline-with-partial-recovery, slow-crash-no-recovery — the last two are intentionally
  distinct stress hardness levels).
- **Failure-mode taxonomy (10 axes; 9 score buckets since May 2026)**: TREND_UP, TREND_DOWN,
  SIDEWAYS, VOL_EXPANSION, VOL_COMPRESSION, SHARP_CRASH, SLOW_BEAR, V_RECOVERY, WHIPSAW,
  LIQUIDITY_STRESS. V_RECOVERY is treated as a diagnostic path pattern (decomposed into
  SHARP_CRASH down-leg + TREND_UP up-leg) for score aggregation, leaving 9 effective
  score buckets. The 10-axis label is retained for case-tagging and UI continuity.
  Every case is tagged with one or more failure modes; a single case (e.g. Lehman 2008)
  can contribute to multiple buckets simultaneously.
- **Conditional-Distribution-Engine (May 2026, Phase 17/18a)**: profile names describe
  *imposed conditions* on the agent-based simulator, not output guarantees. Each replica
  is classified ex post into failure modes (FM) against the operational gating
  definitions, so replicas spread across the FM spectrum; a replica may carry several
  tags or fall below every threshold. Per-FM-bucket scoring with shrinkage and
  dominance-weighted multi-FM attribution is implemented in V2; the production engine
  still runs V1 (no FM-bucketing in the composite). See `/verifiability` for the full
  architecture-status disclosure.
- **Aggregation**: Strategy performance is computed per case, then aggregated by
  failure mode to produce a robustness score (0-100) plus a per-mode breakdown.
- **Synthetic-probe simulator**: Hybrid Field-SME agent-based model with CFTC
  Commitment-of-Traders calibration (~3,000 weekly reports). Used only for the 32
  synthetic stress probes, not for replicating full instruments.
- **Backtest correctness**: Implements the Löw, Maier-Paape & Platen (2015) methodology
  for ambiguous OHLC candles, defaulting to worst-case execution.
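
The per-mode breakdown described in the taxonomy and aggregation bullets can be sketched as follows. This is a minimal illustration, not the engine's actual code: the case IDs, scores, and tags are made up, the bucket statistic is assumed to be a plain mean, and a V_RECOVERY case is credited in full to both of its legs, whereas the engine may score the two legs separately.

```python
from collections import defaultdict
from statistics import mean

# Assumption: V_RECOVERY is folded into its two legs for scoring.
# Simplification: the whole-case score is credited to both legs.
DECOMPOSE = {"V_RECOVERY": ("SHARP_CRASH", "TREND_UP")}


def per_mode_breakdown(case_scores, case_tags):
    """Average hypothetical per-case scores within each failure-mode bucket."""
    buckets = defaultdict(list)
    for case_id, score in case_scores.items():
        # Expand diagnostic tags, then de-duplicate so that a case counts
        # at most once per bucket.
        targets = set()
        for tag in case_tags[case_id]:
            targets.update(DECOMPOSE.get(tag, (tag,)))
        for mode in targets:
            buckets[mode].append(score)
    return {mode: mean(scores) for mode, scores in sorted(buckets.items())}


# Made-up scores and tags, for illustration only.
scores = {"lehman_2008": 40.0, "covid_2020": 60.0, "bull_2017": 80.0}
tags = {
    "lehman_2008": ["SHARP_CRASH", "LIQUIDITY_STRESS"],
    "covid_2020": ["V_RECOVERY"],
    "bull_2017": ["TREND_UP"],
}
breakdown = per_mode_breakdown(scores, tags)
```

In this toy input the multi-tagged case feeds two buckets at once, and no V_RECOVERY bucket appears in the result, which mirrors the 9 effective score buckets.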

## Supported Assets

SPY (S&P 500), QQQ (Nasdaq 100), BTC (Bitcoin), ETH (Ethereum), GOLD, WTI (Crude Oil),
VIX (Volatility Index).

Each asset has its own case set and CoT-derived agent calibration where applicable.

## Supported Indicators

SMA, EMA, RSI, MACD, Bollinger Bands, ATR, Stochastic, CCI, Williams %R, Supertrend,
Parabolic SAR, Aroon, Donchian, and others. Full list at /strategy-guide.

Not yet supported: volume-based indicators (OBV, VWAP, MFI, CMF). Strategies that
reference volume parse correctly, but their volume-dependent signals never fire:
although synthetic-case OHLCV data includes volume columns, the engine does not
generate signals from volume.

## Pre-Computed Strategy Reports

Nine curated strategies have pre-computed reports available at /s/{slug}, designed to
span contrasting failure-mode profiles rather than indicator redundancy:

- /s/buy-hold-spy — Buy and Hold S&P 500 (passive equity baseline)
- /s/buy-hold-btc — Buy and Hold Bitcoin (passive crypto baseline)
- /s/sma-200-crossover — SMA 200 Crossover (long-term trend-following)
- /s/ema-12-26-crossover — EMA 12/26 Crossover (short-term trend)
- /s/macd-crossover — MACD Signal Line Crossover (composite trend + momentum)
- /s/rsi-30-70 — RSI 30/70 Mean Reversion (oscillator-based mean-reversion)
- /s/bollinger-bounce — Bollinger Band Bounce (volatility-band mean-reversion)
- /s/rsi-sma-trend-filter — RSI with SMA-200 Trend Filter (hybrid)
- /s/supertrend — Supertrend Strategy (volatility-adaptive trend)

Each report contains: robustness score (0-100), failure-mode breakdown per regime,
worst-case simulated drawdown (WCDD-95), distribution statistics, score decomposition
(drawdown resilience, failure rate, regime consistency, return stability, path
sensitivity), best/median/worst path equity curves, and example trades per regime.

For indicators with long warm-up periods (e.g. SMA-200, which requires 300 days),
a synthetic case is skipped automatically when its duration cannot accommodate the
warm-up plus a tradeable window. Each skip is reported transparently in the response.
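
The skip rule amounts to a length check. The sketch below is only an illustration of that check: the function name and the `min_tradeable_days` default are hypothetical, since the source does not state how long the tradeable window must be.

```python
def should_skip_case(case_days: int, warmup_days: int,
                     min_tradeable_days: int = 30) -> bool:
    """Return True when a case is too short for warm-up plus a tradeable window.

    min_tradeable_days is a hypothetical placeholder; the real threshold
    is not documented here.
    """
    return case_days < warmup_days + min_tradeable_days


# A 250-day synthetic case cannot host a 300-day warm-up; a 400-day case can.
short_case_skipped = should_skip_case(case_days=250, warmup_days=300)
long_case_skipped = should_skip_case(case_days=400, warmup_days=300)
```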

JSON access: `GET /api/v1/strategies/{slug}`.
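
A minimal way to read a report from Python, using only the standard library. It assumes the API is served from the same host as the website listed under Contact; the helper names are illustrative, and the structure of the returned JSON is not documented here, so the sketch returns it unmodified.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the API shares the website's host.
BASE_URL = "https://crashtestyourstrategy.com"


def strategy_url(slug: str) -> str:
    """Build the report URL for a pre-computed strategy slug."""
    return f"{BASE_URL}/api/v1/strategies/{quote(slug, safe='')}"


def fetch_strategy_report(slug: str, timeout: float = 10.0) -> dict:
    """Download and decode the JSON report (performs a network request)."""
    with urlopen(strategy_url(slug), timeout=timeout) as resp:
        return json.load(resp)


url = strategy_url("sma-200-crossover")
```

Calling `fetch_strategy_report("sma-200-crossover")` would issue the request; building the URL separately keeps that step testable offline.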

## Per-Asset Case Coverage

- **SPY**: Bull Run 2017, China Sideways 2015, Fed Bear 2022 H2, Lehman GFC 2008,
  COVID 2020, Vol Shock Feb 2018, plus 4 synthetic probes
- **QQQ**: Dotcom Crash 2000, Tech Rally 2017, COVID + Tech V-Recovery, Tech Bear
  2022, Vol Shock Feb 2018, plus 4 synthetic probes
- **VIX**: Low Vol 2017, Volmageddon 2018, COVID Spike 2020
- **BTC**: 2017 Parabolic ATH, Crypto Winter 2018, Luna Collapse 2022, FTX Collapse
  2022, Sideways 2023, plus 1 synthetic probe
- **ETH**: Crypto Winter 2018, Vol Shock 2018, COVID + DeFi Recovery 2020, plus 1
  synthetic probe
- **WTI**: Pre-GFC Oil Boom, Saudi Oil War 2014-2015, Sideways 2018, Negative Oil
  2020, plus 1 synthetic probe
- **GOLD**: Post-GFC Inflation Hedge 2010, Sideways 2014, Taper Tantrum 2013, COVID
  + ATH 2020, plus 2 synthetic probes

API endpoints:
- `GET /api/v1/strategies/{slug}/cases?asset={asset}` — all cases for an asset
  (Strategy vs Buy-and-Hold reference)
- `GET /api/v1/strategies/{slug}/cases/{asset}/{case_id}` — single case with
  equity curves
- `GET /api/v1/strategies/case-assets/list` — assets with case studies

## Example Queries This Site Answers

- "How does SMA 200 crossover perform in the 2008 financial crisis?"
- "Is RSI 30/70 robust during the Luna collapse?"
- "Which trading strategies survive sharp crashes historically?"
- "How fragile are mean-reversion strategies under volatility expansion?"
- "Strategy behavior during whipsaw markets — what works?"
- "Worst-case drawdown for momentum strategies in COVID 2020?"
- "Buy-and-hold vs trend-following during slow bear regimes?"
- "Cross-asset robustness: does an SPY-tuned strategy work on Bitcoin?"

Each case-study response provides: strategy total return, maximum drawdown,
Sharpe ratio, equity curve (single curve for empirical anchors; median plus
p25/p75 band for synthetic cases), and buy-and-hold reference for direct
comparison.
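
For synthetic cases, a median curve with a p25/p75 band can be derived step by step across replicas. The following is a sketch under assumptions: replica curves are taken to be equal-length lists, and the quantile interpolation method (`inclusive`) is a choice made here, not one confirmed by the source.

```python
from statistics import quantiles


def equity_band(replica_curves):
    """Return per-step (p25, median, p75) across equal-length equity curves."""
    band = []
    for step_values in zip(*replica_curves):
        # n=4 yields the three quartile cut points.
        p25, p50, p75 = quantiles(step_values, n=4, method="inclusive")
        band.append((p25, p50, p75))
    return band


# Five made-up replica equity curves with three time steps each.
curves = [
    [100.0, 90.0, 95.0],
    [100.0, 100.0, 105.0],
    [100.0, 110.0, 100.0],
    [100.0, 120.0, 130.0],
    [100.0, 130.0, 110.0],
]
band = equity_band(curves)
```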

## Failure-Mode Definitions

The 10-axis failure-mode taxonomy is defined at `/methodology` with operational
thresholds, historical examples, common strategy failure patterns, and
distinctions between similar regimes. Each definition is published as a
Schema.org DefinedTerm to support machine extraction.

Direct anchors: `/methodology#failure-mode-{slug}` (e.g. sharp-crash, vol-expansion,
whipsaw, liquidity-stress).
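
The example slugs suggest a lowercase, hyphenated form of the axis name. The helper below assumes that pattern holds for all ten axes, which is only confirmed for the four examples given.

```python
AXES = [
    "TREND_UP", "TREND_DOWN", "SIDEWAYS", "VOL_EXPANSION", "VOL_COMPRESSION",
    "SHARP_CRASH", "SLOW_BEAR", "V_RECOVERY", "WHIPSAW", "LIQUIDITY_STRESS",
]


def methodology_anchor(axis: str) -> str:
    """Map an axis name to its presumed anchor on /methodology."""
    slug = axis.lower().replace("_", "-")
    return f"/methodology#failure-mode-{slug}"


anchors = {axis: methodology_anchor(axis) for axis in AXES}
```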

## Verifiability — Falsifiable Model Claims

Synthetic stress profiles are accompanied by measurable statistical claims.
For each profile-asset combination, the site publishes claimed operational
properties (drawdown ranges, realized-volatility bounds, sign-change
frequency, etc.) alongside the measured aggregate over 50 Monte Carlo
replicas — including transparent disclosure of where measured values
deviate from the claimed range.

- `/verifiability` — browsable per-dataset claim validation with aggregated
  metrics across all 32 synthetic stress datasets, plus the full off-band
  calibration disclosure (BTC empirical-identifiability verification, Wilson
  95% CI for marginal cases, 9-point methodology-limitations list including
  score definition-dependence)
- `/verifiability_snapshot.json` — raw machine-readable snapshot
  (CC-BY 4.0, refreshed on data regeneration)
- Schema.org Dataset markup published on the verifiability page

This is the falsifiability layer: model behavior is not asserted but
measured against published claims.
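
One way to quantify how well replicas match a claimed range is to count the replicas that fall inside it and attach a Wilson score interval to that share. The Wilson formula itself is standard; how the site applies it is an assumption of this sketch, as are the function names and the sample values.

```python
from math import sqrt

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def wilson_interval(successes: int, n: int, z: float = Z_95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def in_band(values, lo: float, hi: float):
    """Count values inside [lo, hi] and return (hits, Wilson interval)."""
    hits = sum(lo <= v <= hi for v in values)
    return hits, wilson_interval(hits, len(values))


# Hypothetical: 45 of 50 replicas land inside the claimed drawdown range.
ci_low, ci_high = wilson_interval(45, 50)
```

With 45 of 50 replicas in range the interval is roughly 0.79 to 0.96, which shows how wide the uncertainty remains at 50 replicas.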

## Pages

- `/` — Strategy Lab: parse a natural-language strategy and run it against the
  case catalog
- `/s/{slug}` — Pre-computed strategy reports
- `/methodology` — Failure-mode taxonomy, simulator architecture, CoT calibration,
  backtest-correctness methodology, scope of claims
- `/strategy-guide` — Parser syntax: supported indicators, operators, examples
- `/demo` — Interactive walkthrough
- `/documentation` — User guide and metric interpretation
- `/legal/impressum` — Impressum (German)
- `/legal/datenschutz` — DSGVO/GDPR privacy policy
- `/legal/disclaimer` — Legal disclaimer

## API Endpoints (read-only, public)

- `GET /api/v1/strategies/` — list of pre-computed strategies
- `GET /api/v1/strategies/{slug}` — full strategy report (JSON)
- `GET /api/v1/strategies/{slug}/cases?asset={asset}` — case-study results per asset
- `GET /api/v1/strategies/{slug}/cases/{asset}/{case_id}` — single case detail
- `POST /api/v1/nl/parse` — parse a natural-language strategy description
- `POST /api/v1/backtest/robustness/quick` — run a strategy against the case catalog

## Technical Details

- Backend: FastAPI (Python), async SQLAlchemy, deployed via Railway
- Frontend: React 19 + TypeScript + Tailwind CSS + Vite
- Strategy parsing: LLM-based with structured-output JSON-Schema enforcement
  (GPT-4o-mini via OpenRouter, Llama via Groq, Claude as fallback)
- Synthetic-probe simulator: Numba JIT-compiled, per-asset CoT calibration
- Backtest engine: Löw, Maier-Paape & Platen (2015) ambiguous-candle methodology
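
To illustrate what "ambiguous candle" means here: when both a stop and a target lie inside a single candle's range, OHLC data cannot say which was touched first. The sketch below resolves that case pessimistically for a long position. It is a simplified illustration of worst-case execution, not the procedure from the cited paper; the names are hypothetical and opening gaps through a level are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candle:
    open: float
    high: float
    low: float
    close: float


def resolve_long_exit(candle: Candle, stop: float, target: float):
    """Return (exit_price, reason) for a long position, or None if no exit.

    Simplification: fills occur exactly at the stop or target level.
    """
    hit_stop = candle.low <= stop
    hit_target = candle.high >= target
    if hit_stop and hit_target:
        # Ambiguous: intrabar order unknown, so assume the stop came first.
        return stop, "stop_worst_case"
    if hit_stop:
        return stop, "stop"
    if hit_target:
        return target, "target"
    return None


ambiguous = resolve_long_exit(Candle(100.0, 110.0, 90.0, 105.0), stop=95.0, target=108.0)
```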

## Important Legal Notice

This tool is for informational and educational purposes only. It does not
constitute financial advice, investment recommendations, or forecasts. Results
are based on model-driven simulations under simplified assumptions. Real market
outcomes may differ significantly. Individual financial circumstances are not
considered.

Operated under German jurisdiction (BaFin/WpHG framework). All published
analyses use neutral framing — no rankings, no directive language, no buy/sell
signals.

## Contact

Website: https://crashtestyourstrategy.com
Legal: /legal/impressum