For AI agents and autonomous systems

Structured robustness and stress-testing infrastructure for AI systems.

AI agents — coding assistants, research systems, autonomous finance tooling — can use CrashTestYourStrategy to evaluate the robustness, fragility, and synthetic-stress behaviour of systematic trading strategies. The platform supplies structured observations; the agent decides what to do with them.

This page is the narrative overview. For the machine-readable capability declaration (schemas, controlled vocabularies, rate limits, request/response examples), see /interop.

How AI agents use the platform

  1. Agent generates a strategy hypothesis

    A coding agent, research assistant, or autonomous system produces a candidate trading rule — natural language or canonical strategy schema.

  2. Agent submits to CrashTestYourStrategy

    A single call to POST /api/v1/agent/analyze. The platform parses the input if it is natural language, then runs the strategy against the curated case set.

  3. Platform evaluates robustness across regimes

    63 cases: 31 empirical historical anchors (Lehman 2008, COVID 2020, Luna 2022, …) and 32 synthetic stress probes with Monte-Carlo replicas. Each replica is classified ex post into a failure-mode bucket.

  4. Agent receives structured fragility semantics

    A typed response: robustness class, per-regime behaviour labels, top failure modes with impact-type ontology, evaluation confidence, methodological limitations. Not advice — observations.

  5. Agent reasons with the result

    Comparing strategies, flagging tail risk, requesting variants, refusing weak setups — those decisions belong to the agent. The platform supplies structured ground-truth, not instructions.
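
Step 5 can be sketched as a small agent-side policy. This is a minimal sketch, assuming the field names shown in the response excerpt on this page (robustness_score, evaluation_confidence.classified_ratio). The thresholds and the helper names triage and pick_more_robust are hypothetical choices made by the agent, not something the platform defines.

```python
from typing import Any

# Hypothetical agent-side policy thresholds; the platform does not define these.
MIN_SCORE = 40.0             # below this, the agent refuses the setup
MIN_CLASSIFIED_RATIO = 0.5   # below this, the evaluation is treated as too thin


def triage(result: dict[str, Any]) -> str:
    """Return 'review', 'reject', or 'keep' for one analysis result."""
    score = float(result["robustness_score"])
    ratio = float(result["evaluation_confidence"]["classified_ratio"])
    if ratio < MIN_CLASSIFIED_RATIO:
        # Too few replicas were classified: defer instead of trusting the score.
        return "review"
    if score < MIN_SCORE:
        return "reject"
    return "keep"


def pick_more_robust(a: dict[str, Any], b: dict[str, Any]) -> dict[str, Any]:
    """Compare two results by robustness_score; ties go to the first."""
    return a if a["robustness_score"] >= b["robustness_score"] else b
```

The platform only supplies the numbers and labels. Where the cut-offs sit, and whether low confidence means "review" or "reject", stays with the agent.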

What the platform does

Natural-language strategy parsing

Strategy text → canonical strategy schema, LLM-based with structured-output JSON-Schema enforcement.

Robustness evaluation

Multi-regime stress analysis across 9 failure-mode score buckets, using V2 per-bucket scoring with shrinkage and a-priori augmentation.
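
For readers unfamiliar with shrinkage, the idea can be illustrated with a generic textbook estimator. This is a sketch of the general technique only: the platform's V2 formula is not given on this page, and the function name, the parameter k, and the choice of a pool score as the shrinkage target are all assumptions.

```python
def shrink_toward_pool(raw_score: float, n_replicas: int,
                       pool_score: float, k: float = 10.0) -> float:
    """Generic shrinkage: pull a per-bucket score toward a pool-level score.

    With few replicas the result stays close to pool_score; as n_replicas
    grows it approaches raw_score. k (hypothetical) sets how fast that happens.
    """
    if n_replicas < 0 or k <= 0:
        raise ValueError("n_replicas must be >= 0 and k must be > 0")
    weight = n_replicas / (n_replicas + k)  # weight given to the observed score
    return weight * raw_score + (1.0 - weight) * pool_score
```

With k = 10, a bucket backed by 10 replicas lands halfway between its raw score and the pool score, and a bucket with no replicas falls back to the pool score entirely.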

Synthetic stress scenarios

Hybrid Field-SME simulator with per-asset CFTC Commitment-of-Traders calibration. Profiles for whipsaw, low-vol grind, sharp crash, vol expansion, liquidity stress, v-recovery, slow decline.

Failure-mode classification

Per-replica ex-post classification against operational gating definitions; multi-tagging supported; sub-threshold replicas augmented from a-priori tags for sparse-asset coverage.

Agent-optimised API

Deterministic structured outputs with controlled vocabularies (behavior, severity, impact_type) and citation-stable URLs.

What the platform does NOT do

Six explicit non-capabilities. These are not roadmap items — they are deliberately outside the model's scope. Agents that need any of these capabilities should route to a different tool, not the CrashTestYourStrategy API.

Market prediction

The platform is scenario-based; it does not produce a probability distribution over future market outcomes.

Investment recommendations

Output is descriptive — per-regime behaviour, failure-mode scores. Not buy/sell signals.

Suitability assessment

Per BaFin/WpHG framing, the API does not evaluate whether a strategy is suitable for a user, account, or risk tolerance.

Portfolio allocation advice

Strategies are evaluated in isolation. Cross-asset sizing, correlation, and weighting are outside the model.

Guaranteed outcomes

Scenario-based evaluation over a curated case set, not a forecast over the future tape.

Live trading / execution

No broker integration. The platform analyses strategies; it does not run them.

Example agent interaction

A single round-trip: the agent sends a strategy description and gets back structured robustness semantics.

Request

POST /api/v1/agent/analyze

{
  "input_type": "natural_language",
  "strategy_text": "Buy BTC when RSI(14) falls below 30, sell when RSI(14) rises above 70. 5% stop loss.",
  "asset": "BTC"
}
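
The request above could be assembled in Python roughly as follows. This is a minimal sketch using only the standard library. The host in BASE_URL is an assumption inferred from the report_url in the response excerpt, the helper name build_analyze_request is hypothetical, and authentication and rate limits are not covered here (see /interop for those).

```python
import json
import urllib.request

# Assumption: the API is served from the same host as report_url.
BASE_URL = "https://crashtestyourstrategy.com"


def build_analyze_request(strategy_text: str, asset: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/agent/analyze request."""
    payload = {
        "input_type": "natural_language",
        "strategy_text": strategy_text,
        "asset": asset,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/agent/analyze",
        data=json.dumps(payload).encode("utf-8"),  # json.dumps escapes any newlines
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Building the body with json.dumps avoids hand-written JSON, where a raw line break inside a string is invalid.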

Response (excerpt)

{
  "robustness_score": 50.6,
  "robustness_class": "moderate",
  "evaluation_confidence": {
    "level": "moderate",
    "classified_ratio": 0.59,
    ...
  },
  "summary": "Robustness 51/100 (moderate). Strong in TREND_UP, SLOW_BEAR. Weak in SHARP_CRASH.",
  "top_failure_modes": [
    {
      "mode": "SHARP_CRASH",
      "severity": "moderate",
      "impact_type": "tail_loss_acceleration",
      ...
    }
  ],
  "report_url":
    "https://crashtestyourstrategy.com/r/...",
  "schema_version": "ctys-agent-v1"
}
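
An agent may prefer to read the response into typed objects instead of passing raw dictionaries around. The sketch below covers only the fields visible in the excerpt above; the elided fields are left out, and the class and function names are hypothetical. Treating an unexpected schema_version as an error is one possible design choice, not documented platform behaviour.

```python
from dataclasses import dataclass
from typing import Any

EXPECTED_SCHEMA = "ctys-agent-v1"  # value taken from the excerpt above


@dataclass(frozen=True)
class FailureMode:
    mode: str
    severity: str
    impact_type: str


@dataclass(frozen=True)
class AnalysisResult:
    robustness_score: float
    robustness_class: str
    confidence_level: str
    classified_ratio: float
    summary: str
    top_failure_modes: tuple[FailureMode, ...]
    report_url: str
    schema_version: str


def parse_result(raw: dict[str, Any]) -> AnalysisResult:
    """Convert a decoded JSON response into an AnalysisResult."""
    if raw.get("schema_version") != EXPECTED_SCHEMA:
        raise ValueError(f"unexpected schema_version: {raw.get('schema_version')!r}")
    confidence = raw["evaluation_confidence"]
    modes = tuple(
        FailureMode(m["mode"], m["severity"], m["impact_type"])
        for m in raw["top_failure_modes"]
    )
    return AnalysisResult(
        robustness_score=float(raw["robustness_score"]),
        robustness_class=raw["robustness_class"],
        confidence_level=confidence["level"],
        classified_ratio=float(confidence["classified_ratio"]),
        summary=raw["summary"],
        top_failure_modes=modes,
        report_url=raw["report_url"],
        schema_version=raw["schema_version"],
    )
```

Failing fast on a schema mismatch keeps the agent from silently reasoning over fields whose meaning may have changed.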

For the full canonical schema, controlled vocabularies, and OpenAPI definition see /interop or /api/docs.

Synthetic scenario examples

Beyond historical anchors, agents can probe behaviour under controlled synthetic regimes — composed of operational primitives (drift, realised volatility, drawdown structure, regime persistence) rather than freeform prediction.

Q: How does RSI-mean-reversion behave during BTC liquidity stress?

Returns degrade with drawdown expansion under sustained high volatility and deep drawdown.

Q: What happens to SMA-200 crossover in a sharp-crash setup?

Trend-following misses the regime change; bucket score drops 15+ points vs pool baseline.

Q: How does MACD perform across volatility-expansion regimes?

Whipsaw-sensitivity raises sign-change losses; impact_type=volatility_amplification.

Q: Which strategies survive slow-bear conditions across the catalog?

Catalog query sorted by SLOW_BEAR bucket-score returns the top performers in one call.
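
The ranking in the last answer can be mimicked on the client side once per-bucket scores are in hand. The catalog endpoint and its response shape are not shown on this page, so the entry shape used here (a mapping with a bucket_scores field keyed by bucket name) and the helper name top_by_bucket are purely hypothetical.

```python
from typing import Any, Mapping, Sequence


def top_by_bucket(entries: Sequence[Mapping[str, Any]], bucket: str,
                  limit: int = 5) -> list[Mapping[str, Any]]:
    """Return up to `limit` entries, highest score in `bucket` first.

    Entries with no score for the bucket are skipped instead of being
    treated as zero, so missing coverage is not mistaken for fragility.
    """
    scored = [e for e in entries if bucket in e.get("bucket_scores", {})]
    return sorted(scored, key=lambda e: e["bucket_scores"][bucket],
                  reverse=True)[:limit]
```

A call such as top_by_bucket(entries, "SLOW_BEAR") gives the same ordering the catalog query describes, assuming the entries carry that bucket's score.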

Philosophy

The platform evaluates how systematic strategies behave under adverse and hypothetical market conditions. It does not predict markets, recommend strategies, or assess suitability for any specific use. Outputs are structured observations — the agent or analyst supplies the decision framework.

The longer-term direction is structured robustness and fragility infrastructure for systematic strategy evaluation: a stable, citable layer that AI systems and research tooling can build on without re-implementing the case catalog, the failure-mode taxonomy, or the regime simulator.

Operated under German jurisdiction (BaFin/WpHG framework). Descriptive, not advisory.