Interop

Structured robustness and stress-testing semantics for systematic trading strategies.

This page is a capability declaration — a stable reference for LLM agents, research tooling, and developers integrating the CrashTestYourStrategy API. Outputs are descriptive (regime behaviour, failure-mode classification, model boundaries) — not advisory. Suitability and decision-making remain with the consumer.

Capabilities

Capability | Description | Endpoint
Strategy parsing | Natural-language description → canonical strategy schema (entry/exit conditions, indicators, stops). | POST /api/v1/agent/analyze (input_type=natural_language)
Robustness evaluation | Run a strategy against a curated case set (31 empirical + 32 synthetic stress cases, 9 failure-mode buckets). | POST /api/v1/agent/analyze
Synthetic scenario modelling | Pre-computed synthetic stress probes (Hybrid Field-SME simulator, per-asset CoT calibration, 50 Monte-Carlo replicas per profile×asset). | Embedded in /agent/analyze responses (per_regime block).
Failure-mode classification | Per-replica ex-post FM-classification against operational gating definitions; per-FM-bucket scoring with shrinkage and a-priori-augmentation. | top_failure_modes + per_regime fields in /agent/analyze response.
Catalog discovery | Semantic search across pre-computed reports by indicator, asset, failure-mode coverage, regime class, or robustness class. | GET /api/v1/catalog/query

Canonical input schema

The /agent/analyze endpoint accepts a discriminated union keyed on the input_type field. Pass either a natural-language description (parsed by the same LLM pipeline used by the web UI) or a canonical strategy JSON, which skips the parse step and is sometimes preferred when one agent generates a strategy and another verifies it.

Variant A — natural language

{
  "input_type": "natural_language",
  "strategy_text": "Buy BTC when RSI(14) falls below 30, sell when above 70. 5% stop loss.",
  "asset": "BTC",
  "num_runs": 30
}

Variant B — canonical strategy JSON

{
  "input_type": "strategy_json",
  "asset": "SPY",
  "num_runs": 30,
  "strategy": {
    "position_sizing": "percent",
    "position_size": 95,
    "max_positions": 1,
    "direction": "long_only",
    "entry_conditions": [
      { "indicator": "rsi_14", "operator": "less_than", "value": 30 }
    ],
    "exit_conditions": [
      { "indicator": "rsi_14", "operator": "greater_than", "value": 70 }
    ],
    "stop_loss": 5.0
  }
}
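
The two variants can be assembled and checked before sending. The following is a minimal Python sketch, assuming only the field names shown in the examples above. The helper names (build_natural_language_request, build_strategy_json_request, validate_request) and the default num_runs of 30 are illustrative assumptions, not part of the API. No HTTP call is made, because authentication and required headers are not described on this page.

```python
import json

ALLOWED_INPUT_TYPES = {"natural_language", "strategy_json"}


def build_natural_language_request(strategy_text, asset, num_runs=30):
    """Variant A: free-text description that the service parses."""
    return {
        "input_type": "natural_language",
        "strategy_text": strategy_text,
        "asset": asset,
        "num_runs": num_runs,
    }


def build_strategy_json_request(strategy, asset, num_runs=30):
    """Variant B: canonical strategy JSON, which skips the parse step."""
    return {
        "input_type": "strategy_json",
        "asset": asset,
        "num_runs": num_runs,
        "strategy": strategy,
    }


def validate_request(payload):
    """Check that the payload carries the field its input_type requires.

    Assumption: the discriminator decides which of strategy_text / strategy
    must be present; the real API may enforce further constraints.
    """
    input_type = payload.get("input_type")
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unknown input_type: {input_type!r}")
    required = "strategy_text" if input_type == "natural_language" else "strategy"
    if required not in payload:
        raise ValueError(f"{input_type} requests need a {required!r} field")
    return payload


# Same strategy as the Variant B example above.
rsi_strategy = {
    "position_sizing": "percent",
    "position_size": 95,
    "max_positions": 1,
    "direction": "long_only",
    "entry_conditions": [{"indicator": "rsi_14", "operator": "less_than", "value": 30}],
    "exit_conditions": [{"indicator": "rsi_14", "operator": "greater_than", "value": 70}],
    "stop_loss": 5.0,
}

request_a = validate_request(build_natural_language_request(
    "Buy BTC when RSI(14) falls below 30, sell when above 70. 5% stop loss.", "BTC"))
request_b = validate_request(build_strategy_json_request(rsi_strategy, "SPY"))
body = json.dumps(request_b)  # JSON body for POST /api/v1/agent/analyze
```

Validating client-side keeps a malformed payload from reaching the endpoint, which is useful when the payload is produced by another agent.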

Canonical output schema

The response is shaped for downstream reasoning, not UI rendering. Numeric, categorical, and narrative fields are separated; controlled vocabularies (behavior, severity, impact_type) are documented below. The schema family is ctys-agent; the current version is ctys-agent-v1.

{
  "robustness_score": 53.2,
  "robustness_class": "moderate",
  "evaluation_confidence": {
    "level": "moderate",
    "classified_ratio": 0.59,
    "sparse_bucket_count": 5,
    "augmented_replicas_total": 305,
    "rationale": "59% of replicas classified ex-post; 5 bucket(s) augmented from a-priori tags."
  },
  "summary": "Robustness 53/100 (moderate). Strong in TREND_UP, SLOW_BEAR. Weak in LIQUIDITY_STRESS.",
  "interpretation_text": "Returns vary substantially across the curated regimes — the model shows very different outcomes in bull versus bear and stress scenarios.",
  "headline_metrics": {
    "median_return": 0.05,
    "p5_return": -0.541,
    "wcdd_95": 0.57,
    "median_drawdown": 0.19,
    "failure_rate_pct": 12.0,
    "n_replicas": 255,
    "asset": "BTC"
  },
  "per_regime": [
    {
      "regime": "TREND_UP",
      "label": "persistent directional uptrend",
      "behavior": "stable",
      "score": 81.6,
      "n_replicas": 39,
      "shrinkage_alpha": 1.0,
      "n_augmented": 0
    }
    // ... 8 more buckets
  ],
  "top_failure_modes": [
    {
      "mode": "LIQUIDITY_STRESS",
      "severity": "high",
      "impact_type": "drawdown_expansion",
      "expected_behavior": "drawdowns deeper than the pool baseline in these scenarios",
      "bucket_score": 37.0,
      "delta_from_pool": -16.3
    }
  ],
  "regime_class_summary": {
    "trend_regimes":         { "avg_score": 73.7, "behavior": "stable",    "buckets": ["TREND_UP", "TREND_DOWN", "SLOW_BEAR"] },
    "tail_event_regimes":    { "avg_score": 44.0, "behavior": "degrading", "buckets": ["SHARP_CRASH", "LIQUIDITY_STRESS"] }
  },
  "caveats": ["..."],
  "methodological_limitations": ["..."],
  "report_url": "https://crashtestyourstrategy.com/s/buy-hold-btc",
  "methodology_url": "https://crashtestyourstrategy.com/methodology",
  "schema_family": "ctys-agent",
  "schema_version": "ctys-agent-v1"
}
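
A consumer will typically confirm the schema family before reading anything else, then pull out the categorical fields. The sketch below shows one way to do that in Python against a trimmed copy of the example response. The function names are hypothetical, and treating a mismatched schema_family as an error is an assumption about consumer policy rather than documented API behaviour.

```python
EXPECTED_FAMILY = "ctys-agent"

# Trimmed excerpt of the example response above (same values).
sample_response = {
    "robustness_score": 53.2,
    "robustness_class": "moderate",
    "per_regime": [
        {"regime": "TREND_UP", "behavior": "stable", "score": 81.6, "n_replicas": 39},
    ],
    "top_failure_modes": [
        {"mode": "LIQUIDITY_STRESS", "severity": "high",
         "impact_type": "drawdown_expansion",
         "bucket_score": 37.0, "delta_from_pool": -16.3},
    ],
    "schema_family": "ctys-agent",
    "schema_version": "ctys-agent-v1",
}


def check_schema(response):
    """Return the schema version, refusing responses from another family."""
    family = response.get("schema_family")
    if family != EXPECTED_FAMILY:
        raise ValueError(f"unexpected schema_family: {family!r}")
    return response["schema_version"]


def regimes_by_behavior(response):
    """Group regime names under their behavior label."""
    grouped = {}
    for entry in response.get("per_regime", []):
        grouped.setdefault(entry["behavior"], []).append(entry["regime"])
    return grouped


def weakest_failure_mode(response):
    """Return the failure mode with the most negative delta_from_pool, or None."""
    modes = response.get("top_failure_modes", [])
    return min(modes, key=lambda fm: fm["delta_from_pool"]) if modes else None


version = check_schema(sample_response)
grouped = regimes_by_behavior(sample_response)
weakest = weakest_failure_mode(sample_response)
print(version, grouped, weakest["mode"])
```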

Semantics

The schema defines three strict controlled vocabularies and one regime-class taxonomy. Values are stable across schema versions within the same family; new values are additive and announced.

Behavior (per regime)

Value | Meaning
stable | Bucket score matches or exceeds the overall-pool score; no meaningful regime-specific weakness.
sensitive | Bucket score below pool but within ~10 points; mild underperformance vs the pool baseline.
degrading | Bucket score >10 points below pool; meaningful regime-specific weakness.
inactive | Bucket contains <3 replicas (after augmentation); not enough data to characterise behavior.
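
The behavior rules above can be written as a small decision function. This Python sketch is an illustration only: the function name is hypothetical, the "~10 points" band is treated as exactly 10.0, and checking the replica count before the score is an assumption about precedence that the table does not state.

```python
def classify_behavior(bucket_score, pool_score, n_replicas,
                      sensitive_band=10.0, min_replicas=3):
    """Map a bucket score to a behavior label using the rules in the table.

    Assumptions: the approximate 10-point band is exact, and the
    replica-count check takes precedence over the score comparison.
    """
    if n_replicas < min_replicas:
        return "inactive"
    gap = pool_score - bucket_score  # positive when the bucket trails the pool
    if gap <= 0:
        return "stable"
    if gap <= sensitive_band:
        return "sensitive"
    return "degrading"


# TREND_UP values from the example response, with 53.2 used as the pool score.
print(classify_behavior(81.6, 53.2, 39))
```

The labels returned by the API remain authoritative; a local function like this is mainly useful for sanity checks or for explaining a label to a downstream reader.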

Severity (top failure modes)

Value | Meaning
low | Bucket score within 5 points of pool baseline.
moderate | Bucket score 5–15 points below pool baseline.
high | Bucket score 15–25 points below pool baseline.
critical | Bucket score >25 points below pool baseline.
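
The severity bands can be sketched the same way, starting from the delta_from_pool field. In this hypothetical Python helper, a negative delta means the bucket trails the pool (consistent with the example response, where a delta of -16.3 carries severity "high"). How the exact boundary values 5, 15, and 25 are assigned is an assumption, since the table leaves it open, as is treating buckets above the pool as "low".

```python
def classify_severity(delta_from_pool):
    """Map delta_from_pool to a severity label using the bands in the table.

    Assumptions: boundary values fall into the more severe band, except 25,
    which stays "high" because the table reserves "critical" for >25.
    """
    shortfall = -delta_from_pool  # positive when the bucket trails the pool
    if shortfall > 25:
        return "critical"
    if shortfall >= 15:
        return "high"
    if shortfall >= 5:
        return "moderate"
    return "low"


# LIQUIDITY_STRESS entry from the example response.
print(classify_severity(-16.3))
```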

Impact type (failure-mode ontology)

Value | Meaning
drawdown_expansion | Tail drawdowns deepen under the regime conditions.
return_instability | Mean/median returns swing widely; high variance vs the pool.
volatility_amplification | Model outputs amplify when realised volatility crosses the regime threshold.
whipsaw_sensitivity | Frequent sign-changes consume capital through entry/exit churn.
trend_dependency | Performance contingent on persistent directional moves; degrades in their absence.
tail_loss_acceleration | Losses accelerate past the configured failure-drawdown threshold.

Regime classes (for aggregation)

Value | FM buckets included | Meaning
trend_regimes | TREND_UP · TREND_DOWN · SLOW_BEAR | Persistent directional moves.
mean_reverting_regimes | SIDEWAYS · WHIPSAW | Range-bound or sign-changing markets without strong direction.
volatility_regimes | VOL_EXPANSION · VOL_COMPRESSION | Realised-volatility regimes (above/below baseline).
tail_event_regimes | SHARP_CRASH · LIQUIDITY_STRESS | Abrupt-loss / stress-event regimes.

The individual failure-mode definitions (TREND_UP, SHARP_CRASH, LIQUIDITY_STRESS, etc.) are documented as DefinedTerm JSON-LD on the Methodology page.
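
The regime-class taxonomy can be held as a plain mapping and used to aggregate per_regime entries locally. The Python sketch below assumes that a class score is the unweighted mean of its bucket scores, which this page does not confirm. Apart from the TREND_UP and LIQUIDITY_STRESS scores taken from the example response, the bucket scores are hypothetical values chosen for illustration.

```python
from statistics import mean

# Taxonomy copied from the regime-class table above.
REGIME_CLASSES = {
    "trend_regimes": ["TREND_UP", "TREND_DOWN", "SLOW_BEAR"],
    "mean_reverting_regimes": ["SIDEWAYS", "WHIPSAW"],
    "volatility_regimes": ["VOL_EXPANSION", "VOL_COMPRESSION"],
    "tail_event_regimes": ["SHARP_CRASH", "LIQUIDITY_STRESS"],
}

# Reverse lookup: bucket name -> regime class.
CLASS_OF_BUCKET = {
    bucket: cls for cls, buckets in REGIME_CLASSES.items() for bucket in buckets
}


def average_by_class(per_regime):
    """Average bucket scores per regime class (unweighted mean, an assumption)."""
    scores = {}
    for entry in per_regime:
        cls = CLASS_OF_BUCKET.get(entry["regime"])
        if cls is not None:
            scores.setdefault(cls, []).append(entry["score"])
    return {cls: round(mean(values), 1) for cls, values in scores.items()}


# TREND_UP and LIQUIDITY_STRESS scores come from the example response;
# the other three scores are hypothetical.
per_regime = [
    {"regime": "TREND_UP", "score": 81.6},
    {"regime": "TREND_DOWN", "score": 70.0},
    {"regime": "SLOW_BEAR", "score": 69.5},
    {"regime": "SHARP_CRASH", "score": 51.0},
    {"regime": "LIQUIDITY_STRESS", "score": 37.0},
]
class_scores = average_by_class(per_regime)
print(class_scores)
```

The four classes together cover nine buckets, matching the nine failure-mode buckets listed under Capabilities.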

Example agent workflows

Workflow A — natural-language evaluation

User describes strategy in plain text
   ↓
POST /api/v1/agent/analyze   { input_type: "natural_language", strategy_text: "..." }
   ↓
Agent reads { robustness_class, top_failure_modes, regime_class_summary }
   ↓
Agent generates a downstream analysis or summarisation

Workflow B — generator + verifier (multi-agent)

Agent A generates a canonical strategy JSON
   ↓
POST /api/v1/agent/analyze   { input_type: "strategy_json", strategy: {...} }
   ↓
Agent B inspects evaluation_confidence + top_failure_modes
   ↓
Agent B accepts / rejects / requests revision
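
A verifier step like the one in Workflow B might look as follows. This is a hedged Python sketch: the function name, the 0.5 confidence floor, and the choice to treat "critical" severity as blocking are all hypothetical consumer-side policy, not anything the API prescribes. Only the field names come from the output schema above.

```python
def verifier_decision(response, min_classified_ratio=0.5,
                      blocking_severities=("critical",)):
    """Return "accept", "reject", or "request_revision" for Agent B.

    Hypothetical policy: low classification coverage triggers a revision
    request; any failure mode with a blocking severity triggers a rejection.
    """
    confidence = response["evaluation_confidence"]
    if confidence["classified_ratio"] < min_classified_ratio:
        return "request_revision"
    severities = {fm["severity"] for fm in response["top_failure_modes"]}
    if severities & set(blocking_severities):
        return "reject"
    return "accept"


# Values taken from the example response above.
example = {
    "evaluation_confidence": {"level": "moderate", "classified_ratio": 0.59},
    "top_failure_modes": [{"mode": "LIQUIDITY_STRESS", "severity": "high"}],
}
decision = verifier_decision(example)
print(decision)
```

Keeping the thresholds as parameters makes the policy explicit and leaves the decision with the consumer, in line with the descriptive framing of the outputs.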

Workflow C — discovery before execution

Agent searches existing reports
   ↓
GET /api/v1/catalog/query?indicator=RSI&asset=BTC
   ↓
Agent picks a matching slug
   ↓
GET /api/v1/strategies/{slug}   (full pre-computed report)

Catalog query examples

Query string | Returns
/api/v1/catalog/query?indicator=RSI&asset=BTC | RSI strategies on BTC.
/api/v1/catalog/query?failure_mode=SHARP_CRASH | Strategies scored against the SHARP_CRASH bucket.
/api/v1/catalog/query?robustness_class=robust&min_score=70 | Top-tier robustness reports.
/api/v1/catalog/query?regime_class=tail_event_regimes | Strategies with coverage in tail-event regimes.
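
Query strings like those above can be built with the Python standard library instead of by hand. In this sketch the helper name catalog_query is hypothetical, and the list of filters is limited to the parameters that appear in the examples. Whether the endpoint accepts other parameters is not stated here.

```python
from urllib.parse import urlencode

CATALOG_PATH = "/api/v1/catalog/query"

# Only the parameters that appear in the examples on this page.
KNOWN_FILTERS = ("indicator", "asset", "failure_mode",
                 "robustness_class", "min_score", "regime_class")


def catalog_query(**filters):
    """Build a catalog query path from keyword filters, rejecting unknown keys."""
    unknown = set(filters) - set(KNOWN_FILTERS)
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    # Emit parameters in a fixed order so the resulting URL is deterministic.
    ordered = [(key, filters[key]) for key in KNOWN_FILTERS if key in filters]
    return f"{CATALOG_PATH}?{urlencode(ordered)}" if ordered else CATALOG_PATH


print(catalog_query(indicator="RSI", asset="BTC"))
print(catalog_query(robustness_class="robust", min_score=70))
```

urlencode also percent-encodes values that contain spaces or reserved characters, which hand-built strings tend to miss.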

Methodological boundaries

Every /agent/analyze response echoes the following limitations in its methodological_limitations field. They are part of the schema, not a separate disclosure document.

  • Synthetic stress probes are not probabilistic forecasts: they impose conditions on an agent-based simulator rather than drawing from a distribution of futures.
  • Intraday liquidity effects and execution slippage are simplified to a single per-bar assumption.
  • Per-FM-bucket scores aggregate replicas across multiple cases via dominance-weighting; single-case attribution is approximate.
  • Ex-post FM-classification gates use absolute thresholds (e.g. 1.5× baseline volatility) that may be asymmetric across assets.
  • WCDD₉₅ reports the 95th percentile across the curated case set, not a true out-of-sample tail estimate.
  • Outputs describe model behaviour under the curated case set — they are not predictions, recommendations, or suitability assessments.

References

  • /llms.txt — compact overview for LLM crawlers.
  • /api/docs — OpenAPI / Swagger UI with full request & response schemas.
  • /methodology — failure-mode taxonomy, case-catalog details, calibration sources.
  • /verifiability — per-profile claim-vs-measurement validation snapshot.
  • /sitemap.xml — full URL list for crawlers.

Operated under German jurisdiction (BaFin/WpHG framework). All outputs use neutral framing — no rankings, no directive language, no buy/sell signals, no suitability assessment.

Schema family: ctys-agent · current version: ctys-agent-v1 · catalog schema: ctys-catalog-v1