Interop
Structured robustness and stress-testing semantics for systematic trading strategies.
This page is a capability declaration — a stable reference for LLM agents, research tooling, and developers integrating the CrashTestYourStrategy API. Outputs are descriptive (regime behaviour, failure-mode classification, model boundaries) — not advisory. Suitability and decision-making remain with the consumer.
Capabilities
| Capability | Description | Endpoint |
|---|---|---|
| Strategy parsing | Natural-language description → canonical strategy schema (entry/exit conditions, indicators, stops). | POST /api/v1/agent/analyze (input_type=natural_language) |
| Robustness evaluation | Run a strategy against a curated case set (31 empirical + 32 synthetic stress cases, 9 failure-mode buckets). | POST /api/v1/agent/analyze |
| Synthetic scenario modelling | Pre-computed synthetic stress probes (Hybrid Field-SME simulator, per-asset CoT calibration, 50 Monte-Carlo replicas per profile×asset). | Embedded in /agent/analyze responses (per_regime block). |
| Failure-mode classification | Per-replica ex-post failure-mode (FM) classification against operational gating definitions; per-FM-bucket scoring with shrinkage and a-priori augmentation. | top_failure_modes + per_regime fields in /agent/analyze response. |
| Catalog discovery | Semantic search across pre-computed reports by indicator, asset, failure-mode coverage, regime class, or robustness class. | GET /api/v1/catalog/query |
Canonical input schema
The /agent/analyze endpoint accepts a discriminated union keyed on the input_type field. Pass either a natural-language description (parsed by the same LLM pipeline the web UI uses) or a canonical strategy JSON, which skips the parse step and is sometimes preferred when one agent generates a strategy and another verifies it.
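As a rough illustration of the discriminated union, a client could build the request body with a small helper that enforces "exactly one variant". This is a sketch only: the helper name build_analyze_payload is hypothetical, and the num_runs default of 30 simply mirrors the examples on this page rather than a documented default.

```python
from typing import Any, Optional


def build_analyze_payload(
    asset: str,
    *,
    strategy_text: Optional[str] = None,
    strategy: Optional[dict] = None,
    num_runs: int = 30,  # assumption: mirrors the page's examples, not a documented default
) -> dict:
    """Build a request body for exactly one of the two input variants."""
    if (strategy_text is None) == (strategy is None):
        raise ValueError("pass exactly one of strategy_text or strategy")

    payload: dict[str, Any] = {"asset": asset, "num_runs": num_runs}
    if strategy_text is not None:
        # Variant A: the server parses the description into a canonical strategy.
        payload["input_type"] = "natural_language"
        payload["strategy_text"] = strategy_text
    else:
        # Variant B: the caller supplies the canonical strategy and skips parsing.
        payload["input_type"] = "strategy_json"
        payload["strategy"] = strategy
    return payload
```

The resulting dictionary can be serialised with json.dumps and sent as the POST body to /api/v1/agent/analyze by whatever HTTP client the consumer already uses.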
Variant A — natural language
{
  "input_type": "natural_language",
  "strategy_text": "Buy BTC when RSI(14) falls below 30, sell when above 70. 5% stop loss.",
  "asset": "BTC",
  "num_runs": 30
}
Variant B — canonical strategy JSON
{
  "input_type": "strategy_json",
  "asset": "SPY",
  "num_runs": 30,
  "strategy": {
    "position_sizing": "percent",
    "position_size": 95,
    "max_positions": 1,
    "direction": "long_only",
    "entry_conditions": [
      { "indicator": "rsi_14", "operator": "less_than", "value": 30 }
    ],
    "exit_conditions": [
      { "indicator": "rsi_14", "operator": "greater_than", "value": 70 }
    ],
    "stop_loss": 5.0
  }
}
Canonical output schema
The response is shaped for downstream reasoning, not UI rendering. Numeric, categorical, and narrative fields are separated; controlled vocabularies (behavior, severity, impact_type) are documented below. Schema family ctys-agent, current version ctys-agent-v1.
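Before reading any fields, a consumer may want to gate on the schema family and version named above. The sketch below shows one possible policy. The function name schema_status and the decision to tolerate unrecognised versions within the same family are assumptions, not documented behaviour.

```python
EXPECTED_FAMILY = "ctys-agent"
KNOWN_VERSIONS = frozenset({"ctys-agent-v1"})


def schema_status(response: dict) -> str:
    """Reject foreign schema families; flag versions this client has not seen."""
    family = response.get("schema_family")
    if family != EXPECTED_FAMILY:
        raise ValueError(f"unexpected schema_family: {family!r}")
    if response.get("schema_version") in KNOWN_VERSIONS:
        return "known"
    # Assumed policy: same family but unfamiliar version, so proceed with caution.
    return "newer_or_unknown"
```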
{
  "robustness_score": 53.2,
  "robustness_class": "moderate",
  "evaluation_confidence": {
    "level": "moderate",
    "classified_ratio": 0.59,
    "sparse_bucket_count": 5,
    "augmented_replicas_total": 305,
    "rationale": "59% of replicas classified ex-post; 5 bucket(s) augmented from a-priori tags."
  },
  "summary": "Robustness 53/100 (moderate). Strong in TREND_UP, SLOW_BEAR. Weak in LIQUIDITY_STRESS.",
  "interpretation_text": "Returns vary substantially across the curated regimes — the model shows very different outcomes in bull versus bear and stress scenarios.",
  "headline_metrics": {
    "median_return": 0.05,
    "p5_return": -0.541,
    "wcdd_95": 0.57,
    "median_drawdown": 0.19,
    "failure_rate_pct": 12.0,
    "n_replicas": 255,
    "asset": "BTC"
  },
  "per_regime": [
    {
      "regime": "TREND_UP",
      "label": "persistent directional uptrend",
      "behavior": "stable",
      "score": 81.6,
      "n_replicas": 39,
      "shrinkage_alpha": 1.0,
      "n_augmented": 0
    }
    // ... 8 more buckets
  ],
  "top_failure_modes": [
    {
      "mode": "LIQUIDITY_STRESS",
      "severity": "high",
      "impact_type": "drawdown_expansion",
      "expected_behavior": "drawdowns deeper than the pool baseline in these scenarios",
      "bucket_score": 37.0,
      "delta_from_pool": -16.3
    }
  ],
  "regime_class_summary": {
    "trend_regimes": { "avg_score": 73.7, "behavior": "stable", "buckets": ["TREND_UP", "TREND_DOWN", "SLOW_BEAR"] },
    "tail_event_regimes": { "avg_score": 44.0, "behavior": "degrading", "buckets": ["SHARP_CRASH", "LIQUIDITY_STRESS"] }
  },
  "caveats": ["..."],
  "methodological_limitations": ["..."],
  "report_url": "https://crashtestyourstrategy.com/s/buy-hold-btc",
  "methodology_url": "https://crashtestyourstrategy.com/methodology",
  "schema_family": "ctys-agent",
  "schema_version": "ctys-agent-v1"
}
Semantics
The response uses three strict controlled vocabularies (behavior, severity, impact_type) and one regime-class taxonomy. Values are stable across schema versions within the same family; new values are additive and announced.
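Because new vocabulary values are additive, a client probably should not fail hard when it meets one it does not recognise. The sketch below collects unfamiliar values instead of raising. The value sets are copied from the tables in this section, while the function name unknown_vocab_values and the reporting format are illustrative assumptions.

```python
from typing import Any

BEHAVIOR = frozenset({"stable", "sensitive", "degrading", "inactive"})
SEVERITY = frozenset({"low", "moderate", "high", "critical"})
IMPACT_TYPE = frozenset({
    "drawdown_expansion", "return_instability", "volatility_amplification",
    "whipsaw_sensitivity", "trend_dependency", "tail_loss_acceleration",
})


def unknown_vocab_values(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (field, value) pairs whose value is not in this client's vocabularies."""
    found: list[tuple[str, str]] = []
    for regime in response.get("per_regime", []):
        if regime.get("behavior") not in BEHAVIOR:
            found.append(("per_regime.behavior", str(regime.get("behavior"))))
    for fm in response.get("top_failure_modes", []):
        if fm.get("severity") not in SEVERITY:
            found.append(("top_failure_modes.severity", str(fm.get("severity"))))
        if fm.get("impact_type") not in IMPACT_TYPE:
            found.append(("top_failure_modes.impact_type", str(fm.get("impact_type"))))
    return found
```

An empty list means every value was recognised; a non-empty list can be logged and the affected fields treated as opaque strings.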
Behavior (per regime)
| Value | Meaning |
|---|---|
| stable | Bucket score matches or exceeds the overall-pool score; no meaningful regime-specific weakness. |
| sensitive | Bucket score below pool but within ~10 points; mild underperformance vs the pool baseline. |
| degrading | Bucket score >10 points below pool; meaningful regime-specific weakness. |
| inactive | Bucket contains <3 replicas (after augmentation); not enough data to characterise behavior. |
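The behavior rules in the table can be approximated client-side as follows. This is a sketch under stated assumptions: the table gives the band as "~10 points", so the exact cut-off of 10.0 and the treatment of a score sitting exactly on the boundary are guesses, and the function name classify_behavior is hypothetical.

```python
def classify_behavior(
    bucket_score: float,
    pool_score: float,
    n_replicas: int,
    sensitive_band: float = 10.0,  # assumption: the table only says "~10 points"
    min_replicas: int = 3,
) -> str:
    """Map a bucket score to a behavior label following the table's rules."""
    if n_replicas < min_replicas:
        return "inactive"      # too few replicas to characterise behavior
    delta = bucket_score - pool_score
    if delta >= 0:
        return "stable"        # matches or exceeds the pool score
    if delta >= -sensitive_band:
        return "sensitive"     # below pool, but within the band
    return "degrading"         # more than the band below pool
```

For the TREND_UP bucket in the example response (score 81.6 against a pool score of 53.2, 39 replicas), this returns "stable", which agrees with the reported value.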
Severity (top failure modes)
| Value | Meaning |
|---|---|
| low | Bucket score within 5 points of pool baseline. |
| moderate | Bucket score 5–15 points below pool baseline. |
| high | Bucket score 15–25 points below pool baseline. |
| critical | Bucket score >25 points below pool baseline. |
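The severity bands can be sketched the same way, working from delta_from_pool. The table leaves the exact boundaries ambiguous (15 appears in both "5–15" and "15–25"), so assigning a boundary value to the lower tier is an assumption here, as is treating any bucket at or above the pool as "low". The name classify_severity is hypothetical.

```python
def classify_severity(delta_from_pool: float) -> str:
    """Map a bucket's delta from the pool baseline to a severity label."""
    shortfall = -delta_from_pool  # positive when the bucket scores below the pool
    if shortfall > 25:
        return "critical"
    if shortfall > 15:
        return "high"
    if shortfall > 5:
        return "moderate"
    return "low"
```

With the example response's delta_from_pool of -16.3 for LIQUIDITY_STRESS, this yields "high", consistent with the severity shown there.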
Impact type (failure-mode ontology)
| Value | Meaning |
|---|---|
| drawdown_expansion | Tail drawdowns deepen under the regime conditions. |
| return_instability | Mean/median returns swing widely; high variance vs the pool. |
| volatility_amplification | Model outputs amplify when realised volatility crosses the regime threshold. |
| whipsaw_sensitivity | Frequent sign-changes consume capital through entry/exit churn. |
| trend_dependency | Performance contingent on persistent directional moves; degrades in their absence. |
| tail_loss_acceleration | Losses accelerate past the configured failure-drawdown threshold. |
Regime classes (for aggregation)
| Value | FM buckets included | Meaning |
|---|---|---|
| trend_regimes | TREND_UP · TREND_DOWN · SLOW_BEAR | Persistent directional moves. |
| mean_reverting_regimes | SIDEWAYS · WHIPSAW | Range-bound or sign-changing markets without strong direction. |
| volatility_regimes | VOL_EXPANSION · VOL_COMPRESSION | Realised-volatility regimes (above/below baseline). |
| tail_event_regimes | SHARP_CRASH · LIQUIDITY_STRESS | Abrupt-loss / stress-event regimes. |
The individual failure-mode definitions (TREND_UP, SHARP_CRASH, LIQUIDITY_STRESS, etc.) are documented as DefinedTerm JSON-LD on the Methodology page.
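To make the aggregation concrete, the sketch below groups per_regime entries into the regime classes listed in the table and averages their scores. The bucket-to-class mapping comes from the table; the use of a simple unweighted mean is an assumption, since this page does not state how avg_score is computed, and the function name summarise_by_class is hypothetical.

```python
from statistics import mean
from typing import Any

REGIME_CLASSES: dict[str, tuple[str, ...]] = {
    "trend_regimes": ("TREND_UP", "TREND_DOWN", "SLOW_BEAR"),
    "mean_reverting_regimes": ("SIDEWAYS", "WHIPSAW"),
    "volatility_regimes": ("VOL_EXPANSION", "VOL_COMPRESSION"),
    "tail_event_regimes": ("SHARP_CRASH", "LIQUIDITY_STRESS"),
}


def summarise_by_class(per_regime: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Average bucket scores per regime class; classes with no buckets are omitted."""
    summary: dict[str, dict[str, Any]] = {}
    for regime_class, members in REGIME_CLASSES.items():
        rows = [r for r in per_regime if r["regime"] in members]
        if not rows:
            continue
        summary[regime_class] = {
            # Assumption: unweighted mean; the service may weight by replica count.
            "avg_score": round(mean(r["score"] for r in rows), 1),
            "buckets": [r["regime"] for r in rows],
        }
    return summary
```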
Example agent workflows
Workflow A — natural-language evaluation
User describes strategy in plain text
↓
POST /api/v1/agent/analyze { input_type: "natural_language", strategy_text: "..." }
↓
Agent reads { robustness_class, top_failure_modes, regime_class_summary }
↓
Agent generates a downstream analysis or summarisation
Workflow B — generator + verifier (multi-agent)
Agent A generates a canonical strategy JSON
↓
POST /api/v1/agent/analyze { input_type: "strategy_json", strategy: {...} }
↓
Agent B inspects evaluation_confidence + top_failure_modes
↓
Agent B accepts / rejects / requests revision
Workflow C — discovery before execution
Agent searches existing reports
↓
GET /api/v1/catalog/query?indicator=RSI&asset=BTC
↓
Agent picks a matching slug
↓
GET /api/v1/strategies/{slug} (full pre-computed report)
Catalog query examples
| Query string | Returns |
|---|---|
| /api/v1/catalog/query?indicator=RSI&asset=BTC | RSI strategies on BTC. |
| /api/v1/catalog/query?failure_mode=SHARP_CRASH | Strategies scored against the SHARP_CRASH bucket. |
| /api/v1/catalog/query?robustness_class=robust&min_score=70 | Top-tier robustness reports. |
| /api/v1/catalog/query?regime_class=tail_event_regimes | Strategies with coverage in tail-event regimes. |
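The query strings in the table can be assembled programmatically rather than by string concatenation. The sketch below uses only the parameter names that appear in the examples; whether the endpoint accepts other parameters is not stated here, so the allow-list is an assumption, as is the helper name catalog_query_path.

```python
from urllib.parse import urlencode

CATALOG_PATH = "/api/v1/catalog/query"
# Assumption: only the parameters shown in the examples table are allowed.
ALLOWED_PARAMS = (
    "indicator", "asset", "failure_mode",
    "robustness_class", "min_score", "regime_class",
)


def catalog_query_path(**filters: object) -> str:
    """Return the catalog path plus a URL-encoded query string for the filters."""
    unknown = sorted(set(filters) - set(ALLOWED_PARAMS))
    if unknown:
        raise ValueError(f"unsupported filter(s): {', '.join(unknown)}")
    # Emit parameters in a fixed order so the same filters always give the same URL.
    query = urlencode({k: filters[k] for k in ALLOWED_PARAMS if k in filters})
    return f"{CATALOG_PATH}?{query}" if query else CATALOG_PATH
```

Prefix the returned path with the service's base URL before issuing the GET request.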
Methodological boundaries
Every /agent/analyze response echoes these as methodological_limitations. They are part of the schema, not a separate disclosure document.
- Synthetic stress probes are not probabilistic forecasts: they impose conditions on an agent-based simulator rather than drawing from a distribution of futures.
- Intraday liquidity effects and execution slippage are simplified to a single per-bar assumption.
- Per-FM-bucket scores aggregate replicas across multiple cases via dominance-weighting; single-case attribution is approximate.
- Ex-post FM-classification gates use absolute thresholds (e.g. 1.5× baseline volatility) that may be asymmetric across assets.
- WCDD₉₅ reports the 95th percentile across the curated case set, not a true out-of-sample tail estimate.
- Outputs describe model behaviour under the curated case set — they are not predictions, recommendations, or suitability assessments.
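Since the limitations travel with every response, a downstream agent can carry them into whatever it produces. The sketch below builds a plain-text description from a response and refuses to emit one without the limitations attached. The function name describe and the choice to raise when the list is empty are illustrative assumptions about consumer policy, not service behaviour.

```python
def describe(response: dict) -> str:
    """Render a neutral description that always includes the stated limitations."""
    limitations = response.get("methodological_limitations") or []
    if not limitations:
        # Assumed consumer policy: never summarise without the boundaries attached.
        raise ValueError("response carries no methodological_limitations")

    lines = [f"Robustness class: {response['robustness_class']}"]
    for fm in response.get("top_failure_modes", []):
        lines.append(
            f"- {fm['mode']}: severity {fm['severity']}, impact {fm['impact_type']}"
        )
    lines.append("Methodological limitations:")
    lines.extend(f"- {item}" for item in limitations)
    return "\n".join(lines)
```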
References
- /llms.txt — compact overview for LLM crawlers.
- /api/docs — OpenAPI / Swagger UI with full request & response schemas.
- /methodology — failure-mode taxonomy, case-catalog details, calibration sources.
- /verifiability — per-profile claim-vs-measurement validation snapshot.
- /sitemap.xml — full URL list for crawlers.
Operated under German jurisdiction (BaFin/WpHG framework). All outputs use neutral framing — no rankings, no directive language, no buy/sell signals, no suitability assessment.
Schema family: ctys-agent · current version: ctys-agent-v1 · catalog schema: ctys-catalog-v1