Agent API · capability declaration
Build on the crash-test gate
A capability declaration for stress diagnostics that confront a proposed strategy or portfolio with the stress conditions it ignored, before the recommendation ships. Available on request via a form.
This page is the machine-readable capability declaration: the primitives, the response envelope, and the controlled vocabularies. Everything here is descriptive, not advisory — it grounds risk; what to do with that grounding stays with the consumer. The narrative overview is on the agent home; how the simulator works is in the methodology and reference.
Access
On request. Describe a strategy or a portfolio; it is run internally on the engine and the structured result — the schemas and vocabularies declared below — is returned by email.
crashtestyourstrategy.com/contact
There is no public compute endpoint. The schemas, controlled vocabularies and response envelope below describe what a request returns.
Primitives — tools
Portfolio stress
portfolio_stress_test(holdings)
Stress a portfolio across baseline, risk-off-crisis and rate-shock regimes over the SPY/TLT/GOLD/BTC substrate. Returns per-scenario drawdown, VaR / expected shortfall, leg decomposition, a cross-asset finding (does the hedge hold or break), the full per-path drawdown distribution (quantiles p50–p95/worst), probability-weighted scenario summaries (unconditional substrate shares + a nowcast tilt from the validated regime layer), and a buy-and-hold-vs-drawdown-stop comparison.
factor_decomposition(holdings)
Euler risk-contribution decomposition: where the risk actually sits versus the capital weights. A 60/40 portfolio carries ~83% of its risk in equities; a 50/50 SPY/BTC portfolio carries ~86% of its risk in BTC despite equal capital.
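As a rough illustration of why capital weights and risk shares diverge, the sketch below computes Euler variance shares, w_i (Σw)_i / (wᵀΣw), for a two-asset portfolio. The function name `euler_risk_shares`, the volatilities and the correlation are hypothetical inputs chosen for the example; they are not the platform's substrate estimates and are not meant to reproduce the ~83% figure.

```python
def euler_risk_shares(weights, vols, corr):
    """Share of portfolio variance attributable to each holding (shares sum to 1)."""
    n = len(weights)
    # Covariance matrix built from volatilities and correlations.
    cov = [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]
    # Marginal term (Sigma w)_i for each holding.
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    variance = sum(weights[i] * marginal[i] for i in range(n))
    return [weights[i] * marginal[i] / variance for i in range(n)]

# Hypothetical inputs for illustration only.
weights = [0.6, 0.4]                 # capital weights: equity leg, bond leg
vols = [0.16, 0.10]                  # assumed annualised volatilities
corr = [[1.0, 0.0], [0.0, 1.0]]      # assumed zero correlation

shares = euler_risk_shares(weights, vols, corr)
print([round(s, 3) for s in shares])
```

Because volatility is homogeneous of degree one in the weights, these variance shares equal the Euler shares of portfolio volatility. Under the assumed inputs, the 60% capital weight on the equity leg maps to a clearly larger share of risk.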
ips_gate(holdings, max_drawdown_tolerance, time_horizon_years, liquidity_need)
A hard gate against an Investment Policy Statement, run before a portfolio is accepted — checked against the full drawdown distribution, not just the typical path: reports the breach probability (share of simulated paths exceeding the stated tolerance) and flags material tail risk even when the median path passes; plus horizon and liquidity checks.
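The breach probability described above can be sketched as a simple count over simulated paths. This is a minimal illustration, assuming drawdowns and the tolerance are expressed as negative fractions (as in the example response further down); the sign convention of `max_drawdown_tolerance` in the actual primitive is not stated here, and the helper name and path values are hypothetical.

```python
from statistics import median

def breach_probability(path_max_drawdowns, max_drawdown_tolerance):
    """Share of simulated paths whose worst drawdown is deeper than the tolerance.

    Both inputs are negative fractions, e.g. -0.20 for a 20% loss (assumed convention).
    """
    breaches = sum(1 for dd in path_max_drawdowns if dd < max_drawdown_tolerance)
    return breaches / len(path_max_drawdowns)

# Hypothetical per-path worst drawdowns, for illustration only.
paths = [-0.05, -0.08, -0.09, -0.11, -0.12, -0.14, -0.18, -0.22, -0.27, -0.35]
tolerance = -0.20

p_breach = breach_probability(paths, tolerance)
median_passes = median(paths) >= tolerance
print(p_breach, median_passes)
```

In this toy sample the median path stays inside the tolerance while three of ten paths breach it, which is the situation the gate is described as flagging.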
portfolio_compare(holdings_a, holdings_b)
Paired comparison of a reference portfolio vs a candidate revision on identical simulated paths — every delta is attributable to the weights, not seed noise. Flags a candidate that deepens the worst-path drawdown or introduces a new diversification failure.
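The idea of comparing two weightings on identical paths can be sketched as follows. This is an illustrative sketch only: the independent Gaussian returns stand in for the platform's pre-computed joint paths, and all names (`max_drawdown`, `portfolio_returns`, `paired_drawdown_deltas`) and parameters are hypothetical.

```python
import random

def max_drawdown(returns):
    """Worst peak-to-trough decline of the cumulative value path (negative fraction)."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst

def portfolio_returns(asset_paths, weights):
    """Per-step portfolio returns, assuming rebalancing to the weights every step."""
    n_steps = len(next(iter(asset_paths.values())))
    return [sum(w * asset_paths[a][t] for a, w in weights.items()) for t in range(n_steps)]

def paired_drawdown_deltas(paths, weights_a, weights_b):
    """Candidate-minus-reference worst drawdown, evaluated on each shared path."""
    return [
        max_drawdown(portfolio_returns(p, weights_b))
        - max_drawdown(portfolio_returns(p, weights_a))
        for p in paths
    ]

# Hypothetical toy paths: independent Gaussian daily returns (not the real substrate).
rng = random.Random(7)
paths = [
    {"SPY": [rng.gauss(0.0003, 0.01) for _ in range(250)],
     "TLT": [rng.gauss(0.0001, 0.006) for _ in range(250)]}
    for _ in range(200)
]
reference = {"SPY": 0.6, "TLT": 0.4}
candidate = {"SPY": 0.8, "TLT": 0.2}
deltas = paired_drawdown_deltas(paths, reference, candidate)
```

Because both portfolios see the same simulated paths, comparing a portfolio with itself yields a delta of exactly zero on every path; any non-zero delta therefore comes from the weights.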
long_horizon_stress(holdings, horizon_years, monthly_contribution | monthly_withdrawal, …)
Multi-year wealth paths for a savings or withdrawal plan: terminal-wealth quantiles (nominal + real), ruin/shortfall probabilities, a sequence-of-returns diagnosis (same plan, bad vs good first two years) and a drift-sensitivity block. Long-run drift is a stated, overridable assumption (disclosed alongside the substrate's raw stress drift); costs on by default. No rate, allocation, or product is recommended.
regime_outlook(asset, horizon_days, as_of?)
Model-conditional probabilities that an asset (SPY/QQQ/GLD/TLT) is in each regime (BULL/SIDEWAYS/BEAR/CRISIS) after 5 or 21 trading days, with persistence and unconditional baselines alongside. Preregistered, out-of-sample validated; annual seasonality was tested, falsified and excluded. Descriptive probabilities of operationally defined regime classes — not a market prediction.
Strategy & backtest grounding
challenge_strategy(strategy_id)
The full cross-regime gate for a strategy: where it degrades across the failure-mode taxonomy, plus a revision signal.
backtest_integrity(annualized_sharpe, n_trials, backtest_start, backtest_end, …)
Confront a backtest claim with its over-optimism: the deflated Sharpe (the maximum Sharpe reachable by chance grows with the number of configurations tried — Bailey & López de Prado) and which crisis regimes were absent from the backtest window.
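The growth of the chance-level Sharpe with the number of trials can be illustrated with the expected-maximum approximation used in the deflated Sharpe literature. This is a sketch of that threshold only, not the full deflated Sharpe statistic (which also adjusts for sample length, skewness and kurtosis), and not necessarily how the primitive computes it. The `sharpe_variance` input (variance of Sharpe estimates across the tried configurations, in the same units as the claimed Sharpe) is a hypothetical parameter.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected maximum Sharpe across n_trials skill-less configurations."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist().inv_cdf  # standard normal quantile function
    return sqrt(sharpe_variance) * (
        (1 - EULER_MASCHERONI) * z(1 - 1 / n_trials)
        + EULER_MASCHERONI * z(1 - 1 / (n_trials * e))
    )

# Hypothetical cross-trial variance of 0.25, for illustration only.
thresholds = {n: expected_max_sharpe(n, 0.25) for n in (10, 100, 1000)}
for n, value in thresholds.items():
    print(n, round(value, 2))
```

A claimed Sharpe is only informative to the extent that it clears this threshold for the number of configurations actually tried.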
run_stress_test(profile_hint)
Buy-and-hold diagnostic on one cached synthetic regime — a fast single-regime probe.
Regime discovery & feedback
find_similar_regime(descriptors | profile_hint)
Nearest-neighbour search over the regime descriptor space — find the stress regimes closest to a described market.
describe_regime(profile_hint)
The full descriptor profile of a named regime (return, clustering, volatility, drawdown character).
submit_feedback(...)
Persist structured agent feedback (observation + suggested_action) about a response — the platform learns where it is weak.
Primitives — resources
Read-only, app-controlled context an agent can pull before or after a tool call:
| URI | Returns |
|---|---|
| validation://{asset} | Per-asset realism evidence — the 18 daily stylized facts measured against the historical reference bands. |
| validation://value | Cross-asset value basis — regime-conditional correlation and tail dependence vs a Gauss-copula baseline, with honest caveats. |
| ontology://failure-modes | The failure-mode taxonomy — the same set as the /ontology pages. |
| ontology://failure-behaviors | The failure-behavior vocabulary used in diagnostics. |
| ontology://regime-descriptors | The 18 behavioral descriptors that form the regime embedding axes. |
| portfolio://universe | The assets a Tier-1 portfolio request can be composed from. |
| regimes://available | The profile_hints currently servable by run_stress_test. |
| methodology://overview | A brief pointer to the methodology, canonical on .com. |
| feedback://insights | Aggregated agent-feedback statistics. |
The response envelope
Every primitive returns the same envelope, so the grounding can be enforced uniformly. The two load-bearing fields are revision_required (the gate) and grounding_summary (the one sentence an agent must carry into its answer).
| Field | Meaning |
|---|---|
| schema_version | e.g. ctys-agent-portfolio-v1 — the primitive’s versioned schema. |
| request_id | UUID for the call, for correlation and feedback. |
| generated_utc | ISO timestamp. |
| methodology_url | Link back to the canonical methodology. |
| revision_required | Boolean gate — true when the result warrants the agent revising its proposal before presenting it. |
| grounding_summary | One factual risk sentence — the headline the agent must not omit. |
| methodological_limitations | The model’s boundaries, baked into every response — not a separate disclosure. |
| _feedback | How to submit structured feedback on this response. |
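A consumer-side sketch of how the two load-bearing fields might be enforced is shown here. The helper `apply_gate`, its return shape and the placeholder envelope values are all hypothetical; only the field names come from the table above, and the type of `_feedback` is assumed to be a mapping.

```python
REQUIRED_FIELDS = (
    "schema_version", "request_id", "generated_utc", "methodology_url",
    "revision_required", "grounding_summary", "methodological_limitations", "_feedback",
)

def apply_gate(envelope, draft_answer):
    """Decide whether a draft may be presented, based on the response envelope."""
    missing = [name for name in REQUIRED_FIELDS if name not in envelope]
    if missing:
        raise ValueError(f"envelope is missing fields: {missing}")
    summary = envelope["grounding_summary"]
    if envelope["revision_required"]:
        # Gate closed: hand the risk sentence back as the reason to revise.
        return {"status": "revise", "reason": summary}
    # Gate open: make sure the risk sentence travels with the answer.
    if summary not in draft_answer:
        draft_answer = f"{draft_answer}\n\nRisk grounding: {summary}"
    return {"status": "present", "answer": draft_answer}

# Hypothetical envelope with placeholder values, for illustration only.
envelope = {
    "schema_version": "ctys-agent-portfolio-v1",
    "request_id": "00000000-0000-0000-0000-000000000000",
    "generated_utc": "2025-01-01T00:00:00Z",
    "methodology_url": "https://example.invalid/methodology",
    "revision_required": True,
    "grounding_summary": "In a rate shock both legs fall together.",
    "methodological_limitations": ["placeholder"],
    "_feedback": {},
}
decision = apply_gate(envelope, "A 60/40 portfolio looks balanced.")
```

Checking the gate before touching the draft keeps the enforcement uniform across primitives, since every primitive returns the same envelope.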
Example — portfolio_stress_test
The classic "safe" 60/40 — and what a rate shock does to it.
// what a request describes
portfolio_stress_test({
holdings: [
{ asset: "SPY", weight: 0.6 },
{ asset: "TLT", weight: 0.4 } // the "safe" 60/40
]
})

{
"schema_version": "ctys-agent-portfolio-v1",
"request_id": "…",
"revision_required": true,
"grounding_summary": "In a rate shock both legs fall together — the hedge breaks.",
"universe": ["SPY", "TLT", "GOLD", "BTC"],
"portfolio": [{ "asset": "SPY", "weight": 0.6 }, { "asset": "TLT", "weight": 0.4 }],
"substrate_version": "substrate_v1",
"portfolio_scenarios": [
{ "scenario": "baseline", "severity": "low",
"cross_asset_finding": { "behavior": "diversification_intact" } },
{ "scenario": "risk_off_crisis", "severity": "moderate",
"cross_asset_finding": { "behavior": "hedge_holds" } },
{ "scenario": "rate_shock", "severity": "high",
"cross_asset_finding": { "behavior": "hedge_breaks" },
"portfolio_worst_episode_drawdown": -0.14 }
],
"derisk_comparison": {
"rule": "drawdown_stop",
"rule_detail": "Exit to cash at -10% drawdown; re-enter at the 50-day MA",
"full_path": { "buy_hold_worst": -0.14, "derisk_worst": -0.10 }
},
"realism_basis": { "evidence_resource": "validation://{asset}" },
"methodological_limitations": [ "…" ]
}
Abbreviated. The hedge-break finding and the −0.14 → −0.10 drawdown-stop improvement are from the live endpoint; full per-scenario fields (VaR, expected shortfall, leg decomposition) are in the schema.
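For a consumer reading a result like the one above, a minimal sketch of extracting the hedge-break scenarios and the drawdown-stop difference might look like this. The JSON is a trimmed copy of the abbreviated example, and the variable names are hypothetical.

```python
import json

# Trimmed copy of the abbreviated example response above.
response = json.loads("""
{
  "revision_required": true,
  "portfolio_scenarios": [
    {"scenario": "baseline", "severity": "low",
     "cross_asset_finding": {"behavior": "diversification_intact"}},
    {"scenario": "risk_off_crisis", "severity": "moderate",
     "cross_asset_finding": {"behavior": "hedge_holds"}},
    {"scenario": "rate_shock", "severity": "high",
     "cross_asset_finding": {"behavior": "hedge_breaks"},
     "portfolio_worst_episode_drawdown": -0.14}
  ],
  "derisk_comparison": {
    "rule": "drawdown_stop",
    "full_path": {"buy_hold_worst": -0.14, "derisk_worst": -0.10}
  }
}
""")

# Scenarios in which the usually-offsetting leg fell together with the others.
hedge_break_scenarios = [
    s["scenario"]
    for s in response["portfolio_scenarios"]
    if s["cross_asset_finding"]["behavior"] == "hedge_breaks"
]

# Difference in worst drawdown between the stop rule and buy-and-hold.
full_path = response["derisk_comparison"]["full_path"]
stop_improvement = round(full_path["derisk_worst"] - full_path["buy_hold_worst"], 2)

print(hedge_break_scenarios, stop_improvement)
```

The difference is descriptive: it reports how the worst drawdown changed under the stated rule on the simulated paths, not whether the rule should be adopted.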
Controlled vocabularies
Stable within the ctys-agent-v1 family; new values are additive and announced.
Diversification behavior (per scenario)
| Value | Meaning |
|---|---|
| diversification_intact | No major holding declined materially in the scenario. |
| hedge_holds | A leg rose while another fell — the hedge offset the loss. |
| hedge_breaks | A usually-offsetting leg fell together with the others — the hedge failed when it was needed (the 2022 stock-bond case). |
| shared_drawdown | Several major legs declined together — diversification did not help. |
| concentrated_loss | One leg drove the loss; the others were roughly flat. |
Severity (per scenario)
| Value | Meaning |
|---|---|
| low | Shallow, contained drawdown in the scenario. |
| moderate | A noticeable but survivable drawdown. |
| high | A deep drawdown a typical mandate would struggle to hold through. |
| critical | A severe, mandate-threatening drawdown. |
Behavior (per regime, strategy diagnostics)
| Value | Meaning |
|---|---|
| stable | Holds up under the regime; no meaningful regime-specific weakness. |
| sensitive | Mild underperformance versus the pool baseline. |
| degrading | Meaningful regime-specific weakness. |
| inactive | Too little data in the bucket to characterise behaviour. |
Impact type (failure-mode ontology)
| Value | Meaning |
|---|---|
| drawdown_expansion | Tail drawdowns deepen under the regime conditions. |
| return_instability | Mean / median returns swing widely; high variance vs the pool. |
| volatility_amplification | Outputs amplify when realised volatility crosses the regime threshold. |
| whipsaw_sensitivity | Frequent sign-changes consume capital through entry / exit churn. |
| trend_dependency | Performance contingent on persistent directional moves; degrades in their absence. |
| tail_loss_acceleration | Losses accelerate past the configured failure-drawdown threshold. |
Regime classes (for aggregation)
| Value | FM buckets | Meaning |
|---|---|---|
| trend_regimes | TREND_UP · TREND_DOWN · SLOW_BEAR | Persistent directional moves. |
| mean_reverting_regimes | SIDEWAYS · WHIPSAW | Range-bound or sign-changing markets without strong direction. |
| volatility_regimes | VOL_EXPANSION · VOL_COMPRESSION | Realised-volatility regimes (above / below baseline). |
| tail_event_regimes | SHARP_CRASH · LIQUIDITY_STRESS | Abrupt-loss / stress-event regimes. |
The individual failure-mode definitions (TREND_UP, SHARP_CRASH, LIQUIDITY_STRESS, …) are published as citable DefinedTerm pages in the ontology.
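The aggregation from failure-mode buckets to regime classes can be sketched as a lookup over the table above. The mapping is taken from that table; the ranking of behaviors (with `inactive` treated as lowest, so any observed behavior overrides it), the helper name and the sample input are assumptions made for this illustration.

```python
REGIME_CLASSES = {
    "trend_regimes": ("TREND_UP", "TREND_DOWN", "SLOW_BEAR"),
    "mean_reverting_regimes": ("SIDEWAYS", "WHIPSAW"),
    "volatility_regimes": ("VOL_EXPANSION", "VOL_COMPRESSION"),
    "tail_event_regimes": ("SHARP_CRASH", "LIQUIDITY_STRESS"),
}

# Assumed ordering from least to most concerning; not defined by the vocabulary itself.
RANK = {"inactive": 0, "stable": 1, "sensitive": 2, "degrading": 3}

def worst_behavior_by_class(bucket_behaviors):
    """Most concerning behavior per regime class; classes with no buckets are omitted."""
    result = {}
    for regime_class, buckets in REGIME_CLASSES.items():
        observed = [bucket_behaviors[b] for b in buckets if b in bucket_behaviors]
        if observed:
            result[regime_class] = max(observed, key=RANK.__getitem__)
    return result

# Hypothetical per-bucket diagnostics, for illustration only.
sample = {
    "TREND_UP": "stable",
    "TREND_DOWN": "sensitive",
    "WHIPSAW": "inactive",
    "SHARP_CRASH": "degrading",
}
summary = worst_behavior_by_class(sample)
print(summary)
```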
Capability matrix
A compact yes / no for capability-matching. Items marked no are deliberate exclusions — outside the model on purpose, not roadmap gaps.
| Capability | Supported | Note |
|---|---|---|
| Portfolio stress (cross-asset, regime-conditional) | ✓ yes | SPY/TLT/GOLD/BTC substrate; hedge-hold vs hedge-break detection. |
| Stock-bond hedge-break detection (the 2022 case) | ✓ yes | Shared-drawdown / rate-shock scenario. |
| Risk-concentration decomposition (capital vs risk) | ✓ yes | Euler risk contributions. |
| Backtest over-optimism check (deflated Sharpe) | ✓ yes | Bailey & López de Prado; + crisis-coverage gaps. |
| Strategy failure-mode classification | ✓ yes | Cross-regime gate over the taxonomy. |
| Realism / value validation (citable resources) | ✓ yes | validation://{asset} + validation://value. |
| Probabilistic forecasting / price prediction | — no | Scenario-based; not a distribution over futures. |
| Investment recommendations / suitability | — no | Descriptive output only; BaFin/WpHG-aware. |
| Portfolio allocation / position-sizing advice | — no | It stresses a portfolio; it does not tell you how to weight one. |
| Fundamentals / macro analysis | — no | Price-stress of systematic strategies only. |
| Live trading / order execution | — no | No broker integration; not an execution platform. |
Methodological boundaries
Echoed as methodological_limitations on every response — part of the schema, not a separate disclosure.
- Outputs are descriptive: regime behaviour, failure modes, drawdown distributions, risk decomposition. They are not predictions, recommendations, or suitability assessments.
- The portfolio substrate is Tier-1: a fixed SPY/TLT/GOLD/BTC universe of pre-computed joint paths, re-weighted per request. Custom assets are Tier-2 and deferred.
- The cross-asset model adds value in regime-conditional correlation and joint-crash tail dependence; single-asset daily marginals are at parity with cheap baselines (no unique edge claimed there).
- Synthetic regimes impose conditions on a regime-switching simulator; they are not draws from a distribution of futures.
- Drawdown and tail metrics are computed over the substrate paths, not a true out-of-sample tail estimate.
- Scope is price-stress of systematic strategies and portfolios, not fundamentals, macro, suitability, or execution.
References
- /llms.txt: compact overview for LLM crawlers, incl. the live platform.
- /methodology: how the simulator works, the failure-mode taxonomy.
- /reference: architecture, bands, cross-asset coupling, limits.
- /verifiability: stylized-fact realism + value-vs-copula validation.
- /ontology: the failure-mode DefinedTerm pages.
Operated under German jurisdiction (BaFin / WpHG framework). All outputs use neutral framing — no rankings, no directive language, no buy / sell signals, no suitability assessment.
Schema family ctys-agent-v1 · portfolio primitive ctys-agent-portfolio-v1.