Agent API · capability declaration

Build on the crash-test gate

A capability declaration for stress diagnostics that confront a proposed strategy or portfolio with the stress conditions it ignored, before the recommendation ships. Available on request via a form.

This page is the machine-readable capability declaration: the primitives, the response envelope, and the controlled vocabularies. Everything here is descriptive, not advisory — it grounds risk; what to do with that grounding stays with the consumer. The narrative overview is on the agent home; how the simulator works is in the methodology and reference.

Access

On request. Describe a strategy or a portfolio; it is run internally on the engine, and the structured result, which follows the schemas and vocabularies declared below, is returned by email.

crashtestyourstrategy.com/contact →

There is no public compute endpoint. The schemas, controlled vocabularies and response envelope below describe what a request returns.

Primitives — tools

Portfolio stress

portfolio_stress_test(holdings)

Stress a portfolio across baseline, risk-off-crisis and rate-shock regimes over the SPY/TLT/GOLD/BTC substrate. Returns per-scenario drawdown, VaR / expected shortfall, leg decomposition, a cross-asset finding (does the hedge hold or break), the full per-path drawdown distribution (quantiles p50–p95/worst), probability-weighted scenario summaries (unconditional substrate shares + a nowcast tilt from the validated regime layer), and a buy-and-hold-vs-drawdown-stop comparison.
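As a rough illustration of the VaR / expected shortfall pair mentioned above, here is a minimal sketch using a plain empirical estimator. The return series, the 90% level and the tail-count convention are assumptions for illustration only; the estimator the engine actually uses is not specified on this page.

```python
def var_and_es(returns, level=0.90):
    """Empirical VaR and expected shortfall, both reported as positive losses.

    Assumed convention: the tail is the worst round(n * (1 - level)) outcomes.
    """
    losses = sorted((-r for r in returns), reverse=True)  # largest loss first
    n_tail = max(1, round(len(losses) * (1.0 - level)))
    tail = losses[:n_tail]
    value_at_risk = tail[-1]                  # smallest loss inside the tail
    expected_shortfall = sum(tail) / len(tail)  # average loss inside the tail
    return value_at_risk, expected_shortfall

# Hypothetical scenario returns (20 observations), not platform output.
scenario_returns = [
    0.01, -0.02, 0.005, -0.08, 0.012, -0.01, 0.02, -0.05, 0.003, 0.007,
    -0.015, 0.018, -0.004, 0.009, -0.03, 0.011, 0.002, -0.006, 0.014, -0.001,
]

var_90, es_90 = var_and_es(scenario_returns, level=0.90)
print(round(var_90, 3), round(es_90, 3))
```

Expected shortfall is never smaller than VaR at the same level, because it averages the losses at or beyond the VaR cut-off.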

factor_decomposition(holdings)

Euler risk-contribution decomposition — where the risk actually sits versus the capital weights. A 60/40 is ~83% equity risk; a 50/50 SPY/BTC is ~86% BTC risk despite equal capital.
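The Euler decomposition can be sketched in a few lines: each asset's share of portfolio variance is w_i (Σw)_i / (wᵀΣw), and the shares sum to one. The volatilities and the zero correlation below are assumed inputs chosen for easy arithmetic, not the platform's calibration, so the result will not reproduce the ~83% figure quoted above.

```python
def euler_risk_shares(weights, vols, corr):
    """Return each asset's share of portfolio variance (shares sum to 1)."""
    n = len(weights)
    # Covariance matrix built from volatilities and correlations.
    cov = [[vols[i] * vols[j] * corr[i][j] for j in range(n)] for i in range(n)]
    # Marginal term (Sigma w)_i for each asset.
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    variance = sum(weights[i] * marginal[i] for i in range(n))
    return [weights[i] * marginal[i] / variance for i in range(n)]

# Hypothetical inputs (assumptions, not measured values):
weights = [0.6, 0.4]                 # 60/40 capital split
vols = [0.16, 0.08]                  # assumed annualised volatilities
corr = [[1.0, 0.0], [0.0, 1.0]]      # assumed zero correlation

shares = euler_risk_shares(weights, vols, corr)
print([round(s, 3) for s in shares])
```

With these assumed inputs the equity leg carries about 90% of the risk while holding 60% of the capital, which is the capital-versus-risk gap the primitive is meant to expose.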

ips_gate(holdings, max_drawdown_tolerance, time_horizon_years, liquidity_need)

A hard gate against an Investment Policy Statement, run before a portfolio is accepted. The check uses the full drawdown distribution, not just the typical path: it reports the breach probability (the share of simulated paths exceeding the stated tolerance) and flags material tail risk even when the median path passes. Horizon and liquidity checks are included.
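The breach-probability idea can be sketched as follows. The per-path drawdowns are made-up values, and the sign convention (drawdowns and tolerance as negative fractions) is an assumption; the page does not state how max_drawdown_tolerance is encoded.

```python
import statistics

def breach_probability(path_max_drawdowns, tolerance):
    """Share of simulated paths whose worst drawdown is deeper than the tolerance.

    Assumed convention: drawdowns and tolerance are negative fractions,
    e.g. -0.20 for a 20% drawdown.
    """
    breaches = sum(1 for dd in path_max_drawdowns if dd < tolerance)
    return breaches / len(path_max_drawdowns)

# Hypothetical per-path worst drawdowns (illustrative values only).
paths = [-0.05, -0.08, -0.09, -0.11, -0.12, -0.13, -0.14, -0.22, -0.27, -0.31]
tolerance = -0.20

median_dd = statistics.median(paths)        # the "typical" path
median_passes = median_dd >= tolerance      # True here: -0.125 is within -0.20
p_breach = breach_probability(paths, tolerance)

print(median_passes, p_breach)
```

Here the median path passes the tolerance while 3 of the 10 paths breach it, which is the situation the gate is described as flagging.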

portfolio_compare(holdings_a, holdings_b)

Paired comparison of a reference portfolio vs a candidate revision on identical simulated paths — every delta is attributable to the weights, not seed noise. Flags a candidate that deepens the worst-path drawdown or introduces a new diversification failure.
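A minimal sketch of a paired comparison on identical paths: the same simulated returns are reused for both weightings, so differences come from the weights alone. The two-asset Gaussian generator, its parameters and the seed are hypothetical stand-ins for the real substrate.

```python
import random

def max_drawdown(returns):
    """Worst peak-to-trough decline of a cumulative return path (<= 0)."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)
        worst = min(worst, wealth / peak - 1.0)
    return worst

def simulate_asset_paths(n_paths, n_days, seed):
    """Hypothetical two-asset daily returns; NOT the platform's simulator."""
    rng = random.Random(seed)
    return [
        [(rng.gauss(0.0003, 0.012), rng.gauss(0.0001, 0.006)) for _ in range(n_days)]
        for _ in range(n_paths)
    ]

def worst_drawdowns(paths, weights):
    """Per-path worst drawdown of a fixed-weight portfolio."""
    return [
        max_drawdown([sum(w * r for w, r in zip(weights, day)) for day in path])
        for path in paths
    ]

paths = simulate_asset_paths(n_paths=200, n_days=250, seed=7)
dd_reference = worst_drawdowns(paths, (0.6, 0.4))
dd_candidate = worst_drawdowns(paths, (0.8, 0.2))

# Path-by-path deltas: both portfolios saw exactly the same market.
deltas = [c - r for r, c in zip(dd_reference, dd_candidate)]
deepens_worst_path = min(dd_candidate) < min(dd_reference)
print(deepens_worst_path, round(sum(deltas) / len(deltas), 4))
```

Reusing one set of paths for both portfolios is what removes seed noise from the comparison; drawing fresh paths for each would mix weight effects with sampling variation.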

long_horizon_stress(holdings, horizon_years, monthly_contribution | monthly_withdrawal, …)

Multi-year wealth paths for a savings or withdrawal plan: terminal-wealth quantiles (nominal + real), ruin/shortfall probabilities, a sequence-of-returns diagnosis (same plan, bad vs good first two years) and a drift-sensitivity block. Long-run drift is a stated, overridable assumption (disclosed alongside the substrate's raw stress drift); costs on by default. No rate, allocation, or product is recommended.
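The sequence-of-returns diagnosis can be illustrated with a toy withdrawal plan: the same annual returns are applied in two orders. All numbers are hypothetical, and annual steps are used here for brevity even though the primitive takes monthly contributions or withdrawals.

```python
def terminal_wealth(start, annual_returns, annual_withdrawal):
    """Apply each year's return, then withdraw; stop at zero (ruin)."""
    wealth = start
    for r in annual_returns:
        wealth = wealth * (1.0 + r) - annual_withdrawal
        if wealth <= 0.0:
            return 0.0
    return wealth

# Hypothetical returns: identical set, opposite ordering.
bad_first = [-0.25, -0.15, 0.10, 0.12, 0.08, 0.15, 0.20, 0.10]
good_first = list(reversed(bad_first))

w_bad = terminal_wealth(100_000, bad_first, 5_000)
w_good = terminal_wealth(100_000, good_first, 5_000)
print(round(w_bad), round(w_good))
```

Without withdrawals the ordering would not matter, because the compounded product of returns is the same. With withdrawals, early losses leave less capital to recover, so the bad-first ordering ends lower.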

regime_outlook(asset, horizon_days, as_of?)

Model-conditional probabilities that an asset (SPY/QQQ/GLD/TLT) is in each regime (BULL/SIDEWAYS/BEAR/CRISIS) after 5 or 21 trading days, with persistence and unconditional baselines alongside. Preregistered, out-of-sample validated; annual seasonality was tested, falsified and excluded. Descriptive probabilities of operationally defined regime classes — not a market prediction.

Strategy & backtest grounding

challenge_strategy(strategy_id)

The full cross-regime gate for a strategy: where it degrades across the failure-mode taxonomy, plus a revision signal.

backtest_integrity(annualized_sharpe, n_trials, backtest_start, backtest_end, …)

Confront a backtest claim with its over-optimism: the deflated Sharpe (the maximum Sharpe reachable by chance grows with the number of configurations tried — Bailey & López de Prado) and which crisis regimes were absent from the backtest window.
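To illustrate why the number of trials matters, the sketch below computes the approximate expected maximum Sharpe ratio among zero-skill configurations, the benchmark used in Bailey and López de Prado's deflated Sharpe work. It covers only that benchmark, not the full deflated Sharpe statistic (which also adjusts for sample length, skewness and kurtosis). The `sharpe_std` input, the dispersion of Sharpe estimates across trials, is an assumed value here.

```python
import math
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329

def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate expected maximum Sharpe across n_trials zero-skill trials."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = NormalDist()
    a = z.inv_cdf(1.0 - 1.0 / n_trials)
    b = z.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return sharpe_std * ((1.0 - EULER_MASCHERONI) * a + EULER_MASCHERONI * b)

# sharpe_std=0.5 is an illustrative assumption, not a platform default.
for n in (10, 100, 1000):
    print(n, round(expected_max_sharpe(n, sharpe_std=0.5), 2))
```

The threshold rises with the trial count, so a reported Sharpe that looks strong after ten configurations may be unremarkable after a thousand.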

run_stress_test(profile_hint)

Buy-and-hold diagnostic on one cached synthetic regime — a fast single-regime probe.

Regime discovery & feedback

find_similar_regime(descriptors | profile_hint)

Nearest-neighbour search over the regime descriptor space — find the stress regimes closest to a described market.
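A nearest-neighbour lookup over a descriptor space might look like the sketch below. The regime names, the three descriptor keys and the unscaled Euclidean metric are all hypothetical; the page states only that the real embedding has 18 descriptors, not which distance it uses.

```python
import math

def nearest_regimes(query, catalogue, k=2):
    """Rank catalogue regimes by Euclidean distance to the query descriptors."""
    def distance(a, b):
        return math.sqrt(sum((a[key] - b[key]) ** 2 for key in query))
    ranked = sorted(catalogue.items(), key=lambda item: distance(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical catalogue with made-up, pre-scaled descriptor values.
catalogue = {
    "calm_uptrend":  {"ret": 0.8,  "vol": 0.2, "dd": 0.1},
    "choppy_range":  {"ret": 0.0,  "vol": 0.5, "dd": 0.3},
    "sharp_selloff": {"ret": -0.9, "vol": 0.9, "dd": 0.9},
}
query = {"ret": -0.7, "vol": 0.8, "dd": 0.7}

closest = nearest_regimes(query, catalogue, k=2)
print(closest)
```

Descriptors on different scales would normally be standardised before computing distances; the values above are assumed to be comparable already.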

describe_regime(profile_hint)

The full descriptor profile of a named regime (return, clustering, volatility, drawdown character).

submit_feedback(...)

Persist structured agent feedback (observation + suggested_action) about a response — the platform learns where it is weak.

Primitives — resources

Read-only, app-controlled context an agent can pull before or after a tool call:

| URI | Returns |
|---|---|
| validation://{asset} | Per-asset realism evidence: the 18 daily stylized facts measured against the historical reference bands. |
| validation://value | Cross-asset value basis: regime-conditional correlation and tail dependence vs a Gauss-copula baseline, with honest caveats. |
| ontology://failure-modes | The failure-mode taxonomy (the same set as the /ontology pages). |
| ontology://failure-behaviors | The failure-behavior vocabulary used in diagnostics. |
| ontology://regime-descriptors | The 18 behavioral descriptors that form the regime embedding axes. |
| portfolio://universe | The assets a Tier-1 portfolio request can be composed from. |
| regimes://available | The profile_hints currently servable by run_stress_test. |
| methodology://overview | A brief pointer to the methodology, canonical on .com. |
| feedback://insights | Aggregated agent-feedback statistics. |

The response envelope

Every primitive returns the same envelope, so the grounding can be enforced uniformly. The two load-bearing fields are revision_required (the gate) and grounding_summary (the one sentence an agent must carry into its answer).

| Field | Meaning |
|---|---|
| schema_version | The primitive's versioned schema, e.g. ctys-agent-portfolio-v1. |
| request_id | UUID for the call, for correlation and feedback. |
| generated_utc | ISO timestamp. |
| methodology_url | Link back to the canonical methodology. |
| revision_required | Boolean gate: true when the result warrants the agent revising its proposal before presenting it. |
| grounding_summary | One factual risk sentence; the headline the agent must not omit. |
| methodological_limitations | The model's boundaries, baked into every response rather than kept as a separate disclosure. |
| _feedback | How to submit structured feedback on this response. |
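One way a consuming agent could enforce the two load-bearing fields is sketched below. The function name, the return shape and the wording of the inserted risk note are assumptions of this sketch; only the field names come from the envelope described above, and the placeholder values in the sample envelope are invented.

```python
REQUIRED_FIELDS = (
    "schema_version", "request_id", "generated_utc", "methodology_url",
    "revision_required", "grounding_summary", "methodological_limitations",
    "_feedback",
)

def apply_gate(envelope, draft_answer):
    """Return (ok_to_present, text) for a draft answer given a response envelope."""
    missing = [field for field in REQUIRED_FIELDS if field not in envelope]
    if missing:
        raise ValueError(f"envelope is missing fields: {missing}")
    if envelope["revision_required"]:
        # The gate is closed: the proposal should be revised, not presented.
        return False, "Revise before presenting: " + envelope["grounding_summary"]
    if envelope["grounding_summary"] not in draft_answer:
        # Carry the grounding sentence into the answer if the draft omitted it.
        draft_answer += "\n\nRisk note: " + envelope["grounding_summary"]
    return True, draft_answer

# Hypothetical envelope with placeholder values.
envelope = {
    "schema_version": "ctys-agent-portfolio-v1",
    "request_id": "00000000-0000-0000-0000-000000000000",
    "generated_utc": "2024-01-01T00:00:00Z",
    "methodology_url": "https://example.invalid/methodology",
    "revision_required": True,
    "grounding_summary": "In a rate shock both legs fall together; the hedge breaks.",
    "methodological_limitations": ["placeholder"],
    "_feedback": "placeholder",
}

ok, text = apply_gate(envelope, "A 60/40 portfolio is a safe default.")
print(ok, text)
```

Because every primitive shares the envelope, a single check of this kind can sit in front of all tool results regardless of which primitive produced them.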

Example — portfolio_stress_test

The classic "safe" 60/40 — and what a rate shock does to it.

// what a request describes
portfolio_stress_test({
  holdings: [
    { asset: "SPY", weight: 0.6 },
    { asset: "TLT", weight: 0.4 }   // the "safe" 60/40
  ]
})

// what a request returns (abbreviated)
{
  "schema_version": "ctys-agent-portfolio-v1",
  "request_id": "…",
  "revision_required": true,
  "grounding_summary": "In a rate shock both legs fall together — the hedge breaks.",
  "universe": ["SPY", "TLT", "GOLD", "BTC"],
  "portfolio": [{ "asset": "SPY", "weight": 0.6 }, { "asset": "TLT", "weight": 0.4 }],
  "substrate_version": "substrate_v1",
  "portfolio_scenarios": [
    { "scenario": "baseline",        "severity": "low",
      "cross_asset_finding": { "behavior": "diversification_intact" } },
    { "scenario": "risk_off_crisis", "severity": "moderate",
      "cross_asset_finding": { "behavior": "hedge_holds" } },
    { "scenario": "rate_shock",      "severity": "high",
      "cross_asset_finding": { "behavior": "hedge_breaks" },
      "portfolio_worst_episode_drawdown": -0.14 }
  ],
  "derisk_comparison": {
    "rule": "drawdown_stop",
    "rule_detail": "Exit to cash at -10% drawdown; re-enter at the 50-day MA",
    "full_path": { "buy_hold_worst": -0.14, "derisk_worst": -0.10 }
  },
  "realism_basis": { "evidence_resource": "validation://{asset}" },
  "methodological_limitations": [ "…" ]
}

Abbreviated. The hedge-break finding and the −0.14 → −0.10 drawdown-stop improvement are from the live endpoint; full per-scenario fields (VaR, expected shortfall, leg decomposition) are in the schema.
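A consumer might read the example response along these lines. The dictionary below copies only the fields used here from the abbreviated example; the variable names are assumptions of this sketch.

```python
# Subset of the example response above, transcribed as a Python dict.
response = {
    "revision_required": True,
    "portfolio_scenarios": [
        {"scenario": "baseline", "severity": "low",
         "cross_asset_finding": {"behavior": "diversification_intact"}},
        {"scenario": "risk_off_crisis", "severity": "moderate",
         "cross_asset_finding": {"behavior": "hedge_holds"}},
        {"scenario": "rate_shock", "severity": "high",
         "cross_asset_finding": {"behavior": "hedge_breaks"},
         "portfolio_worst_episode_drawdown": -0.14},
    ],
    "derisk_comparison": {
        "rule": "drawdown_stop",
        "full_path": {"buy_hold_worst": -0.14, "derisk_worst": -0.10},
    },
}

# Scenarios in which the hedge failed.
broken = [
    s["scenario"]
    for s in response["portfolio_scenarios"]
    if s["cross_asset_finding"]["behavior"] == "hedge_breaks"
]

# How much shallower the worst drawdown is under the drawdown-stop rule.
full_path = response["derisk_comparison"]["full_path"]
improvement = full_path["derisk_worst"] - full_path["buy_hold_worst"]

print(broken, round(improvement, 2))
```

Note that portfolio_worst_episode_drawdown appears only on the rate_shock entry in the abbreviated example, so code reading it should not assume the key is present on every scenario.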

Controlled vocabularies

Stable within the ctys-agent-v1 family; new values are additive and announced.

Diversification behavior (per scenario)

| Value | Meaning |
|---|---|
| diversification_intact | No major holding declined materially in the scenario. |
| hedge_holds | A leg rose while another fell; the hedge offset the loss. |
| hedge_breaks | A usually-offsetting leg fell together with the others; the hedge failed when it was needed (the 2022 stock-bond case). |
| shared_drawdown | Several major legs declined together; diversification did not help. |
| concentrated_loss | One leg drove the loss; the others were roughly flat. |

Severity (per scenario)

| Value | Meaning |
|---|---|
| low | Shallow, contained drawdown in the scenario. |
| moderate | A noticeable but survivable drawdown. |
| high | A deep drawdown a typical mandate would struggle to hold through. |
| critical | A severe, mandate-threatening drawdown. |

Behavior (per regime, strategy diagnostics)

| Value | Meaning |
|---|---|
| stable | Holds up under the regime; no meaningful regime-specific weakness. |
| sensitive | Mild underperformance versus the pool baseline. |
| degrading | Meaningful regime-specific weakness. |
| inactive | Too little data in the bucket to characterise behaviour. |

Impact type (failure-mode ontology)

| Value | Meaning |
|---|---|
| drawdown_expansion | Tail drawdowns deepen under the regime conditions. |
| return_instability | Mean / median returns swing widely; high variance vs the pool. |
| volatility_amplification | Outputs amplify when realised volatility crosses the regime threshold. |
| whipsaw_sensitivity | Frequent sign-changes consume capital through entry / exit churn. |
| trend_dependency | Performance contingent on persistent directional moves; degrades in their absence. |
| tail_loss_acceleration | Losses accelerate past the configured failure-drawdown threshold. |

Regime classes (for aggregation)

| Value | FM buckets | Meaning |
|---|---|---|
| trend_regimes | TREND_UP · TREND_DOWN · SLOW_BEAR | Persistent directional moves. |
| mean_reverting_regimes | SIDEWAYS · WHIPSAW | Range-bound or sign-changing markets without strong direction. |
| volatility_regimes | VOL_EXPANSION · VOL_COMPRESSION | Realised-volatility regimes (above / below baseline). |
| tail_event_regimes | SHARP_CRASH · LIQUIDITY_STRESS | Abrupt-loss / stress-event regimes. |

The individual failure-mode definitions (TREND_UP, SHARP_CRASH, LIQUIDITY_STRESS, …) are published as citable DefinedTerm pages in the ontology.
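For aggregation, the regime classes above can be held as a lookup from failure-mode bucket to class. The sketch below groups per-bucket behaviour labels by class; the sample diagnostics and the grouping function are hypothetical, while the class and bucket names are taken from the table.

```python
from collections import defaultdict

REGIME_CLASSES = {
    "trend_regimes": ("TREND_UP", "TREND_DOWN", "SLOW_BEAR"),
    "mean_reverting_regimes": ("SIDEWAYS", "WHIPSAW"),
    "volatility_regimes": ("VOL_EXPANSION", "VOL_COMPRESSION"),
    "tail_event_regimes": ("SHARP_CRASH", "LIQUIDITY_STRESS"),
}

# Reverse lookup: failure-mode bucket -> regime class.
BUCKET_TO_CLASS = {
    bucket: regime_class
    for regime_class, buckets in REGIME_CLASSES.items()
    for bucket in buckets
}

def group_by_regime_class(bucket_behaviors):
    """Group {bucket: behavior} diagnostics under their regime class."""
    grouped = defaultdict(dict)
    for bucket, behavior in bucket_behaviors.items():
        grouped[BUCKET_TO_CLASS[bucket]][bucket] = behavior
    return dict(grouped)

# Hypothetical per-bucket strategy diagnostics.
diagnostics = {
    "TREND_UP": "stable",
    "WHIPSAW": "degrading",
    "SIDEWAYS": "sensitive",
    "SHARP_CRASH": "degrading",
}
grouped = group_by_regime_class(diagnostics)
print(grouped)
```

An unknown bucket raises a KeyError here, which is a deliberate choice for a controlled vocabulary: new values are described as additive and announced, so an unexpected one is worth surfacing rather than silently dropping.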

Capability matrix

A compact yes / no for capability-matching. Items marked no are deliberate exclusions — outside the model on purpose, not roadmap gaps.

| Capability | Supported | Note |
|---|---|---|
| Portfolio stress (cross-asset, regime-conditional) | ✓ yes | SPY/TLT/GOLD/BTC substrate; hedge-hold vs hedge-break detection. |
| Stock-bond hedge-break detection (the 2022 case) | ✓ yes | Shared-drawdown / rate-shock scenario. |
| Risk-concentration decomposition (capital vs risk) | ✓ yes | Euler risk contributions. |
| Backtest over-optimism check (deflated Sharpe) | ✓ yes | Bailey & López de Prado; + crisis-coverage gaps. |
| Strategy failure-mode classification | ✓ yes | Cross-regime gate over the taxonomy. |
| Realism / value validation (citable resources) | ✓ yes | validation://{asset} + validation://value. |
| Probabilistic forecasting / price prediction | no | Scenario-based; not a distribution over futures. |
| Investment recommendations / suitability | no | Descriptive output only; BaFin/WpHG-aware. |
| Portfolio allocation / position-sizing advice | no | It stresses a portfolio; it does not tell you how to weight one. |
| Fundamentals / macro analysis | no | Price-stress of systematic strategies only. |
| Live trading / order execution | no | No broker integration; not an execution platform. |

Methodological boundaries

Echoed as methodological_limitations on every response — part of the schema, not a separate disclosure.

  • Outputs are descriptive — regime behaviour, failure modes, drawdown distributions, risk decomposition. They are not predictions, recommendations, or suitability assessments.
  • The portfolio substrate is Tier-1: a fixed SPY/TLT/GOLD/BTC universe of pre-computed joint paths, re-weighted per request. Custom assets are Tier-2 and deferred.
  • The cross-asset model adds value in regime-conditional correlation and joint-crash tail dependence; single-asset daily marginals are at parity with cheap baselines (no unique edge claimed there).
  • Synthetic regimes impose conditions on a regime-switching simulator — they are not draws from a distribution of futures.
  • Drawdown and tail metrics are computed over the substrate paths, not a true out-of-sample tail estimate.
  • Scope is price-stress of systematic strategies and portfolios — not fundamentals, macro, suitability, or execution.

References

  • /llms.txt: compact overview for LLM crawlers, incl. the live platform.
  • /methodology: how the simulator works, the failure-mode taxonomy.
  • /reference: architecture, bands, cross-asset coupling, limits.
  • /verifiability: stylized-fact realism + value-vs-copula validation.
  • /ontology: the failure-mode DefinedTerm pages.

Operated under German jurisdiction (BaFin / WpHG framework). All outputs use neutral framing — no rankings, no directive language, no buy / sell signals, no suitability assessment.

Schema family ctys-agent-v1 · portfolio primitive ctys-agent-portfolio-v1.