Live Portfolio — what to own today
Performance
Cumulative return for the deployed blend and each of the four contributing sleeves, anchored at 0% at the start of the selected period. The stat strip and the attribution bars below the chart recompute live with the timeframe: switching to 1Y shows the trailing 12-month return, and switching to MAX shows the full backtest history. Default is YTD. Amber-shaded bands mark periods when the breadth regime overlay was active (RISK_OFF: half of NAV moved to SHY, the 1-3y US Treasury ETF). Look at the Mar-Apr 2026 band on the YTD view: that is the overlay protecting the strategy through a real drawdown.
Combined holdings
Every position in the deployed portfolio, sorted by total weight. Each row shows the ETF, which strategy holds it (A = US sector breadth, B = asset-class momentum, C = thematic momentum, D = Europe sector breadth), why it's in the portfolio (the signal that put it there), the within-strategy weight, and the effective weight after the 35/35/10/20 blend. If you have $100,000 to deploy, the "Total weight" column tells you how much capital each ETF gets. Click any row to see the underlying ticker's 1Y price chart.
How "Total weight" is built, with one row as a worked example. Say IUES sits in Strategy A only with a within-strategy weight of 19.3%. Strategy A allocates 35% of total NAV, so IUES gets 0.35 × 19.3% = 6.755% (displayed as 6.76%) of the combined portfolio. On $100k, that is $6,755 in XLE (the trade proxy for IUES). The four-strategy panels below show how each within-strategy weight is computed from the underlying breadth or momentum signal; hover any row for the exact arithmetic.
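The total-weight arithmetic can be sketched in a few lines. This is a minimal illustration rather than the dashboard's own code: `BLEND`, `total_weight` and `dollars` are hypothetical names, and only the 35/35/10/20 split and the IUES figures come from the text.

```python
# Sleeve shares of total NAV in the deployed 35/35/10/20 A:B:C:D blend.
BLEND = {"A": 0.35, "B": 0.35, "C": 0.10, "D": 0.20}


def total_weight(within_strategy):
    """Effective portfolio weight of one ETF.

    within_strategy maps sleeve letter -> the ETF's weight inside that sleeve.
    An ETF held by several sleeves sums its per-sleeve slices.
    """
    return sum(BLEND[sleeve] * w for sleeve, w in within_strategy.items())


def dollars(weight, nav):
    """Capital assigned to a position for a given portfolio size."""
    return weight * nav


iues = total_weight({"A": 0.193})          # held by Strategy A only
print(f"{iues:.3%}")                       # 6.755%
print(f"${dollars(iues, 100_000):,.0f}")   # $6,755
```

An ETF that appears in two sleeves would be passed as, for example, `{"B": 0.10, "C": 0.25}`, and the two slices add up.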
Exposure by asset class
Roll-up of the combined portfolio by broad category. Tells you what the strategy is actually betting on right now beyond individual ticker symbols.
Diversification check — realised return correlation between the four sleeves
Pearson correlation of daily returns (not breadth signals) between each unique pair of sleeves, computed on the common 2018-11 → today window. Lower = more diversifying — the blend's risk-adjusted return depends on these being well below 1.0. Lower-triangular layout shows only the six unique pairs (the diagonal of self-correlations and the redundant mirror entries are omitted).
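A minimal sketch of the six-pair calculation, assuming each sleeve's daily returns are already aligned on the common window. The function names and the toy return series are illustrative only, not real backtest data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def unique_pair_correlations(returns_by_sleeve):
    """One entry per unordered pair: no diagonal, no mirrored duplicates."""
    return {
        (a, b): pearson(returns_by_sleeve[a], returns_by_sleeve[b])
        for a, b in combinations(sorted(returns_by_sleeve), 2)
    }


# Toy daily returns (made up for illustration).
toy = {
    "A": [0.010, -0.004, 0.006, -0.002, 0.003],
    "B": [0.002, 0.001, -0.001, 0.003, 0.000],
    "C": [0.020, -0.015, 0.012, -0.006, 0.009],
    "D": [0.004, -0.006, 0.005, 0.001, -0.002],
}
pairs = unique_pair_correlations(toy)
for pair, rho in pairs.items():
    print(pair, round(rho, 2))
```

With four sleeves, `combinations(..., 2)` yields exactly the six pairs shown in the lower-triangular layout.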
Strategy A, B, C and D — side by side
The four engines that produced today's holdings. Strategy A picks the strongest US sectors by constituent breadth. Strategy B picks the strongest asset classes by ETF-level momentum. Strategy C is a small thematic sleeve for catching secular trends (AI, clean energy, biotech). Strategy D (Phase 4) is a Europe sector breadth sleeve for non-US macro orthogonality. Combined portfolio: 35% A + 35% B + 10% C + 20% D.
How the deployment works in 6 steps
Recipe — combined 35/35/10/20 A+B+C+D, no leverage (Phase 4 deployed)
Expected statistics (in-sample, common 2018-2026 window): Sharpe +1.15, CAGR +15.1%, max drawdown -23.8%. If you prefer lower drawdown without the Europe sleeve, use the prior 45/45/10 A+B+C blend (Sharpe +1.09, max DD -21.5%) — see Multi-Strategy tab.
The deployed 4-way blend's complete history — every position, every rebalance, every contribution. This tab consolidates the per-sleeve trade detail that lives on the four individual strategy tabs into one blend-level view. Useful for: walking a client through "how does this actually work over time", validating the performance attribution numbers, and answering "what did the portfolio hold during regime X" questions.
1 · Asset-class exposure over time
The deployed 35/35/10/20 A:B:C:D portfolio's exposure by asset class, stacked to 100%, weekly snapshots since 2018-Q4 (common window across all four sleeves). Strategy A's 14 US sector ETFs are decomposed into 5 macro buckets (US Tech & Comms, US Cyclicals, US Energy, US Financials, US Defensives) so A's actual sector rotation is visible; previously they were lumped into one frozen "US Equity (sectors)" block. Notice the bond spike in early 2020 (B's flight to TLT/IEF), the thematic spike in 2020-21 (C catching the AI/clean-energy boom), the commodity tilt in 2022, and the rotation between US Tech / Cyclicals / Defensives within Strategy A over the full window.
Bands sum to 100% of NAV every week. Bucketing: Strategy A's sectors split into Tech / Comms (CNDX, SOXX, IUCM), Cyclicals (IUIS, IUCD, IUMS), Energy (IUES), Financials (IUFS), and Defensives (IUHC, IUCS, IUUS). A's REIT (IUSP) aggregates with B's VNQ into Real Estate. B's VGK (FTSE Europe) aggregates with Strategy D's Stoxx 600 sectors into Europe Equity. Strategy C's IEF cash floor rolls into Bonds.
2 · Combined trade ledger — every rebalance across all four sleeves
Flat list of every position change across A, B, C, and D. One row per (sleeve × rebalance × ETF). Action = entry (first appearance) / exit (last appearance) / resize (weight change within an existing position). Filter by sleeve, ETF, or date to drill into a specific moment.
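The entry / exit / resize labels can be derived by diffing a sleeve's weights at two consecutive rebalances. The sketch below assumes that reading of "first appearance" and "last appearance"; `ledger_rows`, the row fields and the example weights are all hypothetical.

```python
def ledger_rows(sleeve, date, prev, curr, tol=1e-9):
    """Compare weights before and after one rebalance and emit ledger rows.

    prev / curr map ETF -> weight inside the sleeve. Unchanged positions
    produce no row.
    """
    rows = []
    for etf in sorted(set(prev) | set(curr)):
        before, after = prev.get(etf, 0.0), curr.get(etf, 0.0)
        if before <= tol < after:
            action = "entry"      # position opened at this rebalance
        elif after <= tol < before:
            action = "exit"       # position closed at this rebalance
        elif abs(after - before) > tol:
            action = "resize"     # held before and after, weight changed
        else:
            continue
        rows.append({"sleeve": sleeve, "date": date, "etf": etf,
                     "action": action, "from": before, "to": after})
    return rows


# Illustrative weights only.
prev = {"IUES": 0.40, "IUFS": 0.30, "CNDX": 0.30}
curr = {"IUES": 0.45, "CNDX": 0.30, "SOXX": 0.25}
rows = ledger_rows("A", "2026-05-22", prev, curr)
for row in rows:
    print(row["etf"], row["action"])
```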
3 · Blend-level attribution — total NAV contribution per ETF
Each ETF's total contribution to the deployed 35/35/10/20 blend's return, aggregated across the sleeves that held it. If an ETF appears in multiple sleeves (e.g., GLD in both Strategy B and Strategy C, or IEF as the cash floor in both B and C), its blend contribution is the sum of (sleeve_weight × sleeve_contribution) across each sleeve; an ETF held by one sleeve only (e.g., JETS as a thematic holding in C) contributes through that sleeve alone. This is the auditable per-ETF P&L attribution that proves the headline blend return is built from these specific position-level decisions. Sorted by absolute contribution.
Inspect the underlying signal for any ETF in the deployed universe. Three sections because the two signal types differ: Strategies A and D use constituent breadth (% of an ETF's holdings above their own 200d MA, range 0-100%) — but for different universes (US vs Europe). Strategies B and C use ETF-level momentum (distance of the ETF price above its own 200d MA, can be negative). Each section has its own chip group and chart.
Section 1 · US Sector Breadth (Strategy A universe)
Default: Leader vs Market — Strategy A's top-breadth holding plus CSP1 (S&P 500) as a broad-market reference. Click any chip to add or remove.
Trend snapshot
Current breadth + change over recent past. Positive Δ = strengthening; negative = weakening. Sorted by current breadth.
Section 2 · Asset Class & Thematic Momentum (Strategy B + C universes)
Default: today's Strategy B leader + SPY as reference. The Y-axis is distance above the 200d MA (can be negative). IEF is included even though it is the cash proxy — useful for seeing when Treasury duration is rallying or selling off. Strategy C's +5% signal floor is shown as a horizontal guide.
Trend snapshot
Current momentum signal + change over recent past. Sorted by current signal value.
Section 3 · Europe Sector Breadth (Strategy D universe)
Default: today's Strategy D leader + the universe-wide mean as reference. Same metric as Section 1 (% of an ETF's constituents above their own 200d MA, 0-100%) but for the 5 Stoxx Europe 600 sector UCITS. Click any chip to add or remove.
Trend snapshot
Current Europe sector breadth + change over recent past. Positive Δ = strengthening; negative = weakening. Sorted by current breadth.
Strategy A — US sector breadth rotation
Each Friday close, compute % above 200d MA for each of the 14 ETFs in the US sector universe. Rank them. Hold the top K = 7 next week, weighted by breadth (highest-breadth sector gets the largest slice). Refit K only annually on expanding-window Sharpe — the choice has been stable at K = 7 throughout the backtest. Strategy A is the largest single sleeve (35%) in the deployed multi-strategy blend; see the Multi-Strategy tab for how it combines with B, C, and D.
- Every Friday close: pull the 14 ETF rosters, compute % of constituents above their own 200d simple MA.
- Rank ETFs by breadth. Take the top 7.
- Weight each by its breadth share: weight_i = breadth_i / Σ(top-7 breadths). High-breadth sectors get more capital, weak ones less.
- Rebalance to the new mix. The Trade Explorer deep-dive (below) shows the exact weights at every rebalance.
- Once a year, re-evaluate K ∈ {3, 5, 7}. So far the answer has been K = 7 every year.
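The ranking and weighting steps above can be sketched as follows. The tickers are the Strategy A universe named elsewhere on this page; the breadth values are made up for illustration, and the equal-weight fallback for an all-zero basket is an assumption, not something the text specifies.

```python
def strategy_a_weights(breadth, k=7):
    """Top-k ETFs by breadth, each weighted by its share of the basket's breadth.

    breadth maps ETF -> % of constituents above their 200d MA (0-100).
    """
    top = sorted(breadth.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(b for _, b in top)
    if total == 0:
        # Assumption: fall back to equal weight if every breadth is zero.
        return {etf: 1.0 / len(top) for etf, _ in top}
    return {etf: b / total for etf, b in top}


# Hypothetical Friday-close breadth readings.
breadth = {
    "IUES": 85, "IUFS": 78, "SOXX": 74, "CNDX": 70, "IUIS": 66,
    "CSP1": 61, "IUUS": 58, "IUMS": 52, "IUHC": 47, "IUCM": 44,
    "IUCD": 39, "IUCS": 35, "IUSP": 31, "IDP6": 28,
}
weights = strategy_a_weights(breadth)
for etf, w in weights.items():
    print(f"{etf}: {w:.1%}")
```

Because the weights are shares of the top-7 breadth total, they always sum to 100%, which matches the sleeve staying fully invested.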
Portfolio construction verdict
Loading…
Headline comparison
Top row (highlighted green, tagged DEPLOYED) is the actual Strategy A used in the 4-way Multi-Strategy blend at 35% weight: Top-7 breadth-weighted rotation, weekly Friday, no leverage. Below it: the K=5 unleveraged variant as a K-sensitivity reference, and the two passive benchmarks (naive equal-weight across the 14 ETFs, SPY buy-and-hold).
Equity curves — apples-to-apples
All lines are anchored at 0% at the common start date (the latest "data available" date across all series) so visual comparison is fair. Use the timeframe selector to view shorter windows.
Right-tail behaviour (Phase 8)
Metrics that Sharpe alone underrates. Sortino penalises only downside volatility, so upside vol is not held against the strategy; skewness shows monthly-return asymmetry; rolling 12m extremes show the actual best/worst year experienced. Asymmetry ratio = |best| ÷ |worst|, where best and worst are the rolling 12m extremes; values > 1 mean the upside tail dominates the downside tail.
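A minimal sketch of these metrics on a monthly return series. The exact conventions are assumptions: Sortino is annualised with √12 against a 0% target, skewness is the population (biased) estimator, and rolling 12m returns are compounded. The dashboard may use different variants.

```python
from math import prod, sqrt


def sortino(monthly, target=0.0):
    """Annualised Sortino: mean excess return over downside deviation only."""
    n = len(monthly)
    excess = sum(monthly) / n - target
    downside = sqrt(sum(min(r - target, 0.0) ** 2 for r in monthly) / n)
    return float("inf") if downside == 0 else excess / downside * sqrt(12)


def skewness(xs):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def rolling_12m(monthly):
    """Compounded return of every consecutive 12-month window."""
    return [prod(1 + r for r in monthly[i - 12:i]) - 1
            for i in range(12, len(monthly) + 1)]


def asymmetry_ratio(monthly):
    """|best rolling 12m| / |worst rolling 12m|."""
    windows = rolling_12m(monthly)
    return abs(max(windows)) / abs(min(windows))


# Toy series: one strong year followed by one mildly negative year.
monthly = [0.02] * 12 + [-0.01] * 12
print(round(asymmetry_ratio(monthly), 2), round(sortino(monthly), 2))
```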
Trade Explorer — Strategy A deep-dive
Every rebalance of Strategy A from — to —. Each row = a Friday close; holdings shown are what the strategy should be in for the following week. Green chips entered this week; red chips exited; blue chips persisted. Weight = portfolio share; breadth = % of that ETF's constituents above 200d MA at the rebalance date. For the BLEND-LEVEL combined ledger across all four sleeves, see the Trade History tab.
Strategy A · Equity vs benchmarks
Strategy A (K = 7, weekly Friday) versus SPY buy-and-hold and naive equal-weight across all 14 ETFs. All series shown as cumulative return from period start (0% baseline) — use the timeframe selector to switch between YTD / 1Y / MAX windows.
CAGR = compound annual growth rate. Δ = strategy minus benchmark.
Strategy A · Sector allocation over time
Weekly snapshot of the portfolio composition — stacked to 100%. Colours by ETF. Wide bands = persistent leadership; thin slivers = brief appearances. Tells you what the rotation was actually betting on in each market regime.
Strategy A · Trade history
All rebalances, newest first. Use the filter to find rebalances containing a specific ETF or close to a specific date.
Strategy A · Performance attribution
For each ETF, the contribution to Strategy A's total return. Contribution = sum of (yesterday's weight × today's ETF return) over all days in the backtest. % of total normalises across ETFs (positives net of negatives). Annualised return when held = the ETF's own geometric return, compounded over only the days the strategy held it and then annualised; it answers "did the strategy time this sector well, or just own it through good periods?". Sorted by contribution.
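The contribution formula can be sketched as below, assuming weights and returns are stored as one dict per day. The function names and the three-day toy data are hypothetical.

```python
def contributions(weights, returns):
    """Sum over days of (previous day's weight x current day's ETF return).

    weights[t] and returns[t] are dicts keyed by ETF for day t.
    Day 0 has no prior weight, so attribution starts at day 1.
    """
    out = {}
    for t in range(1, len(returns)):
        for etf, w in weights[t - 1].items():
            out[etf] = out.get(etf, 0.0) + w * returns[t].get(etf, 0.0)
    return out


def pct_of_total(contrib):
    """Each ETF's share of the net total (positives net of negatives)."""
    total = sum(contrib.values())
    return {etf: c / total for etf, c in contrib.items()}


# Toy data, for illustration only.
weights = [
    {"IUES": 0.6, "IUFS": 0.4},
    {"IUES": 0.5, "SOXX": 0.5},
    {"SOXX": 1.0},
]
returns = [
    {},
    {"IUES": 0.010, "IUFS": -0.005, "SOXX": 0.020},
    {"IUES": -0.020, "SOXX": 0.030},
]
contrib = contributions(weights, returns)
shares = pct_of_total(contrib)
print(contrib)
```

Because shares are taken against the net total, a large winner can show a share above 100% when other positions detracted.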
Show all top-K variants (K ∈ {3,5,7} × weighting ∈ {equal, rank, breadth}, unleveraged)
Full unleveraged sweep — different ways to pick and weight the top-K-by-breadth basket. Weekly rebalance, 5 bps per unit of turnover.
Strategy B — Phase 2
A separate strategy that operates at the asset-class level instead of within US sectors. 14 broad ETFs across US equity, international developed, emerging markets, real estate, commodities, and bonds. Each Friday close, rank by distance above own 200d MA. Hold the top K with positive signal, weight by signal strength; idle capital sits in IEF (intermediate Treasury) as a cash proxy. This is "which asset classes should I be in at all?" — orthogonal to Strategy A's "within US equities, which sector?".
- Each Friday close, compute (price − MA200) / MA200 for each of the 14 asset-class ETFs.
- Drop any ETF with negative signal (price below its 200d MA — in a downtrend).
- Rank survivors. Take the top K = 7.
- Weight by signal share. If only N < K ETFs have positive signal, allocate (N/K) of capital to them and the remaining (K − N)/K to IEF as cash proxy. Built-in cash floor when the world is broadly weak — unlike Strategy A which stays 100% invested.
- Rebalance to the new mix. Costs 5 bps × turnover.
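The steps above, including the N/K cash floor, can be sketched like this. It is an illustration under stated assumptions: the signal values are made up, and `strategy_b_weights` is a hypothetical name.

```python
def strategy_b_weights(signal, k=7, cash="IEF"):
    """Top-k positive-momentum ETFs weighted by signal share, IEF for the rest.

    signal maps ETF -> (price - MA200) / MA200. If only N < k ETFs are
    positive, they share N/k of capital and (k - N)/k sits in the cash proxy.
    """
    positive = [(etf, s) for etf, s in signal.items() if s > 0]
    top = sorted(positive, key=lambda kv: kv[1], reverse=True)[:k]
    invested = len(top) / k
    total = sum(s for _, s in top)
    weights = {etf: invested * s / total for etf, s in top}
    idle = 1.0 - invested
    if idle > 0:
        # IEF can also be a ranked holding, so add to any existing weight.
        weights[cash] = weights.get(cash, 0.0) + idle
    return weights


# Hypothetical Friday-close signals: only three ETFs are above their 200d MA.
signal = {"SPY": 0.08, "GLD": 0.12, "VGK": 0.04,
          "VNQ": -0.02, "TLT": -0.05, "IEF": -0.01}
weights = strategy_b_weights(signal)
for etf, w in weights.items():
    print(f"{etf}: {w:.1%}")
```

With three survivors and K = 7, the three ETFs share 3/7 of capital and IEF takes the remaining 4/7. If nothing has a positive signal, the whole sleeve sits in IEF.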
Equity vs benchmarks (18-year backtest)
Strategy B versus SPY buy-and-hold (broad-market passive — note its catastrophic -51% drawdown in 2008), the 60/40 SPY/IEF balanced portfolio (the conventional benchmark), and naive equal-weight across all 14 ETFs (diversification only, no signal). Strategy B's longer history (back to 2008) catches the GFC, the 2011 mini-crash, 2015-2016 commodity bust, 2018 Q4, COVID 2020, 2022 rates shock, and the recoveries in between. The headline result is the massive drawdown reduction.
Right-tail behaviour (Phase 8)
B is the boring sleeve and that is the point: narrowest distribution, lowest 12m extremes, smallest skewness, and the top-performing sleeve in only ~10% of months. The asset-class rotation does not catch fads; it rotates into bonds when equities weaken. Its asymmetric value is downside protection, not upside capture.
Asset-class allocation over time
Weekly stacked-to-100% allocation. Notice the regime shifts — the strategy rotated decisively into bonds (TLT/IEF) in 2008 and 2020, into commodities during the 2022 inflation shock, and into equities through the 2017-2019 and 2023-2024 expansions. This regime-following behaviour is what produces the drawdown reduction.
K × cadence sensitivity
Same heat-grid format as Strategy A's Test 11. Best Sharpe cell highlighted green, worst red.
Walk-forward K refit
Annual refit picking K ∈ {3, 4, 5, 6, 7} on expanding-window Sharpe, applying that K to the next 12 months. Same methodology as Strategy A's Test 10.
Trade history
Newest first. Filter to find specific ETFs or dates.
Performance attribution
Per-ETF contribution to the strategy's total NAV. The signal column shows the asset class — tells you which classes drove returns across the 18-year window.
Strategy C — Phase 3 (thematic sleeve)
A third sleeve for catching secular trends that don't fit traditional sectors (AI, cybersecurity, clean energy, biotech, blockchain, defence, crypto, broad metals & mining, timber, rare earth, China-tech-broad, China-A-share semis, etc). Same momentum signal as Strategy B applied to 23 thematic ETFs, with extra guardrails to limit fad-chasing. Sized at 10% of the combined portfolio — designed as optionality on the next AI-style bull run, not a primary alpha source.
- Each Friday close, compute distance above 200d MA for each of the 23 thematic ETFs.
- Hard signal floor: only consider ETFs with signal ≥ +5% above their 200d MA (not just positive). This filters out marginal "in an uptrend" cases that often reverse.
- Rank the survivors. Take the top K = 4.
- Equal-weight (1/K) across the top K (Phase 6 — 2026-05-24). The +5% signal floor already filters out modest trends, so signal magnitude beyond eligibility carries little extra information — every eligible candidate is well into an uptrend. Equal-weighting prevents the most-overbought ETF (statistically the one most likely to mean-revert) from being overweighted.
- Cash floor: when fewer than K candidates clear the +5% threshold, the deficit sits in IEF (intermediate Treasury) as cash proxy.
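A sketch of the rules above: +5% eligibility floor, top K = 4, equal 1/K weights, and the deficit parked in IEF. The theme names and signal values are placeholders, not members of the real 23-ETF universe.

```python
def strategy_c_weights(signal, k=4, floor=0.05, cash="IEF"):
    """Equal-weight the top-k themes that clear the signal floor.

    signal maps ETF -> distance above its 200d MA. Each selected ETF gets
    1/k; any unfilled slots are held in the cash proxy.
    """
    eligible = [etf for etf, s in signal.items() if s >= floor]
    top = sorted(eligible, key=lambda etf: signal[etf], reverse=True)[:k]
    weights = {etf: 1.0 / k for etf in top}
    deficit = 1.0 - len(top) / k
    if deficit > 0:
        weights[cash] = weights.get(cash, 0.0) + deficit
    return weights


# Placeholder themes with made-up signals: only two clear the +5% floor.
signal = {"theme_ai": 0.22, "theme_uranium": 0.09,
          "theme_biotech": 0.03, "theme_clean_energy": -0.11}
weights = strategy_c_weights(signal)
print(weights)
```

Note that `theme_biotech` is in an uptrend (+3%) but is still excluded, which is the difference between this floor and Strategy B's "positive signal" test.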
Honest assessment — Strategy C is an optionality sleeve, not a Sharpe-alpha sleeve
The standard quant lens (Sharpe ratio, Phase 7 bootstrap p(better)) systematically underrates Strategy C. Sharpe penalises upside vol symmetrically with downside vol; bootstrap p(better) measures the mean outcome, not the tail outcome. C is structured as an optionality sleeve — capped 10% sleeve weight (limits downside per year), unbounded upside if a thematic bull fires. The right metrics are right-tail metrics.
The empirical case for C, on right-tail metrics (see "Right-tail behaviour" section below):
- Best rolling 12-month return: +162%. Vs A +85% / B +43% / D +66%. C delivers the largest absolute upside tail in the universe by 2-4×.
- Top-performing sleeve 41% of months (most of any sleeve). C wins often and, when it wins, it wins big; the bootstrap-on-Sharpe missed this because magnitudes are highly variable.
- COVID + thematic boom (Mar 2020 → Feb 2021 ARKK peak): C delivered +170% standalone. The 50/50 A:B got +58% in the same window. The 4-way blend with 10% C got +65% — the small sleeve captured ~70% of C's idiosyncratic alpha.
- 2022 inflation crash: C was the worst sleeve (-25%), but the 10% cap limited blend damage to -13% (vs SPY -25%). The cap works.
The asymmetry is the point. At a 10% sleeve weight, C contributes ~+17pp to blend return in its best year and ~-2.5pp in its worst year — a 7:1 upside/downside ratio. That is textbook optionality. Backtests of the last 7.5 years include the 2020-21 boom but cannot price the OPTIONS embedded in C — its real value compounds whenever a new thematic regime emerges that we cannot predict today.
Caveat: walk-forward Sharpe is +0.49 vs in-sample +0.84 (Phase 17.1, 23-ETF universe with CQQQ + 159801.SZ for the China-tech and China-A-share-semi gaps). That is a drop of 0.35, a material degradation that is structural to thematic momentum. The 10% sleeve cap is the risk-management response to this. Do not deploy C as a primary alpha source; do deploy it as a small optionality sleeve.
Equity vs SPY (7.5-year backtest, 2018-11 → today)
Strategy C versus SPY buy-and-hold as the obvious passive benchmark. Note the deep drawdowns (-48% peak-to-trough after the Phase 15.2 Bitcoin addition; was -43% pre-BTC) — thematic ETFs cluster on tech, clean-energy, and now spot crypto, all of which run high vol. The recovery has been strong but at high vol. The deep drawdowns are the option premium; the +170% COVID-era return and 2024 crypto rally are the option payoffs.
Right-tail behaviour (Phase 8) — the optionality scorecard
This is the section that earns Strategy C its place in the deployed blend. The asymmetry ratio (best 12m ÷ |worst 12m|) and the % months as top sleeve are the metrics that capture optionality value. Compare against the other strategy tabs to see how C's right tail dwarfs everything else.
Thematic allocation over time
Weekly stacked-to-100% allocation. When the cash floor activates (fewer than K = 4 themes clear the +5% threshold), the deficit sits in IEF (Treasury). Big IEF bands = "the world is broadly weak in thematic-land". Notice the rotation through commodity-equity (gold miners, uranium, copper, lithium) when tech was weak in 2022.
K × cadence sensitivity
Same heat-grid format as Strategy A's Test 11 and Strategy B's grid. Best Sharpe cell highlighted green, worst red.
Walk-forward K refit
Annual refit picking K ∈ {3, 4, 5} on expanding-window Sharpe. The walk-forward Sharpe is significantly lower than the in-sample Sharpe — a known characteristic of thematic ETFs (recent winners get extrapolated; the rotation chases fads). This is one reason for the 10% sleeve cap.
Trade history
Newest first. Filter to find specific themes or dates.
Performance attribution
Per-ETF contribution to Strategy C's total NAV. The theme column shows which thematic categories drove returns.
Strategy D — Phase 4 (Europe sleeve)
A fourth sleeve applying the same constituent-breadth mechanism as Strategy A, but to 5 Stoxx Europe 600 sector UCITS funds (Banks, Oil & Gas, Technology, Industrials, Utilities). The motivation is structural orthogonality: Europe runs on its own macro cycle (ECB rates, EUR/USD, China trade exposure) that is genuinely different from the US sector universe. Sized at 20% of the combined portfolio in the recommended 4-way blend, which empirically lifts Sharpe by ~+0.07 vs the prior 3-way 45/45/10 baseline.
- Each Friday close, compute % of constituents above 200d MA for each of the 5 Europe sector ETFs.
- Rank by breadth. Take the top K = 3.
- Weight by breadth-share excess (each holding's weight ∝ its breadth − the K+1-ranked ETF's breadth). Same weighting rule as Strategy A.
- Rebalance to the new mix. Costs 5 bps × turnover.
- Trade as: the underlying Xetra-listed UCITS (EXV1.DE etc.) in EUR. Settlement T+2, full liquidity through any EU broker.
Honest assessment — Strategy D is a Sharpe lifter but raises drawdown
Standalone Sharpe over the common 2018-11 → 2026-05 window is +0.93, better than SPY's +0.77 and similar to Strategy B's +0.95. CAGR is +14.9% with max DD -32.0%. The sleeve participated fully in the 2020 and 2022 sell-offs; Europe sectors did not provide downside protection.
When blended at 20% with the existing 45/45/10 A:B:C baseline (35/35/10/20 A:B:C:D), Sharpe rises from +1.08 to +1.15 and CAGR from +14.9% to +15.1%, but max DD widens from -21.5% to -23.8%. This is not a free lunch — it is a real diversification trade: marginally better risk-adjusted returns at the cost of ~2.3pp deeper drawdowns. If drawdown floor is your priority, stay with the 3-way 45/45/10. If Sharpe is your priority, switch to 35/35/10/20.
Pipeline fix note, discovered during Phase 4: the `compute_ma200_breadth` function used `rolling(200, min_periods=200)`, which dropped the MA for any constituent with even 1-2% missing days. US constituents (S&P 500 via yfinance) have ~100% coverage so this was a no-op for Strategy A. Non-US constituents (.L / .DE / .PA / .AS / .MI) have sparse missing days from local holidays / dividend events, which caused breadth to silently freeze on a stale value (last good was 2023-04-06 in the broken version). Fix: relaxed `min_periods` to 90% of the window (180 of 200 days), which tolerates the typical sparse missingness. Strategy A Sharpe changed by <0.01 (US is the universe where the fix is a no-op). All Phase 4 numbers above are on corrected data.
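The effect of the `min_periods` change can be reproduced without the real pipeline. The sketch below uses a plain-Python `rolling_mean` helper (a hypothetical stand-in that follows the pandas convention of requiring at least `min_periods` non-missing observations in the window) on a synthetic price series with three missing days.

```python
from math import isnan, nan


def rolling_mean(values, window, min_periods):
    """Trailing mean that ignores NaNs but needs min_periods valid points."""
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1):i + 1] if not isnan(v)]
        enough = len(chunk) >= max(min_periods, 1)
        out.append(sum(chunk) / len(chunk) if enough else nan)
    return out


WINDOW = 200
# Synthetic closes with three missing days (e.g. local holidays).
prices = [100.0 + 0.1 * i for i in range(260)]
for gap in (150, 210, 245):
    prices[gap] = nan

strict = rolling_mean(prices, WINDOW, min_periods=WINDOW)              # old behaviour
relaxed = rolling_mean(prices, WINDOW, min_periods=WINDOW * 9 // 10)   # 180 of 200

print(sum(not isnan(v) for v in strict), "valid MA values with min_periods=200")
print(sum(not isnan(v) for v in relaxed), "valid MA values with min_periods=180")
```

In this toy series the strict setting never produces a moving average, because every 200-day window contains at least one gap, while the relaxed setting recovers a value as soon as 180 valid closes are available.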
Equity vs SPY (8-year backtest, 2018-01 → today)
Strategy D versus SPY buy-and-hold over the full Europe sector backtest window. Note the early flat period before 2018-Q2 reflects the 200d MA warmup; from there forward Europe rotation tracks above SPY on a Sharpe basis with similar max drawdown.
Right-tail behaviour (Phase 8)
Strategy D's profile is closer to A than to C — solid rolling 12m upside, modest skewness, top sleeve about 28% of months. The optionality comes from regime orthogonality vs US sectors (ECB / EUR / China cycle), not from thematic convexity.
Europe sector allocation over time
Weekly stacked-to-100% allocation across the 5 Stoxx Europe 600 sector ETFs. The rotation visibly shifts between defensives (Utilities) and cyclicals (Banks, Industrials) as the European macro regime changes.
K × cadence sensitivity
Same heat-grid format as Strategies A / B / C. K = number of Europe sectors held; cadence = how often we rebalance. Best Sharpe cell highlighted green, worst red.
Trade history
Newest first. Filter to find specific sectors or dates.
Performance attribution
Per-ETF contribution to Strategy D's total NAV. The sector column shows which Stoxx 600 sectors drove returns.
Multi-strategy combination — A, B, C, and (Phase 4) D
Strategy A (US sector top-K-by-breadth, Sharpe ~0.98) handles within-US-equity rotation. Strategy B (asset-class top-K-by-momentum, Sharpe ~0.95 with max DD only -14%) controls drawdown via flight-to-bonds in crises. Strategy C (thematic top-K-by-momentum) adds optional exposure to secular trends (AI, clean energy, etc). Strategy D (Phase 4 — Europe sector top-K-by-breadth, Sharpe ~0.93) adds structural orthogonality from a non-US macro cycle. Blended fixed-weight, rebalanced weekly. The deployed default is 35/35/10/20 A:B:C:D — the Phase 4 winner. See the honest assessment below before choosing.
All variants side-by-side
The four standalone strategies plus the blend variants: three 4-way A:B:C:D blends (Phase 4), three 3-way A:B:C blends (pre-Europe baseline), three 2-way A:B blends, and the meta-rotation. Default view shows only four lines: the deployed 4-way blend, the prior 3-way baseline, the 2-way 50/50 A:B (for "does adding C help?"), and Strategy A standalone. Click any legend chip below to add or remove a line; the chart will redraw immediately. All series renormalised to 1.0 at the common start date.
Regime decomposition — how does each blend behave in specific market environments?
Per-strategy and per-blend total return + max drawdown across four hand-picked regimes from the backtest window. This is where Strategy C earns its place in the deployed blend — the COVID + thematic boom row tells the optionality story most directly. Each cell shows total return; sub-cell shows max DD.
Statistical significance — is the deployed blend distinguishable from the alternatives?
The Sharpe improvements documented across Phase 4-6 are point estimates. To know whether they are real signal or sample noise, we paired-bootstrap the daily return series (moving block bootstrap, block 60d, 2,000 samples). Each row below shows the Sharpe differential point estimate, the 95% bootstrap CI on the differential, and p(better), the fraction of bootstrap samples where the deployed blend's Sharpe exceeded the alternative. If the lower bound of the two-sided 95% CI is positive, the improvement is statistically significant at the 5% level (two-sided; equivalently 2.5% one-sided).
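A minimal sketch of a paired moving-block bootstrap on the Sharpe differential. "Paired" here means both series are resampled with the same block indices, so their co-movement is preserved. The percentile CI, the seed, and the synthetic return series are assumptions for illustration; the production test may differ in detail.

```python
import random
from math import sqrt


def sharpe(returns, periods=252):
    """Annualised Sharpe of daily returns (zero risk-free rate assumed)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / sqrt(var) * sqrt(periods) if var > 0 else 0.0


def paired_block_bootstrap(a, b, block=60, samples=2000, seed=0):
    """Bootstrap Sharpe(a) - Sharpe(b) using shared blocks of consecutive days."""
    assert len(a) == len(b) >= block
    rng = random.Random(seed)
    n = len(a)
    n_blocks = -(-n // block)  # ceiling division
    diffs = []
    for _ in range(samples):
        idx = []
        for _ in range(n_blocks):
            start = rng.randrange(0, n - block + 1)
            idx.extend(range(start, start + block))
        idx = idx[:n]  # trim to the original length
        diffs.append(sharpe([a[i] for i in idx]) - sharpe([b[i] for i in idx]))
    diffs.sort()
    lo = diffs[int(0.025 * samples)]
    hi = diffs[int(0.975 * samples) - 1]
    p_better = sum(d > 0 for d in diffs) / samples
    return {"point": sharpe(a) - sharpe(b), "ci95": (lo, hi), "p_better": p_better}


# Synthetic sanity check: `better` is `base` plus a constant daily uplift,
# so it should win in every resample.
rng = random.Random(42)
base = [rng.gauss(0.0004, 0.01) for _ in range(504)]
better = [r + 0.001 for r in base]
result = paired_block_bootstrap(better, base, samples=400)
print(result)
```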
Caveat — what Sharpe bootstrap MISSES (Phase 8): Sharpe ratio treats positive and negative volatility symmetrically. For optionality sleeves like Strategy C, this systematically underrates the strategy's value. The "C does not lift the blend" finding above is true on average but misses C's asymmetric upside contribution in specific regimes. See the Regime decomposition section above for the actual right-tail evidence: in the COVID + thematic boom window (Mar 2020 → Feb 2021), Strategy C standalone delivered +170%; the 4-way blend's 10% C sleeve added ~7pp to blend return in that window vs no-C blends. The 10% sleeve cap is the optionality structure — bounded downside, unbounded upside — which is precisely the shape of bet that Sharpe-based metrics cannot price properly. The C decision is justified on right-tail / regime metrics, not on bootstrap Sharpe.
How to choose your blend — and what the data actually says
If you want best risk-adjusted return (Phase 4 default): 35/35/10/20 A:B:C:D. Highest Sharpe of any variant tested (~+1.15 on the common 2018-2026 window). The Europe sleeve adds genuine diversification from a non-US macro cycle (ECB, EUR/USD, China trade). Cost: max DD widens by ~2.3pp vs the 3-way baseline because Europe is real equity exposure with real volatility — not a downside hedge.
If you want lowest drawdown with thematic optionality: 45/45/10 A:B:C (the prior 3-way baseline). Sharpe ~+1.09 (slightly worse than 4-way) but max DD ~-21.5% (~2.3pp shallower than 4-way). The right choice if drawdown floor matters more than incremental Sharpe.
If you want lowest drawdown overall: 30/70 A:B (no C, no D). Sharpe still strong (~+1.10), CAGR ~+13%, max DD only ~-17.7%. Cleanest two-sleeve construction.
If you want maximum CAGR and tolerate big drawdowns: Strategy A alone (~+17.5% CAGR but -31% max DD). The blends trade ~2-3pp of CAGR for ~7-10pp of drawdown reduction — usually a good trade.
The meta-rotation variant (own only A or only B each week, picking by trailing 6-month Sharpe) performs worse than either strategy alone. The lookback lags too much; by the time it identifies that B is winning, A has often started winning again. Static blends beat dynamic blends in this dataset.
How to think about Strategy C (Phase 8 reframing): C is an optionality sleeve, not a Sharpe-alpha sleeve. Bootstrap-on-Sharpe gave C ~42% p(better) vs no-C blends, which is true of the AVERAGE outcome but misses the asymmetric option payoff. C's right-tail metrics tell the real story: best rolling 12m +162% (vs A +85% / B +43% / D +66%), top performer 41% of months (most of any sleeve), and +170% standalone in the COVID + thematic boom regime (the 10% blend sleeve captured ~70% of that idiosyncratic alpha). The 10% sleeve cap structures it as a long-dated out-of-the-money call basket: small premium (the -25% it lost in 2022 was -2.5pp at the blend level), unbounded upside (next AI/clean-energy/space/quantum boom). See the Regime decomposition section above for the empirical evidence and the Strategy C tab for the full optionality scorecard.
Caveat on Strategy D: the Europe sleeve is a Sharpe lifter but it raises max drawdown — Europe sectors are participatory in global equity sell-offs (2020, 2022), not defensive. The Phase 4 trade is "marginally better risk-adjusted return for ~2pp deeper drawdowns". If you primarily care about downside protection, stay with 45/45/10. If you care about Sharpe and CAGR, switch to 35/35/10/20.
Risk & Validation — why we deploy what we deploy
Loading…
Why top-K rotation, not per-ETF threshold tuning
The same MA200 breadth signal can be deployed three different ways. Each paradigm has a different number of free parameters, a different overfit profile, and a different walk-forward Sharpe. This is the strategic justification for the current Strategy A architecture: cross-sectional top-K rotation over per-ETF L-threshold tuning.
Walk-forward K selection — per-segment detail
For the deployed cross-sectional rotation paradigm, K (number of top-breadth ETFs to own) is refit annually on the expanding train window. Each refit picks the K ∈ {3, 5, 7} that maximised Sharpe so far, then applies it to the next 12 months. Concatenate to get the realistic OOS curve. The K sequence has been stable, which is itself a robustness signal.
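The refit loop can be sketched as follows, assuming each candidate K already has its daily returns split into yearly segments. `walk_forward_k` and the synthetic segments are hypothetical; the uplift per K is made up purely so the example has a clear winner.

```python
from math import sqrt


def sharpe(returns, periods=252):
    """Annualised Sharpe of daily returns (zero risk-free rate assumed)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / sqrt(var) * sqrt(periods) if var > 0 else 0.0


def walk_forward_k(segments_by_k):
    """Expanding-window refit.

    segments_by_k[K] is a list of yearly return segments for the top-K variant.
    For each year after the first: pick the K with the best Sharpe on all
    earlier years, then apply it to that year. Returns (chosen Ks, OOS returns).
    """
    n_years = len(next(iter(segments_by_k.values())))
    chosen, oos = [], []
    for year in range(1, n_years):
        def train_sharpe(k):
            train = [r for seg in segments_by_k[k][:year] for r in seg]
            return sharpe(train)
        best_k = max(segments_by_k, key=train_sharpe)
        chosen.append(best_k)
        oos.extend(segments_by_k[best_k][year])
    return chosen, oos


# Synthetic data: same daily pattern for each K, with a small made-up uplift.
pattern = [0.004, -0.003, 0.002, -0.001, 0.003] * 10
uplift = {3: 0.0, 5: 0.0002, 7: 0.0005}
segments_by_k = {k: [[r + u for r in pattern] for _ in range(3)]
                 for k, u in uplift.items()}
chosen, oos = walk_forward_k(segments_by_k)
print(chosen, round(sharpe(oos), 2))
```

The first year is used only for training, so the concatenated out-of-sample curve starts one year after the data does.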
Strategy A — K × rebalance-cadence sensitivity
How sensitive is the deployed Strategy A to its two free choices: how many ETFs to own (K) and how often to rebalance? Each cell is the in-sample Sharpe; sub-cell shows max drawdown, annual turnover (a unit = full portfolio replacement), and number of position flips. Best cell highlighted green, worst red. The deployed cell is K = 7 × Weekly Fri.
Universe correlation diagnostic — is the universe saturated?
Pairwise Pearson correlations of the underlying signal time series across each strategy's universe. Diagnostic for the question "would adding more ETFs help?" — if existing ETFs already cluster at correlations > 0.85, the universe is saturated and the marginal added ETF just adds turnover without new signal. If correlations are mostly < 0.5, there is room to diversify the universe.
Strategy A — breadth signal correlations
14 US sector/broad ETFs (after pruning IUIT in May 2026). Each cell = Pearson correlation of weekly breadth series. Red = highly correlated (redundant). Blue = uncorrelated (diversifying).
Strategy B + C — momentum signal correlations
14 asset-class + 16 thematic ETFs (deduped + IEF). Same metric — Pearson on weekly distance-above-200d-MA series.
The strategy explained, plain English
Pick the strongest seven sectors of the US stock market each week. Own them. Update weekly. That is the whole strategy.
What is a "sector"? The US stock market is split into industry groups — Financials, Energy, Health Care, Industrials, Consumer Discretionary, Consumer Staples, Utilities, Materials, Communication Services, Real Estate — plus a few broad-market and concentrated picks (S&P 500, NASDAQ-100, Semiconductors, Small-cap 600). We use 14 ETFs that represent these groups. (NASDAQ-100 already gives us the large-cap tech exposure; we pruned the S&P 500 Info Tech ETF in May 2026 because it duplicated the NASDAQ-100 too closely.)
How do we judge which sector is "strong"? Every sector contains 30 to 500 individual companies. We count what percentage of those companies are currently trading above their own 200-day moving average. A stock above its 200-day average is in an uptrend; below it, a downtrend. So if 85% of the companies in a sector are in uptrends, that sector is strong — broadly strong, not just lifted by a few mega-caps. We call this number the breadth of the sector. It ranges from 0% (everything broken) to 100% (everything in an uptrend).
What this strategy is NOT doing. It is not trying to time the market — there is no "stay in cash if everything looks bad" rule. It is always 100% invested. It is also not trying to predict which individual sector will go up next. It is doing something simpler: own whatever is leading, trim whatever is lagging. Markets reward this pattern more often than they punish it, because trends in market breadth tend to persist for weeks-to-months at a time.
Why we trust it. The "Robustness" tab shows the strategy holds up under the standard quant-research stress tests: walk-forward validation (refitting the only free parameter, K = the number of sectors to own, once a year on past data only, then scoring it on the following out-of-sample year), bootstrap confidence intervals, sub-period decomposition through bear markets, and sensitivity to the moving-average lookback and the rebalance frequency. The walk-forward Sharpe ratio is approximately 1.05 with no degradation from the in-sample number. That is unusual, and it is the main reason we picked this paradigm over per-ETF threshold tuning or fixed-threshold timing.
Caveats and what would kill it: a regime where all 11 sectors fall together (2008-style systemic crisis) hurts the strategy because owning the "strongest" of a falling universe is still owning falling stuff. We do not currently hedge or shift to cash. A modest improvement on the to-do list is to add a "if median breadth across all sectors is below 40%, reduce gross exposure to 50%" overlay — but we have not yet validated it OOS so it is not in the headline numbers.
Signal definition (formal)
Breadth indicator (Strategy A): % of an ETF's point-in-time constituents whose closing price is above their own 200-day simple moving average. Computed daily; constituents reset weekly from the iShares UK holdings endpoint.
Momentum signal (Strategy B): distance above own 200-day MA per ETF, (close − MA200) / MA200. Positive = uptrend, negative = downtrend. Computed daily from yfinance adjusted close.
Breadth indicator (Strategy D): same as Strategy A but on Stoxx Europe 600 sector UCITS constituents. Pipeline fix 2026-05-24 — relaxed `min_periods` in the rolling MA200 from 100% to 90% of window so that non-US constituents with sparse missing days (local holidays, dividend events) do not silently lose their MA. See the Phase 4 retrospective section below.
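A minimal pandas sketch of the two signal definitions above. The function names `ma_distance` and `breadth`, the `min_frac` parameter, and the input layout (one column of closes per constituent) are illustrative assumptions, not the names used in the repo's scripts.

```python
import pandas as pd


def ma_distance(close: pd.Series, window: int = 200) -> pd.Series:
    """Momentum signal: (close - MA) / MA. NaN until the window is full."""
    ma = close.rolling(window, min_periods=window).mean()
    return (close - ma) / ma


def breadth(closes: pd.DataFrame, window: int = 200, min_frac: float = 0.9) -> pd.Series:
    """Fraction of constituents closing above their own moving average.

    closes: one column per constituent (hypothetical layout).
    A constituent counts only on days where both its close and its MA exist.
    """
    ma = closes.rolling(window, min_periods=int(window * min_frac)).mean()
    valid = closes.notna() & ma.notna()
    above = (closes > ma) & valid
    n_valid = valid.sum(axis=1)
    # No valid constituents -> NaN rather than a misleading 0%.
    return above.sum(axis=1) / n_valid.where(n_valid > 0)
```

Returning NaN when no constituent is valid keeps a data outage visible to downstream code, where a 0% reading would look like a legitimate signal.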
Headline strategy — Multi-Strategy 35/35/10/20 A:B:C:D blend (no leverage, Phase 4 deployed):
- Strategy A: rank 14 US sector/broad ETFs by breadth, take top K = 7, weight by breadth share. Universe: SOXX, CSP1, CNDX, IUES, IUFS, IUHC, IUIS, IUCS, IUCD, IUUS, IUMS, IUCM, IUSP, IDP6. IUIT (S&P 500 Info Tech) was pruned May 2026 because of 0.97 correlation with CNDX — see Robustness Test 12.
- Strategy B: rank 14 broad asset-class ETFs by distance above 200d MA, drop negatives, take top K = 7, weight by signal share, cash floor in IEF for unfilled slots. Universe: SPY, IJR, QQQ, EFA, VGK, EWJ, EEM, VNQ, GLD, DBC, TLT, IEF, TIP, HYG.
- Strategy C (10% sleeve): rank 23 thematic ETFs by distance above 200d MA, drop any below +5% (signal floor), take top K = 4, equal-weight (1/K) across the holdings, cash floor in IEF for unfilled slots. Universe: ARKK, CIBR, SKYY, BOTZ, BLOK, ICLN, TAN, LIT, URA, XBI, ARKG, JETS, GDX, COPX, MOO, PAVE, ITA (defence/aerospace, added Phase 15), BTC-USD (Bitcoin spot — see caveat below; live execution in IBIT, added Phase 15.2), XME (broad metals & mining), WOOD (timber & forestry), REMX (rare earth / strategic metals) (those three added Phase 16 for commodity-equity breadth), CQQQ (Invesco China Technology — broad China-tech, added Phase 17), 159801.SZ (Bosera CSI Chip ETF, CNY-denominated, USD-adjusted via spot FX — pure China A-share semiconductor hardware basket: Cambricon, AMEC, NAURA, SMIC. Deployed via IBKR Stock Connect; 588200.SS is an interchangeable Shanghai-listed alternative. Added Phase 17.1). Phase 6 (2026-05-24) replaced the original signal-share+35%-cap weighting with equal-weight after an A/B test showed equal-weight dominates on every metric for this high-floor (5%) regime — see Phase 6 retrospective below.
- Strategy D (Phase 4, 20% sleeve): rank 5 Stoxx Europe 600 sector UCITS by constituent breadth, take top K = 3, weight by breadth share. Universe: EXV1 (Banks), EXH1 (Oil & Gas), EXV3 (Technology), EXH3 (Industrial Goods & Services), EXH9 (Utilities). Trade as the Xetra-listed UCITS (EXV1.DE etc.) in EUR.
- Combination: 35% A + 35% B + 10% C + 20% D, rebalanced weekly Friday.
- Transaction cost: 5 bps per unit of weight change (10 bps round-trip).
- No look-ahead: every weight is decided using the prior trading day's signal.
- K is refit only annually on expanding-window Sharpe; choices have been stable at K = 7 (A), K = 7 (B), K = 4 (C), K = 3 (D).
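The selection and blending rules above can be sketched in a few lines. This is an illustration under stated assumptions: `top_k_weights` and `blend` are hypothetical names, and the cash floor is assumed to give each unfilled slot 1/K of the sleeve in IEF while the filled fraction is split by signal share. The production scripts may allocate the floor differently.

```python
def top_k_weights(signals: dict, k: int, floor: float = 0.0, cash: str = "IEF") -> dict:
    """Rank by signal, keep those above the floor, take the top k, weight by signal share."""
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    picks = [(t, s) for t, s in ranked if s > floor][:k]
    filled = len(picks) / k                      # fraction of slots actually used
    total = sum(s for _, s in picks)
    weights = {t: filled * s / total for t, s in picks}
    if len(picks) < k:
        # Assumption: each unfilled slot parks 1/k of the sleeve in the cash proxy.
        weights[cash] = weights.get(cash, 0.0) + (1.0 - filled)
    return weights


def blend(sleeve_weights: dict, sleeve_alloc: dict) -> dict:
    """Combine within-sleeve weights into total-portfolio weights."""
    combined = {}
    for sleeve, alloc in sleeve_alloc.items():
        for ticker, w in sleeve_weights[sleeve].items():
            combined[ticker] = combined.get(ticker, 0.0) + alloc * w
    return combined


# Hypothetical Strategy B signals: only two of four candidates are above zero.
b_weights = top_k_weights({"SPY": 0.10, "GLD": 0.30, "TLT": -0.05, "EEM": 0.0}, k=4)
total_weights = blend({"B": b_weights}, {"B": 0.35})
print(b_weights)
print(total_weights)
```

Because `blend` accumulates per ticker, an ETF held by more than one sleeve ends up with a single combined weight.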
Why 35/35/10/20 and not 45/45/10? The 4-way blend with the Europe sleeve posts the highest Sharpe of any variant tested (+1.15 vs +1.09 for the 3-way 45/45/10), at the cost of ~2.3pp wider max drawdown (-23.8% vs -21.5%). The Europe sleeve is real equity beta — it participates in 2020/2022 sell-offs and does not provide downside protection — but its idiosyncratic European macro cycle adds genuine diversification at the blend level. If drawdown floor is more important than Sharpe, stay with 45/45/10. If you want best risk-adjusted return, switch to 35/35/10/20.
Phase 4 retrospective (2026-05-24)
Phase 4 added Strategy D (Europe sector breadth) as a 4th sleeve. The build also forced a fix to the constituent breadth pipeline that quietly affected non-US ETFs.
What was tested: 9 new ETFs added to the registry — 5 Stoxx Europe 600 sector UCITS (EXV1, EXH1, EXV3, EXH3, EXH9) plus 4 single-country UCITS (IJPN, NDIA, ICHN, ITWN). The Phase 4 experiment ran 8 architecture variants: baseline 45/45/10 A:B:C, +Europe (heavy and light), +Countries (heavy and light), +both, MERGE (all into Strategy A's universe).
Result: Europe sleeve wins as a separate 20% sleeve (35/35/10/20 A:B:C:D, Sharpe +1.152 vs baseline +1.082 — delta +0.07). Countries lose on every variant (max Sharpe +0.99 vs baseline +1.08). MERGE loses (Sharpe +1.05). Countries deferred to a future phase pending universe expansion (only Japan + Taiwan had usable data after the pipeline issue was fixed; India + China constituent coverage remains thin in yfinance for non-US tickers).
Pipeline fix (the surprise discovery): while validating Strategy D output, we noticed the equity curve was flat at 1.0 from 2018-01-26 to 2021-02-12 (3 years of "cash") before it began trading, and that breadth values were frozen at the same number from 2023-04-06 onwards (3 years of "stale signal"). Investigation revealed:
- `compute_ma200_breadth` in `run_ma200_sweep.py` used `rolling(200, min_periods=200).mean()`, which is strict: it requires ALL 200 observations in the window to be non-NaN.
- US S&P 500 constituents have ~100% daily coverage via yfinance, so the strict requirement is a no-op for Strategy A.
- Non-US constituents (`.L`, `.DE`, `.PA`, `.AS`, `.MI` tickers) have 1-2% sparse missing days from local holidays and dividend events. Even constituents with 99% coverage failed the strict requirement because every 200-day window contained at least one missing day. Result: `n_valid` (count of constituents with computable MA200) collapsed to 0 → breadth went NaN → `_build_panels_for` ffill'd the last valid value indefinitely.
- Fix: relaxed `min_periods` to `int(period * 0.9)` (=180 for MA200) and tightened the denominator to require both today's price AND the MA to be valid. Now a constituent with at least 180 of the last 200 days valid contributes to the breadth signal, matching what a human trader would consider "enough history to call the trend".
- Impact on US strategies: Strategy A Sharpe changed from +0.949 to +0.96, well within rounding (US is the universe where the fix is a no-op). Strategy B and C use ETF-level momentum (not constituent breadth) so they were unaffected.
- Impact on Strategy D: was previously a broken +0.59 Sharpe with frozen post-2023 signal; now correctly +0.93 Sharpe across the honest 2018-2026 backtest with realistic -32% DD (was a fake -19% under the frozen-signal version).
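The failure mode is easy to reproduce on synthetic data. The sketch below uses a made-up constituent with 99% coverage; the variable names are illustrative, and only the `rolling(...)` calls and the `int(period * 0.9)` rule come from the description above.

```python
import numpy as np
import pandas as pd

# Synthetic constituent: 400 daily closes with one missing day in every 100 (99% coverage).
close = pd.Series(np.linspace(100.0, 140.0, 400))
close.iloc[50::100] = np.nan

period = 200
strict = close.rolling(period, min_periods=period).mean()
relaxed = close.rolling(period, min_periods=int(period * 0.9)).mean()

# Tightened denominator: count the constituent only when price AND MA are both valid.
usable = close.notna() & relaxed.notna()

print("strict MA200 values :", int(strict.notna().sum()))
print("relaxed MA200 values:", int(relaxed.notna().sum()))
print("usable breadth days :", int(usable.sum()))
```

With gaps spaced 100 days apart, every full 200-day window contains a missing value, so the strict version never produces an average, while the relaxed version starts as soon as 180 valid observations are available.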
Lesson: when extending a strategy mechanism to a new universe, the data plumbing assumptions may not hold. The bug had been latent the entire time we added IJPN / NDIA / ICHN / ITWN constituents — none of them produced usable breadth until the fix. Worth re-running the universe-level diagnostics on any future non-US additions to confirm continuous breadth.
Phase 5 retrospective — negative result (2026-05-24)
Phase 5 tested 11 US sub-sector / industry ETFs (XME, AMLP, ITB, OIH, KRE, XRT, FDN, IBB, SMH, XOP, PBW, KIE, PHO, IGV) as candidates for expanding Strategy C's universe beyond the existing 16 thematic ETFs. Most were SPDR / VanEck / First Trust / Invesco / Global X funds — none are iShares, so they would not fit Strategy A (which requires iShares holdings CSV), but the ETF-level momentum mechanism in Strategy C accommodates them with a one-line registry change per ticker.
Diagnostic gate (Test 12-style correlation analysis on weekly signal series):
- Within-Strategy-C gate (would the candidate just duplicate an existing C member?): 4 candidates failed — XME (cousin of COPX +0.87), IGV (cousin of SKYY +0.94), XRT (cousin of PAVE +0.86), FDN (cousin of SKYY +0.96). The user's initial screenshot intuition was wrong on these.
- Cross-strategy gate (would the candidate just amplify a sector Strategy A already holds?): 3 more candidates were blatant cousins of A's sector slate — XOP (XLE +0.95), OIH (XLE +0.91), KIE (XLF +0.91). Adding them would mostly just double-weight Energy / Financials when A is already there.
- Survivors: 4 candidates passed within-C but were marginal on cross-strategy — ITB (clean on both, max cross-A +0.75 with SPY), AMLP (XLE +0.85), PHO (SPY +0.85), KRE (XLF +0.87).
Empirical test: added the 4 survivors to Strategy C's UNIVERSE, re-ran Strategy C and the multi-strategy combinator. Results:
- Strategy C standalone Sharpe: +0.71 → +0.74 (+0.03 lift)
- Strategy C walk-forward Sharpe: +0.36 → +0.26 (-0.10 degradation — the deployment-quality metric got worse)
- Strategy C max drawdown: -43.7% → -41.6% (+2.1pp shallower)
- Deployed 4-way blend 35/35/10/20 Sharpe: +1.1503 → +1.1506 (+0.0003 — within noise)
- Deployed 4-way blend max DD: -23.8% → -23.8% (no change)
Verdict: reverted. The standalone Strategy C in-sample improvement is real but small; at the 10% sleeve weight in the deployed blend it is undetectable (+0.0003 Sharpe). The walk-forward degradation is the actual signal — the marginal cross-strategy cousins (AMLP / PHO / KRE riding XLE / SPY / XLF momentum cycles) chase fads that mean-revert out-of-sample. Combined with the operational cost of 4 extra holdings (AMLP issues K-1 tax forms which are annoying for SG/EU investors), no net benefit.
Lesson: the within-strategy correlation gate is necessary but not sufficient. The cross-strategy correlation gate matters too — at +0.85+ it kills the marginal additivity even when the within-strategy correlation is low. Future universe-expansion candidates should pass BOTH gates at threshold < 0.85 before being deployed. Sub-sector ETFs that are cousins of any sector already in Strategy A's slate (XLE, XLF, XLI, XLY, etc.) are particularly unproductive — Strategy A already captures that exposure efficiently.
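A hedged sketch of the two-gate check described in this lesson. The function name `correlation_gates` and the input layout (weekly signal series as pandas columns) are assumptions for illustration. The gate compares the largest signed Pearson correlation against the 0.85 threshold, which matches how the cousins are reported above.

```python
import pandas as pd


def correlation_gates(candidate: pd.Series, within: pd.DataFrame, cross: pd.DataFrame,
                      threshold: float = 0.85) -> dict:
    """Check a candidate's weekly signal against both universes.

    within: signals of the existing members of the target strategy.
    cross:  signals of the ETFs already held by the other strategies.
    """
    max_within = within.corrwith(candidate).max()
    max_cross = cross.corrwith(candidate).max()
    return {
        "max_within": float(max_within),
        "max_cross": float(max_cross),
        # Deployable only if BOTH gates are below the threshold.
        "passes": bool(max_within < threshold and max_cross < threshold),
    }
```

Passing the gates is a screen, not a verdict: the Phase 5 survivors cleared it only marginally and still failed the walk-forward test.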
What's queued for future research: (1) single-country breadth — needs universe expansion beyond the original 4 (consider adding EWZ Brazil, EWA Australia, EZA South Africa, EWS Singapore) and a more careful Strategy E architecture rather than treating countries as a sleeve of Strategy A. (2) MSCI ACWI ex-US sector breadth — if iShares offers UK UCITS that track non-US developed-market sectors at a finer granularity than the current Europe-only Strategy D, the same constituent-breadth mechanism could extend further.
Phase 6 retrospective — weighting-scheme A/B test (2026-05-24)
Phase 6 tested the hypothesis that Strategy B and C's modest walk-forward Sharpe was partly caused by the signal-share weighting scheme, which heavily overweights the most-overbought ETF — statistically the one most likely to mean-revert.
The asymmetry: Strategy A uses breadth-share weighting on a bounded signal (breadth ∈ [0,1]) — top-1 weight typically ~16% of the sleeve. Strategy B and C use signal-share weighting on an unbounded signal (distance above 200d MA, can be +50% in bubbles) — top-1 weight can reach ~46% of the sleeve. That is 3× the concentration of A, on candidates that are statistically more reversal-prone.
Test: ran 4 weighting schemes side-by-side on Strategy C and Strategy B (no universe changes — pure parameter sweep):
- Current — signal-share (with 35% cap for C, no cap for B)
- Equal-weight — 1/K per holding (ignores signal magnitude)
- Sqrt(signal) — weight ∝ √signal (softens proportionality)
- Rank-weighted — top gets K weight units, K-th gets 1 (bounded dispersion regardless of signal magnitude)
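The four schemes differ only in how a set of positive signals is turned into weights. A minimal sketch follows; the function names and signal values are hypothetical, and the 35% cap on Strategy C's signal-share variant is left out for brevity.

```python
import math


def _normalise(raw: dict) -> dict:
    total = sum(raw.values())
    return {t: v / total for t, v in raw.items()}


def signal_share(signals: dict) -> dict:
    """Weight proportional to the signal itself."""
    return _normalise(signals)


def equal_weight(signals: dict) -> dict:
    """1/K per holding, ignoring signal magnitude."""
    return {t: 1.0 / len(signals) for t in signals}


def sqrt_signal(signals: dict) -> dict:
    """Weight proportional to the square root of the signal."""
    return _normalise({t: math.sqrt(s) for t, s in signals.items()})


def rank_weighted(signals: dict) -> dict:
    """Top holding gets K units, the K-th gets 1."""
    ordered = sorted(signals, key=signals.get, reverse=True)
    k = len(ordered)
    return _normalise({t: k - i for i, t in enumerate(ordered)})


# Hypothetical top-4 holdings with distance-above-MA200 signals (all positive).
holdings = {"ARKK": 0.49, "URA": 0.25, "GDX": 0.16, "TAN": 0.10}
for scheme in (signal_share, equal_weight, sqrt_signal, rank_weighted):
    top = max(scheme(holdings).values())
    print(f"{scheme.__name__:14s} top-1 weight = {top:.1%}")
```

Comparing the top-1 weight across schemes shows directly how much concentration each one puts on the most extended holding.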
Strategy C results: equal-weight dominates on every metric. IS Sharpe +0.708 → +0.781, WF Sharpe +0.364 → +0.388, CAGR +16.5% → +18.3%, max DD -43.7% → -42.8%, turnover 16.9× → 15.7×. The walk-forward K-sequence also shifts: the original scheme always picked the smallest K=3 (to compensate for the concentration the weighting creates); equal-weight alternates K=3 and K=5 — the rotation can hold more positions without giving any one too little weight.
Strategy B results: equal-weight has best IS Sharpe (+0.794 → +0.876) BUT max DD widens by 6pp (-14.6% → -20.9%). Sqrt(signal) is a softer alternative — best WF Sharpe (+0.743) with smaller DD damage (-17.2%). For B the signal-share scheme remains the best deployment choice. The DD trade-off matters: B's primary job is downside control via flight-to-bonds in crises, and that mechanism depends on weighting heavily into TLT/IEF when they are the strongest signals.
The mechanistic reason for the asymmetry: Strategy C has a +5% signal floor, so eligible candidates are by definition already in well-established uptrends and signal magnitude beyond eligibility carries little extra information (they're all strong). Weighting heavily toward the strongest just overweights the candidate most likely to mean-revert. Strategy B has a 0% floor, so eligible candidates include ETFs only a modest +0.5% above MA200, and signal magnitude does carry meaningful information about which trends are well-established vs. just starting. Signal-share weighting therefore suits B, where magnitude is informative; equal-weight suits C, where it is not.
Verdict: shipped equal-weight for Strategy C, kept signal-share for Strategy B and (already) breadth-share for A and D. The Strategy C standalone Sharpe lifts from +0.71 to +0.78; at the 10% sleeve weight, the 4-way deployed blend Sharpe lifts from +1.150 to +1.156. Modest in magnitude but consistent in direction with no downside (DD slightly improves, turnover slightly lower).
Generalisable lesson: the right weighting scheme depends on the signal-floor regime. With a high floor (C: +5%), the floor IS the filter — equal-weight after the floor. With a low floor (B: 0%), the magnitude IS the filter — signal-share with no floor. With a bounded signal (A and D: breadth ∈ [0,1]), signal-share is naturally tight and produces near-equal weights anyway, so the question does not arise.
Operational note: the original 35% per-ETF cap on Strategy C no longer binds under equal-weight (K=4 → 25% per holding, well below the 35% cap). The cap code is retained in run_thematic_rotation.py as a safeguard. It stays a no-op even if K is reduced to 3 (33.3% per holding is still under the cap) and would only bind at K = 2 or below (50% or more per holding).
Phase 7 retrospective — metric overhaul + statistical significance (2026-05-24)
Phase 7 was prompted by a critique that the dashboard headline stats included three useless metrics (Total Return, Annual Turnover ×, Number of Rebalances) and missed the metrics an institutional AI would actually ask about. Three deliverables:
- Replaced headline stats across all four strategy tabs + the Multi-Strategy tab. Dropped Total Return (path-dependent, length-dependent, redundant with CAGR) and Number of Rebalances (carries zero information beyond what the cadence label already says). Added: Walk-forward Sharpe (the deployment-quality number, was buried in details for some sleeves), Calmar ratio (CAGR / |Max DD| — single best one-number risk-adjusted return), and average holding period in days (252 / annual_turnover — much more interpretable than "10.4×/yr"). Added an explicit cost drag sub-line (annual_turnover × 5 bps) so the dollar cost of execution is no longer hidden.
- Added walk-forward K refit to Strategy D, which previously had no OOS validation. Annual K refit on expanding train window, K ∈ {2, 3, 4}. Result: Strategy D walk-forward Sharpe is +0.97, HIGHER than in-sample +0.89. The K choice is stable (always K=3), the signal is genuinely persistent OOS, and the 6-segment K-sequence shows no overfitting on the cadence/K choice. This was the single biggest robustness gap in the Phase 4 deployment story; it now closes cleanly. WF Sharpe above IS Sharpe is an unusual and strong robustness result (most strategies see WF degradation from IS due to overfitting).
- Computed block-bootstrap CIs on every strategy's Sharpe and on the key paired differentials. Moving block bootstrap, block size 60 trading days (~3 months, matches Robustness Test 4 methodology), 2,000 samples, paired sampling preserves cross-strategy correlation. Output: per-strategy Sharpe with 95% CI, plus 4 paired differentials (deployed vs 3-way, deployed vs A alone, deployed vs 50/50 A:B, 3-way vs 50/50 A:B). Surfaced as a dedicated "Statistical significance" section on the Multi-Strategy tab and as a sub-line under each strategy tab's recipe-stats.
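The three new headline statistics are simple transformations of numbers the backtest already produces. A small sketch follows, with hypothetical function names and with Strategy C's equal-weight figures from the Phase 6 retrospective used as inputs.

```python
def calmar(cagr: float, max_drawdown: float) -> float:
    """CAGR divided by the absolute value of the maximum drawdown."""
    return cagr / abs(max_drawdown)


def avg_holding_days(annual_turnover: float, trading_days: int = 252) -> float:
    """Average holding period implied by annual turnover."""
    return trading_days / annual_turnover


def cost_drag(annual_turnover: float, cost_bps: float = 5.0) -> float:
    """Annual return lost to trading costs, as a fraction of NAV."""
    return annual_turnover * cost_bps / 10_000


# Inputs: Strategy C under equal-weight (CAGR +18.3%, max DD -42.8%, turnover 15.7x).
print(f"Calmar          : {calmar(0.183, -0.428):.2f}")
print(f"Avg holding days: {avg_holding_days(15.7):.1f}")
print(f"Cost drag       : {cost_drag(15.7):.2%} per year")
```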
The honest finding from the bootstrap CIs:
- The 4-way deployed blend Sharpe (+1.16) is NOT statistically distinguishable from the 3-way baseline (+1.09) at 5% significance. The 95% CI on the differential is [-0.04, +0.17], which straddles zero. But the bootstrap gives the deployed blend an 83% probability of being a real improvement, which, combined with the mechanistic rationale (Europe orthogonality, Phase 4 retrospective), is meaningful but not conclusive evidence.
- Strategy C's 10% sleeve contributes essentially zero at the blend level vs just 50/50 A:B. The 3-way 45/45/10 vs 2-way 50/50 differential is -0.004 Sharpe with p(better) = 41.9% — coin-flip. C earns its inclusion only as optionality on the next thematic bull run, not as a Sharpe lifter. This is now stated explicitly in the Multi-Strategy tab "How to choose your blend" section.
- The Phase 4-6 cumulative Sharpe improvement (+1.09 → +1.16, delta +0.06) is consistent with real improvement but within the noise floor of 7.5 years of data. Confidence will grow with more OOS years. The point estimates are all positive, the directions are consistent across multiple comparison points, and the mechanistic stories (Europe orthogonality for Phase 4, equal-weighting in high-floor regime for Phase 6) suggest the improvements are structurally motivated rather than data-snooped.
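For readers who want to see where a CI and a p(better) figure of this kind come from, here is a minimal numpy sketch of a paired moving-block bootstrap on a Sharpe differential. The function names are hypothetical and the defaults (60-day blocks, 2,000 samples) follow the description above. It is an illustration of the technique, not the repo's implementation.

```python
import numpy as np


def sharpe(returns: np.ndarray, periods: int = 252) -> float:
    return float(np.mean(returns) / np.std(returns, ddof=1) * np.sqrt(periods))


def paired_block_bootstrap(ret_a, ret_b, block: int = 60, n_samples: int = 2000,
                           seed: int = 0) -> dict:
    """Bootstrap the Sharpe differential (A minus B) on daily returns.

    The SAME resampled day indices are used for both series, which preserves
    their cross-correlation (the "paired" part).
    """
    a = np.asarray(ret_a, dtype=float)
    b = np.asarray(ret_b, dtype=float)
    n = len(a)
    n_blocks = -(-n // block)                    # ceiling division
    offsets = np.arange(block)
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_samples)
    for i in range(n_samples):
        # Draw overlapping block start points, then stitch blocks to length n.
        starts = rng.integers(0, n - block + 1, size=n_blocks)
        idx = (starts[:, None] + offsets).ravel()[:n]
        diffs[i] = sharpe(a[idx]) - sharpe(b[idx])
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return {"ci": (float(lo), float(hi)), "p_better": float(np.mean(diffs > 0))}
```

A 95% interval that straddles zero alongside a p(better) well above 50% is exactly the "directional but not conclusive" pattern described in the findings.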
Implication for fund deployment: the deployed blend is the right point estimate, and the engineering process (correlation diagnostics, A/B tests, this bootstrap audit) is the right discipline. But the dashboard should not oversell the +0.06 Sharpe improvement as a definitive win — it is a directional improvement with strong mechanistic support, awaiting more OOS data for statistical confirmation. Anyone evaluating this as Navigo IP should understand both numbers: the point estimate and the CI.
What's still missing (Tier 2 backlog): live tracking infrastructure (paper-trade weekly, log actual-vs-backtest divergence — the regulatory-grade addition for Navigo's CMS application), Bayesian / probabilistic Sharpe (richer than the existing bootstrap CI), regime-aware risk overlay (could reduce DD if it doesn't overfit), and tax-aware accounting documentation (K-1 forms for AMLP, European UCITS withholding, etc. — relevant for the SG investor base). Items already completed in Phase 11-12: realised cross-strategy return correlation matrix on Monitor (Phase 11A), hit rate + longest drawdown duration on each strategy tab (Phase 11B), per-strategy cost calibration (Phase 12 — see Caveats below).
Phase 8 retrospective — right-tail metrics and the optionality framing (2026-05-24)
Phase 8 was prompted by a substantive critique that the Phase 7 conclusion "Strategy C contributes essentially nothing at the blend level" was mean-variance-accurate but option-theoretically wrong. The point is generalisable: Sharpe ratio and bootstrap-on-Sharpe are the right metrics for symmetric alpha sleeves but the wrong metrics for asymmetric / optionality sleeves. Phase 8 builds the right metrics and reframes C correctly.
Why Sharpe underrates optionality strategies:
- Sharpe = mean ÷ standard deviation. Standard deviation treats positive and negative returns symmetrically.
- An optionality strategy is structured to be asymmetric: capped downside (Strategy C: 10% sleeve weight caps annual NAV impact at ~-10%), unbounded upside (if a thematic bull fires, C captures it; max sleeve impact ~+15% NAV per year empirically).
- The bootstrap-on-Sharpe p(better)=42% finding (Phase 7) is the mean outcome over 7.5 years. It implicitly assumes "the next 7.5 years look like the last 7.5 years". This assumption fails precisely when a new thematic regime emerges — the moment when C is most valuable.
- Backtests cannot price embedded options. The 2018-2026 window includes the 2020-21 thematic boom but cannot price the OPTIONS embedded in C for future booms we cannot name (space, quantum, longevity, fusion, AI-derivative themes).
What Phase 8 added — right-tail metrics (one new script + 5 new dashboard sections):
- Sortino ratio: annualised mean / annualised downside-only volatility. Credits upside vol; only penalises drawdowns. Fairer than Sharpe for convex strategies.
- Skewness of monthly returns: positive skew = right-tail bias.
- Best / worst rolling 12-month return + the specific date windows.
- Asymmetry ratio (|best 12m| / |worst 12m|): direct measure of right-tail dominance.
- % of months as top-performing sleeve: how often does each sleeve win? Captures the "when C wins, it wins often and big" property that bootstrap-on-Sharpe missed.
- Regime decomposition across 4 hand-picked sub-windows (Q4 2018 Powell pivot, COVID + thematic boom Mar 2020 → Feb 2021 ARKK peak, 2022 inflation crash, 2024 AI surge). Per-strategy total return + max DD in each.
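Most of these right-tail metrics fit in a few lines of numpy. The sketch below uses hypothetical function names and makes two assumptions worth flagging: downside volatility is computed as the root-mean-square of negative returns against a zero target, and skewness is the simple population estimator.

```python
import numpy as np


def sortino(returns, periods: int = 252) -> float:
    """Annualised mean over annualised downside deviation (zero target).
    Undefined if the series has no negative returns."""
    r = np.asarray(returns, dtype=float)
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return float(np.mean(r) / downside * np.sqrt(periods))


def skewness(x) -> float:
    """Population skewness; positive means a fatter right tail."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def rolling_12m(monthly) -> np.ndarray:
    """Compounded return over every 12-month window of monthly returns."""
    growth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(monthly, dtype=float))))
    return growth[12:] / growth[:-12] - 1.0


def asymmetry_ratio(monthly) -> float:
    """|best 12m| / |worst 12m|."""
    r12 = rolling_12m(monthly)
    return float(abs(r12.max()) / abs(r12.min()))


def top_sleeve_share(monthly_by_sleeve: dict) -> dict:
    """Fraction of months in which each sleeve had the highest return."""
    names = list(monthly_by_sleeve)
    table = np.column_stack([monthly_by_sleeve[n] for n in names])
    winners = table.argmax(axis=1)
    return {n: float(np.mean(winners == i)) for i, n in enumerate(names)}
```

The Sortino function takes daily returns by default. Pass `periods=12` when feeding it monthly returns.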
Empirical findings — the C optionality case in numbers:
- C's best rolling 12-month return: +162%. Vs A +85% / B +43% / D +66%. C delivers the largest absolute upside tail in the universe by 2-4×.
- C is the top-performing sleeve in 41% of months, versus 21% for A, 10% for B and 28% for D. It leads the monthly ranking more often than any other sleeve.
- COVID + thematic boom (Mar 2020 → Feb 2021): Strategy C standalone +170%. The 50/50 A:B blend (no C) got +58%. The 4-way blend (10% C) got +65%. The small C sleeve added ~+7pp to blend return in an 11-month window — capturing roughly 70% of C's idiosyncratic alpha at just 10% sleeve weight.
- 2022 inflation crash: C was worst (-25%), but the 10% cap limited blend damage to -13% (the 4-way blend was actually 2pp worse than the 50/50 A:B — a -2pp option premium for the +7pp option payoff). The asymmetric structure works.
- Q4 2018 Powell pivot: C +4.4% (defensive!) while A/B/D all -5 to -10%. The cash-floor mechanism (when fewer than K candidates clear the +5% floor, the deficit sits in IEF) kicked in during the sell-off.
The reframing: Strategy C earns its 10% sleeve weight through asymmetric convexity. Best 12m return is 2-4× any other strategy. Top sleeve 41% of months. The 10% sleeve cap structures it as a long-dated out-of-the-money call basket — small premium (the modest drawdown contribution), unbounded upside (any future thematic boom). Bootstrap-on-Sharpe gives p(better)=42% because Sharpe penalises upside vol; that finding is technically true but misses the optionality value entirely.
Generalisable lesson: when evaluating a strategy or sleeve, ask "is this symmetric alpha or asymmetric optionality?" first. Symmetric alpha → Sharpe + bootstrap CI is the right gate. Asymmetric optionality → right-tail metrics + regime decomposition + sleeve-cap structure is the right gate. Using the wrong gate leads to wrong conclusions; Phase 7 nearly did so for C.
Implication for fund deployment: the 4-way 35/35/10/20 blend remains the deployed default. Phase 8 strengthens the case for keeping the 10% C sleeve rather than dropping it to deploy 50/50 A:B. For Navigo marketing materials to AIs, the right framing is: "core risk-adjusted return from A+B+D (~+1.13 Sharpe), with a 10% optionality sleeve via C that delivered +170% during the 2020-21 thematic boom and is positioned to capture the next regime that emerges". The Sharpe-only framing undersells the strategy.
Universe and data sources
14 ETFs in the universe: SOXX (semis), CSP1 (S&P 500), CNDX (NASDAQ-100), and the S&P 500 sector slate — IUES (Energy), IUFS (Financials), IUHC (Health Care), IUIS (Industrials), IUCS (Consumer Staples), IUCD (Consumer Discretionary), IUUS (Utilities), IUMS (Materials), IUCM (Communication Services). Plus IUSP (US REITs as Real Estate proxy — see caveat) and IDP6 (S&P SmallCap 600).
Pruned (May 2026): IUIT (S&P 500 Info Tech) was removed because of 0.97 correlation with CNDX (NASDAQ-100). CNDX has much higher trading liquidity via its US-listed equivalent QQQ, so keeping IUIT was double-counting the same large-cap tech bet with a less-tradeable variant. The prune improved walk-forward Sharpe by +0.034 (Robustness Test 12).
Trading proxies: SPDR Select Sector ETFs (XLE, XLF, XLV, XLI, XLP, XLY, XLU, XLB, XLC, XLRE). CSP1 trades via SPY; CNDX via QQQ; SOXX trades itself; IDP6 trades via IJR.
Real Estate caveat: no iShares UK S&P 500 Real Estate sector UCITS exists as of 2026-05. IUSP (FTSE EPRA NAREIT US Dividend+ Index, ~38 US REITs) is used as a substitute. The constituent set is broader than the S&P 500 Real Estate sub-sector, but the breadth metric (% above 200d MA) is still constituent-relative and remains comparable.
Data sources: iShares UK holdings endpoint for point-in-time constituents (weekly Friday snapshots back to 2018-01-05), yfinance for adjusted close prices of constituents and trading proxies. All data is free and reproducible from the scripts in this repo.
OOS validation
Both Strategy A and Strategy B are validated walk-forward (see Robustness → Test 10 for Strategy A's annual K refit and Strategy B's tab for its own walk-forward). The breadth signal itself was earlier validated via a 2022-09-08 train/test split on a composite breadth indicator: the train-half winner held up on the test half (train Sharpe +0.94 → test Sharpe +1.42 on SOXX; train winner ranked #1 of 11 cells on test grid). The current top-K rotation paradigm strips away the per-ETF threshold tuning that earlier work relied on, eliminating most of the in-sample selection bias.
Best practices for parameter optimisation (lessons from this project)
The Robustness tab demonstrates that the headline Sharpe was materially inflated by in-sample optimisation. Nine rules of thumb learned the hard way:
- Always report walk-forward (or train/test) Sharpe alongside in-sample. The in-sample number is biased upward by ~0.3 Sharpe in our case. If you only have IS, halve your conviction.
- Robust over optimal. A parameter that's "second-best" but stable across sub-windows is more deployable than a parameter that's "best" but jumps each refit. Our walk-forward L sequence for CSP1 was [55, 80, 75, 75, 75] — wildly unstable; deploying any single one of those would have given different results.
- Reduce degrees of freedom. Per-ETF tuning adds N free parameters. A single global parameter (e.g., L=60 for all 11 ETFs) had ZERO tuning cost, and beat walk-forward L on 8 of 11 ETFs. Less is more.
- Use external priors. If literature gives you a number (Zweig's L=60 breadth threshold), USE IT. A prior-based value has no overfit cost; a data-driven value does.
- Lower the rebalance frequency to lower in-sample noise transmission. Daily rebalancing transmits each fitted-L noise into more daily decisions. Bi-weekly rebalancing reduces the effective number of "actions" per unit time, smoothing the noise.
- Stress-test by regime, not just full-window Sharpe. The sub-period decomposition showed our strategy underperformed BH in 2019 pre-COVID, 2022 inflation shock, and 2023 AI rally. Full-window Sharpe hid this. Decompose by regime — if performance is concentrated in one period, you're sampling-lucky.
- Avoid the joint maximum across multiple parameters. If you can choose L, MA-period, cadence, AND base/thrust allocation, picking the joint argmax has compounded selection bias. Pick one parameter at a time on independent data, or use a fixed-parameter version.
- If walk-forward Sharpe ≤ BH Sharpe, abandon the strategy. No amount of in-sample tweaking saves a strategy whose held-out performance is no better than the passive benchmark. CSP1 fails this test (WF 0.57 vs BH 0.84); SOXX with bi-weekly cadence passes (1.05 vs 0.98).
- Prefer relative signals to absolute thresholds. "Top K by breadth" is a relative statement that self-normalises across regimes; "breadth ≥ L" is an absolute threshold that has to be re-tuned every regime. Test 10 shows the cross-sectional rotation paradigm has zero walk-forward degradation, while threshold-based timing degrades ~0.3 Sharpe. Whenever a signal can be expressed as a cross-sectional rank instead of a level, prefer the rank version.
Practical deployment translation. The strategy that survives all the tests is paradigm 3 from Test 10: own the top K = 5-7 ETFs by current MA200 breadth, weight by breadth excess over the cross-sectional median, rebalance weekly Friday, no per-ETF threshold tuning, refit K only annually. Walk-forward Sharpe ≈ 1.05 with no degradation from in-sample. The previous "fixed L=60 on IUIT / CNDX / IUES" recommendation was correct given the framing of paradigm 2, but paradigm 3 is strictly better — it owns the same sectors when they lead, drops them when they lag, and never bets on a single ETF being "above the L-line" in isolation. The original headline Sharpe 1.04 for CSP1 single-ETF was an artifact; the headline Sharpe 1.05 for top-K rotation is honest.
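The "refit K only annually" step can be sketched as an expanding-window walk-forward loop. The sketch assumes a hypothetical input `returns_by_k`: a DataFrame of daily strategy returns with one column per candidate K, already produced by the backtest. The function names and the two-year minimum training window are illustrative choices, not values from the repo.

```python
import numpy as np
import pandas as pd


def sharpe(returns: pd.Series, periods: int = 252) -> float:
    sd = returns.std()
    return float(returns.mean() / sd * np.sqrt(periods)) if sd > 0 else float("-inf")


def walk_forward_k(returns_by_k: pd.DataFrame, min_train_years: int = 2):
    """For each year, pick the K with the best Sharpe on all PRIOR years,
    then record that K's returns for the year itself (out-of-sample)."""
    years = sorted(set(returns_by_k.index.year))
    chosen, pieces = {}, []
    for year in years[min_train_years:]:
        train = returns_by_k[returns_by_k.index.year < year]   # expanding window
        best_k = train.apply(sharpe).idxmax()
        chosen[int(year)] = best_k
        pieces.append(returns_by_k.loc[returns_by_k.index.year == year, best_k])
    return chosen, pd.concat(pieces)
```

The returned series is stitched from out-of-sample years only, so its Sharpe is the walk-forward number, and the `chosen` mapping shows whether K is stable from one refit to the next.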
Caveats — what this backtest does not capture
Honest limitations of the deployed strategy as documented:
- Slippage on rebalances: rebalances assume mid-spread fills. Worst-case slippage may add another 5-10 bps per rebalance on top of the 5 bps transaction cost already modelled.
- Per-strategy cost calibration (Phase 12, 2026-05-25): replaced the uniform 5 bps with sleeve-specific costs that reflect the actual liquidity of each universe. Strategy A = 2 bps (very liquid SPDR Select Sector + SPY/QQQ). Strategy B = 2 bps (SPY/IEF/GLD/TLT — among the most liquid ETFs in the world). Strategy C = 5 bps (mid-liquidity thematics, mixed: ARKK/XBI are 1-3 bps but BLOK/PAVE/BOTZ are 5-10 bps). Strategy D = 9 bps (European UCITS on Xetra at 5-10 bps bid-ask plus 2-4 bps FX cost for USD-base investors). Net effect: deployed blend Sharpe lifted from +1.16 to +1.18 (~+0.015), with A and B benefiting from tighter realistic costs more than D loses from wider European spreads.
- Survivorship in some price series: a few historically-acquired constituents (~5-15% of the early-window roster for some ETFs) are missing from yfinance, biasing early-window breadth measurements toward survivors. Less of an issue post-2020. Phase 4 fix to `compute_ma200_breadth` resolved the worse latent issue for non-US constituents.
- Common window for the deployed blend is 2018-11 → 2026-05 (~7.5 years), constrained by Strategy C's data start date. Strategy B's 18-year history (2008-2026, covering GFC + COVID + 2022) cushions the blend through real crises in the per-strategy comparison, but the blend-level statistics only see 2020/2022.
- Concentration risk in Strategy A: even at K = 7 the top breadth tilts toward 1-2 dominant sectors, so a semis or energy reversal can move 10-20% of NAV within the sleeve in a week. The 35% sleeve weight in the 4-way blend caps blend-level exposure but does not eliminate this.
- Strategy B cash-floor mechanic: when fewer than K asset classes are positive-trend, the deficit sits in IEF (intermediate Treasury). Earns ~3-4% Treasury carry but introduces small duration risk in a rising-rate shock.
- Strategy C walk-forward Sharpe is +0.46 vs in-sample +0.83 (Phase 17 numbers, 22-ETF universe with CQQQ added to the prior 21). The Phase 16 commodity-equity additions plus Phase 17's CQQQ are net-neutral on in-sample (−0.01) and add a small drag on walk-forward (−0.04 vs Phase 16) — but the blend-level cost is essentially zero (−0.002 Sharpe) while max DD improves by 0.27pp. The 10% sleeve cap is the risk-management response to expected thematic-momentum degradation. C earns its place on right-tail / regime metrics (Phase 8) and on cumulative blend-level improvements through Phase 15.2 + 16 + 17.
- Phase 17.1 — China A-share semiconductor exposure (159801.SZ) added via FX-aware download pipeline. 159801.SZ is the Bosera CSI Chip ETF, CNY-denominated and Shenzhen-listed. The backtest pipeline now downloads CNY prices, applies a daily FX conversion using yfinance's USDCNY=X spot, and reindexes onto the NYSE trading calendar with a 10-day stale-fill cap (covers Chinese New Year + October Golden Week SSE/SZSE closures). A 50bps annual expense ratio is applied as per-calendar-day compounded drag, same pattern as IBIT's 25bps on BTC-USD. Live execution is via IBKR Stock Connect (operationally smooth, confirmed by Eileen); 588200.SS (Harvest SSE STAR Chip) is an interchangeable Shanghai-listed alternative with 0.96 correlation to 159801.SZ but insufficient history (3.65y vs the 5y walk-forward minimum). Also: because 159801.SZ inception (2019-08) is after BLOK's 2018-01 binding date, the script's eligibility logic was refactored to treat it as a late-inception ticker that does not constrain the backtest window start — without this fix, naively adding 159801.SZ would have collapsed the backtest from 7.5y to ~6y. The K=4 momentum picker excludes NaN signals automatically, so the late-inception decoupling is safe and lossless.
- Phase 16 / 17 (2026-05-26) — SLV (B) and KWEB (C) were tested and REVERTED. Two illustrative cases of corr-gate-passing candidates that empirically failed the deployment test:
  - SLV (silver) passed Strategy B's gate (0.78 vs GLD) but dragged B's Sharpe from +0.99 to +0.81 (−0.18) and widened max DD from −14% to −27% (+13pp worse). Mechanism: silver's chop-then-reverse profile poisons a top-K momentum signal in a way GLD's smoother trend behaviour does not.
  - KWEB (China Internet) passed Strategy C's gate (0.57 vs LIT) but together with CQQQ dragged Strategy C's walk-forward Sharpe from +0.50 to +0.37 (−0.13). Mechanism: the 2021-2023 China internet crackdown (−70% over two years) created a unique drawdown profile that the K=4 momentum signal repeatedly bounce-traded and got chopped on. CQQQ-alone keeps the China-tech diversification without KWEB's crackdown-specific drag.
- Bitcoin (BTC-USD) is included via a spot-index proxy with IBIT-equivalent cost (Phase 15.2). IBIT (iShares Bitcoin Trust) only launched 2024-01-11, too short for the 5-year walk-forward methodology. The strategy therefore backtests using the CoinDesk Bitcoin Reference Price (BTC-USD, 8.4y of history) with IBIT's 25 bps annual expense ratio applied as a per-calendar-day compounded price drag, so the historical return path matches what an IBIT holder would have paid. Live execution remains in IBIT, which tracks BTC-USD with negligible error post-launch. GBTC (Grayscale Bitcoin Trust) was considered as the backfill but rejected: its market price traded at a 10-50% premium to NAV 2015-2021 and a 20-50% discount 2022-2023 (collapsing only on the Jan 2024 ETF conversion), so GBTC-price momentum would have captured the GBTC-discount narrative rather than BTC momentum. BTC-USD has none of that fund-structure noise. Per-ETF cap (PER_ETF_CAP = 35%, but equal-weight allocator gives 1/K = 25%) × 10% sleeve weight = max ~2.5% of NAV in BTC at any one time.
- Strategy D constituent breadth depends on a Phase 4 pipeline fix: compute_ma200_breadth originally used min_periods=200, which silently dropped the MA200 for any non-US constituent with sparse missing days. Without the fix (relaxed to 90% of window), D's signal froze on a forward-filled value from 2023-04-06. Documented in the Phase 4 retrospective.
- The +0.06 deployed-vs-baseline Sharpe improvement is not statistically significant at 5% (Phase 7 bootstrap 95% CI [-0.04, +0.17], p(better) 83%). It is directionally supported by a mechanistic rationale and is consistent across multiple comparison points, but statistical confirmation awaits more OOS data.
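The min_periods issue in the Strategy D bullet above can be illustrated with a minimal pure-Python sketch. This is not the repository's compute_ma200_breadth: the helper name rolling_mean, its min_obs parameter, and the synthetic price series are assumptions made for illustration. It only shows why requiring all 200 observations leaves the moving average undefined for a series with occasional gaps, while a 90% requirement (180 of 200) does not. In pandas, the equivalent switch is the min_periods argument of Series.rolling.

```python
import math


def rolling_mean(prices, window=200, min_obs=200):
    """Trailing mean over `window` rows; None unless at least `min_obs` rows are non-missing.

    Hypothetical helper for illustration, not the repo's compute_ma200_breadth.
    Missing days are represented as float('nan').
    """
    out = []
    for i in range(len(prices)):
        # Trailing window ending at row i (shorter at the start of the series).
        chunk = prices[max(0, i - window + 1): i + 1]
        valid = [p for p in chunk if not math.isnan(p)]
        out.append(sum(valid) / len(valid) if len(valid) >= min_obs else None)
    return out


# 250 rows of a flat 100.0 price with one missing print every 50 rows,
# mimicking a non-US constituent with sparse gaps (synthetic data).
prices = [float("nan") if i % 50 == 49 else 100.0 for i in range(250)]

strict = rolling_mean(prices, window=200, min_obs=200)   # original setting
relaxed = rolling_mean(prices, window=200, min_obs=180)  # 90% of the window

strict_defined = sum(v is not None for v in strict)
relaxed_defined = sum(v is not None for v in relaxed)
print(strict_defined, relaxed_defined)
```

With the strict setting every 200-row window contains at least one gap, so the average is never defined and any downstream breadth count has nothing to compare the price against. The relaxed setting tolerates up to 20 missing rows per window.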
Repo + reproduce
Full source at github.com/phuazz/breadth-thrust-etf. To reproduce the deployed dashboard end-to-end:
Data fetching (one-off per ETF):
- python scripts/fetch_constituents.py --etf {SOXX | CSP1 | CNDX | IUES | ... | EXV1 | EXH1 | ... } for each of the 14 US sector + 5 Europe sector ETF symbols (see scripts/etf_registry.py for the full list)
- python scripts/compute_breadth.py --etf {symbol} for each ETF that uses constituent breadth (Strategy A + D universes)
Strategy engines (per-deployment refresh):
- python scripts/run_topk_robustness.py: Strategy A, top-K rotation (K × cadence grid, walk-forward, trade history)
- python scripts/run_asset_class_rotation.py: Strategy B, asset-class momentum rotation (14 broad ETFs)
- python scripts/run_thematic_rotation.py: Strategy C, thematic momentum rotation (16 ETFs, equal-weight after Phase 6)
- python scripts/run_europe_rotation.py: Strategy D, Europe sector breadth rotation (5 Stoxx Europe 600 sector UCITS), with walk-forward (added Phase 7)
- python scripts/run_multi_strategy.py: combines A+B+C+D into 14 blend variants (3 4-way + 3 3-way + 3 2-way + 4 standalones + meta-rotation)
Validation + audit (Phase 7-8):
- python scripts/run_phase7_bootstrap.py: moving block bootstrap on per-strategy + paired-differential Sharpe CIs
- python scripts/run_phase8_right_tail.py: Sortino, skewness, rolling 12m extremes, regime decomposition, % months as top sleeve
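A moving block bootstrap on a paired Sharpe differential can be sketched as follows. This is a minimal illustration of the technique, not the code in run_phase7_bootstrap.py: the function names, the 21-day block length, the 500 draws, the zero risk-free rate, and the synthetic return series are all assumptions. The key point is that the same resampled day indices are applied to both series, so the pairing between the two strategies is preserved within each draw.

```python
import math
import random


def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of daily returns (risk-free rate assumed zero)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return 0.0 if var == 0 else mean / math.sqrt(var) * math.sqrt(periods_per_year)


def block_bootstrap_sharpe_diff(a, b, block=21, n_boot=500, seed=0):
    """Moving block bootstrap of Sharpe(a) - Sharpe(b) on paired daily returns.

    Returns (2.5th percentile, 97.5th percentile, share of draws where a beats b).
    """
    assert len(a) == len(b) and len(a) > block
    rng = random.Random(seed)
    n = len(a)
    n_blocks = math.ceil(n / block)
    diffs = []
    for _draw in range(n_boot):
        idx = []
        for _blk in range(n_blocks):
            start = rng.randrange(n - block + 1)  # overlapping blocks allowed
            idx.extend(range(start, start + block))
        idx = idx[:n]  # trim to the original sample length
        # Same indices for both series keeps the day-by-day pairing intact.
        diffs.append(sharpe([a[i] for i in idx]) - sharpe([b[i] for i in idx]))
    diffs.sort()
    lo = diffs[int(0.025 * (n_boot - 1))]
    hi = diffs[int(0.975 * (n_boot - 1))]
    p_better = sum(d > 0 for d in diffs) / n_boot
    return lo, hi, p_better


# Synthetic paired returns standing in for deployed vs baseline (not real data).
rng = random.Random(42)
baseline = [rng.gauss(0.0004, 0.01) for _ in range(750)]
deployed = [r + rng.gauss(0.00002, 0.002) for r in baseline]

lo, hi, p_better = block_bootstrap_sharpe_diff(deployed, baseline)
print(f"95% CI [{lo:+.2f}, {hi:+.2f}], p(better) {p_better:.0%}")
```

If the resulting interval straddles zero, the differential is not significant at the 5% level, even when p(better) is well above 50%.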
Dashboard build:
- python scripts/pipeline.py: injects all JSONs into template.html → writes docs/index.html for GitHub Pages
Other:
- python scripts/run_ma200_sweep.py: per-ETF MA200 baselines (legacy: feeds the Monitor tab's per-ETF state cards)
- python scripts/run_robustness.py: runs the historical robustness suite (Tests 10/11/12 plus legacy Tests 1-9, which are no longer rendered in the dashboard but remain in the JSON for archival reproducibility)
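The steps above could be chained with a small driver along these lines. This is a hypothetical sketch, not a script that exists in the repository: build_commands and run_all are illustrative names, the ordering (data fetching, then strategy engines, then validation, then dashboard build) is assumed from the order of the lists, and the two legacy scripts under "Other" are left out. The ETF symbols are passed in by the caller because the full list lives in scripts/etf_registry.py.

```python
import subprocess
import sys

# Script paths as listed above, relative to the repository root.
ENGINE_SCRIPTS = [
    "scripts/run_topk_robustness.py",
    "scripts/run_asset_class_rotation.py",
    "scripts/run_thematic_rotation.py",
    "scripts/run_europe_rotation.py",
    "scripts/run_multi_strategy.py",
]
VALIDATION_SCRIPTS = [
    "scripts/run_phase7_bootstrap.py",
    "scripts/run_phase8_right_tail.py",
]
BUILD_SCRIPT = "scripts/pipeline.py"


def build_commands(etf_symbols):
    """Return the full command sequence as argv lists, without running anything."""
    cmds = []
    for sym in etf_symbols:
        cmds.append([sys.executable, "scripts/fetch_constituents.py", "--etf", sym])
        cmds.append([sys.executable, "scripts/compute_breadth.py", "--etf", sym])
    for script in ENGINE_SCRIPTS + VALIDATION_SCRIPTS + [BUILD_SCRIPT]:
        cmds.append([sys.executable, script])
    return cmds


def run_all(etf_symbols, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(etf_symbols):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Two symbols from the list above as placeholders; the full set is in
    # scripts/etf_registry.py and is not reproduced here.
    run_all(["SOXX", "EXV1"], dry_run=True)
```

Running with dry_run=True only prints the commands, which is a cheap way to check the sequence before committing to a full refresh.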
An earlier iteration explored a more complex composite breadth signal (RSI breadth + MA breadth + Highs breadth + thrust detection) before the simpler MA200 sweep showed it generalised worse. That work is in the git history. Phase 1-2 also tested per-ETF L-threshold strategies (legacy Robustness Tests 1-9) before Phase 3 chose top-K rotation as the deployed paradigm.