Real predictions on real events, locked before the outcome is known. What qualifies, how the locking works, and what happened the first time a forecast resolved.
Everything so far has graded the models against sealed history — the clean signal, free of hindsight by construction. But a forecasting project that never forecasts anything would be a contradiction, and an easy one to hide behind. So the system also bets on the future, in public: real probabilities on real events, registered before the outcome exists, scored by the same rules as everything else, and never quietly edited afterwards. This chapter explains the rules of those bets — and what the first resolved one did and did not prove.
Not every interesting question is a legal target. Chapter 1 drew the scope line: only structural phenomena, meaning regime transitions, war initiation and termination, institutional collapse, economic phase transitions, and mass collective action. Crowds, never individuals. But scope is only the first check. Before a prediction may be registered, it needs four more things:
The discipline extends to refusing whole categories the models cannot yet support. Archetype-level political predictions — “a populist shift is coming in country X” — remain frozen as of mid-2026, pending a properly pre-registered taxonomy of what counts as which archetype. A bet you cannot define crisply is a bet you cannot be held to.
The mechanism that makes a public bet meaningful is pre-registration: the probability is committed — in writing, in a public registry — before the outcome can be known, and the committed text is hash-stamped, the same cryptographic fingerprinting that seals the hold-out data. After the lock, nothing about the prediction may change except its resolution fields: what happened, and what score it earned.
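A minimal sketch of the locking step follows. It assumes the committed prediction is serialized as canonical JSON and fingerprinted with SHA-256. The text does not specify the project's actual hash function or record layout, so the field names (`question`, `probability`, `horizon_years`, `outcome`) and the helpers `fingerprint` and `verify` are hypothetical. Only the locked fields go into the hash. Resolution fields can therefore be filled in later without breaking the stamp, while any change to the committed number is caught.

```python
import hashlib
import json

# Hypothetical schema: the fields frozen at registration time.
# Resolution fields (e.g. "outcome", "brier") are deliberately left out of the hash.
LOCKED_FIELDS = ("question", "probability", "horizon_years")


def fingerprint(prediction: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the locked fields only."""
    locked = {key: prediction[key] for key in LOCKED_FIELDS}
    canonical = json.dumps(locked, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(prediction: dict, published_stamp: str) -> bool:
    """True if the locked fields still match the stamp published at registration."""
    return fingerprint(prediction) == published_stamp


prediction = {
    "question": "Hypothetical: will event X occur within five years?",
    "probability": 0.18,
    "horizon_years": 5,
}
stamp = fingerprint(prediction)  # published alongside the prediction

prediction["outcome"] = 0           # filling in a resolution field later
print(verify(prediction, stamp))    # True: outcome is not part of the stamp
prediction["probability"] = 0.05    # a quiet edit to the committed number
print(verify(prediction, stamp))    # False: the stamp no longer matches
```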
Why such ceremony over what is, in the end, a number in a file? Because the entire difference between forecasting and punditry lives there. A pundit’s record is unfalsifiable by design — predictions vague enough to survive any outcome, sharpened or forgotten retroactively as events require. A hash-locked registry makes the opposite commitments: the number is exact, the timestamp is provable, and the file cannot be edited without the edit being detectable. It is the difference between calling the shot and narrating it afterwards.
Every registered bet must name its base rate: the historical frequency of this kind of event in its reference class (the group of comparable cases). How often, across the historical record, do autocracies with factional elites suffer irregular transitions? That number, not a hunch, is where an honest forecast starts.
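One simple way to turn a reference class into an annual hazard is to pool it: count the events and divide by the total years of exposure across all comparable cases. The sketch below assumes a constant hazard across cases and years, and its counts are invented for illustration; the project's actual reference classes and estimators are not described here, and `annual_hazard` is a hypothetical name.

```python
def annual_hazard(events: int, exposure_years: float) -> float:
    """Pooled base rate: events per year of exposure across the reference class.

    Assumes a constant hazard, i.e. every case-year in the class is equally at risk.
    """
    if exposure_years <= 0:
        raise ValueError("exposure must be positive")
    return events / exposure_years


# Illustrative counts only: 12 irregular transitions observed over
# 300 country-years of autocracies with factional elites.
h = annual_hazard(events=12, exposure_years=300)
print(h)  # 0.04
```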
Base rates arrive as annual hazards and must be converted to the bet’s actual horizon. The conversion is one line of arithmetic:

$$P_T = 1 - (1 - h)^T$$
In words: take the annual hazard h, compute the chance of surviving one year (1 − h), compound that survival over T years, and what is left is the chance the event hits somewhere in the window. A modest-sounding 4 percent annual hazard becomes roughly an 18 percent chance over five years — which is why eyeballed horizon probabilities are banned in favor of the formula.
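Because the conversion is easy to get wrong by eye, here it is as a function. This sketch assumes a constant annual hazard `h` with independent years, which is exactly what the formula assumes; `horizon_probability` is a hypothetical name.

```python
def horizon_probability(h: float, years: float) -> float:
    """Chance an event occurs at least once within `years`, given constant annual hazard h."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("annual hazard must lie in [0, 1]")
    survival = (1.0 - h) ** years  # chance of getting through every year with no event
    return 1.0 - survival          # whatever is not survival is a hit inside the window


print(round(horizon_probability(0.04, 5), 4))  # 0.1846, i.e. roughly 18 percent
print(round(5 * 0.04, 4))                      # 0.2: the naive "add up the years" guess
```

The naive sum overstates the risk because it counts years after the event has already happened, and over long horizons it can even exceed 1.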
The deeper rule: a forecast earns credit only by justifying its departure from the base rate. Matching the historical default requires no model at all; the zoo’s own null baseline does exactly that, and exists precisely so that anything fancier must prove it adds something. A registered bet that simply restates the base rate is not wrong; it is just not evidence of skill, and the project refuses to count it as any.
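One standard way to make "justify the departure" measurable is the Brier skill score against a baseline that always forecasts the base rate. The sketch assumes the null baseline behaves that way, as the paragraph describes. Whether the project uses exactly this score is not stated, and the data are invented for illustration.

```python
def brier(p: float, outcome: int) -> float:
    """Squared error between a probability and a 0/1 outcome; lower is better."""
    return (p - outcome) ** 2


def brier_skill(forecasts, base_rates, outcomes) -> float:
    """1 - BS_model / BS_null, where the null baseline simply forecasts the base rate.

    Positive means the departures from the base rate helped; zero means no skill shown.
    Undefined (ZeroDivisionError) if the null baseline happens to score perfectly.
    """
    n = len(outcomes)
    bs_model = sum(brier(p, y) for p, y in zip(forecasts, outcomes)) / n
    bs_null = sum(brier(b, y) for b, y in zip(base_rates, outcomes)) / n
    return 1.0 - bs_model / bs_null


# Illustrative data: three questions with their horizon base rates and outcomes.
base_rates = [0.18, 0.10, 0.30]
outcomes = [0, 0, 1]

print(brier_skill(base_rates, base_rates, outcomes))              # 0.0: restating earns nothing
print(brier_skill([0.10, 0.05, 0.60], base_rates, outcomes) > 0)  # True: departures paid off here
```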
Forecasting the weather does not change the weather. Forecasting politics can change politics — a published warning can mobilize the very prevention that falsifies it, and a published assurance can breed the complacency that falsifies it. This feedback between forecast and outcome is reflexivity, and every registered prediction must declare its class before publication:
Immune: the forecast reads the world; the world has no channel to read it back. Example: structural risk driven by infant mortality and age structure, where publishing a probability moves nothing it measures.
Self-fulfilling (reinforcing feedback, +): belief in the forecast pushes the world toward the predicted outcome. Example: a market-moving forecast of sovereign stress attracts capital flight, deepening the very stress it predicted.
Self-defeating (canceling feedback, −): belief in the forecast mobilizes action that cancels the predicted outcome. Example: a credible coup warning triggers preventive countermeasures, averting the coup it predicted.
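Declaring the class can be enforced mechanically at registration time. A minimal sketch with hypothetical names (`Reflexivity`, `register`, `READING`); the mapping paraphrases how the following paragraph says a result in each class should be read, and is not the project's actual code.

```python
from enum import Enum


class Reflexivity(Enum):
    IMMUNE = "immune"                    # no channel from forecast back to the world
    SELF_FULFILLING = "self-fulfilling"  # reinforcing feedback (+)
    SELF_DEFEATING = "self-defeating"    # canceling feedback (-)


# How much a single resolved result can mean, per class (paraphrasing the text).
READING = {
    Reflexivity.IMMUNE: "a hit is clean evidence, as far as one data point goes",
    Reflexivity.SELF_FULFILLING: "a hit is contaminated: the forecast may have caused itself",
    Reflexivity.SELF_DEFEATING: "a miss may be the forecast working: the warning triggered prevention",
}


def register(question: str, probability: float, reflexivity: Reflexivity) -> dict:
    """Refuse to register a prediction whose reflexivity class is not declared."""
    if not isinstance(reflexivity, Reflexivity):
        raise TypeError("declare a Reflexivity class before publication")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return {"question": question, "probability": probability, "reflexivity": reflexivity.value}


entry = register("Hypothetical coup-risk question", 0.12, Reflexivity.SELF_DEFEATING)
print(entry["reflexivity"], "->", READING[Reflexivity.SELF_DEFEATING])
```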
The classification is not bureaucratic decoration; it decides how much a result can ever mean. A win on an immune question is clean evidence, as far as one data point goes. A win on a self-fulfilling question is contaminated — the forecast may have helped cause itself. A self-defeating forecast can be correct precisely when it looks wrong: the warning triggered the prevention, the crisis never came, and the registry records a miss that was actually the forecast working. This is why the project treats market performance as a secondary signal, always. The primary evidence stays where reflexivity cannot reach: sealed retrodiction, where the outcomes were settled before the models existed.
As of this build, one registered prediction has resolved, with 11 more registered and waiting. The first to resolve, in April 2026, was a structural-eligibility question about the Hungarian parliamentary election. The numbers below are read live from the same prediction ledger the predictions page renders:
Hungary: Will Tisza win the 2026 parliamentary election?
Our forecast: 80% · Market price: 74%
Our Brier: 0.0400 · Market Brier: 0.0676
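Both Brier scores can be reproduced from the two probabilities, which also confirms the question resolved YES. Squared error against an outcome of 1 gives exactly the published figures, while an outcome of 0 would have given 0.64 and 0.5476. A minimal check, where `brier` is a hypothetical helper name:

```python
def brier(p: float, outcome: int) -> float:
    """Brier score for one binary question: squared gap between probability and outcome."""
    return (p - outcome) ** 2


ours = brier(0.80, 1)    # our forecast vs. a YES resolution
market = brier(0.74, 1)  # market price vs. the same resolution
print(f"{ours:.4f} {market:.4f}")  # 0.0400 0.0676
print(ours < market)               # True: lower is better
```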
The headline a press release would write: the model’s probability was closer to the truth than the market price, so the project’s first resolved bet beat the market. The headline this project writes instead: n = 1. One resolved forecast distinguishes skill from luck about as well as one hand of cards — chapter 3 opened with exactly this point, and it does not stop applying when the result is flattering.
The fuller record already shows the less flattering shapes too. Another registered case, Venezuela, was marked outside formula scope when a foreign military intervention took over the causal story: an exogenous shock the models do not claim to price. Recording that honestly, rather than counting a lucky hit or excusing a miss, is the registry doing its actual job. What one early win buys is not evidence of skill; it is a demonstration that the process (lock, wait, score, publish) runs end to end without an escape hatch.
How to read any single result
One resolved bet, won or lost, moves the rational needle almost nothing. What accumulates meaning is the track record: many locked forecasts, scored by Brier against the market on the same questions, over years. The registry is built for that long game — which is also why it cannot be allowed to forget anything along the way.
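Comparing a record against the market on the same questions is naturally a paired comparison. The sketch below, using the hypothetical helper `paired_brier_gap`, returns the mean per-question gap (market Brier minus ours) and its standard error. With one resolved question there is no spread to estimate, so the uncertainty is unbounded: the n = 1 point, in numbers. A plain standard error is a simplifying assumption; a real evaluation might prefer a bootstrap or a formal test.

```python
import math
from statistics import mean, stdev


def paired_brier_gap(our_probs, market_probs, outcomes):
    """Mean (market Brier - our Brier) per question, and the standard error of that mean.

    A positive gap means our forecasts scored better (Brier: lower is better).
    """
    diffs = [(m - y) ** 2 - (p - y) ** 2
             for p, m, y in zip(our_probs, market_probs, outcomes)]
    gap = mean(diffs)
    if len(diffs) < 2:
        return gap, math.inf  # a single result carries no information about its own noise
    return gap, stdev(diffs) / math.sqrt(len(diffs))


# The one resolved question so far: a positive gap, but unbounded uncertainty.
print(paired_brier_gap([0.80], [0.74], [1]))  # roughly (0.0276, inf)
```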
The scoreboard where bets settle is append-only: rows are added, never rewritten, never removed. Wins land with the same permanence as losses, scope exclusions, and embarrassments. There is no mechanism — deliberately none — for pruning the record into a highlight reel.
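The text does not say how append-only is enforced. One common technique, sketched here as an assumption, is a hash chain: each row's SHA-256 digest covers the previous row's digest, so rewriting any earlier row breaks every link after it. The class `AppendOnlyLedger` is hypothetical and deliberately has no update or delete method.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first row


class AppendOnlyLedger:
    """Hypothetical hash-chained scoreboard: rows can be appended, never edited or removed."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    @staticmethod
    def _digest(prev: str, entry: dict) -> str:
        body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def append(self, entry: dict) -> str:
        prev = self._rows[-1]["hash"] if self._rows else GENESIS
        row = {"entry": dict(entry), "prev": prev, "hash": self._digest(prev, entry)}
        self._rows.append(row)
        return row["hash"]  # the current head; publishing it pins the whole history

    def verify(self) -> bool:
        prev = GENESIS
        for row in self._rows:
            if row["prev"] != prev or row["hash"] != self._digest(prev, row["entry"]):
                return False
            prev = row["hash"]
        return True


ledger = AppendOnlyLedger()
ledger.append({"question": "Hypothetical A", "outcome": 1, "brier": 0.04})
ledger.append({"question": "Hypothetical B", "status": "outside formula scope"})
print(ledger.verify())                   # True
ledger._rows[0]["entry"]["brier"] = 0.0  # simulate a quiet retroactive edit
print(ledger.verify())                   # False
```

A chain only proves tampering relative to a head digest someone else has kept. Publishing the latest digest (for example, with each resolution) is what stops a rewritten ledger from simply being re-chained from scratch.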
Over time, that property is what will make the record worth anything. A long run of hash-locked forecasts, scored against market consensus on identical questions, immune to retroactive editing, is the one form of evidence about real-future forecasting that cannot be manufactured after the fact. It accumulates slowly — one resolution at a time — and you can watch it accumulate on the predictions page, where every registered bet, live and settled, is listed.
The bet is not that the models are right. The bet is that an audit-proof record, kept long enough, will say so if they ever are.
That closes the machinery tour. If you want the vocabulary in one place, the glossary defines every term this series used, each entry linking back to the chapter that taught it.
What to remember