Asimov imagined a mathematics of history. This project tries to build it for real — which means saying out loud what it can predict, what it refuses to predict, and what it does not know yet.
In Isaac Asimov’s Foundation novels, a mathematician named Hari Seldon invents psychohistory: a science that cannot tell you what any one person will do, but can predict the broad sweep of history for a civilization of quadrillions. It is, in Asimov’s own phrase, a science of mobs: not of man, but of men. With it, Seldon foresees the fall of a galactic empire centuries in advance.
Asimov wrote this as fiction and never pretended otherwise. But buried inside the premise is a real scientific question, and you do not need a galactic empire to ask it: can any mathematical model, given measurable facts about a society, forecast large-scale political crises better than chance? Not perfectly. Not deterministically. Just measurably better than flipping a coin.
This project exists to answer that question in public. It is not an attempt to build Seldon’s oracle — it is an attempt to run Seldon’s test, under conditions where cheating is hard. Every model is runnable code. Every score comes from a scoring program no one on the project is allowed to edit. Every failure is written down somewhere it cannot be quietly deleted.
The question under test
Taking psychohistory seriously in 2026 does not mean believing it works. It means treating it as a falsifiable claim: that some mathematical structure, fed real data about societies, can beat chance at forecasting regime collapses, wars, and revolutions — and then building the apparatus that would catch you if you fooled yourself about the answer.
A reasonable visitor comes to a site like this for the predictions. The honest pitch of this Learn series is that the predictions are, for now, the least interesting thing here. As of June 2026 the system has no demonstrated forecasting skill, and it says so on its own leaderboard. What is worth understanding — and what these chapters actually teach — is the machinery that forces it to say so.
The product, so far, is not foresight. It is the inability to lie to ourselves about whether we have it.
The project forecasts only structural phenomena: regime transitions, the start and end of wars, institutional collapse, economic phase transitions, and mass collective action. Events made of millions of people — never events made of one.
Why should crowds be more predictable than the people in them? Asimov borrowed the answer from physics, and it is still the right intuition. Tracking a single gas molecule is hopeless: it ricochets billions of times a second, and the tiniest error in your starting measurement explodes into total ignorance. But the pressure and temperature of the gas as a whole — the average behavior of trillions of those hopeless molecules — obey laws precise enough to build engines around. Individual randomness cancels out in the aggregate. What survives the averaging is structure.
Honesty requires the immediate caveat. Societies are not gases. There are roughly two hundred countries, not 10²³ molecules, so the averaging is far weaker. People imitate each other, panic together, and read forecasts about themselves; molecules do none of those things. The law of large numbers tells you where predictability should live if it lives anywhere: in the collective, not the individual. It does not guarantee that it lives anywhere at all. Whether it does is precisely the open question this project is built to test.
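To put rough numbers on "far weaker", here is a minimal sketch under a deliberately unrealistic assumption: every unit is an independent fair coin flip. The function names are illustrative and not part of the project. Under that assumption the spread of the average shrinks like 1/sqrt(n), so two hundred units leave a spread of a few percent, while 10²³ units leave essentially none.

```python
import math
import random
import statistics


def theoretical_spread(n_units: float) -> float:
    """Standard deviation of the average of n independent fair coin flips."""
    return 0.5 / math.sqrt(n_units)


def simulated_spread(n_units: int, n_trials: int = 400, seed: int = 0) -> float:
    """Empirical standard deviation of that average over repeated trials."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        heads = sum(1 for _ in range(n_units) if rng.random() < 0.5)
        averages.append(heads / n_units)
    return statistics.pstdev(averages)


if __name__ == "__main__":
    for n in (1, 200, 5000):
        print(
            f"n={n:>5}  simulated={simulated_spread(n):.4f}  "
            f"theory={theoretical_spread(n):.4f}"
        )
    # Too large to simulate; the formula alone shows the scale.
    print(f"n=1e23  theory={theoretical_spread(1e23):.2e}")
```

The independence assumption is exactly what imitation, shared panic, and reflexive forecasts break, so for real societies this sketch is an optimistic bound on how much averaging helps, not an estimate of it.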
The form of the forecasts follows from that humility. A model here never says “country X will have a coup in March.” It estimates a hazard (the probability that a given kind of event happens to a given country within a given time window) and emits statements like: under these measured conditions, the probability of an irregular regime transition within five years is 12 percent. Always a probability, always a window, never a certainty and never a date. How you grade a statement like that, and how “12 percent” can ever be called right or wrong, is the entire subject of chapter 3.
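One way to get a feel for a windowed probability is to ask what yearly risk it corresponds to. The sketch below assumes a constant hazard that applies independently each year. That is a simplification chosen for illustration, not a description of how the project's models work, and both function names are hypothetical. Under that assumption, 12 percent over five years corresponds to roughly 2.5 percent per year.

```python
def window_probability(annual_hazard: float, years: int) -> float:
    """Probability of at least one event in `years` years at a constant annual hazard."""
    if not 0.0 <= annual_hazard <= 1.0:
        raise ValueError("annual_hazard must be in [0, 1]")
    if years < 1:
        raise ValueError("years must be at least 1")
    # The event is avoided only if it is avoided in every single year.
    return 1.0 - (1.0 - annual_hazard) ** years


def annual_hazard_for(window_prob: float, years: int) -> float:
    """Invert window_probability: the constant annual hazard implied by a window."""
    if not 0.0 <= window_prob <= 1.0:
        raise ValueError("window_prob must be in [0, 1]")
    if years < 1:
        raise ValueError("years must be at least 1")
    return 1.0 - (1.0 - window_prob) ** (1.0 / years)


if __name__ == "__main__":
    hazard = annual_hazard_for(0.12, 5)
    print(f"implied annual hazard: {hazard:.4f}")
    print(f"round trip over 5 years: {window_probability(hazard, 5):.4f}")
```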
Three classes of targets are permanently out of scope, each for its own reason:
Two further rules guard the edges. No prediction may be registered on an event that has already resolved: retrodiction dressed up as forecasting is free, and worthless, so every target is checked for staleness first. And every registered prediction is labeled by its reflexivity class, because publishing a forecast can change the thing being forecast; a warning can avert the crisis it predicts. Chapter 6 deals with both rules in depth.
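The two edge rules can be pictured as checks that run before a prediction is accepted. The sketch below is hypothetical throughout: the `Prediction` fields, the `register` function, and the error messages are invented for illustration, and the source does not describe the project's actual registration code or name its reflexivity classes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    """Hypothetical record; field names are illustrative, not the project's schema."""
    target: str
    probability: float
    window_end: date
    reflexivity_class: str  # label only; the real class names are not given in the text


def register(
    pred: Prediction,
    registered_on: date,
    resolved_on: Optional[date] = None,
) -> Prediction:
    """Accept a prediction only if it passes the staleness and labeling checks."""
    # Staleness: an event that has already resolved cannot be forecast.
    if resolved_on is not None and resolved_on <= registered_on:
        raise ValueError("stale target: event already resolved")
    # The forecast window must still be open at registration time.
    if pred.window_end <= registered_on:
        raise ValueError("forecast window has already closed")
    # Every prediction must carry a reflexivity label.
    if not pred.reflexivity_class.strip():
        raise ValueError("missing reflexivity class")
    if not 0.0 <= pred.probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return pred
```

Raising an error instead of silently dropping a bad registration keeps the rejection visible, which fits the text's emphasis on failures that cannot be quietly deleted.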
It is tempting to read this list as modesty. It is better read as method. A model that claims to predict everything can never be pinned down: any miss can be reclassified, any excuse can be found. A model with a declared scope can be caught being wrong (it can be falsified), and only a model that can be caught being wrong can ever be shown to be right. Refusing targets is what makes the remaining claims mean something.
A confession is owed here, because this project did not start with the view it holds now. For its first 44 research sessions it tried to build exactly what the fiction promised: one grand equation (eight dimensions of societal state, a formalism borrowed from statistical physics) refined session after session by teams of AI agents reading the academic literature and arguing about it.
In June 2026 an audit reached a verdict the project accepted in full: after all that refinement, the grand formula could not emit a single probability. Asked to actually forecast anything, it had nothing to say. On the project’s capability ladder it sits at Tier 0: not failing the test, but unable to take it. The lesson was uncomfortable and is worth stating plainly: accumulating citations is not the same as computing.
So the project rebuilt itself around a different shape. In place of one formula there is now a model zoo: a population of competing, runnable models. Each variant in the zoo is a small program wrapped around one explicit structural hypothesis about how crises happen: that state failure is mostly a function of regime type, say, or that instability follows from elite overproduction. The variants compete on a public leaderboard, and a variant earns a place in the official ensemble only by surviving a severe test: a pass-or-fail criterion locked in writing before the result is known.
Key idea
“The formula” is not an equation. It is a population of disposable models plus two permanent ledgers: an insights ledger recording which causal mechanisms survived severe testing, and a falsification register recording which ones failed. Models come and go; the ledgers only grow. They are the project’s real product.
This is less a retreat from the dream than a correction of it. No one asks meteorology for “the equation of weather”; forecasting skill there lives in populations of competing models, scored relentlessly against what actually happened, with the survivors blended into an ensemble. If a science of history is possible at all, there is no reason to expect it to arrive in a different shape.
So where does the dream actually stand? The numbers below are read, at the moment this page is built, from the same files that drive the leaderboard — they are not typed into this text.
The official ensemble’s Brier score on the sealed hold-out of historical events is currently 0.291. For scale: a forecaster who shrugged and answered fifty percent to every question would score 0.25. Lower is better.
Read that comparison again, because the site will not soften it: at this moment the official ensemble does no better than — in fact slightly worse than — the shrug.
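For readers who want to see where the 0.25 benchmark comes from, here is a minimal sketch of the Brier score for binary outcomes: the mean squared difference between the forecast probability and what happened (1 if the event occurred, 0 if not). The function is an illustration, not the project's frozen scorer, and the outcomes in the usage lines are made up.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes) or len(forecasts) == 0:
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    outcomes = [0, 1, 1, 0]  # made-up outcomes, for illustration only
    # Answering 0.5 every time is off by 0.5 on every event, so the score
    # is 0.5 squared no matter what actually happened.
    print(brier_score([0.5] * len(outcomes), outcomes))  # 0.25
    # A perfect forecaster scores 0; a confidently wrong one scores 1.
    print(brier_score([0.0, 1.0, 1.0, 0.0], outcomes))   # 0.0
    print(brier_score([1.0, 0.0, 0.0, 1.0], outcomes))   # 1.0
```

Because the shrug always scores exactly 0.25, any score above it, such as 0.291, means the forecasts were on average further from the truth than refusing to commit would have been.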
The best individual model in the zoo looks stronger than that on paper. But raw best-of-the-board scores flatter you: try enough models and one of them will beat chance by pure luck. The leaderboard therefore applies EVT deflation, a statistical haircut sized to the number of models tried, before any claim of skill is made. As of June 2026, after that haircut, the best model’s score is statistically indistinguishable from chance. In plain words: the system has no demonstrated forecasting skill yet. Not “promising early results.” None, so far.
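The selection effect that motivates the haircut is easy to reproduce. The simulation below is a sketch of the problem, not of EVT deflation itself: it builds models with no skill at all (forecasts drawn at random near 0.5, outcomes decided by fair coin flips) and reports the best Brier score among them. The function name, the forecast range, and the event count are all assumptions chosen for illustration.

```python
import random


def best_of_n_no_skill(n_models: int, n_events: int = 50, seed: int = 0) -> float:
    """Best (lowest) Brier score among n_models forecasters that have no skill."""
    rng = random.Random(seed)
    # Outcomes are fair coin flips, so nothing can truly be predicted.
    outcomes = [1 if rng.random() < 0.5 else 0 for _ in range(n_events)]
    best = float("inf")
    for _ in range(n_models):
        # Each "model" emits noise around 0.5, independent of the outcomes.
        forecasts = [rng.uniform(0.4, 0.6) for _ in range(n_events)]
        score = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n_events
        best = min(best, score)
    return best


if __name__ == "__main__":
    for n in (1, 10, 200):
        print(f"models tried: {n:>3}  best Brier score: {best_of_n_no_skill(n):.4f}")
```

With a fixed seed the first model is the same in every run, so the best score can only improve as more models are tried. Every one of these models is pure noise, yet the leader of a large enough field dips below 0.25, which is why a raw best-of-board score cannot count as evidence of skill.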
Publishing that sentence is not a failure of the project. It is the point of the project. A research effort that defines success after seeing its results can always declare victory; this one wrote its exam rules first (the frozen scorer, the sealed hold-out data, the pre-registered pass bars) back when there were no results to flatter. That ordering is what will make a future claim of skill believable, and it is what makes the current “no skill yet” informative rather than embarrassing. This is science in progress, documented every step, including the step where the honest answer is “not yet.”
The rest of this series walks the machinery. Chapter 2 tours the zoo, the autonomous nightly pipeline that refines it, and the adversarial gate that decides what counts as official. Chapter 3 teaches how a probability is graded at all. Chapter 4 covers the anti-self-deception machinery: frozen scorers, sealed exams, haircuts for luck. Chapter 5 then reads the leaderboard column by column, so you can audit the claims yourself.
What to remember