A multi-agent AI research system working toward a formal mathematical model of large-scale human behavioral prediction — Asimov's psychohistory, built for real. Now rebuilt as a system of competing, testable models scored by a frozen oracle.
38 · Research sessions
3 · Competing models (live)
26 · Locked hold-out events
Tier 0 · Legacy formula skill
Architecture redesign · June 2026
For 41 sessions the project polished a single equation it could never run. We rebuilt it the way real forecasting sciences work — a population of competing, runnable models on a leaderboard, scored by a frozen oracle no agent can edit, driven by an autonomous research loop.
The honest headline: the legacy formula sits at Tier 0 — it never produced a single numeric prediction. Now every model is measured.
Read the full redesign →
Live leaderboard · frozen-scored hold-out
| Model | Family | Brier | Resolution | Neg-ctrl | Tier |
|---|---|---|---|---|---|
| ensemble | equal-weight | 0.268 | 0.124 | 0.091 | T0 |
| pitf_logit | regime_logit | 0.219 | 0.139 | 0.188 | T1 |
| train_freq | empirical_frequency | 0.252 | 0.112 | 0.112 | T0 |
| null_baseline | null | 0.370 | 0.095 | 0.038 | T0 |
26 events · 10 negative controls · ensemble Brier 0.268 (chance 0.25)
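For readers new to the leaderboard columns, here is a minimal sketch of how a Brier score and a resolution term can be computed. It assumes that "Resolution" means the resolution term of the Murphy decomposition over equal-width probability bins. The frozen scorer's actual code is not shown on this page, and the names `brier_score`, `resolution`, and `n_bins` are illustrative.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


def resolution(forecasts, outcomes, n_bins=5):
    """Murphy resolution: weighted spread of per-bin outcome rates around the base rate.

    Assumption: forecasts are grouped into n_bins equal-width probability bins.
    """
    base_rate = sum(outcomes) / len(outcomes)
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        # Clamp p == 1.0 into the top bin.
        bins[min(int(p * n_bins), n_bins - 1)].append(o)
    return sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in bins.values()
    ) / len(outcomes)


outcomes = [1, 0, 1, 0]
# A constant 0.5 forecast always scores 0.25, which is the "chance" reference.
print(brier_score([0.5] * 4, outcomes))  # 0.25
print(brier_score([0.9, 0.1, 0.85, 0.15], outcomes))
print(resolution([0.9, 0.1, 0.85, 0.15], outcomes))
```

Lower Brier is better and higher resolution is better. A forecaster that always says 0.5 has zero resolution because it never separates events from non-events.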
The idea
In Isaac Asimov's Foundation series, psychohistory is a science that combines history, sociology, and mathematics to predict, not individual human actions, but the behavior of vast populations over long timescales.
This project asks: what would it take to actually build it? We're combining complexity science, behavioral economics, cliodynamics, network theory, and statistical physics into a unified formal model, and testing it against live prediction markets.
Learn more about the project →
How it works now
MODEL ZOO
Competing, runnable models
null · pitf_logit · sdt_turchin · rfim · …
FROZEN SCORER
One number, no agent can edit it
Brier · log-loss · resolution
LEADERBOARD
Ranked by out-of-sample skill
purged + embargoed backtests
AUTONOMOUS LOOP
Hypothesize → backtest → select
keep if better · revert if not
Legacy formula · now one variant in the zoo
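The "keep if better · revert if not" rule can be illustrated with a toy loop. This is a sketch under stated assumptions, not the project's code: the "model" is a single constant probability, `propose` is a hypothetical random perturbation standing in for the hypothesize step, and `brier` stands in for the frozen scorer, which the loop can call but not modify.

```python
import random


def brier(prob, outcomes):
    """Stand-in for the frozen scorer: Brier score of a constant forecast."""
    return sum((prob - y) ** 2 for y in outcomes) / len(outcomes)


def propose(prob, rng, step=0.1):
    """Hypothetical 'hypothesize' step: nudge the forecast, clipped to [0, 1]."""
    return min(1.0, max(0.0, prob + rng.uniform(-step, step)))


def research_loop(start_prob, outcomes, n_rounds=200, seed=0):
    rng = random.Random(seed)
    champion, best = start_prob, brier(start_prob, outcomes)
    history = [best]
    for _ in range(n_rounds):
        challenger = propose(champion, rng)
        score = brier(challenger, outcomes)  # backtest
        if score < best:                     # keep if better
            champion, best = challenger, score
        # otherwise the challenger is discarded (revert)
        history.append(best)
    return champion, best, history


backtest_outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
champion, best, history = research_loop(0.9, backtest_outcomes)
print(round(champion, 3), round(best, 4))
```

Because a challenger only replaces the champion when its score strictly improves, the best score never gets worse from one round to the next. Selecting on the same events repeatedly can still overfit them, which is why the final ranking is taken from out-of-sample data.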
Research team
The nine disciplines are being restructured into lean generators plus a new tier of agents that run code — fitting models, simulating, and backtesting. The Philosopher moves off the numeric gate to audit for self-deception.
"What are the probability distributions governing individual choice?"
14 micro parameters defined in Session 2, including 4 critical: loss aversion lambda, temporal discount beta_td, conformity gamma_conf, and authority deference alpha_auth.
"Which parameters are fixed (genetic) vs. variable (cultural)?"
9 of 13 evolutionary constants (Theta_fixed) defined in Session 6, establishing the HYBRID model: Theta_total = Theta_fixed_floor + Theta_variable(culture, t).
"How does network topology determine whether perturbations go local or global?"
Social networks are NOT strongly scale-free: Broido & Clauset 2019 (Nature Communications) found that 0% of social networks reach the 'strong' scale-free classification, so the degree distribution was reclassified as a truncated power law with gamma_sf ~ 2.3.
"Do our micro-rules actually generate realistic macro-behavior?"
Most important conceptual advance since Session 1: the four Turchin secular cycle phases are temporal quadrants of ONE limit cycle (Hopf bifurcation), not four separate attractor basins — validated by Wittmann & Kuehn 2024 (PLOS ONE, 5/5).
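To make the "one limit cycle" picture concrete, here is a minimal sketch using the textbook Hopf normal form in polar coordinates, dr/dt = r(mu - r^2), dtheta/dt = omega. It is an illustration only and not the project's model: `mu`, `omega`, the Euler step, and the quadrant labels 0 to 3 are all assumptions chosen for the example.

```python
import math


def simulate_hopf(r0, theta0=0.0, mu=1.0, omega=1.0, dt=0.01, steps=5000):
    """Euler-integrate the Hopf normal form; returns final radius and angle."""
    r, theta = r0, theta0
    for _ in range(steps):
        r += dt * r * (mu - r * r)  # radius relaxes toward sqrt(mu) when mu > 0
        theta += dt * omega         # angle advances at a constant rate
    return r, theta % (2 * math.pi)


def phase_quadrant(theta):
    """Map an angle in [0, 2*pi) to one of four quadrants along the same cycle."""
    return int(theta // (math.pi / 2)) % 4


# Starting inside or outside the cycle, trajectories end on the same orbit.
for r0 in (0.1, 2.0):
    r, theta = simulate_hopf(r0)
    print(round(r, 6), phase_quadrant(theta))
```

In this picture the four phases are successive arcs of a single closed orbit, so the system passes through them in a fixed order rather than settling into one of four separate basins.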
"Which economic patterns exhibit phase transitions and power laws?"
Inverse cubic law (alpha_tail ~ 3.0): 40M+ data points, replicated across multiple markets (Gopikrishnan 1999, Gabaix 2003, methodology 5/5). This VALIDATES the FP+jump split: alpha_tail > 2 means finite variance for continuous dynamics, while alpha_war = 1.53 < 2 means infinite variance for crises.
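The variance threshold at a tail exponent of 2 follows from a short calculation. As a worked example, assume an idealized pure Pareto tail with survival function P(X > x) = (x_m / x)^alpha for x >= x_m, where the scale x_m is a symbol introduced here for illustration. Real data follow a power law only in the tail, so this is a sketch of the mechanism rather than a fit.

```latex
\mathbb{E}[X^{2}]
= \int_{x_m}^{\infty} x^{2}\,\alpha\,x_m^{\alpha}\,x^{-\alpha-1}\,dx
= \alpha\,x_m^{\alpha}\int_{x_m}^{\infty} x^{1-\alpha}\,dx
= \begin{cases}
\dfrac{\alpha\,x_m^{2}}{\alpha-2}, & \alpha > 2,\\[2ex]
\infty, & \alpha \le 2.
\end{cases}
```

With alpha_tail ~ 3.0 the second moment is finite (3 x_m^2 in this idealization), while alpha_war = 1.53 falls in the divergent case. The mean is still finite there, since it only requires alpha > 1.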
"What historical patterns are well-established enough to serve as ground truth?"
Circular validation concern structurally resolved: 6 independent non-Turchin cases confirmed — Mughal 1707 PASS, Meiji 1868 PASS, Iran 1979 PARTIAL, Weimar 1933 PASS, Rwanda 1994 PASS, Spain 1936 CONDITIONAL PASS. Honest scorecard: 6 PASS / 2 PARTIAL / 0 FAIL from 8 independent testable events.
"How do formal and informal institutions alter the formula's predictions?"
Institutional constraint variable fully defined: I_t is a 5-free-dimensional vector (R_t, V_t, B_t, P_t, X_t; L_t = 1 − X_t derived) with per-equation drift modulations A_1–A_8, empirically grounded via V-Dem, Polity V, WGI, and Jones & Olken's death-in-office natural experiment.
"What formal system encodes layers 1–3 into a predictive theory?"
Session 1: Framework defined as Fokker-Planck equation with jump process. 10D state vector, 3 order parameters, Turchin PSI composite. Core mathematical lineage: Weidlich 1971, Toscani 2006, Scheffer 2009, Turchin 2020.
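For orientation, a generic one-dimensional Fokker-Planck equation with a jump term can be written as below. This is the standard textbook form and not the project's 10D equation: the drift A, diffusion D, jump rate lambda, and jump-size distribution nu are placeholder symbols, and the jump rate is assumed to be independent of the state.

```latex
\frac{\partial p(x,t)}{\partial t}
= -\frac{\partial}{\partial x}\bigl[A(x)\,p(x,t)\bigr]
+ \frac{1}{2}\,\frac{\partial^{2}}{\partial x^{2}}\bigl[D(x)\,p(x,t)\bigr]
+ \lambda \int \bigl[p(x-z,t) - p(x,t)\bigr]\,\nu(dz)
```

The first two terms describe continuous drift and diffusion of the probability density. The integral term moves probability mass discontinuously, which is how sudden crises enter the framework.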
"What is the theoretical limit of predictability for a social system of N agents?"
Predictability bounds: R² < 0.50 hard ceiling for aggregate social prediction (Martin et al. 2016, Science). Lyapunov time 5–20 years for macro-social dynamics. Fat-tail constraint: alpha_war = 1.53 < 2 means standard confidence intervals do not exist for the jump process component.
"Is this genuinely predictive, or are we fooling ourselves?"
The formula has 0.15 observations per parameter (8 retrodiction events for 53 parameters), far below the common rule of thumb of 10-15 observations per parameter. This is the primary overfitting risk.
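The arithmetic behind that ratio, and the number of events the 10-15 observations-per-parameter guideline would imply, can be checked directly. The function names below are illustrative.

```python
def observations_per_parameter(n_events, n_params):
    """Ratio of independent test events to free parameters."""
    return n_events / n_params


def events_needed(n_params, low=10, high=15):
    """Event counts implied by a low-to-high observations-per-parameter guideline."""
    return low * n_params, high * n_params


ratio = observations_per_parameter(n_events=8, n_params=53)
print(round(ratio, 2))    # 0.15
print(events_needed(53))  # (530, 795)
```

At 53 parameters the guideline implies roughly 530 to 795 independent events, against the 8 currently available.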
Research log
Lead: Econophysicist
Lead: Statistical Physicist
Lead: Cliodynamicist
Secondary signal
Validation now runs primarily on a locked hold-out of historical events scored by the frozen oracle. Live market predictions are a secondary signal — published pre-resolution and labeled by reflexivity class. Polymarket is one benchmark among several.
1/1 · predictions beat market
0.0400 · average Brier score
11 · live predictions