In Progress · Model zoo · 10 live variants · Session 44

Building Psychohistory

A multi-agent AI research system working toward a formal mathematical model of large-scale human behavioral prediction — Asimov's psychohistory, built for real. Now rebuilt as a system of competing, testable models scored by a frozen oracle.

Explore the redesign · What is this?

44 research sessions · 10 competing models (live) · 26 locked hold-out events · Legacy formula skill: Tier 0

Architecture redesign · June 2026

From one formula to a system of models

For 41 sessions the project polished a single equation it could never run. We rebuilt it the way real forecasting sciences work — a population of competing, runnable models on a leaderboard, scored by a frozen oracle no agent can edit, driven by an autonomous research loop.

The honest headline: the legacy formula sits at Tier 0 — it never produced a single numeric prediction. Now every model is measured.

Read the full redesign →

Live leaderboard · frozen-scored hold-out

Model	Family	Brier	Resolution	Neg-ctrl	Tier
ensemble	equal-weight	0.291	0.096	0.068	T0
pitf_logit (excl.)	regime_logit	0.175	0.141	0.256	T2
hierarchical_bayes (excl.)	empirical_bayes	0.219	0.084	0.278	T1
hazard_spline (excl.)	hazard_spline	0.220	0.095	0.405	T1
conformal_wrapper (excl.)	calibration_meta	0.221	0.121	0.158	T1
sdt_turchin (excl.)	structural_demographic	0.230	0.173	0.222	T1
train_freq	empirical_frequency	0.234	0.064	0.161	T1
firth_logit (excl.)	penalised_logit	0.269	0.192	0.297	T0
gbm_honest (excl.)	gradient_boosting	0.281	0.115	0.144	T0
reign_logit (excl.)	duration_logit	0.330	0.060	0.209	T0
null_baseline	null	0.370	0.095	0.038	T0

26 events · 10 negative controls · ensemble Brier 0.291 (chance 0.25)
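
The Brier and Resolution columns above can be illustrated with a minimal sketch. It assumes binary outcomes, the standard Brier and log-loss definitions, and the resolution term of the Murphy decomposition over equal-width forecast bins. The function names and the choice of 10 bins are assumptions made for illustration, not the frozen oracle's actual code.

```python
import numpy as np

def brier_score(p, y):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))

def log_loss(p, y, eps=1e-12):
    """Negative mean log-likelihood; probabilities are clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def resolution(p, y, n_bins=10):
    """Murphy resolution: weighted spread of per-bin outcome rates around the base rate."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    base_rate = y.mean()
    # Assign each forecast to an equal-width bin; p == 1.0 goes in the last bin.
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for k in np.unique(bins):
        mask = bins == k
        total += mask.sum() * (y[mask].mean() - base_rate) ** 2
    return float(total / len(y))

# A constant 0.5 forecast scores 0.25 regardless of outcomes (the "chance" line)
# and has zero resolution, because it never separates events from non-events.
outcomes = [0, 1, 0, 1]
chance_brier = brier_score([0.5] * 4, outcomes)
chance_resolution = resolution([0.5] * 4, outcomes)
```

Lower Brier and log-loss are better, while higher resolution is better, so a model can improve on one column and regress on another.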

The idea

Can mathematics predict the future of civilizations?

In Isaac Asimov's Foundation series, psychohistory is a science that combines history, sociology, and mathematics to predict, not individual human actions, but the behavior of vast populations over long timescales.

This project asks: what would it take to actually build it? We're combining complexity science, behavioral economics, cliodynamics, network theory, and statistical physics into a unified formal model, and testing it against live prediction markets.

Learn more about the project →

How it works now

MODEL ZOO

Competing, runnable models

null · pitf_logit · sdt_turchin · rfim · …

FROZEN SCORER

One number, no agent can edit it

Brier · log-loss · resolution

LEADERBOARD

Ranked by out-of-sample skill

purged + embargoed backtests

AUTONOMOUS LOOP

Hypothesize → backtest → select

keep if better · revert if not
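
The "purged + embargoed backtests" step named above can be sketched as follows. This is a minimal illustration of the general technique, not the project's pipeline: the `Event` fields reuse the `t0` and `horizon_years` names that appear in the Session 43 notes, while the function name, the embargo length, and the example events are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    id: str
    t0: int             # forecast origin year
    horizon_years: int  # the label window is [t0, t0 + horizon_years]

def purged_embargoed_train(events, test_start, test_end, embargo_years):
    """Return the events that are safe to train on for one test window.

    Purging drops any event whose label window overlaps the test window.
    The embargo drops events that start shortly after the test window ends.
    """
    kept = []
    for ev in events:
        label_end = ev.t0 + ev.horizon_years
        overlaps_test = ev.t0 <= test_end and label_end >= test_start
        in_embargo = test_end < ev.t0 <= test_end + embargo_years
        if not (overlaps_test or in_embargo):
            kept.append(ev)
    return kept

# Hypothetical events, each with a 5-year label window.
events = [
    Event("a", 1900, 5),
    Event("b", 1905, 5),
    Event("c", 1918, 5),  # label window 1918-1923 overlaps the test window: purged
    Event("d", 1931, 5),  # starts inside the 5-year embargo after 1930: dropped
    Event("e", 1950, 5),
]
train = purged_embargoed_train(events, test_start=1920, test_end=1930, embargo_years=5)
train_ids = [ev.id for ev in train]
```

Both filters exist to stop information from the test window leaking into training through overlapping label horizons or serial correlation.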

Legacy formula · now one variant in the zoo

The 8-D Equation

Full breakdown →

dP(S_t, t)/dt = -∇·[A(S, Θ, G_t, I_t)·P] + ½∇²:[D(S, Θ, G_t)·P] + J[P]

Fokker–Planck equation with jump process · v0.6.7-rc7

dP(S_t, t)/dt: The rate of change of the probability distribution over civilization states over time

P(S_t, t): The probability distribution over all possible macro-states S at time t

S_t: The macro-state vector. Its 8 dimensions describe a civilization at time t: population (n), wage share (w), elite fraction (e), debt ratio (d), urbanization (U), polarization (π), institutional trust (T), network connectivity (κ)

A(S, Θ, G_t, I_t): The drift vector, i.e. how the civilization tends to move, given parameters Θ, network topology G, and institutions I

D(S, Θ, G_t): The diffusion tensor, i.e. uncertainty and random fluctuations, how much noise affects each dimension

J[P]: The jump process, i.e. sudden discontinuous changes (crises, collapses, revolutions) governed by the Psi stress index

Θ: The full parameter set: micro behavioral constants + cultural variables

G_t: The network topology at time t, i.e. how ideas, fear, and influence propagate

I_t: The institutional vector, with 5 dimensions: regime type (R), veto players (V), bureaucratic capacity (B), propaganda effectiveness (P), external constraints (X)
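
To make the drift, diffusion, and jump terms defined above concrete, here is a toy one-dimensional sketch. It simulates sample paths of the stochastic process whose density obeys a Fokker-Planck equation with jumps, using an Euler-Maruyama step plus Poisson-timed jumps. Everything specific in it is an assumption for illustration: the linear mean-reverting drift, the constant diffusion, the Gaussian jump sizes, and all parameter values. It is not the 8-D model and says nothing about how the project's A, D, or J are actually specified.

```python
import numpy as np

def simulate_jump_diffusion(n_paths=2000, n_steps=500, dt=0.01,
                            theta=1.0, D=0.2, jump_rate=0.5,
                            jump_scale=1.0, seed=0):
    """Simulate dS = A(S) dt + sqrt(D) dW + dJ in one dimension.

    A(S) = -theta * S is a hypothetical mean-reverting drift.
    D is a constant diffusion coefficient (the 1/2 * d2/dS2 [D P] term).
    J is a compound Poisson process with Gaussian jump sizes.
    """
    rng = np.random.default_rng(seed)
    s = np.zeros(n_paths)
    for _ in range(n_steps):
        drift = -theta * s * dt
        noise = np.sqrt(D * dt) * rng.standard_normal(n_paths)
        # Number of jumps per path in this step; usually 0, occasionally 1.
        n_jumps = rng.poisson(jump_rate * dt, size=n_paths)
        # The sum of n Gaussian jumps is Gaussian with variance n * jump_scale**2.
        jumps = jump_scale * np.sqrt(n_jumps) * rng.standard_normal(n_paths)
        s = s + drift + noise + jumps
    return s

# The histogram of `final` approximates P(S, t) at t = n_steps * dt.
final = simulate_jump_diffusion()
sample_variance = final.var()
```

With these toy settings the jumps add variance on top of the pure-diffusion stationary value D / (2 * theta) = 0.1, which is one simple way to see the continuous and discontinuous terms contributing separately.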

18 parameters

92 open caveats

confidence 6.84/10


Research team

A Lean Team That Computes

The nine disciplines are being restructured into lean generators plus a new tier of agents that run code — fitting models, simulating, and backtesting. The Philosopher moves off the numeric gate to audit for self-deception.

All agents →

Micro-Foundation

Behavioral Neuroscientist

"What are the probability distributions governing individual choice?"

14 micro parameters defined in Session 2, including 4 critical: loss aversion lambda, temporal discount beta_td, conformity gamma_conf, and authority deference alpha_auth.

2 sessions · Last: May 9, 2026

Evolutionary Constants

Evolutionary Psychologist

"Which parameters are fixed (genetic) vs. variable (cultural)?"

9 of 13 evolutionary constants (Theta_fixed) defined in Session 6, establishing the HYBRID model: Theta_total = Theta_fixed_floor + Theta_variable(culture, t).

2 sessions · Last: May 18, 2026

Evolutionary Constants

Network Scientist

"How does network topology determine whether perturbations go local or global?"

Social networks are NOT strongly scale-free: Broido & Clauset 2019 (Nature Comm) found 0% of social networks reach 'strong' scale-free classification — reclassified to truncated power-law with gamma_sf ~ 2.3.

2 sessions · Last: May 14, 2026

Evolutionary Constants

Computational Sociologist

"Do our micro-rules actually generate realistic macro-behavior?"

Most important conceptual advance since Session 1: the four Turchin secular cycle phases are temporal quadrants of ONE limit cycle (Hopf bifurcation), not four separate attractor basins — validated by Wittmann & Kuehn 2024 (PLOS ONE, 5/5).

3 sessions · Last: April 27, 2026

Macro-Dynamics

Econophysicist

"Which economic patterns exhibit phase transitions and power laws?"

Inverse cubic law (alpha_tail ~ 3.0): 40M+ data points, replicated across multiple markets (Gopikrishnan 1999, Gabaix 2003, methodology 5/5). This VALIDATES the FP+jump split: alpha_tail > 2 means finite variance for continuous dynamics, while alpha_war = 1.53 < 2 means infinite variance for crises.

4 sessions · Last: May 19, 2026

Macro-Dynamics

Cliodynamicist

"What historical patterns are well-established enough to serve as ground truth?"

Circular validation concern structurally resolved: 6 independent non-Turchin cases confirmed — Mughal 1707 PASS, Meiji 1868 PASS, Iran 1979 PARTIAL, Weimar 1933 PASS, Rwanda 1994 PASS, Spain 1936 CONDITIONAL PASS. Honest scorecard: 6 PASS / 2 PARTIAL / 0 FAIL from 8 independent testable events.

4 sessions · Last: May 14, 2026

Macro-Dynamics

Political Scientist

"How do formal and informal institutions alter the formula's predictions?"

Institutional constraint variable fully defined: I_t is a 5-free-dimensional vector (R_t, V_t, B_t, P_t, X_t; L_t = 1 − X_t derived) with per-equation drift modulations A_1–A_8, empirically grounded via V-Dem, Polity V, WGI, and Jones & Olken's death-in-office natural experiment.

4 sessions · Last: April 23, 2026

Formalization

Statistical Physicist

"What formal system encodes layers 1–3 into a predictive theory?"

Session 1: Framework defined as Fokker-Planck equation with jump process. 10D state vector, 3 order parameters, Turchin PSI composite. Core mathematical lineage: Weidlich 1971, Toscani 2006, Scheffer 2009, Turchin 2020.

10 sessions · Last: June 10, 2026

Formalization

Bayesian Statistician

"What is the theoretical limit of predictability for a social system of N agents?"

Predictability bounds: R² < 0.50 hard ceiling for aggregate social prediction (Martin et al. 2016, Science). Lyapunov time 5–20 years for macro-social dynamics. Fat-tail constraint: alpha_war = 1.53 < 2 means standard confidence intervals do not exist for the jump process component.

6 sessions · Last: May 9, 2026

Cross-Cutting

Philosopher of Science

"Is this genuinely predictive, or are we fooling ourselves?"

Formula has 0.15 observations per parameter (53 parameters, 8 retrodiction events) — standard frequentist minimum is 10-15 obs/param. This is the primary overfitting risk.

11 sessions · Last: June 10, 2026

Research log

Latest Sessions

All 44 sessions →

Session 44 · June 10, 2026 · Approved with caveats

Session 44: Throughput Redesign — Panel, 10-Variant Zoo, PITF Falsified

Lead: Orchestrator + Zoo

·Ratchet engine replaced: UCB1 bandit + isotropic Gaussian swapped for per-variant scipy differential_evolution (champion seeded into initial population). Fixes the date-seed replay bug (same-day reruns replayed identical proposals) and the regression bug (Gaussian mutation could demote champions). Budget raised 40 → 400 evaluations; two same-day runs now explore distinct proposals with zero regression.
·F1 ablation FAILED a third consecutive time: Δres_train ran +0.0263 (2026-06-08, coverage 12/22) → +0.0132 (widened, stale champion) → −0.0044 (widened, after TRAIN-only re-tune). Philosopher verdict: the fixed-prior PITF hypothesis is FALSIFIED, not merely unproven — a real signal should have risen through +0.030 as the coverage confound was removed and factionalism came online; it fell monotonically through zero instead. Full entry in FALSIFICATION_REGISTER.md entry 7.
·Report-only hold-out Brier 0.1745 (best ever, below the 0.18 market line) explicitly DISREGARDED: not the locked criterion; hold-out resolution DROPPED 0.185 → 0.141; neg-ctrl Brier WORSENED 0.194 → 0.256 — the anti-signature of the claimed mechanism. Accepting it would have been a HARKing trap.
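
The champion-seeded search described in the first bullet can be sketched roughly as below. The sketch uses SciPy's `differential_evolution` with an explicit initial population whose first row is the current champion. The objective `train_loss`, the bounds, the champion vector, and the population size are hypothetical stand-ins, and the real ratchet engine may be wired differently.

```python
import numpy as np
from scipy.optimize import differential_evolution

def train_loss(params):
    """Hypothetical stand-in for one variant's TRAIN-set score (lower is better)."""
    target = np.array([0.3, -0.2, 0.7])
    return float(np.sum((np.asarray(params) - target) ** 2))

bounds = [(-1.0, 1.0)] * 3
champion = np.array([0.1, 0.0, 0.5])  # hypothetical current best parameters

# Fresh OS entropy rather than a date-derived seed, so two runs on the
# same day draw different initial proposals.
rng = np.random.default_rng()
pop_size = 20
lower = np.array([b[0] for b in bounds])
upper = np.array([b[1] for b in bounds])
init_pop = rng.uniform(lower, upper, size=(pop_size, len(bounds)))
init_pop[0] = champion  # seed the champion into the initial population

# Roughly 400 evaluations: 20 for the initial population plus at most
# 19 generations of 20 trials each. polish=False avoids extra gradient steps.
result = differential_evolution(
    train_loss, bounds, init=init_pop, maxiter=19, polish=False
)
champion_loss = train_loss(champion)
best_loss = result.fun
```

Differential evolution only replaces a population member when the trial vector scores at least as well, so the best score in the population cannot get worse than the seeded champion. That is the property the "zero regression" claim relies on.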

Session 43 · June 8, 2026 · Partial approval

Session 43: Feature-Wiring pitf_logit — F1 Severe Test FAILED

Lead: Orchestrator + Zoo

·Outcome-blind feature pipeline built: make_feature_request.py emits identity-only view (id, polity, t0, horizon_years, reference_class) of all 45 scoreable events — no outcome or label — so a blind Sonnet compute agent could fetch PITF covariates without ever seeing the hold-out.
·merge_features.py asserts every non-feature field (including outcomes) is byte-identical before and after merge; git diff touches only the features key. Coverage: 13/26 hold-out + 12/22 train events carry at least one real covariate.
·pitf_logit rewritten as a real PITF logit: hazard = base_regime_hazard × hazard_scale × exp(B_IMR×ln(IMR/world_median) + B_FAC×factionalism + B_NBR×neighbor_conflict). Betas are FIXED literature-seeded priors from Goldstone et al. 2010 (B_IMR=0.5, B_FAC=1.1, B_NBR=0.5) — not fit on any data.
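
The hazard formula in the last bullet translates directly into code. The sketch below uses the three fixed betas quoted in the text; the function name, argument names, and the example inputs are assumptions for illustration, and how `base_regime_hazard` and `hazard_scale` are obtained is not shown here.

```python
import math

# Fixed literature-seeded priors quoted in the text (Goldstone et al. 2010).
B_IMR = 0.5
B_FAC = 1.1
B_NBR = 0.5

def pitf_hazard(base_regime_hazard, imr, world_median_imr,
                factionalism, neighbor_conflict, hazard_scale=1.0):
    """hazard = base x scale x exp(B_IMR*ln(IMR/median) + B_FAC*fac + B_NBR*nbr)."""
    linear = (
        B_IMR * math.log(imr / world_median_imr)
        + B_FAC * factionalism
        + B_NBR * neighbor_conflict
    )
    return base_regime_hazard * hazard_scale * math.exp(linear)

# Hypothetical inputs: infant mortality at the world median and no risk
# factors leaves the base hazard unchanged.
neutral = pitf_hazard(0.02, imr=30.0, world_median_imr=30.0,
                      factionalism=0, neighbor_conflict=0)
# Turning factionalism on multiplies the hazard by exp(1.1), about 3.0.
factional = pitf_hazard(0.02, imr=30.0, world_median_imr=30.0,
                        factionalism=1, neighbor_conflict=0)
```

Because the betas are constants rather than fitted values, the only free quantities in this sketch are the base hazard and the scale, which matches the "not fit on any data" claim above.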

Session 42 · May 19, 2026 · Approved with caveats

Session 42: F1 Pre-Registration HASH-LOCKED — β_fiscal=2.0 Anchored

Lead: Econophysicist

·F1 pre-registration HASH-LOCKED at SHA-256 c7584a779af9c0f3fe36d1bf8770b0be7f38e54b6764a472793fcfa530feb7b1 (5407-byte canonical JSON). C40-B HIGH Track A DISCHARGED-VIA-HASH-COMMIT. β_fiscal + d_threshold + η_pareto retirement + λ_temp_provisional retirement all PRE-REGISTERED CONDITIONAL on S43 F1 PASS + Critique 42+ ratification + C42-C HIGH discharge.
·β_fiscal = 2.0 [1.0, 4.0] derived from Schularick-Taylor 2012 AER (NBER WP15512) PRIMARY anchor. Central value from Table 3 spec 5 logit-to-sigmoid rescaling (β_logit = sum of 5 lag coefficients = 9.697 ± 2.920, AUROC=0.717, N=1272 country-years). Independently WebFetch-verified at Critique 42. Rescaling derivation owed as C42-E LOW appendix.
·d_threshold = 0.10 [0.05, 0.20] from Bernholz 2003 PROVISIONAL — Cliodynamicist primary-text discharge owed (29-vs-30 episode count; 10%-vs-30% threshold reconciliation). Constitutes C42-C HIGH gating block on v0.7-CANDIDATE promotion.
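
A hash lock of the kind described in the first bullet can be sketched with the standard library. The canonicalization used here (sorted keys, compact separators, UTF-8) is an assumption, as are the function names and the payload layout, so this sketch will not reproduce the SHA-256 digest quoted above. The parameter values in the example payload are the ones stated in the bullets.

```python
import hashlib
import json

def canonical_json_bytes(obj):
    """Serialize with sorted keys and no extra whitespace so equal content gives equal bytes."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def lock_hash(obj):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_json_bytes(obj)).hexdigest()

def verify_lock(obj, expected_hex):
    """True only if the content still matches the committed digest."""
    return lock_hash(obj) == expected_hex

# Hypothetical pre-registration payload using values stated in the text.
prereg = {
    "beta_fiscal": {"value": 2.0, "range": [1.0, 4.0]},
    "d_threshold": {"value": 0.10, "range": [0.05, 0.20]},
}
committed = lock_hash(prereg)

# Any later edit to a pre-registered value changes the digest.
tampered = {**prereg, "beta_fiscal": {"value": 2.5, "range": [1.0, 4.0]}}
still_locked = verify_lock(prereg, committed)
tamper_detected = not verify_lock(tampered, committed)
```

Publishing the digest before the test runs is what makes the commitment checkable: anyone holding the JSON can recompute the hash and confirm nothing was changed after the fact.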


Secondary signal

Live Market Predictions

Validation now runs primarily on a locked hold-out of historical events scored by the frozen oracle. Live market predictions are a secondary signal — published pre-resolution and labeled by reflexivity class. Polymarket is one benchmark among several.

1/1 predictions beat market

0.0400 average Brier score

live predictions

Full scoreboard →
