machine-checked mathematics · lean 4 · june 2026

The Million-Dollar Window

What happened when a swarm of AI agents attacked one of Ethereum’s hardest open math problems, with a proof checker as referee.

A campaign report on the mutual correlated agreement threshold $δ^{*}$ for smooth Reed–Solomon codes · the ArkLib $δ^{*}$ campaign · an LLM agent fleet writing Lean 4, verified by the Lean kernel

lalalune/ArkLib · campaign log: issues #232 → #334 → #357 → #371 → #389 → #407

Download the paper (PDF)the full machine-checked write-up

Open problem · live · embarrassingly parallel

Mine a brick yourself

Point your own coding agent at this open $1,000,000 problem. One self-updating skill: it reads the live frontier, claims a small uncontested target, and mines a single brick — a result the Lean 4 kernel or exact integer arithmetic actually checks, not something a model merely asserts — then opens a pull request. As far as we know, no one has mined a million-dollar conjecture with a coding agent before.

The model proposes; the kernel disposes. A verified refutation counts as much as a proof, and the prize is still open. You are not expected to solve it. Add one honest brick.

Fastest — paste into Claude Code or Codex

No install. Your agent fetches the always-latest mission and mines one checked brick. Works on any Claude plan and on Codex.

Or install it as a reusable command

Claude Code

Install the skill, once:

Restart Claude Code so it loads the skill, then run /proximity-prize — or just say “mine the proximity prize.”

Codex

Grab the mission brief:

Run codex in that directory and tell it “follow AGENTS.md — mine one brick.”

The default brick needs no Lean toolchain — an exact-arithmetic probe that tests a conjecture in the prize regime and tries to break it. The charter is fixed: no sorry, no new axioms, no “basically zero” floats, no “likely holds.” Refute before you believe; claim exactly what the kernel checked. Full instructions and the honesty charter live in lalalune/ArkLib/mine; bricks land as PRs on lalalune/ArkLib with a note on the live thread, issue #407.

Abstract

Most of mathematics does not need a genius. It needs search, and search is a function of compute. This campaign is an existence proof: a fleet of LLM agents, grinding in the open ArkLib repository with the Lean 4 kernel as the sole arbiter of truth, formalized the entire known frontier of a problem that has resisted humans for twenty-five years, produced the first exact MCA thresholds ever computed for any code, and refuted twenty-eight attack hypotheses, each refutation itself a machine-checked theorem. No reviewer trust, no grant cycles, no decades. Hypothesis, probe, proof, kernel. At scale.

The target is chosen deliberately. There is no better problem in applied mathematics today than fully proving the zkEVM. The threshold $δ^{*} (C, ε^{*})$ studied here, subject of the Ethereum Foundation’s $1M Proximity Prize, governs the soundness of the FRI- and WHIR-style proximity tests at the heart of every modern SNARK. Today these systems run on conservative provable bounds; pinning $δ^{*}$ at the conjectured edge would cut verifier queries by an estimated $8 \times$ , compounding through every proof Ethereum verifies. Cheaper proofs, faster finality, a lighter chain. The result general enough to be coding theory, useful enough to be infrastructure.

We report the campaign’s results: the first exact thresholds ( $δ^{*} (RS [F_{5}, F_{5}^{\times}, 2], 2/5) = 1/4$ and a maximal pin at deployed rate); a universal staircase law proven at the literal prize budget $ε^{*} = 2^{- 128}$ ; unconditional beyond-Johnson pins on a dimension ladder; and the Welch–Berlekamp pencil programme reducing the below-unique-decoding regime to a single named hypothesis. Every theorem is axiom-clean, zero sorry. The prize window at production rate remains open, and we state its four faces precisely, because the method only works if the ledger is honest.

The latest pass sharpened the whole effort to a single point. Every face of the open core — list decoding, character sums, bad-side families, line–ball incidence — now reduces, by machine-checked in-tree reductions, to one explicit inequality: a square-root-cancellation bound for incomplete character sums over a thin $2$ -power subgroup of $F_{q}^{\times}$ . That inequality is the classical Birch–Bombieri–Gauss–Katz / Paley-graph wall, open for roughly twenty-five years. The agents did not break it. They proved which inequality the million dollars is, named it to the exponent, and machine-checked the entire reduction onto it. They also proved, axiom-clean, which routes cannot reach it: every moment / energy / spectral method provably overshoots, and thinness is a necessary condition on any solution. The window remains open; it is now open at a single, precise, classical sentence.

The implication scales beyond this problem. Verification was the bottleneck; the kernel removed it. A swarm of a thousand agents under one motivated researcher is a research institution that never sleeps. Given enough compute, Vitalik and a swarm could plausibly finish Ethereum’s remaining roadmap in months, not decades. The unknown sciences are not waiting for permission. They are waiting for FLOPs.

The prize

$1,000,000Ethereum Foundation Proximity Prize

In 2026 the Ethereum Foundation posted a $1M prize for resolving the proximity-gaps conjectures for Reed–Solomon codes, with the mutual correlated agreement variant studied here as the strengthened target. It sits alongside the $1M Poseidon Prize in the Foundation’s program of buying down the open mathematical risk under its zk and post-quantum roadmap (verified-zkevm.org).

The stakes are concrete. Every FRI- and WHIR-based SNARK in production rides on these soundness bounds, and today they all run on the conservative provable ones. Pinning $δ^{*}$ at the conjectured edge would cut verifier queries by an estimated $8 \times$ , compounding through every proof Ethereum verifies: cheaper proofs, faster finality, a lighter chain. A disproof at deployed parameters would force protocol redesigns. Either answer is worth the money.

Status: open. The campaign reported on this page localized the entire prize onto one explicit classical inequality and machine-checked the reduction (localization ≈ 95%); the prize itself remains hard (≈ 10%), because that inequality is a recognized 25-year-open wall. It named the window; it did not close it.

1The problem

For twenty-five years, some of the best coding theorists alive have pushed against the same wall. Below a certain radius, everything is proven; above it, everything is refuted; in between lies a window where nobody has ever computed a single exact value. The prize sits inside the window.

Fix a Reed–Solomon code $C = RS [F_{q}, L, k]$ whose evaluation domain $L$ is a smooth multiplicative subgroup of size $n = 2^{μ}$ , at rate $ρ = k / n \in {1/2, 1/4, 1/8, 1/16}$ , with $∣ F ∣ < 2^{256}$ and security budget $ε^{*} = 2^{- 128}$ . These are the codes that FRI and WHIR actually deploy.

The mutual correlated agreement error $ε_{mca} (C, δ)$ ^[ABF26] is a supremum over stacks of words $(u_{0}, u_{1})$ of the fraction of bad points on the line $u_{0} + γ u_{1}$ : a scalar $γ$ is bad when the combined word agrees with a codeword on a large witness set that admits no joint explanation of both rows. It sits at the top of the error hierarchy $ε_{pg} \leq ε_{ca} \leq ε_{mca}$ , and it is the quantity that WHIR/FRI-style soundness proofs actually consume. The threshold is

δ^{*} (C, ε^{*}) = sup {δ : ε_{mca} (C, δ) \leq ε^{*}} .

Pinning $δ^{*}$ means producing matching brackets: a machine-checked proof that $ε_{mca} (C, δ^{*}) \leq ε^{*}$ and a machine-checked proof that every larger radius fails. In ArkLib the definition lives in Errors.lean and the bracket engine in MCAThresholdLedger.lean; the good-radius set is proven downward closed, so the supremum is genuine.

Figure 1. The state of knowledge for smooth Reed–Solomon codes (drawn at rate ρ = 1/4). Below the Johnson radius, mutual correlated agreement is proven. Above capacity, it provably fails. The hatched window is the prize question — now machine-checked to reduce, in its entirety, to a single character-sum inequality (CORE, §5.1).

1.1The window, and why it is twenty-five years hard

Both edges of our knowledge are held by formalized literature. From below, full mutual correlated agreement holds up to the Johnson radius $1 - ρ$ ^[BCIKS20]^[Hab25]^[BCHKS25]. From above, the KKH26 family of explicit bad lines forces $δ^{*} \leq 1 - ρ - Θ_{ρ} (1/ lo g n)$ ^[KKH26], and the once-hoped “up to capacity” conjectures are simply false: three independent groups refuted them in late 2025^[CS25]^[DG25], and the refutations are formalized in-tree. So

δ^{*} \in (1 - ρ, 1 - ρ - Θ_{ρ} (1/ lo g n)),

and the entire problem is to pin it inside that open interval. The difficulty has an honest measure: the barrier results of BCHKS25 and CS25 couple any upper-bound progress past Johnson to beyond-Johnson list decoding of explicit RS codes, a problem that has stood open since Guruswami–Sudan^[GS99]. Capacity-achieving list decoding is known for random punctured RS^[GZ23] and for explicit folded RS^[CZ25], but every such proof consumes domain randomness, folding side-information, or subspace-design structure that the plain smooth domain, a single fixed orbit with zero entropy, does not have.

One structural observation makes the problem finite in an exact sense: $ε_{mca}$ sees the radius only through the agreement floor $⌈(1 - δ) n ⌉$ , so it is a step function of $δ$ and $δ^{*} (ε^{*})$ is its generalized inverse. The window question is literally: where are the jumps of the staircase between Johnson and capacity, and what are the step heights?

1.2What was known before the campaign

No exact value of $δ^{*}$ , or of $ε_{mca}$ at any radius, had ever been computed for any code. The literature provided the two window edges, the at-capacity refutations, and the coupling barrier, all asymptotic statements. The campaign began by formalizing all of it: seventeen papers wired into the tree as named theorems and named hypotheses, so that every later claim would compose against the genuine published frontier rather than a paraphrase of it.

Theorem (capacity edge, unconditional)

For the explicit smooth evaluation codes,

δ^{*} \leq 1 - r / 2^{μ}

: the near-capacity strip is excluded by an explicit family of bad lines.

machine-checkedKKH26DeltaStarReduction.lean · kkh26_mcaDeltaStar_leaxioms: propext, Classical.choice, Quot.sound

2Method: an agent fleet under a kernel-enforced honesty contract

Then the swarm arrived. Not one prover but a fleet of them, working in public, around the clock, each claim submitted to a referee that cannot be argued with, flattered, or fooled. The agents are allowed to be wrong as often as they like; the kernel is never wrong about whether they are.

The campaign is a sequence of GitHub issues (#232 → #334 → #357 → #371 → #389 → #407) on the open ArkLib repository, each fully distilled into the tree before its successor opens. The workers are LLM agents; the referee is the Lean 4 kernel. The contract has three clauses.

Figure 3. The campaign, abstractly: hypotheses swarm the wall; refuted ones are kept as machine-checked constraint lemmas; what passes the kernel lands as axiom-clean theorems on the other side.

2.1The honesty contract

Every positive claim is an axiom-clean Lean theorem. “Axiom-clean” means #print axioms reports exactly propext, Classical.choice, Quot.sound, the three axioms of ordinary classical mathematics in Lean, and the proof contains zero sorry. There is no appeal to authority, reputation, or plausibility: a claim either type-checks or it does not exist.

Every refutation is also a theorem. When an attack hypothesis dies, it is reduced to a sorry-free constraint lemma and recorded in a standing disproof log, so the fleet cannot re-propose a dead idea and future readers inherit the precise reason it failed, as mathematics rather than folklore.

Open problems stay named. Anything unproven that a theorem depends on must be a named Prop in the tree, never an implicit assumption. Conditional results are stated as conditional; the residual surface is enumerable by grep. Fabricating a closure of the open core is structurally impossible: the kernel would reject it.

2.2Hypothesis-slate discipline

Work proceeds in rounds. Each round opens with a grounding essay and a slate of ranked hypotheses, typically nine, spanning reasonable, novel, and synthetic attacks. Each hypothesis runs the same gauntlet: state the constraints; ask why nobody has done this; check the literature for the larp; design a falsification probe; only then formalize. Survivors get red-teamed by separate agents whose explicit job is to kill them. The campaign disposed of twenty-eight hypotheses this way (§4); the survivors are the results of §3.

2.3Probe-then-formalize

Before any Lean is written, hypotheses face exact arithmetic. Dozens of pre-registered probes (scripts/probes/) compute $ε_{mca}$ exactly at toy scale via a syndrome reduction that collapses the stack space to $q^{2 (n - k)}$ syndrome pairs, itself later proven correct in Lean. Probes are cross-validated at three or more primes plus a characteristic-zero anchor; sampling artifacts are documented kills. The discipline runs in both directions: probes found the countermodels that killed three successive extremality conjectures, and probes confirmed predictions (a pencil bound predicting $\leq 8$ bad scalars where exhaustive computation found exactly 7) before formalization was attempted.

The pipeline scales to proofs no human would write: one converse branch of the census classification is a 1,571-line Lean proof machine-generated from the exact integer certificates of a 10,395-case probe sweep, then checked by the kernel like any other theorem. Certificate-to-Lean generation is an industrial method here, not a stunt.

2.4Why this is the interesting part

Nothing in §2.1–2.3 is specific to coding theory. The contract turns a famous open problem into a target for machine-scale search in which honesty is not a property of the searcher. LLM agents confabulate; fleets of them confabulate at scale. The Lean kernel does not care. Every theorem in §3 should be read with that in mind: not one of them depends on trusting the agents, the authors, or this page. The repository is public; the axiom census is reproducible by one command.

3Results

Things started to fall. In a quarter century the literature had never produced one exact value of the quantity the prize asks about; within weeks the fleet had several, then an infinite family, then a law governing all of them. Each one survives because a proof checker, not a person, says so.

All results below are axiom-clean Lean theorems on main. We organize them as: the exact pins (§3.1), the universal staircase law (§3.2), the dimension ladder (§3.3), the Welch–Berlekamp pencil programme (§3.4), the Möbius symmetry and the ownership unification (§3.5), the production-regime bracket (§3.6), and the new conditional frontier bound paired with a structural cap on the elementary tower (§3.7).

3.1Exact thresholds: the first pins for any code

Before this campaign, no exact value of $δ^{*}$ existed for any code. The campaign produced a family of them, each as a pair of bracket theorems that meet.

Code	Budget band	Exact value	Significance	Lean file
$RS [F_{5}, F_{5}^{\times}, 2]$	$ε^{*} = 2/5$	$δ^{*} = 1/4$	first exact MCA threshold for any code	`DeltaStarExactPinF5`
$RS [F_{5}, ⟨ 2 ⟩, 2]$	$every δ, ε^{*}$	$full ε_{mca} profile$	first complete error profile of any code	`MCAExactProfile`
$RS [F, D, k], k \leq n - 2$	$ε^{*} \in [1/ q, 2/ q)$	$δ^{*} = 1/ n$	first exact threshold for an infinite family	`MCADeltaStarHighRateFamily`
$RS [F_{17}, ⟨ 2 ⟩, 4], ρ = 1/2$	$ε^{*} \in [2/17, 7/17)$	$δ^{*} = 1/4$	maximal pin at a deployed rate	`DeltaStarSecondPinF17Maximal`
$n = 16, ρ = 1/4$	$ε^{*} \in [j /17, (j + 1) /17), j \leq 5$	$δ^{*} = j /16$	first machine-checked threshold curve	`VVectorN16`
$any smooth RS, 3 (j - 1) + k \leq n$	$ε^{*} \in [j / q, (j + 1) / q)$	$δ^{*} = j / n$	the granularity-ladder closed form	`GranularityLadderRS`
$any smooth RS, e ≲ (n - k) /3$	$ε^{*} = 2^{- 128} (literal)$	$δ^{*} = e / n$	exact at the prize budget, every field up to the band cap	`StaircaseBandTheorem`
$evalCode [2^{μ}, 1], μ \geq 3$	$explicit band$	$δ^{*} = 1 - 3/ 2^{μ}$	beyond Johnson: dimension-ladder rung 2	`KKH26DimTwoPin`

Table 1. Exact machine-checked thresholds. Every row is a pair of matching bracket theorems, axiom census propext, Classical.choice, Quot.sound, zero sorry.

The maximal second pin deserves its statement. The exact explosion value $ε_{mca} (C_{8, 4}, 1/4) = 7/17$ was first computed exhaustively by probe, then bounded in Lean by the far-coset law; the resulting window is formally maximal.

Theorem (maximal pin at rate 1/2)

For

C = RS [F_{17}, ⟨ 2 ⟩, 4]

(smooth,

n = 8

, rate 1/2):

δ^{*} (C, ε^{*}) = 1/4

for every

ε^{*} \in [2/17, 7/17)

, and the band is maximal:

ε_{mca} (C, 1/4) = 7/17

exactly.

machine-checkedDeltaStarSecondPinF17Maximal.lean · mcaDeltaStar_C84_eq_quarter_maximalaxioms: propext, Classical.choice, Quot.sound

3.2The universal staircase: a three-regime law

Figure 2. The granularity staircase: on each band ε* ∈ [j/q, (j+1)/q), the threshold is exactly δ* = j/n (closed on the left, open on the right) — proven for every smooth RS instance in the ladder regime, including at the literal prize budget ε* = 2⁻¹²⁸.

The exact pins are instances of a structure. Write $b$ for the band of radii with $δ n \in [b - 1, b)$ and $d = n - k + 1$ for the distance. The campaign proved that $ε_{mca} \cdot q$ , as a function of distance within each band, obeys a single three-regime law, and that the regime thresholds $3 b - 2, 3 b - 3, 2 b - 1$ are exactly support-union thresholds: where triples (respectively pairs) of error supports can or cannot carry a codeword.

Distance regime	$ε_{mca} \cdot q$	Sides	Lean artifacts
$d \geq 3 b - 2$ (deep)	$= b$	both	`UniversalStaircaseCollapse`, `UniversalSpikeFloor`
$d = 3 b - 3$ (top strip row)	$= n / (b - 1)$ for $(b - 1) ∣ n$	both	`StripSupExactness`, `MonomialStripExplosion`
$d = 3 b - 4$	$\geq n / (b - 1)$ , $\leq n$	bracket	`BoundarySupExactness`
$2 b \leq d \leq 3 b - 5$ (lower strip)	$\geq n / (b - 1)$	lower; sup open	`MonomialStripExplosion`
$b + 1 \leq d \leq 2 b - 1$ (boundary), $b ∣ n$	$\geq n$	lower	`CosetCliqueBoundary`
$d = 5, b = 3, 3 ∣ n$	$= n$	both	`BoundarySupExactness`
$d = 5, b = 3, 3 ∤ n$	$\leq n - 1$ ; $= n - 1$ at $n = 8$	both at $n = 8$	`BoundaryDefectBound`

Table 2. The three-regime staircase law, per band $b$ . The production domain $n = 2^{μ}$ has $3 ∤ n$ and $b ∣ n$ at every 2-power band: boundary rows recur at every halving of the distance budget, and strip explosions fire at every band.

The deep regime gives the closed form on the granularity bands, and, crucially, the law survives contact with the literal prize budget:

Theorem (staircase at the literal budget)

For

1 \leq e

3 (e - 1) + k \leq n

, and

e \cdot 2^{128} \leq q < (e + 1) \cdot 2^{128}

δ^{*} (RS, 2^{- 128}) = e / n

exactly. At production shape (

n = 2^{25}

k = 2^{24}

) this covers every rung

1 \leq e \leq 5, 592, 406

, i.e. every field size up to

\approx 2^{150.4}

machine-checkedStaircaseBandTheorem.lean · mcaDeltaStar_staircase_bandaxioms: propext, Classical.choice, Quot.sound

The law’s reach honestly caps at $q ≲ ((n - k) /3) \cdot 2^{128}$ . The production-core parameterization $q \geq n^{2} \cdot 2^{128}$ , where the staircase climbs through the open window, is untouched by it; that is §5.

3.3The dimension ladder: unconditional pins beyond Johnson

At fixed code dimension, an ownership-counting device pins $δ^{*}$ exactly, strictly inside the open window, beyond the Johnson radius, with no conditional hypothesis. The rung-1 pin (KKH26DimOnePin) was the first unconditional in-window exact value; rung 2 climbs:

Theorem (dimension ladder, rung 2)

For the dimension-two code on the smooth domain (affine words, rate

2/ 2^{μ}

), every

μ \geq 3

and every

ε^{*}

in an explicit nonempty band:

δ^{*} = 1 - 3/ 2^{μ}

exactly. Concretely:

δ^{*} = 5/8

at the NTT prime

F_{12289}

, rate 1/4, strictly between Johnson (

1/2

) and capacity (

3/4

machine-checkedKKH26DimTwoPin.lean · kkh26_dimTwo_deltaStar_pinaxioms: propext, Classical.choice, Quot.sound

The device behind both rungs is the same: a non-degenerate $(k + 1)$ -tuple inside a bad witness determines its scalar, distinct bad scalars own disjoint tuple sets, and pigeonhole bounds the bad count. The recorded ladder law extends the family to every fixed dimension $r ≲ n$ and provably stalls there, a calibration family every future candidate $δ^{*}$ law must reproduce.

3.4The Welch–Berlekamp pencil programme

The campaign’s most structural arc reformulates badness as linear algebra. A scalar $γ$ is bad at slack $w$ only if the Welch–Berlekamp system, linear in a locator/numerator pair $(ℓ, R)$ , is solvable; along the line the coefficient matrix is a linear pencil $M_{0} + γ M_{1}$ , and determinant root-counting takes over. No decoding theory appears anywhere in the chain.

Theorem (WB-1, the pencil bound)

Below the unique-decoding radius, if the direction

u_{1}

is not itself WB-solvable at slack

w

, then at most

w + 2

scalars

γ

make the line

u_{0} + γ u_{1}

WB-solvable, for every offset

u_{0}

machine-checkedWBPencilBound.lean · wbSolvable_line_card_leaxioms: propext, Classical.choice, Quot.sound

Theorem (WB-2, the rational-pair reduction)

At every radius below unique decoding,

ε_{mca} \leq max ((w + 3) / q, sup_{doubly rational stacks})

: the MCA adversary provably lives in the thin, fully parameterized family of stacks whose rows are rational functions

R / ℓ

of bounded degree on the domain.

machine-checkedWBPencilRationalReduction.lean · epsMCA_le_max_doublyRationalaxioms: propext, Classical.choice, Quot.sound

Two further theorems close the easy branches outright: genuinely rational pairs below the ladder reach admit zero bad scalars (WBPencilLadderZero), and codeword rows admit at most one, at every radius and for every linear code (WBPencilPolynomialRow). The capstone packages the below-UDR law behind a single named hypothesis:

Theorem (the below-UDR law)

Modulo exactly one named Prop, WindowRationalBounded (doubly WB-solvable stacks have at most

w + 3

bad scalars; proven below the ladder reach, probe-supported in the window at two scales):

ε_{mca} (RS, δ) \leq (w + 3) / q

at every radius below unique decoding, and at

ε^{*} = 2^{- 128}

the unconditional production floor moves from

\approx (1 - ρ) /3

to the unique-decoding radius

(1 - ρ) /2

machine-checked, conditionalWBPencilBelowUDR.lean · epsMCA_le_below_udraxioms: propext, Classical.choice, Quot.sound · modulo WindowRationalBounded

The adversary the residual Prop must tame was found, not conjectured: adversarial probing located rational pairs attaining $w + 1$ bad scalars in the window, and the extremal stacks are invariant under the Möbius involution $x \mapsto - 1/ x$ , replicated at two scales, with the invariant family dominating general pairs three to one.

3.5Möbius equivariance and the ownership unification

That observed symmetry was then proven to be structural. The smooth domain is inversion-stable, and a weighted twist repairs the non-polynomial inversion:

Theorem (Möbius inversion equivariance)

On an inversion-stable domain avoiding 0, the twisted action

(T u) (i) = dom (i)^{k - 1} u (σ i)

with

σ : x \mapsto - 1/ x

preserves the MCA event at every scalar. Together with rotation equivariance, the full Möbius group

PGL_{2}

acts on smooth-domain MCA; for odd

k

the twist is a genuine involution, and T-fixed codewords are exactly the reversal-palindromes.

machine-checkedMCAMobiusInversion.lean · mcaEvent_rs_inversionaxioms: propext, Classical.choice, Quot.sound

The two lanes of the campaign, the WB pencil (below UDR) and the dimension ladder (deep interior), then converged on one theorem family that contains both as instances:

Theorem (the ownership bound)

Radius-free: a

(k + 1)

-tuple with nonvanishing direction residual inside a bad witness determines its scalar; ownership is disjoint across scalars; hence

# bad \cdot M \leq # tuples

whenever every bad scalar owns at least

M

tuples.

machine-checkedOwnershipBound.lean · badScalars_card_mul_le_ownershipaxioms: propext, Classical.choice, Quot.sound

Instantiating the ownership engine closed the entire $k = 1$ window unconditionally, by direction class:

Polynomial directions: zero bad scalars, every radius (WBPencilPolynomialRow).
Genuine rational directions: the multiplicity theorem $# bad \cdot ((n - w) (n - w - μ)) \leq n^{2}$ , the first unconditional window-valid bound (OwnershipMultiplicity); it explains the measured window cap rather than merely matching it.
Sparse directions (support $\leq e$ ): $# bad \cdot (n - w - e) \leq n e$ by a popularity argument (SparseDirectionWindow). This class is where the window difficulty provably concentrates.

The general- $k$ assembly is in progress along a fully specified route; its packing piece, $# popular \cdot (m + 1 - k)^{k} \leq n^{k}$ , is proven (PopularCodewords).

3.6The production-regime bracket

What does all of this say about the prize parameters themselves? The honest assembled state:

Theorem (unconditional production floor)

ε^{*} = 2^{- 128}

, for every smooth RS instance at production shape:

δ^{*} \geq (⌊(n - k) /3 ⌋ + 1) / n \approx (1 - ρ) /3

, unconditionally.

machine-checkedMCADeltaStarProductionFloor.lean · mcaDeltaStar_rs_ge_at_secparaxioms: propext, Classical.choice, Quot.sound

Above the floor, the Johnson lane (the BCIKS20 §5 discharge chain, driven through two machine-checked course corrections of the paper’s own recursion, §4) is reduced to a single named residual, CellPackageSupply, with the numeric budget proven: at $ε^{*} = 2^{- 128}$ , $n \leq 2^{30}$ , every $q \geq 2^{192}$ puts the whole Johnson range below $δ^{*}$ modulo that one object (ProductionJohnsonBudget). And on the bad side, every lower-bound family ever landed, spike, sunflower, pencil, triangle, widened pin, is $O (n) / q$ : provably silent at the prize budget. Certifying any bad radius below 1 at production scale needs a single stack with more than $q \cdot 2^{- 128} \approx 2^{64 +}$ bad scalars; no known construction comes within polynomial range.

3.7The frontier bound, and a proof of why the elementary road is blocked

The campaign’s newest result is a matched pair, and we are proudest of the pairing. The dyadic structure of $μ_{n}$ ( $n = 2^{μ}$ ) gives a tower $μ_{2} \subset μ_{4} \subset \dots \subset μ_{n}$ and a period-doubling identity. Define the per-level variance $V_{μ} = max_{b \neq = 0} ∥ η_{b}^{(μ)} ∥^{2}$ and the per-level deficit ratio $ratio_{μ}$ at the worst level- $μ$ frequency. The parallelogram gives $V_{μ} \leq 4 ratio_{μ} V_{μ - 1}$ .

Theorem (C1 conditional frontier bound)

If the dyadic variance obeys

V_{μ} \leq (4 c) V_{μ - 1}

for all

μ

and some

c \geq 0

, then

V_{μ} \leq (4 c)^{μ} V_{0}

and the sup-norm

M_{μ} = V_{μ}

obeys

M_{μ} \leq (4 c)^{μ /2} V_{0} = n^{l o g_{2} (4 c) /2} V_{0}

. The deficit ratio is measured to concentrate at

c \approx 0.7

, giving

M \leq n^{0.74}

— which beats the analytic state of the art (di Benedetto

n^{0.989}

) by

\approx 0.25

in the exponent.

machine-checked, conditionalC1ConditionalBound.lean · supNorm_le_of_deficitaxioms: propext, Classical.choice, Quot.sound · modulo DeficitBrick

Honesty. This is a conditional frontier bound, not an unconditional improvement. The single open input is the deficit brick: $ratio_{μ} \leq c < 1$ uniformly (the worst level- $μ$ frequency does not make both sub-periods maximal). We do not claim unconditional SOTA.

The companion result is the reason the pairing matters: we proved, machine-checked, that the elementary tower that produced $n^{0.74}$ cannot reach $n$ at all.

Theorem (structural cap: tower recursion is global-blind)

Any level-by-level tower recursion on the dyadic variance is global-blind: it is provably capped at

\sim n^{0.74}

and cannot reach

n

. The bottom levels carry no cancellation —

V_{k} = 4^{k} = n^{2}

for the small subgroups (

k \leq 2

, measured exactly at

n = 64

) — so any recursion that compounds level-by-level inherits the uncancelled bottom. The

-cancellation of

μ_{n}

is global (it emerges from the whole subgroup at once, BGK-style), not recursive.

machine-checkedC1ConditionalBound.lean · variance_geom_boundaxioms: propext, Classical.choice, Quot.sound

Both the two-level skip (C7) and the coupled period/twist (C9) refinements were refuted by the same mechanism: the twist is also uncancelled at the bottom ( $W_{k} = 4^{k}$ ), so tracking it does not escape the cap. This is the campaign’s cleanest negative result: a new bound, paired with a proof of exactly why the road that produced it stops short. The residual difficulty is named to the exponent — it lives in the irreducibly global cancellation, i.e. in BGK itself.

4Refutations as results

This is the battlefield after the battle. Twenty-eight ideas walked into the kernel and did not walk out, and every corpse is labeled with a theorem explaining exactly how it died. In most fields failed ideas vanish into folklore; here they are load-bearing.

Twenty-eight attack hypotheses were disposed of during the campaign. Under the honesty contract each disposal is a theorem: the dead idea is reduced to a sorry-free constraint lemma and recorded in the standing disproof log (DISPROOF_LOG.md), so the fleet cannot re-propose it and the boundary of the possible is itself machine-checked. We present a selection not as failures but as the negative half of the map: each one closes a road that a reasonable mathematician, or a reasonable language model, would otherwise walk down.

4.1Structural kills

Refutation (halving renormalization)

The hoped-for renormalization map

(δ, ε) \mapsto (\approx δ /2, ε /2)

exits the window in one step from anywhere below capacity; its unique fixpoint is 0. No iterate-and-conquer scheme of this shape can pin

δ^{*}

machine-checked refutationHalvingWindowExit.lean · halving_exits_windowaxioms: propext, Classical.choice, Quot.sound

Refutation (the MDS rank conjecture, false for RS too)

A perfect-square pencil identity supplies

n / (b - 1)

bad scalars on the whole strip

[2 b - 1, 3 b - 3]

, killing the conjecture that the deep staircase value extends to half the distance. Consequence:

3 b - 2

is the law, for general linear codes and for RS alike, and the exact RS staircase ends near

(1 - ρ) /3

, not

(1 - ρ) /2

. Discovered by probe, then formalized with five certified bad scalars.

machine-checked refutationMCAMDSStaircaseRefuted.lean · mdsStaircaseConjecture_refutedaxioms: propext, Classical.choice, Quot.sound

Refutation (MCA below Johnson is not a function of (n, d, q))

A doubled-column countermodel separates MDS from general linear codes at an MCA quantity, the first such separation, so no staircase law for general codes can be parameterized by distance alone.

machine-checked refutationMCAHalfDistanceGeneralRefuted.leanaxioms: propext, Classical.choice, Quot.sound

4.2The extremality-surface lineage

The census programme’s conjectured extremal surface was killed and rebuilt three times, each kill a formal countermodel found by probe: census v1 died on empty rungs contradicting the $1/ q$ floor; v2 died on a take-over stack carrying $n$ field-independent bad scalars at an empty adjacent rung (TakeoverCountermodel); v3, monomial domination, died twice (MonomialDominationKilled). The surviving v4 hybrid surface is consistent with every theorem and probe in the tree, and its conditional pin non-vacuously recovers the $F_{5}$ exact value. This is what red-teaming a conjecture looks like when the red team must publish countermodels the kernel accepts.

4.3Machine-checked corrections to the literature

Two of the campaign’s most consequential refutations are corrections to the paper chain underlying the Johnson lane itself (BCIKS20, Appendix A). Finding 13 showed the rebased weight budget is unsatisfiable at genuine cells, exposing a hidden anchor assumption in the paper’s base case. Finding 14 produced an explicit countermodel, at a concrete instance over $ZMod 5$ , showing the in-tree transcription of the paper’s recursion diverges from the paper’s intent at order 1 for non-monic factors (BCIKS20/Finding14Countermodel). The repaired, cleared recursion (BCIKS20/ClearedRecursion) then closed the complete Claim-A.2 weight bound with no per-cell hypotheses. Formalization did not merely transcribe the literature; it debugged it.

4.4Self-inflicted and honestly recorded

The log also records the campaign’s own errors: a conjectured staircase threshold refuted at band 4 by a tripled-column code; an overclaim about reducing the pin to a character-sum estimate, retracted twice; a characteristic-zero-to-mod- $p$ lifting that fails measurably at $n = 64$ ; a claimed enumeration closure withdrawn when its artifacts turned out to be empty files; and a window-cocycle identity that was exact but vacuous. Three documented no-gos constrain the window analysis itself: degree-forcing dies at the ladder reach, the naive GRS recursion degrades $(w, k) \to (3 w, 2 w + k)$ , and the Möbius eigendecomposition provably cannot linearize the bad count (14% of probed stacks couple the eigencomponents through the shared witness). A campaign that cannot admit error cannot be trusted to report success; the disproof log is the evidence that this one can.

5The open core, precisely

And here is what still stands. After everything, the wall at the center of the problem is intact — and the campaign can now tell you its exact shape in a single line. Every face of the open core reduces, by machine-checked reductions, to one classical inequality. The million dollars is still on the table, and it is now a sentence you can write down.

We state plainly what is not known. The prize problem, pinning $δ^{*}$ at production rate inside the window

(1 - ρ, 1 - ρ - Θ_{ρ} (1/ lo g n))

is open. But the residue is no longer a list of loose ends. After roughly 120 axiom-clean files and 28 disposed hypotheses, the campaign localized the entire prize onto one explicit bound.

5.1The localization: the whole prize is one inequality

This is the headline of the campaign. The open core — all of it, every face below — is machine-checked to reduce to a single worst-case character-sum bound. Write $μ_{n} ⊊ F_{q}^{\times}$ for the thin $2$ -power multiplicative subgroup of order $n = 2^{a}$ in the prize regime $q = n^{β}, β \approx 4 - 5$ . The prize reduces to:

CORE

M (n) := b \neq \equiv 0 (p) max x \in μ_{n} \sum e_{p} (b x) \leq C n lo g (p / n)

for an absolute constant $C$ , over the thin subgroup $μ_{n}$ in the prize regime.

This is not a new object dressed up. It is the classical Birch–Bombieri–Gauss–Katz (BGK) / Paley-graph square-root-cancellation wall, open for roughly twenty-five years. The campaign’s contribution here is not to breach it but to localize the entire million-dollar question onto it, exactly, with the reduction machine-checked end to end (the second-moment route bottoms out at precisely this bound, formalized in Frontier/C1SecondMomentBGKGap).

The honest measure of distance: the best unconditional bound in the literature is around $n^{0.989}$ , and it is out of regime for the thin prize subgroup. The prize needs $n^{0.5}$ . The exponent gap is 0.489. Localizing names the wall to the exponent. It does not shorten it.

5.2The faces that converge on CORE

The open core presents four equivalent faces, inter-reducible through proven in-tree reductions. Each one, pushed to its floor, lands on the same character-sum bound CORE; progress on any one moves all four.

(i) List decodinganchor: RSListDecodingFrontier

Beyond-Johnson list decoding of explicit smooth-domain RS: a poly(n) list bound at some radius past the Johnson radius. Open since Guruswami–Sudan; the CS25/BCHKS25 coupling makes this the canonical hard face.

(ii) Character sumsanchor: SubgroupGaussSumWorstCase

Per-frequency sub-√q bounds for incomplete character sums over smooth multiplicative subgroups. The campaign pinned the analytic kernel at exactly √q from both sides (Parseval average from below, Gauss-sum completion from above, no Weil input): the open core is beating √q per frequency.

(iii) Bad-side familiesanchor: KKH26WitnessSpread

Constructing a single stack with more than q·2⁻¹²⁸ bad scalars at some radius below 1. Every family ever landed is O(n)/q — at production scale a winning stack needs roughly 2⁶⁴ bad scalars, and no known construction comes within polynomial range.

(iv) Line–ball incidenceanchor: FarCosetExplosion

New from this campaign: the maximum incidence of an affine line with far-coset direction against the weight-⌊δn⌋ syndrome ball. Equivalently, the multiplicity profile of the ratio sequence of two GRS syndromes — a Littlewood–Offord problem in F_q.

5.3What we proved cannot work

The campaign did not only try routes; it proved which routes are impossible. These are first-class results — the anti-hype side of the ledger, machine-checked.

The moment-method meta-theoremanchor: Frontier/_MomentMethodNoGo

Bounding the worst-case sum below the trivial $\sum$ admits exactly one mechanism: genuine high-order phase cancellation at depth $r ≍ lo g m$ . Every second-order method — additive energy of any order, Parseval / $L^{2}$ , the Cayley-spectral SDP — provably caps at the Johnson radius. By Cauchy–Schwarz the depth- $r$ moment bound is $\geq n$ for every order $r$ , and the whole ladder lies strictly above the prize target at every depth (Frontier/_MomentLadderExceedsPrize). No moment argument, of any order, can reach the floor.

The surviving mechanism is itself non-proving

And the one mechanism that survives — the high-moment certificate — is proven loose exactly where it must be tight, at structured primes. The floor is empirically true with margin, yet the certificate that should witness it fails precisely at the hard instances. So even the surviving route demands a genuinely non-moment argument. The floor is true; the obvious proof of it provably is not a proof.

Thinness is essential — a necessary condition on any solution

The floor inequality CORE is literally false in the “thick” intermediate regime $β \approx 2.3 - 3.2$ . So any correct proof must exploit that $μ_{n}$ is thin — a machine-checked necessary condition on the answer. This rules out every thickness-monotone method in one stroke: a technique that cannot tell thin from thick cannot possibly prove a statement that is only true when thin.

5.4Open research: the live attack surface

The honest frontier is small, sharp, and entirely on the CORE side. Three live directions, each named to a grep-able object.

The deficit brick anchor: C1ConditionalBound

The single open input of the new frontier bound (§3.7). Prove $ratio_{μ} \leq c < 1$ uniformly in $μ$ and the prime: the worst level- $μ$ frequency does not make both sub-periods $∥ η_{b}^{(μ - 1)} ∥$ and $∥ η_{bω}^{(μ - 1)} ∥$ maximal. Measured $c \approx 0.7$ ; the angle is $cos \approx 1$ at the worst $b$ , so the deficit comes from sub-maximality, not misalignment. A $c < 1$ makes $n^{0.74}$ unconditional; a $c < 1/2$ would give the prize via the tower — but the structural cap (§3.7) proves the tower alone tops out near $0.74$ , so the deficit brick delivers the frontier bound, not $n$ .

$p$ -independence and the half-sum lemma PR #441

The most promising route to sidestep BGK: make the far-line incidence a p-independent combinatorial count. A half-sum lemma plus a binding-radius affine-fiber argument would turn the analytic sup-norm into an integer count of an algebraic object governed by $n$ and the dimension, not the field. The binding direction (the bad scalar sumset $γ = - e_{1} (S)$ at depth $r \approx lo g q$ ) is settled in-tree for the low-exponent case (FarThresholdMaximality); the general p-independence is honestly open.

The base-field reading — deficit $n^{1/6}$ , not $n^{1/2}$

A more recent localization places the prize over a fixed ~31-bit prime with $n \approx p^{0.68}$ (index $d \approx p^{1/3}$ ). There the best proven bound is the Weil triangle $p$ and the gap to target shrinks to a factor $n^{1/6}$ — the -cancellation among $\approx p^{1/3}$ $k$ -th-power Gauss sums, sitting just below the favorable Heath–Brown–Konyagin range. In our view the most attackable open form.

The conditional results of §3 rest on explicit, grep-able objects. The three that gate the Johnson lane:

CellPackageSupply (Hab25JohnsonPackageSupply) — the BCIKS20 §5 per-cell heavy-agreement package. The full consumer chain to the Johnson discharge is proven; the package itself, whose hmonic field is the identified design tension, is not.
RegimeIIIGoodness (KKH26RegimeSplit) — the type-enforced gap between the Johnson lane and the pin. A proven weld shows that even a full Johnson discharge covers only $δ < 1 - ρ - o (1)$ , and that regime III is nonempty at every live parameter point: the Johnson lane alone never reaches the pin. This honesty is itself a theorem.
WindowRationalBounded (WBPencilBelowUDR) — the below-UDR residual: doubly WB-solvable stacks carry at most $w + 3$ bad scalars. Proven below the ladder reach, probe-supported in the window at two scales, with the extremal (Möbius-symmetric) adversary explicitly located and the $k = 1$ case fully closed.

5.5What the wall is, in one sentence

Via the formalized CS25 coupling^[CS25], the window’s sup side is equivalent in its regime to beyond-Johnson list decoding of explicit smooth-domain RS codes; the same wall reached from six independent directions (the KKH26 census, the characteristic-zero collision law, pencil moments, additive energy, vertical thresholds, divisibility events) presents each time as the same quantity, and that quantity is CORE: per-frequency cancellation past $n$ on a thin multiplicative subgroup, the BGK / Paley wall. The campaign did not breach it. It measured it, named it to the exponent, proved which routes cannot reach it, and arranged the architecture so that any future breakthrough lands as a single lemma.

6Discussion: kernel-checked grind is unfakeable

What does it mean that any of this worked? Not that machines replaced mathematicians, but that verification stopped being the bottleneck, and the moment it did, mathematical progress started behaving like a function of compute. That is the part that generalizes.

The headline of this report is not any single theorem. It is the shape of the ledger. In a matter of weeks, an agent fleet took a problem on which the published frontier had not produced one exact value in twenty-five years and left behind: the first exact thresholds for any code, a universal structural law with its regime boundaries identified and proven, an infinite calibration family of in-window pins, a complete unconditional classification of the $k = 1$ window, two corrections to the literature’s own proof chain, and — the sharpest single outcome — a machine-checked reduction of the entire prize onto one explicit classical inequality, the BGK / Paley character-sum wall, named to the exponent.

The standard objection to LLM mathematics is that language models confabulate, and the objection is correct. The campaign’s answer is architectural, not behavioral: no claim exists unless the Lean kernel accepts it, and the axiom census of every theorem is one command away. The agents tried wrong things constantly; twenty-eight refuted hypotheses and a public log of retracted overclaims say so. None of that survives into the results, because the medium the results live in cannot hold a falsehood. Confabulation at the search layer is harmless, even useful, when the publication layer is a proof checker.

Under that contract, progress becomes a function of compute in a concrete sense. Hypothesis generation, probe design, formalization, and red-teaming all parallelize across agents; the kernel serializes nothing but truth. The interesting economics: a refuted hypothesis costs roughly as much as a proven theorem and is worth nearly as much, because it permanently shrinks the search space for every subsequent agent. Mathematical knowledge here compounds the way a codebase does, and the disproof log is as load-bearing as the theorem files.

We are explicit about the limit case. The wall of §5 (the CORE inequality) is a genuine mathematical barrier, open roughly twenty-five years; nothing about this method guarantees it falls to more compute, and the campaign’s own no-go lemmas (§5.3) prove that every moment / energy / spectral attack cannot reach it and that thinness is necessary. What the method changes is the cost structure around the wall: everything provable short of it gets proven, the wall’s exact shape gets formalized, and the problem is left in a state where a single new idea, human or machine, has a prepared place to land. That is what “honestly mapping the unknown” means, and we believe it is the durable contribution: a demonstration that famous open problems can be industrialized without being inflated, because the kernel does not grade on a curve.

Everything on this page is reproducible from lalalune/ArkLib: clone, build, and run #print axioms on any theorem named above. The campaign log is public at issues #232, #334, #357, #371, #389, and #407.

Campaign timeline

Issue #232

The disproof campaign: attack the ABF26 Grand Challenge conjecture itself, keep every constraint lemma. 23 verified bricks; the Johnson-threshold carving matches what the literature later confirmed.

Nov 2025

Three independent groups refute the at-capacity conjectures (CS25, KK25, DG25). The window becomes the problem. The refutations are formalized in-tree.

Issue #357 opens

Two parallel nine-hypothesis slates under the standing discipline: constraints, why-nobody, larp-check, falsification probe, then formalize.

First pin

δ*(RS[F₅, F₅ˣ, 2], 2/5) = 1/4 — the first exact MCA threshold for any code, landed in two independent lanes within hours of each other.

The staircase

Band collapse at b = 2, then b = 3, then all bands at the 3b−2 threshold — sharpened from 4j to 3j during formalization. The MDS rank conjecture dies by pencil explosion; 3b−2 is the law.

The census programme

Bad scalars become subset-sum combinatorics; the wide-circuit matroid census closes with a machine-generated 1,571-line converse proof emitted from 10,395 probe certificates.

The Johnson lane

The BCIKS20 §5 discharge chain driven to a single named residual through two machine-checked corrections of the paper's own recursion (findings 13 and 14).

Issue #371: the WB programme

Badness becomes pencil linear algebra. Six theorems close the below-UDR branches; the window adversary is found and is Möbius-symmetric; PGL₂ equivariance is proven.

The unification

The ownership bound contains both lanes as instances. The k = 1 window closes unconditionally by direction class: polynomial (zero), rational (multiplicity theorem), sparse (popularity).

Issues #389–#407: the localization

Every face of the open core is reduced — by machine-checked in-tree reductions — to a single explicit inequality (CORE): √n-cancellation for incomplete character sums over the thin 2-power subgroup. The classical BGK / Paley wall, named to the exponent (gap 0.489). The reduction is essentially complete.

#407: what cannot work

The moment-method meta-theorem lands: every energy / Parseval / Cayley-spectral route provably caps at the Johnson radius, the whole moment ladder overshoots the prize at every depth, and thinness is proven a necessary condition (CORE is false in the thick regime). Three closure claims were self-retracted in the same pass — the ledger self-corrects.

Today

Localization ≈ 95% and machine-checked. The prize itself ≈ 10% — because CORE is a recognized 25-year-open problem; naming the wall does not shorten it. The window at production rate is open, at one precise classical sentence.

The humans and the machines

The proofs were written by an LLM agent fleet and checked by the Lean 4 kernel; the humans below steered the campaign, reviewed the work, and built the formal foundation it stands on. The models did the proving. The kernel did the believing.

Campaign · lalalune/ArkLib

Sol

lalalune carries the overwhelming majority of fleet commits.

Foundation · built on Verified-zkEVM/ArkLib

The campaign is a fork. Everything here composes against the formal foundation laid by the upstream ArkLib team:

References

[ABF26]G. Arnon, D. Boneh, G. Fenzi. Open Problems in List Decoding and Correlated Agreement. IACR ePrint 2026/680. Definition 4.3 is the in-tree ε_mca. eprint.iacr.org/2026/680
[BCIKS20]E. Ben-Sasson, D. Carmon, Y. Ishai, S. Kopparty, S. Saraf. Proximity Gaps for Reed–Solomon Codes. IACR ePrint 2020/654. Correlated agreement up to the Johnson radius; the §5/Appendix A chain the Johnson lane discharges. eprint.iacr.org/2020/654
[BCHKS25]E. Ben-Sasson, D. Carmon, U. Haböck, S. Kopparty, S. Saraf. Proximity gaps stop at the Johnson bound. ECCC TR25-169. The coupling barrier: MCA past Johnson implies beyond-Johnson list decoding. eccc.weizmann.ac.il/report/2025/169/
[KKH26]D. Krachun, S. Kazanin, U. Haböck (KKH26). Failure of proximity gaps close to capacity. IACR ePrint 2026/782. The near-capacity strip exclusion: δ* ≤ 1 − r/2^μ for explicit smooth evaluation codes. eprint.iacr.org/2026/782
[Hab25]U. Haböck. IACR ePrint 2025/2110. The streamlined Johnson-radius MCA lane (the floor of the window). eprint.iacr.org/2025/2110
[CS25]A. Crites, A. Stewart. IACR ePrint 2025/2046. With KK25 and DG25: mutual correlated agreement fails at capacity; the reduction coupling upper-bound progress to list decoding. eprint.iacr.org/2025/2046
[DG25]B. Diamond, A. Gruen. IACR ePrint 2025/2010. Super-polynomial proximity-gap error at vanishing rate; one of the three independent at-capacity refutations. eprint.iacr.org/2025/2010
[GS99]V. Guruswami, M. Sudan. Improved decoding of Reed–Solomon and algebraic-geometry codes. IEEE Trans. Inform. Theory 45 (1999). List decoding to the Johnson radius — the engine behind every in-tree polynomial list bound, and the 25-year boundary.
[GZ23]Z. Guo, Z. Zhang. Randomly punctured Reed–Solomon codes achieve list-decoding capacity over polynomial-size alphabets. arXiv:2304.01403 (FOCS 2023 line). arxiv.org/abs/2304.01403
[CZ25]Y. Chen, Z. Zhang. Explicit folded Reed–Solomon codes achieve capacity with optimal list size. STOC 2025. Why folding works and the plain smooth domain resists.

Seventeen papers are wired into the formalization in total; the complete inventory with in-tree consumers is docs/kb/deltastar-research-map.md.