AlphaEvolve Deep Dive: How an AI Broke a 56-Year-Old Math Record
Google DeepMind's AlphaEvolve is a concrete answer to the question of whether an AI can discover new algorithms on its own. Using an evolutionary loop in which Gemini proposes code and an automated evaluator verifies it, AlphaEvolve broke a matrix-multiplication record that had stood for 56 years, since 1969. It also took on more than 50 open math problems, improving on the best known result in about 20% of them, and it optimized Google's own data-center scheduling and Gemini's training. This article lays out what AlphaEvolve is, how it works, what it achieved, and how it connects to recursive self-improvement (RSI).
What Is AlphaEvolve?
AlphaEvolve is a "Gemini-based evolutionary coding agent" released by Google DeepMind for general-purpose algorithm discovery and optimization. It combines the creative proposals of a large language model with verification by an automated evaluator, inside an evolutionary framework.
The key point is that "humans don't tell it the answer." Given only the problem and an evaluation procedure, AlphaEvolve generates candidate solutions, scores them automatically, and evolves toward better ones.
How It Works
It works the way evolution does. Gemini proposes code variations (candidate solutions), an automated evaluator runs that code and scores it, and the higher-scoring candidates survive to become the starting point for the next generation.
As this cycle of "propose → verify → select → repeat" runs thousands of times, it reaches solutions that would be hard for a human to think of. The core premise is that the problem must be "automatically verifiable," which is why it's especially strong in algorithmic and mathematical domains where answers can be scored.
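The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not AlphaEvolve's implementation: a random mutation stands in for Gemini's code proposals, candidates are lists of integers rather than programs, and the names TARGET, propose, evaluate, and evolve are all hypothetical.

```python
import random

# Hypothetical toy objective: the evaluator rewards candidates close to TARGET.
TARGET = [3, -1, 4, 1, -5]


def evaluate(candidate):
    """Automated evaluator: returns a score, higher is better (0 is perfect)."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def propose(parent, rng):
    """Stand-in for the LLM: return a slightly modified copy of a parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child


def evolve(generations=300, population_size=8, children_per_parent=4, seed=0):
    rng = random.Random(seed)
    population = [[0] * len(TARGET) for _ in range(population_size)]
    for _ in range(generations):
        # Propose: every survivor produces several variations.
        children = [propose(p, rng)
                    for p in population
                    for _ in range(children_per_parent)]
        # Verify: score parents and children with the same evaluator.
        pool = sorted(population + children, key=evaluate, reverse=True)
        # Select: only the best candidates seed the next generation.
        population = pool[:population_size]
    best = population[0]
    return best, evaluate(best)


best, score = evolve()
print(best, score)
```

Because parents stay in the selection pool, the best score never gets worse from one generation to the next. The real system differs in the ways that matter most: the proposals come from a language model editing code, and the evaluator executes that code.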
Breaking a 56-Year Matrix-Multiplication Record
Its most famous achievement is in matrix multiplication. AlphaEvolve found an algorithm that computes the product of two 4×4 complex-valued matrices using 48 scalar multiplications. That beats the 49 multiplications required when Strassen's 1969 algorithm is applied recursively, which had been the best known result for this setting.
That record had stood for 56 years. Matrix multiplication underlies a large share of modern computation, from AI training to graphics, so even a small improvement can ripple out into wide-ranging efficiency gains.
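To make the multiplication counts concrete, the sketch below counts scalar multiplications for a 4×4 complex product in two ways: the schoolbook method, and Strassen's algorithm applied recursively to 2×2 blocks. It does not reproduce AlphaEvolve's 48-multiplication algorithm, and the function names and the counter are illustrative only.

```python
import random


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix into four equally sized blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def naive(A, B, counter):
    """Schoolbook product: n**3 scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                counter[0] += 1
                C[i][j] += A[i][k] * B[k][j]
    return C


def strassen(A, B, counter):
    """Strassen's recursion: 7 block products instead of 8 per level."""
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), counter)
    M2 = strassen(add(A21, A22), B11, counter)
    M3 = strassen(A11, sub(B12, B22), counter)
    M4 = strassen(A22, sub(B21, B11), counter)
    M5 = strassen(add(A11, A12), B22, counter)
    M6 = strassen(sub(A21, A11), add(B11, B12), counter)
    M7 = strassen(sub(A12, A22), add(B21, B22), counter)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [l + r for l, r in zip(C11, C12)]
    bottom = [l + r for l, r in zip(C21, C22)]
    return top + bottom


def random_matrix(rng, n=4):
    # Small integer real and imaginary parts keep the arithmetic exact.
    return [[complex(rng.randint(-3, 3), rng.randint(-3, 3)) for _ in range(n)]
            for _ in range(n)]


rng = random.Random(0)
A, B = random_matrix(rng), random_matrix(rng)
naive_count, strassen_count = [0], [0]
same_result = strassen(A, B, strassen_count) == naive(A, B, naive_count)
print(same_result, naive_count[0], strassen_count[0])  # True 64 49
```

Two levels of recursion give 7 × 7 = 49 multiplications, against 4³ = 64 for the schoolbook method. AlphaEvolve's result removes one more, reaching 48.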
Results in Math and Infrastructure
In mathematics, AlphaEvolve took on more than 50 open problems across analysis, geometry, combinatorics, and number theory. In about 75% of them it found solutions matching the best known results, and in about 20% it improved on the best known solution.
Its real-world infrastructure results are also significant. It improved scheduling in Google's data centers, recovering on average about 0.7% of Google's worldwide computing resources according to DeepMind, and it made a key matrix-multiplication kernel in Gemini's training 23% faster, cutting overall training time by 1%. At Gemini's scale, 1% can amount to hundreds of thousands of accelerator-hours.
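A 23% faster kernel and a 1% shorter training run fit together through Amdahl's law: speeding up one component only helps in proportion to the time spent in it. The back-of-the-envelope sketch below infers roughly what share of training time that kernel would account for. It assumes "23% faster" means a 1.23× speedup and that both figures are rounded, so the result is an illustration, not a reported number. The function name kernel_share is hypothetical.

```python
def kernel_share(kernel_speedup, overall_time_saved):
    """Fraction f of total time spent in the kernel, from Amdahl's law.

    New total time = (1 - f) + f / kernel_speedup, so the time saved is
    f * (1 - 1 / kernel_speedup). Solve that for f.
    """
    return overall_time_saved / (1 - 1 / kernel_speedup)


share = kernel_share(kernel_speedup=1.23, overall_time_saved=0.01)
new_total_time = (1 - share) + share / 1.23
print(f"kernel share of training time: {share:.1%}")  # 5.3%
print(f"new total time: {new_total_time:.2f}")        # 0.99
```

Under these assumptions the kernel would account for only about 5% of training time, which is why a large local speedup translates into a small overall one.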
The Connection to RSI — and the Limits
Notably, AlphaEvolve even optimized the training of the very Gemini that underpins it. That comes close to a real-world case of recursive self-improvement (RSI), in which an AI improves the infrastructure of AI.
That said, it's no cure-all. AlphaEvolve's power comes from "problems that can be automatically scored," so it doesn't transfer directly to domains where it's hard to verify a correct answer cleanly. In other words, designing a "verifiable evaluator" is the true key to this approach.
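What a "verifiable evaluator" might look like can be shown with a small sketch. The example is hypothetical and unrelated to AlphaEvolve's actual evaluators: candidates are sorting functions, the evaluator first checks correctness against Python's built-in sorted, and only correct candidates receive a score (fewer comparisons is better). The names Counted, evaluate, insertion_sort, and broken_sort are all invented for illustration.

```python
import math
import random


class Counted:
    """Wraps a value and counts every `<` comparison made on it."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


def evaluate(sort_fn, trials=20, size=30, seed=0):
    """Return -inf for any wrong answer, else minus the mean comparison count."""
    rng = random.Random(seed)
    Counted.comparisons = 0
    for _ in range(trials):
        data = [rng.randint(0, 99) for _ in range(size)]
        result = sort_fn([Counted(x) for x in data])
        # Correctness gate: a fast but wrong candidate must never win.
        if [item.value for item in result] != sorted(data):
            return -math.inf
    return -Counted.comparisons / trials


def insertion_sort(items):
    out = []
    for item in items:
        i = len(out)
        while i > 0 and item < out[i - 1]:
            i -= 1
        out.insert(i, item)
    return out


def broken_sort(items):
    return list(items)  # uses zero comparisons, but does not sort


print(evaluate(insertion_sort))  # finite negative score
print(evaluate(broken_sort))     # -inf
```

The correctness gate is the essential design choice. Without it, the search would happily converge on broken_sort, since it makes no comparisons at all. Domains where no such gate can be written are exactly the ones where this approach struggles.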
References: Google DeepMind — AlphaEvolve · VentureBeat