Google's Gemini 3.5 Flash beat last year's flagship: a 4x faster agent model

Google's small, fast model Gemini 3.5 Flash beat last year's higher-tier model on coding and agents. Released in 2026, Gemini 3.5 Flash scored 76.2% on Terminal-Bench 2.1, beating the higher-tier Gemini 3.1 Pro, and its output is 4x faster than frontier models. It shows the capability deflation of a small model surpassing a flagship from a year ago. ASAP summarizes the result from the primary source.

A small model beat last year's flagship

Gemini 3.5 Flash is a small, cheap model that surpassed last year's higher-tier model. It scored 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas, beating Gemini 3.1 Pro on coding and agent benchmarks. A lower-tier model surpassing a year-old flagship has become routine.

It is 4x faster

The speed gap is large too. Gemini 3.5 Flash processes output tokens 4x faster than other frontier models. It is built for multi-step agent work and long tasks, focused on action rather than simple chat.

Specs in numbers

The model is built with a large input window and reasonable pricing, about 1M tokens. The input window is about 1 million tokens and output up to 65,000 tokens. Pricing is $1.50 per million input and $9.00 per million output tokens, 3x the previous Flash but 25% cheaper than Gemini 3.1 Pro.

Capability deflation

The result from Gemini 3.5 Flash shows the price of capability is dropping fast every year. Last year's top performance now comes cheaper from a small, fast model this year. The same task can run on a lower-tier model, making it more important to pick a model per task.

What it means — model choice is a cost strategy

Gemini 3.5 Flash again shows that picking the model that fits the task decides cost. Reasoning and coding are verifiable domains, so a small model is increasingly enough. Using the most expensive model for every task gets steadily more wasteful.

Wrap-up

Gemini 3.5 Flash is a case of a small, fast model beating last year's flagship. Terminal-Bench 76.2%, 4x speed, and reasonable pricing are the basis. In an era when the price of capability falls, picking a model per task is a cost strategy.

Source: ASAP summary of reporting on Google's Gemini 3.5 Flash (2026; Terminal-Bench 2.1 76.2%, MCP Atlas 83.6%, 4x output, $1.50 input / $9.00 output per million).