Self-compacting agents: LLMs that shrink their own context to survive long tasks

A self-compacting LLM agent summarizes its own context as it works, cutting token cost by 30-70% on long tasks while also raising accuracy. The paper "Self-Compacting Language Model Agents," released June 22, 2026, achieves this with an inference-time summarization tool and a lightweight rubric, with no fine-tuning required. This ASAP post summarizes the original paper.

Mechanism: a summarization tool paired with a lightweight rubric

Self-compaction combines two inference-time elements: a summarization tool that the model invokes on its own, and a lightweight rubric that decides when to invoke it. The rubric triggers compaction when a sub-task has been resolved or the trajectory has converged, and suppresses it in the middle of a derivation or when the model is stuck. Both elements are supplied by the agent scaffold, so no fine-tuning or external supervision is needed.
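The trigger logic described above can be sketched as a gating function plus a context-replacement step. This is a minimal sketch under stated assumptions, not the paper's implementation: `StepSignals`, `should_compact`, and `Context.compact` are hypothetical names, the rubric is reduced to boolean flags checked in code (the source does not say how its conditions are detected), and the summary string stands in for the text the model would produce when it calls its summarization tool.

```python
from dataclasses import dataclass, field


# Hypothetical per-step signals; the paper's actual rubric wording is not reproduced here.
@dataclass
class StepSignals:
    subtask_resolved: bool = False      # a sub-task was just completed
    trajectory_converged: bool = False  # recent steps settle on the same answer
    mid_derivation: bool = False        # partway through a multi-step calculation
    stuck: bool = False                 # looping without making progress


def should_compact(s: StepSignals) -> bool:
    """Illustrative rubric: suppression conditions override trigger conditions."""
    if s.mid_derivation or s.stuck:
        return False
    return s.subtask_resolved or s.trajectory_converged


@dataclass
class Context:
    messages: list[str] = field(default_factory=list)

    def compact(self, summary: str, keep_last: int = 1) -> None:
        """Replace older messages with a single summary, keeping the last `keep_last` turns."""
        tail = self.messages[-keep_last:] if keep_last > 0 else []
        self.messages = [f"[summary] {summary}"] + tail


# Usage: the summary string stands in for the model's own summarization-tool output.
ctx = Context(["question", "step 1", "step 2", "step 3"])
if should_compact(StepSignals(subtask_resolved=True)):
    ctx.compact("sub-task A solved: x = 4")
print(ctx.messages)  # ['[summary] sub-task A solved: x = 4', 'step 3']
```

Checking the suppression conditions first matches the description of the rubric: a sub-task boundary that happens to coincide with a derivation in progress still does not fire.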

Performance: up to 18.1 points on math, 5-9 on search

Adding the compaction tool and rubric improves accuracy by up to 18.1 points on math tasks over the no-summarization baseline, and by 5-9 points on agentic search. The model itself is unchanged: the same model simply scores higher once the tool and rubric are attached. The method outperforms both the no-summarization and the fixed-interval summarization baselines.

Cost: 30-70% fewer tokens

Per-question token cost is 30-70% lower than with the fixed-interval summarization baseline. Fixed-interval summarization compresses context on a schedule, even when compression is not needed, which wastes summarization calls. Rubric-based compaction instead chooses its moment, improving cost and accuracy together.
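The difference between the two policies can be illustrated by listing where each would fire over a single trajectory. This is a minimal sketch under stated assumptions: the hand-labeled trajectory, the interval of 3, and the function names are all hypothetical, not taken from the paper. `pct_fewer` shows how an "X% fewer tokens" figure is computed relative to a baseline; its inputs here are made-up numbers, not the paper's results.

```python
# Hypothetical per-step labels for one agent trajectory (illustrative only).
TRAJECTORY = [
    "work", "derive", "derive", "derive", "subtask_done",
    "work", "work", "derive", "derive", "converged",
    "work", "stuck", "stuck", "work", "subtask_done",
]
TRIGGERS = {"subtask_done", "converged"}  # moments where compaction is appropriate
BAD_MOMENTS = {"derive", "stuck"}         # moments where the rubric suppresses compaction


def fixed_interval_steps(n_steps: int, every: int) -> list[int]:
    """Fixed-interval baseline: compact every `every` steps, whatever is happening."""
    return [i for i in range(n_steps) if (i + 1) % every == 0]


def rubric_steps(labels: list[str]) -> list[int]:
    """Event-triggered policy: compact only at sub-task resolution or convergence."""
    return [i for i, label in enumerate(labels) if label in TRIGGERS]


def pct_fewer(baseline_tokens: float, method_tokens: float) -> float:
    """Percent reduction of `method_tokens` relative to `baseline_tokens`."""
    return 100.0 * (baseline_tokens - method_tokens) / baseline_tokens


fixed = fixed_interval_steps(len(TRAJECTORY), every=3)
event = rubric_steps(TRAJECTORY)
print(fixed)  # [2, 5, 8, 11, 14]
print(event)  # [4, 9, 14]
print([TRAJECTORY[i] for i in fixed if TRAJECTORY[i] in BAD_MOMENTS])  # ['derive', 'derive', 'stuck']
print(pct_fewer(10_000, 4_000))  # 60.0
```

In this toy trajectory the fixed schedule fires five times, three of them at moments the rubric would suppress, while the event-triggered policy fires three times, each at a natural boundary.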

Validation: six benchmarks, seven models

The method is validated on six benchmarks spanning competitive math and agentic search, across seven models, against fixed-interval summarization and no-summarization baselines. Left unprompted, models cannot reliably tell when their own context is rotting; a lightweight rubric closes that gap.

Wrap-up

The Self-Compacting Language Model Agents study quantifies what an inference-time summarization tool plus a lightweight rubric buys a long-task agent: up to 18.1 points on math, 5-9 points on search, and 30-70% fewer tokens. For anyone designing a long-running agent, the key decision is the rubric that determines when to shrink context.

Source: ASAP summary of "Self-Compacting Language Model Agents" (arXiv:2606.23525, June 22, 2026; Tianjian Li, Jingyu Zhang, William Jurayj, Daniel Khashabi et al.).
