Self-compacting agents: LLMs that shrink their own context to survive long tasks
A self-compacting LLM agent summarizes its own context during long tasks, cutting per-question token cost by 30-70% relative to fixed-interval summarization while raising accuracy. The "Self-Compacting Language Model Agents" paper, released June 22, 2026, achieves this with an inference-time summarization tool and a lightweight rubric, with no fine-tuning required. This ASAP summary is based on the original paper.
Mechanism: a summarization tool paired with a lightweight rubric
Self-compaction combines two inference-time elements: a summarization tool that the model invokes itself, and a lightweight rubric that decides when the tool should fire. The rubric triggers compaction when a sub-task is resolved or the trajectory has converged, and suppresses it mid-derivation or while the model is stuck. The capability is supplied by the agent scaffold, with no fine-tuning and no external supervision.
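The trigger and suppress conditions above can be sketched as a small gate in front of a summarization call. This is a minimal illustration, not the paper's implementation: the `TrajectorySignals` fields, `should_compact`, `compact`, and the `keep_last` parameter are all hypothetical names, and this summary does not describe how the paper actually encodes the rubric or how the model detects each condition.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrajectorySignals:
    """Hypothetical signals a rubric might inspect (names are illustrative)."""
    subtask_resolved: bool = False
    trajectory_converged: bool = False
    mid_derivation: bool = False
    stuck: bool = False


def should_compact(s: TrajectorySignals) -> bool:
    """Return True only when a trigger holds and no suppress condition holds."""
    # Suppress conditions take priority: never compact mid-derivation or when stuck.
    if s.mid_derivation or s.stuck:
        return False
    # Trigger conditions: a sub-task was resolved or the trajectory converged.
    return s.subtask_resolved or s.trajectory_converged


def compact(
    context: List[str],
    summarize: Callable[[List[str]], str],
    keep_last: int = 1,
) -> List[str]:
    """Replace all but the last `keep_last` messages with a single summary."""
    split = len(context) - keep_last
    if split <= 0:
        # Nothing old enough to summarize; return an unchanged copy.
        return list(context)
    head, tail = context[:split], context[split:]
    return [summarize(head)] + tail


if __name__ == "__main__":
    # Stand-in summarizer; a real agent would call the model here (assumption).
    def toy_summarize(msgs: List[str]) -> str:
        return "SUMMARY(" + "; ".join(msgs) + ")"

    history = ["plan", "solve part A", "part A done", "start part B"]
    signals = TrajectorySignals(subtask_resolved=True)
    if should_compact(signals):
        history = compact(history, toy_summarize, keep_last=1)
    print(history)
```

Giving the suppress conditions priority over the triggers is a design choice of this sketch: it means a resolved sub-task does not cause compaction while a derivation is still in flight.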
Performance: up to 18.1 points on math, 5-9 on search
The performance gain is up to 18.1 points on math tasks over the no-summarization baseline and 5-9 points on agentic search. The same model scores higher once the compaction tool and rubric are attached. The method beats both no-summarization and fixed-interval summarization baselines.
Cost: 30-70% fewer tokens
Token cost is 30-70% lower per question than with the fixed-interval summarization baseline. Fixed-interval summarization compresses context even when it is not needed, wasting calls. Rubric-based compaction fires only at suitable moments, so it gains on cost and accuracy at once.
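To make the difference between the two schedules concrete, the toy sketch below counts when each policy would call the summarizer over one trajectory. Everything in it is an assumption for illustration: the function names, the 12-step trajectory, the interval of 3, and the step indices are made up and do not come from the paper, and the counts say nothing about the reported 30-70% figure.

```python
from typing import List


def fixed_interval_calls(num_steps: int, interval: int) -> List[int]:
    """Steps (1-indexed) at which a fixed-interval policy summarizes."""
    return [t for t in range(1, num_steps + 1) if t % interval == 0]


def rubric_calls(resolved_at: List[int], blocked_at: List[int]) -> List[int]:
    """Steps at which a rubric-gated policy summarizes.

    A step qualifies when a sub-task resolves there and the step is not
    blocked (for example, the model is mid-derivation or stuck).
    """
    blocked = set(blocked_at)
    return [t for t in sorted(set(resolved_at)) if t not in blocked]


if __name__ == "__main__":
    # Made-up trajectory: sub-tasks resolve at steps 5 and 11, nothing blocked.
    fixed = fixed_interval_calls(num_steps=12, interval=3)
    gated = rubric_calls(resolved_at=[5, 11], blocked_at=[])
    print("fixed-interval summarizes at:", fixed)
    print("rubric-gated summarizes at:", gated)
```

In this toy trace the fixed schedule fires on a timer regardless of what the agent is doing, while the gated schedule fires only where a sub-task actually ends.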
Validation: six benchmarks, seven models
Validation covers six benchmarks spanning competitive math and agentic search, run on seven models. The baselines are fixed-interval summarization and no summarization. Unprompted models cannot reliably tell when their own context is rotting, but a lightweight rubric closes that gap.
Wrap-up
The "Self-Compacting Language Model Agents" study quantifies how much efficiency a long-task agent gains from an inference-time summarization tool plus a lightweight rubric. The core numbers are up to 18.1 points on math, 5-9 points on search, and 30-70% fewer tokens. On this evidence, the key design decision for a long-running agent is the rubric that decides when to shrink context.
Source: ASAP summary of "Self-Compacting Language Model Agents" (arXiv:2606.23525, June 22, 2026; Tianjian Li, Jingyu Zhang, William Jurayj, Daniel Khashabi et al.).
