'Zombie Agents': a single injection can permanently hijack a self-evolving AI agent
A self-evolving LLM agent can be permanently hijacked by a single indirect injection. The "Zombie Agents" paper, released in February 2026, shows that a malicious instruction an attacker plants in web content gets written into the agent's long-term memory, then revives across sessions to trigger unauthorized tool calls. ASAP summarizes the paper from the original.
It works in two stages: infection and trigger
The attack hijacks a self-evolving LLM agent in two stages, infection and trigger. During infection, the attacker plants a malicious payload in web content the agent meets during normal tasks, and the payload is written into long-term memory through the standard update process. During trigger, the stored payload is retrieved and activates unauthorized tool behavior. It is a black-box attack that needs no access to model internals.
Why it is persistent: memory revives the attack
The attack stays persistent because sliding-window and RAG memory keep the payload alive across sessions. The authors designed mechanism-specific persistence strategies for common memory types, including sliding-window and retrieval-augmented memory, and these strategies resist truncation and relevance filtering. The paper states that memory evolution can convert a one-time indirect injection into persistent compromise.
Per-session filtering does not stop it
Per-session prompt filtering is not enough to stop this attack on self-evolving LLM agents. In the threat model, untrusted external content met during a benign session is stored as retrievable memory and later reused as instructions. So a defense limited to per-session input checks is not sufficient for self-evolving agents.
What it means: long-term memory is an attack surface
Long-term memory is itself an attack surface for any RAG-based agent. The more memory an agent carries to act smarter, the more places a malicious instruction can persist. Defense has to move past the input stage into the memory write and retrieval stages.
Wrap-up
The Zombie Agents study shows a self-evolving agent can be permanently hijacked by a single indirect injection. The core points are the two stages of infection and trigger, the per-memory persistence strategies, and the limit of per-session filtering. The more memory an agent has, the deeper the defense line must move.
Source: ASAP summary of "Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections" (arXiv:2602.15654, February 2026; Xianglin Yang et al.).
AI & tech,
delivered fastest
Beyond the headlines — into the context and the structure
Ai Soon As Possible · asapai.co.kr
