'Zombie Agents': a single injection can permanently hijack a self-evolving AI agent

A self-evolving LLM agent can be permanently hijacked by a single indirect injection. The "Zombie Agents" paper, released in February 2026, shows that a malicious instruction an attacker plants in web content gets written into the agent's long-term memory, then revives across sessions to trigger unauthorized tool calls. ASAP summarizes the paper from the original.

It works in two stages: infection and trigger

The attack hijacks a self-evolving LLM agent in two stages, infection and trigger. During infection, the attacker plants a malicious payload in web content the agent meets during normal tasks, and the payload is written into long-term memory through the standard update process. During trigger, the stored payload is retrieved and activates unauthorized tool behavior. It is a black-box attack that needs no access to model internals.

Why it is persistent: memory revives the attack

The attack stays persistent because sliding-window and RAG memory keep the payload alive across sessions. The authors designed mechanism-specific persistence strategies for common memory types, including sliding-window and retrieval-augmented memory, and these strategies resist truncation and relevance filtering. The paper states that memory evolution can convert a one-time indirect injection into persistent compromise.

Per-session filtering does not stop it

Per-session prompt filtering is not enough to stop this attack on self-evolving LLM agents. In the threat model, untrusted external content met during a benign session is stored as retrievable memory and later reused as instructions. So a defense limited to per-session input checks is not sufficient for self-evolving agents.

What it means: long-term memory is an attack surface

Long-term memory is itself an attack surface for any RAG-based agent. The more memory an agent carries to act smarter, the more places a malicious instruction can persist. Defense has to move past the input stage into the memory write and retrieval stages.

Wrap-up

The Zombie Agents study shows a self-evolving agent can be permanently hijacked by a single indirect injection. The core points are the two stages of infection and trigger, the per-memory persistence strategies, and the limit of per-session filtering. The more memory an agent has, the deeper the defense line must move.

Source: ASAP summary of "Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections" (arXiv:2602.15654, February 2026; Xianglin Yang et al.).

'Zombie Agents': a single injection can permanently hijack a self-evolving AI agent

It works in two stages: infection and trigger

Why it is persistent: memory revives the attack

Per-session filtering does not stop it

What it means: long-term memory is an attack surface

Wrap-up

Related posts

AI & tech,delivered fastest

AI & tech,
delivered fastest