Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation

King's College London, The Alan Turing Institute
arXiv 02.2026

Abstract

Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting. RAG targets large, heterogeneous corpora where retrieved passages are diverse, whereas agent memory is a bounded, coherent dialogue stream with highly correlated spans that are often near duplicates. Under this shift, fixed top-k similarity retrieval tends to return redundant context, and post-hoc pruning can delete temporally linked prerequisites needed for correct reasoning. We argue that retrieval should move beyond similarity matching and instead operate over latent components, proceeding from decoupling to aggregation: disentangle memories into semantic components, organise them into a hierarchy, and use this structure to drive retrieval. We propose xMemory, which builds a hierarchy of intact units and maintains a searchable yet faithful high-level node organisation via a sparsity–semantics objective that guides memory split and merge. At inference, xMemory retrieves top-down, selecting a compact, diverse set of themes and semantics for multi-fact queries, and expanding to episodes and raw messages only when doing so reduces the reader’s uncertainty. Experiments on LoCoMo and PerLTQA across the three latest LLMs show consistent gains in answer quality and token efficiency.

An overview of our method

From similarity top-k to structured retrieval for agent memory. Agent memory forms a coherent and highly correlated stream, where many spans are near duplicates; similarity top-k retrieval can therefore collapse and retrieve redundant chunks. xMemory organises memories into a hierarchy of intact units and performs structure-aware retrieval to produce a shorter but more answer-sufficient context.
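The collapse described in this caption can be reproduced on a toy memory. The sketch below is illustrative only and is not xMemory's retrieval procedure: bag-of-words cosine stands in for an embedding model, and a greedy maximal-marginal-relevance (MMR) style selection stands in for any diversity-aware alternative. The function names, the `lam` weight, and the example memories are all assumptions made for this example.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector (stand-in for a real embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, memories, k):
    """Plain similarity top-k: rank by relevance only."""
    q = bow(query)
    return sorted(memories, key=lambda m: cosine(q, bow(m)), reverse=True)[:k]

def mmr(query, memories, k, lam=0.7):
    """Greedy MMR-style selection: relevance minus redundancy
    with respect to items already selected (lam is a hypothetical weight)."""
    q = bow(query)
    vecs = {m: bow(m) for m in memories}
    selected, candidates = [], list(memories)
    while candidates and len(selected) < k:
        def score(m):
            rel = cosine(q, vecs[m])
            red = max((cosine(vecs[m], vecs[s]) for s in selected), default=0.0)
            return lam * rel - (1 - lam) * red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy memory stream with near-duplicate spans (invented for illustration).
memories = [
    "alice booked a flight to paris for may",
    "alice booked her flight to paris for may",
    "alice booked a paris flight for may",
    "alice needs a hotel in paris near the louvre",
    "bob prefers tea over coffee",
]
query = "what did alice plan for paris flight and hotel"

print(top_k(query, memories, 2))
print(mmr(query, memories, 2))
```

On this toy data, plain top-2 returns two paraphrases of the flight booking, whereas the greedy selection keeps one flight memory and also surfaces the hotel memory that the query asks about.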

Overview of xMemory. xMemory couples memory structuring with top-down retrieval to address the mismatch between agent memory and the RAG pipeline. It organises a coherent stream into a hierarchy that disentangles episodic traces into semantic components while preserving intact units. A sparsity–semantics objective guides split and merge to keep the high-level organisation searchable and faithful. At retrieval, xMemory selects a diverse set of relevant themes and semantics to support aggregation reasoning, then expands to episodes and raw messages only when they decrease the reader’s uncertainty, yielding a shorter context with stronger evidence coverage.
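A minimal sketch of the top-down flow described above, under strong simplifying assumptions: the four-level hierarchy (theme, semantic, episode, raw message) is modelled with plain dataclasses, relevance is token overlap, and the reader's uncertainty is replaced by a hypothetical proxy (the share of query tokens not yet covered by the context). None of these names or scoring choices come from xMemory itself, and the split/merge objective is not modelled; the sketch only illustrates the idea of expanding to lower levels when uncertainty drops.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    summary: str
    messages: list[str] = field(default_factory=list)

@dataclass
class Semantic:
    fact: str
    episodes: list[Episode] = field(default_factory=list)

@dataclass
class Theme:
    label: str
    semantics: list[Semantic] = field(default_factory=list)

def tokens(text):
    return set(text.lower().split())

def uncertainty(query, context):
    """Hypothetical proxy for reader uncertainty:
    fraction of query tokens absent from the context."""
    q = tokens(query)
    if not q:
        return 0.0
    covered = set().union(*(tokens(c) for c in context))
    return len(q - covered) / len(q)

def retrieve(query, themes, max_semantics=2):
    q = tokens(query)
    # Step 1: pick high-level semantic nodes by overlap with the query.
    scored = [(len(q & tokens(t.label + " " + s.fact)), s)
              for t in themes for s in t.semantics]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    chosen = [s for _, s in scored[:max_semantics]]
    context = [s.fact for s in chosen]
    # Step 2: expand to episodes and raw messages only if uncertainty drops.
    for s in chosen:
        for ep in s.episodes:
            for extra in (ep.summary, *ep.messages):
                if uncertainty(query, context + [extra]) < uncertainty(query, context):
                    context.append(extra)
    return context

# Toy hierarchy (invented for illustration).
themes = [
    Theme("travel", [
        Semantic("alice is flying to paris in may",
                 [Episode("alice booked flight af123 on monday",
                          ["booked af123, departs may 3 at 9am"])]),
        Semantic("alice wants a hotel near the louvre",
                 [Episode("alice compared two hotels",
                          ["the first hotel has a pool"])]),
    ]),
    Theme("preferences", [Semantic("bob prefers tea")]),
]

print(retrieve("which flight departs for paris", themes))
print(retrieve("paris", themes))
```

In this toy run, the first query pulls in the episode summary and the raw message because each one covers a query token the context lacked, while the second query is already answered at the semantic level and triggers no expansion. A real system would replace the coverage proxy with a signal derived from the reader model.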

Main Experiment: LoCoMo

Main Experiment: PerLTQA

BibTeX

@article{hu2026beyond,
  title={Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation},
  author={Hu, Zhanghao and Zhu, Qinglin and Yan, Hanqi and He, Yulan and Gui, Lin},
  journal={arXiv preprint arXiv:2602.02007},
  year={2026}
}