Panini: Continual Learning in Token Space via Structured Memory
Shreyas Rajesh, Pavan Holur, Mehmet Yigit Turali, Chenda Duan, Vwani Roychowdhury

TL;DR
Panini introduces a continual learning framework built on a structured semantic memory called GSW, which lets language models reason over accumulated experiences efficiently and accurately without retraining or chunk retrieval.
Contribution
The paper proposes GSW, a new structured memory representation for continual learning in language models, improving reasoning efficiency and accuracy over previous retrieval-based methods.
Findings
Achieves 5-7% higher performance on QA benchmarks than previous retrieval-based methods.
Uses 2-30x fewer answer-context tokens.
Reduces unsupported answers on unanswerable queries.
Abstract
Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this results in inefficient usage of test-time compute (LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework, where the base model remains fixed, and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic…
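The contrast the abstract draws, between retrieving verbatim chunks at inference time and integrating each new experience into an external memory while the base model stays fixed, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the class names, the entity-to-facts layout, the hand-written triples (a real system would presumably extract them with a model), and the whitespace token count. It is not the paper's GSW representation, only a minimal instance of write-time consolidation versus read-time chunk retrieval.

```python
from collections import defaultdict


class ToyChunkStore:
    """Baseline: keeps verbatim text chunks and returns whole chunks at query time."""

    def __init__(self):
        self.chunks = []

    def add(self, text):
        self.chunks.append(text)

    def retrieve(self, keyword):
        # Every chunk mentioning the keyword comes back in full, relevant or not.
        key = keyword.lower()
        return [c for c in self.chunks if key in c.lower()]


class ToyStructuredMemory:
    """Hypothetical external memory: entity -> set of (relation, value) pairs."""

    def __init__(self):
        self.facts = defaultdict(set)

    def integrate(self, triples):
        # Write time: merge new facts into the existing state; duplicates collapse.
        for entity, relation, value in triples:
            self.facts[entity].add((relation, value))

    def query(self, entity):
        # Read time: return only the facts stored for this entity.
        # .get avoids creating an empty entry for an unknown entity.
        return sorted(self.facts.get(entity, set()))


def count_tokens(texts):
    # Crude whitespace token count, for illustration only.
    return sum(len(t.split()) for t in texts)


# Hand-written documents with hand-written triples (assumed, not extracted).
docs = [
    ("Ada Lovelace was born in London in 1815. The city was foggy that winter.",
     [("Ada Lovelace", "born_in", "London"),
      ("Ada Lovelace", "born_year", "1815")]),
    ("Ada Lovelace worked with Charles Babbage. Ada Lovelace was born in London.",
     [("Ada Lovelace", "worked_with", "Charles Babbage"),
      ("Ada Lovelace", "born_in", "London")]),
]

chunk_store = ToyChunkStore()
memory = ToyStructuredMemory()
for text, triples in docs:
    chunk_store.add(text)      # stored verbatim
    memory.integrate(triples)  # consolidated into the memory state

chunk_context = chunk_store.retrieve("Ada Lovelace")
memory_context = [f"{rel} {val}" for rel, val in memory.query("Ada Lovelace")]

print("chunk context tokens: ", count_tokens(chunk_context))
print("memory context tokens:", count_tokens(memory_context))
print("unknown entity facts: ", memory.query("Grace Hopper"))
```

In this toy setup the repeated birthplace fact is stored once, so the context handed to the answering model is shorter than the two full chunks, and a query about an entity that was never written returns nothing, which gives the caller a natural signal to abstain. The numbers it prints say nothing about the gains reported in the findings.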
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
