Neural Garbage Collection: Learning to Forget while Learning to Reason
Michael Y. Li, Jubayer Ibn Hamid, Emily B. Fox, Noah D. Goodman

TL;DR
Neural Garbage Collection lets a language model learn which information to forget while it reasons. Memory management is optimized end-to-end with reinforcement learning, improving efficiency without sacrificing accuracy.
Contribution
Introduces Neural Garbage Collection (NGC), a method in which a model learns its KV cache eviction policy end-to-end from reward signals, reducing memory use during reasoning.
Findings
Maintains strong accuracy with 2-3x cache compression.
Outperforms baseline eviction strategies.
Operates solely from outcome-based rewards without supervision.
Abstract
Chain-of-thought reasoning has driven striking advances in language model capability, yet every reasoning step grows the KV cache, creating a bottleneck to scaling this paradigm further. Current approaches manage these constraints on the model's behalf using hand-designed criteria. A more scalable approach would let end-to-end learning subsume this design choice entirely, following a broader pattern in deep learning. After all, if a model can learn to reason, why can't it learn to forget? We introduce Neural Garbage Collection (NGC), in which a language model learns to forget while learning to reason, trained end-to-end from outcome-based task reward alone. As the model reasons, it periodically pauses, decides which KV cache entries to evict, and continues to reason conditioned on the remaining cache. By treating tokens in a chain-of-thought and cache-eviction decisions as discrete…
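The pause, evict, and continue loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the cache here is a plain list of token positions rather than real key/value tensors, and the names `reason_with_eviction`, `evict_every`, `keep_fraction`, and `score_fn` are hypothetical. In NGC the eviction decisions are learned from outcome-based reward; in this sketch a hand-written scoring function stands in for that learned policy, purely to show how periodic eviction bounds cache growth.

```python
from typing import Callable, List, Tuple


def reason_with_eviction(
    num_steps: int,
    evict_every: int,
    keep_fraction: float,
    score_fn: Callable[[int], float],
) -> Tuple[List[int], int]:
    """Toy loop: add one cache entry per step and evict periodically.

    Returns the surviving entry positions and the peak cache size.
    All parameters are illustrative assumptions, not values from the paper.
    """
    cache: List[int] = []  # stand-in for KV entries; stores token positions
    peak = 0
    for step in range(num_steps):
        cache.append(step)  # each reasoning step grows the cache by one entry
        peak = max(peak, len(cache))
        if (step + 1) % evict_every == 0:
            # Pause: decide how many entries survive this eviction round.
            keep = max(1, int(len(cache) * keep_fraction))
            # Placeholder policy: keep the highest-scoring entries.
            survivors = sorted(cache, key=score_fn, reverse=True)[:keep]
            # Restore positional order before reasoning continues.
            cache = sorted(survivors)
    return cache, peak


# Example: a recency-based score (newer positions score higher).
final_cache, peak_size = reason_with_eviction(
    num_steps=64, evict_every=16, keep_fraction=0.5, score_fn=lambda pos: pos
)
```

With 64 steps, an eviction every 16 steps, and half the entries kept each time, the cache in this toy never exceeds 30 entries, compared with 64 if nothing were evicted. The recency score is only a stand-in for whatever eviction policy a model would learn.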
