Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping
Kaustubh Pethkar, Ziyang Xiong, Zuofeng Shang, Yingcong Li

TL;DR
This paper introduces a Markov matrix framework for LLM knowledge expansion, enabling efficient token addition with minimal updates and zero forgetting, validated by theoretical bounds and experiments.
Contribution
The paper proposes a Markov-process-based approach to knowledge integration in LLMs that reduces the need for large weight updates and prevents forgetting of existing knowledge.
Findings
Sample complexity scales linearly with mapped tokens
Embedding-tuning achieves zero forgetting
Method outperforms traditional parameter-update approaches
Abstract
Continual incorporation of new knowledge is essential for the long-term evolution of large language models (LLMs). Existing approaches typically rely on parameter-update algorithms to mitigate catastrophic forgetting, yet they suffer from fundamental limitations: 1) forgetting is unavoidable as the amount of newly injected knowledge grows; and 2) model updates are often irreversible. As modern LLMs become increasingly expressive, it is natural to question whether large-scale weight updates are necessary for acquiring a small amount of new knowledge. In this work, we propose a principled framework that models autoregressive language generation as a Markov process over tokens, where model memory is represented by a Markov transition matrix. Under this formulation, incorporating new knowledge/tokens corresponds to extending the state space, and preserving existing transitions guarantees…
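The state-space extension described in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm. It assumes a small row-stochastic matrix over existing tokens and a hypothetical weight vector `mix_weights` that expresses the new token as a convex mixture of existing tokens (one possible reading of "token-to-dictionary mapping"). The function name `extend_transition_matrix` is likewise made up for illustration. Existing rows are copied unchanged, so transitions among old tokens are preserved exactly.

```python
import numpy as np


def extend_transition_matrix(P, mix_weights):
    """Append one new state to a row-stochastic matrix (toy illustration).

    P           : (n, n) row-stochastic matrix over the existing tokens.
    mix_weights : (n,) non-negative weights summing to 1. These are
                  hypothetical "dictionary" coefficients for the new token,
                  an assumption of this sketch rather than the paper's method.
    """
    P = np.asarray(P, dtype=float)
    w = np.asarray(mix_weights, dtype=float)
    n = P.shape[0]

    if P.shape != (n, n) or w.shape != (n,):
        raise ValueError("P must be (n, n) and mix_weights must be (n,)")
    if (P < 0).any() or (w < 0).any():
        raise ValueError("probabilities must be non-negative")
    if not np.allclose(P.sum(axis=1), 1.0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("rows of P and mix_weights must each sum to 1")

    P_ext = np.zeros((n + 1, n + 1))
    # Existing transitions are copied verbatim; nothing learned earlier changes.
    P_ext[:n, :n] = P
    # The new token's outgoing row is a convex combination of existing rows.
    P_ext[n, :n] = w @ P
    # The new column stays zero: in this toy version no old token
    # transitions into the new one.
    return P_ext


P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
P_ext = extend_transition_matrix(P, mix_weights=[0.5, 0.5])
```

In this example the top-left 2x2 block of `P_ext` equals `P`, and the new row is an equal mixture of the two existing rows. Only the `n` mixture weights describe the new token, which gives a rough sense of why an update of this kind can be small compared with retraining the whole matrix.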
