Memory as a Markov Matrix: Sample Efficient Knowledge Expansion via Token-to-Dictionary Mapping

Kaustubh Pethkar; Ziyang Xiong; Zuofeng Shang; Yingcong Li

arXiv:2605.04308 · cs.LG · May 7, 2026

TL;DR

This paper introduces a Markov-matrix framework for LLM knowledge expansion that adds new tokens with minimal parameter updates and zero forgetting. The approach is supported by theoretical bounds and experiments.

Contribution

The paper proposes a Markov-process-based approach to integrating new knowledge into LLMs that avoids large weight updates and prevents forgetting.

Findings

01

Sample complexity scales linearly with the number of mapped tokens

02

Embedding-tuning achieves zero forgetting

03

The method outperforms traditional parameter-update approaches

Abstract

Continual incorporation of new knowledge is essential for the long-term evolution of large language models (LLMs). Existing approaches typically rely on parameter-update algorithms to mitigate catastrophic forgetting, yet they suffer from fundamental limitations: 1) forgetting is unavoidable as the amount of newly injected knowledge grows; and 2) model updates are often irreversible. As modern LLMs become increasingly expressive, it is natural to question whether large-scale weight updates are necessary for acquiring a small amount of new knowledge. In this work, we propose a principled framework that models autoregressive language generation as a Markov process over tokens, where model memory is represented by a Markov transition matrix. Under this formulation, incorporating new knowledge/tokens corresponds to extending the state space, and preserving existing transitions guarantees…
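
The abstract's core idea, that adding tokens extends the state space of a Markov transition matrix while existing transitions are preserved, can be illustrated with a toy example. The sketch below is not the authors' method; it assumes a small explicit row-stochastic matrix and a hypothetical helper, `extend_state_space`, that zero-pads the original rows so that every transition among the original tokens, including multi-step ones, is left exactly unchanged.

```python
from typing import List

Matrix = List[List[float]]


def is_row_stochastic(P: Matrix, tol: float = 1e-9) -> bool:
    """True if every row is a probability distribution over next tokens."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) < tol for row in P
    )


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product, used to compute multi-step transitions."""
    return [
        [sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def extend_state_space(P: Matrix, new_rows: Matrix) -> Matrix:
    """Append k new token states to an n-state transition matrix P.

    Hypothetical toy construction: the original rows are zero-padded, so
    transitions among the original n tokens are left exactly as they were.
    Each new row (length n + k) is the next-token distribution when the
    current token is one of the new tokens.
    """
    n, k = len(P), len(new_rows)
    extended = [list(row) + [0.0] * k for row in P]
    for row in new_rows:
        if len(row) != n + k:
            raise ValueError("each new row must have length n + k")
        extended.append(list(row))
    if not is_row_stochastic(extended):
        raise ValueError("extended matrix is not row-stochastic")
    return extended


# Usage: a 2-token "memory" extended with one hypothetical new token.
P = [[0.5, 0.5],
     [0.2, 0.8]]
P_ext = extend_state_space(P, [[0.1, 0.1, 0.8]])

# Two-step transitions among the original tokens are unchanged.
old_two_step = matmul(P, P)
new_two_step = matmul(P_ext, P_ext)
```

Zero padding is the simplest way to make "preserving existing transitions" exact, but it also means the original tokens never transition to the new one; in this toy, the new token can only enter through the context. The excerpt above does not say how the paper handles this trade-off.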
