Additive Multi-Step Markov Chains and the Curse of Dimensionality in Large Language Models
O.V. Usatenko, S.S. Melnyk, and G.M. Pritula

TL;DR
This paper explores N-order additive Markov chains as a theoretically feasible approximation of the high-dimensional dependencies in large language models, reducing the combinatorial explosion associated with high-order Markov processes.
Contribution
It establishes a theoretical correspondence between additive N-order Markov chains and chains with step-wise memory, and introduces the concept of information temperature for these models.
Findings
Established a correspondence between additive N-order and step-wise memory chains
Introduced the concept of information temperature for additive Markov chains
Provided a theoretical framework for approximating LLM dynamics
Abstract
Large-scale language models (LLMs) operate in extremely high-dimensional state spaces, where both token embeddings and their hidden representations create complex dependencies that are not easily reduced to classical Markov structures. In this paper, we explore a theoretically feasible approximation of LLM dynamics using N-order additive Markov chains. Such models allow the conditional probability of the next token to be decomposed into a superposition of contributions from multiple historical depths, reducing the combinatorial explosion typically associated with high-order Markov processes. The main result of the work is the establishment of a correspondence between an additive multi-step chain and a chain with a step-wise memory function. This equivalence allowed the introduction of the concept of information temperature not only for stepwise but also for additive N-order Markov…
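The decomposition described in the abstract can be illustrated with a minimal sketch for a binary alphabet. The functional form used here, P(next = 1 | history) = mean + sum over r of F(r) * (a[i-r] - mean), is an assumption borrowed from the general literature on binary additive chains and is not confirmed by this summary. The names `additive_next_prob`, `step_memory`, `generate`, and the strength parameter `mu` are hypothetical and chosen only for illustration.

```python
import random

def additive_next_prob(history, memory, mean=0.5):
    """P(next symbol = 1 | history) for a binary additive chain (assumed form)."""
    # history[-r] is the symbol r steps back; memory[r - 1] is its weight F(r).
    p = mean + sum(f * (history[-r] - mean) for r, f in enumerate(memory, start=1))
    return min(1.0, max(0.0, p))  # clipping guards against invalid weight choices

def step_memory(order, mu):
    """Step-wise memory: the same weight for each of the last `order` symbols (assumed scaling)."""
    return [2.0 * mu / order] * order

def generate(length, memory, mean=0.5, seed=0):
    """Sample a binary sequence whose next symbol depends additively on the last len(memory) symbols."""
    rng = random.Random(seed)
    # Seed the chain with independent symbols so every later step has a full history.
    seq = [int(rng.random() < mean) for _ in range(len(memory))]
    while len(seq) < length:
        seq.append(int(rng.random() < additive_next_prob(seq, memory, mean)))
    return seq

ORDER = 8
memory = step_memory(ORDER, mu=0.3)
sequence = generate(1000, memory)

full_table_size = 2 ** ORDER  # a general N-order binary chain needs one probability per history
additive_size = len(memory)   # the additive chain needs one weight per depth
```

Under these assumptions, a general chain of order 8 over a binary alphabet needs a table of 2^8 conditional probabilities, while the additive form needs only 8 weights. With a step-wise memory function, the next-symbol probability depends only on how many of the last N symbols equal 1, not on their order.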
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Speech Recognition and Synthesis
