Markovian Generation Chains in Large Language Models

Mingmeng Geng; Amr Mohamed; Guokan Shang; Michalis Vazirgiannis; Thierry Poibeau

arXiv:2603.11228·cs.CL·March 13, 2026

Open Access · 3 Reviews

TL;DR

This paper models the iterative text generation process of large language models as Markov chains, analyzing how repeated inference affects sentence diversity and convergence, with implications for multi-agent LLM systems.

Contribution

It introduces a Markovian framework for understanding iterative LLM outputs and analyzes how factors such as the temperature parameter and the initial input sentence influence diversity and convergence.

Findings

01

The iterative process can either converge to a small recurrent set or keep producing novel sentences over a finite horizon.

02

Temperature and initial input significantly affect diversity.

03

Sentence-level Markov chain analysis provides insight into the dynamics of iterative LLM inference.

Abstract

The widespread use of large language models (LLMs) raises an important question: how do texts evolve when they are repeatedly processed by LLMs? In this paper, we define this iterative inference process as Markovian generation chains, where each step takes a specific prompt template and the previous output as input, without including any prior memory. In iterative rephrasing and round-trip translation experiments, the output either converges to a small recurrent set or continues to produce novel sentences over a finite horizon. Through sentence-level Markov chain modeling and analysis of simulated data, we show that the iterative process can either increase or reduce sentence diversity depending on factors such as the temperature parameter and the initial input sentence. These results offer valuable insights into the dynamics of iterative LLM inference and their implications for multi-agent…
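The memoryless setup described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' code: it assumes a small finite state space in which integers stand in for sentences, and it replaces the LLM with fixed random logits (the helper names `make_logits`, `transition_probs`, and `run_chain` are hypothetical). It only shows how temperature might change the number of distinct states a chain visits, with temperature 0 treated as greedy decoding.

```python
import math
import random


def make_logits(n_states, seed=0):
    """Hypothetical stand-in for an LLM: fixed random logits per (current, next) pair."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_states)] for _ in range(n_states)]


def transition_probs(logits_row, temperature):
    """Temperature-scaled softmax; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        best = max(range(len(logits_row)), key=logits_row.__getitem__)
        return [1.0 if j == best else 0.0 for j in range(len(logits_row))]
    scaled = [x / temperature for x in logits_row]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def run_chain(logits, start, steps, temperature, seed=0):
    """Iterate the chain; each step sees only the previous state (no prior memory)."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(steps):
        probs = transition_probs(logits[state], temperature)
        state = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        path.append(state)
    return path


logits = make_logits(n_states=50)
greedy_path = run_chain(logits, start=0, steps=200, temperature=0)
sampled_path = run_chain(logits, start=0, steps=200, temperature=1.0)
print("unique states, greedy :", len(set(greedy_path)))
print("unique states, T = 1.0:", len(set(sampled_path)))
```

Under greedy decoding the toy chain is deterministic, so on a finite state space it must eventually enter a cycle, which mirrors the "small recurrent set" behaviour. With sampling, the number of distinct states visited depends on the temperature and the starting state. Real sentence spaces are far larger than 50 states, so this is only a qualitative illustration.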

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The paper carries out an extensive analysis of how the paraphrases (generated repeatedly by prompting the model on its previous outputs) compare with each other, covering measures such as the number of unique paraphrases and their textual similarity under automated metrics like BLEU and ROUGE, across multiple LLMs. However, the purpose of this analysis remains unclear to me.

Weaknesses

As mentioned in the summary section

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. **Novel and Relevant Problem:** The paper addresses a highly practical and under-studied question. While model collapse (iterative training) is well-researched, this focus on iterative inference chains mimics real-world scenarios where users repeatedly edit, translate, or rephrase content using LLMs.
2. **Strong Empirical Evidence:** The simulations are thorough, covering multiple models, different domains, and the two distinct tasks. The inclusion of Google Translate is also a particularly

Weaknesses

1. **"Diversity" vs. "Factual Drift"**: The paper primarily measures diversity as the "number of unique rephrasings". However, the provided example in Table 2 (Appendix) shows the Llama-3.1 model's output not just rephrasing "We begin with a prologue" but progressively adding new, unprompted information. The paper notes this as "information in the sentences having changed", but doesn't critically analyze this as a potential failure of semantic preservation (which the prompt explicitly requested

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. Clear presentation and solid structure: the manuscript is well written and easy to follow.
2. Systematic empirical setup: experiments cover multiple datasets (BookSum, ScriptBase, News2024) and several models (Llama-3.1-8B, Mistral-7B, Qwen-2.5-7B, GPT-4o-mini).
3. Sound use of Markov-chain formalism: modeling iterative generation as a Markov process is mathematically reasonable and consistent with existing literature (e.g., Zekri et al., 2024).

Weaknesses

1. Lack of novelty. The main idea, treating LLM iterative inference as a Markov process, has already appeared in prior studies such as Zekri et al. (2024) “Large Language Models as Markov Chains” and other works. The theoretical framing and entropy analysis largely restate standard Markov-chain properties without introducing new modeling insights or learning mechanisms.
2. Conclusions are largely descriptive/common-sense. The observed divergence under sampling and convergence under greedy decodin

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution