Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective

Qiyan Zhao; Xiaofeng Zhang; Shuochen Chang; Qianyu Chen; Xiaosong Yuan; Xuhang Chen; Luoqi Liu; Jiajun Zhang; Xu-Yao Zhang; Da-Han Wang

arXiv:2601.20520 · cs.CV · January 29, 2026

TL;DR

This paper investigates the causes of repetitive text in diffusion-based multimodal large language models by analyzing information flow, and proposes a method called CoTA to mitigate this issue and improve model performance.

Contribution

The paper introduces an information flow perspective to understand repetition in dMLLMs and presents CoTA, a novel plug-and-play approach to reduce repetition and enhance decoding quality.

Findings

01. Repetition is linked to disruptions in context token information flow.

02. Deeper layers show convergence of context token entropy, indicating prediction certainty.

03. CoTA effectively reduces repetition and improves task performance.

Abstract

Recent diffusion-based Multimodal Large Language Models (dMLLMs) suffer from high inference latency and therefore rely on caching techniques to accelerate decoding. However, cache mechanisms often introduce undesirable repetitive text generation, a phenomenon we term the Repeat Curse. To investigate the underlying mechanism behind this issue, we analyze repetitive generation through the lens of information flow. Our work reveals three key findings: (1) context tokens aggregate semantic information as anchors and guide the final predictions; (2) as information propagates across layers, the entropy of context tokens converges in deeper layers, reflecting the model's growing prediction certainty; (3) repetition is typically linked to disruptions in the information flow of context tokens and to the inability of their entropy to converge in deeper layers.…
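Finding (2) can be made concrete with a small numerical illustration. The sketch below is not the paper's measurement procedure, which this page does not specify. It assumes, purely for illustration, that each layer yields a vocabulary distribution per context token and that "entropy" means the Shannon entropy of that distribution. The function names (`token_entropy`, `layerwise_entropy`) and the toy logits are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def token_entropy(logits):
    # Shannon entropy (in nats) of the softmax distribution over the last axis.
    probs = softmax(logits)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def layerwise_entropy(per_layer_logits):
    # per_layer_logits: array of shape (num_layers, num_tokens, vocab_size).
    # Returns the mean entropy over tokens for each layer.
    return token_entropy(per_layer_logits).mean(axis=-1)

# Synthetic data (an assumption, not model output): the same base logits
# are scaled up with depth, so the distributions sharpen layer by layer.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 10))            # 4 context tokens, vocabulary of 10
scales = np.array([0.0, 1.0, 4.0, 16.0])   # one scale per toy "layer"
per_layer = scales[:, None, None] * base[None]

entropies = layerwise_entropy(per_layer)
print(entropies)  # starts at log(10) for the uniform layer and decreases with depth
```

In this toy setting, a sequence whose entropy fails to decrease across layers would correspond to the non-converging case that finding (3) associates with repetition.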

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning