TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models
Shreshth Saini, Avinab Saha, Balu Adsumilli, Neil Birkbeck, Yilin Wang, Alan C. Bovik

TL;DR
This paper introduces Backward-on-Entropy (BoE) Steering, a gradient-based inference method for Masked Diffusion Models (MDMs). It approximates the long-term effects of each decoding decision with a single backward pass, which improves generation coherence and scales better than search-based sampling.
Contribution
It proposes a novel gradient-guided inference framework built on the Token Influence Score (TIS) and ActiveQueryAttention, enabling scalable, trajectory-aware generation in MDMs.
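The abstract describes the Token Influence Score as coming from the gradient of future entropy with respect to input embeddings. The sketch below illustrates only that ingredient, on a hypothetical toy model (a weighted linear map followed by softmax), and uses the identity dH/dz_k = -p_k (log p_k + H) for a hand-written backward pass. The names `future_logits`, `entropy_grads` and `influence_scores`, the mixing weights `mix`, and the use of the gradient's L2 norm as the score are all assumptions made for illustration. They are not the paper's TIS definition or code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: vocab x dim


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Vector) -> float:
    """Shannon entropy in nats."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


def future_logits(embeddings: List[Vector], mix: Vector, W: Matrix) -> Vector:
    """Hypothetical toy model: logits at one still-masked position are
    z = sum_i mix[i] * (W @ e_i), standing in for a real MDM forward pass."""
    z = [0.0] * len(W)
    for a_i, e_i in zip(mix, embeddings):
        for k, row in enumerate(W):
            z[k] += a_i * sum(w * x for w, x in zip(row, e_i))
    return z


def future_entropy(embeddings: List[Vector], mix: Vector, W: Matrix) -> float:
    """Entropy of the toy model's prediction at the masked position."""
    return entropy(softmax(future_logits(embeddings, mix, W)))


def entropy_grads(embeddings: List[Vector], mix: Vector, W: Matrix) -> List[Vector]:
    """One analytic backward pass through the toy model:
    dH/dz_k = -p_k (log p_k + H), then dH/de_i = mix[i] * W^T (dH/dz)."""
    p = softmax(future_logits(embeddings, mix, W))
    h = entropy(p)
    g_z = [-pk * (math.log(pk) + h) if pk > 0.0 else 0.0 for pk in p]
    dim = len(embeddings[0])
    # W^T g_z is shared by all tokens in this linear toy; only mix[i] differs.
    wt_g = [sum(W[k][d] * g_z[k] for k in range(len(W))) for d in range(dim)]
    return [[a_i * v for v in wt_g] for a_i in mix]


def influence_scores(embeddings: List[Vector], mix: Vector, W: Matrix) -> List[float]:
    """Hypothetical per-token score: L2 norm of that token's entropy gradient."""
    return [math.sqrt(sum(g * g for g in grad)) for grad in entropy_grads(embeddings, mix, W)]


# Usage: 3 context tokens, 2-d embeddings, vocabulary of 3.
emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mix = [0.6, 0.3, 0.1]
W = [[1.0, -1.0], [0.5, 2.0], [-1.0, 0.0]]
scores = influence_scores(emb, mix, W)
print(scores)  # token 0, with the largest mixing weight, gets the largest score
```

Because the toy model is linear, every token's gradient points in the same direction and the score simply scales with `mix[i]`. In a real MDM the per-token gradients would differ and would typically come from one automatic-differentiation backward pass rather than a hand-derived formula.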
Findings
BoE scales more efficiently at inference time than existing methods.
Gradient guidance reduces incoherence in generated outputs.
The approach offers a mathematically principled solution for non-autoregressive generation.
Abstract
Masked Diffusion Models (MDMs) have emerged as a promising non-autoregressive paradigm for generative tasks, offering parallel decoding and bidirectional context utilization. However, current sampling methods rely on simple confidence-based heuristics that ignore the long-term impact of local decisions, leading to trajectory lock-in, where early hallucinations cascade into global incoherence. While search-based methods mitigate this, they incur prohibitive computational costs (multiple forward passes per step). In this work, we propose Backward-on-Entropy (BoE) Steering, a gradient-guided inference framework that approximates infinite-horizon lookahead via a single backward pass. We formally derive the Token Influence Score (TIS) from a first-order expansion of the trajectory cost functional, proving that the gradient of future entropy with respect to input embeddings serves as an optimal…
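The abstract contrasts BoE with samplers that rely on simple confidence-based heuristics. As a point of reference, here is a minimal sketch of one such generic rule: commit the top-k masked positions ranked by maximum probability. It is not taken from the paper or from any specific library. `MASK`, `confidence_unmask_step`, and the per-position `probs` input (assumed to come from one MDM forward pass) are hypothetical.

```python
from typing import List, Optional

MASK: Optional[int] = None  # hypothetical marker for a still-masked position


def argmax(dist: List[float]) -> int:
    """Index of the most probable token (first one on ties)."""
    return max(range(len(dist)), key=dist.__getitem__)


def confidence_unmask_step(
    tokens: List[Optional[int]], probs: List[List[float]], k: int
) -> List[Optional[int]]:
    """Commit the k masked positions whose top-1 probability is highest.

    Committed tokens are never revisited in this sketch, so an early
    confident mistake stays in the context for every later step.
    """
    candidates = []
    for pos, (tok, dist) in enumerate(zip(tokens, probs)):
        if tok is MASK:
            best = argmax(dist)
            candidates.append((dist[best], pos, best))
    # Stable sort by descending confidence; ties keep left-to-right order.
    candidates.sort(key=lambda c: -c[0])
    new_tokens = list(tokens)
    for _, pos, best in candidates[:k]:
        new_tokens[pos] = best
    return new_tokens


# Usage: 4 positions, vocabulary of 3, position 1 already decoded as token 2.
tokens = [MASK, 2, MASK, MASK]
probs = [
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.9, 0.05, 0.05],
    [0.4, 0.4, 0.2],
]
print(confidence_unmask_step(tokens, probs, k=1))  # [None, 2, 0, None]
print(confidence_unmask_step(tokens, probs, k=2))  # [1, 2, 0, None]
```

The rule only inspects each position's own distribution at the current step. This is the local view that, according to the abstract, BoE's gradient signal is meant to improve on.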
