TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models

Shreshth Saini; Avinab Saha; Balu Adsumilli; Neil Birkbeck; Yilin Wang; Alan C. Bovik

arXiv:2602.00250 · cs.LG · February 12, 2026

Open Access

TL;DR

This paper introduces Backward-on-Entropy (BoE) Steering, a gradient-guided inference method for Masked Diffusion Models that approximates the long-term effect of local decoding decisions with a single backward pass, aiming to improve generation coherence at lower cost than search-based methods.

Contribution

It proposes a novel gradient-guided inference framework with the Token Influence Score and ActiveQueryAttention for scalable, trajectory-aware generation in MDMs.

Findings

01

BoE outperforms existing methods on inference scalability.

02

Gradient guidance reduces incoherence in generated outputs.

03

The approach offers a mathematically principled solution for non-autoregressive generation.

Abstract

Masked Diffusion Models (MDMs) have emerged as a promising non-autoregressive paradigm for generative tasks, offering parallel decoding and bidirectional context utilization. However, current sampling methods rely on simple confidence-based heuristics that ignore the long-term impact of local decisions, leading to trajectory lock-in where early hallucinations cascade into global incoherence. While search-based methods mitigate this, they incur prohibitive computational costs ($O(K)$ forward passes per step). In this work, we propose Backward-on-Entropy (BoE) Steering, a gradient-guided inference framework that approximates infinite-horizon lookahead via a single backward pass. We formally derive the Token Influence Score (TIS) from a first-order expansion of the trajectory cost functional, proving that the gradient of future entropy with respect to input embeddings serves as an optimal…
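
The core quantity named in the abstract, the gradient of predictive entropy with respect to input embeddings, can be illustrated on a toy model. The sketch below is not the paper's implementation: the denoiser is a hypothetical stand-in (each position's embedding plus the sequence mean, followed by a linear readout `W`), the gradient is written out analytically in NumPy, and `first_order_scores` is only one plausible reading of a first-order influence score, since the abstract is truncated before the exact TIS definition. All function and variable names are assumptions made for illustration.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def entropy_and_grad(X, W, masked):
    """Total predictive entropy over masked positions and its gradient w.r.t. X.

    X: (n, d) input embeddings; W: (d, V) readout; masked: (n,) 0/1 floats.
    Toy denoiser (an assumption, not the paper's model): h_i = x_i + mean_j x_j.
    """
    hidden = X + X.mean(axis=0, keepdims=True)          # bidirectional mixing
    logp = log_softmax(hidden @ W)                      # (n, V)
    p = np.exp(logp)
    ent = -(p * logp).sum(axis=-1)                      # per-position entropy
    total = float((ent * masked).sum())

    # Backward pass, by hand: dH/dz_k = -p_k * (log p_k + H) for softmax entropy.
    g_logits = -p * (logp + ent[:, None]) * masked[:, None]
    g_hidden = g_logits @ W.T                           # (n, d)
    # Each x_j reaches every h_i through the mean, plus h_j directly.
    g_X = g_hidden + g_hidden.mean(axis=0, keepdims=True)
    return total, g_X

def first_order_scores(X, g_X, candidates):
    """First-order estimate of the entropy change if position i's embedding
    were replaced by candidates[i]. Hypothetical score, for illustration only."""
    return ((candidates - X) * g_X).sum(axis=-1)

rng = np.random.default_rng(0)
n, d, V = 6, 4, 10
X = rng.normal(size=(n, d))                  # current (partly masked) embeddings
W = rng.normal(size=(d, V))
masked = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 1.0])
candidates = rng.normal(size=(n, d))         # embeddings of proposed tokens

total, g_X = entropy_and_grad(X, W, masked)
scores = first_order_scores(X, g_X, candidates)
print("total masked entropy:", total)
print("per-position first-order scores:", scores)
```

One gradient evaluation yields a score for every position at once, which is the contrast the abstract draws with search-based methods that need a separate forward pass per candidate. A negative score here means the linearization predicts lower entropy at the remaining masked positions after committing that candidate.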

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques