Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models
Jai Sharma, Yifan Wang, Bryan Li

TL;DR
This paper introduces a neural method to estimate pairwise mutual information within masked diffusion models, enabling better interpretability and more efficient parallel decoding without sacrificing quality.
Contribution
It presents a neural estimator that predicts pairwise conditional mutual information directly from the hidden states of a pretrained masked diffusion model, which in turn enables MI-guided parallel decoding.
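To make "MI-guided parallel decoding" concrete, here is a minimal sketch of one way an estimated MI matrix could be used to pick positions to decode together. It is an illustration under assumptions, not the authors' algorithm: the function name select_parallel_subset, the greedy rule, and the threshold parameter are all hypothetical.

```python
import numpy as np

def select_parallel_subset(mi, candidates, threshold):
    """Greedily pick masked positions whose pairwise MI stays below a threshold.

    mi:         (n, n) symmetric array of estimated pairwise MI values.
    candidates: masked positions, in the order they should be considered
                (for example, most confident first). Ordering is an assumption.
    threshold:  hypothetical cutoff below which two positions are treated
                as approximately conditionally independent.
    """
    chosen = []
    for i in candidates:
        # Accept position i only if it is weakly coupled to everything chosen so far.
        if all(mi[i, j] <= threshold for j in chosen):
            chosen.append(i)
    return chosen

# Toy example: positions 0 and 1 are strongly dependent, position 2 is not.
mi = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
subset = select_parallel_subset(mi, candidates=[0, 1, 2], threshold=0.1)
```

In this toy case positions 0 and 2 would be decoded in the same pass, while position 1 waits for a later pass because it is strongly coupled to position 0.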
Findings
MI maps recover known structural constraints in Sudoku and protein sequences.
The method achieves a 3-5x reduction in inference passes compared to sequential decoding.
It outperforms entropy-based parallelization methods while maintaining generative quality.
Abstract
Understanding dependencies between variables is critical for interpretability and efficient generation in masked diffusion models (MDMs), yet these models primarily expose marginal conditional distributions and do not explicitly represent inter-variable dependence. We propose a neural framework for estimating pairwise conditional mutual information (MI) directly from the hidden states of a pretrained MDM, using ground-truth MI computed from the model's own conditional distributions for supervision. The resulting estimator captures the model's internal belief about dependency structure and predicts the full MI matrix in a single forward pass, enabling MI-guided parallel decoding by identifying conditionally independent subsets of variables. We evaluate our approach on Sudoku and protein sequence generation with ESM-C, where the MI maps recover known structural constraints and enable a…
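The supervision signal described above is pairwise MI computed from the model's own conditional distributions. The sketch below shows the standard MI formula applied to a joint table for two positions. It assumes the joint is assembled by the chain rule from p(x_i | context) and p(x_j | x_i, context); that construction and the helper names joint_from_conditionals and pairwise_mi are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def joint_from_conditionals(p_i, p_j_given_i):
    """Chain rule: p(x_i, x_j) = p(x_i) * p(x_j | x_i).

    p_i:          (K,) distribution over values of position i.
    p_j_given_i:  (K, K) array; row a is the distribution of position j
                  given that position i takes value a.
    """
    p_i = np.asarray(p_i, dtype=float)
    p_j_given_i = np.asarray(p_j_given_i, dtype=float)
    return p_i[:, None] * p_j_given_i

def pairwise_mi(joint):
    """Mutual information in nats of a 2-D joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()               # normalize defensively
    p_i = joint.sum(axis=1, keepdims=True)    # marginal of position i, shape (K, 1)
    p_j = joint.sum(axis=0, keepdims=True)    # marginal of position j, shape (1, K)
    mask = joint > 0                          # skip zero cells: 0 * log 0 = 0
    ratio = joint[mask] / (p_i @ p_j)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

# Independent positions: the conditional does not depend on x_i, so MI is 0.
independent = joint_from_conditionals([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]])

# Fully coupled positions: x_j always equals x_i, so MI equals log 2 nats.
coupled = joint_from_conditionals([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])

mi_independent = pairwise_mi(independent)
mi_coupled = pairwise_mi(coupled)
```

Computing this for every pair requires many conditional queries to the model, which is the cost a single-pass neural estimator is meant to avoid.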
