When Attention Collapses: Residual Evidence Modeling for Compositional Inference
Niklas Houba

TL;DR
This paper introduces residual evidence modeling with evidence depletion to prevent slot collapse in attention-based compositional inference, improving the separation of latent components in complex observations.
Contribution
The paper proposes a novel residual evidence modeling technique that addresses the structural failure mode of slot collapse in attention models under additive superposition.
Findings
Evidence depletion reduces slot collapse by up to an order of magnitude.
The method generalizes beyond synthetic benchmarks to real-world audio and gravitational-wave data.
Standard attention fails in additive superposition scenarios, while evidence depletion succeeds.
Abstract
Compositional inference - the decomposition of observations into an unknown number of latent components - is central to perception and scientific data analysis. Attention-based models perform well when components are approximately separable, as in object-centric vision. Under additive superposition, however - where multiple components contribute to every observation - we identify a structural failure mode we term slot collapse: multiple slots converge to the same dominant component while weaker ones remain unrepresented. We trace this to a general limitation: attention is memoryless with respect to explained evidence. All slots repeatedly operate on the same input without accounting for what has already been explained, so gradients are dominated by the strongest component, inducing shared fixed points across slots. As a result, attention fails to enforce non-redundant allocation under…
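The contrast the abstract draws, between slots that all read the same input and slots that account for what has already been explained, can be illustrated with a toy sketch. This is not the paper's architecture: it swaps learned soft attention for a greedy, matching-pursuit-style hard selection over a fixed set of unit-norm components. The function names (memoryless_slots, depleted_slots), the identity-matrix components, and the amplitudes are all hypothetical choices made for illustration.

```python
import numpy as np

def memoryless_slots(x, atoms, n_slots):
    """Every slot scores the same input x, so every slot picks the same atom."""
    scores = np.abs(atoms @ x)
    return [int(np.argmax(scores)) for _ in range(n_slots)]

def depleted_slots(x, atoms, n_slots):
    """Each slot scores the residual left after earlier slots' explanations."""
    residual = x.astype(float).copy()
    picks = []
    for _ in range(n_slots):
        scores = atoms @ residual          # correlation with unexplained evidence
        k = int(np.argmax(np.abs(scores)))
        picks.append(k)
        # Remove the explained evidence (assumes unit-norm rows in `atoms`).
        residual -= scores[k] * atoms[k]
    return picks

# Toy setup (assumption): four orthonormal components, one per row.
atoms = np.eye(4)
amplitudes = np.array([5.0, 2.0, 1.0, 0.0])  # one dominant, two weaker, one absent
x = amplitudes @ atoms                        # additive superposition

collapsed = memoryless_slots(x, atoms, 3)
separated = depleted_slots(x, atoms, 3)
print(collapsed)  # all three slots select the dominant component
print(separated)  # each slot selects a different component
```

In this toy, the memoryless variant assigns all three slots to the strongest component, mirroring slot collapse, while subtracting explained evidence lets the weaker components surface in later slots. How the paper implements depletion inside an attention model is not specified in the text above, so the sketch should be read only as an intuition aid.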
