TL;DR
Re-ttention introduces a highly sparse attention mechanism for visual generation that leverages temporal redundancy to significantly reduce computational complexity while maintaining high visual quality.
Contribution
The paper presents Re-ttention, a novel sparse attention method that reshapes attention scores based on prior distributions to preserve quality at extreme sparsity levels.
Findings
Re-ttention uses as few as 3.1% of tokens during inference.
Re-ttention outperforms existing sparse attention methods in visual quality.
The approach maintains high-quality visual generation with reduced computational cost.
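To put the 3.1% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes that "token usage" means the fraction of keys each query attends to, which this summary does not spell out. The function name `attention_pairs` and the sequence length are hypothetical and chosen only for illustration.

```python
def attention_pairs(num_tokens: int, density: float = 1.0) -> int:
    """Approximate number of query-key score entries when every query
    attends to a `density` fraction of the keys (assumed interpretation)."""
    keys_per_query = max(1, round(num_tokens * density))
    return num_tokens * keys_per_query


n = 10_000                              # hypothetical sequence length
dense = attention_pairs(n)              # full attention: n * n entries
sparse = attention_pairs(n, 0.031)      # 3.1% of keys per query

print(dense, sparse, sparse / dense)
```

Under this reading, the number of score entries shrinks in proportion to the density, while the quadratic dependence on sequence length remains. Actual speedups depend on kernel and mask overheads, which this count ignores.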
Abstract
Diffusion Transformers (DiT) have become the de facto model for generating high-quality visual content like videos and images. A major bottleneck is the attention mechanism, whose complexity scales quadratically with resolution and video length. One logical way to lessen this burden is sparse attention, where only a subset of tokens or patches is included in the calculation. However, existing techniques fail to preserve visual quality at extremely high sparsity levels and might even incur non-negligible compute overheads. To address this concern, we propose Re-ttention, which implements attention at very high sparsity for visual generation models by leveraging the temporal redundancy of Diffusion Models to overcome the probabilistic normalization shift within the attention mechanism. Specifically, Re-ttention reshapes attention scores based on the prior softmax distribution history in order…
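The "normalization shift" mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the paper's algorithm, only a toy demonstration under stated assumptions: random scores for a single query, a hypothetical top-k mask, and a `retained_mass` value computed here from the full softmax purely for illustration. A sparse method would not have that value available and would need to estimate it, for example from an earlier denoising step, which is the role the abstract assigns to the prior softmax distribution history.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


rng = np.random.default_rng(0)
scores = rng.normal(size=64)        # toy pre-softmax scores for one query
keep = np.argsort(scores)[-8:]      # hypothetical sparse mask: 8 largest scores

full = softmax(scores)              # dense attention weights
sparse = softmax(scores[keep])      # weights when only kept keys are normalized

# Dropping keys shrinks the softmax denominator, so kept weights are inflated.
retained_mass = full[keep].sum()    # share of the dense softmax mass on kept keys

# Rescaling by the retained mass undoes the shift on the kept keys.
# Assumption: the mass is known here only because the dense softmax was computed.
rescaled = sparse * retained_mass

print(np.allclose(rescaled, full[keep]))
```

The rescaling works because the sparse and dense weights on a kept key differ only by the ratio of the two denominators, which equals `retained_mass`.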
Taxonomy
Methods: Attention Is All You Need · Softmax · Diffusion
