AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Adam Cole, Mick Grierson

TL;DR
AttentionBender is a tool that manipulates cross-attention in Video Diffusion Transformers to help artists understand and creatively control the generative process, revealing complex entanglements and enabling novel visual effects.
Contribution
AttentionBender introduces a method for visualizing and controlling cross-attention in video diffusion models, combining explainability with artistic exploration.
Findings
Cross-attention manipulation often causes distributed distortions and glitches.
Cross-attention is highly entangled: targeted edits to attention maps resist clean, localized control.
The tool enables both understanding of transformer mechanisms and creative aesthetic generation.
Abstract
We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics over…
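To make the idea of applying a 2D transform to a cross-attention map concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a single frame whose H*W spatial queries attend over a handful of text tokens, that the edit is applied to post-softmax weights, and that rows are renormalized afterwards; the abstract specifies none of these details. The function name `bend_attention` and its parameters are hypothetical.

```python
import numpy as np

def bend_attention(attn, token_idx, grid_hw, op="translate", **kw):
    """Apply a 2D transform to one text token's cross-attention map.

    attn:      (H*W, num_text_tokens) post-softmax weights, rows sum to 1 (assumed layout).
    token_idx: column (text token) whose spatial map is transformed.
    grid_hw:   (H, W) layout of the spatial queries for one frame.
    """
    h, w = grid_hw
    out = attn.copy()
    amap = out[:, token_idx].reshape(h, w)  # view the token's column as an image

    if op == "translate":
        # Circular shift by (dy, dx) grid cells.
        amap = np.roll(amap, shift=(kw.get("dy", 0), kw.get("dx", 0)), axis=(0, 1))
    elif op == "rotate90":
        if h != w:
            raise ValueError("rotate90 needs a square grid in this sketch")
        amap = np.rot90(amap, k=kw.get("k", 1))
    elif op == "scale":
        # Nearest-neighbour zoom about the grid centre.
        s = kw.get("factor", 2.0)
        ys = np.clip(np.round((np.arange(h) - (h - 1) / 2) / s + (h - 1) / 2), 0, h - 1).astype(int)
        xs = np.clip(np.round((np.arange(w) - (w - 1) / 2) / s + (w - 1) / 2), 0, w - 1).astype(int)
        amap = amap[np.ix_(ys, xs)]
    else:
        raise ValueError(f"unknown op: {op}")

    out[:, token_idx] = amap.reshape(-1)
    # Renormalize so each spatial query still distributes a total weight of 1.
    out /= out.sum(axis=1, keepdims=True)
    return out

# Toy usage: a 4x4 grid of queries attending over 3 text tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 3))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
bent = bend_attention(attn, token_idx=1, grid_hw=(4, 4), op="translate", dx=1)
```

Because each row is renormalized, changing one token's map also rescales the weights of every other token at the same positions. Even in this toy setting an edit aimed at one token is not isolated from the rest, which is at least consistent with the entanglement the abstract reports.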
