SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds
Viktor Stein, Wuchen Li, Gabriele Steidl

TL;DR
This paper introduces accelerated attention blocks inspired by inertial dynamics on density manifolds, leading to faster convergence in transformer models while maintaining computational efficiency.
Contribution
It extends the particle system interpretation of attention by incorporating inertial Nesterov-type dynamics, resulting in Hamiltonian momentum attention blocks with improved convergence.
Findings
Accelerated attention blocks converge faster than classical ones.
The proposed method preserves elliptically contoured distributions.
Particle-based algorithms demonstrate improved efficiency.
Abstract
Transformers owe much of their empirical success in natural language processing to the self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean-field limits correspond to gradient flows of interaction energy functionals on probability density spaces equipped with Wasserstein-type metrics. We extend this viewpoint by introducing accelerated attention blocks derived from inertial Nesterov-type dynamics on density spaces. In our proposed architecture, tokens carry both spatial (feature) and velocity variables. The time discretization and the approximation of accelerated density dynamics yield Hamiltonian momentum attention blocks, which constitute the proposed accelerated attention architectures. In particular, for linear self-attention, we show that the attention blocks approximate a Stein variational gradient flow, using a…
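To make the idea of tokens carrying both a feature and a velocity variable concrete, here is a minimal NumPy sketch. It is not the paper's scheme: it assumes identity query, key, and value matrices, treats the softmax attention output as the driving term, and uses a semi-implicit Euler step with linear damping as one common way to discretize damped inertial dynamics. The names `softmax_attention`, `classical_step`, `momentum_step`, `step`, and `damping` are hypothetical and chosen for illustration only.

```python
import numpy as np


def softmax_attention(x, beta=1.0):
    """Self-attention on tokens x of shape (n_tokens, dim).

    Assumption: query, key, and value matrices are the identity.
    """
    scores = beta * (x @ x.T)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ x


def classical_step(x, step=0.1):
    """Residual attention update: tokens move directly along the attention output."""
    return x + step * softmax_attention(x)


def momentum_step(x, v, step=0.1, damping=0.5):
    """One inertial update on (feature, velocity) pairs.

    Assumption: semi-implicit Euler. The velocity is damped and pushed by the
    attention output first, then the features move along the new velocity.
    """
    v_new = (1.0 - step * damping) * v + step * softmax_attention(x)
    x_new = x + step * v_new
    return x_new, v_new


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 3))   # 5 tokens with 3 features each
    v = np.zeros_like(x)          # tokens start at rest
    for _ in range(10):
        x, v = momentum_step(x, v)
    print(x.shape, v.shape)
```

Stacking `momentum_step` calls plays the role of stacking layers: the velocity accumulates across layers, whereas `classical_step` uses only the current attention output.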
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
