Fractional neural attention for efficient multiscale sequence processing
Cheng Kevin Qu, Andrew Ly, Pulin Gong

TL;DR
This paper introduces Fractional Neural Attention (FNA), a neuroscience-inspired multiscale attention mechanism based on fractional diffusion, enhancing Transformer efficiency and interpretability across NLP, image processing, and translation tasks.
Contribution
FNA models token interactions via Lévy diffusion governed by the fractional Laplacian, offering a novel, biologically grounded approach to multiscale information processing in neural networks.
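To give a feel for why a Lévy-type kernel mixes information across scales, the following is a minimal sketch that contrasts a Gaussian (short-range) weighting with a heavy-tailed power-law weighting over token positions. It is an illustration under stated assumptions, not the paper's formulation: the function names, the `(1 + r/scale)^-(d + alpha)` kernel form, and the toy 1-D token positions are all hypothetical choices, picked only because a power-law tail of order `d + alpha` is the characteristic jump behavior associated with the fractional Laplacian.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def row_normalize(matrix):
    """Scale each row so it sums to 1, like attention weights."""
    return [[v / sum(row) for v in row] for row in matrix]

def gaussian_weights(points, scale=1.0):
    """Short-range baseline: weights decay like exp(-r^2 / (2 * scale^2))."""
    dists = pairwise_distances(points)
    return row_normalize(
        [[math.exp(-((r / scale) ** 2) / 2) for r in row] for row in dists]
    )

def power_law_weights(points, alpha=1.0, scale=1.0):
    """Heavy-tailed weights (assumed form): decay like (1 + r/scale)^-(d + alpha)."""
    dim = len(points[0])
    dists = pairwise_distances(points)
    return row_normalize(
        [[(1 + r / scale) ** -(dim + alpha) for r in row] for row in dists]
    )

# Toy example: eight tokens placed on a line at positions 0..7 (hypothetical).
tokens = [(float(i),) for i in range(8)]
gauss = gaussian_weights(tokens)
levy = power_law_weights(tokens, alpha=1.0)

# Weight the first token assigns to the most distant token under each kernel.
print("gaussian  :", gauss[0][7])
print("power law :", levy[0][7])
```

Under the power-law kernel the first token still assigns a non-negligible weight to the farthest token, while the Gaussian weight is vanishingly small. That is the qualitative sense in which a single heavy-tailed kernel can carry both short- and long-range interactions.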
Findings
FNA achieves competitive text classification with a single layer and head.
FNA improves performance in image processing tasks.
FNA enables dimensionality reduction of weights while preserving structure.
Abstract
Attention mechanisms underpin the computational power of Transformer models, which have achieved remarkable success across diverse domains. Yet understanding and extending the principles underlying self-attention remains a key challenge for advancing artificial intelligence. Drawing inspiration from the multiscale dynamics of biological attention and from dynamical systems theory, we introduce Fractional Neural Attention (FNA), a principled, neuroscience-inspired framework for multiscale information processing. FNA models token interactions through Lévy diffusion governed by the fractional Laplacian, intrinsically realizing simultaneous short- and long-range dependencies across multiple scales. This mechanism yields greater expressivity and faster information mixing, advancing the foundational capacity of Transformers. Theoretically, we show that FNA's dynamics are governed by the…
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Neural dynamics and brain function · Neural Networks and Applications
