
TL;DR
Linear Diffusion Networks (LDNs) model sequential data as a diffusion process, which allows full parallelization across time steps and multi-scale temporal representations. The paper reports competitive performance on ImageNet and LRA benchmark tasks.
Contribution
The paper proposes a novel diffusion-inspired architecture for sequence modeling that combines adaptive diffusion modules with localized nonlinear updates and attention mechanisms.
Findings
Competitive performance on ImageNet and LRA tasks
Supports full parallelization across time steps
Provides robust multi-scale temporal representations
Abstract
We present Linear Diffusion Networks (LDNs), a novel architecture that reinterprets sequential data processing as a unified diffusion process. Our model integrates adaptive diffusion modules with localized nonlinear updates and a diffusion-inspired attention mechanism. This design enables efficient global information propagation while preserving fine-grained temporal details. LDN overcomes the limitations of conventional recurrent and transformer models by allowing full parallelization across time steps and supporting robust multi-scale temporal representations. Experiments on benchmark sequence modeling tasks demonstrate that LDN delivers competitive performance across ImageNet and LRA tasks.
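The abstract does not give the update equations, so the following is only a minimal sketch of the general idea of treating a sequence as a diffusion process, not the paper's method. It assumes a plain explicit diffusion step (a discrete Laplacian along the time axis) with zero-flux boundaries; the names `diffusion_step`, `diffuse`, `alpha`, and `steps` are hypothetical. It illustrates why such an update is parallel across time steps: every position is updated from its neighbours in one vectorized operation, with no sequential recurrence.

```python
import numpy as np

def diffusion_step(h, alpha=0.25):
    """One explicit diffusion step along the time axis (illustrative only).

    h: array of shape (T, D), one hidden vector per time step.
    alpha: hypothetical diffusion rate; 0 < alpha <= 0.5 keeps the step stable.
    """
    # Repeat the first and last states so the boundaries exchange no flux.
    padded = np.pad(h, ((1, 1), (0, 0)), mode="edge")
    # Discrete Laplacian over time, h[t-1] - 2*h[t] + h[t+1], for all t at once.
    laplacian = padded[:-2] - 2.0 * h + padded[2:]
    return h + alpha * laplacian

def diffuse(h, steps=4, alpha=0.25):
    """Apply several diffusion steps; more steps mix over a longer time span."""
    for _ in range(steps):
        h = diffusion_step(h, alpha)
    return h

# A single impulse at t = 4 spreads to neighbouring time steps.
h0 = np.zeros((8, 1))
h0[4, 0] = 1.0
h3 = diffuse(h0, steps=3)
```

In this sketch, running `diffuse` with different `steps` values gives representations smoothed over different temporal ranges, which is one simple way to picture the multi-scale behaviour the abstract describes. The adaptive diffusion modules, nonlinear updates, and attention mechanism named in the abstract are not modelled here.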
Taxonomy
Topics: Semantic Web and Ontologies
Methods: Softmax · Attention Is All You Need · Diffusion
