Linear-Time Global Visual Modeling without Explicit Attention

Ruize He; Dongchen Han; Gao Huang

arXiv:2605.01711·cs.CV·May 7, 2026

TL;DR

This paper proposes a novel approach to global sequence modeling that replaces explicit attention with dynamically predicted parameters, achieving Transformer-level performance with linear complexity.

Contribution

It introduces a dynamic parameterization method that models global context without explicit attention, enabling efficient linear-time sequence modeling.

Findings

01 Dynamic parameterization can replace explicit attention in vision models.

02 The proposed method achieves performance comparable to Transformers.

03 Code is available at https://github.com/LeapLabTHU/WeightFormer.

Abstract

Existing research largely attributes the global sequence modeling capability of Transformers to the explicit computation of attention weights, a process that inherently incurs quadratic computational complexity. In this work, we offer a novel perspective: we demonstrate that attention can be mathematically reframed as a Multi-Layer Perceptron (MLP) equipped with dynamically predicted parameters. Through this lens, we explain attention's global modeling power not as explicit token-wise aggregation, but as an implicit process in which dynamically generated parameters act as a compressed representation of the global context. Inspired by this insight, we investigate a fundamental question: can we achieve Transformer-level global sequence modeling entirely through dynamic parameterization while maintaining linear complexity, effectively replacing explicit attention? To explore this, we design…
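The reframing described in the abstract can be illustrated with a small sketch. It assumes one common reading of the claim, which may differ from the paper's exact formulation: for a single query q, softmax attention over keys K and values V equals a two-layer MLP whose first-layer weights are K / sqrt(d), whose activation is softmax, and whose second-layer weights are V transposed. The function names (attention, dynamic_mlp) and the toy numbers are illustrative assumptions, and the paper's linear-time design is not reproduced here.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(W, x):
    # Matrix-vector product; W is a list of rows, each as long as x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention(q, K, V):
    # Standard scaled dot-product attention for one query over n tokens.
    d = len(q)
    scores = [s / math.sqrt(d) for s in linear(K, q)]
    weights = softmax(scores)
    # Weighted sum of the value rows.
    return [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]

def dynamic_mlp(x, W1, W2):
    # Two-layer MLP with a softmax activation. W1 and W2 are supplied per
    # input sequence instead of being fixed learned weights (assumed reading).
    return linear(W2, softmax(linear(W1, x)))

# Toy sequence: n = 3 tokens, key dim d = 2, value dim 2 (made-up numbers).
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q = [0.5, -0.25]

# "Dynamic" parameters: in this sketch they are derived directly from K and V.
scale = math.sqrt(len(q))
W1 = [[k / scale for k in row] for row in K]  # shape n x d
W2 = [list(col) for col in zip(*V)]           # shape d_v x n (V transposed)

out_attn = attention(q, K, V)
out_mlp = dynamic_mlp(q, W1, W2)
```

Both calls return the same vector up to floating-point rounding. Note that the hidden layer here has width n, so this literal form still costs quadratic time over all queries; it only illustrates the equivalence, not a linear-time method.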

Code & Models

Repositories

LeapLabTHU/WeightFormer (GitHub): https://github.com/LeapLabTHU/WeightFormer
