ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Bencheng Liao; Xinggang Wang; Lianghui Zhu; Qian Zhang; Chang Huang

arXiv:2405.18425 · cs.CV · May 30, 2024

TL;DR

ViG brings Gated Linear Attention (GLA) to vision models. It matches Vision Transformer accuracy with fewer parameters and FLOPs, and runs faster across image resolutions.

Contribution

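The paper adapts Gated Linear Attention to vision with direction-wise gating and a 2D gating locality injection, and adds a hardware-aware implementation that merges forward and backward scanning into a single kernel, improving speed and memory use while maintaining accuracy.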

Findings

01

ViG-S matches DeiT-B accuracy with 73% fewer parameters.

02

At 1024x1024 resolution, ViG-T achieves 20.7% higher top-1 accuracy than DeiT-T.

03

ViG-S runs 2x faster than DeiT-B on 224x224 images, and ViG-T runs 4.8x faster than DeiT-T at 1024x1024.

Abstract

Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory. However, their advantage in terms of actual runtime speed is not significant. To address this issue, we introduce Gated Linear Attention (GLA) for vision, leveraging its superior hardware-awareness and efficiency. We propose direction-wise gating to capture 1D global context through bidirectional modeling and a 2D gating locality injection to adaptively inject 2D local details into 1D global context. Our hardware-aware implementation further merges forward and backward scanning into a single kernel, enhancing parallelism and reducing memory cost and latency. The proposed model, ViG, offers a favorable trade-off in accuracy, parameters, and FLOPs on ImageNet and downstream tasks,…
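The recurrence behind gated linear attention can be sketched in a few lines. The following is a minimal NumPy illustration, not the ViG implementation: it assumes the generic GLA form S_t = diag(alpha_t) S_{t-1} + k_t^T v_t with output o_t = q_t S_t, and it models the bidirectional scan by running that recurrence once in each direction with a separate gate per direction and summing the results. The function names (gla_scan, bidirectional_gla) are hypothetical, and the 2D gating locality injection and the fused single-kernel implementation described in the abstract are not modeled here.

```python
import numpy as np

def gla_scan(q, k, v, alpha):
    """Recurrent gated linear attention over a 1D token sequence.

    q, k, alpha: arrays of shape (T, d_k); v: array of shape (T, d_v).
    alpha holds per-token, per-channel forget gates, normally in (0, 1).
    Cost is linear in the sequence length T.
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))          # running key-value memory
    out = np.empty((T, d_v))
    for t in range(T):
        # Decay the memory channel-wise, then add the current key-value outer product.
        S = alpha[t][:, None] * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S             # read the memory with the query
    return out

def bidirectional_gla(q, k, v, alpha_fwd, alpha_bwd):
    """Assumed simplification: sum of a forward and a backward gated scan.

    Each direction has its own gate. Both scans include the current token,
    so its contribution is counted twice in this sketch.
    """
    fwd = gla_scan(q, k, v, alpha_fwd)
    bwd = gla_scan(q[::-1], k[::-1], v[::-1], alpha_bwd[::-1])[::-1]
    return fwd + bwd
```

With all gates set to 1 the forward scan reduces to ordinary causal linear attention, and with all gates set to 0 each token attends only to itself, which makes the role of the gate as a learned, data-dependent decay easy to see.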

Code & Models

Repositories

hustvl/vig
PyTorch · Official

Models

🤗 hustvl/ViG (model)

Videos

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections