EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

TL;DR
EfficientViT introduces a multi-scale linear attention mechanism for high-resolution dense prediction, achieving global receptive fields and multi-scale learning with lightweight operations, resulting in significant speedups across various hardware platforms.
Contribution
The paper proposes a multi-scale linear attention module for high-resolution vision models. It replaces heavy softmax attention and large-kernel convolution with lightweight, hardware-efficient operations, cutting latency without sacrificing accuracy.
Findings
Up to 13.9x GPU latency reduction over SegFormer on Cityscapes, without performance loss
Up to 6.4x speedup over Restormer in super-resolution
48.9x higher throughput on A100 GPU for Segment Anything
Abstract
High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous…
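To make the contrast with softmax attention concrete, here is a minimal sketch of linear attention. It assumes a ReLU feature map as the similarity kernel, which the excerpt above does not state, and it covers only the linear attention part, not the multi-scale aggregation. The function names are hypothetical, and plain Python lists stand in for tensors so the example has no dependencies. The point it illustrates is that, once the softmax is replaced by a kernel, the matrix products can be reassociated so that cost grows linearly with the number of tokens rather than quadratically.

```python
# Sketch of linear attention with a ReLU feature map (an assumption; the
# excerpt does not name the kernel). Not the authors' implementation.

def relu(vec):
    return [max(0.0, x) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V, eps=1e-6):
    """Reference form: every query scores every key, O(N^2 * d) time."""
    d_v = len(V[0])
    out = []
    for q in Q:
        weights = [dot(relu(q), relu(k)) for k in K]
        norm = sum(weights) + eps
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / norm
                    for c in range(d_v)])
    return out

def linear_attention(Q, K, V, eps=1e-6):
    """Same output, reassociated: summarize K and V once, O(N * d^2) time."""
    d_k, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j relu(k_j)^T v_j
    k_sum = [0.0] * d_k                     # sum_j relu(k_j)
    for k, v in zip(K, V):
        rk = relu(k)
        for a in range(d_k):
            k_sum[a] += rk[a]
            for c in range(d_v):
                kv[a][c] += rk[a] * v[c]
    out = []
    for q in Q:
        rq = relu(q)
        norm = dot(rq, k_sum) + eps
        out.append([sum(rq[a] * kv[a][c] for a in range(d_k)) / norm
                    for c in range(d_v)])
    return out

if __name__ == "__main__":
    # Three toy tokens with 2-dim queries/keys and 3-dim values.
    Q = [[1.0, -2.0], [0.5, 0.3], [-1.0, 2.0]]
    K = [[0.2, 1.0], [1.5, -0.5], [0.3, 0.3]]
    V = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0], [3.0, 3.0, 3.0]]
    slow = quadratic_attention(Q, K, V)
    fast = linear_attention(Q, K, V)
    gap = max(abs(a - b) for r1, r2 in zip(slow, fast) for a, b in zip(r1, r2))
    print("max difference between the two forms:", gap)
```

The two functions compute the same quantity. The second never materializes the token-by-token similarity matrix, which is what matters at high resolution, where the token count is large.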
Code & Models
- 🤗 timm/efficientvit_b0.r224_in1k · model · 2.7k downloads · 5 likes
- 🤗 timm/efficientvit_b1.r224_in1k · model · 2.2k downloads
- 🤗 timm/efficientvit_b1.r256_in1k · model · 700 downloads
- 🤗 timm/efficientvit_b1.r288_in1k · model · 1.1k downloads
- 🤗 timm/efficientvit_b2.r224_in1k · model · 14k downloads
- 🤗 timm/efficientvit_b2.r256_in1k · model · 544 downloads
- 🤗 timm/efficientvit_b2.r288_in1k · model · 261 downloads
- 🤗 timm/efficientvit_b3.r224_in1k · model · 8.9k downloads
- 🤗 timm/efficientvit_b3.r256_in1k · model · 118 downloads · 1 like
- 🤗 timm/efficientvit_b3.r288_in1k · model · 119 downloads
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Linear Layer · Mix-FFN · Position-Wise Feed-Forward Layer · Pointwise Convolution · Batch Normalization
