EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

Han Cai; Junyan Li; Muyan Hu; Chuang Gan; Song Han

arXiv:2205.14756 · cs.CV · February 7, 2024 · 35 citations

TL;DR

EfficientViT introduces multi-scale linear attention for high-resolution dense prediction. It achieves a global receptive field and multi-scale learning using only lightweight, hardware-efficient operations, which yields significant speedups across a range of hardware platforms.

Contribution

The paper proposes a novel multi-scale linear attention method for high-resolution vision models that improves efficiency over existing methods without sacrificing accuracy.

Findings

01. Up to 13.9x GPU latency reduction on Cityscapes
02. 6.4x speedup in super-resolution tasks
03. 48.9x higher throughput on A100 GPU

Abstract

High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous…
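
To illustrate why linear attention is lighter than the softmax attention the abstract contrasts it with, here is a minimal NumPy sketch. It assumes a ReLU feature map as the similarity kernel, which the excerpt above does not specify, and it covers only the linear-attention step, not the multi-scale token aggregation. The function names (`relu_linear_attention`, `relu_quadratic_reference`) and the `eps` parameter are illustrative and are not taken from the authors' code.

```python
import numpy as np


def relu_quadratic_reference(q, k, v, eps=1e-6):
    """Reference form: builds the full (n, n) similarity matrix, O(n^2) in tokens."""
    q = np.maximum(q, 0.0)  # assumed ReLU feature map on queries
    k = np.maximum(k, 0.0)  # assumed ReLU feature map on keys
    sim = q @ k.T                                   # (n, n) pairwise similarities
    norm = sim.sum(axis=-1, keepdims=True) + eps    # (n, 1) row normalizer
    return (sim @ v) / norm                         # (n, d_v)


def relu_linear_attention(q, k, v, eps=1e-6):
    """Same result, reordered with matrix associativity so no (n, n) matrix is built."""
    q = np.maximum(q, 0.0)
    k = np.maximum(k, 0.0)
    kv = k.T @ v                 # (d, d_v) summary shared by every query
    k_sum = k.sum(axis=0)        # (d,) summed keys, used for normalization
    numerator = q @ kv           # (n, d_v)
    denominator = q @ k_sum + eps  # (n,)
    return numerator / denominator[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, d_v = 64, 16, 16  # hypothetical token count and channel sizes
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    v = rng.standard_normal((n, d_v))
    same = np.allclose(relu_linear_attention(q, k, v),
                       relu_quadratic_reference(q, k, v))
    print("linear and quadratic forms agree:", same)
```

Because `k.T @ v` is computed once and shared by all queries, the cost grows linearly with the number of tokens n rather than quadratically, which matters at high resolution where n is large.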

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Linear Layer · Mix-FFN · Position-Wise Feed-Forward Layer · Pointwise Convolution · Batch Normalization