ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

Zhengqing Yuan; Rong Zhou; Hongyi Wang; Lifang He; Yanfang Ye; Lichao Sun

arXiv:2406.18051 · cs.CV · June 27, 2024


TL;DR

This paper presents ViT-1.58b, a highly quantized Vision Transformer model that drastically reduces memory and computation requirements while maintaining competitive accuracy, enabling more sustainable AI deployment.

Contribution

Introduction of ViT-1.58b, a 1.58-bit quantized ViT model using ternary weights and 8-bit activations, balancing efficiency and accuracy for resource-constrained environments.
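The "1.58-bit" label follows from the number of possible weight values. A ternary weight takes one of three values, so it carries log2(3) ≈ 1.58 bits of information. The sketch below makes the memory arithmetic concrete. The 86M parameter count (roughly ViT-Base scale) and the 2-bit packing scheme are illustrative assumptions, not figures reported in the paper.

```python
import math

# A ternary weight is one of {-1, 0, 1}: log2(3) bits of information per weight.
BITS_PER_TERNARY = math.log2(3)


def weight_memory_mib(num_params: int, bits_per_param: float) -> float:
    """Storage needed for num_params weights at the given bit width, in MiB."""
    return num_params * bits_per_param / 8 / 2**20


def compare(num_params: int) -> dict:
    """Weight storage under full precision vs. ternary encodings."""
    return {
        "fp32": weight_memory_mib(num_params, 32),
        # Information-theoretic lower bound for ternary weights.
        "ternary_ideal": weight_memory_mib(num_params, BITS_PER_TERNARY),
        # Hypothetical practical layout: 2 bits per weight, 4 weights per byte.
        "ternary_2bit_packed": weight_memory_mib(num_params, 2),
    }


if __name__ == "__main__":
    n = 86_000_000  # hypothetical parameter count, roughly ViT-Base scale
    for name, mib in compare(n).items():
        print(f"{name:>20}: {mib:8.1f} MiB")
```

Even with simple 2-bit packing, weight storage shrinks 16x relative to fp32; activations, kept at 8 bits, are not counted here.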

Findings

1. Maintains comparable accuracy to full-precision ViT on CIFAR-10 and ImageNet-1k.
2. Significantly reduces memory usage and computational costs.
3. Demonstrates the effectiveness of extreme quantization in practical vision tasks.

Abstract

Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to drastically reduce memory and computational overhead while preserving competitive performance. ViT-1.58b employs ternary quantization, which refines the balance between efficiency and accuracy by constraining weights to {-1, 0, 1} and quantizing activations to 8-bit precision. Our approach ensures efficient scaling in terms of both memory and computation. Experiments on CIFAR-10 and ImageNet-1k demonstrate that ViT-1.58b maintains comparable accuracy to full-precision ViT, with significant…
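The abstract states the two constraints (weights in {-1, 0, 1}, 8-bit activations) but not the exact quantizer. The following is a minimal sketch, assuming the absmean weight rule and per-tensor absmax activation scaling used by BitNet b1.58. ViT-1.58b's actual scaling granularity, rounding, and training procedure (for example, straight-through gradients) may differ. The function names are hypothetical, and plain Python is used for readability even though the official repository is written in PyTorch.

```python
def ternary_quantize(matrix, eps=1e-5):
    """Map every weight to {-1, 0, 1} with one per-tensor absmean scale (assumed rule)."""
    flat = [w for row in matrix for w in row]
    gamma = sum(abs(w) for w in flat) / len(flat)
    # Scale by the mean magnitude, round, then clip to the ternary range.
    q = [[max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in matrix]
    return q, gamma


def int8_quantize(acts, eps=1e-5):
    """Symmetric per-tensor absmax quantization of activations to [-127, 127]."""
    scale = 127 / (max(abs(a) for a in acts) + eps)
    q = [max(-127, min(127, round(a * scale))) for a in acts]
    return q, scale


def ternary_linear(x, matrix):
    """Approximate y = W x using ternary weights and 8-bit activations."""
    wq, gamma = ternary_quantize(matrix)
    xq, x_scale = int8_quantize(x)
    out = []
    for row in wq:
        # Ternary weights turn the dot product into integer adds and subtracts:
        # +x where w = 1, -x where w = -1, skipped where w = 0.
        acc = sum(xi if wi == 1 else -xi if wi == -1 else 0 for wi, xi in zip(row, xq))
        # Undo both scales to return to the real-valued range.
        out.append(acc * gamma / x_scale)
    return out


if __name__ == "__main__":
    W = [[0.9, -0.05, -1.2, 0.3], [0.4, 0.7, -0.6, 0.02]]
    x = [0.5, -1.0, 0.25, 2.0]
    print(ternary_quantize(W)[0])
    print(ternary_linear(x, W))
```

Because each weight is -1, 0, or +1, the inner loop needs no multiplications; under these assumptions, the only floating-point multiply per output is the final rescale, which is where the computational savings come from.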


Code & Models

Repositories

dlyuangod/vit-1.58b (PyTorch, official)


Taxonomy

Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies

Methods: Softmax · Attention Is All You Need