ViT-1.58b: Mobile Vision Transformers in the 1-bit Era
Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

TL;DR
This paper presents ViT-1.58b, a highly quantized Vision Transformer model that drastically reduces memory and computation requirements while maintaining competitive accuracy, enabling more sustainable AI deployment.
Contribution
Introduction of ViT-1.58b, a 1.58-bit quantized ViT model using ternary weights and 8-bit activations, balancing efficiency and accuracy for resource-constrained environments.
Findings
Maintains comparable accuracy to full-precision ViT on CIFAR-10 and ImageNet-1k.
Significantly reduces memory usage and computational costs.
Demonstrates the effectiveness of extreme quantization in practical vision tasks.
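To make the memory claim concrete, here is a rough back-of-the-envelope sketch. The "1.58-bit" label comes from the information content of a three-valued weight, log2(3) ≈ 1.585 bits. The parameter count below is a hypothetical round number, not a figure from the paper, and the estimate covers weight storage only (no activations, scales, or packing overhead; practical ternary formats often spend 2 bits per weight).

```python
import math

def weight_memory_mb(num_params, bits_per_weight):
    """Storage needed for the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6

# Hypothetical model size, chosen only for illustration.
num_params = 100_000_000

# A ternary weight takes one of three values, so it carries log2(3) bits.
ternary_bits = math.log2(3)

fp32_mb = weight_memory_mb(num_params, 32)
ternary_mb = weight_memory_mb(num_params, ternary_bits)
ratio = fp32_mb / ternary_mb  # equals 32 / log2(3), independent of num_params

print(f"fp32 weights:    {fp32_mb:.1f} MB")
print(f"ternary weights: {ternary_mb:.1f} MB")
print(f"reduction:       {ratio:.1f}x")
```

The ratio is an idealized upper bound on weight compression relative to 32-bit floats; the savings reported in the paper may differ.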
Abstract
Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to drastically reduce memory and computational overhead while preserving competitive performance. ViT-1.58b employs ternary quantization, which refines the balance between efficiency and accuracy by constraining weights to {-1, 0, 1} and quantizing activations to 8-bit precision. Our approach ensures efficient scaling in terms of both memory and computation. Experiments on CIFAR-10 and ImageNet-1k demonstrate that ViT-1.58b maintains comparable accuracy to full-precision ViT, with significant…
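The abstract's two ingredients, ternary weights in {-1, 0, 1} and 8-bit activations, can be illustrated with a minimal NumPy sketch. The abstract does not specify the scaling rules, so the absmean weight scale and per-tensor absmax activation scale used here are assumptions (they follow a common convention for ternary models), and all function names are hypothetical rather than taken from the authors' code.

```python
import numpy as np

def ternary_quantize_weights(w, eps=1e-8):
    """Map weights to {-1, 0, 1}. The absmean scale is an assumed choice."""
    scale = np.mean(np.abs(w)) + eps
    w_ternary = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return w_ternary, scale

def quantize_activations_int8(x, eps=1e-8):
    """Symmetric per-tensor absmax quantization to [-127, 127] (assumed choice)."""
    scale = 127.0 / (np.max(np.abs(x)) + eps)
    x_q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return x_q, scale

def quantized_linear(x, w):
    """Linear layer y = x @ w.T computed with integer operands, then rescaled."""
    w_q, w_scale = ternary_quantize_weights(w)
    x_q, x_scale = quantize_activations_int8(x)
    # Accumulate in int32 so sums of int8 products cannot overflow.
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    # Undo both scales to return to the floating-point range.
    return acc.astype(np.float32) * (w_scale / x_scale)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8)).astype(np.float32)   # 2 tokens, 8 input features
w = rng.normal(size=(4, 8)).astype(np.float32)   # 4 output features
y = quantized_linear(x, w)
```

Because every weight is -1, 0, or 1, the matrix product reduces to additions and subtractions of activations, which is where the computational savings of this kind of scheme come from. Training such a model typically also needs a gradient workaround for the non-differentiable rounding, which this sketch omits.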
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies
Methods: Softmax · Attention Is All You Need
