BHViT: Binarized Hybrid Vision Transformer

Tian Gao; Zhiyuan Zhang; Yu Zhang; Huajun Liu; Kaijie Yin; Chengzhong Xu; Hui Kong

arXiv:2503.02394 · cs.CV · March 7, 2025 · 2 cites

TL;DR

BHViT is a binarized hybrid Vision Transformer that combines local information interaction, shift-based modules, and attention binarization, and reports state-of-the-art results among binary ViT methods for energy-efficient edge deployment.

Contribution

The paper proposes BHViT, a binarization-friendly hybrid ViT architecture, together with new modules and techniques that improve the performance of binary Vision Transformers.

Findings

01. Achieves state-of-the-art results among binary ViT methods.
02. Effectively reduces computational redundancy in token processing.
03. Enhances binary MLP performance with shift operation modules.
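
This page does not describe how the paper's shift module is built. As a purely illustrative sketch of what a "shift operation" usually means in vision models, the code below moves each channel of a feature map by one position in a different direction and zero-fills the vacated cells. The names `shift2d`, `shift_channels`, and `DIRECTIONS` are hypothetical, and the four-direction scheme is an assumption, not the BHViT design.

```python
def shift2d(grid, dy, dx, fill=0):
    """Return a copy of a 2D grid moved by (dy, dx); vacated cells get `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = grid[y][x]
    return out


# Hypothetical direction table (an assumption): one (dy, dx) offset per
# channel, reused cyclically when there are more than four channels.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def shift_channels(feature_map):
    """Shift each channel of a [C][H][W] nested list in a different direction."""
    return [
        shift2d(channel, *DIRECTIONS[c % len(DIRECTIONS)])
        for c, channel in enumerate(feature_map)
    ]


channel = [[1, 2],
           [3, 4]]
shifted = shift_channels([channel, channel, channel, channel])
```

A shift involves no multiplications or learned parameters, which is why it is often considered as a cheap way to mix spatial information between neighboring positions.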

Abstract

Model binarization has made significant progress in enabling real-time and energy-efficient computation for convolutional neural networks (CNNs), offering a potential solution to the deployment challenges faced by Vision Transformers (ViTs) on edge devices. However, due to the structural differences between CNN and Transformer architectures, simply applying binary CNN strategies to ViT models leads to a significant performance drop. To tackle this challenge, we propose BHViT, a binarization-friendly hybrid ViT architecture, and its fully binarized model, guided by three important observations. First, BHViT uses local information interaction and a hierarchical feature aggregation technique from coarse to fine levels to address redundant computations stemming from excessive tokens. Then, a novel module based on shift operations is proposed to enhance the…
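
For readers new to binarization, the following minimal sketch shows why it makes computation cheap: once values are restricted to +1 and -1, a dot product reduces to an XOR and a bit count. It illustrates the general idea only and is not taken from the BHViT code; the helper names `binarize`, `pack_bits`, and `binary_dot` are hypothetical, and scaling factors and training details are omitted.

```python
def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into an int: bit i is 1 where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Positions where the bits differ contribute -1 and the rest +1,
    so dot = n - 2 * popcount(a XOR b).
    """
    differing = bin(a_bits ^ b_bits).count("1")
    return n - 2 * differing


# Illustrative values only.
weights = [0.7, -1.2, 0.1, -0.4]
activations = [-0.3, -0.9, 2.0, 0.5]

w_sign = binarize(weights)
a_sign = binarize(activations)

# The bitwise result matches the ordinary dot product of the sign vectors.
reference = sum(w * a for w, a in zip(w_sign, a_sign))
fast = binary_dot(pack_bits(w_sign), pack_bits(a_sign), len(w_sign))
```

The bitwise form replaces n multiply-accumulate operations with one XOR and one population count per machine word, which is the source of the speed and energy savings that binarized models aim for.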

Code & Models

Repositories

IMRL/BHViT (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices

Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Residual Connection · Absolute Position Encodings · Linear Layer · Layer Normalization · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer