High-Fidelity Differential-information Driven Binary Vision Transformer
Tian Gao, Zhiyuan Zhang, Kaijie Yin, Xu-Cheng Zhong, Hui Kong

TL;DR
This paper introduces DIDB-ViT, a binary vision transformer that combines differential information in the attention module, Haar-wavelet frequency decomposition of the Q-K similarity, and an improved activation function to raise accuracy while keeping the efficiency of binary computation.
Contribution
The paper presents a novel binary ViT architecture with differential attention, frequency-based similarity preservation, and enhanced activation functions, advancing binary vision transformer performance.
Findings
Outperforms state-of-the-art quantization methods in image classification.
Achieves high accuracy with binary ViT while maintaining computational efficiency.
Improves segmentation performance using the proposed binary ViT model.
Abstract
The binarization of vision transformers (ViTs) offers a promising approach to addressing the trade-off between high computational/storage demands and the constraints of edge-device deployment. However, existing binary ViT methods often suffer from severe performance degradation or rely heavily on full-precision modules. To address these issues, we propose DIDB-ViT, a novel binary ViT that is highly informative while maintaining the original ViT architecture and computational efficiency. Specifically, we design an informative attention module incorporating differential information to mitigate information loss caused by binarization and enhance high-frequency retention. To preserve the fidelity of the similarity calculations between binary Q and K tensors, we apply frequency decomposition using the discrete Haar wavelet and integrate similarities across different frequencies.…
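To make the frequency-decomposition idea concrete, here is a minimal NumPy sketch of a band-wise binary similarity. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say along which axis the Haar transform is applied, how many levels are used, or how the per-band similarities are weighted. The sketch assumes a single-level orthonormal Haar split along the feature axis, plain sign binarization, and an unweighted sum of bands. The function names `haar_split`, `binarize`, and `banded_binary_similarity` are hypothetical.

```python
import numpy as np


def haar_split(x):
    """Single-level orthonormal Haar transform along the last axis.

    Assumption: the last axis has even length. Returns (low, high) bands,
    each with half as many features as the input.
    """
    even, odd = x[..., 0::2], x[..., 1::2]
    low = (even + odd) / np.sqrt(2.0)   # pairwise averages (low frequency)
    high = (even - odd) / np.sqrt(2.0)  # pairwise differences (high frequency)
    return low, high


def binarize(x):
    """Sign binarization to {-1, +1}; zeros are mapped to +1 (an assumption)."""
    return np.where(x >= 0, 1.0, -1.0)


def banded_binary_similarity(q, k):
    """Binarize Q and K per frequency band and sum the band similarities.

    q: (n_queries, d), k: (n_keys, d). The unweighted sum is an assumption.
    """
    q_low, q_high = haar_split(q)
    k_low, k_high = haar_split(k)
    sim_low = binarize(q_low) @ binarize(k_low).T
    sim_high = binarize(q_high) @ binarize(k_high).T
    return sim_low + sim_high


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((5, 8))

# Before binarization the orthonormal Haar split preserves the dot product,
# so the two band similarities add up to the ordinary Q K^T.
q_low, q_high = haar_split(q)
k_low, k_high = haar_split(k)
full_precision_ok = np.allclose(q_low @ k_low.T + q_high @ k_high.T, q @ k.T)

scores = banded_binary_similarity(q, k)  # shape (4, 5)
```

Because the orthonormal Haar split preserves inner products in full precision, any difference between `scores` and a directly binarized `sign(Q) sign(K)^T` comes only from binarizing each band separately, which is the effect a frequency-wise similarity is meant to exploit.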
Taxonomy
Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Ferroelectric and Negative Capacitance Devices
