BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models

Phuoc-Hoan Charles Le; Xinlin Li

arXiv:2306.16678 · cs.CV · July 4, 2023 · 2 cites

TL;DR

BinaryViT introduces architectural modifications inspired by CNNs to enhance the performance of binary vision transformers, achieving competitive results on ImageNet-1k without convolutions.

Contribution

The paper proposes BinaryViT, a novel binary vision transformer architecture that incorporates CNN-inspired operations to improve binary ViT performance without using convolutions.

Findings

01. BinaryViT achieves competitive accuracy with state-of-the-art binary CNNs.

02. Architectural modifications significantly improve binary ViT representational capacity.

03. BinaryViT reduces computational cost while maintaining high performance.

Abstract

As vision transformers (ViTs) grow in popularity and size, there is increasing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can significantly reduce the size and computational cost of ViT models, since computations can use popcount operations when both the weights and the activations are binary. However, when convolutional neural network (CNN) binarization methods or other existing binarization methods are applied directly, ViTs suffer a larger performance drop than CNNs on datasets with a large number of classes, such as ImageNet-1k. With extensive analysis, we find that binary vanilla ViTs such as DeiT lack many of the key architectural properties that give binary CNNs much higher representational capability…
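
The popcount arithmetic mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or its repository; the function names (`pack_signs`, `popcount`, `binary_dot`) and the bit-packing convention (bit set for +1, clear for -1) are assumptions chosen for illustration. When weights and activations are both restricted to +1/-1, a dot product reduces to counting the positions where the two sign patterns differ:

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int (hypothetical helper).

    Bit i is set when values[i] == +1 and clear when values[i] == -1.
    """
    bits = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            bits |= 1 << i
    return bits


def popcount(x):
    """Count the set bits of a non-negative int."""
    return bin(x).count("1")


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n sign vectors stored as packed bits.

    XOR marks positions where the signs differ. Each mismatch contributes -1
    and each match contributes +1, so the result is n - 2 * mismatches.
    """
    mismatches = popcount(a_bits ^ b_bits)
    return n - 2 * mismatches


# Illustrative values only, not data from the paper.
weights = [1, -1, -1, 1, 1, -1, 1, 1]
activations = [1, 1, -1, -1, 1, -1, -1, 1]

reference = sum(w * a for w, a in zip(weights, activations))
result = binary_dot(pack_signs(weights), pack_signs(activations), len(weights))
assert result == reference
```

The multiply-accumulate loop is replaced by one XOR and one bit count per machine word, which is where the savings of binarized layers generally come from. Real kernels pack many values per word and use hardware popcount instructions rather than Python integers.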

Code & Models

Repositories

phuoc-hoan-le/binaryvit
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Infrared Target Detection Methodologies

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Feedforward Network · Average Pooling · Attention Dropout · Data-efficient Image Transformer