Trimming Down Large Spiking Vision Transformers via Heterogeneous Quantization Search

Boxun Xu; Yufei Song; Peng Li

arXiv:2412.05505·cs.NE·December 10, 2024

TL;DR

This paper introduces a layer-wise heterogeneous quantization method for compressing large spiking vision transformers. It reduces model size by 8.71x-10.19x and energy consumption by up to 10.2x while keeping the accuracy loss below 1% across multiple datasets.

Contribution

It proposes a novel mixed quantization scheme for spiking transformers, in which each layer receives its own quantizer type and bit width. The scheme balances compression against accuracy and enables deployment on resource-constrained devices.

Findings

01. Achieves 8.71x-10.19x model compression with less than 1% accuracy loss.

02. Reduces energy consumption by up to 10.2x on the target datasets.

03. Maintains high accuracy at an average effective resolution of 3.14-3.67 bits.
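As a rough consistency check, the reported compression ratios line up with the average effective bit widths if one assumes a 32-bit floating-point baseline (an assumption; the listing does not state the baseline precision) and ignores quantization metadata such as per-layer scales. The sketch below computes a parameter-weighted average bit width and the corresponding ratio; the function names and the layer sizes in the example are hypothetical.

```python
def average_bits(layers):
    """Parameter-weighted mean bit width over (num_params, bits) pairs."""
    total_params = sum(n for n, _ in layers)
    return sum(n * b for n, b in layers) / total_params


def compression_ratio(avg_bits, baseline_bits=32):
    """Size reduction versus a full-precision baseline (assumed FP32),
    ignoring scale factors and other quantization metadata."""
    return baseline_bits / avg_bits


# Hypothetical layer sizes and bit widths, for illustration only.
example_layers = [(1000, 4), (3000, 3)]
print(f"example average: {average_bits(example_layers):.2f} bits")

# Endpoints of the reported average effective resolution.
for bits in (3.14, 3.67):
    print(f"{bits:.2f} bits -> {compression_ratio(bits):.2f}x")
```

Under these assumptions, 3.14 bits gives about 10.19x and 3.67 bits about 8.72x, close to the reported 8.71x; the small gap presumably comes from rounding in the reported averages.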

Abstract

Spiking Neural Networks (SNNs) are amenable to deployment on edge devices and neuromorphic hardware due to their lower energy dissipation. Recently, SNN-based transformers have garnered significant interest: they incorporate attention mechanisms akin to those of their counterparts in Artificial Neural Networks (ANNs) while demonstrating excellent performance. However, deploying large spiking transformer models on resource-constrained edge devices, such as mobile phones, still poses significant challenges because of the high computational demands of large, uncompressed, high-precision models. In this work, we introduce a novel heterogeneous quantization method for compressing spiking transformers through layer-wise quantization. Our approach optimizes the quantization of each layer using one of two distinct quantization schemes, i.e., uniform or power-of-two quantization, with mixed bit resolutions. Our…
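The abstract names the ingredients of the method (a per-layer choice between uniform and power-of-two quantization at mixed bit widths), but the search procedure itself is cut off in this listing. The sketch below illustrates those ingredients under stated assumptions: a symmetric uniform quantizer, a power-of-two quantizer that reserves one magnitude code for zero and spends the rest on exponents, and a simple greedy rule (hypothetical, not the authors' search) that gives each layer the lowest bit width at which either scheme keeps the relative L2 weight error under a tolerance. All function names, the tolerance, the bit-width menu, and the example layers are illustrative.

```python
import math
import random


def uniform_quantize(weights, bits):
    """Symmetric uniform quantizer with integer levels in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1  # assumes bits >= 2
    scale = max_abs / levels
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def power_of_two_quantize(weights, bits):
    """Assumed layout: 1 sign bit; magnitudes {0} plus max_abs * 2**-j for
    j = 0 .. 2**(bits-1) - 2. Each weight snaps to the nearest magnitude."""
    max_abs = max(abs(w) for w in weights) or 1.0
    n_exp = 2 ** (bits - 1) - 1  # one magnitude code is reserved for zero
    magnitudes = [0.0] + [max_abs * 2.0 ** -j for j in range(n_exp)]
    return [math.copysign(min(magnitudes, key=lambda m: abs(abs(w) - m)), w)
            for w in weights]


SCHEMES = {"uniform": uniform_quantize, "power_of_two": power_of_two_quantize}


def relative_error(weights, quantized):
    """Relative L2 reconstruction error ||w - q|| / ||w||."""
    num = sum((w - q) ** 2 for w, q in zip(weights, quantized))
    den = sum(w * w for w in weights) or 1.0
    return math.sqrt(num / den)


def choose_layer_config(weights, bit_options=(2, 3, 4, 5, 6, 8), tol=0.1):
    """Hypothetical greedy rule: the lowest bit width at which either scheme
    meets the tolerance, taking the better scheme at that width. Falls back
    to the better scheme at the largest bit width."""
    for bits in sorted(bit_options):
        errors = {name: relative_error(weights, fn(weights, bits))
                  for name, fn in SCHEMES.items()}
        scheme, err = min(errors.items(), key=lambda item: item[1])
        if err <= tol:
            return scheme, bits, err
    return scheme, bits, err


# Hypothetical layers: Gaussian-like weights vs. heavy-tailed (Laplace-like) weights.
random.seed(0)
layers = {
    "block0.attn.qkv": [random.gauss(0.0, 0.02) for _ in range(512)],
    "block0.mlp.fc1": [random.choice((-1, 1)) * random.expovariate(50.0)
                       for _ in range(512)],
}
for name, w in layers.items():
    scheme, bits, err = choose_layer_config(w)
    print(f"{name}: {scheme}, {bits} bits, relative error {err:.3f}")
```

Power-of-two levels are dense near zero and sparse near the maximum, so they typically fit heavy-tailed weight distributions better, while uniform levels fit flatter ones; this is one motivation for choosing the scheme per layer. A realistic search would score candidates by task accuracy and hardware cost rather than by weight error alone.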

Taxonomy

Topics: Advanced Memory and Neural Computing · CCD and CMOS Imaging Sensors · Neural Networks and Applications

Methods: Softmax · Attention Is All You Need