ButterflyViT: 354× Expert Compression for Edge Vision Transformers
Aryan Karmore

TL;DR
ButterflyViT introduces a novel expert compression method for Vision Transformers that significantly reduces memory requirements by geometrically reorienting shared parameters, enabling edge deployment with minimal accuracy loss.
Contribution
The paper proposes ButterflyViT, a geometric expert compression technique that achieves sub-linear memory scaling for Vision Transformers, addressing the linear memory bottleneck on edge devices.
Findings
Achieves 354× memory reduction on CIFAR-100 with 64 experts.
Incurs negligible accuracy loss despite the compression.
Enables multiple experts to operate on memory-constrained edge devices.
Abstract
Deploying sparse Mixture of Experts (MoE) Vision Transformers remains a challenge due to linear expert memory scaling. Under linear scaling, every expert stores an independent weight matrix, so total memory grows in proportion to the number of experts and exceeds the memory budget of edge devices. Current compression methods such as quantization, pruning, and low-rank factorization reduce constant factors but leave the scaling bottleneck unresolved. We introduce ButterflyViT, a method that treats experts not as independent weight matrices but as geometric reorientations of a unified shared quantized substrate. Diversity among experts arises from viewing different angles of shared capacity, not from redundant storage. By applying learned rotations to a shared ternary prototype, each expert yields memory which is sub-linear in the number…
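The idea of experts as rotated views of one shared ternary matrix can be illustrated with a small NumPy sketch. This is not the authors' implementation: the excerpt does not specify how the rotations are parameterized, so the butterfly arrangement of 2×2 Givens rotations (suggested only by the method's name), the ternarization threshold, and the names `ternarize`, `butterfly_rotate`, and `expert_forward` are all assumptions made for illustration.

```python
import numpy as np

def ternarize(w):
    """Quantize a dense matrix to {-1, 0, +1} plus one float scale.
    The 0.7 * mean(|w|) threshold is an assumed heuristic, not from the paper."""
    thresh = 0.7 * np.mean(np.abs(w))
    t = np.sign(w) * (np.abs(w) > thresh)
    mask = t != 0
    scale = float(np.abs(w[mask]).mean()) if mask.any() else 1.0
    return t.astype(np.int8), scale

def butterfly_rotate(x, angles):
    """Apply butterfly stages of disjoint 2x2 Givens rotations to the last axis.
    x: (..., d) with d a power of two; angles: (log2(d), d // 2)."""
    out = np.asarray(x, dtype=np.float64)
    d = out.shape[-1]
    idx = np.arange(d)
    for s in range(angles.shape[0]):
        stride = 1 << s
        lo = idx[(idx // stride) % 2 == 0]   # first element of each pair
        hi = lo + stride                     # its partner, `stride` positions away
        c, sn = np.cos(angles[s]), np.sin(angles[s])
        a, b = out[..., lo], out[..., hi]
        new = np.empty_like(out)
        new[..., lo] = c * a - sn * b
        new[..., hi] = sn * a + c * b
        out = new
    return out

def expert_forward(x, proto, scale, angles_in, angles_out):
    """One hypothetical expert: rotate the input, apply the shared ternary
    prototype, rotate the output. Only the angles are expert-specific."""
    h = butterfly_rotate(x, angles_in)
    h = scale * (h @ proto.T)
    return butterfly_rotate(h, angles_out)

# Usage: two experts share one prototype and differ only in their angles.
rng = np.random.default_rng(0)
d, n_stages = 16, 4                          # n_stages = log2(d)
proto, scale = ternarize(rng.normal(size=(d, d)))
experts = [
    (rng.uniform(-np.pi, np.pi, size=(n_stages, d // 2)),
     rng.uniform(-np.pi, np.pi, size=(n_stages, d // 2)))
    for _ in range(2)
]
x = rng.normal(size=(3, d))
outputs = [expert_forward(x, proto, scale, a_in, a_out) for a_in, a_out in experts]
```

In this sketch each expert stores only two small angle arrays, while the ternary prototype is stored once and reused by every expert. Each butterfly stage is orthogonal, so the rotations change the orientation of the input and output without changing their norms.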
Taxonomy
Topics: Neural Networks and Reservoir Computing · Stochastic Gradient Optimization Techniques · Ferroelectric and Negative Capacitance Devices
