SHARCS: Efficient Transformers through Routing with Dynamic Width Sub-networks
Mohammadreza Salehi, Sachin Mehta, Aditya Kusupati, Ali Farhadi, Hannaneh Hajishirzi

TL;DR
SHARCS is an adaptive inference method for transformers that dynamically routes each sample to a sub-network of suitable width, improving the accuracy-efficiency trade-off across various tasks and architectures.
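To make "sub-networks of different widths" concrete, here is a minimal sketch, assuming that a narrower sub-network reuses a prefix of the full-width weights (weight sharing) and that width is reduced in the inner dimension of a feed-forward block. The function names `matvec` and `ffn_at_width` and the toy weights are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # rows are output units, columns are input units


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product in plain Python."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def ffn_at_width(x: Sequence[float], w1: Matrix, b1: Vector,
                 w2: Matrix, b2: Vector, width_frac: float) -> Vector:
    """Evaluate y = W2 relu(W1 x + b1) + b2 using only the first k hidden units.

    Every width reuses a prefix of the same full-width weights, so the narrow
    sub-networks add no parameters. The output size is unchanged, which keeps
    the block compatible with a residual connection.
    """
    if not 0.0 < width_frac <= 1.0:
        raise ValueError("width_frac must be in (0, 1]")
    k = max(1, round(len(w1) * width_frac))          # number of active hidden units
    h = [max(0.0, z + b) for z, b in zip(matvec(w1[:k], x), b1[:k])]
    w2_active = [row[:k] for row in w2]               # columns of the active units only
    return [z + b for z, b in zip(matvec(w2_active, h), b2)]


# Toy weights: 2 inputs, 4 hidden units, 1 output.
x = [1.0, 2.0]
w1 = [[1, 0], [0, 1], [1, 1], [-1, 0]]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[1, 1, 1, 1]]
b2 = [0.0]
print(ffn_at_width(x, w1, b1, w2, b2, 1.0))  # [6.0]
print(ffn_at_width(x, w1, b1, w2, b2, 0.5))  # [3.0]
```

Because the half-width call touches only half of the hidden units, its multiply-adds in this block are roughly halved, while the output keeps the same shape as the full-width result.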
Contribution
SHARCS introduces a trainable router for transformers that selects, for each input, a sub-network whose width matches the input's difficulty, improving the trade-off between efficiency and accuracy.
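A router of this kind can be pictured as a small classifier over candidate widths. The sketch below assumes a linear router applied to a pooled hidden representation of one sample, with a hypothetical set of width fractions `WIDTHS`; where the router sits in the network and how it is parameterized in SHARCS are not specified in this summary, so every name here is illustrative.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical candidate width fractions the router can choose from.
WIDTHS: Tuple[float, ...] = (0.25, 0.5, 0.75, 1.0)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(features: Sequence[float], router_w: Sequence[Sequence[float]],
          router_b: Sequence[float],
          widths: Sequence[float] = WIDTHS) -> Tuple[float, List[float]]:
    """Score every candidate width from one sample's representation and pick the best.

    `features` stands in for a pooled hidden state taken from the transformer;
    `router_w` has one row per candidate width (a linear router, assumed here).
    """
    if len(router_w) != len(widths):
        raise ValueError("router needs one output per candidate width")
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(router_w, router_b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return widths[best], probs


router_w = [[2.0, 0.0], [1.0, 0.0], [0.0, 0.0], [-1.0, 0.0]]
router_b = [0.0, 0.0, 0.0, 0.0]
print(route([1.0, 0.0], router_w, router_b)[0])   # 0.25: narrowest sub-network
print(route([-1.0, 0.0], router_w, router_b)[0])  # 1.0: full-width network
```

The chosen width would then be passed to a width-sliced block (such as a feed-forward layer that uses only its first k hidden units) for the remaining computation on that sample.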
Findings
SHARCS outperforms existing adaptive inference methods in accuracy vs. FLOPs.
It generalizes across different transformer architectures and compressed models.
SHARCS achieves approximately 2x inference speedup with minimal accuracy loss.
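Where a speedup of this size can come from is easiest to see with a back-of-the-envelope calculation. The sketch assumes that layers before the router always run at full width, layers after it run at the routed width, each layer's cost scales linearly with its width fraction, and router overhead is negligible. The function `expected_relative_cost` and the 12-layer setting are hypothetical and are not the paper's measured configuration.

```python
import math
from typing import Dict


def expected_relative_cost(n_layers: int, router_layer: int,
                           routing_probs: Dict[float, float]) -> float:
    """Expected per-sample compute relative to the full-width model.

    Assumes the first `router_layer` layers always run at full width, the
    remaining layers run at the routed width fraction, and each layer's cost
    scales linearly with that fraction. Router overhead is ignored.
    """
    if not math.isclose(sum(routing_probs.values()), 1.0):
        raise ValueError("routing probabilities must sum to 1")
    if not 0 <= router_layer <= n_layers:
        raise ValueError("router_layer must lie within the network")
    suffix = n_layers - router_layer
    expected = sum(p * (router_layer + suffix * w) for w, p in routing_probs.items())
    return expected / n_layers


# Hypothetical setting: 12 layers, router after layer 4, half the samples
# sent to a quarter-width sub-network and half to the full model.
cost = expected_relative_cost(12, 4, {0.25: 0.5, 1.0: 0.5})
print(cost)      # 0.75
print(1 / cost)  # about 1.33x less compute under these assumptions
```

In this toy model, routing every sample to the quarter-width sub-network gives a relative cost of 0.5, that is, a 2x reduction in compute. The router's task is to send only those samples the narrow sub-network can handle, so that accuracy is preserved.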
Abstract
We introduce SHARCS for adaptive inference that takes into account the hardness of input samples. SHARCS can train a router on any transformer network, enabling the model to direct different samples to sub-networks with varying widths. Our experiments demonstrate that: (1) SHARCS outperforms or complements existing per-sample adaptive inference methods across various classification tasks in terms of accuracy vs. FLOPs; (2) SHARCS generalizes across different architectures and can even be applied to compressed and efficient transformer encoders to further improve their efficiency; (3) SHARCS can provide a 2× inference speedup with an insignificant drop in accuracy.
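One simple way to turn "hardness" into a training target for the router is to label each sample with the narrowest width whose sub-network already assigns high probability to the correct class. This labeling rule, the function name `hardness_target`, and the 0.9 threshold are assumptions for illustration; the summary above does not describe the exact recipe SHARCS uses to derive router targets.

```python
from typing import Dict


def hardness_target(true_label_conf: Dict[float, float], threshold: float = 0.9) -> float:
    """Pick the narrowest width whose sub-network is confident on the true label.

    `true_label_conf` maps a width fraction to the probability that the
    sub-network of that width assigns to the sample's correct class.
    Samples that no narrower sub-network handles confidently fall back to
    the widest width.
    """
    if not true_label_conf:
        raise ValueError("need at least one width")
    for width in sorted(true_label_conf):  # narrowest first
        if true_label_conf[width] >= threshold:
            return width
    return max(true_label_conf)


print(hardness_target({0.25: 0.40, 0.5: 0.95, 1.0: 0.99}))  # 0.5: moderately hard sample
print(hardness_target({0.25: 0.20, 0.5: 0.30, 1.0: 0.60}))  # 1.0: hard sample
```

Easy samples end up with small target widths and hard samples with large ones, which is the signal a router needs in order to learn difficulty-aware routing.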
