Adaptive Computation Modules: Granular Conditional Computation For Efficient Inference
Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

TL;DR
This paper introduces the Adaptive Computation Module (ACM), a dynamic approach that adjusts computational effort per token in transformer models, significantly reducing inference costs while maintaining accuracy.
Contribution
The paper presents ACM, a novel module that adaptively allocates computation in transformers, along with a distillation technique to convert pre-trained models into ACM variants.
Findings
Reduces inference costs in vision and speech transformers
Maintains accuracy across various computational budgets
Demonstrates effectiveness of ACM in practical settings
Abstract
While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective" width needed to process a token can vary from layer to layer. Motivated by this observation, we introduce the Adaptive Computation Module (ACM), a generic module that dynamically adapts its computational load to match the estimated difficulty of the input on a per-token basis. An ACM consists of a sequence of learners that progressively refine the output of their preceding counterparts. An additional gating mechanism determines the optimal number of learners to execute for each token. We also propose a distillation technique to replace any pre-trained model with an "ACMized" variant. Our evaluation of transformer models in computer vision and speech…
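The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `ToyACM`, the random linear learners, the additive form of "refinement", and the argmax gate are all assumptions made for illustration. It shows only the control flow, in which a gate picks a per-token number of learners and only that many are executed.

```python
import random


def make_linear(d_in, d_out, rng):
    """Random weight matrix as nested lists (illustrative stand-in for a trained layer)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyACM:
    """Hypothetical ACM-like module: a gate selects how many learners run per token."""

    def __init__(self, dim, num_learners, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.learners = [make_linear(dim, dim, rng) for _ in range(num_learners)]
        # Assumption: the gate scores every option from 0 to num_learners learners.
        self.gate = make_linear(dim, num_learners + 1, rng)

    def choose_k(self, token):
        """Return the number of learners with the highest gate score for this token."""
        scores = matvec(self.gate, token)
        return max(range(len(scores)), key=scores.__getitem__)

    def forward_token(self, token, k=None):
        """Run the first k learners; each adds a correction to the running output."""
        if k is None:
            k = self.choose_k(token)
        out = [0.0] * self.dim
        for w in self.learners[:k]:  # learners beyond k are never evaluated
            delta = matvec(w, token)
            out = [o + d for o, d in zip(out, delta)]
        return out, k

    def forward(self, tokens):
        """Process each token independently, returning outputs and chosen learner counts."""
        results = [self.forward_token(t) for t in tokens]
        return [r[0] for r in results], [r[1] for r in results]


# Usage: two tokens of width 4, up to 3 learners each.
acm = ToyACM(dim=4, num_learners=3)
tokens = [[1.0, 0.0, -1.0, 0.5], [0.2, 0.2, 0.2, 0.2]]
outputs, ks = acm.forward(tokens)
```

In this sketch the cost of a token grows linearly with its chosen `k`, which is the sense in which computation is allocated per token. A practical version would batch tokens that share the same `k` rather than loop over them one by one.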
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Multimodal Machine Learning Applications · Complexity and Algorithms in Graphs
