AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures
Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan

TL;DR
This study investigates how approximate multipliers affect the performance of various Mixture-of-Experts DNN architectures, revealing architecture-dependent resilience and recovery capabilities after retraining.
Contribution
First comprehensive analysis of the impact of approximate multiplication on MoE DNNs, an area with no prior work, highlighting architecture-specific behavior and recovery potential after retraining.
Findings
Dense baseline is most resilient without retraining.
VGG architectures recover accuracy under moderate approximation but not under aggressive approximation.
Hard MoE outperforms Dense on ViT-Small under aggressive approximation after retraining.
Abstract
Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied as independent routes towards efficient inference: the former by replacing exact arithmetic with low-power approximate multipliers, the latter by routing inputs through specialized expert sub-networks to enable conditional computation. However, their interaction remains entirely unexplored. This paper presents AxMoE, the first study of the impact of approximate multiplication on MoE DNN architectures. We evaluate three MoE variants (Hard MoE, Soft MoE, and Cluster MoE) against dense baselines across three CNN architectures (ResNet-20, VGG11_bn, VGG19_bn) on CIFAR-100 and a Vision Transformer (ViT-Small) on the Tiny ImageNet-200 dataset, using eight 8-bit signed…
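To make the two ideas in the abstract concrete, here is a minimal Python sketch of how an approximate multiplier and hard (top-1) expert routing can be combined. Everything in it is an assumption for illustration only: `approx_mul8` is a hypothetical operand-truncation multiplier, not one of the eight multipliers the paper evaluates, and `hard_moe_forward`, the exact-arithmetic gate, and the toy weights are invented names and values rather than the authors' implementation.

```python
def approx_mul8(a: int, b: int, drop_bits: int = 2) -> int:
    """Hypothetical approximate signed 8-bit multiply.

    Zeroes the lowest `drop_bits` bits of each operand's magnitude before
    multiplying. With drop_bits=0 the result is the exact product.
    """
    if not (-128 <= a <= 127 and -128 <= b <= 127):
        raise ValueError("operands must fit in signed 8 bits")
    sign = -1 if (a < 0) != (b < 0) else 1
    mask = ~((1 << drop_bits) - 1)  # clears the low-order bits
    return sign * ((abs(a) & mask) * (abs(b) & mask))


def approx_dot(xs, ws, drop_bits: int = 2) -> int:
    """Dot product in which every multiplication is approximate."""
    return sum(approx_mul8(x, w, drop_bits) for x, w in zip(xs, ws))


def hard_moe_forward(x, gate_weights, expert_weights, drop_bits: int = 2):
    """Top-1 routing: score each expert, then run only the winner.

    Assumption: the gate uses exact arithmetic and only the selected
    expert uses the approximate multiplier.
    """
    scores = [sum(xi * gi for xi, gi in zip(x, g)) for g in gate_weights]
    chosen = max(range(len(scores)), key=scores.__getitem__)
    return chosen, approx_dot(x, expert_weights[chosen], drop_bits)


if __name__ == "__main__":
    x = [13, -7, 20]                      # toy int8 activations
    gates = [[1, 0, 0], [0, -1, 0]]       # toy gate weights, one row per expert
    experts = [[7, 5, -3], [2, 2, 2]]     # toy expert weights, one row per expert
    print("exact  :", hard_moe_forward(x, gates, experts, drop_bits=0))
    print("approx :", hard_moe_forward(x, gates, experts, drop_bits=2))
```

Raising `drop_bits` makes the sketch's multiplier more aggressive, which gives a rough feel for why accuracy can degrade with approximation; real approximate multipliers have different, circuit-specific error profiles.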
