ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Jason Wu, Yuyang Yuan, Kang Yang, Lance Kaplan, Mani Srivastava

TL;DR
ADMN is a multimodal network that dynamically adjusts how many layers are active in each modality based on input quality and compute constraints, maintaining accuracy while significantly reducing computational cost.
Contribution
It introduces a layer-wise adaptive framework that manages both resource constraints and modality quality variations in multimodal systems.
Findings
Reduces floating-point operations by up to 75% compared to state-of-the-art networks.
Maintains accuracy comparable to static models under dynamic conditions.
Effectively reallocates layers based on modality quality.
Abstract
Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Statically provisioned multimodal systems cannot adapt when compute resources change over time, while existing dynamic networks struggle with strict compute budgets. Additionally, both systems often neglect the impact of variations in modality quality. Consequently, modalities suffering substantial corruption may needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges: it adjusts the total number of active layers across all modalities…
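The idea of splitting a fixed layer budget across modalities according to their quality can be illustrated with a small sketch. This is not the authors' method, which is not detailed in the text above. It is a simple greedy heuristic, and the modality names, quality scores, and the `allocate_layers` helper are all hypothetical, chosen only for illustration.

```python
def allocate_layers(quality, budget, max_layers, min_layers=1):
    """Split a total layer budget across modalities in proportion to quality.

    Hypothetical helper (not from the paper). Each modality starts with
    `min_layers`; each remaining layer goes to the modality with the highest
    quality per allocated layer that has not yet reached its depth cap.
    """
    names = sorted(quality)  # fixed order so ties resolve deterministically
    if budget < min_layers * len(names):
        raise ValueError("budget too small to give every modality min_layers")

    alloc = {m: min_layers for m in names}
    remaining = budget - min_layers * len(names)

    while remaining > 0:
        # Modalities that can still accept another layer.
        open_modalities = [m for m in names if alloc[m] < max_layers[m]]
        if not open_modalities:
            break  # every backbone is at full depth; leftover budget is unused
        best = max(open_modalities, key=lambda m: quality[m] / (alloc[m] + 1))
        alloc[best] += 1
        remaining -= 1
    return alloc


# Assumed example: a clean RGB feed and a heavily corrupted depth feed,
# each backbone 12 layers deep, with a total budget of 8 active layers.
clean_vs_noisy = allocate_layers(
    quality={"rgb": 0.9, "depth": 0.1},
    budget=8,
    max_layers={"rgb": 12, "depth": 12},
)
print(clean_vs_noisy)
```

In this sketch the corrupted modality keeps only its minimum depth while the clean one receives the rest of the budget, and the total number of active layers never exceeds the budget. A learned allocation policy would replace the fixed quality scores and the greedy rule.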
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Traffic Prediction and Management Techniques
