ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

Jason Wu; Yuyang Yuan; Kang Yang; Lance Kaplan; Mani Srivastava

arXiv:2502.07862 · cs.LG · October 29, 2025

TL;DR

ADMN is a multimodal network that adjusts how many layers are active and reallocates them across modalities based on input quality and compute constraints, maintaining accuracy while cutting floating-point operations by up to 75%.

Contribution

The paper introduces a layer-wise adaptive framework that handles both compute resource constraints and variations in modality quality in multimodal systems.

Findings

01  Uses up to 75% fewer floating-point operations (FLOPs) than state-of-the-art networks.

02  Maintains accuracy comparable to static models under dynamic conditions.

03  Reallocates layers across modalities according to each modality's input quality.

Abstract

Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource availability (due to multi-tenancy, device heterogeneity, etc.) and fluctuating quality of inputs (from sensor feed corruption, environmental noise, etc.). Statically provisioned multimodal systems cannot adapt when compute resources change over time, while existing dynamic networks struggle with strict compute budgets. Additionally, both systems often neglect the impact of variations in modality quality. Consequently, modalities suffering substantial corruption may needlessly consume resources better allocated towards other modalities. We propose ADMN, a layer-wise Adaptive Depth Multimodal Network capable of tackling both challenges: it adjusts the total number of active layers across all modalities…
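
The idea of fitting a fixed total number of active layers to a compute budget, and shifting those layers away from corrupted modalities, can be illustrated with a small sketch. This is not the authors' method: the summary above does not say how ADMN chooses its allocation, so the proportional heuristic, the function name `allocate_layers`, the per-modality quality scores, and the `max_layers` cap are all assumptions made for illustration.

```python
from typing import Dict


def allocate_layers(quality: Dict[str, float], budget: int, max_layers: int) -> Dict[str, int]:
    """Split a total layer budget across modalities, favoring higher quality.

    Hypothetical heuristic for illustration only; it is not taken from the paper.
    quality:    assumed per-modality quality score (higher means cleaner input)
    budget:     total number of layers that may be active across all modalities
    max_layers: assumed depth of each modality's backbone
    """
    if budget < 0 or budget > max_layers * len(quality):
        raise ValueError("budget must fit within the available layers")

    alloc = {name: 0 for name in quality}
    # Hand out layers one at a time. Each layer goes to the modality with the
    # highest quality per already-assigned layer, skipping full backbones.
    for _ in range(budget):
        candidates = [m for m in quality if alloc[m] < max_layers]
        best = max(candidates, key=lambda m: quality[m] / (alloc[m] + 1))
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Clean RGB, moderately noisy depth: most of the budget goes to RGB.
    print(allocate_layers({"rgb": 0.9, "depth": 0.3}, budget=8, max_layers=12))
    # Same inputs under a tighter compute budget.
    print(allocate_layers({"rgb": 0.9, "depth": 0.3}, budget=4, max_layers=12))
```

The total always equals the budget, so a tighter compute limit shrinks every allocation while a drop in one modality's quality moves layers toward the others. A learned controller could replace the greedy rule without changing this interface.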

Videos

ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Traffic Prediction and Management Techniques