Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence
Chengwei Zhou, Zhaoyan Jia, Haotian Yu, Xuming Chen, Brandon Lee, Christopher Pulliam, Steve Majerus, Massoud Pedram, and Gourav Datta

TL;DR
This paper presents AMI, a multimodal framework for edge medical monitoring that dynamically selects sensors and skips redundant data to save energy while maintaining high diagnostic accuracy.
Contribution
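The paper introduces an adaptive multimodal framework that combines an agentic modality controller for sensor selection, a learned Sigma-Delta sensing module that skips temporally redundant samples, and a foundation-backed multimodal transformer for prediction, enabling energy-efficient and accurate medical inference at the edge.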
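As a rough illustration of skipping temporally redundant data, the sketch below transmits a patch only when it differs enough from the last transmitted patch. It is a simplified plain-delta threshold, not the paper's sensing module: the function name `delta_skip`, the fixed `threshold` (the abstract describes learnable thresholds), the mean-absolute-change criterion, and the toy signal are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def delta_skip(
    signal: List[float], patch_len: int, threshold: float
) -> Tuple[List[int], List[float]]:
    """Hypothetical patch-wise delta skipping.

    Returns the indices of transmitted patches and a reconstruction in which
    every skipped patch is replaced by the last transmitted one.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")

    kept: List[int] = []
    recon: List[float] = []
    reference: Optional[List[float]] = None  # last transmitted patch

    # Trailing samples that do not fill a whole patch are ignored.
    for start in range(0, len(signal) - patch_len + 1, patch_len):
        patch = signal[start:start + patch_len]
        if reference is None:
            change = float("inf")  # always transmit the first patch
        else:
            change = sum(abs(a - b) for a, b in zip(patch, reference)) / patch_len
        if change > threshold:
            reference = patch
            kept.append(start // patch_len)
        recon.extend(reference)
    return kept, recon


# Toy signal: four patches of length 4, two of them nearly redundant.
toy = [0.0] * 4 + [0.01] * 4 + [1.0] * 4 + [1.0] * 4
kept, recon = delta_skip(toy, patch_len=4, threshold=0.1)
print(kept)  # [0, 2]
```

In this toy run only two of the four patches are transmitted. A threshold fixed by hand is the main simplification here; a learned threshold would be tuned jointly with the downstream model.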
Findings
Reduces sensor usage by 48.8% on average.
Improves accuracy by 1.9% over state-of-the-art methods.
Supports dynamic computation: inactive sensors and temporally redundant samples are skipped to save energy.
Abstract
Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model…
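The Gumbel-Sigmoid gating named in the abstract can be sketched generically as follows. This is the standard relaxation of a binary on/off decision, not the authors' controller: the function names, the per-sensor logits, the temperature value, and the 0.5 cut-off for hard decisions are assumptions chosen for illustration. Only the modality names (ECG, PPG, EMG, IMU) come from the abstract.

```python
import math
import random
from typing import List, Optional


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_logistic(rng: random.Random) -> float:
    """Logistic(0, 1) noise, i.e. the difference of two Gumbel(0, 1) samples."""
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    return math.log(u) - math.log(1.0 - u)


def gumbel_sigmoid(
    logits: List[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
    hard: bool = False,
) -> List[float]:
    """Relaxed binary gates, one per sensor.

    With `rng` set, noise is added so gates are sampled (training-style use).
    With `rng=None`, the gates are deterministic (inference-style use).
    With `hard=True`, each gate is thresholded at 0.5 to an on/off decision.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    gates = []
    for logit in logits:
        noise = sample_logistic(rng) if rng is not None else 0.0
        gate = sigmoid((logit + noise) / temperature)
        gates.append((1.0 if gate > 0.5 else 0.0) if hard else gate)
    return gates


# Hypothetical controller logits for the four modalities in the abstract.
sensors = ["ECG", "PPG", "EMG", "IMU"]
logits = [3.0, 0.5, -2.0, -4.0]
decisions = gumbel_sigmoid(logits, hard=True)
active = [name for name, on in zip(sensors, decisions) if on == 1.0]
print(active)  # ['ECG', 'PPG']
```

In a trainable version the hard decision would typically be paired with a straight-through gradient estimator so the logits can still be learned; that part is omitted here because it needs an autodiff framework.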
