Sense Less, Infer More: Agentic Multimodal Transformers for Edge Medical Intelligence

Chengwei Zhou; Zhaoyan Jia; Haotian Yu; Xuming Chen; Brandon Lee; Christopher Pulliam; Steve Majerus; Massoud Pedram; and Gourav Datta

arXiv:2604.10404·cs.ET·April 14, 2026

TL;DR

This paper presents AMI, a multimodal framework for edge medical monitoring that dynamically selects sensors and skips redundant data to save energy while maintaining high diagnostic accuracy.

Contribution

The paper introduces a novel adaptive multimodal framework with a sensor controller, a delta-sigma sensing module, and a robust transformer model, enabling energy-efficient and accurate edge medical inference.

Findings

01 Reduces sensor usage by 48.8% on average.

02 Improves accuracy by 1.9% over state-of-the-art methods.

03 Supports dynamic computation for energy savings.

Abstract

Edge-based multimodal medical monitoring requires models that balance diagnostic accuracy with severe energy constraints. Continuous acquisition of ECG, PPG, EMG, and IMU streams rapidly drains wearable batteries, often limiting operation to under 10 hours, while existing systems overlook the high temporal redundancy present in physiological signals. We introduce Adaptive Multimodal Intelligence (AMI), an end-to-end framework that jointly learns when to sense and how to infer. AMI integrates three components: (1) a lightweight Agentic Modality Controller that uses differentiable Gumbel-Sigmoid gating to dynamically select active sensors based on model confidence and task relevance; (2) a Learned Sigma-Delta Sensing module that applies patch-wise Delta-Sigma operations with learnable thresholds to skip temporally redundant samples; and (3) a Foundation-backed Multimodal Prediction Model…
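
The first two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the temperature parameter, the mean-absolute-difference change measure, and the fixed threshold are all assumptions made for illustration. In AMI the gate logits and the thresholds are learned end to end, whereas here they are plain inputs, and only the forward pass is shown (no gradients, no straight-through estimator). The patch rule is a simplified delta-threshold test against the last transmitted patch, standing in for the paper's patch-wise Delta-Sigma operation.

```python
import math
import random


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gumbel_sigmoid_gate(logit, temperature=1.0, rng=random):
    """Sample a relaxed on/off gate for one sensor (hypothetical helper).

    Returns (soft, hard): soft is in [0, 1], hard is 0 or 1.
    """
    # Logistic noise is the difference of two independent Gumbel samples.
    u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)
    noise = math.log(u) - math.log(1.0 - u)
    soft = _sigmoid((logit + noise) / temperature)
    hard = 1 if soft > 0.5 else 0
    return soft, hard


def delta_patch_skip(signal, patch_len, threshold):
    """Keep a patch only if it differs enough from the last kept patch.

    Returns (kept_patch_indices, reconstructed_signal). Skipped patches
    are reconstructed by repeating the last kept patch. The threshold is
    a fixed assumption here; the paper describes learnable thresholds.
    """
    kept, recon, reference = [], [], None
    for start in range(0, len(signal), patch_len):
        patch = signal[start:start + patch_len]
        if reference is None or len(reference) != len(patch):
            change = float("inf")  # first or ragged patch: always keep
        else:
            change = sum(abs(a - b) for a, b in zip(patch, reference)) / len(patch)
        if change > threshold:
            reference = list(patch)
            kept.append(start // patch_len)
        recon.extend(reference)
    return kept, recon


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical controller logits for four sensor streams.
    logits = {"ECG": 3.0, "PPG": 0.2, "EMG": -2.5, "IMU": -0.1}
    active = {name: gumbel_sigmoid_gate(l, temperature=0.5)[1]
              for name, l in logits.items()}
    print("active sensors:", active)

    # A flat segment followed by a step change, split into patches of 4.
    signal = [0.0] * 4 + [0.01, 0.0, 0.0, 0.01] + [1.0] * 8
    kept, recon = delta_patch_skip(signal, patch_len=4, threshold=0.1)
    print("kept patches:", kept)
```

In this toy signal, the second and fourth patches barely differ from the patch kept before them, so only two of the four patches would be acquired. Lowering the temperature pushes the soft gate toward a hard 0/1 decision, which is the usual motivation for annealing it during training.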
