MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding

Runxi Huang; Mingxuan Yu; Mingyu Tsoi; Xiaomin Ouyang

arXiv:2510.25327 · cs.CV · March 31, 2026

TL;DR

MMEdge is an on-device multimodal inference framework for resource-constrained edge devices that pipelines sensing and encoding to reduce latency while maintaining accuracy.

Contribution

MMEdge introduces a pipelined sensing and encoding approach, combined with temporal aggregation, adaptive configuration, and speculative skipping, for efficient multimodal inference.
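The page names "speculative skipping" without describing it. As a rough illustration only, the sketch below shows a generic confidence-threshold early exit over pipelined units: after each unit, a classifier scores the features aggregated so far, and the remaining units are skipped once the top-class probability reaches a threshold. This is a swapped-in, commonly used technique, not MMEdge's stated mechanism; the function names, the `threshold` parameter, and the per-unit logits are all hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    top = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_with_early_exit(per_unit_logits: list[list[float]],
                        threshold: float = 0.9) -> tuple[int, int]:
    """Return (predicted_class, units_processed).

    per_unit_logits[i] are hypothetical classifier logits computed from the
    features aggregated over units 0..i. Processing stops at the first unit whose
    top-class probability reaches `threshold`; later units are skipped.
    """
    if not per_unit_logits:
        raise ValueError("at least one unit is required")
    best = 0
    for i, logits in enumerate(per_unit_logits):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return best, i + 1              # confident enough: skip the rest
    return best, len(per_unit_logits)       # never confident: used every unit


if __name__ == "__main__":
    logits = [[0.2, 0.1], [2.0, 0.0], [5.0, 0.0]]
    print(run_with_early_exit(logits, threshold=0.95))  # (0, 3)
    print(run_with_early_exit(logits, threshold=0.85))  # (0, 2)
```

Lowering the threshold trades accuracy for latency: fewer units are sensed and encoded before a prediction is returned.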

Findings

01

Significantly reduces end-to-end latency on a UAV testbed.

02

Maintains high task accuracy across system and data variations.

03

Demonstrates effectiveness on public multimodal datasets.

Abstract

Real-time multimodal inference on resource-constrained edge devices is essential for applications such as autonomous driving, human-computer interaction, and mobile health. However, prior work often overlooks the tight coupling between sensing dynamics and model execution, as well as the complex inter-modality dependencies. In this paper, we propose MMEdge, a new on-device multimodal inference framework based on pipelined sensing and encoding. Instead of waiting for complete sensor inputs, MMEdge decomposes the entire inference process into a sequence of fine-grained sensing and encoding units, allowing computation to proceed incrementally as data arrive. MMEdge also introduces a lightweight but effective temporal aggregation module that captures rich temporal dynamics across different pipelined units to maintain accuracy. Such a pipelined design also opens up opportunities…
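To make the latency argument concrete, the sketch below compares a wait-for-all baseline with a pipeline that encodes each unit as soon as it has been sensed, and adds a simple aggregator over per-unit features. It assumes a fixed sensing time and a fixed encoding cost per unit, and that encoding the full input costs the same as encoding every unit separately; real encoders need not behave this way. The softmax-weighted pooling in `attention_pool` is a hypothetical stand-in, not the paper's temporal aggregation module, and all names and numbers are illustrative.

```python
import math


def sequential_latency(num_units: int, sense_ms: float, encode_ms: float) -> float:
    """Baseline: wait for the complete input, then encode every unit back to back."""
    return num_units * sense_ms + num_units * encode_ms


def pipelined_latency(num_units: int, sense_ms: float, encode_ms: float) -> float:
    """Pipelined: encode each unit as soon as it has arrived and the encoder is idle."""
    encoder_free_at = 0.0
    for i in range(1, num_units + 1):
        arrival = i * sense_ms                  # unit i is fully sensed at this time
        start = max(arrival, encoder_free_at)   # wait for data and for the encoder
        encoder_free_at = start + encode_ms
    return encoder_free_at


def attention_pool(unit_features: list[list[float]], query: list[float]) -> list[float]:
    """Softmax-weighted average of per-unit feature vectors (hypothetical aggregator)."""
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in unit_features]
    top = max(scores)                           # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(unit_features[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, unit_features)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Hypothetical numbers: 10 units, 33 ms of sensing and 10 ms of encoding per unit.
    print(sequential_latency(10, 33.0, 10.0))   # 430.0
    print(pipelined_latency(10, 33.0, 10.0))    # 340.0
    print(attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0]))  # [2.0, 3.0]
```

Under these assumptions, when encoding a unit is faster than sensing it, almost all encoding overlaps with sensing and the pipelined latency is the total sensing time plus one unit's encoding. When encoding is slower, the pipeline becomes encoder-bound (one unit's sensing time plus all encoding), yet it still hides most of the sensing time.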

Code & Models

Repositories

GitHub