S2M3: Split-and-Share Multi-Modal Models for Distributed Multi-Task Inference on the Edge

JinYi Yoon; JiHo Lee; Ting He; Nakjung Choi; Bo Ji

arXiv:2508.04271 · cs.DC · August 7, 2025

TL;DR

S2M3 is a split-and-share architecture for multi-modal, multi-task inference on edge devices. It reduces memory usage by up to 62% and inference latency by up to 56.9% while keeping accuracy comparable to cloud AI, making on-device inference practical for multi-modal applications.

Contribution

The paper proposes a split-and-share multi-modal model architecture, combined with greedy module placement, for efficient multi-task inference on resource-constrained edge devices.
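
To make the split-and-share idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: the task list, the module names, the sizes in MB, the device capacities, and the specific greedy rule (largest module first, onto the device with the most free memory) are all assumptions chosen for illustration. It shows only two things: that loading a shared module once costs less memory than loading one copy per task, and what a simple greedy placement over several devices could look like.

```python
from dataclasses import dataclass, field

# Hypothetical module sizes in MB (illustrative values, not from the paper).
MODULE_SIZE_MB = {
    "image_encoder": 300,
    "text_encoder": 200,
    "caption_decoder": 150,
    "vqa_head": 50,
    "retrieval_head": 20,
}

# Hypothetical tasks, each split into the modules it needs.
TASKS = {
    "vqa": ["image_encoder", "text_encoder", "vqa_head"],
    "captioning": ["image_encoder", "caption_decoder"],
    "retrieval": ["image_encoder", "text_encoder", "retrieval_head"],
}


@dataclass
class Device:
    name: str
    capacity_mb: int
    placed: list = field(default_factory=list)  # (module, size_mb) pairs

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - sum(size for _, size in self.placed)


def unshared_memory(tasks, sizes) -> int:
    """Memory if every task loads its own copy of every module."""
    return sum(sizes[m] for modules in tasks.values() for m in modules)


def shared_memory(tasks, sizes) -> int:
    """Memory if each distinct module is loaded once and shared by all tasks."""
    unique = {m for modules in tasks.values() for m in modules}
    return sum(sizes[m] for m in unique)


def greedy_place(tasks, sizes, devices) -> dict:
    """Assumed greedy rule: place the largest remaining module on the
    device that currently has the most free memory."""
    unique = {m for modules in tasks.values() for m in modules}
    placement = {}
    for module in sorted(unique, key=lambda m: (-sizes[m], m)):
        candidates = [d for d in devices if d.free_mb >= sizes[module]]
        if not candidates:
            raise ValueError(f"no device can hold {module}")
        target = max(candidates, key=lambda d: d.free_mb)
        target.placed.append((module, sizes[module]))
        placement[module] = target.name
    return placement


if __name__ == "__main__":
    print("unshared MB:", unshared_memory(TASKS, MODULE_SIZE_MB))
    print("shared MB:  ", shared_memory(TASKS, MODULE_SIZE_MB))
    devices = [Device("edge-a", 512), Device("edge-b", 400)]
    print(greedy_place(TASKS, MODULE_SIZE_MB, devices))
```

With these made-up numbers, sharing lowers the total footprint from 1520 MB to 720 MB, and the five distinct modules fit across two devices that could not each hold a full per-task copy. A real placement would presumably also weigh compute speed and network transfer cost between devices, which this sketch ignores.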

Findings

1. Reduces memory usage by up to 62% in multi-task settings.
2. Achieves up to 56.9% reduction in inference latency.
3. Maintains accuracy comparable to cloud AI across multiple benchmarks.

Abstract

With the advancement of Artificial Intelligence (AI) towards multiple modalities (language, vision, speech, etc.), multi-modal models have increasingly been used across various applications (e.g., visual question answering or image generation/captioning). Despite the success of AI as a service for multi-modal applications, it relies heavily on clouds, which are constrained by bandwidth, latency, privacy concerns, and unavailability under network or server failures. While on-device AI becomes popular, supporting multiple tasks on edge devices imposes significant resource challenges. To address this, we introduce S2M3, a split-and-share multi-modal architecture for multi-task inference on edge devices. Inspired by the general-purpose nature of multi-modal models, which are composed of multiple modules (encoder, decoder, classifier, etc.), we propose to split multi-modal models at…
