RAPID: Redundancy-Aware and Compatibility-Optimal Edge-Cloud Partitioned Inference for Diverse VLA Models

Zihao Zheng; Sicheng Tian; Hangyu Cao; Chenyue Li; Jiayu Chen; Maoliang Li; Xinhao Sun; Hailong Zou; Guojie Luo; Xiang Chen

arXiv:2603.07949 · cs.DC · March 13, 2026

Open Access

TL;DR

RAPID is an Edge-Cloud Collaborative (ECC) inference framework for Vision Language Action (VLA) models. It targets two weaknesses of existing edge-cloud partitioning methods, namely sensitivity to visual noise and neglect of the step-wise redundancy in embodied tasks, and reports up to 1.73x speedup with 5-7% overhead.

Contribution

The paper introduces RAPID, an Edge-Cloud Collaborative (ECC) inference framework that accounts for visual noise and step-wise redundancy when partitioning VLA model inference for embodied intelligence.

Findings

01 Achieves up to 1.73x speedup in inference

02 Maintains only 5-7% overhead

03 Effectively handles visual noise and redundancy

Abstract

Vision Language Action (VLA) models are mainstream in embodied intelligence but face high inference costs. Edge-Cloud Collaborative (ECC) inference offers an effective fix by easing the computing pressure on edge devices to meet real-time needs. However, existing ECC frameworks are suboptimal for VLA models because of two challenges: (1) mainstream environment-oriented edge-cloud partitioning methods are susceptible to interference from visual noise; (2) existing edge-cloud partitioning methods overlook the step-wise redundancy unique to embodied tasks, thereby disrupting the physical continuity of motion. To address these issues, we propose a novel ECC inference framework, termed RAPID, and develop an implementation tailored to it. Experiments demonstrate that RAPID achieves a speedup of up to 1.73x with only 5-7% overhead.

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · IoT and Edge/Fog Computing