OxyGen: Unified KV Cache Management for VLA Inference under Multi-Task Parallelism

Xiangyu Li; Huaizhi Tang; Xin Ding; Weijun Wang; Ting Cao; Yunxin Liu

arXiv:2603.14371·cs.RO·May 19, 2026

TL;DR

OxyGen introduces unified KV cache management for multi-task VLA inference on edge devices. It shares the KV cache across tasks and batches decoding across frames, reaching up to 3.7× speedup over isolated execution.

Contribution

The paper presents an inference design that treats the KV cache as a shared resource, enabling cross-task KV sharing and cross-frame continuous batching for efficient multi-task VLA inference.
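The two ideas named here can be illustrated with a toy sketch. It is not OxyGen's implementation: `SharedKVCache`, `toy_prefill`, `batched_decode_step`, and `run_demo` are hypothetical names, the "KV entries" are plain strings, and the reading of cross-frame batching as "decode requests from different frames advance together in one step" is an assumption based only on the summary above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SharedKVCache:
    """Toy KV cache keyed by frame id (hypothetical, not OxyGen's API)."""
    prefill_fn: Callable[[str], List[str]]
    entries: Dict[int, List[str]] = field(default_factory=dict)
    prefill_calls: int = 0

    def get_or_prefill(self, frame_id: int, observation: str) -> List[str]:
        # Run prefill only the first time a frame is seen; later tasks reuse it.
        if frame_id not in self.entries:
            self.entries[frame_id] = self.prefill_fn(observation)
            self.prefill_calls += 1
        return self.entries[frame_id]


def toy_prefill(observation: str) -> List[str]:
    # Stand-in for a real prefill pass: one fake KV entry per whitespace token.
    return [f"kv({token})" for token in observation.split()]


def batched_decode_step(
    cache: SharedKVCache, pending: List[Tuple[int, int]]
) -> List[Tuple[int, int]]:
    """Advance every pending request by one token in a single batched step.

    Each request is (frame_id, tokens_left). Requests from different frames
    share the batch; finished requests drop out of the returned list.
    """
    for frame_id, _ in pending:
        assert frame_id in cache.entries, "decode needs a prefilled frame"
    return [(fid, left - 1) for fid, left in pending if left > 1]


def run_demo() -> Tuple[int, int, int]:
    cache = SharedKVCache(prefill_fn=toy_prefill)
    # Three tasks read the same frame, so prefill should run once.
    for _task in ("action", "language", "memory"):
        cache.get_or_prefill(0, "image tokens and instruction")
    calls_after_frame0 = cache.prefill_calls
    # A later frame arrives while frame 0's request is still decoding.
    cache.get_or_prefill(1, "next image tokens")
    pending = [(0, 3), (1, 1)]
    steps = 0
    while pending:
        pending = batched_decode_step(cache, pending)
        steps += 1
    return calls_after_frame0, cache.prefill_calls, steps


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the three tasks trigger a single prefill for frame 0, and the short request from frame 1 leaves the batch as soon as it finishes instead of waiting for the longer one. A real system would store tensors and run one model forward pass per batched step.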

Findings

01 Achieves up to 3.7× speedup over isolated execution.

02 Delivers over 200 tokens/s language throughput.

03 Maintains 70 Hz action frequency without degrading quality.

Abstract

Embodied AI agents increasingly require parallel execution of multiple tasks, such as manipulation, conversation, and memory construction, from shared observations under distinct time constraints. Recent Mixture-of-Transformers (MoT) Vision-Language-Action Models (VLAs) architecturally support such heterogeneous outputs, yet existing inference systems fail to achieve efficient multi-task parallelism for on-device deployment because of redundant computation and resource contention. We identify isolated KV cache management as the root cause. To address this, we propose unified KV cache management, an inference design that treats the KV cache as a first-class shared resource across tasks and over time. This abstraction enables two key optimizations: cross-task KV sharing eliminates redundant prefill of shared observations, while cross-frame continuous batching decouples variable-length…

Code & Models

Repositories

air-embodied-brain/OxyGen (GitHub)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Ferroelectric and Negative Capacitance Devices