Token Expand-Merge: Training-Free Token Compression for Vision-Language-Action Models

Yifan Ye; Jiaqi Ma; Jun Cen; Zhihe Lu

arXiv:2512.09927·cs.RO·December 11, 2025

TL;DR

TEAM-VLA is a training-free token compression method that accelerates vision-language-action models by dynamically expanding and merging tokens, improving inference speed without retraining.

Contribution

The paper introduces a training-free framework for VLA models that first expands the set of retained tokens and then merges them, reducing inference cost while preserving task performance.

Findings

01 Significantly speeds up inference on the LIBERO benchmark

02 Maintains or improves task success rates compared to the full, uncompressed models

03 Operates without retraining or parameter updates

Abstract

Vision-Language-Action (VLA) models pretrained on large-scale multimodal datasets have emerged as powerful foundations for robotic perception and control. However, their massive scale, often billions of parameters, poses significant challenges for real-time deployment, as inference becomes computationally expensive and latency-sensitive in dynamic environments. To address this, we propose Token Expand-and-Merge-VLA (TEAM-VLA), a training-free token compression framework that accelerates VLA inference while preserving task performance. TEAM-VLA introduces a dynamic token expansion mechanism that identifies and samples additional informative tokens in the spatial vicinity of attention-highlighted regions, enhancing contextual completeness. These expanded tokens are then selectively merged in deeper layers under action-aware guidance, effectively reducing redundancy while maintaining…
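
The expand-then-merge idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`top_k_indices`, `expand_neighbors`, `merge_similar`), the 4-connected neighbourhood, and the cosine-similarity threshold are all assumptions made for illustration. In particular, the abstract describes sampling tokens near attention-highlighted regions and merging under action-aware guidance; here the expansion simply takes every grid neighbour, and plain feature similarity stands in for the action-aware criterion.

```python
import math

def top_k_indices(scores, k):
    """Indices of the k highest-scoring tokens (ties go to the lower index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])

def expand_neighbors(selected, height, width):
    """Add the 4-connected grid neighbours of each selected token.

    Tokens are indexed row-major on a height x width patch grid.
    Assumption: a deterministic stand-in for the paper's sampling step.
    """
    expanded = set(selected)
    for idx in selected:
        r, c = divmod(idx, width)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                expanded.add(nr * width + nc)
    return sorted(expanded)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def merge_similar(tokens, threshold):
    """Greedily average each token into the first group whose mean it resembles.

    Assumption: cosine similarity replaces the paper's action-aware guidance.
    """
    groups = []  # each entry is [running_sum_vector, count]
    for tok in tokens:
        for group in groups:
            mean = [s / group[1] for s in group[0]]
            if cosine(mean, tok) >= threshold:
                group[0] = [s + t for s, t in zip(group[0], tok)]
                group[1] += 1
                break
        else:
            groups.append([list(tok), 1])
    return [[s / n for s in total] for total, n in groups]

# Toy 3x3 patch grid: the centre patch receives the most attention.
attention = [0.1, 0.2, 0.05, 0.1, 0.9, 0.1, 0.0, 0.1, 0.05]
kept = top_k_indices(attention, k=1)
expanded = expand_neighbors(kept, height=3, width=3)
```

In this toy setup the single attention-selected patch grows into a cross-shaped neighbourhood of five patches, and `merge_similar` would then collapse any of their feature vectors that point in nearly the same direction. A real pipeline would operate on tensors inside the model's layers; the list-based version only makes the two stages easy to follow.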

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning