SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation
Wei Li, Renshan Zhang, Rui Shao, Zhijian Fang, Kaiwen Zhou, Zhuotao Tian, Liqiang Nie

TL;DR
SemanticVLA introduces a novel framework that enhances robotic manipulation by sparsifying perceptual inputs, aligning semantics effectively, and integrating perception with action for improved efficiency and performance.
Contribution
It proposes SemanticVLA, a new VLA framework with semantic-aligned sparsification and enhanced perception-action integration, achieving state-of-the-art results in robotic manipulation tasks.
Findings
Surpasses OpenVLA by 21.1% success rate on LIBERO benchmark.
Reduces training cost by 3.0 times and inference latency by 2.7 times.
Sets new state-of-the-art in efficiency and performance for robotic manipulation.
Abstract
Vision-Language-Action (VLA) models have advanced in robotic manipulation, yet practical deployment remains hindered by two key limitations: 1) perceptual redundancy, where irrelevant visual inputs are processed inefficiently, and 2) superficial instruction-vision alignment, which hampers semantic grounding of actions. In this paper, we propose SemanticVLA, a novel VLA framework that performs Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation. Specifically: 1) To sparsify redundant perception while preserving semantic alignment, Semantic-guided Dual Visual Pruner (SD-Pruner) performs: Instruction-driven Pruner (ID-Pruner) extracts global action cues and local semantic anchors in SigLIP; Spatial-aggregation Pruner (SA-Pruner) compacts geometry-rich features into task-adaptive tokens in DINOv2. 2) To exploit sparsified features and integrate semantics with…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsRobot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
