BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model
Haosheng Li, Weixin Mao, Zihan Lan, Hongwei Xiong, Hongan Wang, Chenyang Si, Ziwei Liu, Xiaoming Deng, Hua Chen

TL;DR
BFA++ is a hierarchical, task-aware token pruning framework for multi-view vision-language-action (VLA) models that improves both inference efficiency and success rates in robotic manipulation tasks.
Contribution
It introduces a hierarchical, dynamic token pruning method tailored for VLA models, improving efficiency and accuracy over existing techniques.
Findings
BFA++ achieves about 10% higher success rates on the RoboTwin benchmark.
It speeds up inference by 1.8× and 1.5× on two different VLA models.
The method effectively reduces cross-view redundancy and spatial noise.
Abstract
Vision-Language-Action (VLA) models have achieved significant breakthroughs by leveraging Large Vision Language Models (VLMs) to jointly interpret instructions and visual inputs. However, the substantial increase in visual tokens, particularly from multi-view inputs, poses serious challenges to real-time robotic manipulation. Existing acceleration techniques for VLMs, such as token pruning, often result in degraded performance when directly applied to VLA models, as they overlook the relationships between different views and fail to account for the dynamic and task-specific characteristics of robotic operation. To address this, we propose BFA++, a dynamic token pruning framework designed specifically for VLA models. BFA++ introduces a hierarchical pruning strategy guided by two-level importance predictors: an intra-view predictor highlights task-relevant regions within each image to…
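The two-level idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it describes the second predictor, so the sketch assumes the second level scores whole views, and it assumes a simple proportional budget split. The function name `prune_tokens` and the parameters `token_scores`, `view_scores`, and `total_budget` are hypothetical.

```python
from typing import List


def prune_tokens(
    token_scores: List[List[float]],
    view_scores: List[float],
    total_budget: int,
) -> List[List[int]]:
    """Return, for each view, the sorted indices of the tokens to keep.

    token_scores: per-view, per-token importance (stand-in for an
        intra-view predictor's output; assumed, not from the paper).
    view_scores: per-view importance (stand-in for a second, view-level
        predictor; assumed, not from the paper).
    total_budget: total number of visual tokens to keep across all views.
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if len(token_scores) != len(view_scores):
        raise ValueError("need exactly one view score per view")

    total = sum(view_scores)
    kept: List[List[int]] = []
    for scores, view_score in zip(token_scores, view_scores):
        # Level 1 (assumed): give each view a share of the budget that is
        # proportional to its importance, capped by its token count.
        share = int(total_budget * view_score / total) if total > 0 else 0
        k = min(len(scores), share)
        # Level 2: inside the view, keep the k highest-scoring tokens.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        kept.append(sorted(ranked[:k]))
    return kept


# Two views with four tokens each; the first view is judged more important.
kept = prune_tokens(
    token_scores=[[0.9, 0.1, 0.5, 0.3], [0.2, 0.8, 0.4, 0.6]],
    view_scores=[0.75, 0.25],
    total_budget=4,
)
print(kept)  # [[0, 2, 3], [1]]
```

In this toy setup the more important view keeps three of its four tokens while the other keeps one, so the less informative view loses most of its tokens while the highest-scoring regions of the informative view survive. A real system would recompute both sets of scores at every control step, since the abstract stresses that pruning must be dynamic and task-specific.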
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robot Manipulation and Learning
