Selective Perception for Robot: Task-Aware Attention in Multimodal VLA
Young-Chae Son, Jung-Woo Lee, Yoon-Ji Choi, Dae-Kwan Ko, Soo-Chul Lim

TL;DR
Inspired by human active perception, this paper introduces a dynamic, task-aware attention framework for robot vision-language-action (VLA) models that improves efficiency and robustness by processing only task-relevant visual inputs in real time.
Contribution
The paper proposes a lightweight adaptive routing architecture for real-time, task-dependent multimodal data fusion in robotic perception, reducing unnecessary computation and noise from task-irrelevant views.
Findings
Significant improvements in inference efficiency.
Enhanced control performance in robotic manipulation.
Validated effectiveness in real-world scenarios.
Abstract
In robotics, Vision-Language-Action (VLA) models that integrate diverse multimodal signals from multi-view inputs have emerged as an effective approach. However, most prior work adopts static fusion that processes all visual inputs uniformly, which incurs unnecessary computational overhead and allows task-irrelevant background information to act as noise. Inspired by the principles of human active perception, we propose a dynamic information fusion framework designed to maximize the efficiency and robustness of VLA models. Our approach introduces a lightweight adaptive routing architecture that analyzes the current text prompt and observations from a wrist-mounted camera in real time to predict the task relevance of multiple camera views. By conditionally attenuating computations for views with low informational utility and selectively providing only essential visual features to the…
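The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear scorer, the sigmoid gate, the 0.5 threshold, and every name here (`score_views`, `select_and_encode`, the `front` and `side` views) are assumptions made for illustration. The sketch only shows the general pattern of scoring each camera view from prompt and wrist-camera features, then running the encoder only for views that score as relevant.

```python
import math
from typing import Callable, Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_views(
    text_feat: Sequence[float],
    wrist_feat: Sequence[float],
    weights: Dict[str, Sequence[float]],
    biases: Dict[str, float],
) -> Dict[str, float]:
    """Predict a relevance score in (0, 1) for each external camera view.

    Hypothetical router: one linear layer over the concatenated prompt
    and wrist-camera features, followed by a sigmoid.
    """
    context = list(text_feat) + list(wrist_feat)
    scores = {}
    for view, w in weights.items():
        if len(w) != len(context):
            raise ValueError(f"weight size mismatch for view {view!r}")
        logit = sum(wi * ci for wi, ci in zip(w, context)) + biases.get(view, 0.0)
        scores[view] = sigmoid(logit)
    return scores


def select_and_encode(
    scores: Dict[str, float],
    encoders: Dict[str, Callable[[], List[float]]],
    threshold: float = 0.5,
) -> Dict[str, List[float]]:
    """Run the (expensive) encoder only for views scored as relevant."""
    features = {}
    for view, encode in encoders.items():
        if scores.get(view, 0.0) >= threshold:
            features[view] = encode()  # low-scoring views are never encoded
    return features


# Toy usage: record which encoders actually run.
calls: List[str] = []


def make_encoder(name: str, feat: List[float]) -> Callable[[], List[float]]:
    def encode() -> List[float]:
        calls.append(name)
        return feat
    return encode


scores = score_views(
    text_feat=[1.0, 0.0],
    wrist_feat=[0.5],
    weights={"front": [2.0, 0.0, 1.0], "side": [-3.0, 0.0, 0.0]},
    biases={"front": 0.0, "side": 0.0},
)
features = select_and_encode(
    scores,
    {"front": make_encoder("front", [0.1, 0.2]),
     "side": make_encoder("side", [0.3, 0.4])},
)
```

In this toy setup the `side` view scores below the threshold, so its encoder is never called. A hard threshold is only one way to realize "conditional attenuation"; a soft gate that scales features by the score would be another, and this summary does not say which the paper uses.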
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
