Scene Exploration by Vision-Language Models
Venkatesh Sripada, Samuel Carter, Frank Guerin, Amir Ghalamzan

TL;DR
This paper introduces AP-VLM, a framework combining active perception with vision-language models to improve robotic scene exploration and semantic understanding in complex, partially observable environments.
Contribution
The paper presents a novel active perception framework that integrates vision-language models for robotic exploration and semantic querying, enabling adaptive viewpoint selection.
Findings
AP-VLM outperforms passive perception methods in object identification.
The system effectively guides robots in complex scenes with occlusions.
AP-VLM demonstrates adaptability across different robotic platforms.
Abstract
Active perception enables robots to dynamically gather information by adjusting their viewpoints, a crucial capability for interacting with complex, partially observable environments. In this paper, we present AP-VLM, a novel framework that combines active perception with a Vision-Language Model (VLM) to guide robotic exploration and answer semantic queries. Using a 3D virtual grid overlaid on the scene and orientation adjustments, AP-VLM allows a robotic manipulator to intelligently select optimal viewpoints and orientations to resolve challenging tasks, such as identifying objects in occluded or inclined positions. We evaluate our system on two robotic platforms: a 7-DOF Franka Panda and a 6-DOF UR5, across various scenes with differing object configurations. Our results demonstrate that AP-VLM significantly outperforms passive perception methods and baseline models, including Toward…
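The abstract describes a 3D virtual grid over the scene and orientation adjustments but does not say how either is built, so the following is only a minimal sketch under stated assumptions: the grid is taken to be a regular lattice of cell centres inside an axis-aligned workspace box, the orientation is reduced to a yaw and pitch that aim the camera at a point of interest, and `choose_cell` is a hypothetical stand-in for whatever the VLM returns when asked where to look next. None of these names come from the paper.

```python
import itertools
import math


def make_virtual_grid(lower, upper, shape):
    """Return {(i, j, k): (x, y, z)} holding the centre of each grid cell.

    Assumption: a regular grid inside an axis-aligned box; the paper's
    actual grid construction is not given in the abstract.
    """
    grid = {}
    for idx in itertools.product(*(range(n) for n in shape)):
        centre = tuple(
            lo + (i + 0.5) * (hi - lo) / n
            for i, lo, hi, n in zip(idx, lower, upper, shape)
        )
        grid[idx] = centre
    return grid


def look_at_angles(camera, target):
    """Yaw and pitch (radians) that aim a camera at `target`."""
    dx, dy, dz = (t - c for t, c in zip(target, camera))
    yaw = math.atan2(dy, dx)
    pitch = math.atan2(dz, math.hypot(dx, dy))
    return yaw, pitch


def explore(grid, choose_cell, target, max_steps=5):
    """Visit cells picked by `choose_cell` until it returns None.

    `choose_cell(grid, visited)` is a hypothetical callback standing in
    for the VLM query; here it only has to return a grid index or None.
    """
    visited = []
    for _ in range(max_steps):
        cell = choose_cell(grid, visited)
        if cell is None:
            break
        yaw, pitch = look_at_angles(grid[cell], target)
        visited.append((cell, yaw, pitch))
    return visited


def first_unvisited(grid, visited, limit=3):
    """Toy selector: take cells in sorted order, stop after `limit` views."""
    seen = {cell for cell, _, _ in visited}
    if len(seen) >= limit:
        return None
    return next((c for c in sorted(grid) if c not in seen), None)


grid = make_virtual_grid((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2, 2, 2))
path = explore(grid, first_unvisited, target=(0.5, 0.5, 0.0))
```

In a real system the toy selector would be replaced by a call that sends the current image and the labelled grid to the model and parses a cell index from its reply; that interface is left out here because the abstract does not describe it.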
Taxonomy
Topics: Retinal Imaging and Analysis · Blind Source Separation Techniques · Image Processing Techniques and Applications
