A General One-Shot Multimodal Active Perception Framework for Robotic Manipulation: Learning to Predict Optimal Viewpoint
Deyun Qin, Zezhi Liu, Hanqian Luo, Xiao Liang, Yongchun Fang

TL;DR
This paper introduces a general one-shot multimodal active perception framework that predicts optimal viewpoints for robotic manipulation. It nearly doubles real-world grasp success rates and transfers from simulation to the real world without additional fine-tuning.
Contribution
The paper presents a novel one-shot framework that decouples viewpoint quality evaluation from task-specific objectives and uses a multimodal prediction network with cross-attention to infer the optimal viewpoint directly.
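The contribution mentions cross-attention for fusing modalities. As a minimal sketch, assume one modality supplies query tokens and another supplies key/value tokens. The summary does not say which modalities the paper pairs or what its dimensions and head count are. Under that assumption, single-head scaled dot-product cross-attention looks like this. The learned query/key/value projections are omitted for brevity, and every name below is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product: (n x d) @ (d x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Tuple[Matrix, Matrix]:
    """Each query token (modality A) attends over all key/value tokens (modality B).
    Hypothetical helper; learned projections W_q, W_k, W_v are left out."""
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    scores = matmul(queries, transpose(keys))            # (n_q x n_k) dot products
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, values), weights              # (n_q x d_v) fused features

# Usage with toy tokens: two "image" queries attend over three "geometry" tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
geo_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
geo_values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
fused, attn = cross_attention(rgb_tokens, geo_keys, geo_values)
```

Each row of `attn` is a probability distribution over the other modality's tokens, so every fused query feature is a weighted mix of that modality's values.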
Findings
Nearly doubled grasp success rate in real-world tests.
Effective sim-to-real transfer without additional fine-tuning.
Framework supports heterogeneous task requirements.
Abstract
Active perception in vision-based robotic manipulation aims to move the camera toward more informative observation viewpoints, thereby providing high-quality perceptual inputs for downstream tasks. Most existing active perception methods rely on iterative optimization, leading to high time and motion costs, and are tightly coupled with task-specific objectives, which limits their transferability. In this paper, we propose a general one-shot multimodal active perception framework for robotic manipulation. The framework enables direct inference of optimal viewpoints and comprises a data collection pipeline and an optimal viewpoint prediction network. Specifically, the framework decouples viewpoint quality evaluation from the overall architecture, supporting heterogeneous task requirements. Optimal viewpoints are defined through systematic sampling and evaluation of candidate viewpoints,…
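The abstract defines optimal viewpoints through systematic sampling and evaluation of candidate viewpoints. The labeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The spherical sampling pattern, the helper names `sample_viewpoints` and `label_optimal_viewpoint`, and both toy evaluators are all hypothetical. The evaluator is passed in as a function, which mirrors the decoupling of viewpoint quality evaluation: a different task changes only the scoring function, not the sampling or labeling code.

```python
import math
from typing import Callable, List, Tuple

Viewpoint = Tuple[float, float, float]  # camera position (x, y, z) in the object frame

def sample_viewpoints(radius: float, n_azimuth: int, elevations_deg: List[float]) -> List[Viewpoint]:
    """Hypothetical: place candidate cameras on rings of a sphere around the target."""
    views = []
    for elev in elevations_deg:
        e = math.radians(elev)
        for k in range(n_azimuth):
            a = 2.0 * math.pi * k / n_azimuth
            views.append((radius * math.cos(e) * math.cos(a),
                          radius * math.cos(e) * math.sin(a),
                          radius * math.sin(e)))
    return views

def label_optimal_viewpoint(views: List[Viewpoint],
                            evaluate: Callable[[Viewpoint], float]) -> Tuple[Viewpoint, float]:
    """Score every candidate with a pluggable evaluator and return the best one.
    Ties go to the first candidate, because max() keeps the first maximum."""
    scores = [evaluate(v) for v in views]
    best = max(range(len(views)), key=lambda i: scores[i])
    return views[best], scores[best]

def top_down_score(v: Viewpoint) -> float:
    # Toy evaluator (assumption): prefer cameras looking down from above.
    return v[2]

def low_side_score(v: Viewpoint) -> float:
    # Toy evaluator (assumption): prefer cameras close to the table plane.
    return -abs(v[2])

views = sample_viewpoints(radius=0.5, n_azimuth=8, elevations_deg=[30.0, 60.0, 90.0])
best_top, score_top = label_optimal_viewpoint(views, top_down_score)
best_side, score_side = label_optimal_viewpoint(views, low_side_score)
```

Swapping `top_down_score` for `low_side_score` changes the label without touching the sampler. A trained one-shot network would then learn to map observations directly to such labels, avoiding iterative search at test time.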
Taxonomy
Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Social Robot Interaction and HRI
