Observer-Actor: Active Vision Imitation Learning with Sparse-View Gaussian Splatting
Yilong Wang, Cheng Qian, Ruomeng Fan, and Edward Johns

TL;DR
The paper introduces ObAct, a framework for robotic imitation learning in which an observer arm moves its camera to an optimal viewpoint for an actor arm. Compared with static-camera setups, it improves trajectory transfer by up to 233% and behavior cloning by up to 143%.
Contribution
ObAct is a novel active vision imitation learning framework that dynamically positions an observer camera so that the actor's policy receives clear views of both the object and the gripper.
Findings
ObAct outperforms static-camera setups in imitation learning tasks.
Trajectory transfer improves by up to 233% with ObAct.
Behavior cloning improves by up to 143% with ObAct.
Abstract
We propose Observer Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to optimal visual observations for the actor. We study ObAct on a dual-arm robotic system equipped with wrist-mounted cameras. At test time, ObAct dynamically assigns observer and actor roles: the observer arm constructs a 3D Gaussian Splatting (3DGS) representation from three images, virtually explores this to find an optimal camera pose, then moves to this pose; the actor arm then executes a policy using the observer's observations. This formulation enhances the clarity and visibility of both the object and the gripper in the policy's observations. As a result, we enable the training of ambidextrous policies on observations that remain closer to the occlusion-free training distribution, leading to more robust policies. We study this formulation with two existing…
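The viewpoint-selection step described in the abstract (virtually explore candidate camera poses, then pick the best one) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper scores views inside a 3DGS reconstruction, whereas this sketch swaps in a much simpler stand-in, a line-of-sight test against spherical occluders. All function names (`sample_hemisphere`, `visibility_score`, `best_viewpoint`), the hemisphere sampling scheme, and the scoring rule are assumptions made for illustration only.

```python
import math

def sample_hemisphere(center, radius, n_azimuth=12, n_elevation=4):
    """Hypothetical candidate generator: camera positions on an upper
    hemisphere around `center`, at elevations from 15 to 75 degrees."""
    cx, cy, cz = center
    poses = []
    for i in range(n_elevation):
        elev = math.radians(15 + i * 60 / max(n_elevation - 1, 1))
        for j in range(n_azimuth):
            azim = 2 * math.pi * j / n_azimuth
            poses.append((
                cx + radius * math.cos(elev) * math.cos(azim),
                cy + radius * math.cos(elev) * math.sin(azim),
                cz + radius * math.sin(elev),
            ))
    return poses

def segment_hits_sphere(p, q, c, r):
    """True if the segment from p to q passes within distance r of c."""
    d = [q[k] - p[k] for k in range(3)]
    f = [c[k] - p[k] for k in range(3)]
    dd = sum(x * x for x in d)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if dd == 0 else max(0.0, min(1.0, sum(f[k] * d[k] for k in range(3)) / dd))
    closest = [p[k] + t * d[k] for k in range(3)]
    return math.dist(closest, c) < r

def visibility_score(cam, targets, occluders):
    """Assumed stand-in score: fraction of target points (e.g. object and
    gripper) with an unobstructed line of sight from the camera."""
    visible = sum(
        1 for t in targets
        if not any(segment_hits_sphere(cam, t, c, r) for c, r in occluders)
    )
    return visible / len(targets)

def best_viewpoint(candidates, targets, occluders):
    """Return the candidate camera position with the highest score."""
    return max(candidates, key=lambda cam: visibility_score(cam, targets, occluders))

if __name__ == "__main__":
    targets = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.1)]   # object point, gripper point
    occluders = [((0.2, 0.0, 0.05), 0.08)]         # one spherical obstacle
    candidates = sample_hemisphere((0.0, 0.0, 0.0), radius=0.5)
    cam = best_viewpoint(candidates, targets, occluders)
    print(cam, visibility_score(cam, targets, occluders))
```

In a real system the score would presumably be computed from rendered views of the reconstructed scene rather than from hand-placed spheres, and the chosen pose would also need to be reachable by the observer arm; neither aspect is modeled here.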
Taxonomy
Topics: Robot Manipulation and Learning · Advanced Vision and Imaging · Reinforcement Learning in Robotics
