Behavior Cloning for Active Perception with Low-Resolution Egocentric Vision
Anthony Bilic, Chen Chen, and Ladislau Bölöni

TL;DR
This paper shows that behavior cloning from low-resolution egocentric vision lets a low-cost robot arm reliably perform an active perception task: finding a partially visible plant, repositioning to center it in view, and then triggering a grasp signal.
Contribution
It shows that behavior cloning with low-resolution visual input can produce effective active perception behaviors in a structured object-finding task.
Findings
Low-resolution egocentric vision suffices for task completion.
Predicting relative joint deltas substantially outperforms predicting absolute joint positions in the studied setting.
Visually grounded active perception emerges from behavior cloning.
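The delta-versus-absolute finding is about how the behavior-cloning labels are built from demonstrations. The sketch below is a minimal illustration, assuming a demonstration is stored as a list of joint-position vectors sampled at fixed timesteps; the function names are hypothetical and not taken from the paper. It shows how one trajectory yields either absolute targets (the next pose) or delta targets (the change to the next pose), and that accumulating the deltas from the start pose recovers the original trajectory.

```python
from typing import List, Sequence

JointVector = List[float]  # one joint-position reading per arm joint (assumed radians)


def to_absolute_targets(traj: Sequence[JointVector]) -> List[JointVector]:
    """Absolute targets: the label for step t is the joint position at step t+1."""
    return [list(q) for q in traj[1:]]


def to_delta_targets(traj: Sequence[JointVector]) -> List[JointVector]:
    """Delta targets: the label for step t is q[t+1] - q[t], joint by joint."""
    return [
        [nxt - cur for cur, nxt in zip(traj[t], traj[t + 1])]
        for t in range(len(traj) - 1)
    ]


def integrate_deltas(start: JointVector, deltas: Sequence[JointVector]) -> List[JointVector]:
    """Rebuild absolute positions by accumulating deltas from a start pose."""
    positions = [list(start)]
    for d in deltas:
        positions.append([q + dq for q, dq in zip(positions[-1], d)])
    return positions


# Hypothetical 3-step demonstration for a 2-joint arm.
demo = [[0.0, 0.5], [0.25, 0.5], [0.5, 0.25]]
print(to_absolute_targets(demo))  # [[0.25, 0.5], [0.5, 0.25]]
print(to_delta_targets(demo))     # [[0.25, 0.0], [0.25, -0.25]]
```

At execution time, a delta policy's output would be added to the measured joint state rather than used as a setpoint directly. One plausible reason deltas help is that delta labels stay small and do not depend on the arm's absolute pose, but that is an interpretation, not a result reported here.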
Abstract
We investigate whether behavior cloning is sufficient to produce active perception in a structured object-finding task. A low-cost robot arm equipped with a wrist-mounted egocentric RGB camera must reposition to center a partially visible plant before triggering a grasp signal, requiring actions that improve future observations. The model predicts joint commands directly from low-resolution RGB images under closed-loop control. We show that low-resolution egocentric vision is sufficient for reliable task completion and that predicting relative joint deltas substantially outperforms absolute joint position prediction in our setting. These results demonstrate that visually grounded active perception can emerge from behavior cloning in a reproducible setting.
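The closed-loop setup described in the abstract can be sketched as a simple control loop: capture an egocentric frame, downsample it, let the policy predict a joint delta, apply that delta to the measured joints, and stop once the policy raises the grasp signal. This is a minimal sketch under stated assumptions, not the paper's implementation: the policy signature, the block-averaging downsampler, the `grasp_threshold` score cutoff, and every function name here are hypothetical. The toy environment at the end encodes the centering error directly in the pixel values so the loop runs without a camera or a trained model.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b)
Image = List[List[Pixel]]           # rows of RGB pixels


def downsample(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks, channel by channel."""
    h, w = len(img) // factor, len(img[0]) // factor
    out: Image = []
    for i in range(h):
        row = []
        for j in range(w):
            block = [img[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out


# Hypothetical policy: (low-res image, measured joints) -> (joint deltas, grasp score)
Policy = Callable[[Image, List[float]], Tuple[List[float], float]]


def run_episode(policy: Policy,
                get_image: Callable[[], Image],
                read_joints: Callable[[], List[float]],
                send_joints: Callable[[List[float]], None],
                factor: int = 4,
                grasp_threshold: float = 0.5,
                max_steps: int = 50) -> bool:
    """Closed-loop rollout; returns True if the grasp signal fires within max_steps."""
    for _ in range(max_steps):
        obs = downsample(get_image(), factor)  # fresh low-res observation each step
        joints = read_joints()                 # measured state, not an integrated estimate
        delta, grasp_score = policy(obs, joints)
        if grasp_score > grasp_threshold:
            return True                        # plant judged centered: trigger grasp signal
        send_joints([q + dq for q, dq in zip(joints, delta)])
    return False


# --- Toy usage: hypothetical stand-ins for the camera, arm, and trained model ---
PLANT_PAN = 0.5            # toy pan angle at which the plant is centered
state = {"pan": 0.0}


def get_image() -> Image:
    # Toy 8x8 "camera": the red channel of every pixel encodes the centering error.
    err = PLANT_PAN - state["pan"]
    return [[(err, 0.0, 0.0)] * 8 for _ in range(8)]


def toy_policy(obs: Image, joints: List[float]) -> Tuple[List[float], float]:
    err = obs[0][0][0]
    return [0.5 * err], (1.0 if abs(err) < 0.05 else 0.0)


succeeded = run_episode(toy_policy, get_image,
                        read_joints=lambda: [state["pan"]],
                        send_joints=lambda q: state.update(pan=q[0]))
print(succeeded, state["pan"])
```

Because each step re-reads the image and the joint state, a poor delta on one step can be corrected on the next; that is the sense in which the control is closed-loop.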
