Behavior Cloning for Active Perception with Low-Resolution Egocentric Vision

Anthony Bilic, Chen Chen, and Ladislau Bölöni

arXiv:2605.14106·cs.RO·May 15, 2026

TL;DR

This paper shows that behavior cloning from low-resolution egocentric RGB images lets a low-cost robot arm reliably reposition to center a partially visible plant before triggering a grasp signal.

Contribution

The paper shows that behavior cloning with low-resolution visual input can produce effective active perception behavior in a structured object-finding task.

Findings

01. Low-resolution egocentric vision suffices for task completion.

02. Predicting joint deltas outperforms absolute joint position prediction.

03. Visually grounded active perception emerges from behavior cloning.

Abstract

We investigate whether behavior cloning is sufficient to produce active perception in a structured object-finding task. A low-cost robot arm equipped with a wrist-mounted egocentric RGB camera must reposition to center a partially visible plant before triggering a grasp signal, requiring actions that improve future observations. The model predicts joint commands directly from low-resolution RGB images under closed-loop control. We show that low-resolution egocentric vision is sufficient for reliable task completion and that predicting relative joint deltas substantially outperforms absolute joint position prediction in our setting. These results demonstrate that visually grounded active perception can emerge from behavior cloning in a reproducible setting.
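
The distinction between relative joint deltas and absolute joint positions can be made concrete with a minimal closed-loop sketch. This is an illustration only, not the authors' implementation: the 32x32 target resolution, the `max_delta` clipping bound, and the names `downsample`, `apply_action`, `rollout`, `policy`, and `get_image` are all assumptions introduced here, since the abstract does not specify them.

```python
import numpy as np


def downsample(image, size=(32, 32)):
    """Nearest-neighbour downsample of an HxWx3 image.

    The 32x32 default is an assumed resolution, not taken from the paper.
    """
    image = np.asarray(image)
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return image[rows][:, cols]


def apply_action(joints, prediction, mode, max_delta=0.05):
    """Turn a model output into the next joint command.

    mode="delta":    the output is a relative change added to the current joints
                     (clipped to an assumed per-step bound, max_delta).
    mode="absolute": the output is used directly as the target joint position.
    """
    joints = np.asarray(joints, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    if mode == "delta":
        return joints + np.clip(prediction, -max_delta, max_delta)
    if mode == "absolute":
        return prediction.copy()
    raise ValueError(f"unknown mode: {mode!r}")


def rollout(policy, get_image, joints, mode, steps):
    """Closed-loop control: observe, predict, act, then observe again.

    policy and get_image are hypothetical callables standing in for the
    trained model and the wrist-mounted camera.
    """
    joints = np.asarray(joints, dtype=float)
    for _ in range(steps):
        observation = downsample(get_image(joints))
        joints = apply_action(joints, policy(observation), mode)
    return joints
```

In this sketch each new image depends on the joints produced by the previous action, which is the sense in which actions "improve future observations". A delta output moves the arm a small step from wherever it currently is, whereas an absolute output must encode the full target pose from the image alone.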
