TL;DR
This paper introduces embodied visual active learning for semantic segmentation: an agent explores a 3D environment and selects informative views for which to request annotation. With a policy trained by reinforcement learning, it improves recognition accuracy while using fewer labels.
Contribution
It proposes an embodied active learning framework, trained with deep reinforcement learning, that outperforms pre-specified methods on semantic segmentation in 3D environments.
Findings
Learned method outperforms pre-specified agents.
Fewer annotations needed for comparable accuracy.
Effective in photorealistic 3D environments.
Abstract
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding by actively selecting views for which to request annotation. While accurate on some benchmarks, today's deep visual recognition pipelines tend not to generalize well in certain real-world scenarios or for unusual viewpoints. Robotic perception, in turn, requires the ability to refine recognition for the conditions in which the mobile system operates, including cluttered indoor environments or poor illumination. This motivates the proposed task, where an agent is placed in a novel environment with the objective of improving its visual recognition capability. To study embodied visual active learning, we develop a battery of agents, both learnt and pre-specified, and with different levels of knowledge of the…
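The view-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes a hypothetical pre-specified heuristic that ranks candidate views by the mean per-pixel entropy of the segmentation network's class probabilities and requests annotation for the most uncertain ones. The function names (`pixel_entropy`, `view_uncertainty`, `select_views`), the annotation `budget`, and the toy data are all assumptions made for illustration.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def view_uncertainty(view_probs):
    """Mean per-pixel entropy of a view.

    `view_probs` is a list of per-pixel class distributions, standing in
    for the softmax output of a segmentation network (an assumption).
    """
    return sum(pixel_entropy(p) for p in view_probs) / len(view_probs)


def select_views(candidate_views, budget):
    """Return indices of the `budget` most uncertain views, most uncertain first.

    This is a hypothetical entropy-based heuristic, not the learned
    reinforcement learning policy described in the paper.
    """
    ranked = sorted(
        range(len(candidate_views)),
        key=lambda i: view_uncertainty(candidate_views[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy data: three views, two pixels each, two classes.
views = [
    [[0.99, 0.01], [0.98, 0.02]],  # confident predictions
    [[0.5, 0.5], [0.6, 0.4]],      # highly uncertain predictions
    [[0.9, 0.1], [0.7, 0.3]],      # moderately uncertain predictions
]

chosen = select_views(views, budget=2)
print(chosen)  # [1, 2]
```

A fixed rule like this ignores how the agent reaches a view and how much a new annotation helps on later views. The abstract's comparison between learnt and pre-specified agents concerns exactly that kind of gap.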