Embodied Learning for Lifelong Visual Perception
David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

TL;DR
This paper introduces embodied lifelong visual perception agents that navigate, explore, and request annotations in building environments, improving recognition through active learning and prior knowledge across scenes.
Contribution
It proposes a deep RL-based agent that jointly learns navigation and active learning, leveraging prior scene knowledge for improved exploration and perception.
Findings
Learning-based agents outperform heuristic baselines such as frontier exploration with heuristic active learning.
Prior knowledge from earlier visited buildings reduces annotation requests.
Agents achieve better semantic segmentation accuracy.
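
To make "semantic segmentation accuracy" concrete, the sketch below computes mean intersection-over-union (mIoU), a common metric for semantic segmentation. The summary above does not state which metric the paper reports, so using mIoU here is an assumption, and the function name `mean_iou` and the flattened-label input format are hypothetical choices made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened label maps (one class index per pixel).
    This is an illustrative metric, not necessarily the paper's evaluation protocol.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: a 2x2 image flattened to four pixels, three possible classes.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=3)
```

In this toy case class 0 has IoU 1/2, class 1 has IoU 2/3, and class 2 is skipped because it appears in neither map, so the score is their average, 7/12.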
Abstract
We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings and occasionally request annotations which, in turn, are used to refine their visual perception capabilities. The purpose of the agents is to recognize objects and other semantic classes in the whole building at the end of a process that combines exploration and active visual learning. As we study this task in a lifelong learning context, the agents should use knowledge gained in earlier visited environments in order to guide their exploration and active learning strategy in successively visited buildings. We use the semantic segmentation performance as a proxy for general visual perception and study this novel task for several exploration and annotation methods, ranging from frontier exploration baselines which use heuristic active learning, to a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Urban Planning and Valuation
