Embodied Visual Recognition
Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

TL;DR
This paper introduces Embodied Visual Recognition, where agents in 3D environments actively move to improve object recognition, demonstrating that embodied agents outperform passive ones and learn strategic movement paths.
Contribution
The paper presents a new task and model for embodied visual recognition, enabling agents to learn strategic movements that improve recognition of heavily occluded target objects.
Findings
Embodied agents outperform passive visual systems in recognition tasks.
Agents learn strategic movement paths that differ from shortest routes.
Active movement improves amodal object recognition accuracy.
Abstract
Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing angle to better understand object shapes and semantics. In this work, we introduce the task of Embodied Visual Recognition (EVR): An agent is instantiated in a 3D environment close to an occluded target object, and is free to move in the environment to perform object classification, amodal object localization, and amodal object segmentation. To address this, we develop a new model called Embodied Mask R-CNN, for agents to learn to move strategically to improve their visual recognition abilities. We conduct experiments using the House3D environment. Experimental results show that: 1) agents with embodiment (movement) achieve better visual recognition…
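The abstract does not describe how Embodied Mask R-CNN combines observations gathered while the agent moves. The sketch below is therefore only a toy illustration of why extra viewpoints can help. It assumes a fixed, hand-written sequence of per-view classifier logits. In place of any learned policy or aggregation, it averages per-view softmax probabilities. It then shows that evidence from later, less occluded views can overturn a wrong prediction from the initial occluded view. All function names and logit values are hypothetical and are not the paper's method or results.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one view's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(per_view_logits: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical aggregation: average the softmax probabilities of all views.

    This is a simple stand-in, not the learned aggregation of Embodied Mask R-CNN.
    """
    probs = [softmax(view) for view in per_view_logits]
    num_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(num_classes)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Toy logits for 3 classes; the true class of the target is assumed to be 0.
VIEWS = [
    [1.0, 1.2, 0.0],  # initial, heavily occluded view: slightly favors the wrong class 1
    [2.5, 0.5, 0.0],  # after moving: more of the target is visible
    [3.0, 0.2, 0.1],  # another step: target is mostly unoccluded
]

if __name__ == "__main__":
    passive = argmax(softmax(VIEWS[0]))         # uses only the first view
    embodied = argmax(aggregate_views(VIEWS))   # uses every view along the path
    print("passive prediction:", passive)
    print("embodied prediction:", embodied)
```

A passive system sees only `VIEWS[0]` and predicts class 1. Averaging over the whole trajectory yields class 0. Averaging is chosen here only because it is the simplest order-independent combination. A learned model could instead weight views by how informative they are.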
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: Region Proposal Network · Softmax · RoIAlign · Convolution · Mask R-CNN
