Explore and Explain: Self-supervised Navigation and Recounting

Roberto Bigazzi; Federico Landi; Marcella Cornia; Silvia Cascianelli; Lorenzo Baraldi; Rita Cucchiara

arXiv:2007.07268·cs.CV·April 16, 2024

TL;DR

This paper introduces an embodied AI setting in which an agent explores a previously unknown environment, chooses when to describe what it sees, and generates natural language descriptions of relevant objects and scenes, combining a self-supervised exploration module with a captioning model.

Contribution

The paper presents a novel self-supervised exploration module with penalty, combined with a fully-attentive captioning model for explanation in embodied navigation.

Findings

01. Effective exploration and explanation in photorealistic environments

02. Interaction between navigation and explanation improves agent performance

03. Different explanation policies impact the quality of descriptions

Abstract

Embodied AI has recently been gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment while recounting what it sees along the path. In this context, the agent needs to navigate the environment driven by an exploration goal, select proper moments for description, and output natural language descriptions of relevant objects and scenes. Our model integrates a novel self-supervised exploration module with penalty and a fully-attentive captioning model for explanation. We also investigate different policies for selecting proper moments for explanation, driven by information coming from both the environment and the navigation. Experiments are conducted on photorealistic environments from the Matterport3D dataset and investigate the…
