Embodied Agents for Efficient Exploration and Smart Scene Description
Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

TL;DR
This paper presents a novel approach for embodied agents that explore unseen indoor environments and generate natural language descriptions, enhancing interpretability and semantic understanding during exploration.
Contribution
It introduces a combined exploration and image captioning method that produces informative, non-repetitive scene descriptions, improving human-robot interaction in indoor navigation tasks.
Findings
Effective scene descriptions generated during exploration.
Improved interpretability of robot observations.
Validated on simulated and real-world environments.
Abstract
The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step towards this objective, in this work, we tackle a setting for visual navigation in which an autonomous agent needs to explore and map an unseen indoor environment while portraying interesting scenes with natural language descriptions. To this end, we propose and evaluate an approach that combines recent advances in visual robotic exploration and image captioning on images generated through agent-environment interaction. Our approach can generate smart scene descriptions that maximize semantic knowledge of the environment and avoid repetitions. Further, such descriptions offer user-understandable insights into the robot's representation of the environment…
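The abstract says the generated descriptions "avoid repetitions" but does not say how. As a purely illustrative sketch, and not the authors' method, one simple way to suppress near-duplicate captions is to compare each new caption with those already kept using word-set (Jaccard) overlap. The function names, the `max_similarity` threshold, and the example captions below are all hypothetical.

```python
from typing import Iterable, List, Set


def _tokens(caption: str) -> Set[str]:
    """Lowercase a caption and split it into a set of words."""
    return set(caption.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_repetitive(captions: Iterable[str], max_similarity: float = 0.5) -> List[str]:
    """Keep a caption only if it is not too similar to any caption already kept.

    Hypothetical helper: the threshold of 0.5 is an arbitrary assumption.
    """
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []
    for caption in captions:
        tokens = _tokens(caption)
        # Compare against every caption that has already been accepted.
        if all(jaccard(tokens, prev) <= max_similarity for prev in kept_tokens):
            kept.append(caption)
            kept_tokens.append(tokens)
    return kept


if __name__ == "__main__":
    stream = [
        "a kitchen with a sink and a stove",
        "a kitchen with a stove and a sink",
        "a bedroom with a large bed",
    ]
    for line in filter_repetitive(stream):
        print(line)
```

With these made-up captions, the second one uses exactly the same words as the first and is dropped, while the bedroom caption is kept. A real system would more plausibly compare learned embeddings or decide when to speak from the agent's observations; word overlap is used here only to keep the example self-contained.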
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
Methods: Diffusion
