UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation
Niyati Rawal, Roberto Bigazzi, Lorenzo Baraldi, Rita Cucchiara

TL;DR
UNMuTe is a novel model that unifies navigation and multimodal dialogue generation, enabling autonomous agents to interact with humans or other agents for improved navigation in complex environments.
Contribution
This work introduces UNMuTe, which combines a GPT-2-based dialogue model with a navigation component so that agents can ask questions and receive guidance during navigation tasks.
Findings
Achieves state-of-the-art results on the CVDN and NDH benchmarks.
Effectively generates questions and answers to improve navigation.
Demonstrates the benefit of integrated dialogue and navigation models.
Abstract
Smart autonomous agents are becoming increasingly important in various real-life applications, including robotics and autonomous vehicles. One crucial skill that these agents must possess is the ability to interact with their surrounding entities, such as other agents or humans. In this work, we aim at building an intelligent agent that can efficiently navigate in an environment while being able to interact with an oracle (or human) in natural language and ask for directions when it is unsure about its navigation performance. The interaction is started by the agent that produces a question, which is then answered by the oracle on the basis of the shortest trajectory to the goal. The process can be performed multiple times during navigation, thus enabling the agent to hold a dialogue with the oracle. To this end, we propose a novel computational model, named UNMuTe, that consists of two…
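The interaction loop described in the abstract (navigate, ask when unsure, receive an answer grounded in the shortest trajectory, repeat) can be illustrated with a minimal sketch. This is not the UNMuTe model: the toy graph, the `confidence_fn` callback, the `threshold` value, the templated question and answer strings, and the first-neighbor fallback policy are all assumptions made for illustration. The sketch only shows the control flow, with breadth-first search standing in for the oracle's shortest-path knowledge.

```python
from collections import deque

# Hypothetical toy environment: nodes are viewpoints, edges are traversable links.
GRAPH = {
    "hall": ["kitchen", "stairs"],
    "kitchen": ["hall", "pantry"],
    "stairs": ["hall", "bedroom"],
    "pantry": ["kitchen"],
    "bedroom": ["stairs", "bathroom"],
    "bathroom": ["bedroom"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def oracle_answer(graph, position, goal):
    """Assumed oracle: returns (next node on the shortest path, templated answer)."""
    path = shortest_path(graph, position, goal)
    next_node = path[1]
    return next_node, f"Go to the {next_node}."


def navigate(graph, start, goal, confidence_fn, threshold=0.5, max_steps=20):
    """Move toward the goal, asking the oracle whenever confidence is low."""
    position = start
    dialogue = []  # list of (question, answer) turns
    for _ in range(max_steps):
        if position == goal:
            break
        if confidence_fn(position) < threshold:
            # The agent starts the interaction by producing a question.
            question = f"I am in the {position}. Where should I go next?"
            next_node, answer = oracle_answer(graph, position, goal)
            dialogue.append((question, answer))
            position = next_node
        else:
            # Assumed placeholder policy: take the first listed neighbor.
            position = graph[position][0]
    return position, dialogue


if __name__ == "__main__":
    # An agent that is always unsure asks at every step.
    final, turns = navigate(GRAPH, "hall", "bathroom", lambda pos: 0.0)
    print(final)
    for question, answer in turns:
        print(question, "->", answer)
```

In this sketch the dialogue grows by one question-answer turn each time the agent's confidence falls below the threshold, which mirrors the multi-turn exchange the abstract describes. A learned model would replace both the templated strings and the fallback policy.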
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
