DiaLoc: An Iterative Approach to Embodied Dialog Localization

Chao Zhang; Mohan Li; Ignas Budvytis; Stephan Liwicki

arXiv:2403.06846 · cs.CV · March 12, 2024 · 1 citation

TL;DR

DiaLoc introduces an iterative, multimodal dialog-based localization framework that refines location predictions after each dialog turn, achieving state-of-the-art results and bridging the gap between simulation and real-world applications.

Contribution

It presents DiaLoc, a novel iterative localization approach that effectively fuses vision and dialog data for improved embodied localization performance.

Findings

1. Achieves +7.08% in Acc5@valUnseen in the single-shot setting.

2. Achieves +10.85% in Acc5@valUnseen in the multi-shot setting.

3. Narrows the gap between simulation and real-world localization tasks.

Abstract

Multimodal learning has advanced the performance of many vision-language tasks. However, most existing works in embodied dialog research focus on navigation and leave the localization task understudied. The few existing dialog-based localization approaches assume the availability of the entire dialog prior to localization, which is impractical for deployed dialog-based localization. In this paper, we propose DiaLoc, a new dialog-based localization framework which aligns with real human operator behavior. Specifically, we produce an iterative refinement of location predictions which can visualize current pose beliefs after each dialog turn. DiaLoc effectively utilizes the multimodal data for multi-shot localization, where a fusion encoder fuses vision and dialog information iteratively. We achieve state-of-the-art results on the embodied dialog-based localization task, in single-shot (+7.08%…
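
The idea of refining a location prediction after each dialog turn can be illustrated with a toy sketch. This is not DiaLoc's model, which the abstract describes as a learned fusion encoder over vision and dialog. Here a grid-shaped belief and hand-written per-turn scores stand in for the learned components, and every name (`update_belief`, `localize_iteratively`, the score grids) is a hypothetical chosen for illustration only.

```python
from typing import List, Tuple

Grid = List[List[float]]


def normalize(grid: Grid) -> Grid:
    """Scale a non-negative grid so that its cells sum to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("belief has no probability mass")
    return [[value / total for value in row] for row in grid]


def update_belief(belief: Grid, turn_scores: Grid) -> Grid:
    """Fuse the prior belief with one turn's per-cell scores (assumed non-negative)."""
    fused = [
        [b * s for b, s in zip(belief_row, score_row)]
        for belief_row, score_row in zip(belief, turn_scores)
    ]
    return normalize(fused)


def argmax_cell(belief: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the most likely cell; ties go to the first one found."""
    best_value, best_cell = -1.0, (0, 0)
    for r, row in enumerate(belief):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_cell = value, (r, c)
    return best_cell


def localize_iteratively(
    height: int, width: int, scores_per_turn: List[Grid]
) -> Tuple[Grid, List[Tuple[int, int]]]:
    """Start from a uniform belief and emit one location prediction per dialog turn."""
    belief = normalize([[1.0] * width for _ in range(height)])
    predictions = []
    for turn_scores in scores_per_turn:
        belief = update_belief(belief, turn_scores)
        predictions.append(argmax_cell(belief))
    return belief, predictions


# Hypothetical scores: turn 1 favours the bottom-left cell, turn 2 the bottom-right.
turns = [
    [[1.0, 1.0], [3.0, 1.0]],
    [[1.0, 1.0], [1.0, 9.0]],
]
final_belief, per_turn_predictions = localize_iteratively(2, 2, turns)
```

In this toy run the prediction moves from cell (1, 0) after the first turn to cell (1, 1) after the second, which mirrors the multi-shot behaviour the abstract describes: a usable estimate exists after every turn rather than only once the whole dialog is available.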

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling

Methods: Focus