DiaLoc: An Iterative Approach to Embodied Dialog Localization
Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki

TL;DR
DiaLoc introduces an iterative, multimodal dialog-based localization framework that refines location predictions after each dialog turn, achieving state-of-the-art results and bridging the gap between simulation and real-world applications.
Contribution
It presents DiaLoc, a novel iterative localization approach that effectively fuses vision and dialog data for improved embodied localization performance.
Findings
Achieves +7.08% Acc5 on valUnseen in the single-shot setting.
Achieves +10.85% Acc5 on valUnseen in the multi-shot setting.
Narrows the gap between simulation and real-world localization tasks.
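To make the Acc5 numbers concrete, here is a minimal sketch of an accuracy-at-threshold metric. It assumes Acc5 means the fraction of predictions that fall within 5 distance units (for example, meters) of the ground-truth location; that reading is an assumption, not something this summary states. The function name `acc_at_k` and the use of straight-line Euclidean distance are illustrative choices, and a benchmark may instead measure distance along the navigable environment.

```python
import math


def acc_at_k(preds, targets, k=5.0):
    """Fraction of predictions within distance k of their target.

    Hypothetical helper: assumes 2D (x, y) coordinates and Euclidean
    distance, which may differ from the benchmark's own definition.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    # Count a hit when the prediction lies within k units of the target.
    hits = sum(1 for p, t in zip(preds, targets) if math.dist(p, t) <= k)
    return hits / len(preds)


if __name__ == "__main__":
    predictions = [(0.0, 0.0), (10.0, 0.0)]
    ground_truth = [(3.0, 4.0), (0.0, 0.0)]
    print(acc_at_k(predictions, ground_truth, k=5.0))
```

Under this reading, a reported gain such as +7.08% would be a difference in this fraction, expressed in percentage points, between two methods evaluated on the same split.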
Abstract
Multimodal learning has advanced performance on many vision-language tasks. However, most existing work in embodied dialog research focuses on navigation and leaves the localization task understudied. The few existing dialog-based localization approaches assume that the entire dialog is available prior to localization, which is impractical for deployed dialog-based localization. In this paper, we propose DiaLoc, a new dialog-based localization framework that aligns with the behavior of a real human operator. Specifically, we produce an iterative refinement of location predictions, which allows the current pose beliefs to be visualized after each dialog turn. DiaLoc effectively utilizes multimodal data for multi-shot localization, where a fusion encoder fuses vision and dialog information iteratively. We achieve state-of-the-art results on the embodied dialog-based localization task, in single-shot (+7.08%…
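The idea of refining a location belief after every dialog turn can be illustrated with a toy example. The sketch below is not DiaLoc's fusion encoder: it substitutes a simple multiplicative, Bayesian-style update over a small grid, and every name in it (`refine_belief`, `localize_iteratively`, the per-turn score grids) is hypothetical. It only shows the control flow of producing one prediction per turn instead of waiting for the whole dialog.

```python
def normalize(grid):
    """Scale a 2D grid of non-negative scores so it sums to 1."""
    total = sum(sum(row) for row in grid)
    if total <= 0:
        raise ValueError("grid must contain positive mass")
    return [[v / total for v in row] for row in grid]


def refine_belief(belief, turn_scores):
    """Combine the current belief with one turn's scores, cell by cell."""
    updated = [
        [b * s for b, s in zip(belief_row, score_row)]
        for belief_row, score_row in zip(belief, turn_scores)
    ]
    return normalize(updated)


def argmax_cell(grid):
    """Return the (row, col) index of the highest-valued cell."""
    cells = ((v, (r, c)) for r, row in enumerate(grid) for c, v in enumerate(row))
    return max(cells, key=lambda item: item[0])[1]


def localize_iteratively(shape, per_turn_scores):
    """Start from a uniform belief and emit one prediction per dialog turn."""
    rows, cols = shape
    belief = normalize([[1.0] * cols for _ in range(rows)])
    predictions = []
    for scores in per_turn_scores:
        belief = refine_belief(belief, scores)
        predictions.append(argmax_cell(belief))
    return belief, predictions


if __name__ == "__main__":
    # Two made-up turns: the first favors the bottom-right cell,
    # the second strongly favors the top-left cell.
    turns = [
        [[1.0, 1.0], [1.0, 3.0]],
        [[4.0, 1.0], [1.0, 1.0]],
    ]
    final_belief, preds = localize_iteratively((2, 2), turns)
    print(preds)
```

In a learned system the per-turn scores would come from a model that conditions on the map and the dialog so far; here they are hand-written numbers so the loop can be followed by eye.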
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
Methods: Focus
