Accessible Instruction-Following Agent
Kairui Zhou

TL;DR
This paper presents UVLN, a multilingual instruction-following agent for vision-language navigation that leverages machine translation, large language models, and cross-modal transformers to improve accessibility and performance in diverse languages.
Contribution
The work introduces UVLN, a novel framework combining machine translation, large language models, and cross-modal transformers for cross-lingual vision-language navigation, enhancing accessibility and multilingual capabilities.
Findings
Effective cross-lingual navigation demonstrated on the Room-Across-Room (RxR) dataset.
Improved accessibility for non-English and low-resource languages.
Qualitative results show enhanced interactivity and user understanding.
Abstract
Humans can collaborate and complete tasks based on visual signals and instructions from the environment. Training a robot to do the same is difficult, especially because it must understand both the instructions and a complicated environment. Previous instruction-following agents are biased toward English-centric corpora, making them impractical for users who speak multiple languages or low-resource languages. Moreover, these agents are pre-trained in a mode that assumes the user can observe the environment, which limits their accessibility. In this work, we aim to generalize the success of instruction-following agents to non-English languages with few corpus resources, and to improve their interactivity and accessibility. We introduce UVLN (Universal Vision-Language Navigation), a novel machine-translation instructional augmented framework for cross-lingual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques
