Accessible Instruction-Following Agent

Kairui Zhou

arXiv:2305.06358 · cs.AI · May 12, 2023 · 1 citation

TL;DR

This paper presents UVLN, a multilingual instruction-following agent for vision-language navigation that leverages machine translation, large language models, and cross-modal transformers to improve accessibility and performance in diverse languages.

Contribution

The work introduces UVLN, a novel framework combining machine translation, large language models, and cross-modal transformers for cross-lingual vision-language navigation, enhancing accessibility and multilingual capabilities.

Findings

1. Effective cross-lingual navigation is demonstrated on the Room-Across-Room (RxR) dataset.

2. Accessibility is improved for non-English and low-resource languages.

3. Qualitative results show improved interactivity and user understanding.

Abstract

Humans can collaborate and complete tasks based on visual signals and instructions from the environment. Training a robot to do the same is difficult, especially because it must understand both the instructions and a complicated environment. Previous instruction-following agents are biased toward English-centric corpora, making them impractical for users who speak other languages, including low-resource languages. Moreover, these agents are pre-trained in a mode that assumes the user can observe the environment, which limits their accessibility. In this work, we aim to generalize the success of instruction-following agents to non-English languages with limited corpus resources, and to improve their interactivity and accessibility. We introduce UVLN (Universal Vision-Language Navigation), a novel machine-translation instruction-augmented framework for cross-lingual…

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Image and Video Retrieval Techniques