Multimodal Sequential Generative Models for Semi-Supervised Language Instruction Following
Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo

TL;DR
This paper introduces a multimodal generative model for semi-supervised learning in language instruction following, effectively leveraging unpaired data to enhance agent performance in navigation tasks.
Contribution
It proposes a novel network architecture for sequence-to-sequence multimodal data and combines generative models with semi-supervised methods to improve instruction following.
Findings
Improves instruction following performance using unpaired data.
Improves the accuracy of the speaker-follower model by 2-4% in the R2R environment.
Addresses challenges of variable-length multimodal sequences with a new architecture.
Abstract
Agents that can follow language instructions are expected to be useful in a variety of situations such as navigation. However, training neural network-based agents requires numerous paired trajectories and language instructions. This paper proposes using multimodal generative models for semi-supervised learning in instruction following tasks. The models learn a shared representation of the paired data, and enable semi-supervised learning by reconstructing unpaired data through the representation. Key challenges in applying the models to sequence-to-sequence tasks, including instruction following, are learning a shared representation of variable-length multimodal data and incorporating attention mechanisms. To address these problems, this paper proposes a novel network architecture to absorb the difference in the sequence lengths of the multimodal data. In addition, to further improve the…
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Topic Modeling
