Imagination-Augmented Natural Language Understanding
Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

TL;DR
This paper introduces iACE (Imagination-Augmented Cross-modal Encoder), a model that augments natural language understanding with visual imagination through cross-modal learning, improving performance particularly in low-resource scenarios.
Contribution
The paper proposes the first imagination-augmented cross-modal encoder that leverages vision-and-language models to improve NLU, especially in few-shot learning settings.
Findings
iACE outperforms existing models on GLUE and SWAG benchmarks.
iACE shows significant improvements in low-resource, few-shot scenarios.
Visual imagination integration enhances NLU model generalization.
Abstract
Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations. Such abilities enable us to construct new abstract concepts or concrete objects, and are essential in involving practical knowledge to solve problems in low-resource scenarios. However, most existing methods for Natural Language Understanding (NLU) are mainly focused on textual signals. They do not simulate human visual imagination ability, which hinders models from inferring and learning efficiently from limited data samples. Therefore, we introduce an Imagination-Augmented Cross-modal Encoder (iACE) to solve natural language understanding tasks from a novel learning perspective -- imagination-augmented cross-modal understanding. iACE enables visual imagination with external knowledge transferred from the powerful generative…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Subtitles and Audiovisual Media
