Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model
Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou

TL;DR
This paper introduces a multimodal machine comprehension (MC) model that fuses audio and text. A single architecture handles multiple MC tasks, and a combination of attention-based fusion and knowledge distillation lets it outperform unimodal models.
Contribution
It proposes the DIIA model for effective audio-text fusion and the MKD module for unimodal prediction, advancing multimodal MC capabilities.
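To make the idea of inter- and intra-modality attention concrete, here is a minimal sketch, not the paper's DIIA implementation. It assumes plain scaled dot-product attention, no learned projections, and fusion by concatenation; the "dynamic" part of DIIA is not modeled. The names `attend` and `fuse_audio_with_text` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output component at a time.
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


def fuse_audio_with_text(audio, text):
    """Hypothetical fusion: each audio frame gets an intra-modality view
    (audio attends to audio) and an inter-modality view (audio attends to
    text); the two views are concatenated. This is an assumption made for
    illustration, not the fusion rule from the paper."""
    intra = attend(audio, audio, audio)
    inter = attend(audio, text, text)
    return [a + b for a, b in zip(intra, inter)]


# Toy features: 3 audio frames and 2 text tokens, each of dimension 2.
audio_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feats = [[0.5, 0.5], [1.0, -1.0]]
fused = fuse_audio_with_text(audio_feats, text_feats)
```

Each fused vector has twice the input dimension, which is one simple way for a downstream MC model to consume both views without changing its own architecture.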
Findings
DIIA improves accuracy by up to 21.08%.
MKD enables the model to outperform unimodal models by up to 18.87%.
The model handles multiple MC tasks with a single architecture.
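The distillation idea behind MKD can be illustrated with the generic soft-label formulation, in which a student model trained on a single modality is pulled toward the output distribution of a multimodal teacher. This is a sketch of standard knowledge distillation, not the paper's exact objective; the function names, the temperature, and the mixing weight `alpha` are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled, numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Mix a soft-label term (student matches teacher) with a hard-label
    cross-entropy term. `alpha` and `temperature` are illustrative values."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable
    # across temperatures in the standard formulation.
    soft_term = (temperature ** 2) * kl_divergence(soft_teacher, soft_student)
    hard_term = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft_term + (1.0 - alpha) * hard_term


# Toy example: four answer choices, the correct choice is index 0.
teacher = [3.0, 0.5, 0.2, 0.1]        # hypothetical multimodal teacher logits
agreeing_student = [2.5, 0.4, 0.3, 0.2]
disagreeing_student = [0.1, 0.2, 0.3, 2.5]
loss_agree = distillation_loss(agreeing_student, teacher, label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, label=0)
```

A student whose predictions track the teacher's incurs a smaller loss than one that disagrees, which is the signal that lets a unimodal student benefit from multimodal training.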
Abstract
While Machine Comprehension (MC) has attracted extensive research interest in recent years, existing approaches mainly belong to the category of Machine Reading Comprehension, which mines textual inputs (paragraphs and questions) to predict the answers (choices or text spans). However, many MC tasks accept audio input in addition to textual input, e.g., English listening comprehension tests. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, whose goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and textual). DIIA can work as an independent component and thus be easily integrated into existing MC models. Moreover, we further develop a Multimodal Knowledge…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
