Hierarchical Attention Model for Improved Machine Comprehension of Spoken Content
Wei Fang, Jui-Yang Hsu, Hung-yi Lee, Lin-Shan Lee

TL;DR
This paper introduces a Hierarchical Attention Model (HAM) that enhances machine comprehension of spoken content by leveraging tree-structured representations, leading to more robust understanding even with ASR errors.
Contribution
The paper presents a novel HAM architecture that applies multi-hop attention over tree structures, improving comprehension accuracy over previous sequential models.
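The contribution above describes multi-hop attention over tree structures. Below is a minimal, hypothetical sketch of that idea in plain Python. It makes several assumptions. Each tree node (word, phrase, or sentence) already carries a fixed embedding vector. Attention is a softmax over dot products between the query and every node. The query is updated additively after each hop. The answer choice with the highest cosine similarity to the final query is selected. These choices and all names (`TreeNode`, `all_nodes`, `multi_hop`, `pick_answer`) are illustrative and are not the authors' implementation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


@dataclass
class TreeNode:
    # Hypothetical node type: vec is a precomputed embedding for a word,
    # phrase, or sentence; children are its constituents.
    vec: Vector
    children: List["TreeNode"] = field(default_factory=list)


def all_nodes(root: TreeNode) -> List[TreeNode]:
    """Collect every node in the tree (leaves and internal nodes) depth-first."""
    stack, out = [root], []
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, nodes: List[TreeNode]) -> Tuple[Vector, List[float]]:
    """One hop: weight every tree node by softmax(query . node) and sum."""
    weights = softmax([dot(query, n.vec) for n in nodes])
    summary = [
        sum(w * n.vec[i] for w, n in zip(weights, nodes))
        for i in range(len(query))
    ]
    return summary, weights


def multi_hop(query: Vector, root: TreeNode, hops: int = 2) -> Vector:
    """Repeat attention, folding each hop's summary back into the query (assumed additive)."""
    nodes = all_nodes(root)
    q = list(query)
    for _ in range(hops):
        summary, _ = attend(q, nodes)
        q = [qi + si for qi, si in zip(q, summary)]
    return q


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def pick_answer(query: Vector, root: TreeNode, choices: List[Vector], hops: int = 2) -> int:
    """Return the index of the choice most similar to the final query."""
    q = multi_hop(query, root, hops)
    return max(range(len(choices)), key=lambda i: cosine(q, choices[i]))


# Usage: a toy "story" tree with one sentence node over two word leaves.
story = TreeNode([1.0, 0.0], [TreeNode([1.0, 0.0]), TreeNode([0.0, 1.0])])
print(pick_answer([1.0, 0.0], story, [[1.0, 0.0], [0.0, 1.0]]))  # → 0
```

In this sketch, the difference from a purely sequential attention model lies in `all_nodes`. Attention weights are spread over phrase- and sentence-level nodes as well as individual words, so a single hop can focus on a whole constituent. In a trained model the node vectors would be learned rather than supplied by hand.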
Findings
HAM outperforms previous models in comprehension accuracy.
The model is robust against ASR errors.
Tree-structured attention improves understanding of spoken content.
Abstract
Multimedia or spoken content presents more attractive information than plain text content, but it is more difficult to display on a screen and for a user to select. As a result, accessing large collections of the former is much more difficult and time-consuming than the latter for humans. It is therefore highly attractive to develop machines that can automatically understand spoken content and summarize the key information for humans to browse. In this endeavor, a new task of machine comprehension of spoken content was proposed recently. The initial goal was defined as the listening comprehension test of TOEFL, a challenging academic English examination for English learners whose native language is not English. An Attention-based Multi-hop Recurrent Neural Network (AMRNN) architecture was also proposed for this task, which considered only the sequential relationship…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
