ODSQA: Open-domain Spoken Question Answering Dataset
Chia-Hsuan Lee, Shang-Ming Wang, Huan-Cheng Chang, Hung-Yi Lee

TL;DR
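This paper introduces ODSQA, a spoken question answering dataset with more than three thousand questions that the authors believe is the largest real SQA dataset to date. It shows how badly ASR errors degrade machine comprehension of spoken content, and studies subword units and data augmentation as ways to mitigate them.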
Contribution
The paper releases what is, to the authors' knowledge, the largest real spoken QA dataset, and investigates two ways to mitigate ASR errors: subword units and data augmentation on text-based QA training examples.
Findings
ASR errors severely impact spoken QA performance
Subword units improve robustness across models
Data augmentation enhances spoken QA accuracy
Abstract
Reading comprehension by machine has been widely studied, but machine comprehension of spoken content is still a less investigated problem. In this paper, we release the Open-Domain Spoken Question Answering Dataset (ODSQA) with more than three thousand questions. To the best of our knowledge, this is the largest real SQA dataset. On this dataset, we found that ASR errors have a catastrophic impact on SQA. To mitigate the effect of ASR errors, we use subword units, which bring consistent improvements over all the models. We further found that data augmentation on text-based QA training examples can improve SQA.
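The intuition behind using subword units can be illustrated with a toy comparison. The sketch below is not the paper's method: it assumes character bigrams as a stand-in for whatever subword units the authors use, and the function names (`char_ngrams`, `word_overlap`, `subword_overlap`) and the example strings are hypothetical, not drawn from ODSQA. It only shows that a misrecognized word shares no word-level overlap with the reference, while much of its subword content still matches.

```python
def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a word, with boundary markers."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def word_overlap(ref: str, hyp: str) -> float:
    """Overlap between two strings measured on whole whitespace-separated words."""
    return jaccard(set(ref.split()), set(hyp.split()))


def subword_overlap(ref: str, hyp: str, n: int = 2) -> float:
    """Overlap measured on character n-grams (an assumed stand-in for subword units)."""
    ref_units = set().union(*(char_ngrams(w, n) for w in ref.split()))
    hyp_units = set().union(*(char_ngrams(w, n) for w in hyp.split()))
    return jaccard(ref_units, hyp_units)


if __name__ == "__main__":
    reference = "taipei"   # hypothetical gold answer word
    asr_output = "taipai"  # hypothetical ASR misrecognition of the same word
    print("word-level overlap:   ", word_overlap(reference, asr_output))
    print("subword-level overlap:", subword_overlap(reference, asr_output))
```

On this toy pair the word-level score is zero because the two tokens differ, while the subword-level score stays well above zero. That is one plausible reading of why subword representations can be more tolerant of ASR errors.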
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
