End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou

TL;DR
This paper introduces the SCQA task for spoken conversational question answering, proposing a novel data distillation method and dual attention mechanism to improve system understanding of dialogue in speech data.
Contribution
The paper presents a new SCQA task, a unified data distillation approach (DDNet), and a dual attention mechanism to enhance spoken conversational QA systems.
Findings
Proposed methods outperform existing models on the Spoken-CoQA dataset.
Cross-modal information integration is crucial for effective spoken conversational QA.
The Spoken-CoQA dataset contains over 40k question-answer pairs from 4k conversations.
Abstract
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. However, the most natural way that humans seek or test their knowledge is via human conversations. Therefore, we propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows given the speech documents. In this task, our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering. To this end, instead of directly adopting automatically generated speech transcripts with highly noisy data, we propose a novel unified data distillation approach, DDNet, which effectively ingests cross-modal information to achieve…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
