A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering
Georgios Sidiropoulos, Evangelos Kanoulas

TL;DR
This paper introduces an end-to-end multimodal dense retriever for speech-based open-domain QA that processes spoken questions directly. It outperforms ASR-dependent pipelines on shorter questions and when ASR transcriptions have high error rates.
Contribution
It proposes an ASR-free, end-to-end trained multimodal dense retrieval model for spoken questions, addressing limitations of ASR-based pipelines in low-resource and specialized domains.
Findings
Outperforms ASR-based pipelines on shorter questions
Better retrieval when ASR transcriptions have high error rates
Effective in low-resource language and domain scenarios
Abstract
Speech-based open-domain question answering (QA over a large corpus of text passages with spoken questions) has emerged as an important task due to the increasing number of users interacting with QA systems via speech interfaces. Passage retrieval is a key task in speech-based open-domain QA. Previous works have adopted pipelines consisting of an automatic speech recognition (ASR) model that transcribes the spoken question before feeding it to a dense text retriever. Such pipelines have several limitations. The need for an ASR model limits the applicability to low-resource languages and specialized domains with no annotated speech data. Furthermore, the ASR model propagates its errors to the retriever. In this work, we try to alleviate these limitations by proposing an ASR-free, end-to-end trained multimodal dense retriever that can work directly on spoken questions. Our experimental…
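To make the ASR-free idea concrete, the following is a minimal sketch of dense retrieval in which the question embedding is computed from speech features rather than from a transcript. It is not the authors' model: the abstract does not specify the architecture, so the mean-pooling encoder, the projection matrix `proj`, the dot-product scoring, and the function names `encode_speech` and `retrieve` are all assumptions made for illustration.

```python
import numpy as np

def encode_speech(frames: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a speech question encoder.

    frames: (num_frames, feat_dim) acoustic features of the spoken question.
    proj:   (feat_dim, emb_dim) projection into the shared embedding space.
    No transcript is produced at any point (the ASR-free assumption).
    """
    pooled = frames.mean(axis=0)   # mean-pool over time -> (feat_dim,)
    return pooled @ proj           # project -> (emb_dim,)

def retrieve(question_emb: np.ndarray, passage_embs: np.ndarray, k: int = 2) -> list[int]:
    """Rank passages by dot-product similarity (an assumed scoring function)
    and return the indices of the top-k passages."""
    scores = passage_embs @ question_emb       # (num_passages,)
    return np.argsort(-scores)[:k].tolist()    # highest score first

# Toy usage with random data; real systems would use trained encoders.
rng = np.random.default_rng(0)
feat_dim, emb_dim = 8, 4
proj = rng.normal(size=(feat_dim, emb_dim))
frames = rng.normal(size=(50, feat_dim))       # 50 frames of a spoken question
passage_embs = rng.normal(size=(5, emb_dim))   # 5 pre-encoded text passages
top_passages = retrieve(encode_speech(frames, proj), passage_embs, k=2)
```

In an ASR-based pipeline, the question embedding would instead be computed from a transcript, so any transcription error would carry over into the embedding. The sketch skips that intermediate step entirely.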
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
