A neural document language modeling framework for spoken document retrieval
Li-Phen Yen, Zhen-Yu Wu, Kuan-Yu Chen

TL;DR
This paper introduces a neural document language modeling framework for spoken document retrieval (SDR) that leverages pretrained language representations to improve retrieval performance on multimedia speech data.
Contribution
It proposes the first supervised, neural language model (LM)-based SDR framework, which incorporates pretrained language representations to improve retrieval accuracy for spoken documents.
Findings
Demonstrates improved SDR performance using the proposed neural framework.
Integrates pretrained language models into spoken document retrieval.
Pioneers supervised training of neural LM-based SDR methods.
Abstract
Recent developments in deep learning have led to significant innovations in various classic and practical subjects, including speech recognition, computer vision, question answering and information retrieval. In natural language processing (NLP), language representations have achieved great success in many downstream tasks, and this line of study has recently become a major stream of research. Because huge volumes of multimedia data containing speech have become pervasive in daily life around the world, spoken document retrieval (SDR) has been an important research subject over the past decades. Aiming to enhance SDR performance, this paper proposes a neural retrieval framework that combines the merits of using the language modeling (LM) mechanism in SDR and leveraging the abstractive information learned by the language representation…
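For readers unfamiliar with the "language modeling (LM) mechanism in SDR" mentioned above, the sketch below shows the classic, non-neural query-likelihood retrieval model. It ranks each document by the log-probability that the document's unigram language model generates the query, using Jelinek-Mercer smoothing against collection statistics. This is only background under stated assumptions. It is not the authors' neural framework. The function names, the smoothing weight `lam` and the toy "transcripts" are hypothetical illustrations.

```python
import math
from collections import Counter


def query_likelihood(query_terms, doc_terms, collection_counts, collection_len, lam=0.5):
    """Return log P(Q|D) under a unigram document LM with Jelinek-Mercer smoothing.

    Hypothetical illustration of classic LM-based retrieval, not the paper's model.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        # Maximum-likelihood estimate of the term in the document.
        p_doc = doc_counts[t] / doc_len if doc_len else 0.0
        # Background (collection) estimate used for smoothing.
        p_col = collection_counts[t] / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query_terms, docs, lam=0.5):
    """Rank document ids by descending query likelihood."""
    collection_counts = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_counts.values())
    scores = {
        doc_id: query_likelihood(query_terms, terms, collection_counts, collection_len, lam)
        for doc_id, terms in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy "transcripts" standing in for recognized spoken documents (hypothetical data).
docs = {
    "d1": "speech retrieval with language models".split(),
    "d2": "image classification with convolutional networks".split(),
    "d3": "spoken document retrieval from speech recognition transcripts".split(),
}
print(rank("spoken document retrieval".split(), docs))  # ['d3', 'd1', 'd2']
```

A neural LM-based framework such as the one described here would replace or augment the count-based document model with estimates informed by pretrained representations. The abstract is truncated, so how the paper combines the two is not reproduced here.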
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
