End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations
Bolaji Yusuf, Jan Cernocky, Murat Saraclar

TL;DR
This paper introduces a multilingual neural model for open-vocabulary keyword search that simplifies the pipeline and outperforms traditional ASR-based systems on long and out-of-vocabulary queries.
Contribution
It extends the authors' previous neural ASR-free keyword search model with multilingual pretraining and provides a detailed analysis of the approach.
Findings
Multilingual training improves model performance.
Outperforms ASR-based systems on long and out-of-vocabulary queries.
Maintains an efficient, simplified search pipeline.
Abstract
Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which causes them to have a complex indexing and search pipeline. This has led to interest in ASR-free approaches to simplify the search procedure. We recently proposed a neural ASR-free keyword search model which achieves competitive performance while maintaining an efficient and simplified pipeline, where queries and documents are encoded with a pair of recurrent neural network encoders and the encodings are combined with a dot-product. In this article, we extend this work with multilingual pretraining and detailed analysis of the model. Our experiments show that the proposed multilingual training significantly improves the model performance and that despite not matching a strong ASR-based conventional keyword search system for short queries and queries comprising in-vocabulary words, the…
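The abstract describes the scoring mechanism only at a high level: two recurrent encoders, one for the query and one for the document, whose outputs are combined with a dot product. The NumPy code below is a minimal sketch of that idea. It rests on several assumptions that the abstract does not state. Plain tanh RNNs stand in for the recurrent encoders. The query is reduced to its final hidden state, while the document keeps one encoding per frame. A sigmoid then turns each frame's dot product into a detection probability. The weights are random and untrained, and all names (`rnn_encode`, `make_params`, `keyword_scores`) are hypothetical, so the output shows shapes and data flow only, not real detections.

```python
import numpy as np

def rnn_encode(inputs, w_in, w_rec, b):
    """Simple Elman RNN (tanh), a stand-in for the paper's recurrent encoders.

    inputs: (T, d_in) sequence -> returns hidden states of shape (T, d_h).
    """
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x_t in inputs:
        hidden = np.tanh(w_in @ x_t + w_rec @ hidden + b)
        states.append(hidden)
    return np.stack(states)

def make_params(rng, d_in, d_h):
    """Random, untrained encoder weights (illustration only)."""
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))

def keyword_scores(query_onehots, doc_features, query_params, doc_params):
    # Assumption: the query becomes one fixed-size vector (last hidden state).
    q = rnn_encode(query_onehots, *query_params)[-1]   # (d_h,)
    # Assumption: the document keeps one encoding per acoustic frame.
    d = rnn_encode(doc_features, *doc_params)          # (T, d_h)
    logits = d @ q                                     # dot product per frame
    return 1.0 / (1.0 + np.exp(-logits))               # per-frame probability

# Usage with hypothetical inputs: a grapheme-level query and 50 frames
# of 40-dimensional acoustic features (random numbers here).
rng = np.random.default_rng(0)
alphabet = "abcdefghijklmnopqrstuvwxyz"

def one_hot(word):
    return np.eye(len(alphabet))[[alphabet.index(c) for c in word]]

doc_features = rng.normal(size=(50, 40))
scores = keyword_scores(one_hot("keyword"), doc_features,
                        make_params(rng, len(alphabet), 64),
                        make_params(rng, 40, 64))
print(scores.shape)  # (50,)
```

Document encodings in this sketch do not depend on the query, so they could be computed once at indexing time and reused. Each new query would then cost one query encoding plus a single matrix-vector product. That is one plausible reading of the "efficient and simplified pipeline" the abstract mentions, not a description of the authors' implementation.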
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Speech and dialogue systems
