End-to-end model for named entity recognition from speech without paired training data

Salima Mdhaffar; Jarod Duret; Titouan Parcollet; Yannick Estève

arXiv:2204.00803 · cs.CL · April 5, 2022

TL;DR

This paper introduces an end-to-end neural approach for named entity recognition from speech that does not require paired audio and text data, using an external text-to-vector model to simulate speech representations.

Contribution

It presents a novel method to build end-to-end spoken language understanding models without paired training data by leveraging external text-based vector representations.

Findings

01. Outperforms cascade approaches in NER from speech

02. Effective even without paired audio-text data

03. Shows promising results on the QUAERO corpus

Abstract

Recent work has shown that end-to-end neural approaches are becoming very popular for spoken language understanding (SLU). Here, the term end-to-end refers to the use of a single model optimized to extract semantic information directly from the speech signal. A major issue for such models is the lack of paired audio and textual data with semantic annotation. In this paper, we propose an approach to build an end-to-end neural model that extracts semantic information in a scenario in which no paired audio data is available. Our approach is based on the use of an external model trained to generate a sequence of vectorial representations from text. These representations mimic the hidden representations that could be generated inside an end-to-end automatic speech recognition (ASR) model by processing a speech signal. An SLU neural module is then trained using these representations as…
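The training setup described in the abstract can be illustrated with a toy sketch. It is not the authors' architecture. The text-to-vector model is replaced here by a fixed random vector per character, and the SLU module by a per-frame perceptron. Every name and constant (`build_char_table`, `simulate_hidden_states`, `frame_labels`, `train_frame_tagger`, `EMBED_DIM`, `FRAMES_PER_CHAR`) and the two example sentences are assumptions made for illustration only. The point is the data flow: annotated text alone yields simulated hidden states and frame-level entity labels, so the tagger is trained without any audio.

```python
import random

EMBED_DIM = 8        # assumed size of the simulated hidden vectors
FRAMES_PER_CHAR = 2  # assumed factor mimicking a frame rate above the character rate


def build_char_table(alphabet, dim=EMBED_DIM, seed=0):
    """Toy stand-in for the external text-to-vector model (hypothetical)."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in sorted(alphabet)}


def simulate_hidden_states(text, table):
    """Turn text into a vector sequence standing in for ASR hidden states."""
    return [list(table[ch]) for ch in text for _ in range(FRAMES_PER_CHAR)]


def frame_labels(text, entity_spans):
    """Label each simulated frame 1 inside an annotated entity span, else 0."""
    inside = [0] * len(text)
    for start, end in entity_spans:  # character offsets, end exclusive
        for i in range(start, end):
            inside[i] = 1
    return [label for label in inside for _ in range(FRAMES_PER_CHAR)]


def train_frame_tagger(examples, epochs=5, lr=0.1):
    """Toy SLU module: a perceptron over single frames (last weight is the bias)."""
    weights = [0.0] * (EMBED_DIM + 1)
    for _ in range(epochs):
        for vectors, labels in examples:
            for vec, gold in zip(vectors, labels):
                score = sum(w * x for w, x in zip(weights, vec)) + weights[-1]
                pred = 1 if score > 0 else 0
                if pred != gold:
                    step = lr * (gold - pred)
                    for j, x in enumerate(vec):
                        weights[j] += step * x
                    weights[-1] += step
    return weights


# Text-only annotated data: (sentence, entity character spans). No audio is involved.
corpus = [("paris is nice", [(0, 5)]), ("i met marie", [(6, 11)])]
table = build_char_table({ch for text, _ in corpus for ch in text})
examples = [
    (simulate_hidden_states(text, table), frame_labels(text, spans))
    for text, spans in corpus
]
tagger = train_frame_tagger(examples)
```

A context-free perceptron over random character vectors cannot learn real entity boundaries, so the sketch shows only how training pairs are built from text. A realistic SLU module would be a sequence model trained on representations produced by a learned text-to-vector model.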

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques