Towards Semi-Supervised Semantics Understanding from Speech
Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li

TL;DR
This paper introduces a semi-supervised framework for speech-based semantics understanding that leverages pretrained models and performs well under noisy conditions with limited labeled data.
Contribution
The paper presents a novel semi-supervised approach combining pretrained ASR and language models for speech semantics understanding, addressing noise robustness and limited data challenges.
Findings
Achieves parity with oracle text-based models in noisy environments
Demonstrates effectiveness with limited labeled data
Introduces the slots edit F1 score as a new evaluation metric
Abstract
Much recent work on Spoken Language Understanding (SLU) falls short in at least one of three ways: models were trained on oracle text input and neglected the Automatic Speech Recognition (ASR) outputs, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data. We propose a clean and general framework to learn semantics directly from speech with semi-supervision from transcribed speech to address these issues. Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT, and fine-tuned on a limited amount of target SLU corpus. In parallel, we identify two inadequate settings under which SLU models have been tested: noise-robustness and E2E semantics evaluation. We test the proposed framework under realistic environmental noises and with a new metric, the slots edit F1…
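The abstract names the slots edit F1 metric, but this excerpt is cut off before defining it. The sketch below shows one plausible edit-style slot F1, written by analogy with word error rate. The counting rules are assumptions, not taken from the text above: a slot with the right name and value is a true positive, a missing slot (deletion) is a false negative, an extra slot (insertion) is a false positive, and a slot with the right name but wrong value (substitution) counts as both. The function name `slots_edit_f1` and the dict-per-utterance input format are hypothetical.

```python
def slots_edit_f1(references, hypotheses):
    """Edit-style slot F1 over paired utterances.

    references, hypotheses: lists of dicts mapping slot name -> slot value.
    Counting rules are an assumption (see the lead-in), not the paper's code.
    """
    tp = fp = fn = 0
    for ref, hyp in zip(references, hypotheses):
        for slot, value in ref.items():
            if slot not in hyp:
                fn += 1              # deletion: slot missing from the prediction
            elif hyp[slot] == value:
                tp += 1              # slot name and value both match
            else:
                fp += 1              # substitution: right slot, wrong value,
                fn += 1              # penalized on both precision and recall
        for slot in hyp:
            if slot not in ref:
                fp += 1              # insertion: predicted slot not in reference

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one correct slot, one substituted value, one inserted slot.
refs = [{"from_city": "boston", "to_city": "denver"}]
hyps = [{"from_city": "boston", "to_city": "dallas", "date": "monday"}]
score = slots_edit_f1(refs, hyps)
```

Under these assumed rules, a misrecognized slot value costs more than a dropped slot. That suits end-to-end evaluation, where ASR errors tend to corrupt values rather than remove slots outright.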
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Linear Layer · Residual Connection · Dense Connections · WordPiece · Layer Normalization · Attention Is All You Need · Adam · Linear Warmup With Linear Decay · Weight Decay · Dropout
