Bidirectional Representations for Low Resource Spoken Language Understanding

Quentin Meeus; Marie-Francine Moens; Hugo Van hamme

arXiv:2211.14320 · cs.CL · October 18, 2023

TL;DR

This paper introduces a bidirectional speech representation model trained with a masked language modelling objective. The model, which uses class attention, improves low-resource spoken language understanding and, after fine-tuning of its top layers, achieves state-of-the-art results on the Fluent Speech Command dataset.

Contribution

It proposes a novel bidirectional encoding approach for speech, leveraging masked language modeling and class attention for better low-resource spoken language understanding.
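
As a rough illustration of the class attention idea named above, the sketch below pools a sequence of encodings into a single vector using one query vector. It is a minimal single-head version without learned projections and is not taken from the paper; the function name `class_attention` and the toy vectors are assumptions made for this example.

```python
import math

def class_attention(cls_query, encodings):
    """Pool a sequence of vectors into one vector using a single query.

    Hypothetical simplification: single head, no learned key/value/query
    projections, plain Python lists instead of tensors.
    """
    d = len(cls_query)
    # Scaled dot-product score between the class query and every encoding.
    scores = [
        sum(q * k for q, k in zip(cls_query, enc)) / math.sqrt(d)
        for enc in encodings
    ]
    # Numerically stable softmax over the sequence positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the encodings gives one utterance-level vector.
    pooled = [
        sum(w * enc[i] for w, enc in zip(weights, encodings))
        for i in range(d)
    ]
    return pooled, weights

# Toy example: two 2-dimensional encodings and a query that favours the first.
encodings = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = class_attention([10.0, 0.0], encodings)
```

Because only the single query attends to the sequence, the cost grows linearly with the number of encodings, which is one reason this kind of pooling is attractive for a classification head such as intent prediction.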

Findings

01. Better pre-fine-tuning representations than comparable models
02. State-of-the-art performance on the Fluent Speech Command dataset
03. Effective in low-data regimes

Abstract

Most spoken language understanding systems use a pipeline approach composed of an automatic speech recognition interface and a natural language understanding module. This approach forces hard decisions when converting continuous inputs into discrete language symbols. Instead, we propose a representation model to encode speech in rich bidirectional encodings that can be used for downstream tasks such as intent prediction. The approach uses a masked language modelling objective to learn the representations, and thus benefits from both the left and right contexts. We show that the performance of the resulting encodings before fine-tuning is better than comparable models on multiple datasets, and that fine-tuning the top layers of the representation model improves the current state of the art on the Fluent Speech Command dataset, also in a low-data regime, when a limited amount of labelled…
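
The masked language modelling objective mentioned in the abstract can be sketched as follows. This is a generic BERT-style masking routine written for illustration, not the authors' training code: the identifiers `MASK_ID`, `IGNORE`, and `mask_tokens`, the masking probability, and the toy symbol ids are all assumptions, and the paper's actual masking scheme may differ.

```python
import random

MASK_ID = 0      # hypothetical id reserved for the mask symbol
IGNORE = -100    # hypothetical label for positions excluded from the loss

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_inputs, labels) for a masked-prediction objective."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_ID)  # hide the symbol from the encoder
            labels.append(tok)      # target to recover from the surrounding context
        else:
            masked.append(tok)      # visible symbol, used as context
            labels.append(IGNORE)   # no loss is computed at this position
    return masked, labels

# Toy sequence of discrete symbol ids (none of them equal to MASK_ID).
tokens = [12, 7, 33, 5, 19, 42, 8, 27]
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
```

Since a hidden position can only be predicted from the symbols that remain visible on either side of it, an encoder trained this way has to use both the left and the right context, which is the property the abstract refers to as bidirectional.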

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Class Attention