A Study into Pre-training Strategies for Spoken Language Understanding on Dysarthric Speech

Pu Wang; Bagher BabaAli; Hugo Van hamme

arXiv:2106.08313 · eess.AS · June 16, 2021 · Interspeech

Open Access

TL;DR

This paper explores pre-training strategies for end-to-end spoken language understanding systems tailored to dysarthric speech, emphasizing transfer learning from normal and dysarthric speech to improve performance in low-resource scenarios.

Contribution

It introduces a two-stage pre-training approach for SLU on dysarthric speech and analyzes how impairment severity affects model generalization.

Findings

01. Pre-training on normal speech followed by fine-tuning improves SLU performance.

02. Intelligibility scores correlate with model generalization to dysarthric speech.

03. Pre-training strategies mitigate data scarcity issues in dysarthric SLU.

Abstract

End-to-end (E2E) spoken language understanding (SLU) systems avoid an intermediate textual representation by mapping speech directly into intents with slot values. This approach requires considerable domain-specific training data. In low-resource scenarios this is a major concern, e.g., in the present study dealing with SLU for dysarthric speech. Pre-training part of the SLU model on automatic speech recognition targets helps, but no research has shown to what extent SLU on dysarthric speech benefits from knowledge transferred from other dysarthric speech tasks. This paper investigates the efficiency of pre-training strategies for SLU tasks on dysarthric speech. The designed SLU system consists of a TDNN acoustic model for feature encoding and a capsule network for intent and slot decoding. The acoustic model is pre-trained in two stages: initialization with a corpus of normal speech…
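To make the "TDNN acoustic model for feature encoding" more concrete, here is a minimal pure-Python sketch of what a single time-delay layer computes: each output frame is an affine transform of the input frames at a fixed set of time offsets. The function names (`splice`, `tdnn_layer`), the offsets, the ReLU non-linearity, and the toy weights are all illustrative assumptions; the excerpt does not give the paper's actual layer configuration.

```python
from typing import List, Sequence

Frame = List[float]


def splice(frames: Sequence[Frame], offsets: Sequence[int]) -> List[Frame]:
    """Concatenate each frame with its neighbours at the given time offsets.

    Only time steps where every offset falls inside the sequence are kept,
    so the output is shorter than the input (no padding is applied).
    """
    lo, hi = min(offsets), max(offsets)
    start = max(0, -lo)
    stop = len(frames) - max(0, hi)
    return [
        [x for k in offsets for x in frames[t + k]]
        for t in range(start, stop)
    ]


def tdnn_layer(
    frames: Sequence[Frame],
    offsets: Sequence[int],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Frame]:
    """One time-delay layer: affine transform of spliced frames, then ReLU.

    The ReLU is an assumption made for this sketch, not a detail from the paper.
    `weights` has one row per output unit; each row spans the spliced context.
    """
    out = []
    for ctx in splice(frames, offsets):
        out.append([
            max(0.0, sum(w * x for w, x in zip(w_row, ctx)) + b)
            for w_row, b in zip(weights, bias)
        ])
    return out


# Toy usage: four 1-dimensional frames, context of one frame on each side.
toy_frames = [[1.0], [2.0], [3.0], [4.0]]
toy_output = tdnn_layer(toy_frames, offsets=[-1, 0, 1],
                        weights=[[1.0, 1.0, 1.0]], bias=[0.0])
print(toy_output)  # [[6.0], [9.0]]
```

In a transfer-learning setup like the one the abstract describes, the weights of such layers would be the part carried over from pre-training, while the decoder on top is trained for the SLU task; how much of the encoder is updated during fine-tuning is not stated in this excerpt.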

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and dialogue systems

Methods: Capsule Network