A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding

Gaëlle Laperrière; Sahar Ghannay; Bassam Jabaian; Yannick Estève

arXiv:2406.12141·cs.CL·June 19, 2024

TL;DR

This paper introduces a dual task learning method to enhance a multilingual speech encoder's semantic understanding, aiming to improve performance across diverse languages without sacrificing cross-lingual capabilities.

Contribution

It proposes a novel dual task learning approach that balances semantic enrichment and multilingual performance in speech encoders, addressing limitations of previous specialization methods.

Findings

01. Improved semantic enrichment across multiple languages.

02. Maintained cross-lingual abilities after fine-tuning.

03. Enhanced performance on multilingual SLU tasks.

Abstract

Self-Supervised Learning (SSL) is widely used to efficiently represent speech for Spoken Language Understanding (SLU), gradually replacing conventional approaches. Meanwhile, textual SSL models have been proposed to encode language-agnostic semantics. The SAMU-XLSR framework employed this semantic information to enrich multilingual speech representations. A recent study investigated SAMU-XLSR in-domain semantic enrichment by specializing it on downstream transcriptions, leading to state-of-the-art results on a challenging SLU task. This study focuses on the loss of multilingual performance and the lack of specific-semantics training induced by such specialization in close languages without any SLU implication. We also consider SAMU-XLSR's loss of initial cross-lingual abilities due to a separate SLU fine-tuning. Therefore, this paper proposes a dual task learning approach to improve SAMU-XLSR semantic…
