Semantic enrichment towards efficient speech representations

Gaëlle Laperrière; Ha Nguyen; Sahar Ghannay; Bassam Jabaian; Yannick Estève

arXiv:2307.01323·cs.CL·June 19, 2024

Open Access

TL;DR

This paper explores semantic enrichment of speech representations using the SAMU-XLSR model, focusing on improving spoken language understanding in low-resource languages and evaluating cross-domain capabilities.

Contribution

It introduces in-domain semantic specialization of SAMU-XLSR with limited data and assesses its effectiveness across different languages and domains.

Findings

01. Semantic enrichment improves SLU performance.

02. In-domain training enhances language-specific understanding.

03. Cross-domain capabilities are maintained with enriched models.

Abstract

Over the past few years, self-supervised speech representations have emerged as fruitful replacements for conventional surface representations when solving Spoken Language Understanding (SLU) tasks. Simultaneously, multilingual models trained on massive textual data were introduced to encode language-agnostic semantics. Recently, the SAMU-XLSR approach introduced a way to take advantage of such textual models to enrich multilingual speech representations with language-agnostic semantics. Aiming for better semantic extraction on a challenging Spoken Language Understanding task while keeping computation costs in mind, this study investigates a specific in-domain semantic enrichment of the SAMU-XLSR model by specializing it on a small amount of transcribed data from the downstream task. In addition, we show the benefits of the use of same-domain French and Italian benchmarks…

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques