A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

Shuyue Stella Li; Beining Xu; Xiangyu Zhang; Hexin Liu; Wenhan Chao; Leibny Paola Garcia

arXiv:2311.15954 · cs.CL · November 28, 2023 · 1 citation

TL;DR

This paper introduces a new metric called Phonetic-Syntax Ratio (PSR) to evaluate the quality of features extracted by English self-supervised models for cross-lingual speech recognition, highlighting the impact of training objectives and model architecture.

Contribution

The paper proposes a novel metric (PSR) to predict cross-lingual feature quality and analyzes how model design influences speech recognition performance in multilingual contexts.

Findings

01

Contrastive loss improves cross-lingual feature extraction.

02

Higher PSR scores correlate with better ASR performance.

03

PSR effectively predicts representation quality for model selection.
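One way to check a finding like "higher PSR scores correlate with better ASR performance" is to rank-correlate per-model PSR scores with per-model error rates. Because better ASR means a lower error rate (for example CER or WER; which metric the paper reports is not stated here), the finding corresponds to a negative correlation with the error rate. The sketch below is illustrative only: Spearman rank correlation is our choice of statistic, not necessarily the paper's, and the function names (`average_ranks`, `spearman`, `psr_vs_error_rate`) are hypothetical.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def psr_vs_error_rate(psr_scores, error_rates) -> float:
    """Hypothetical helper: rank correlation between per-model PSR and ASR error rate.

    A negative value means models with higher PSR tend to have lower error,
    i.e. better ASR performance.
    """
    if len(psr_scores) != len(error_rates):
        raise ValueError("need one PSR score and one error rate per model")
    return spearman(psr_scores, error_rates)
```

Pass one PSR score and one error rate per SSL model, measured on the same corpus; with only a handful of models, the coefficient is a rough indicator rather than a significance test.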

Abstract

In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as feature extractors for a set of typologically diverse corpora. We develop a novel metric, the Phonetic-Syntax Ratio (PSR), to measure the phonetic and syntactic information in the extracted representations using deep generalized canonical correlation analysis. Results show that the contrastive loss in the wav2vec2.0 objective facilitates more effective cross-lingual feature extraction. There is a positive correlation between PSR scores and ASR performance, suggesting that phonetic information extracted by monolingual SSL…
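The abstract does not give the exact PSR formula, so the following is only a minimal sketch under stated assumptions. It assumes PSR is the ratio of how strongly the SSL features correlate with a phonetic view (for example frame-aligned phone embeddings) to how strongly they correlate with a syntactic view (for example word-level text embeddings). It also swaps in plain two-view linear CCA for the deep generalized CCA (DGCCA) the paper uses. The views, the number of canonical components `k`, the ridge term `reg`, and all function names are hypothetical.

```python
import numpy as np


def _inv_sqrt(cov: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T


def canonical_correlations(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-4) -> np.ndarray:
    """Top-k canonical correlations between views x (n, dx) and y (n, dy).

    Linear, ridge-regularized CCA: a stand-in for DGCCA, not the paper's method.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations
    # (returned in descending order by np.linalg.svd).
    whitened = _inv_sqrt(cxx) @ cxy @ _inv_sqrt(cyy)
    return np.linalg.svd(whitened, compute_uv=False)[:k]


def phonetic_syntax_ratio(features, phonetic, syntactic, k: int = 4) -> float:
    """Hypothetical PSR: mean canonical correlation of the features with the
    phonetic view divided by the mean with the syntactic view."""
    phon = canonical_correlations(features, phonetic, k).mean()
    syn = canonical_correlations(features, syntactic, k).mean()
    return float(phon / syn)


if __name__ == "__main__":
    # Synthetic views, not data from the paper: the "features" are a noisy linear
    # mixture of the phonetic view and independent of the syntactic view.
    rng = np.random.default_rng(0)
    phonetic = rng.normal(size=(500, 6))
    syntactic = rng.normal(size=(500, 6))
    features = phonetic @ rng.normal(size=(6, 16)) + 0.1 * rng.normal(size=(500, 16))
    print(phonetic_syntax_ratio(features, phonetic, syntactic))
```

Under this assumed definition, representations that carry more phonetic than syntactic information score above 1. DGCCA replaces the linear projections with learned neural encoders and handles more than two views jointly, so its scores will differ from this linear version.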

Code & Models

Repositories

stellali7/ssl_psr (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training