A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

TL;DR
This paper introduces a new metric called Phonetic-Syntax Ratio (PSR) to evaluate the quality of features extracted by English self-supervised models for cross-lingual speech recognition, highlighting the impact of training objectives and model architecture.
Contribution
The paper proposes a novel metric (PSR) to predict cross-lingual feature quality and analyzes how model design influences speech recognition performance in multilingual contexts.
Findings
Contrastive loss improves cross-lingual feature extraction.
Higher PSR scores correlate with better ASR performance.
PSR effectively predicts representation quality for model selection.
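The last two findings amount to a rank-agreement check between a feature-quality metric and downstream ASR results. The sketch below is not the paper's evaluation code. It assumes one metric score and one ASR score per model, and it uses Spearman rank correlation as one plausible way to quantify the agreement. All function names and all numbers are hypothetical placeholders, not values from the paper. If ASR performance is reported as an error rate, where lower is better, the expected sign of the correlation flips.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder scores for four models (NOT results from the paper).
toy_metric_scores = [0.8, 1.1, 1.4, 1.9]       # stand-in for a PSR-like metric
toy_asr_accuracy = [61.0, 66.5, 70.2, 78.9]    # stand-in for ASR accuracy (%)

rho = spearman(toy_metric_scores, toy_asr_accuracy)

# Model selection under the assumption that a higher metric score is better.
best_model_index = max(range(len(toy_metric_scores)),
                       key=lambda i: toy_metric_scores[i])
```

A rank correlation is used here because model selection depends only on the ordering of models, not on the scale of the metric.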
Abstract
In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set of typologically diverse corpora. We develop a novel metric, the Phonetic-Syntax Ratio (PSR), to measure the phonetic and syntactic information in the extracted representations using deep generalized canonical correlation analysis. Results show the contrastive loss in the wav2vec2.0 objective facilitates more effective cross-lingual feature extraction. There is a positive correlation between PSR scores and ASR performance, suggesting that phonetic information extracted by monolingual SSL…
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
