A Comprehensive Analysis of Tokenization and Self-Supervised Learning in End-to-End Automatic Speech Recognition applied on French Language

Thibault Bañeras-Roux; Mickael Rouvier; Jane Wottawa; Richard Dufour

arXiv:2605.03696·cs.CL·May 6, 2026

TL;DR

This paper qualitatively analyzes how subword tokenization and self-supervised learning affect French ASR performance across various linguistic and acoustic metrics.

Contribution

It provides a comprehensive evaluation of tokenization algorithms and self-supervised models for French ASR beyond traditional error metrics.

Findings

01

Tokenization impacts linguistic and acoustic aspects of ASR.

02

Self-supervised models improve certain linguistic features.

03

Traditional metrics like CER and WER are insufficient for full assessment.

Abstract

The performance of end-to-end automatic speech recognition (ASR) systems enables their increasing integration into numerous applications. While there are various benefits to such speech-to-text systems, the choice of hyperparameters and models plays a crucial role in their performance. Typically, these choices are determined by considering only the character error rate (CER) and/or word error rate (WER). However, several studies have shown that these metrics are largely incomplete and fail to adequately describe how automatic transcripts behave in downstream applications. In this paper, we conduct a qualitative study on the French language that investigates the impact of subword tokenization algorithms and self-supervised learning models from different linguistic and acoustic perspectives, using a comprehensive set of evaluation metrics.
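For readers unfamiliar with the two metrics the abstract refers to, the following is a minimal sketch of how WER and CER are commonly computed: the Levenshtein (edit) distance between reference and hypothesis, taken over words or characters, divided by the reference length. The function names and the French example sentence are illustrative assumptions, not taken from the paper, and the sketch applies no text normalization (casing, punctuation), which real evaluations usually do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are counted as characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the hypothesis drops a plural "s".
reference = "il mange des pommes"
hypothesis = "il mange des pomme"
print(wer(reference, hypothesis))  # 1 word error out of 4 reference words
print(cer(reference, hypothesis))  # 1 character error out of 19 characters
```

In this example a single missing plural marker counts as a full word error in WER but only one character in CER, and neither score says whether the error is grammatical, semantic, or acoustic in nature. That gap is the kind of limitation the abstract points to when it calls these metrics incomplete.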
