A transfer learning based approach for pronunciation scoring

Marcelo Sancinetti; Jazmin Vidal; Cyntia Bonomi; Luciana Ferrer

arXiv:2111.00976 · cs.CL · May 10, 2023

TL;DR

This paper introduces a transfer learning approach that adapts a model trained for ASR on native speech to phone-level pronunciation scoring. It addresses the scarcity of labelled non-native data and scores 20% better than a state-of-the-art GOP system on EpaDB.

Contribution

The study presents a novel transfer learning method for pronunciation scoring that outperforms state-of-the-art GOP systems, especially in low correction rate scenarios.

Findings

01. 20% improvement over the GOP system on EpaDB
02. Effective adaptation of ASR models for pronunciation scoring
03. Analysis of the impact of design choices

Abstract

Phone-level pronunciation scoring is a challenging task, with performance far from that of human annotators. Standard systems generate a score for each phone in a phrase using models trained for automatic speech recognition (ASR) with native data only. Better performance has been shown when using systems that are trained specifically for the task using non-native data. Yet, such systems face the challenge that datasets labelled for this task are scarce and usually small. In this paper, we present a transfer learning-based approach that leverages a model trained for ASR, adapting it for the task of pronunciation scoring. We analyze the effect of several design choices and compare the performance with a state-of-the-art goodness of pronunciation (GOP) system. Our final system is 20% better than the GOP system on EpaDB, a database for pronunciation scoring research, for a cost function…
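
As a rough illustration of the GOP-style baseline the abstract refers to, the sketch below computes one score per phone from frame-level phone posteriors and a forced alignment. It is not the authors' implementation: it assumes one common GOP formulation (the mean log-posterior of the target phone over its aligned frames), and the function names, the toy posteriors, and the threshold value are all hypothetical.

```python
import math
from typing import List, Tuple


def gop_scores(
    posteriors: List[List[float]],
    alignment: List[Tuple[int, int, int]],
) -> List[float]:
    """Return one GOP-style score per aligned phone.

    posteriors: per-frame phone posterior distributions from an ASR
        acoustic model (hypothetical input; each row sums to 1).
    alignment: (phone_id, start_frame, end_frame) triples from a forced
        alignment, with end_frame exclusive.
    """
    scores = []
    for phone_id, start, end in alignment:
        if end <= start:
            raise ValueError("each aligned phone needs at least one frame")
        frames = posteriors[start:end]
        # Assumed formulation: mean log-posterior of the target phone.
        # The floor avoids log(0) for frames with zero posterior.
        total = sum(math.log(max(frame[phone_id], 1e-10)) for frame in frames)
        scores.append(total / len(frames))
    return scores


def flag_mispronounced(scores: List[float], threshold: float) -> List[bool]:
    """Flag phones whose score falls below a (hypothetical) threshold."""
    return [score < threshold for score in scores]


# Toy example: 4 frames, 2 phone classes, 2 aligned phones.
toy_posteriors = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.8, 0.2],
    [0.9, 0.1],
]
toy_alignment = [(0, 0, 2), (1, 2, 4)]

toy_scores = gop_scores(toy_posteriors, toy_alignment)
toy_flags = flag_mispronounced(toy_scores, threshold=-1.0)
```

In this toy input the second phone receives low posteriors in its aligned frames, so it falls below the threshold and is flagged. A transfer learning system such as the one described in the abstract would instead adapt the ASR model itself using labelled non-native data, rather than thresholding scores from a model trained on native speech only.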

Code & Models

Repositories

marcelosancinetti/epa-gop-pykaldi (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing