A Pilot Study of Applying Sequence-to-Sequence Voice Conversion to Evaluate the Intelligibility of L2 Speech Using a Native Speaker's Shadowings

Haopeng Geng; Daisuke Saito; Nobuaki Minematsu

arXiv:2410.02239 · cs.SD · October 4, 2024

TL;DR

This study explores using sequence-to-sequence voice conversion to simulate native speaker shadowing of L2 speech, aiming to improve fine-grained feedback for language learners by creating a virtual shadower system.

Contribution

It demonstrates the feasibility of a voice conversion system to replicate native shadowing behavior, providing a new approach for language learning feedback.

Findings

01. The VC system can produce shadowed speech similar to native utterances.

02. The virtual shadower shows linguistic and acoustic similarity to real native shadowing.

03. The approach offers potential for enhanced pronunciation feedback in language learning.

Abstract

Utterances by L2 speakers can be unintelligible due to mispronunciation and improper prosody. In computer-aided language learning systems, textual feedback is often provided using a speech recognition engine. However, an ideal form of feedback for L2 speakers should be fine-grained enough to let them detect and diagnose the unintelligible parts of their own utterances. Inspired by language teachers who correct students' pronunciation through a voice-to-voice process, this pilot study utilizes a unique semi-parallel dataset composed of non-native (L2) speakers' read-aloud utterances, native (L1) speakers' shadowings, and their script-shadowing utterances. We explore the technical possibility of replicating an L1 speaker's shadowing of L2 speech using voice conversion (VC) techniques, in order to create a virtual shadower system. Experimental results demonstrate the feasibility of the VC…

Taxonomy

Topics: Speech Recognition and Synthesis