Assessing Phrase Break of ESL speech with Pre-trained Language Models

Zhiyi Wang; Shaoguang Mao; Wenshan Wu; Yan Xia

arXiv:2210.16029·cs.CL·October 31, 2022

Open Access

TL;DR

This paper proposes a novel method for assessing phrase breaks in ESL speech using pre-trained language models, converting speech to token sequences and leveraging PLMs for improved accuracy with less labeled data.

Contribution

It introduces a new approach that converts speech to token sequences and utilizes PLMs for phrase break assessment, reducing reliance on labeled data and enhancing performance.

Findings

01. Performance improved with PLMs compared to traditional methods

02. Reduced dependence on labeled training data

03. Effective for both overall and fine-grained phrase break assessment

Abstract

This work introduces an approach to assessing phrase breaks in ESL learners' speech with pre-trained language models (PLMs). Unlike traditional methods, this proposal converts speech to token sequences and then leverages the power of PLMs. There are two sub-tasks: overall assessment of phrase break for a speech clip, and fine-grained assessment of every possible phrase break position. Speech input is first force-aligned with its text, then pre-processed into a token sequence that includes words and their associated phrase break information. The token sequence is then fed into the pre-training and fine-tuning pipeline. In pre-training, a replaced break token detection module is trained on token data in which each token has a certain percentage chance of being randomly replaced. In fine-tuning, overall and fine-grained scoring are optimized with text classification and sequence labeling pipeline,…
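The pre-processing and replaced-break-token steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: the break-token names (`<b0>`, `<b1>`, `<b2>`), the pause-duration thresholds, the 15% replacement rate, the choice to corrupt only break tokens, and the function names themselves.

```python
import random

# Hypothetical break-token vocabulary; the abstract does not specify
# the actual token set or how pauses are bucketed.
BREAK_TOKENS = ["<b0>", "<b1>", "<b2>"]


def pause_to_break_token(pause_sec, short=0.05, long=0.3):
    """Map a pause duration in seconds to an assumed break token.

    The thresholds `short` and `long` are illustrative values only.
    """
    if pause_sec < short:
        return "<b0>"  # no perceptible break
    if pause_sec < long:
        return "<b1>"  # short break
    return "<b2>"      # long break


def build_token_sequence(aligned_words):
    """Interleave words with break tokens.

    aligned_words: list of (word, start_sec, end_sec) tuples, as a
    forced aligner might produce. A break token is inserted between
    each pair of adjacent words, based on the silence between them.
    """
    tokens = []
    for i, (word, _start, end) in enumerate(aligned_words):
        tokens.append(word)
        if i + 1 < len(aligned_words):
            next_start = aligned_words[i + 1][1]
            pause = max(0.0, next_start - end)
            tokens.append(pause_to_break_token(pause))
    return tokens


def replace_break_tokens(tokens, replace_prob=0.15, rng=None):
    """Corrupt a token sequence for replaced-token detection.

    Each break token is swapped for a different break token with
    probability `replace_prob`. Returns (corrupted, labels), where
    labels[i] is 1 if position i was replaced and 0 otherwise.
    Restricting replacement to break tokens is an assumption.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if tok in BREAK_TOKENS and rng.random() < replace_prob:
            alternatives = [b for b in BREAK_TOKENS if b != tok]
            corrupted.append(rng.choice(alternatives))
            labels.append(1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

In this sketch, a detector would be trained to predict `labels` from `corrupted`, and the same token sequences could later be reused for clip-level classification and per-position sequence labeling during fine-tuning.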

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems