Transformer-Based Multi-Aspect Multi-Granularity Non-Native English Speaker Pronunciation Assessment

Yuan Gong; Ziyi Chen; Iek-Heng Chu; Peng Chang; James Glass

arXiv:2205.03432 · cs.SD · May 10, 2022

TL;DR

This paper introduces GOPT, a Transformer-based model that scores several aspects of non-native English pronunciation (accuracy, fluency, completeness, and prosody) at multiple granularities at once, whereas earlier methods typically model one aspect at one granularity. It reports the best results on speechocean762.

Contribution

The paper proposes a Goodness Of Pronunciation feature-based Transformer (GOPT), trained with multi-task learning, that assesses multiple pronunciation aspects at multiple granularities in a single model.

Findings

01

GOPT achieves state-of-the-art results on speechocean762

02

Multi-aspect, multi-granularity modeling improves assessment accuracy

03

These results are obtained with a public ASR acoustic model trained on Librispeech

Abstract

Automatic pronunciation assessment is an important technology to help self-directed language learners. While pronunciation quality has multiple aspects including accuracy, fluency, completeness, and prosody, previous efforts typically only model one aspect (e.g., accuracy) at one granularity (e.g., at the phoneme-level). In this work, we explore modeling multi-aspect pronunciation assessment at multiple granularities. Specifically, we train a Goodness Of Pronunciation feature-based Transformer (GOPT) with multi-task learning. Experiments show that GOPT achieves the best results on speechocean762 with a public automatic speech recognition (ASR) acoustic model trained on Librispeech.
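
To make the multi-task idea in the abstract concrete, here is a minimal sketch of how losses from several aspects and granularities might be combined into one training objective. It is not taken from the paper or its repository: the task names (for example `phoneme/accuracy`), the choice of mean squared error, the equal default weights, and the use of `None` to mark padded positions are all assumptions made for illustration, and the scores are made-up numbers.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one entry per "<granularity>/<aspect>" task.
# A target of None marks a padded position that should not contribute to the loss.
Predictions = Dict[str, List[float]]
Targets = Dict[str, List[Optional[float]]]


def masked_mse(pred: List[float], target: List[Optional[float]]) -> float:
    """Mean squared error over positions whose target is not None."""
    pairs = [(p, t) for p, t in zip(pred, target) if t is not None]
    if not pairs:
        return 0.0
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def multi_task_loss(
    preds: Predictions,
    targets: Targets,
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task losses (equal weights are an assumption)."""
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * masked_mse(preds[task], targets[task])
        for task in targets
    )


# Made-up scores for one utterance with three phoneme slots (last one padded),
# two words, and one utterance-level fluency score.
preds = {
    "phoneme/accuracy": [1.5, 2.0, 0.5],
    "word/accuracy": [8.0, 6.0],
    "utterance/fluency": [7.0],
}
targets = {
    "phoneme/accuracy": [2.0, 2.0, None],
    "word/accuracy": [10.0, 6.0],
    "utterance/fluency": [8.0],
}

print(multi_task_loss(preds, targets))  # 0.125 + 2.0 + 1.0 = 3.125
```

Because every task contributes to the same scalar, a shared encoder trained on this objective receives gradient signal from all aspects and granularities at once, which is the general motivation behind multi-task learning. How the tasks are actually weighted and masked in GOPT should be checked against the paper and the official repository.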

Code & Models

Repositories

YuanGongND/gopt
PyTorch · Official

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Position-Wise Feed-Forward Layer · Dense Connections · Dropout · Softmax