Hierarchical Pronunciation Assessment with Multi-Aspect Attention

Heejin Do; Yunsu Kim; Gary Geunbae Lee

arXiv:2211.08102 · cs.CL · May 29, 2023 · 1 citation

TL;DR

This paper introduces HiPAMA, a hierarchical model with multi-aspect attention for fine-grained pronunciation assessment at the phoneme, word, and utterance levels. It improves accuracy, especially on challenging aspects.

Contribution

The paper presents a novel hierarchical assessment model that captures linguistic structures and cross-aspect relations, enhancing multi-granularity pronunciation evaluation.
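This sketch shows one way to read "hierarchical". Word representations are built by pooling the phonemes of each word, and the utterance representation by pooling the words. Several pieces are assumptions for illustration: the use of mean pooling, the function names `mean_pool` and `hierarchical_pool`, and the `word_ids` phoneme-to-word alignment input. None of this is taken from the official HiPAMA code, and the actual model may aggregate differently.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty group")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_pool(
    phone_feats: Sequence[Vector], word_ids: Sequence[int]
) -> Tuple[List[Vector], Vector]:
    """Hypothetical helper: phoneme features -> word features -> utterance feature.

    phone_feats[k] is the feature vector of the k-th phoneme, and word_ids[k]
    is the index of the word that phoneme belongs to (an assumed alignment).
    """
    if len(phone_feats) != len(word_ids):
        raise ValueError("one word id is required per phoneme")
    groups: Dict[int, List[Vector]] = {}
    for feat, wid in zip(phone_feats, word_ids):
        groups.setdefault(wid, []).append(feat)
    # Word level: pool the phonemes of each word, ordered by word index.
    word_feats = [mean_pool(groups[w]) for w in sorted(groups)]
    # Utterance level: pool the word representations, not the raw phonemes.
    utt_feat = mean_pool(word_feats)
    return word_feats, utt_feat
```

Take phoneme vectors [1, 0], [3, 2], [5, 4] with word ids [0, 0, 1]. The words pool to [2, 1] and [5, 4], and the utterance pools to [3.5, 2.5]. Pooling the phonemes directly would give [3, 2] instead. Going through the word level weights every word equally regardless of its length, which is one way an explicit hierarchy changes what the upper level sees.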

Findings

01. Significant improvements on the speechocean762 dataset.

02. Enhanced assessment accuracy for difficult aspects.

03. Robustness demonstrated across multiple evaluation metrics.
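The findings above refer to evaluation on speechocean762. Assessment scores on that dataset are commonly compared against human ratings with the Pearson correlation coefficient (PCC). Assuming PCC is among the metrics meant here, a minimal stdlib implementation is shown below. The function name `pearson` is illustrative.

```python
import math
from typing import Sequence


def pearson(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and human (gold) scores."""
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two equally long series of length >= 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("PCC is undefined for a constant series")
    return cov / math.sqrt(var_p * var_g)
```

PCC is undefined when either series is constant, so the sketch raises an error in that case rather than returning a number.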

Abstract

Automatic pronunciation assessment is a major component of a computer-assisted pronunciation training system. To provide in-depth feedback, scoring pronunciation at various levels of granularity such as phoneme, word, and utterance, with diverse aspects such as accuracy, fluency, and completeness, is essential. However, existing multi-aspect multi-granularity methods simultaneously predict all aspects at all granularity levels; therefore, they have difficulty in capturing the linguistic hierarchy of phoneme, word, and utterance. This limitation further leads to neglecting intimate cross-aspect relations at the same linguistic unit. In this paper, we propose a Hierarchical Pronunciation Assessment with Multi-aspect Attention (HiPAMA) model, which hierarchically represents the granularity levels to directly capture their linguistic structures and introduces multi-aspect attention that…
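Multi-aspect attention works at a single linguistic unit, such as one phoneme, word, or utterance. There, each aspect representation attends over all aspect representations of that same unit, so that, for example, the fluency representation can draw on the accuracy one. The sketch below is a simplified, assumed version: single-head scaled dot-product attention with no learned query/key/value projections. `multi_aspect_attention` is a hypothetical name, and the layer in the paper may differ.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_aspect_attention(aspects: Sequence[Vector]) -> List[Vector]:
    """Hypothetical sketch: each aspect attends to every aspect of the same unit.

    aspects[a] is the vector for aspect a (e.g. accuracy, fluency, ...) of one
    phoneme, word, or utterance. Each output row is a softmax-weighted sum of
    all aspect vectors, with weights from scaled dot products. Queries, keys,
    and values are the raw aspect vectors (no learned projections).
    """
    if not aspects:
        raise ValueError("need at least one aspect vector")
    dim = len(aspects[0])
    scale = math.sqrt(dim)
    refined = []
    for query in aspects:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in aspects]
        weights = softmax(scores)
        refined.append(
            [sum(w * v[i] for w, v in zip(weights, aspects)) for i in range(dim)]
        )
    return refined
```

In a trained model, the queries, keys, and values would normally be learned linear projections of the aspect vectors. They are omitted here so the example stays dependency-free. The key point is that attention runs across aspects of one unit, not across time steps.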

Code & Models

Repositories

doheejin/HiPAMA
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling