Towards trustworthy phoneme boundary detection with autoregressive model and improved evaluation metric

Hyeongju Kim; Hyeong-Seok Choi

arXiv:2212.06387 · cs.SD · December 14, 2022

TL;DR

This paper introduces SuperSeg, an autoregressive model for phoneme boundary detection, and proposes improved evaluation metrics to more accurately assess boundary detection performance.

Contribution

It presents a novel autoregressive boundary detector and new evaluation metrics that address limitations of existing measures, improving reliability in phoneme boundary detection assessment.

Findings

01

SuperSeg outperforms existing models on TIMIT and Buckeye datasets.

02

The new metrics prevent each boundary from contributing to the evaluation more than once, which makes the evaluation more reliable.

03

Autoregressive approach enhances phoneme boundary detection accuracy.

Abstract

Phoneme boundary detection has been studied due to its central role in various speech applications. In this work, we point out that this task needs to be addressed not only algorithmically, but also through the evaluation metric. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg identifies phoneme boundaries by a significant margin over existing models. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the weaknesses of non-autoregressive baselines and establish a reliable criterion suited to evaluating phoneme boundary detection.
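To make the metric issue in the abstract concrete, here is a minimal Python sketch. It is not the paper's implementation, and the one-to-one matching below is not necessarily the paper's proposed metric. It uses the commonly cited R-value definition (built from hit rate and over-segmentation) and contrasts a naive hit count, where a single predicted boundary can satisfy several reference boundaries, with a greedy one-to-one matching. The function names, the 20 ms tolerance, and the boundary times are all assumptions chosen for illustration.

```python
import math


def count_hits_naive(ref, pred, tol):
    """Count reference boundaries that have any prediction within tol.

    One prediction may be counted for several reference boundaries.
    """
    return sum(any(abs(r - p) <= tol for p in pred) for r in ref)


def count_hits_one_to_one(ref, pred, tol):
    """Greedy two-pointer matching over sorted boundaries.

    Each reference and each prediction is used at most once.
    """
    ref, pred = sorted(ref), sorted(pred)
    i = j = hits = 0
    while i < len(ref) and j < len(pred):
        if abs(ref[i] - pred[j]) <= tol:
            hits += 1
            i += 1
            j += 1
        elif pred[j] < ref[i]:
            j += 1  # this prediction is too early to match any later reference
        else:
            i += 1  # this reference is too early to match any later prediction
    return hits


def r_value(hits, n_ref, n_pred):
    """R-value from hit rate (HR) and over-segmentation (OS)."""
    hr = hits / n_ref
    os_rate = n_pred / n_ref - 1
    r1 = math.sqrt((1 - hr) ** 2 + os_rate ** 2)
    r2 = (-os_rate + hr - 1) / math.sqrt(2)
    return 1 - (abs(r1) + abs(r2)) / 2


# Hypothetical boundary times in integer milliseconds.
ref = [100, 140, 300]
pred = [120, 200, 400]
tol = 20

naive_hits = count_hits_naive(ref, pred, tol)        # 120 ms is counted for both 100 and 140
strict_hits = count_hits_one_to_one(ref, pred, tol)  # 120 ms is matched to 100 only

r_naive = r_value(naive_hits, len(ref), len(pred))
r_strict = r_value(strict_hits, len(ref), len(pred))
print(naive_hits, strict_hits, r_naive > r_strict)
```

In this toy case the prediction at 120 ms lies within tolerance of two reference boundaries, so the naive count credits it twice and yields a higher R-value than the one-to-one count. Integer milliseconds are used to avoid floating-point edge cases at the tolerance boundary.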

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing