Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens
Kazuki Yamauchi, Masato Murata, Shogo Seki

TL;DR
This paper introduces a confidence-based filtering method using token log-probabilities to detect hallucination errors in generative speech enhancement models, improving dataset quality for TTS applications.
Contribution
The paper presents a novel non-intrusive filtering approach that uses token confidence scores to identify hallucination errors in generative speech enhancement (GSE) models. It catches errors that filtering based on conventional non-intrusive quality metrics misses.
Findings
Confidence scores correlate with intrusive speech quality metrics.
The method detects errors missed by conventional filtering.
Filtering improves TTS model performance.
Abstract
Generative speech enhancement (GSE) models show great promise in producing high-quality clean speech from noisy inputs, enabling applications such as curating noisy text-to-speech (TTS) datasets into high-quality ones. However, GSE models are prone to hallucination errors, such as phoneme omissions and speaker inconsistency, which conventional error filtering based on non-intrusive speech quality metrics often fails to detect. To address this issue, we propose a non-intrusive method for filtering hallucination errors from discrete token-based GSE models. Our method leverages the log-probabilities of generated tokens as confidence scores to detect potential errors. Experimental results show that the confidence scores strongly correlate with a suite of intrusive SE metrics, and that our method effectively identifies hallucination errors missed by conventional filtering methods.…
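The core idea in the abstract is to use the log-probabilities of generated tokens as a confidence score, then filter out low-confidence outputs. What follows is a minimal sketch under stated assumptions, not the authors' implementation.

The assumptions are:
- Per-step logits from a discrete-token GSE decoder are already available.
- Utterance confidence is the mean token log-probability.
- Utterances are kept by a fixed threshold.

The function names (`token_log_probs`, `confidence_score`, `filter_by_confidence`), the toy logits, and the threshold value are all hypothetical and chosen for illustration. This summary does not state the paper's exact aggregation or threshold.

```python
import math
from typing import Dict, List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one step's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_log_probs(step_logits: Sequence[Sequence[float]],
                    tokens: Sequence[int]) -> List[float]:
    """Log-probability the model assigned to each token it actually generated."""
    return [log_softmax(logits)[tok] for logits, tok in zip(step_logits, tokens)]


def confidence_score(log_probs: Sequence[float]) -> float:
    """Utterance-level confidence as the MEAN token log-probability.

    Assumption: mean aggregation is used here for illustration only.
    """
    if not log_probs:
        raise ValueError("need at least one generated token")
    return sum(log_probs) / len(log_probs)


def filter_by_confidence(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep utterance IDs whose confidence is at or above a (hypothetical) threshold."""
    return sorted(uid for uid, s in scores.items() if s >= threshold)


# Toy example: 3-token vocabulary, two enhanced utterances (hypothetical data).
# utt_a: the model strongly prefers each token it emits.
# utt_b: near-uniform distributions, i.e. the model is unsure (possible hallucination).
utts = {
    "utt_a": ([[4.0, 0.0, 0.0], [0.0, 5.0, 0.0]], [0, 1]),
    "utt_b": ([[1.0, 0.9, 0.8], [0.5, 0.4, 0.6]], [2, 1]),
}
scores = {uid: confidence_score(token_log_probs(lg, tk))
          for uid, (lg, tk) in utts.items()}
kept = filter_by_confidence(scores, threshold=-0.5)
print(kept)  # → ['utt_a']
```

Mean aggregation smooths over the whole utterance. A minimum over tokens would react more sharply to a single low-confidence token, such as one at an omitted phoneme. Which aggregation works best is an empirical question the sketch does not settle.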
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Face recognition and analysis
