Gradient boundaries through confidence intervals for forced alignment estimates using model ensembles
Matthew C. Kelley

TL;DR
This paper introduces a method to produce gradient boundaries in forced alignment by deriving confidence intervals from neural network ensembles, providing a more realistic and uncertainty-aware boundary estimation.
Contribution
The method uses neural network ensembles to generate gradient boundaries with confidence intervals, improving boundary realism and uncertainty quantification in forced alignment.
Findings
Ensemble boundaries slightly outperform single-model boundaries on Buckeye and TIMIT.
Gradient boundaries offer a more realistic transition representation between segments.
Confidence intervals indicate model uncertainty, aiding boundary review processes.
Abstract
Forced alignment is a common tool to align audio with orthographic and phonetic transcriptions. Most forced alignment tools provide only point-estimates of boundaries. The present project introduces a method of producing gradient boundaries by deriving confidence intervals using neural network ensembles. Ten different segment classifier neural networks were previously trained, and the alignment process is repeated with each classifier. The ensemble is then used to place the point-estimate of a boundary at the median of the boundaries in the ensemble, and the gradient range is placed using a 97.85% confidence interval around the median constructed using order statistics. Gradient boundaries are taken here as a more realistic representation of how segments transition into each other. Moreover, the range indicates the model uncertainty in the boundary placement, facilitating tasks like…
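The median-plus-interval step described in the abstract can be sketched in a few lines. This is a minimal illustration, not the author's implementation. The function names and the example boundary times are hypothetical. Using the 2nd and 9th order statistics is an assumption: for ten models, it is the distribution-free interval for the median whose coverage, 1 - 22/1024, matches the 97.85% figure quoted above.

```python
from math import comb
from statistics import median


def order_stat_coverage(n: int, k: int) -> float:
    """Coverage of [X_(k), X_(n-k+1)] as a confidence interval for the median.

    Assumes n independent draws from a continuous distribution, so the number
    of draws below the true median is Binomial(n, 0.5).
    """
    # Probability that fewer than k draws fall below the median (one tail).
    tail = sum(comb(n, i) for i in range(k)) / 2**n
    return 1.0 - 2.0 * tail


def gradient_boundary(boundaries, k=2):
    """Return (point_estimate, lower, upper, coverage) for one boundary.

    `boundaries` holds one boundary time per ensemble member (hypothetical
    input format). k=2 is an assumption chosen to reproduce the 97.85%
    coverage for an ensemble of ten models.
    """
    xs = sorted(boundaries)
    n = len(xs)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n // 2")
    # k-th smallest and k-th largest values bound the gradient range.
    return median(xs), xs[k - 1], xs[n - k], order_stat_coverage(n, k)


# Made-up boundary times in seconds, one per model in a ten-model ensemble.
example = [0.512, 0.498, 0.505, 0.520, 0.501, 0.509, 0.495, 0.515, 0.503, 0.507]
point, lower, upper, coverage = gradient_boundary(example)
print(f"boundary={point:.3f}s range=[{lower:.3f}, {upper:.3f}] coverage={coverage:.4f}")
```

With ten values and k=2, the interval discards only the single most extreme estimate on each side, so a wide range signals genuine disagreement among the models rather than one outlier.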
