Landmark-based consonant voicing detection on multilingual corpora
Xiang Kong, Xuesong Yang, Mark Hasegawa-Johnson, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

TL;DR
This study evaluates phonetic landmark-based classifiers for consonant voicing detection, trained on English (TIMIT) and tested on English, Turkish, and Spanish, and shows that CNN features generalize best across languages.
Contribution
It introduces and compares three classifiers, demonstrating that CNN and landmark-based features outperform traditional MFCCs in cross-lingual transfer.
Findings
CNN features outperform both manual features and MFCCs.
Manual features lose only 5% F1 when generalized from English to other languages.
MFCC-based classifiers lose 16% absolute F1 when generalized from English to other languages.
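The paper measures performance with F1 and accuracy, and describes cross-lingual degradation as an absolute change in F1. A minimal sketch of how those quantities can be computed for a binary voiced/unvoiced decision follows. The function names, the label convention (1 = voiced), and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
def f1_and_accuracy(y_true, y_pred):
    """Return (F1, accuracy) for binary labels; 1 = voiced is an assumed convention."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    acc = (tp + tn) / len(pairs) if pairs else 0.0
    return f1, acc


def absolute_drop(source_score, target_score):
    """Absolute reduction in percentage points between two scores in [0, 1]."""
    return 100.0 * (source_score - target_score)


# Toy labels only; these are not results from the paper.
y_true = [1, 1, 0, 0]
same_language_pred = [1, 1, 0, 0]
cross_language_pred = [1, 0, 1, 0]

f1_src, acc_src = f1_and_accuracy(y_true, same_language_pred)
f1_tgt, acc_tgt = f1_and_accuracy(y_true, cross_language_pred)
drop = absolute_drop(f1_src, f1_tgt)
```

An absolute reduction is a difference in percentage points, so a fall from 0.90 to 0.74 F1 is a 16% absolute reduction, not a 16% relative one.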
Abstract
This paper tests the hypothesis that distinctive feature classifiers anchored at phonetic landmarks can be transferred cross-lingually without loss of accuracy. Three consonant voicing classifiers were developed: (1) manually selected acoustic features anchored at a phonetic landmark, (2) MFCCs (either averaged across the segment or anchored at the landmark), and (3) acoustic features computed using a convolutional neural network (CNN). All detectors are trained on English data (TIMIT), and tested on English, Turkish, and Spanish (performance measured using F1 and accuracy). Experiments demonstrate that manual features outperform all MFCC classifiers, while CNN features outperform both. MFCC-based classifiers suffer an F1 reduction of 16% absolute when generalized from English to other languages. Manual features suffer only a 5% F1 reduction, and CNN features actually perform better in…
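The abstract distinguishes MFCCs averaged across the segment from MFCCs anchored at the landmark. The sketch below illustrates one way the two pooling strategies could differ, given a list of per-frame feature vectors. The frame representation, the context window of one frame on each side, and the edge clipping are all assumptions for illustration; the abstract does not specify how the authors implemented either variant.

```python
def segment_average(frames, start, end):
    """Mean feature vector over frames[start:end] (end exclusive)."""
    segment = frames[start:end]
    if not segment:
        raise ValueError("segment contains no frames")
    n = len(segment)
    return [sum(column) / n for column in zip(*segment)]


def landmark_anchored(frames, landmark, context=1):
    """Concatenate the frames within +/- context of the landmark frame.

    Indices that fall outside the utterance are clipped to the nearest
    valid frame (an assumed edge policy).
    """
    last = len(frames) - 1
    features = []
    for i in range(landmark - context, landmark + context + 1):
        clipped = min(max(i, 0), last)
        features.extend(frames[clipped])
    return features


# Toy 2-dimensional "MFCC" frames; real frames would have more coefficients.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

averaged = segment_average(frames, 1, 3)
anchored = landmark_anchored(frames, 2, context=1)
anchored_at_edge = landmark_anchored(frames, 0, context=1)
```

Averaging discards timing within the segment, while anchoring keeps the frame order around the landmark, at the cost of a longer feature vector.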
