Detection of transitions between broad phonetic classes in a speech signal
T V Ananthapadmanabha, K V Vijay Girish, A G Ramakrishnan

TL;DR
This paper presents a hierarchical method for detecting transitions between broad phonetic classes in speech signals. On the TIMIT database it achieves high accuracy, with results comparable to or better than those of existing methods.
Contribution
A novel hierarchical approach for detecting broad phonetic class transitions in speech signals with high accuracy and robustness.
Findings
93.6% transition detection within 20 ms tolerance
83.5% accuracy in class onset detection
Performance comparable or superior to state-of-the-art methods
Abstract
Detection of transitions between broad phonetic classes in a speech signal is an important problem which has applications such as landmark detection and segmentation. The proposed hierarchical method detects silence to non-silence transitions, high amplitude (mostly sonorants) to low amplitude (mostly fricatives/affricates/stop bursts) transitions and vice-versa. A subset of the extremum (minimum or maximum) samples between every pair of successive zero-crossings is selected above a second pass threshold, from each bandpass filtered speech signal frame. Relative to the mid-point (reference) of a frame, locations of the first and the last extrema lie on either side, if the speech signal belongs to a homogeneous segment; else, both these locations lie on the left or the right side of the reference, indicating a transition frame. When tested on the entire TIMIT database, of the…
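The frame-level decision rule described in the abstract can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the authors' implementation: the bandpass filtering stage is omitted, the single `threshold` argument stands in for the paper's two-pass thresholding (whose details are not given here), and the function names are hypothetical.

```python
import numpy as np

def extrema_between_zero_crossings(frame):
    """Index of the largest-magnitude sample between each pair of
    successive zero-crossings in the frame."""
    signs = np.signbit(frame)
    # crossings[i] is the first sample after a sign change
    crossings = np.flatnonzero(signs[1:] != signs[:-1]) + 1
    locations = []
    for start, stop in zip(crossings[:-1], crossings[1:]):
        segment = frame[start:stop]
        locations.append(start + int(np.argmax(np.abs(segment))))
    return np.array(locations, dtype=int)

def is_transition_frame(frame, threshold):
    """True if the first and last selected extrema fall on the same
    side of the frame mid-point (the reference)."""
    locs = extrema_between_zero_crossings(frame)
    # Assumption: one fixed magnitude threshold replaces the paper's
    # second pass threshold.
    selected = locs[np.abs(frame[locs]) > threshold]
    if selected.size == 0:
        return False  # assumption: no selected extrema, no decision
    mid = len(frame) // 2
    first, last = selected[0], selected[-1]
    # Homogeneous segment: first extremum left of mid, last one right of it.
    return not (first < mid <= last)

# Usage: a steady tone versus a frame that is quiet in its first half.
t = np.arange(400)
tone = np.sin(np.pi * t / 40)
onset = tone * np.where(t < 200, 0.01, 1.0)
print(is_transition_frame(tone, 0.5), is_transition_frame(onset, 0.5))
```

In the steady tone the selected extrema span the whole frame, so they straddle the mid-point; in the onset frame only the second half exceeds the threshold, so both the first and last selected extrema lie to the right of the reference.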
Taxonomy
Topics: Speech and Audio Processing · Phonetics and Phonology Research · Speech Recognition and Synthesis
