Sequence Segmentation Using Joint RNN and Structured Prediction Models
Yossi Adi, Joseph Keshet, Emily Cibelli, Matthew Goldrick

TL;DR
This paper introduces a joint RNN and structured prediction model for sequence segmentation in speech processing, achieving state-of-the-art results on phonetic tasks by effectively combining neural features with structured learning.
Contribution
It presents a novel neural architecture that jointly trains RNNs and structured models for sequence segmentation, improving performance over previous methods.
Findings
Achieved state-of-the-art results on word segmentation datasets.
Demonstrated effectiveness in voice onset time segmentation.
Outperformed previous approaches in phonetic sequence tasks.
Abstract
We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured loss function which can be designed to the given segmentation task. We demonstrate the effectiveness of our method by applying it to two simple tasks commonly used in phonetic studies: word segmentation and voice onset time segmentation. Results suggest the proposed model is superior to previous methods, obtaining state-of-the-art results on the tested datasets.
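To make the idea of a structured loss over segmentations concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the per-frame scores stand in for RNN outputs and are fixed numbers rather than learned values, the segmentation is assumed to be a single (start, end) pair, and the names `decode`, `task_loss`, and `structured_hinge` are hypothetical. The sketch shows exhaustive decoding of the best-scoring segment and a structured hinge loss computed with loss-augmented decoding, where the task loss counts boundary errors beyond a tolerance.

```python
from itertools import combinations


def prefix_sums(frame_scores):
    """Cumulative sums so any segment score is a difference of two entries."""
    sums = [0.0]
    for x in frame_scores:
        sums.append(sums[-1] + x)
    return sums


def segment_score(sums, start, end):
    """Score of the segment covering frames start..end-1 (assumed additive)."""
    return sums[end] - sums[start]


def task_loss(pred, gold, tolerance=0):
    """Boundary error in frames, ignoring deviations within the tolerance."""
    return sum(max(0, abs(p - g) - tolerance) for p, g in zip(pred, gold))


def decode(frame_scores, gold=None, tolerance=0):
    """Exhaustive search over (start, end) pairs with start < end.

    When gold is given, the task loss is added to each candidate's score
    (loss-augmented decoding), which is what the hinge loss needs.
    """
    sums = prefix_sums(frame_scores)
    best, best_val = None, float("-inf")
    for start, end in combinations(range(len(frame_scores) + 1), 2):
        val = segment_score(sums, start, end)
        if gold is not None:
            val += task_loss((start, end), gold, tolerance)
        if val > best_val:
            best, best_val = (start, end), val
    return best, best_val


def structured_hinge(frame_scores, gold, tolerance=0):
    """max over candidates of [score + task loss] minus the gold score."""
    sums = prefix_sums(frame_scores)
    _, augmented = decode(frame_scores, gold, tolerance)
    return max(0.0, augmented - segment_score(sums, *gold))


# Hypothetical per-frame scores; positive values favor "inside the segment".
scores = [-1.0, -0.5, 2.0, 1.5, 1.0, -2.0]
predicted, value = decode(scores)
loss = structured_hinge(scores, gold=(2, 5))
```

In a jointly trained model, the frame scores would come from the RNN and the gradient of this loss would flow back into its parameters; that step is omitted here. The tolerance argument illustrates how the task loss can be adapted to a given segmentation task, for example by ignoring boundary errors of a few frames.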
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications
