Sequence Segmentation Using Joint RNN and Structured Prediction Models

Yossi Adi; Joseph Keshet; Emily Cibelli; Matthew Goldrick

arXiv:1610.07918 · cs.CL · October 26, 2016

TL;DR

This paper introduces a joint RNN and structured prediction model for sequence segmentation in speech processing, achieving state-of-the-art results on phonetic tasks by effectively combining neural features with structured learning.

Contribution

It presents a novel neural architecture that jointly trains RNNs and structured models for sequence segmentation, improving performance over previous methods.

Findings

1. Achieved state-of-the-art results on word segmentation datasets.
2. Demonstrated effectiveness in voice onset time segmentation.
3. Outperformed previous approaches in phonetic sequence tasks.

Abstract

We describe and analyze a simple and effective algorithm for sequence segmentation applied to speech processing tasks. We propose a neural architecture that is composed of two modules trained jointly: a recurrent neural network (RNN) module and a structured prediction model. The RNN outputs are considered as feature functions to the structured model. The overall model is trained with a structured loss function which can be designed to the given segmentation task. We demonstrate the effectiveness of our method by applying it to two simple tasks commonly used in phonetic studies: word segmentation and voice onset time segmentation. Results suggest the proposed model is superior to previous methods, obtaining state-of-the-art results on the tested datasets.
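To make the abstract's idea concrete, here is a minimal sketch of how per-frame feature vectors can feed a structured segmentation model trained with a task-specific structured hinge loss. It is an illustration under stated assumptions, not the authors' implementation: the feature vectors are fixed stand-ins for RNN outputs (the paper trains the RNN jointly), the segmentation is a single (onset, offset) pair, decoding is an exhaustive search, and all names (`task_loss`, `score`, `decode`, `structured_hinge`, `w_on`, `w_off`, `tol`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def task_loss(y, y_hat, tol=0):
    """Sum of boundary errors (in frames) that exceed a tolerance `tol`."""
    return sum(max(0, abs(a - b) - tol) for a, b in zip(y, y_hat))

def score(feats, w_on, w_off, y):
    """Linear score of a candidate segmentation y = (onset, offset).

    `feats[t]` stands in for the RNN output at frame t (an assumption).
    """
    onset, offset = y
    return dot(w_on, feats[onset]) + dot(w_off, feats[offset])

def decode(feats, w_on, w_off, gold=None, tol=0):
    """Exhaustive argmax over all pairs with onset < offset.

    If `gold` is given, the task loss is added to each candidate's score
    (loss-augmented decoding, used inside the structured hinge loss).
    """
    best, best_val = None, float("-inf")
    T = len(feats)
    for onset in range(T - 1):
        for offset in range(onset + 1, T):
            y = (onset, offset)
            val = score(feats, w_on, w_off, y)
            if gold is not None:
                val += task_loss(gold, y, tol)
            if val > best_val:
                best, best_val = y, val
    return best, best_val

def structured_hinge(feats, w_on, w_off, gold, tol=0):
    """max(0, max_y' [score(y') + task_loss(gold, y')] - score(gold))."""
    _, augmented = decode(feats, w_on, w_off, gold=gold, tol=tol)
    return max(0.0, augmented - score(feats, w_on, w_off, gold))

# Toy input: 6 frames, 2-dimensional features (made up for illustration).
feats = [[0.0, 0.0] for _ in range(6)]
feats[1] = [1.0, 0.0]   # frame that "looks like" an onset
feats[4] = [0.0, 1.0]   # frame that "looks like" an offset
w_on, w_off = [1.0, 0.0], [0.0, 1.0]

pred, _ = decode(feats, w_on, w_off)
loss = structured_hinge(feats, w_on, w_off, gold=(1, 4))
print(pred, loss)
```

The task loss is the part that can be tailored to the segmentation task: here it counts boundary deviations beyond a tolerance, so a model is penalized only when a wrong segmentation scores too close to the correct one relative to how wrong it is. In a jointly trained system the gradient of this loss would flow into both the linear weights and the network producing `feats`.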

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications