The 2015 Sheffield System for Transcription of Multi-Genre Broadcast Media

Oscar Saz; Mortaza Doulaty; Salil Deena; Rosanna Milner; Raymond W. M. Ng; Madina Hasan; Yulan Liu; Thomas Hain

arXiv:1512.06643·cs.CL·November 15, 2016

TL;DR

This paper presents the University of Sheffield's multi-genre broadcast transcription system from 2015, combining multiple adaptation and segmentation techniques to improve speech recognition accuracy across diverse broadcast media.

Contribution

It introduces a multi-pass system in which an initial unadapted decoding stage refines the segmentation and three adapted passes follow, designed for highly variable broadcast media.

Findings

01 Final error rate of 27.5% on the development set

02 Effective data selection for unreliable training data

03 Improved segmentation and acoustic modelling techniques

Abstract

We describe the University of Sheffield system for participation in the 2015 Multi-Genre Broadcast (MGB) challenge task of transcribing multi-genre broadcast shows. Transcription was one of four tasks proposed in the MGB challenge, with the aim of advancing the state of the art of automatic speech recognition, speaker diarisation and automatic alignment of subtitles for broadcast media. Four topics are investigated in this work: Data selection techniques for training with unreliable data, automatic speech segmentation of broadcast media shows, acoustic modelling and adaptation in highly variable environments, and language modelling of multi-genre shows. The final system operates in multiple passes, using an initial unadapted decoding stage to refine segmentation, followed by three adapted passes: a hybrid DNN pass with input features normalised by speaker-based cepstral normalisation,…
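As an illustration of the speaker-based cepstral normalisation mentioned for the first adapted pass, here is a minimal sketch and not the authors' implementation. It assumes mean and variance normalisation (the abstract does not say whether variance is included) and that each frame already carries a speaker label, for instance from the segmentation stage. The function name `speaker_cmvn` and its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def speaker_cmvn(frames, speaker_ids, eps=1e-8):
    """Normalise cepstral frames per speaker (assumed: mean and variance).

    frames      -- list of equal-length feature vectors, one per frame
    speaker_ids -- one speaker label per frame (assumed to be given)
    eps         -- small constant that avoids division by zero
    """
    # Group frame indices by speaker label.
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(speaker_ids):
        by_speaker[spk].append(idx)

    normalised = [None] * len(frames)
    for indices in by_speaker.values():
        dims = range(len(frames[indices[0]]))
        # Statistics are estimated over all frames of this speaker only.
        means = [fmean(frames[i][d] for i in indices) for d in dims]
        stds = [pstdev([frames[i][d] for i in indices]) for d in dims]
        for i in indices:
            normalised[i] = [
                (frames[i][d] - means[d]) / (stds[d] + eps) for d in dims
            ]
    return normalised


# Toy example: two speakers with very different feature offsets.
frames = [[1.0, 10.0], [3.0, 14.0], [100.0, -5.0], [104.0, -1.0]]
speaker_ids = ["A", "A", "B", "B"]
normalised = speaker_cmvn(frames, speaker_ids)
```

After normalisation each speaker's features have approximately zero mean and unit variance in every dimension, so the offset between speakers "A" and "B" in the toy data disappears.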
