The Marchex 2018 English Conversational Telephone Speech Recognition System

Seongjun Hahm; Iroro Orife; Shane Walker; Jason Flaks

arXiv:1811.02058·cs.CL·May 3, 2019·1 cites

TL;DR

This paper describes improvements to the production Marchex English speech recognition system using semi-supervised LF-MMI training, which yields a 3.3% absolute WER reduction and 3x faster decoding than the 2017 production system.

Contribution

The paper introduces the application of semi-supervised LF-MMI training to improve a production conversational speech recognition system, achieving lower error rates and faster decoding.

Findings

01

3.3% absolute WER reduction (3.2% for agent, 3.6% for caller) on the Marchex English evaluation set

02

3x faster decoding than the 2017 production system

03

Expected performance gains for the downstream natural language processing pipeline

Abstract

In this paper, we describe recent performance improvements to the production Marchex speech recognition system for our spontaneous customer-to-business telephone conversations. In our previous work, we focused on in-domain language and acoustic model training. In this work, we employ a state-of-the-art semi-supervised lattice-free maximum mutual information (LF-MMI) training process, which can use full lattices from unlabeled audio as supervision. On Marchex English (ME), a modern evaluation set of conversational North American English, we observed a 3.3% (3.2% for agent, 3.6% for caller) reduction in absolute word error rate (WER) with 3x faster decoding speed over the performance of the 2017 production system. We expect this improvement to boost Marchex Call Analytics system performance, especially for the natural language processing pipeline.
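The abstract reports its gain as an absolute WER reduction, which is easy to confuse with a relative one. The sketch below shows how the two differ, using a standard word-level edit distance. It is a minimal illustration and not the authors' scoring tool: the function names and the toy transcripts are hypothetical, and the numbers it produces are unrelated to the paper's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(old_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as WER itself."""
    return old_wer - new_wer


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Difference in WER as a fraction of the old WER."""
    return (old_wer - new_wer) / old_wer


# Toy transcripts (hypothetical, for illustration only).
reference = "i would like to schedule an appointment for tomorrow morning"
old_hyp = "i would like schedule a appointment for tomorrow"
new_hyp = "i would like to schedule an appointment for tomorrow warning"

old_wer = word_error_rate(reference, old_hyp)
new_wer = word_error_rate(reference, new_hyp)
print(f"old WER {old_wer:.1%}, new WER {new_wer:.1%}")
print(f"absolute reduction {absolute_reduction(old_wer, new_wer):.1%}")
print(f"relative reduction {relative_reduction(old_wer, new_wer):.1%}")
```

Under this definition, the 3.3% figure in the abstract is a difference between two WER values in percentage points, not a percentage of the 2017 system's WER.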

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
