The Marchex 2018 English Conversational Telephone Speech Recognition System
Seongjun Hahm, Iroro Orife, Shane Walker, Jason Flaks

TL;DR
This paper describes improvements to the production Marchex English speech recognition system from semi-supervised LF-MMI training, which yield a 3.3% absolute WER reduction and 3x faster decoding relative to the 2017 production system on conversational telephone speech.
Contribution
The paper introduces the application of semi-supervised LF-MMI training to improve a production conversational speech recognition system, achieving lower error rates and faster decoding.
Findings
3.3% absolute WER reduction on the Marchex English (ME) evaluation set (3.2% for agent, 3.6% for caller)
3x faster decoding than the 2017 production system
Expected downstream gains for the natural language processing pipeline of the Marchex Call Analytics system
Abstract
In this paper, we describe recent performance improvements to the production Marchex speech recognition system for our spontaneous customer-to-business telephone conversations. In our previous work, we focused on in-domain language and acoustic model training. In this work, we employ a state-of-the-art semi-supervised lattice-free maximum mutual information (LF-MMI) training process, which can use full lattices from unlabeled audio as supervision. On Marchex English (ME), a modern evaluation set of conversational North American English, we observed a 3.3% (3.2% for agent, 3.6% for caller) absolute reduction in word error rate (WER) with 3x faster decoding speed compared with the 2017 production system. We expect this improvement to boost the performance of the Marchex Call Analytics system, especially its natural language processing pipeline.
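The WER figures in the abstract are absolute (percentage-point) differences. As a minimal sketch of how WER is conventionally computed, namely word-level edit distance divided by the number of reference words, the Python below may help. It is not the authors' scoring tooling: the function names `word_error_rate` and `absolute_wer_reduction` and the toy transcripts are hypothetical and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; whitespace tokenization is an
    assumption, real scoring pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction: a plain difference of two WER values."""
    return baseline_wer - new_wer


# Toy transcripts (invented for this sketch, not data from the paper).
reference = "thank you for calling"
old_system = "thank you calling"       # one deleted word
new_system = "thank you for calling"   # exact match

old_wer = word_error_rate(reference, old_system)  # 0.25
new_wer = word_error_rate(reference, new_system)  # 0.0
print(absolute_wer_reduction(old_wer, new_wer))   # 0.25
```

An absolute reduction is reported in the same units as WER itself, so a 3.3% absolute reduction means the new WER is 3.3 percentage points below the old one, regardless of the baseline value.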
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
