Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

Piyush Behre; Naveen Parihar; Sharman Tan; Amy Shah; Eva Sharma; Geoffrey Liu; Shuangyu Chang; Hosam Khalil; Chris Basoglu; Sayan Pathak

arXiv:2210.14446·cs.CL·October 28, 2022·6 cites

TL;DR

This paper introduces a hybrid speech segmentation method that combines acoustic and linguistic features with a one-word look-ahead, improving segmentation-F0.5 by 9.8% on average and machine-translation BLEU by 1.05 points across multiple languages.

Contribution

It presents a novel hybrid segmentation approach that incorporates language understanding and look-ahead, outperforming traditional acoustic-only methods.

Findings

01 Segmentation-F0.5 score improved by 9.8% on average

02 BLEU score for machine translation increased by 1.05 points

03 Effective across multiple languages
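Segmentation-F0.5 is an F-beta score with beta = 0.5, which treats precision as more important than recall: a spurious mid-sentence cut costs more than a missed boundary. The sketch below is a minimal illustration, assuming boundaries are word indices after which a segment ends and that only exact matches count as hits; the paper's actual alignment procedure is not described on this page, and `f_beta` and `segmentation_f_beta` are hypothetical helper names.

```python
from typing import Iterable


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def segmentation_f_beta(predicted: Iterable[int], reference: Iterable[int],
                        beta: float = 0.5) -> float:
    """Score predicted boundaries (word indices) against reference boundaries.

    Assumption: a predicted boundary is correct only if it matches a
    reference boundary exactly.
    """
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return f_beta(precision, recall, beta)


reference = [5, 12, 20]
# Conservative segmenter: misses one boundary but never cuts mid-sentence.
print(round(segmentation_f_beta([5, 12], reference), 3))            # 0.909
# Aggressive segmenter: finds every boundary but adds two spurious cuts.
print(round(segmentation_f_beta([3, 5, 8, 12, 20], reference), 3))  # 0.652
```

With beta = 0.5, the aggressive segmenter scores lower despite perfect recall. This matches the abstract's point that over-segmentation at thinking pauses is the main failure mode of timeout-based methods.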

Abstract

Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We…
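The contrast the abstract draws between silence-timeout segmentation and a hybrid acoustic-plus-language decision with one word of look-ahead can be illustrated with a toy sketch. Everything here is hypothetical and not the authors' model: `toy_language_score` is a hand-written stand-in for a trained language model, and the word lists, weights, threshold, and pause normalization are illustrative assumptions. The point shown is structural: the boundary decision for word i is deferred until word i + 1 arrives, so the language side can see the next word.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class Word:
    text: str
    pause_after: float  # seconds of silence after this word (acoustic feature)


def timeout_segment(words: Iterable[Word], timeout: float = 0.5) -> List[int]:
    """Acoustic-only baseline: cut after every pause of at least `timeout` seconds."""
    return [i for i, w in enumerate(words) if w.pause_after >= timeout]


# Hypothetical toy word lists standing in for a trained language model.
CONTINUATIONS = {"a", "an", "the", "and", "to", "of", "um", "uh"}
STARTERS = {"what", "so", "please", "we"}


def toy_language_score(current: str, next_word: Optional[str]) -> float:
    """Toy P(boundary between `current` and `next_word`); not the paper's model."""
    if current.lower() in CONTINUATIONS:
        return 0.05  # sentences rarely end on a function word or filler
    if next_word is None:
        return 0.9  # end of stream
    if next_word.lower() in STARTERS:
        return 0.8  # the look-ahead word looks like a sentence start
    return 0.3


def hybrid_segment_stream(
    stream: Iterable[Word],
    lang_score: Callable[[str, Optional[str]], float] = toy_language_score,
    w_acoustic: float = 0.5,
    threshold: float = 0.5,
    max_pause: float = 1.0,
) -> Iterator[int]:
    """Yield the index of each word that ends a segment, using one word of look-ahead."""

    def is_boundary(word: Word, next_text: Optional[str]) -> bool:
        acoustic = min(word.pause_after / max_pause, 1.0)  # normalized pause length
        linguistic = lang_score(word.text, next_text)
        return w_acoustic * acoustic + (1 - w_acoustic) * linguistic >= threshold

    pending: Optional[Word] = None  # word whose decision waits for its successor
    index = -1
    for index, word in enumerate(stream):
        if pending is not None and is_boundary(pending, word.text):
            yield index - 1
        pending = word
    if pending is not None and is_boundary(pending, None):
        yield index  # flush the final word at end of stream


# "I want to <think> book a flight. What time is it?"
words = [Word("I", 0.0), Word("want", 0.0), Word("to", 0.7), Word("book", 0.0),
         Word("a", 0.0), Word("flight", 0.6), Word("what", 0.0), Word("time", 0.0),
         Word("is", 0.0), Word("it", 0.8)]
print(timeout_segment(words))              # [2, 5, 9]: cuts mid-sentence after "to"
print(list(hybrid_segment_stream(words)))  # [5, 9]: the thinking pause is ignored
```

The cost of the look-ahead is latency: each boundary is emitted one word later than a pure timeout would emit it. The abstract reports that a single word of look-ahead is enough to improve segmentation quality.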


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing