Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation

Ryo Fukuda; Katsuhito Sudoh; Satoshi Nakamura

arXiv:2203.15479 · cs.CL · July 14, 2022

TL;DR

This paper introduces a novel speech segmentation method using a binary classifier trained on bilingual speech data, improving segmentation accuracy for speech translation systems, especially when combined with traditional VAD techniques.

Contribution

The study presents a new segmentation approach using a binary classification model trained on bilingual speech, enhancing translation performance over conventional pause-based methods.
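
The frame-level classification idea can be sketched as follows. This is a minimal, hypothetical decoding step, not the authors' code: it assumes the classifier emits one probability per fixed-length frame (`frame_sec`, 10 ms by default here) that the frame lies inside a segment of the segmented bilingual corpus, and it turns each maximal run of frames above `threshold` into one segment. The name `probs_to_segments`, the 0.5 threshold, and the `min_frames` filter are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,
    frame_sec: float = 0.01,
    min_frames: int = 1,
) -> List[Tuple[float, float]]:
    """Convert per-frame 'inside a segment' probabilities into (start, end) times.

    Hypothetical decoding rule: a frame is inside a segment when its probability
    is strictly above `threshold`; each maximal run of such frames becomes one
    segment. Runs shorter than `min_frames` frames are discarded.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame of the current run, if any
    for i, p in enumerate(probs):
        inside = p > threshold
        if inside and start is None:
            start = i  # a new run begins
        elif not inside and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_sec, i * frame_sec))
            start = None  # the run ended at frame i - 1
    # Close a run that extends to the final frame.
    if start is not None and len(probs) - start >= min_frames:
        segments.append((start * frame_sec, len(probs) * frame_sec))
    return segments
```

For example, `probs_to_segments([0.1, 0.9, 0.8, 0.2, 0.7, 0.6, 0.9], frame_sec=1.0)` yields `[(1.0, 3.0), (4.0, 7.0)]`: the single low-probability frame at index 3 becomes a boundary. Raising `min_frames` to 3 drops the two-frame run and leaves `[(4.0, 7.0)]`.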

Findings

01

Proposed method outperforms traditional VAD in segmentation accuracy.

02

Hybrid VAD and classification approach further improves translation results.

03

Method is effective for both cascade and end-to-end speech translation systems.
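
One plausible way to realize the hybrid idea in Finding 02 is to treat a frame as part of a segment only when both the VAD and the classifier agree, so that boundaries come from clear pauses (VAD) as well as from points the classifier marks as segment edges. The sketch below assumes per-frame boolean VAD decisions (for example, collected frame by frame from WebRTC VAD) and per-frame classifier probabilities on the same frame grid; the AND rule, the function name `hybrid_segments`, and the threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple


def hybrid_segments(
    vad_speech: Sequence[bool],
    probs: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return half-open (start_frame, end_frame) runs where VAD and classifier agree.

    Hypothetical combination rule: a frame is kept only if the VAD marks it as
    speech AND the classifier probability is strictly above `threshold`, so a
    boundary from either source splits the segment.
    """
    if len(vad_speech) != len(probs):
        raise ValueError("VAD flags and classifier probabilities must cover the same frames")
    keep = [bool(v) and p > threshold for v, p in zip(vad_speech, probs)]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first kept frame of the current run
    for i, k in enumerate(keep):
        if k and start is None:
            start = i
        elif not k and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:  # run reaches the last frame
        segments.append((start, len(keep)))
    return segments
```

With `vad = [True, True, True, True, True, False, True, True]` and `probs = [0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.8, 0.8]`, the VAD alone would give frames (0, 5) and (6, 8), and the classifier alone (0, 2) and (3, 8); the combined rule returns `[(0, 2), (3, 5), (6, 8)]`, keeping the boundaries found by both.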

Abstract

Speech segmentation, which splits long speech into short segments, is essential for speech translation (ST). Popular VAD tools like WebRTC VAD have generally relied on pause-based segmentation. Unfortunately, pauses in speech do not necessarily match sentence boundaries, and sentences can be connected by a very short pause that is difficult to detect by VAD. In this study, we propose a speech segmentation method using a binary classification model trained using a segmented bilingual speech corpus. We also propose a hybrid method that combines VAD and the above speech segmentation method. Experimental results revealed that the proposed method is more suitable for cascade and end-to-end ST systems than conventional segmentation methods. The hybrid approach further improved the translation performance.
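
The abstract's point about short pauses can be illustrated with a simplified pause-based segmenter. This is a hypothetical sketch, not WebRTC VAD's own logic: given one speech/non-speech flag per frame, it cuts only at non-speech runs of at least `min_pause` frames (a minimum-silence rule is assumed here), so a shorter pause between two sentences does not produce a boundary. The function name and parameters are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def pause_segments(speech: Sequence[bool], min_pause: int) -> List[Tuple[int, int]]:
    """Split frames at non-speech runs lasting at least `min_pause` frames.

    Returns half-open (start_frame, end_frame) pairs that begin and end on
    speech frames. Non-speech gaps shorter than `min_pause` stay inside the
    surrounding segment, so sentences joined by a brief pause are merged.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None        # first speech frame of the current segment
    last_speech: Optional[int] = None  # most recent speech frame seen
    for i, is_speech in enumerate(speech):
        if not is_speech:
            continue
        if start is None:
            start = i  # first speech frame overall
        elif i - last_speech - 1 >= min_pause:
            # The silence since the last speech frame was long enough: cut here.
            segments.append((start, last_speech + 1))
            start = i
        last_speech = i
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments
```

For flags `[True, True, False, True, True, False, False, False, True, True]`, `min_pause=3` returns `[(0, 5), (8, 10)]`: the one-frame gap at index 2 is absorbed, so if a sentence boundary fell there, both sentences end up in one segment. Lowering `min_pause` to 1 returns `[(0, 2), (3, 5), (8, 10)]`, but in practice a very low threshold also splits on pauses inside sentences.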

Code & Models

Repositories

wiseman/py-webrtcvad (unofficial; a Python interface to the WebRTC VAD)

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems