Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System

Zhaohui Yin; Jingguang Tian; Xinhui Hu; Xinkang Xu; Yang Xiang

arXiv:2308.05987·cs.SD·September 8, 2023·1 cites

TL;DR

This paper introduces a large-scale learning approach for overlapped speech detection, proposing a new benchmark and a general system that significantly improves accuracy and robustness in diverse acoustic environments.

Contribution

It presents a new large-scale benchmark dataset and a general OSD system based on Conformer networks with large-scale learning, advancing robustness and accuracy.

Findings

01. LSL significantly improves OSD accuracy and robustness.

02. CF-OSD outperforms existing systems on the new benchmark.

03. Achieves state-of-the-art results on small dataset benchmarks.

Abstract

Overlapped Speech Detection (OSD) is an important part of speech applications involving the analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains. As a result, there is no benchmark for evaluating their robustness, and their accuracy remains inadequate in realistic acoustic environments. To solve these problems, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system, named CF-OSD with LSL, based on the Conformer network and LSL. In our study, a large-scale test set consisting of 151 h of labeled speech of different styles, languages, and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed and used to evaluate the effectiveness of LSL in OSD tasks and…
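As a minimal illustration of the task itself (not of the CF-OSD system), the sketch below derives frame-level overlap labels from speaker-attributed segments: a frame counts as overlapped when two or more distinct speakers are active in it. The function name `overlap_labels`, the `(speaker, start, end)` tuple layout, and the frame shift values are assumptions made for this example and do not come from the paper.

```python
def overlap_labels(segments, duration, frame_shift=0.01):
    """Return one 0/1 label per frame: 1 where two or more speakers are active.

    segments: iterable of (speaker, start_sec, end_sec) tuples (hypothetical layout).
    duration: total length of the recording in seconds.
    frame_shift: frame step in seconds (10 ms is an assumed default).
    """
    n_frames = int(round(duration / frame_shift))
    # One set of active speakers per frame, so repeated or overlapping
    # segments from the same speaker are not counted twice.
    active = [set() for _ in range(n_frames)]
    for speaker, start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            active[i].add(speaker)
    return [1 if len(speakers) >= 2 else 0 for speakers in active]


# Two speakers talking over each other between 0.5 s and 1.0 s.
example_segments = [("A", 0.0, 1.0), ("B", 0.5, 1.5)]
example_labels = overlap_labels(example_segments, duration=2.0, frame_shift=0.1)
```

Labels of this kind are the usual target for a frame-level detector; how the paper's benchmark is annotated and scored is not stated in the text shown here.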

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing