Joint Speech Activity and Overlap Detection with Multi-Exit Architecture

Ziqing Du; Kai Liu; Xucheng Wan; Huan Zhou

arXiv:2209.11906 · cs.SD · September 27, 2022

TL;DR

This paper introduces a multi-exit neural network architecture for joint speech activity and overlap detection, achieving state-of-the-art results and offering efficient deployment options.

Contribution

It proposes a novel multi-exit architecture, together with two training schemes (knowledge distillation and dense connection), to improve joint VAD and OSD performance.
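As a rough illustration of how knowledge distillation can be combined with a frame-level classification loss, the sketch below blends hard-label cross-entropy with a KL term that pulls a student's softened posteriors toward a teacher's. Treating an early exit as the student and the last exit as the teacher is an assumption made here for illustration, as are the function name `distillation_loss` and the `temperature` and `alpha` parameters; the paper's actual loss may differ.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hypothetical blend of hard-label cross-entropy and KL(teacher || student).

    student_logits, teacher_logits: (frames, classes) arrays
    labels: (frames,) integer class indices
    """
    eps = 1e-12
    # Cross-entropy between the student's posteriors and the hard labels.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(len(labels)), labels] + eps).mean()
    # KL divergence between temperature-softened teacher and student posteriors.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=1).mean()
    # The temperature**2 factor is the usual rescaling of the soft-target term.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl

rng = np.random.default_rng(0)
early_exit_logits = rng.normal(size=(8, 3))   # student: an early exit (assumed)
last_exit_logits = rng.normal(size=(8, 3))    # teacher: the last exit (assumed)
frame_labels = rng.integers(0, 3, size=8)     # 3 classes: non-speech, single, overlap
loss = distillation_loss(early_exit_logits, last_exit_logits, frame_labels)
```

When the student's logits equal the teacher's, the KL term vanishes and only the weighted cross-entropy remains.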

Findings

01. Outperforms existing models on the AMI and DIHARD-III datasets.

02. Achieves F1 scores of 0.792 on AMI and 0.625 on DIHARD-III.

03. Offers a flexible trade-off between detection quality and computational complexity.

Abstract

Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios. Despite numerous research efforts and considerable progress, OSD remains an open challenge compared with speech activity detection (VAD), and its overall performance is far from satisfactory. Most prior research formulates OSD as a standard classification problem, identifying speech at the frame level with either a binary label (OSD) or a three-class label (joint VAD and OSD). In contrast to the mainstream, this study investigates the joint VAD and OSD task from a new perspective. In particular, we propose to extend the traditional classification network with a multi-exit architecture. Such an architecture gives our system the unique capability to identify the class using either low-level features from early exits or high-level features from the last exit. In addition, two training…
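The multi-exit idea described in the abstract can be sketched as a stack of blocks with a three-class classifier head after each one, so that inference may stop at an early exit (cheaper, low-level features) or run to the last exit (costlier, high-level features). This is a toy NumPy sketch with random weights, not the authors' network: the class `MultiExitNet`, the layer sizes, the `last_exit` parameter, and the 40-dimensional input features are all assumptions made for illustration.

```python
import numpy as np

# Three-class frame labels for joint VAD and OSD.
CLASSES = ("non-speech", "single-speaker", "overlap")

def softmax(x):
    """Row-wise softmax."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class MultiExitNet:
    """Toy stack of dense blocks with one classifier head (exit) per block."""

    def __init__(self, feat_dim=40, hidden=32, n_blocks=3, seed=0):
        rng = np.random.default_rng(seed)
        dims = [feat_dim] + [hidden] * n_blocks
        # Randomly initialised weights; a real system would train these.
        self.blocks = [rng.normal(0.0, 0.1, (dims[i], dims[i + 1]))
                       for i in range(n_blocks)]
        self.exits = [rng.normal(0.0, 0.1, (hidden, len(CLASSES)))
                      for _ in range(n_blocks)]

    def forward(self, frames, last_exit=None):
        """Return one (frames, classes) posterior array per exit.

        Only the first `last_exit` blocks are computed; None means all blocks.
        """
        n = len(self.blocks) if last_exit is None else last_exit
        h, outputs = frames, []
        for w_block, w_exit in zip(self.blocks[:n], self.exits[:n]):
            h = np.tanh(h @ w_block)             # deeper block, higher-level features
            outputs.append(softmax(h @ w_exit))  # this exit's frame posteriors
        return outputs

net = MultiExitNet()
frames = np.random.default_rng(1).normal(size=(100, 40))  # 100 frames, 40-dim features
cheap = net.forward(frames, last_exit=1)  # stop at the first exit
full = net.forward(frames)                # compute every exit
labels = full[-1].argmax(axis=1)          # per-frame decision from the last exit
```

Choosing a smaller `last_exit` skips the later blocks entirely, which is one simple way to trade detection quality against computation.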

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems

Methods: Knowledge Distillation