GPU-accelerated Guided Source Separation for Meeting Transcription

Desh Raj; Daniel Povey; Sanjeev Khudanpur

arXiv:2212.05271 · eess.AS · August 15, 2023 · 5 cites

TL;DR

This paper presents a GPU-accelerated implementation of Guided Source Separation (GSS) that runs 300x faster than CPU-based inference, enabling detailed ablation studies and improved meeting transcription performance on standard benchmarks.

Contribution

The paper introduces a GPU-based GSS implementation that achieves a 300x speed-up over CPU-based inference, making extensive ablation studies and practical meeting transcription applications feasible.

Findings

01. 300x speed-up over CPU-based GSS

02. Enables detailed parameter ablation studies

03. Provides reproducible pipelines for meeting benchmarks

Abstract

Guided source separation (GSS) is a type of target-speaker extraction method that relies on pre-computed speaker activities and blind source separation to perform front-end enhancement of overlapped speech signals. It was first proposed during the CHiME-5 challenge and provided significant improvements over the delay-and-sum beamforming baseline. Despite its strengths, however, the method has seen limited adoption for meeting transcription benchmarks primarily due to its high computation time. In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference. The improved inference time allows us to perform detailed ablation studies over several parameters of the GSS algorithm -- such as context duration, number of channels, and…
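The abstract mentions batched processing of frequencies as one source of the speed-up. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes a mask-weighted spatial covariance estimate (a common step in mask-based beamforming) as the per-frequency workload, and it uses NumPy on the CPU as a stand-in for a GPU array library. The function names, array shapes, and random test data are all hypothetical.

```python
import numpy as np


def covariance_loop(stft, mask):
    """Per-frequency loop: one small matrix product per frequency bin.

    stft: complex array of shape (freq, channels, frames)
    mask: real array of shape (freq, frames), e.g. a target-speaker mask
    """
    num_freq, num_chan, _ = stft.shape
    out = np.zeros((num_freq, num_chan, num_chan), dtype=complex)
    for f in range(num_freq):
        x = stft[f]                    # (channels, frames)
        weighted = x * mask[f]         # mask broadcasts over channels
        out[f] = weighted @ x.conj().T / mask[f].sum()
    return out


def covariance_batched(stft, mask):
    """Same computation for every frequency bin in one batched call."""
    num = np.einsum("ft,fct,fdt->fcd", mask, stft, stft.conj())
    return num / mask.sum(axis=1)[:, None, None]


# Hypothetical sizes: 257 frequency bins, 4 channels, 50 frames.
rng = np.random.default_rng(0)
F, C, T = 257, 4, 50
stft = rng.standard_normal((F, C, T)) + 1j * rng.standard_normal((F, C, T))
mask = rng.uniform(0.1, 1.0, size=(F, T))

loop_result = covariance_loop(stft, mask)
batched_result = covariance_batched(stft, mask)
max_abs_diff = np.abs(loop_result - batched_result).max()
```

Both functions return one channels-by-channels matrix per frequency bin. The batched version replaces the Python-level loop with a single array operation, which is the kind of restructuring that lets an accelerator process all bins at once.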

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing