Deep Neural Network Based Low-latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair

Shanshan Wang; Gaurav Naithani; Archontis Politis; Tuomas Virtanen

arXiv:2106.11794·eess.AS·June 23, 2021·1 cites

Open Access

TL;DR

This paper introduces an asymmetric window pair for low-latency speech separation, improving performance while maintaining real-time processing capabilities for applications like hearing aids.

Contribution

It proposes using asymmetric analysis-synthesis windows in DNN-based speech separation to enhance frequency resolution without increasing latency.

Findings

01

Up to 1.5 dB SDR improvement achieved.

02

Maintains 8 ms algorithmic latency.

03

Effective across different model types and datasets.

Abstract

Time-frequency masking and spectrum prediction computed with short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose using an asymmetric analysis-synthesis window pair, which allows training with targets that have better frequency resolution while retaining the low latency during inference that real-time speech enhancement and assisted hearing applications require. To assess our approach across model types and datasets, we evaluate it with both a speaker-independent deep clustering (DC) model and a speaker-dependent mask inference (MI) model. We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.
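To make the idea in the abstract concrete, the following is a minimal sketch of an asymmetric analysis-synthesis window pair. It is not the window design used in the paper. It is one simple construction that satisfies the overlap-add reconstruction condition: a long analysis window is paired with a synthesis window that is nonzero only on the last few samples of the frame, so the output lag is set by the short synthesis window rather than by the long analysis frame. The function names `asymmetric_window_pair` and `stft_round_trip`, the 8 kHz sample rate, and the 32 ms analysis frame are assumptions for illustration; only the 8 ms latency figure comes from the abstract.

```python
import numpy as np


def asymmetric_window_pair(frame_len, synth_len):
    """Build an illustrative (analysis, synthesis) pair, both of length frame_len.

    The synthesis window is zero except on the last synth_len samples. On that
    tail, analysis * synthesis equals a periodic Hann window, which sums to one
    under overlap-add with a hop of synth_len // 2.
    """
    if synth_len % 2 or not 0 < synth_len <= frame_len:
        raise ValueError("synth_len must be even and no longer than frame_len")
    half = synth_len // 2
    n_rise = frame_len - half
    # Long rising flank, short falling flank. The half-sample offset keeps
    # every coefficient strictly positive, so the division below is safe.
    rise = np.sin(0.5 * np.pi * (np.arange(n_rise) + 0.5) / n_rise)
    fall = np.cos(0.5 * np.pi * (np.arange(half) + 0.5) / half)
    analysis = np.concatenate([rise, fall])
    # Target product on the tail: periodic Hann of length synth_len.
    m = np.arange(synth_len)
    target = 0.5 - 0.5 * np.cos(2.0 * np.pi * m / synth_len)
    synthesis = np.zeros(frame_len)
    synthesis[-synth_len:] = target / analysis[-synth_len:]
    return analysis, synthesis


def stft_round_trip(x, analysis, synthesis, hop):
    """Analyse and resynthesise x frame by frame without modifying the spectrum."""
    n = len(analysis)
    y = np.zeros(len(x))
    for start in range(0, len(x) - n + 1, hop):
        spec = np.fft.rfft(x[start:start + n] * analysis)
        # A separation model would apply its mask to `spec` at this point.
        frame = np.fft.irfft(spec, n)
        y[start:start + n] += frame * synthesis
    return y


SAMPLE_RATE = 8000      # Hz, assumed for illustration
FRAME_LEN = 256         # 32 ms analysis frame, assumed for illustration
SYNTH_LEN = 64          # 8 ms synthesis window, matching the stated latency
HOP = SYNTH_LEN // 2

analysis, synthesis = asymmetric_window_pair(FRAME_LEN, SYNTH_LEN)
x = np.random.default_rng(0).standard_normal(2048)
y = stft_round_trip(x, analysis, synthesis, HOP)

# Samples covered by two overlapping synthesis tails are reconstructed exactly.
# The end index assumes len(x) - FRAME_LEN is a multiple of HOP, as it is here.
valid = slice(FRAME_LEN - HOP, len(x) - HOP)
max_error = np.max(np.abs(y[valid] - x[valid]))
latency_ms = 1000.0 * SYNTH_LEN / SAMPLE_RATE
```

With these assumed settings each frame spans 256 samples, so the spectrum has 129 frequency bins instead of the 33 bins a symmetric 64-sample window would give, while each output sample is complete at most 64 samples after it arrives. Dividing by the analysis window can amplify errors where that window is small, which is one reason a practical design may prefer a more carefully shaped pair.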

Taxonomy

Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Speech Recognition and Synthesis