Trainable Adaptive Window Switching for Speech Enhancement
Yuma Koizumi, Noboru Harada, Yoichi Haneda

TL;DR
This paper introduces a trainable adaptive window switching method integrated with DNNs for speech enhancement, optimizing time-frequency resolution dynamically to improve signal quality over fixed methods.
Contribution
It presents a novel trainable AWS technique in which a DNN selects the windowing function for each time frame based on the input signal, improving speech signal recovery.
Findings
Achieved higher signal-to-distortion ratio than fixed-resolution methods.
Demonstrated improved speech quality in DNN-based enhancement.
Validated effectiveness in real-world noisy environments.
Abstract
This study proposes a trainable adaptive window switching (AWS) method and applies it to a deep neural network (DNN) for speech enhancement in the modified discrete cosine transform (MDCT) domain. Time-frequency (T-F) mask processing in the short-time Fourier transform (STFT) domain is a typical speech enhancement method. To recover the target signal precisely, DNN-based short-time frequency transforms have recently been investigated and used instead of the STFT. However, because such a fixed-resolution short-time frequency transform has a T-F resolution problem rooted in the uncertainty principle, not only the short-time frequency transform but also the length of the windowing function should be optimized. To overcome this problem, we incorporate AWS into the speech enhancement procedure, and the windowing function of each time frame is manipulated using a DNN depending on the input signal.…
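To make the window-length trade-off in the abstract concrete, the following is a minimal NumPy sketch of MDCT analysis and synthesis at a single fixed window length. It is an illustration under stated assumptions, not the authors' method: the sine window, the zero-padding scheme, and the function names (sine_window, mdct_basis, mdct, imdct) are all chosen here for illustration, and neither the DNN nor the per-frame window switching from the paper is implemented.

```python
import numpy as np

def sine_window(N):
    # Sine window of length 2N (assumed here); it satisfies the
    # Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1.
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))

def mdct_basis(N):
    # Cosine basis of shape (N, 2N): N coefficients per 2N-sample frame.
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))

def mdct(x, N):
    # Fixed-resolution MDCT with hop size N and frame length 2N.
    w, C = sine_window(N), mdct_basis(N)
    pad = (-len(x)) % N
    # Pad N zeros on each side so every input sample is covered by two frames.
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad + N)])
    n_frames = len(xp) // N - 1
    frames = np.stack([xp[t * N:t * N + 2 * N] for t in range(n_frames)])
    return (frames * w) @ C.T  # shape (n_frames, N)

def imdct(X, N, length):
    # Inverse MDCT with windowed overlap-add; the 2/N scaling matches
    # the unnormalized forward transform above.
    w, C = sine_window(N), mdct_basis(N)
    frames = (2.0 / N) * (X @ C) * w
    out = np.zeros((X.shape[0] + 1) * N)
    for t, f in enumerate(frames):
        out[t * N:t * N + 2 * N] += f  # time-domain aliasing cancels here
    return out[N:N + length]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    for N in (64, 512):  # short window vs. long window
        X = mdct(x, N)
        err = np.max(np.abs(imdct(X, N, len(x)) - x))
        print(f"N={N}: {X.shape[0]} frames x {X.shape[1]} bins, max error {err:.2e}")
```

A short window (small N) yields many frames with few frequency bins each, while a long window yields few frames with many bins, which is the fixed-resolution trade-off the abstract describes. In a masking-based enhancer, a real-valued mask would multiply X before imdct is called. Switching N from frame to frame would additionally require transition windows that preserve aliasing cancellation, which this sketch does not attempt.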
Taxonomy
Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Hearing Loss and Rehabilitation
