A Survey of Deep Learning for Complex Speech Spectrograms
Yuying Xie, Zheng-Hua Tan

TL;DR
This survey reviews recent deep learning techniques for processing complex speech spectrograms, highlighting neural network architectures, training methods, and applications like speech enhancement and phase retrieval.
Contribution
It provides a comprehensive overview of complex-valued neural networks and their applications in speech processing, emphasizing recent advancements and architectural designs.
Findings
Deep learning has significantly advanced the analysis and manipulation of complex speech spectrograms, which carry both magnitude and phase information.
Complex-valued neural networks are effective for phase-sensitive tasks.
Applications include speech enhancement and speaker separation.
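To make "complex-valued" concrete, the following is a minimal sketch of a single complex-valued linear layer written with real-valued operations only. It is an illustration rather than a method taken from the survey: the function name `complex_linear`, the shapes, and the random inputs are all assumptions chosen for the example.

```python
import numpy as np

def complex_linear(x_re, x_im, w_re, w_im):
    """Hypothetical complex linear layer built from real matrix products.

    Implements (x_re + i*x_im) @ (w_re + i*w_im) by expanding the product
    into its real and imaginary parts.
    """
    y_re = x_re @ w_re - x_im @ w_im  # real part of the complex product
    y_im = x_re @ w_im + x_im @ w_re  # imaginary part of the complex product
    return y_re, y_im

# Illustrative random data: 4 frames with 8 complex features, mapped to 3 outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
w = rng.standard_normal((8, 3)) + 1j * rng.standard_normal((8, 3))

y_re, y_im = complex_linear(x.real, x.imag, w.real, w.imag)

# Reference result using NumPy's native complex arithmetic.
reference = x @ w
```

The two-channel form shows how a network with real-valued weights can still carry out complex multiplication when the real and imaginary parts are kept as separate tensors.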
Abstract
Recent advancements in deep learning have significantly impacted the field of speech signal processing, particularly in the analysis and manipulation of complex spectrograms. This survey provides a comprehensive overview of the state-of-the-art techniques leveraging deep neural networks for processing complex spectrograms, which encapsulate both magnitude and phase information. We begin by introducing complex spectrograms and their associated features for various speech processing tasks. Next, we examine the key components and architectures of complex-valued neural networks, which are specifically designed to handle complex-valued data and have been applied to complex spectrogram processing. As recent studies have primarily focused on applying real-valued neural networks to complex spectrograms, we revisit these approaches and their architectural designs. We then discuss various…
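As an illustration of what a complex spectrogram contains, the sketch below computes a short-time Fourier transform of a synthetic tone and splits it into the two representations the abstract refers to: magnitude and phase, or equivalently real and imaginary parts. The helper `stft`, the frame length, the hop size, and the 440 Hz test signal are assumptions made for this example, not settings from the survey.

```python
import numpy as np

def stft(signal, frame_len=512, hop=128):
    """Hypothetical minimal STFT: returns a complex array (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames and apply the analysis window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame gives the complex spectrogram.
    return np.fft.rfft(frames, axis=1)

sr = 16000                          # assumed sampling rate in Hz
t = np.arange(sr) / sr              # one second of samples
x = np.sin(2 * np.pi * 440.0 * t)   # synthetic 440 Hz tone standing in for speech

spec = stft(x)

# Polar view: magnitude and phase.
magnitude = np.abs(spec)
phase = np.angle(spec)

# Rectangular view: real and imaginary parts.
real, imag = spec.real, spec.imag

# Both views describe the same complex values.
recon = magnitude * np.exp(1j * phase)
```

Models that operate only on `magnitude` discard `phase`, whereas models that take `real` and `imag` (or the complex array itself) as input keep both.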
