Complexity Scaling for Speech Denoising
Hangting Chen, Jianwei Yu, Chao Weng

TL;DR
This paper introduces a unified Multi-Path Transform (MPT) architecture for speech denoising that scales across a wide range of computational budgets and shows a predictable empirical relationship between computational cost (MACs) and denoising performance.
Contribution
The study proposes a novel scalable architecture for speech denoising and explores the empirical relationship between model complexity and performance, unifying models across different complexity levels.
Findings
High-performance models across a wide complexity range
Linear increase in PESQ-WB and SI-SNR with log of MACs
Unified architecture simplifies deployment for diverse devices
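The log-linear trend in the findings above can be illustrated with a least-squares fit of a quality score against log10(MACs). This is a minimal sketch, not the authors' procedure: the function names `fit_log_linear` and `predict_score` are hypothetical, and the data points are synthetic placeholders generated from an exact line, not measurements from the paper.

```python
import numpy as np


def fit_log_linear(macs, scores):
    """Least-squares fit of score ~ intercept + slope * log10(MACs)."""
    log_macs = np.log10(np.asarray(macs, dtype=float))
    slope, intercept = np.polyfit(log_macs, np.asarray(scores, dtype=float), deg=1)
    return float(slope), float(intercept)


def predict_score(macs, slope, intercept):
    """Evaluate the fitted line at a given compute budget (in MACs)."""
    return intercept + slope * np.log10(macs)


# Synthetic placeholder data (NOT results from the paper): four compute
# budgets and scores generated from an exact line with made-up coefficients.
macs = np.array([1e7, 1e8, 1e9, 1e10])
scores = -1.0 + 0.4 * np.log10(macs)

slope, intercept = fit_log_linear(macs, scores)
predicted = predict_score(1e9, slope, intercept)
```

With real measurements, the slope estimates how much the metric (for example PESQ-WB or SI-SNR) improves per tenfold increase in MACs, which is what makes performance at an untested budget predictable under the linear assumption.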
Abstract
Computational complexity is critical when deploying deep learning-based speech denoising models for on-device applications. Most prior research has focused on optimizing model architectures to meet specific computational cost constraints, often creating distinct neural network architectures for different complexity limits. This study conducts complexity scaling for speech denoising tasks, aiming to consolidate models of various complexities into a unified architecture. We present a Multi-Path Transform-based (MPT) architecture to handle both low- and high-complexity scenarios. A series of MPT networks achieves high performance across a wide range of computational complexities on the DNS challenge dataset. Moreover, inspired by the scaling experiments in natural language processing, we explore the empirical relationship between model performance and computational cost on the…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
