DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN
Daniel Rika, Nino Sapir, Ido Gus

TL;DR
DPDFNet is a causal speech enhancement model that uses dual-path RNNs to improve long-range temporal and cross-band modeling. With an added loss component and a fine-tuning phase, it outperforms larger models on real-world, low-SNR, multilingual noise scenarios and is feasible for edge deployment.
Contribution
We introduce DPDFNet, a causal speech enhancement architecture built on dual-path blocks, together with an added loss component and a new evaluation set, and show that it runs in real time on edge devices.
Findings
DPDFNet outperforms larger causal models on a new real-world evaluation set.
The PRISM metric correlates with model scalability and performance.
DPDFNet runs in real-time on edge NPUs, maintaining high quality.
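As a rough illustration of what "real-time" means for a frame-based causal model, the sketch below computes the per-frame time budget and the real-time factor (processing time divided by audio duration). The hop size, sample rate, and timing values are illustrative assumptions and are not taken from the paper; the function names are hypothetical.

```python
def frame_budget_ms(hop_samples, sample_rate_hz):
    """Time available to process one hop of audio, in milliseconds."""
    return 1000.0 * hop_samples / sample_rate_hz


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the model keeps up with incoming audio."""
    return processing_seconds / audio_seconds


# Assumed example values (not from the paper): 480-sample hop at 48 kHz,
# and a measured 4 ms of compute per hop on the target device.
budget_ms = frame_budget_ms(hop_samples=480, sample_rate_hz=48_000)
rtf = real_time_factor(processing_seconds=0.004,
                       audio_seconds=budget_ms / 1000.0)
print(f"budget per frame: {budget_ms} ms, RTF: {rtf:.2f}")
```

A model is usually considered real-time capable on a device when its real-time factor stays below 1.0 with some headroom for the rest of the audio pipeline.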
Abstract
We present DPDFNet, a causal single-channel speech enhancement model that extends the DeepFilterNet2 architecture with dual-path blocks in the encoder, strengthening long-range temporal and cross-band modeling while preserving the original enhancement framework. In addition, we demonstrate that adding a loss component to mitigate over-attenuation in the enhanced speech, combined with a fine-tuning phase tailored for "always-on" applications, leads to substantial improvements in overall model performance. To compare our proposed architecture with a variety of causal open-source models, we created a new evaluation set comprising long, low-SNR recordings in 12 languages across everyday noise scenarios, better reflecting real-world conditions than commonly used benchmarks. On this evaluation set, DPDFNet delivers superior performance to other causal open-source models, including some that are…
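The dual-path idea mentioned in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it uses a plain tanh RNN as a stand-in for whatever recurrent cell the paper uses, and the layer sizes, residual connections, path ordering, and all function and parameter names are hypothetical. One path scans across frequency bands within each frame (cross-band modeling), and the other scans forward in time for each band (temporal modeling), which keeps the block causal.

```python
import numpy as np


def rnn_scan(x, w_in, w_rec):
    """Plain tanh RNN run along axis 0 of x with shape (steps, batch, channels)."""
    h = np.zeros((x.shape[1], w_rec.shape[0]))
    out = np.empty((x.shape[0], x.shape[1], w_rec.shape[0]))
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ w_in + h @ w_rec)
        out[t] = h
    return out


def dual_path_block(x, p):
    """x has shape (frames, bands, channels); the output has the same shape."""
    # Intra-frame path: scan over bands inside each frame. Scanning in both
    # frequency directions is fine because it never looks ahead in time.
    xb = x.transpose(1, 0, 2)  # (bands, frames, channels)
    fwd = rnn_scan(xb, p["f_in"], p["f_rec"])
    bwd = rnn_scan(xb[::-1], p["b_in"], p["b_rec"])[::-1]
    intra = np.concatenate([fwd, bwd], axis=-1) @ p["f_proj"]
    x = x + intra.transpose(1, 0, 2)  # residual connection (assumed)

    # Inter-frame path: forward-only scan over time for each band (causal).
    inter = rnn_scan(x, p["t_in"], p["t_rec"]) @ p["t_proj"]
    return x + inter


def init_params(channels, hidden, seed=0):
    """Random weights for the sketch; sizes are arbitrary."""
    rng = np.random.default_rng(seed)

    def w(rows, cols):
        return rng.normal(0.0, 0.3, size=(rows, cols))

    return {
        "f_in": w(channels, hidden), "f_rec": w(hidden, hidden),
        "b_in": w(channels, hidden), "b_rec": w(hidden, hidden),
        "f_proj": w(2 * hidden, channels),
        "t_in": w(channels, hidden), "t_rec": w(hidden, hidden),
        "t_proj": w(hidden, channels),
    }


# Usage: 10 frames, 6 bands, 4 channels of random features.
params = init_params(channels=4, hidden=8)
features = np.random.default_rng(1).normal(size=(10, 6, 4))
print(dual_path_block(features, params).shape)
```

Because the temporal path only runs forward and the band path stays inside a single frame, changing future frames in this sketch leaves earlier output frames unchanged, which is the property a causal, streaming model needs.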
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
