Align-ULCNet: Towards Low-Complexity and Robust Acoustic Echo and Noise Reduction

Shrishti Saha Shetu; Naveen Kumar Desiraju; Wolfgang Mack; Emanuël A. P. Habets

arXiv:2410.13620 · eess.AS · August 5, 2025

TL;DR

This paper introduces Align-ULCNet, a low-complexity acoustic echo and noise reduction (AENR) model that extends ULCNet with time alignment and parallel encoder blocks for the model inputs, plus a channel-wise sampling-based feature reorientation method. It achieves better echo reduction, and noise reduction comparable to existing state-of-the-art methods.

Contribution

The paper proposes a hybrid approach that enhances the ULCNet model with input time alignment, parallel encoder blocks, and channel-wise sampling-based feature reorientation, targeting robust performance at low computational and memory cost.

Findings

01 Better echo reduction, with noise reduction comparable to existing SOTA methods.

02 Low computational and memory requirements are maintained.

03 Robust performance across many challenging scenarios.

Abstract

The successful deployment of deep learning-based acoustic echo and noise reduction (AENR) methods in consumer devices has spurred interest in developing low-complexity solutions, while emphasizing the need for robust performance in real-life applications. In this work, we propose a hybrid approach to enhance the state-of-the-art (SOTA) ULCNet model by integrating time alignment and parallel encoder blocks for the model inputs, resulting in better echo reduction and comparable noise reduction performance to existing SOTA methods. We also propose a channel-wise sampling-based feature reorientation method, ensuring robust performance across many challenging scenarios, while maintaining overall low computational and memory requirements.
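The abstract does not say how the time alignment is computed, so the following is only a minimal sketch of the general idea, assuming a classical cross-correlation delay estimate rather than the authors' method. The function names `estimate_delay` and `align_reference`, the synthetic signals, and the `max_delay` search range are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_delay(mic: np.ndarray, ref: np.ndarray, max_delay: int) -> int:
    """Return the lag in samples (0..max_delay) at which `ref` best matches `mic`."""
    n = min(len(mic), len(ref))
    mic, ref = mic[:n], ref[:n]
    best_lag, best_score = 0, -np.inf
    for lag in range(max_delay + 1):
        # The echo is modelled as a delayed copy of ref, so compare
        # mic shifted back by `lag` samples against the start of ref.
        score = float(np.dot(mic[lag:], ref[: n - lag]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_reference(ref: np.ndarray, delay: int) -> np.ndarray:
    """Delay `ref` by `delay` samples (zero-padded at the start), keeping its length."""
    if delay == 0:
        return ref.copy()
    return np.concatenate([np.zeros(delay, dtype=ref.dtype), ref[:-delay]])


# Synthetic example (assumed data): the microphone picks up a scaled,
# delayed copy of the far-end reference plus a small amount of noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
true_delay = 37
mic = 0.6 * align_reference(ref, true_delay) + 0.05 * rng.standard_normal(4000)

delay = estimate_delay(mic, ref, max_delay=200)
aligned_ref = align_reference(ref, delay)
```

Feeding an echo reduction model a reference that is already roughly aligned with the microphone signal means the model does not have to cover the full range of possible echo delays itself, which is one plausible reason alignment helps a low-complexity network. A real system would likely estimate the delay per frame and in a more robust way than this brute-force search.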

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis