IMSE: Efficient U-Net-based Speech Enhancement using Inception Depthwise Convolution and Amplitude-Aware Linear Attention

Xinxin Tang; Bin Qin; Yufang Li

arXiv:2511.14515 · cs.SD · December 2, 2025

TL;DR

IMSE is a lightweight speech enhancement model that replaces the complex modules of MUSE with Amplitude-Aware Linear Attention and Inception Depthwise Convolution, reducing the parameter count by 16.8% while reaching a competitive PESQ score of 3.373.

Contribution

The paper proposes IMSE, a lightweight speech enhancement network that uses Amplitude-Aware Linear Attention (MALA) and Inception Depthwise Convolution to improve efficiency over the MUSE baseline.

Findings

01

Parameter reduction by 16.8% compared to MUSE

02

Achieves competitive PESQ score of 3.373

03

Sets a new benchmark for the size-performance trade-off
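
To put the first finding in absolute terms, the short calculation below applies the 16.8% reduction to the 0.51M parameter count that the abstract quotes for MUSE. This is only a rough estimate: 0.51M is a rounded figure, the function name `reduced_params` is a hypothetical helper, and the result is derived here rather than reported on this page.

```python
def reduced_params(baseline: float, reduction_pct: float) -> float:
    """Return the parameter count left after a percentage reduction."""
    return baseline * (1.0 - reduction_pct / 100.0)


# MUSE baseline as quoted in the abstract (rounded to two significant digits).
muse_params = 0.51e6

# Estimated IMSE size implied by a 16.8% reduction; an approximation only.
imse_estimate = reduced_params(muse_params, 16.8)

print(f"Estimated IMSE parameters: {imse_estimate / 1e6:.3f}M")
```

Because the baseline is rounded, the estimate is only reliable to about two significant digits.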

Abstract

Achieving a balance between lightweight design and high performance remains a significant challenge for speech enhancement (SE) tasks on resource-constrained devices. Existing state-of-the-art methods, such as MUSE, have established a strong baseline with only 0.51M parameters by introducing a Multi-path Enhanced Taylor (MET) transformer and Deformable Embedding (DE). However, an in-depth analysis reveals that MUSE still suffers from efficiency bottlenecks: the MET module relies on a complex "approximate-compensate" mechanism to mitigate the limitations of Taylor-expansion-based attention, while the offset calculation for deformable embedding introduces additional computational burden. This paper proposes IMSE, a systematically optimized and ultra-lightweight network. We introduce two core innovations: 1) Replacing the MET module with Amplitude-Aware Linear Attention (MALA). MALA…
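
For readers unfamiliar with linear attention, the sketch below shows the generic kernelized form that this family of methods builds on. It is not the paper's MALA module and not MUSE's Taylor-expansion-based attention. The `elu(x) + 1` feature map, the function names, and the tensor sizes are all assumptions chosen for illustration. The point is that reordering the matrix products avoids building the sequence-length-squared score matrix while giving the same result as the quadratic computation.

```python
import numpy as np


def feature_map(x: np.ndarray) -> np.ndarray:
    # elu(x) + 1, a common positive feature map (an assumption, not from the paper).
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Kernelized attention in O(n * d^2): aggregate keys and values first."""
    q_f, k_f = feature_map(q), feature_map(k)
    kv = k_f.T @ v            # (d, d_v) summary of all key/value pairs
    z = k_f.sum(axis=0)       # (d,) normalizer accumulated over keys
    return (q_f @ kv) / (q_f @ z)[:, None]


def quadratic_reference(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Same attention computed the O(n^2 * d) way, via an explicit score matrix."""
    q_f, k_f = feature_map(q), feature_map(k)
    scores = q_f @ k_f.T      # (n, n) pairwise similarities, all positive
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n, d = 16, 8                  # hypothetical sequence length and head dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_linear = linear_attention(q, k, v)
out_quadratic = quadratic_reference(q, k, v)
print(np.allclose(out_linear, out_quadratic))
```

The two functions agree because matrix multiplication is associative. The linear variant only changes the order of operations, which is what makes it attractive on resource-constrained devices.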

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques