UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation
Kashyap Patel, Anton Kovalyov, Issa Panahi

TL;DR
UX-Net is a real-time, efficient time-domain audio separation network inspired by human auditory processing, which outperforms state-of-the-art models in speech separation tasks with fewer resources.
Contribution
The paper introduces UX-Net, a novel filter-and-process-based U-Net variant that improves efficiency and performance in real-time audio separation.
Findings
Outperforms Conv-TasNet by 0.85 dB SI-SNR on WSJ0-2mix
Uses only 16% of the model parameters of state-of-the-art models
Achieves 58% fewer computations with low latency
Abstract
This study presents UX-Net, a time-domain audio separation network (TasNet) based on a modified U-Net architecture. The proposed UX-Net works in real-time and handles either single or multi-microphone input. Inspired by the filter-and-process-based human auditory behavior, the proposed system introduces novel mixer and separation modules, which result in cost and memory efficient modeling of speech sources. The mixer module combines encoded input in a latent feature space and outputs a desired number of output streams. Then, in the separation module, a modified U-Net (UX) block is applied. The UX block first filters the encoded input at various resolutions followed by aggregating the filtered information and applying recurrent processing to estimate masks of separated sources. The letter 'X' in UX-Net is a name placeholder for the type of recurrent layer employed in the UX block.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods*Communicated@Fast*How Do I Communicate to Expedia? · Max Pooling · Concatenated Skip Connection · Convolution · U-Net
