FasTUSS: Faster Task-Aware Unified Source Separation

Francesco Paissan; Gordon Wichern; Yoshiki Masuyama; Ryo Aihara; François G. Germain; Kohei Saijo; Jonathan Le Roux

arXiv:2507.11435 · cs.SD · July 16, 2025

TL;DR

FasTUSS is a family of faster variants of the TUSS model for audio source separation. The variants cut the number of operations by 73% to 81% at a cost of 0.4 dB to 1.2 dB in separation performance, and the paper also explores prompt conditioning for a causal TUSS model.

Contribution

This paper presents FasTUSS, a set of efficient TUSS variants that substantially reduce the operation count with only a small performance loss, and investigates prompt conditioning in a causal model to improve flexibility.

Findings

01. FasTUSS-8.3G reduces operations by 81% with a 1.2 dB performance drop.

02. FasTUSS-11.7G reduces operations by 73% with a 0.4 dB performance drop.

03. A causal TUSS model with prompt conditioning is feasible.
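
The two reduction figures can be cross-checked with a little arithmetic. The sketch below assumes that the suffixes "8.3G" and "11.7G" in the variant names denote each variant's operation count in giga-operations; the listing does not state this, nor does it give the unit or the baseline count. Under that assumption, dividing each count by the fraction of operations that remains gives the baseline TUSS cost implied by each finding. The function name `implied_baseline` is hypothetical and used only for illustration.

```python
def implied_baseline(reduced_ops_g: float, reduction: float) -> float:
    """Baseline operation count implied by a reduced count and a fractional reduction.

    If a variant keeps (1 - reduction) of the baseline operations, then
    baseline = reduced / (1 - reduction).
    """
    return reduced_ops_g / (1.0 - reduction)


# ASSUMPTION: the name suffix is the operation count in giga-operations.
# The reduction percentages are the ones stated in the findings.
variants = {
    "FasTUSS-8.3G": (8.3, 0.81),
    "FasTUSS-11.7G": (11.7, 0.73),
}

baselines = {
    name: implied_baseline(ops, reduction)
    for name, (ops, reduction) in variants.items()
}

for name, baseline in baselines.items():
    print(f"{name}: implied baseline of about {baseline:.1f} G operations")
```

If the assumption holds, both findings point to roughly the same baseline cost, which suggests the two percentages are measured against the same original TUSS model. The small gap between the two estimates is consistent with rounding in the reported percentages.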

Abstract

Time-Frequency (TF) dual-path models are currently among the best performing audio source separation network architectures, achieving state-of-the-art performance in speech enhancement, music source separation, and cinematic audio source separation. While they are characterized by a relatively low parameter count, they still require a considerable number of operations, implying a higher execution time. This problem is exacerbated by the trend towards bigger models trained on large amounts of data to solve more general tasks, such as the recently introduced task-aware unified source separation (TUSS) model. TUSS, which aims to solve audio source separation tasks using a single, conditional model, is built upon TF-Locoformer, a TF dual-path model combining convolution and attention layers. The task definition comes in the form of a sequence of prompts that specify the number and type of…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Blind Source Separation Techniques