VorTEX: Various overlap ratio for Target speech EXtraction

Ro-hoon Oh; Jihwan Seol; Bugeun Kim

arXiv:2603.14803 · cs.SD · March 24, 2026

TL;DR

VorTEX is a text-prompted target speech extraction model with a decoupled fusion architecture. It is introduced alongside a new dataset that enables controlled analysis of overlap ratios, and it is reported to remain robust across the overlap conditions tested.

Contribution

The paper presents three contributions: VorTEX, a text-prompted TSE architecture built around a Decoupled Adaptive Multi-branch (DAM) Fusion block; PORTE, a two-speaker dataset for controlled overlap analysis; and SuRE, a diagnostic metric for suppression behavior.

Findings

01 VorTEX achieves high separation fidelity across 20-100% overlap.

02 Existing models show suppression or residual interference under overlap.

03 VorTEX maintains zero SuRE, indicating robust extraction without artifacts.

Abstract

Target speech extraction (TSE) aims to recover a target speaker's voice from a mixture. While recent text-prompted approaches have shown promise, most assume fully overlapped mixtures, limiting insight into behavior across realistic overlap ratios. We introduce VorTEX (Various overlap ratio for Target speech EXtraction), a text-prompted TSE architecture with a Decoupled Adaptive Multi-branch (DAM) Fusion block that separates primary extraction from auxiliary regularization pathways. To enable controlled analysis, we construct PORTE, a two-speaker dataset spanning overlap ratios from 0% to 100%. We further propose Suppression Ratio on Energy (SuRE), a diagnostic metric that detects suppression behavior not captured by conventional measures. Experiments show that existing models exhibit suppression or residual interference under overlap, whereas VorTEX achieves the highest…
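To make the idea of a controlled overlap ratio concrete, the following is a minimal sketch of how a two-speaker mixture with a chosen overlap might be assembled. It is not PORTE's construction procedure, which this summary does not describe. The function name `mix_with_overlap` is hypothetical, and the definition of overlap ratio used here (overlapped duration divided by the duration of the shorter utterance) is an assumption made for illustration.

```python
def mix_with_overlap(target, interferer, overlap_ratio):
    """Place `interferer` after `target` so the two overlap by a chosen ratio.

    Assumption (not from the paper): the overlap ratio is the overlapped
    duration divided by the duration of the shorter utterance.
    Signals are plain lists of float samples at the same sample rate.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be in [0, 1]")

    # Number of samples shared by both speakers.
    overlap = round(overlap_ratio * min(len(target), len(interferer)))

    # Sample index at which the interfering speaker begins.
    start = len(target) - overlap

    # The mixture must be long enough to hold both utterances.
    length = max(len(target), start + len(interferer))
    mixture = [0.0] * length

    for i, sample in enumerate(target):
        mixture[i] += sample
    for j, sample in enumerate(interferer):
        mixture[start + j] += sample

    return mixture, start


if __name__ == "__main__":
    target = [1.0] * 10      # stand-in for the target speaker's samples
    interferer = [0.5] * 10  # stand-in for the interfering speaker's samples
    for ratio in (0.0, 0.5, 1.0):
        mixture, start = mix_with_overlap(target, interferer, ratio)
        print(f"ratio={ratio}: mixture length={len(mixture)}, interferer starts at {start}")
```

Under this assumed definition, a ratio of 0 concatenates the two utterances and a ratio of 1 places the shorter utterance entirely inside the longer one. Other definitions, such as overlap relative to total mixture duration, would give different offsets.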

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders