Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation
Kohei Saijo, Tetsuji Ogawa

TL;DR
This paper introduces a remix-cycle-consistency loss for unsupervised speech separation. Fine-tuning adversarially learned separation models with this loss reduces residual noise and artifacts in the separated signals and makes separation more accurate and stable.
Contribution
It proposes a novel remix-cycle-consistency loss function to enhance adversarial speech separation models in an unsupervised setting.
Findings
Achieved high separation accuracy comparable to supervised methods.
Demonstrated improved learning stability in adversarial training.
Reduced residual noise and artifacts in separated speech signals.
Abstract
A new learning algorithm for speech separation networks is designed to explicitly reduce residual noise and artifacts in the separated signal in an unsupervised manner. Generative adversarial networks are known to be effective in constructing separation networks when the ground truth for the observed signal is inaccessible. Still, weak objectives aimed at distribution-to-distribution mapping make the learning unstable and limit their performance. This study introduces the remix-cycle-consistency loss as a more appropriate objective function and uses it to fine-tune adversarially learned source separation models. The remix-cycle-consistency loss is defined as the difference between the mixed speech observed at microphones and the pseudo-mixed speech obtained by alternating the process of separating the mixed sound and remixing its outputs with another combination. The minimization of…
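The separate-remix-separate-remix cycle described in the abstract can be sketched as below. This is only an illustration under several assumptions that do not come from the paper: two mixtures of two sources each, mean squared error as the "difference", a fixed output order for the separator (no permutation handling), and single-channel toy signals. The names `remix_cycle_loss`, `parity_separator`, and `halving_separator` are hypothetical.

```python
import numpy as np


def remix_cycle_loss(x1, x2, separate):
    """Toy remix-cycle-consistency loss for two mixtures of two sources.

    `separate` maps a mixture of shape (T,) to estimates of shape (2, T).
    Assumptions (not from the paper): MSE distance, fixed output order.
    """
    # First pass: separate each observed mixture.
    s1 = separate(x1)
    s2 = separate(x2)

    # Remix the outputs with another combination to form pseudo-mixtures.
    m1 = s1[0] + s2[1]
    m2 = s2[0] + s1[1]

    # Second pass: separate the pseudo-mixtures.
    u = separate(m1)  # u[0] should match s1[0], u[1] should match s2[1]
    v = separate(m2)  # v[0] should match s2[0], v[1] should match s1[1]

    # Remix back into the original combinations.
    x1_hat = u[0] + v[1]
    x2_hat = v[0] + u[1]

    # Difference between the observed and the reconstructed mixtures.
    return float(np.mean((x1 - x1_hat) ** 2) + np.mean((x2 - x2_hat) ** 2))


def parity_separator(x):
    """Hypothetical separator: even-indexed samples vs. odd-indexed samples."""
    mask = np.arange(x.shape[-1]) % 2 == 0
    return np.stack([x * mask, x * ~mask])


def halving_separator(x):
    """Hypothetical poor separator: both outputs are half the mixture."""
    return np.stack([0.5 * x, 0.5 * x])


rng = np.random.default_rng(0)
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(16)

loss_consistent = remix_cycle_loss(x1, x2, parity_separator)
loss_leaky = remix_cycle_loss(x1, x2, halving_separator)
print(loss_consistent, loss_leaky)
```

In this toy setup, the parity separator survives the cycle unchanged, so its loss is zero, while the halving separator leaks each mixture into the other and gets a positive loss. A real system would replace the toy callables with a trainable network and would need to handle the ordering of separated outputs.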
Taxonomy
Topics: Speech and Audio Processing · Ultrasonics and Acoustic Wave Propagation · Music and Audio Processing
