CycleFlow: Purify Information Factors by Cycle Loss
Haoran Sun, Chen Chen, Lantian Li, Dong Wang

TL;DR
CycleFlow improves the disentanglement of speech factors by combining a cycle loss with random factor substitution, yielding better voice conversion than the information bottleneck (IB)-based SpeechFlow and enabling speech editing.
Contribution
The paper introduces CycleFlow, a model that reduces mutual information among speech factors and improves on SpeechFlow for both disentanglement and speech editing.
Findings
Better voice conversion performance than SpeechFlow
Effective reduction of mutual information among factors
Demonstrated utility in speech editing and emotion perception
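To make the "mutual information among factors" finding concrete, here is a minimal sketch of one way mutual information between two scalar quantities could be estimated. The summary does not say which estimator the authors used; the histogram plug-in estimator below, the function name `histogram_mi`, and the bin count are all assumptions chosen for illustration.

```python
import numpy as np


def histogram_mi(x, y, bins=16):
    """Plug-in estimate of mutual information (in nats) between two 1-D samples.

    Illustrative only: a hypothetical helper, not the estimator used in the paper.
    """
    # Joint histogram of the two variables, normalised to a probability table.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()

    # Marginals, kept 2-D so that their product has the shape of p_xy.
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    p_indep = p_x @ p_y

    # Sum only over occupied cells to avoid log(0).
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / p_indep[mask])))


rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = rng.normal(size=5000)             # independent of a
c = a + 0.1 * rng.normal(size=5000)   # strongly dependent on a

mi_independent = histogram_mi(a, b)
mi_dependent = histogram_mi(a, c)
print(mi_independent, mi_dependent)
```

In this toy setting, a well-disentangled pair of factors would behave like `a` and `b` (estimate close to zero), while an entangled pair would behave like `a` and `c`. Real factor codes are high-dimensional, so a practical evaluation would need an estimator suited to vectors.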
Abstract
SpeechFlow is a powerful factorization model based on information bottleneck (IB), and its effectiveness has been reported by several studies. A potential problem of SpeechFlow, however, is that if the IB channels are not well designed, the resultant factors cannot be well disentangled. In this study, we propose a CycleFlow model that combines random factor substitution and cycle loss to solve this problem. Experiments on voice conversion tasks demonstrate that this simple technique can effectively reduce mutual information among individual factors, and produce clearly better conversion than the IB-based SpeechFlow. CycleFlow can also be used as a powerful tool for speech editing. We demonstrate this usage by an emotion perception experiment.
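The abstract names two ingredients, random factor substitution and a cycle loss, without giving their exact form. The sketch below shows one plausible reading under stated assumptions: encode two utterances into factor codes, replace one randomly chosen factor of the first with that of the second, decode, re-encode, and penalise the distance between the re-encoded factors and the substituted ones. The linear toy encoder and decoder, the factor count, the dimensions, and the squared-error distance are all hypothetical and are not taken from the paper.

```python
import numpy as np

N_FACTORS = 4      # hypothetical number of factors
FACTOR_DIM = 3     # hypothetical size of each factor code
FEATURE_DIM = N_FACTORS * FACTOR_DIM


def encode(x, w_enc):
    """Toy linear encoder: feature vector -> (N_FACTORS, FACTOR_DIM) codes."""
    return (w_enc @ x).reshape(N_FACTORS, FACTOR_DIM)


def decode(factors, w_dec):
    """Toy linear decoder: factor codes -> feature vector."""
    return w_dec @ factors.reshape(-1)


def cycle_loss(x_a, x_b, w_enc, w_dec, rng):
    """Cycle loss after substituting one random factor of x_a with that of x_b."""
    f_a = encode(x_a, w_enc)
    f_b = encode(x_b, w_enc)

    # Random factor substitution: swap in factor k from the second utterance.
    k = int(rng.integers(N_FACTORS))
    mixed = f_a.copy()
    mixed[k] = f_b[k]

    # Decode the mixed factors, then re-encode the result.
    f_cycle = encode(decode(mixed, w_dec), w_enc)

    # The re-encoded factors should match the factors that were fed in.
    return float(np.mean((f_cycle - mixed) ** 2)), k


rng = np.random.default_rng(0)
w_enc = rng.normal(size=(FEATURE_DIM, FEATURE_DIM))
x_a = rng.normal(size=FEATURE_DIM)
x_b = rng.normal(size=FEATURE_DIM)

# A decoder unrelated to the encoder gives a large cycle loss ...
loss_random, _ = cycle_loss(x_a, x_b, w_enc, rng.normal(size=w_enc.shape), rng)
# ... while a decoder that exactly inverts the encoder gives (numerically) zero.
loss_inverse, _ = cycle_loss(x_a, x_b, w_enc, np.linalg.inv(w_enc), rng)
print(loss_random, loss_inverse)
```

In a trained model the encoder and decoder would be neural networks, and a loss of this kind would be minimised alongside the reconstruction objective. The intuition is that the loss can only be small if each factor survives decoding and re-encoding unchanged, regardless of which other factors it is paired with.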
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Speech Recognition and Synthesis
