Unsupervised Disentanglement of Content and Style via Variance-Invariance Constraints
Yuxuan Wu, Ziyu Wang, Bhiksha Raj, Gus Xia

TL;DR
This paper introduces V3, an unsupervised method that disentangles content and style representations across domains by exploiting how differently the two vary within a sample and across samples. The authors report better disentanglement than existing unsupervised methods, along with out-of-distribution generalization and interpretability.
Contribution
The paper proposes a domain-general, unsupervised approach called V3 that effectively disentangles content and style without labels, applicable across multiple modalities.
Findings
V3 successfully disentangles content and style in music, images, and animations.
V3 outperforms existing unsupervised methods in disentanglement quality.
V3 exhibits strong out-of-distribution generalization and interpretability.
Abstract
We contribute an unsupervised method that effectively learns disentangled content and style representations from sequences of observations. Unlike most disentanglement algorithms, which rely on domain-specific labels or knowledge, our method is based on the insight that content and style differ in domain-general statistical ways: content varies more among different fragments within a sample but maintains an invariant vocabulary across data samples, whereas style remains relatively invariant within a sample but exhibits more significant variation across different samples. We integrate this inductive bias into an encoder-decoder architecture and name our method V3 (variance-versus-invariance). Experimental results show that V3 generalizes across multiple domains and modalities, successfully learning disentangled content and style representations, such as pitch and timbre from…
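The statistical contrast described in the abstract can be illustrated on toy data. The sketch below is not the V3 objective or architecture; it only measures, for hand-built "style" and "content" latents, how much they vary within a sample versus across samples. The function names, the mean-absolute-deviation measure, the array shapes, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def variability(x, axis):
    """Mean absolute deviation from the mean along `axis` (an assumed measure)."""
    return float(np.mean(np.abs(x - x.mean(axis=axis, keepdims=True))))


def variance_invariance_stats(z):
    """Return (within-sample, across-sample) variability.

    z has shape (n_samples, n_fragments, dim): one latent vector per fragment.
    """
    within = variability(z, axis=1)               # fragment-to-fragment change inside a sample
    across = variability(z.mean(axis=1), axis=0)  # change of per-sample averages between samples
    return within, across


rng = np.random.default_rng(0)
n_samples, n_fragments, dim, vocab_size = 8, 16, 4, 5

# Toy style: one vector per sample, nearly constant across its fragments.
style = rng.normal(size=(n_samples, 1, dim)) \
    + 0.01 * rng.normal(size=(n_samples, n_fragments, dim))

# Toy content: every fragment picks a code from one vocabulary shared by all samples.
vocab = rng.normal(size=(vocab_size, dim))
content = vocab[rng.integers(0, vocab_size, size=(n_samples, n_fragments))]

style_within, style_across = variance_invariance_stats(style)
content_within, content_across = variance_invariance_stats(content)

print(f"style:   within={style_within:.3f}  across={style_across:.3f}")
print(f"content: within={content_within:.3f}  across={content_across:.3f}")
```

On this toy data, style should show little variation within a sample and much more across samples, while content should show the opposite pattern, which is the asymmetry the abstract describes as the basis for the method's inductive bias.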
Taxonomy
Topics: Natural Language Processing Techniques
