Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio

Yin-Jyun Luo; Sebastian Ewert; Simon Dixon

arXiv:2205.05871 · cs.SD · June 16, 2022

TL;DR

This paper introduces TS-DSAE, a two-stage training framework for unsupervised disentanglement of sequential data, specifically music audio, that is less sensitive to model configuration and resists collapse of the static latent variable.

Contribution

The paper proposes TS-DSAE, a novel two-stage training method for disentangling sequential data that improves robustness without resorting to complex adversarial training.

Findings

1. TS-DSAE effectively prevents collapse of the static latent variable.
2. The framework achieves robust disentanglement on music audio datasets.
3. It outperforms the vanilla DSAE in various configurations.
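"Collapse" of the static latent variable means the approximate posterior over the static latent becomes indistinguishable from its prior, so it stops carrying information about the sequence. A common way to spot this in VAE-style models is to monitor the KL divergence between the posterior and the prior. The sketch below assumes a diagonal Gaussian posterior and a standard normal prior; the function names and the `threshold` value are hypothetical and not taken from the paper or its repository.

```python
import math
from typing import List


def gaussian_kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in nats, summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def static_latent_collapsed(mu: List[float], logvar: List[float],
                            threshold: float = 0.01) -> bool:
    # Hypothetical heuristic: if q(s | x_{1:T}) is almost identical to the
    # prior p(s) = N(0, I), the static latent s encodes (nearly) nothing
    # about the sequence, i.e. it has collapsed. The threshold is arbitrary.
    return gaussian_kl_to_standard_normal(mu, logvar) < threshold
```

For instance, a posterior with `mu=[0.0, 0.0]` and `logvar=[0.0, 0.0]` equals the prior exactly (KL of 0), so it is flagged as collapsed, whereas `mu=[1.5, -0.5]`, `logvar=[-1.0, -1.0]` gives a KL of about 1.62 nats. In practice one would average this quantity over many sequences rather than inspect a single one.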

Abstract

Disentangled sequential autoencoders (DSAEs) are a class of probabilistic graphical models that describe an observed sequence with dynamic latent variables and a static latent variable. The former encode information at the same frame rate as the observation, while the latter globally governs the entire sequence. This introduces an inductive bias that facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables, and is prone to collapse of the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are subsequently employed to regularise the model and facilitate auxiliary objectives to promote…
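The split between one static latent per sequence and one dynamic latent per frame can be illustrated with a toy ancestral sampler. This is a minimal sketch, assuming the common DSAE factorisation p(x_{1:T}, z_{1:T}, s) = p(s) ∏_t p(z_t | z_{t-1}) p(x_t | z_t, s); the AR(1) prior, the concatenation "decoder", and the names `sample_dsae` and `rho` are hypothetical simplifications, not the authors' model (their official PyTorch code is in the repository listed below).

```python
import random
from typing import List, Tuple


def sample_dsae(T: int, static_dim: int = 2, dynamic_dim: int = 2,
                rho: float = 0.9, seed: int = 0
                ) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """Ancestral sampling from a toy DSAE-style generative model.

    Assumed factorisation (illustrative only):
        p(x_{1:T}, z_{1:T}, s) = p(s) * prod_t p(z_t | z_{t-1}) p(x_t | z_t, s)
    """
    rng = random.Random(seed)
    # Static latent: drawn ONCE from a standard normal prior and shared by
    # every frame, so it can only capture global factors of the sequence.
    s = [rng.gauss(0.0, 1.0) for _ in range(static_dim)]
    z_prev = [0.0] * dynamic_dim
    zs: List[List[float]] = []
    xs: List[List[float]] = []
    for _ in range(T):
        # Dynamic latent: one per frame (same frame rate as the observation),
        # following a hypothetical AR(1) prior z_t = rho * z_{t-1} + noise.
        z_t = [rho * zp + rng.gauss(0.0, 0.1) for zp in z_prev]
        # Hypothetical "decoder": frame t depends on its own z_t and on the
        # global s, here simply concatenated instead of passed through a network.
        x_t = z_t + s
        zs.append(z_t)
        xs.append(x_t)
        z_prev = z_t
    return s, zs, xs
```

For example, `s, zs, xs = sample_dsae(T=5)` returns five dynamic latents, one per frame, while the last two entries of every `xs[t]` equal the single static vector `s`. A real DSAE replaces the concatenation with learned neural networks and pairs this generative model with an inference network trained via an evidence lower bound.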

Code & Models

Repositories

yjlolo/dseq-vae
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Anomaly Detection Techniques and Applications