Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data

Wei-Ning Hsu, Yu Zhang, and James Glass

arXiv:1709.07902·cs.LG·September 26, 2017·150 cites

TL;DR

This paper introduces a hierarchical variational autoencoder that learns disentangled, interpretable representations from sequential data, enabling transformations like speaker or content changes and improving speech recognition and verification.

Contribution

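It proposes a factorized hierarchical variational autoencoder that captures the multi-scale structure of sequential data without supervision. On two speech corpora, it outperforms an i-vector baseline for speaker verification and lowers word error rate in mismatched train/test speech recognition.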

Findings

01

Outperforms i-vector baseline in speaker verification

02

Reduces word error rate by as much as 35% in mismatched train/test speech recognition scenarios

03

Enables meaningful manipulation of latent variables for speaker and content changes

Abstract

We present a factorized hierarchical variational autoencoder, which learns disentangled and interpretable representations from sequential data without supervision. Specifically, we exploit the multi-scale nature of information in sequential data by formulating it explicitly within a factorized hierarchical graphical model that imposes sequence-dependent priors and sequence-independent priors to different sets of latent variables. The model is evaluated on two speech corpora to demonstrate, qualitatively, its ability to transform speakers or linguistic content by manipulating different sets of latent variables; and quantitatively, its ability to outperform an i-vector baseline for speaker verification and reduce the word error rate by as much as 35% in mismatched train/test scenarios for automatic speech recognition tasks.
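The split between sequence-dependent and sequence-independent priors can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian form of the priors, the variable names (`mu2`, `z1`, `z2`), the dimensions, the standard deviations, and the mean-shift manipulation are all assumptions chosen for illustration, and the neural encoder and decoder are omitted entirely.

```python
import numpy as np


def sample_sequence_latents(num_segments, dim_z1=4, dim_z2=4,
                            sigma_mu=1.0, sigma_z1=1.0, sigma_z2=0.25,
                            rng=None):
    """Draw latents for one sequence made of `num_segments` segments.

    Assumed (illustrative) generative structure:
      mu2 ~ N(0, sigma_mu^2 I)      one vector per sequence
      z1  ~ N(0, sigma_z1^2 I)      per segment, sequence-independent prior
      z2  ~ N(mu2, sigma_z2^2 I)    per segment, sequence-dependent prior
    """
    rng = np.random.default_rng() if rng is None else rng
    mu2 = rng.normal(0.0, sigma_mu, size=dim_z2)
    z1 = rng.normal(0.0, sigma_z1, size=(num_segments, dim_z1))
    z2 = mu2 + rng.normal(0.0, sigma_z2, size=(num_segments, dim_z2))
    return mu2, z1, z2


def shift_sequence_level(z1, z2, mu2_source, mu2_target):
    """Move z2 from the source sequence mean to the target sequence mean.

    z1 (segment-level information, e.g. linguistic content) is left untouched;
    only the sequence-level latent (e.g. speaker identity) is changed.
    This shift is a hypothetical manipulation, not a documented procedure.
    """
    return z1, z2 - mu2_source + mu2_target


rng = np.random.default_rng(0)
mu2_a, z1_a, z2_a = sample_sequence_latents(num_segments=5, rng=rng)
mu2_b, z1_b, z2_b = sample_sequence_latents(num_segments=5, rng=rng)

# Keep sequence A's segment-level latents, adopt sequence B's sequence-level mean.
z1_new, z2_new = shift_sequence_level(z1_a, z2_a, mu2_a, mu2_b)
```

In a full model, a decoder would map each pair (`z1_new`, `z2_new`) back to a segment of speech features. Here the sketch only shows which latent set is held fixed and which is altered when transforming the speaker while keeping the content.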

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques