Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis

Qingming Tang; Weiran Wang; Karen Livescu

arXiv:1708.04673·cs.CV·September 1, 2017·5 cites

TL;DR

This paper applies deep variational canonical correlation analysis (VCCA), a recently proposed multi-view generative method, to acoustic feature learning, and extends it with improved latent variable priors and adversarial learning to improve speaker-independent phonetic recognition.

Contribution

The paper extends VCCA with improved latent variable priors and adversarial training, giving an end-to-end trainable deep generative approach to multi-view acoustic feature learning.

Findings

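01 VCCA-based features improve over previous feature learning methods for speaker-independent phonetic recognition on the University of Wisconsin X-ray Microbeam Database

02 Extensions with improved latent variable priors and adversarial learning are compared against the base VCCA model

03 The variational lower bound objective can be trained end-to-end efficiently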

Abstract

We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA's advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition.
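To make the "variational lower bound objective" concrete, here is a minimal NumPy sketch of a VCCA-style bound. It is an illustration under stated assumptions, not the authors' implementation: the encoder conditions on the acoustic view only (matching the setting where the second view is unavailable at test time), the prior is a standard normal, both likelihoods are unit-variance Gaussians with additive constants dropped, and all function names, weight matrices, and dimensions are hypothetical.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def vcca_style_bound(x, y, enc, dec_x, dec_y, rng):
    """Single-sample Monte Carlo estimate of a VCCA-style lower bound.

    enc maps view x to (mu, log_var) of q(z|x); dec_x and dec_y map z to
    the mean of each reconstructed view. Unit-variance Gaussian likelihoods
    are an assumption of this sketch, and additive constants are dropped.
    """
    mu, log_var = enc(x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps          # reparameterization trick
    log_px = -0.5 * np.sum((x - dec_x(z)) ** 2, axis=1)
    log_py = -0.5 * np.sum((y - dec_y(z)) ** 2, axis=1)
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return np.mean(log_px + log_py - kl)

# Toy data and linear maps (all hypothetical, for illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))   # stand-in for acoustic features
y = rng.standard_normal((8, 3))   # stand-in for the second (non-acoustic) view
W_mu = 0.1 * rng.standard_normal((5, 2))
W_lv = 0.1 * rng.standard_normal((5, 2))
W_x = 0.1 * rng.standard_normal((2, 5))
W_y = 0.1 * rng.standard_normal((2, 3))

def enc(v):
    return v @ W_mu, v @ W_lv

bound = vcca_style_bound(x, y, enc, lambda z: z @ W_x, lambda z: z @ W_y, rng)
```

In a real system the linear maps would be deep networks and the bound would be maximized by stochastic gradient ascent; at test time only the encoder output for the acoustic view would be used as the learned feature.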

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing