Acoustic Feature Learning via Deep Variational Canonical Correlation Analysis
Qingming Tang, Weiran Wang, Karen Livescu

TL;DR
This paper applies deep variational canonical correlation analysis (VCCA) to acoustic feature learning from multi-view data and extends it with improved latent variable priors and adversarial learning, improving speaker-independent phonetic recognition over previous feature learning methods.
Contribution
The paper extends VCCA, a recently proposed deep generative method for multi-view representation learning, with improved latent variable priors and adversarial learning, and applies it to acoustic feature learning with an objective that can be trained end-to-end.
Findings
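VCCA-based feature learning improves over previous feature learning methods for speaker-independent phonetic recognition on the University of Wisconsin X-ray Microbeam Database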
Extensions with priors and adversarial learning improve results
The variational lower bound objective can be trained end-to-end efficiently
Abstract
We study the problem of acoustic feature learning in the setting where we have access to another (non-acoustic) modality for feature learning but not at test time. We use deep variational canonical correlation analysis (VCCA), a recently proposed deep generative method for multi-view representation learning. We also extend VCCA with improved latent variable priors and with adversarial learning. Compared to other techniques for multi-view feature learning, VCCA's advantages include an intuitive latent variable interpretation and a variational lower bound objective that can be trained end-to-end efficiently. We compare VCCA and its extensions with previous feature learning methods on the University of Wisconsin X-ray Microbeam Database, and show that VCCA-based feature learning improves over previous methods for speaker-independent phonetic recognition.
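The variational lower bound mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly cited VCCA form, in which an encoder q(z|x) sees only the acoustic view and two decoders reconstruct both views from the shared latent variable z. The linear maps, the unit-variance Gaussian decoders, the standard normal prior, and all names (`gaussian_kl`, `vcca_style_elbo`, `W_mu`, and so on) are illustrative assumptions, not the authors' architecture or code; the paper's improved priors and adversarial extension are not shown.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per row."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def vcca_style_elbo(x, y, params, rng):
    """Mean lower bound over a batch, up to additive constants.

    x: acoustic view, shape (n, dx); y: second view, shape (n, dy).
    Linear maps stand in for deep networks (an assumption of this sketch).
    """
    # Encoder q(z|x): uses the acoustic view only.
    mu = x @ params["W_mu"]
    log_var = x @ params["W_logvar"]

    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps

    # Decoders p(x|z), p(y|z): unit-variance Gaussians, constants dropped.
    log_px = -0.5 * np.sum((x - z @ params["W_x"]) ** 2, axis=1)
    log_py = -0.5 * np.sum((y - z @ params["W_y"]) ** 2, axis=1)

    elbo = log_px + log_py - gaussian_kl(mu, log_var)
    return elbo.mean(), mu

rng = np.random.default_rng(0)
n, dx, dy, dz = 8, 5, 4, 2  # toy sizes, chosen arbitrarily
x = rng.standard_normal((n, dx))
y = rng.standard_normal((n, dy))
params = {
    "W_mu": 0.1 * rng.standard_normal((dx, dz)),
    "W_logvar": 0.1 * rng.standard_normal((dx, dz)),
    "W_x": 0.1 * rng.standard_normal((dz, dx)),
    "W_y": 0.1 * rng.standard_normal((dz, dy)),
}
elbo, features = vcca_style_elbo(x, y, params, rng)
```

Because the encoder depends only on `x`, the returned `features` (the posterior mean) can be computed when the second view is unavailable, which matches the setting described in the abstract. In practice the bound would be maximized by gradient ascent in an automatic differentiation framework rather than evaluated once as here.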
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
