Deep Within-Class Covariance Analysis for Robust Audio Representation Learning

Hamid Eghbal-zadeh; Matthias Dorfer; Gerhard Widmer

arXiv:1711.04022 · cs.LG · December 3, 2018 · 1 cites

Open Access

TL;DR

This paper introduces Deep Within-Class Covariance Analysis (DWCCA), a neural network layer that reduces within-class covariance in CNN representations, improving robustness and accuracy in audio classification under distribution shift.

Contribution

The paper proposes DWCCA, a novel deep layer that minimizes within-class covariance, enhancing CNN robustness to distribution shifts in audio data.
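This page does not describe the internals of the DWCCA layer. As a rough illustration of what reducing within-class covariance can mean, the sketch below applies a classic within-class covariance normalization (WCCN) style projection to fixed embeddings. It is an assumption-laden stand-in, not the paper's layer: the function names, the `eps` regularizer, and the synthetic data are all hypothetical.

```python
import numpy as np

def pooled_within_class_covariance(x, y):
    """Pooled within-class covariance of row-vector embeddings x with labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    dim = x.shape[1]
    scatter = np.zeros((dim, dim))
    for c in np.unique(y):
        rows = x[y == c]
        centered = rows - rows.mean(axis=0)  # remove the class mean
        scatter += centered.T @ centered
    return scatter / len(x)

def wccn_projection(x, y, eps=1e-6):
    """Return B with B @ B.T == inv(S_w), so x @ B has ~identity within-class covariance.

    eps is a hypothetical regularizer added for numerical stability.
    """
    s_w = pooled_within_class_covariance(x, y)
    s_w = s_w + eps * np.eye(s_w.shape[0])
    return np.linalg.cholesky(np.linalg.inv(s_w))

# Synthetic embeddings (illustrative only): 3 classes, 4 dimensions,
# with very different spread along each dimension.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 200)
centers = rng.normal(scale=5.0, size=(3, 4))
x = centers[y] + rng.normal(size=(600, 4)) * np.array([0.5, 1.0, 2.0, 4.0])

b = wccn_projection(x, y)
x_proj = x @ b
s_w_before = pooled_within_class_covariance(x, y)
s_w_after = pooled_within_class_covariance(x_proj, y)
```

After the projection, the within-class covariance of `x_proj` is approximately the identity matrix, so no single direction dominates the within-class spread. A trainable layer would have to estimate these statistics per mini-batch and remain differentiable, which this sketch does not attempt.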

Findings

01

DWCCA reduces within-class covariance in CNN representations.

02

Applying DWCCA improves classification accuracy on shifted test data.

03

Higher embedding variance correlates with poorer KNN classification performance.

Abstract

Convolutional Neural Networks (CNNs) can learn effective features, but they have been shown to suffer a performance drop when the data distribution changes between training and test data. In this paper we analyze the internal representations of CNNs and observe that the representations of unseen data in each class spread more (with higher variance) in the embedding space of the CNN than the representations of the training data. More importantly, this difference is more extreme if the unseen data comes from a shifted distribution. Based on this observation, we objectively evaluate the degree of variance of the representations in each class via eigenvalue decomposition of the within-class covariance of the internal representations of CNNs and observe the same behaviour. This can be problematic, as larger variances might lead to misclassification if the sample crosses the decision…
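The eigenvalue analysis mentioned in the abstract can be sketched as follows. This is a minimal illustration on synthetic embeddings, not the authors' code: the function names and the data (a "train-like" set with small per-class spread and a "shifted-like" set with larger spread) are assumptions made for the example.

```python
import numpy as np

def within_class_covariance(embeddings, labels):
    """Pooled within-class covariance of row-vector embeddings."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    dim = embeddings.shape[1]
    scatter = np.zeros((dim, dim))
    for c in np.unique(labels):
        class_rows = embeddings[labels == c]
        centered = class_rows - class_rows.mean(axis=0)  # remove the class mean
        scatter += centered.T @ centered
    return scatter / len(embeddings)

def within_class_spectrum(embeddings, labels):
    """Eigenvalues of the within-class covariance, largest first."""
    s_w = within_class_covariance(embeddings, labels)
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues.
    return np.linalg.eigvalsh(s_w)[::-1]

# Synthetic stand-ins for CNN embeddings (illustrative only).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 200)
centers = rng.normal(scale=5.0, size=(3, 4))
train_like = centers[labels] + rng.normal(scale=1.0, size=(600, 4))
shifted_like = centers[labels] + rng.normal(scale=2.0, size=(600, 4))

train_spectrum = within_class_spectrum(train_like, labels)
shifted_spectrum = within_class_spectrum(shifted_like, labels)
print("train-like eigenvalues:  ", np.round(train_spectrum, 2))
print("shifted-like eigenvalues:", np.round(shifted_spectrum, 2))
```

Larger eigenvalues indicate that samples of the same class spread further from their class mean along the corresponding directions, which is the kind of per-class spread the abstract compares across training, unseen, and shifted data.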

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Dropout · Dense Connections · Max Pooling · Softmax · Convolution