Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition
Einari Vaaras, Manu Airaksinen, Okko Räsänen

TL;DR
This paper explores combining contrastive predictive coding with dimensionality reduction to enhance clustering-based active learning for speech emotion recognition, demonstrating that low-dimensional features can maintain effective performance.
Contribution
It introduces a novel approach integrating CPC and dimensionality reduction to improve clustering-based active learning in speech emotion recognition tasks.
Findings
CPC improves clustering-based active learning performance.
Low-dimensional features retain effective active learning performance.
Both local and global feature space topology are useful for active learning.
Abstract
When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these methods are heavily dependent on how the samples are organized in the feature space and what distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially be used to learn organized feature spaces, but these methods typically create high-dimensional features which might be challenging for estimating data density. In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion…
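The pipeline the abstract describes (learned features, then dimensionality reduction, then clustering to decide which samples to annotate) can be illustrated with a minimal sketch. This is not the authors' implementation. PCA and k-means are stand-ins chosen for brevity, random Gaussian blobs stand in for CPC features, and the helper names (`pca_reduce`, `kmeans`, `select_queries`) are hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project features onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(points, centroids)


def select_queries(features, n_components, budget, seed=0):
    """Pick at most one sample per cluster (nearest its centroid) to annotate."""
    reduced = pca_reduce(features, n_components)
    centroids, labels = kmeans(reduced, budget, seed=seed)
    queries = []
    for j in range(budget):
        idx = np.flatnonzero(labels == j)
        if len(idx) == 0:
            continue  # an empty cluster contributes no query
        d = np.linalg.norm(reduced[idx] - centroids[j], axis=1)
        queries.append(int(idx[d.argmin()]))
    return queries, labels


# Assumption: synthetic 64-D blobs stand in for high-dimensional learned features.
rng = np.random.default_rng(0)
features = np.concatenate(
    [rng.normal(loc=c, scale=0.5, size=(40, 64)) for c in (-4.0, 0.0, 4.0)]
)
queries, labels = select_queries(features, n_components=2, budget=3)
print("sample indices to send for annotation:", queries)
```

Because clustering runs on the reduced features, the choice of reduction method and distance metric directly changes which samples are queried, which is the dependence the abstract highlights.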
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Text and Document Classification Technologies
Methods: InfoNCE · Contrastive Predictive Coding
