Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li, Zixing Zhang, Jing Han, Peter Bell, Catherine Lai

TL;DR
This paper introduces a semi-supervised learning framework for speech-based cognitive state classification that combines acoustic and linguistic pseudo-labeling to effectively utilize unlabeled data, achieving high performance with limited labeled data.
Contribution
The work presents a novel multi-view pseudo-labeling approach that integrates acoustic and linguistic cues for semi-supervised speech classification tasks.
Findings
Achieves performance comparable to fully supervised training with only 30% of the labeled data
Outperforms baseline methods significantly
Effective in emotion recognition and dementia detection
Abstract
The lack of labeled data is a common challenge in speech classification tasks, particularly those requiring extensive subjective assessment, such as cognitive state classification. In this work, we propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method that leverages both acoustic and linguistic characteristics to select the most confident data for training the classification model. Acoustically, unlabeled data are compared to labeled data using the Fréchet audio distance, calculated from embeddings generated by multiple audio encoders. Linguistically, large language models are prompted to revise automatic speech recognition transcriptions and predict labels based on our proposed task-specific knowledge. High-confidence data are identified when pseudo-labels from both sources align, while mismatches are treated as low-confidence data. A…
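The two views described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each utterance is represented by a matrix of frame-level embeddings from a single audio encoder (the paper uses multiple encoders), the acoustic pseudo-label is taken to be the class whose labeled pool has the smallest Fréchet distance to the utterance, and the linguistic labels are assumed to have been produced elsewhere by an LLM. The function names `frechet_distance`, `acoustic_pseudo_label`, and `split_by_agreement` are hypothetical.

```python
import numpy as np


def frechet_distance(x, y, eps=1e-6):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_frames, dim), with dim >= 2.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    dim = x.shape[1]
    # Small diagonal term keeps the covariances positive definite.
    cov_x = np.cov(x, rowvar=False) + eps * np.eye(dim)
    cov_y = np.cov(y, rowvar=False) + eps * np.eye(dim)
    # tr(sqrt(cov_x cov_y)) is the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def acoustic_pseudo_label(utt_emb, class_pools):
    """Assumed rule: pick the class whose labeled pool is closest in Fréchet distance."""
    return min(class_pools, key=lambda c: frechet_distance(utt_emb, class_pools[c]))


def split_by_agreement(acoustic_labels, linguistic_labels):
    """Split samples into high confidence (both views agree) and low confidence."""
    high, low = [], []
    for idx, (a, l) in enumerate(zip(acoustic_labels, linguistic_labels)):
        if a == l:
            high.append((idx, a))  # keep the agreed pseudo-label
        else:
            low.append(idx)        # mismatch: treated as low-confidence data
    return high, low
```

In use, `acoustic_pseudo_label` would be called once per unlabeled utterance, and its outputs passed to `split_by_agreement` together with the LLM-predicted labels. How the low-confidence samples are handled afterwards is not shown here, because the abstract is truncated at that point.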
Taxonomy
Topics: Speech Recognition and Synthesis
