Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling

Yuanchao Li; Zixing Zhang; Jing Han; Peter Bell; Catherine Lai

arXiv:2409.16937·eess.AS·May 1, 2025

TL;DR

This paper introduces a semi-supervised learning framework for speech-based cognitive state classification that combines acoustic and linguistic pseudo-labeling to effectively utilize unlabeled data, achieving high performance with limited labeled data.

Contribution

The work presents a novel multi-view pseudo-labeling approach that integrates acoustic and linguistic cues for semi-supervised speech classification tasks.

Findings

01. Achieves performance comparable to fully supervised learning with only 30% of the labeled data
02. Significantly outperforms the baseline methods
03. Effective for both emotion recognition and dementia detection

Abstract

The lack of labeled data is a common challenge in speech classification tasks, particularly those requiring extensive subjective assessment, such as cognitive state classification. In this work, we propose a Semi-Supervised Learning (SSL) framework, introducing a novel multi-view pseudo-labeling method that leverages both acoustic and linguistic characteristics to select the most confident data for training the classification model. Acoustically, unlabeled data are compared to labeled data using the Fréchet audio distance, calculated from embeddings generated by multiple audio encoders. Linguistically, large language models are prompted to revise automatic speech recognition transcriptions and predict labels based on our proposed task-specific knowledge. High-confidence data are identified when pseudo-labels from both sources align, while mismatches are treated as low-confidence data. A…
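
The two pseudo-labeling views described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each utterance is represented by frame-level embeddings, the acoustic pseudo-label is the class whose labeled embeddings have the smallest Fréchet distance to the utterance, distances are simply summed across encoders, and the linguistic pseudo-labels (which would come from a prompted large language model) are passed in as plain values. The function names `frechet_distance`, `acoustic_pseudo_label`, and `split_by_agreement` are hypothetical.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_frames, dim), one embedding per row.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # tr(sqrt(cov_x @ cov_y)) equals the sum of the square roots of the
    # product's eigenvalues, which are real and non-negative for covariances.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    dist = (
        np.sum((mu_x - mu_y) ** 2)
        + np.trace(cov_x)
        + np.trace(cov_y)
        - 2.0 * tr_sqrt
    )
    return float(dist)


def acoustic_pseudo_label(utt_by_encoder, labeled_by_encoder):
    """Pick the class closest to the utterance (assumed decision rule).

    utt_by_encoder: {encoder_name: (n_frames, dim) array}
    labeled_by_encoder: {encoder_name: {class_label: (n_frames, dim) array}}
    Distances are summed over encoders; this aggregation is an assumption.
    """
    totals = {}
    for encoder, utt in utt_by_encoder.items():
        for label, reference in labeled_by_encoder[encoder].items():
            totals[label] = totals.get(label, 0.0) + frechet_distance(utt, reference)
    return min(totals, key=totals.get)


def split_by_agreement(acoustic_labels, linguistic_labels):
    """Split sample indices into high- and low-confidence sets.

    A sample is high-confidence when both views give the same pseudo-label;
    a mismatch sends it to the low-confidence set.
    """
    high, low = [], []
    for idx, (acoustic, linguistic) in enumerate(zip(acoustic_labels, linguistic_labels)):
        if acoustic == linguistic:
            high.append((idx, acoustic))
        else:
            low.append(idx)
    return high, low
```

In this sketch, `split_by_agreement` returns the agreed samples together with their shared pseudo-label, plus the indices of the mismatched samples, which the abstract treats as low-confidence data.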

Code & Models

Repositories

yc-li20/semi-supervised-training · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis