Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling

Xingyuan Li; Mengyue Wu

arXiv:2601.04744·cs.SD·April 21, 2026

TL;DR

This paper introduces a semi-supervised learning framework for detecting medical conditions from speech dialogues. It leverages unlabeled data through multi-level data modeling and reaches 90% of fully-supervised performance with only 11 labeled samples.

Contribution

The paper presents a hierarchical semi-supervised learning (SSL) approach that jointly models frame-level, segment-level, and session-level representations, improving disease detection from speech when labeled data is limited.

Findings

01. Achieves 90% of fully-supervised performance with only 11 labeled samples.

02. Framework is model-agnostic and robust across languages and conditions.

03. Effectively utilizes unlabeled clinical dialogues through pseudo-labeling.

Abstract

Detecting medical conditions from speech acoustics is fundamentally a weakly-supervised learning problem: a single, often noisy, session-level label must be linked to nuanced patterns within a long, complex audio recording. This task is further hampered by severe data scarcity and the subjective nature of clinical annotations. While semi-supervised learning (SSL) offers a viable path to leverage unlabeled data, existing audio methods often fail to address the core challenge that pathological traits are not uniformly expressed in a patient's speech. We propose a novel, audio-only SSL framework that explicitly models this hierarchy by jointly learning from frame-level, segment-level, and session-level representations within unsegmented clinical dialogues. Our end-to-end approach dynamically aggregates these multi-granularity features and generates high-quality pseudo-labels to efficiently…
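The hierarchy described in the abstract (frame, segment, session) and the pseudo-labeling step can be illustrated with a small sketch. This is not the authors' implementation: the mean pooling from frames to segments, the `salience` heuristic used to weight segments, the function names, and the 0.9 confidence threshold are all assumptions chosen only to make the idea concrete.

```python
import math


def attention_pool(scores, weights):
    """Softmax-weighted mean of per-unit scores (weights are raw logits)."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return sum(s * e / total for s, e in zip(scores, exps))


def session_probability(segment_frame_probs):
    """Aggregate frame -> segment -> session.

    segment_frame_probs: list of segments, each a non-empty list of
    frame-level probabilities that the speaker has the condition.
    """
    # Frame -> segment: plain mean (an assumption, not the paper's aggregator).
    segment_probs = [sum(f) / len(f) for f in segment_frame_probs]
    # Segment -> session: hypothetical salience heuristic. Segments whose
    # probability is far from 0.5 get more weight, reflecting the idea that
    # pathological traits are not uniformly expressed across a recording.
    salience = [10.0 * abs(p - 0.5) for p in segment_probs]
    return attention_pool(segment_probs, salience)


def pseudo_label(prob, threshold=0.9):
    """Return 1 or 0 for confident predictions, None otherwise.

    The threshold is a hypothetical value; sessions that return None
    would simply be left out of the pseudo-labeled training set.
    """
    if prob >= threshold:
        return 1
    if prob <= 1.0 - threshold:
        return 0
    return None


if __name__ == "__main__":
    # One unlabeled session: three segments with frame-level probabilities.
    session = [[0.95, 0.97], [0.50, 0.55], [0.92, 0.96]]
    p = session_probability(session)
    print(round(p, 3), pseudo_label(p))
```

In this sketch an ambiguous segment contributes less to the session score than the decisive ones, and only sessions with a confident score receive a pseudo-label. A learned attention module would normally replace the fixed salience heuristic.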

Code & Models

Repositories

fispresent/semi_pathological (GitHub)
