SleepVLM: Explainable and Rule-Grounded Sleep Staging via a Vision-Language Model

Guifeng Deng; Pan Wang; Jiquan Wang; Shuying Rao; Junyi Xie; Wanjun Guo; Tao Li; Haiteng Jiang

arXiv:2603.26738 · cs.CV · April 1, 2026

TL;DR

SleepVLM is an explainable vision-language model for sleep staging that combines high accuracy with transparent, rule-based reasoning to enhance clinical trust and auditability.

Contribution

It introduces a rule-grounded VLM for sleep staging with clinician-readable rationales and releases a new annotated dataset for interpretability research.

Findings

01

Achieved Cohen's kappa of 0.767 on the held-out MASS-SS1 test set and 0.743 on the external ZUAMHCS cohort.

02

Expert evaluations gave mean scores above 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence of the model's reasoning.

03

Matched state-of-the-art performance while providing transparent explanations.
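
For readers unfamiliar with the agreement metric quoted in the findings, the following is a minimal sketch of how Cohen's kappa can be computed from per-epoch stage labels. It is not the authors' evaluation code: the function name `cohens_kappa` and the toy label sequences are hypothetical, and the stage abbreviations (W, N1, N2, N3, R) are assumed for illustration only.

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa between two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)

    # Observed agreement: fraction of epochs with identical labels.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n

    # Chance agreement: product of the two marginal frequencies per label.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)
    p_e = sum(ref_counts[s] * pred_counts[s] for s in labels) / (n * n)

    if p_e == 1.0:
        # Both sequences use a single identical label; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (p_o - p_e) / (1.0 - p_e)


# Toy example (hypothetical labels, not data from the paper).
scorer = ["W", "W", "N1", "N2", "N2", "N3", "R", "R"]
model = ["W", "N1", "N1", "N2", "N2", "N2", "R", "R"]
print(f"kappa = {cohens_kappa(scorer, model):.3f}")
```

Unlike raw accuracy, kappa discounts agreement that would occur by chance given how often each stage appears, which matters in sleep staging because some stages (typically N2) dominate a night's recording.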

Abstract

While automated sleep staging has achieved expert-level accuracy, its clinical adoption is hindered by a lack of auditable reasoning. We introduce SleepVLM, a rule-grounded vision-language model (VLM) designed to stage sleep from multi-channel polysomnography (PSG) waveform images while generating clinician-readable rationales based on American Academy of Sleep Medicine (AASM) scoring criteria. Utilizing waveform-perceptual pre-training and rule-grounded supervised fine-tuning, SleepVLM achieved Cohen's kappa scores of 0.767 on a held-out test set (MASS-SS1) and 0.743 on an external cohort (ZUAMHCS), matching state-of-the-art performance. Expert evaluations further validated the quality of the model's reasoning, with mean scores exceeding 4.0/5.0 for factual accuracy, evidence comprehensiveness, and logical coherence. By coupling competitive performance with transparent, rule-based…

Code & Models

Datasets

Feng613/MASS-EX
dataset · 86 downloads