H-LDM: Hierarchical Latent Diffusion Models for Controllable and Interpretable PCG Synthesis from Clinical Metadata

Chenyang Xu; Siming Li; Hao Wang

arXiv:2511.14312·cs.LG·February 13, 2026

TL;DR

H-LDM is a hierarchical latent diffusion model that generates controllable, clinically accurate PCG signals from metadata, improving data augmentation for cardiac diagnosis and enabling interpretability.

Contribution

The paper introduces a multi-scale VAE with a physiologically disentangled latent space, a hierarchical pipeline for controllable PCG synthesis from clinical metadata, and a diffusion process guided by a novel Medical Attention module.

Findings

01. Achieves state-of-the-art Fréchet Audio Distance of 9.7

02. 92% attribute disentanglement score

03. 87.1% clinical validity confirmed by cardiologists

Abstract

Phonocardiogram (PCG) analysis is vital for cardiovascular disease diagnosis, yet the scarcity of labeled pathological data hinders the capability of AI systems. To bridge this, we introduce H-LDM, a Hierarchical Latent Diffusion Model for generating clinically accurate and controllable PCG signals from structured metadata. Our approach features: (1) a multi-scale VAE that learns a physiologically-disentangled latent space, separating rhythm, heart sounds, and murmurs; (2) a hierarchical text-to-biosignal pipeline that leverages rich clinical metadata for fine-grained control over 17 distinct conditions; and (3) an interpretable diffusion process guided by a novel Medical Attention module. Experiments on the PhysioNet CirCor dataset demonstrate state-of-the-art performance, achieving a Fréchet Audio Distance of 9.7, a 92% attribute disentanglement score, and 87.1% clinical validity…
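
The abstract's headline metric, Fréchet Audio Distance, compares Gaussians fitted to embeddings of real and generated audio, with lower values indicating closer distributions. The sketch below shows only the generic Fréchet distance formula using NumPy; the embedding model and preprocessing used by the authors are not stated in this excerpt, so the random arrays and the function name `frechet_distance` are illustrative assumptions rather than the paper's evaluation code.

```python
import numpy as np

def frechet_distance(real_emb, gen_emb):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    Each input has shape (n_samples, embedding_dim) with embedding_dim > 1.
    """
    mu_r, mu_g = real_emb.mean(axis=0), gen_emb.mean(axis=0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of the product; clip tiny negatives caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Placeholder embeddings (assumption): random vectors stand in for the
# audio embeddings that a real evaluation would extract from PCG clips.
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
close = rng.normal(0.0, 1.0, size=(500, 8))    # same distribution as `real`
shifted = rng.normal(2.0, 1.0, size=(500, 8))  # mean-shifted distribution

fad_close = frechet_distance(real, close)
fad_shifted = frechet_distance(real, shifted)
print(f"close: {fad_close:.3f}, shifted: {fad_shifted:.3f}")
```

A set drawn from the same distribution as the reference should score near zero, while the mean-shifted set scores much higher, which is the sense in which a lower reported distance suggests more realistic synthesis.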

Taxonomy

Topics: Phonocardiography and Auscultation Techniques · Voice and Speech Disorders · COVID-19 diagnosis using AI