Clinically-guided Data Synthesis for Laryngeal Lesion Detection
Chiara Baldini, Kaisar Kushibar, Richard Osuala, Simone Balocco, Oliver Diaz, Karim Lekadir, Leonardo S. Mattos

TL;DR
This paper presents a novel clinically-guided data synthesis method that uses Latent Diffusion Models to generate realistic laryngeal endoscopic images, augmenting the training data available to lesion detection systems.
Contribution
It introduces a clinically-guided image synthesis approach based on diffusion models that addresses data scarcity in laryngeal lesion detection, producing realistic images that improve detection model performance.
Findings
Synthetic data increased the lesion detection rate by 9% when tested internally
Synthetic data increased the lesion detection rate by 22.1% on external data
Experts rated synthetic images as highly realistic
Abstract
Although computer-aided diagnosis (CADx) and detection (CADe) systems have made significant progress in various medical domains, their application is still limited in specialized fields such as otorhinolaryngology. In the latter, current assessment methods heavily depend on operator expertise, and the high heterogeneity of lesions complicates diagnosis, with biopsy persisting as the gold standard despite its substantial costs and risks. A critical bottleneck for specialized endoscopic CADx/e systems is the lack of well-annotated datasets with sufficient variability for real-world generalization. This study introduces a novel approach that exploits a Latent Diffusion Model (LDM) coupled with a ControlNet adapter to generate laryngeal endoscopic image-annotation pairs, guided by clinical observations. The method addresses data scarcity by conditioning the diffusion process to produce…
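The abstract names a ControlNet adapter attached to an LDM. As a generic illustration of how such an adapter injects a spatial condition (not the authors' implementation), the NumPy sketch below reproduces the zero-convolution idea from ControlNet: a trainable copy of a frozen block receives the condition, and its contribution is gated by convolutions initialised to zero, so at the start of training the adapted block behaves exactly like the frozen one. All function names are hypothetical, the "block" is a toy tanh layer, 1x1 convolutions are modelled as channel-mixing matrices, and the condition (for example an annotation mask) is assumed to be already encoded to the latent shape.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: mix the channels of x (C_in, H, W) with weights w (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def block(x, w):
    """Toy stand-in for one encoder block of the denoising network (hypothetical)."""
    return np.tanh(conv1x1(x, w))

def controlnet_block(x, cond, w_frozen, w_copy, z_in, z_out):
    """Frozen path plus a trainable copy whose input and output pass through zero convs."""
    frozen = block(x, w_frozen)                       # original, frozen computation
    control = block(x + conv1x1(cond, z_in), w_copy)  # copy sees latent + condition
    return frozen + conv1x1(control, z_out)           # residual gated by zero conv

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))      # noisy latent features
cond = rng.standard_normal((C, H, W))   # encoded condition, e.g. a lesion mask (assumed)
w_frozen = rng.standard_normal((C, C))
w_copy = w_frozen.copy()                # trainable copy starts from the frozen weights
z_in = np.zeros((C, C))                 # zero convolutions at initialisation
z_out = np.zeros((C, C))

out = controlnet_block(x, cond, w_frozen, w_copy, z_in, z_out)
print(np.allclose(out, block(x, w_frozen)))  # True: adapter is a no-op at init
```

Because the gating weights start at zero, fine-tuning begins from the pretrained model's behaviour and the condition's influence grows only as `z_in` and `z_out` are learned.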
Taxonomy
Topics: Voice and Speech Disorders · Head and Neck Cancer Studies
