TL;DR
This paper introduces a real-time, automated pipeline that combines YOLOv8 and U-Net for accurate glottal area segmentation in high-speed videoendoscopy (HSV), supporting clinical pathology assessment and generalizing across datasets.
Contribution
A novel detection-gated segmentation framework that improves accuracy, generalizability, and speed for glottal area extraction in clinical settings.
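The detection-gated idea described in the abstract (a localizer defines a tight crop, the segmenter runs only on that crop, and the localizer gates the output so no mask is produced when no glottis is found, e.g. during glottal closure) can be sketched as below. This is a minimal sketch, not the authors' implementation: `detect` and `segment` are hypothetical stand-ins for the YOLOv8n localizer and the U-Net, and `conf_threshold`, `pad`, and the 0.5 mask threshold are assumed values. Resizing the crop to a fixed network input size is omitted for brevity.

```python
from typing import Callable, Optional, Tuple

import numpy as np

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, end-exclusive


def gated_segment(
    frame: np.ndarray,
    detect: Callable[[np.ndarray], Optional[Tuple[BBox, float]]],
    segment: Callable[[np.ndarray], np.ndarray],
    conf_threshold: float = 0.5,  # assumed gating threshold
    pad: int = 8,  # assumed margin around the detected box
) -> np.ndarray:
    """Return a full-frame boolean glottis mask for one frame (hypothetical sketch)."""
    h, w = frame.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    det = detect(frame)
    # Gate: no confident detection -> return an empty mask instead of segmenting
    if det is None or det[1] < conf_threshold:
        return mask
    (x0, y0, x1, y1), _ = det
    # Pad the box and clip it to the frame so the crop indices stay valid
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
    crop = frame[y0:y1, x0:x1]
    # Threshold the segmenter's per-pixel output on the crop
    crop_mask = segment(crop) > 0.5
    # Paste the crop-level mask back into full-frame coordinates
    mask[y0:y1, x0:x1] = crop_mask
    return mask


# Toy usage with stub models (illustration only)
frame = np.zeros((64, 64), dtype=np.float32)
frame[20:30, 25:35] = 1.0  # bright 10x10 "glottis" region


def toy_detect(f: np.ndarray) -> Optional[Tuple[BBox, float]]:
    return (25, 20, 35, 30), 0.9


def toy_segment(crop: np.ndarray) -> np.ndarray:
    return crop  # identity "segmenter": pixel intensity acts as probability


m = gated_segment(frame, toy_detect, toy_segment)
print(int(m.sum()))  # 100
```

Gating on the detector rather than on the segmenter output is what lets the pipeline return an empty mask on closed-glottis frames without relying on the segmenter to predict "nothing".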
Findings
Achieved high Dice scores on in-distribution test sets (0.81 on GIRAFE, 0.856 on BAGLS).
Demonstrated cross-dataset portability: GIRAFE-trained models reached 0.745 DSC on the BAGLS test set without fine-tuning.
A clinical study showed that the coefficient of variation (CV) of the glottal area distinguishes healthy from pathological subjects (p=0.006).
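A glottal area CV can be computed from the segmentation masks by first building the glottal area waveform (mask pixel count per frame) and then dividing its standard deviation by its mean. This is a sketch under assumptions: the truncated abstract does not say which quantity the CV is taken over (it could be the per-frame waveform, as here, or per-cycle values such as peak areas), and the population standard deviation (`ddof=0`) is an assumed choice.

```python
import numpy as np


def glottal_area_waveform(masks: np.ndarray) -> np.ndarray:
    """Per-frame glottal area in pixels from a (T, H, W) stack of binary masks."""
    return masks.reshape(masks.shape[0], -1).sum(axis=1).astype(float)


def coefficient_of_variation(x: np.ndarray) -> float:
    """CV = std / mean, using the population standard deviation (assumption)."""
    mean = x.mean()
    if mean == 0:
        raise ValueError("CV is undefined for a zero-mean signal")
    return float(x.std() / mean)


# Toy example: two 2x2 frames with glottal areas of 1 and 3 pixels
masks = np.array(
    [
        [[1, 0], [0, 0]],
        [[1, 1], [1, 0]],
    ],
    dtype=bool,
)
areas = glottal_area_waveform(masks)  # [1.0, 3.0]
cv = coefficient_of_variation(areas)  # std 1.0 / mean 2.0
print(cv)  # 0.5
```

Because the CV is normalized by the mean, it compares variability across subjects whose glottal images differ in absolute pixel scale.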
Abstract
We present a fully automated, two-stage modular glottal area segmentation framework for high-speed videoendoscopy (HSV) designed for accuracy, generalizability, and real-time playback. Our detection-gated pipeline combines a YOLOv8n glottis localizer with a U-Net segmenter; the localizer defines a tight crop to ensure a consistent field of view and gates the output to reduce spurious segmentations during glottal closure. The models were trained on the GIRAFE (N=600) and BAGLS (N=55,750) datasets. Cross-dataset portability was evaluated by benchmarking GIRAFE-trained models on the BAGLS test set without fine-tuning. In these evaluations, the pipeline achieved a Dice Similarity Coefficient (DSC) of 0.745 (87% of the in-domain ceiling). On in-distribution test sets, the system achieved DSCs of 0.81 (GIRAFE) and 0.856 (BAGLS), outperforming or competing with state-of-the-art methods. An…
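The Dice Similarity Coefficient reported above is DSC = 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a reference mask B, and the "87% of the in-domain ceiling" figure matches the cross-dataset DSC divided by the in-domain BAGLS DSC (0.745 / 0.856 ≈ 0.870). The sketch below shows both; scoring two empty masks as 1.0 (relevant for closed-glottis frames) is an assumed convention, and the abstract does not say whether DSC is averaged per frame or pooled over the test set.

```python
import numpy as np


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    denom = int(pred.sum()) + int(target.sum())
    if denom == 0:
        # Both masks empty (e.g. glottis fully closed): assumed perfect agreement
        return 1.0
    return 2.0 * int(np.logical_and(pred, target).sum()) / denom


pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(round(dice(pred, target), 4))  # 0.6667

# Cross-dataset DSC as a fraction of the in-domain BAGLS DSC
cross, in_domain = 0.745, 0.856
print(f"{cross / in_domain:.1%}")  # 87.0%
```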
