Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning

Junhao Wu; Yun Li; Junhao Li; Jingliang Bian; Xiaomao Fan; Wenbin Lei; Ruxin Wang

arXiv:2508.16882·eess.IV·August 26, 2025

TL;DR

This paper introduces a multi-modality learning framework that combines White Light Imaging (WLI) and Narrow Band Imaging (NBI) to improve laryngo-pharyngeal tumor segmentation, using progressive feature disentanglement and disentangle-aware contrastive learning.

Contribution

It proposes an 'Align-Disentangle-Fusion' framework that combines multi-scale distribution alignment with progressive feature disentanglement for multimodal medical image analysis.

Findings

01. Outperforms state-of-the-art methods in tumor segmentation accuracy.

02. Effectively separates modality-specific and shared features.

03. Demonstrates robustness across multiple clinical datasets.

Abstract

Accurate segmentation of laryngo-pharyngeal tumors is crucial for precise diagnosis and effective treatment planning. However, traditional single-modality imaging methods often fall short of capturing the complex anatomical and pathological features of these tumors. In this study, we present an innovative multi-modality representation learning framework based on the 'Align-Disentangle-Fusion' mechanism that seamlessly integrates 2D White Light Imaging (WLI) and Narrow Band Imaging (NBI) pairs to enhance segmentation performance. A cornerstone of our approach is multi-scale distribution alignment, which mitigates modality discrepancies by aligning features across multiple transformer layers. Furthermore, a progressive feature disentanglement strategy is developed with the designed preliminary disentanglement and disentangle-aware contrastive learning to effectively separate…
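
To make the idea of multi-scale distribution alignment concrete, the sketch below compares WLI and NBI features layer by layer and averages the per-layer gap. The abstract does not say which distance the authors use, so simple moment matching (per-channel mean and variance) stands in here as an assumption. The function names `moment_gap` and `multi_scale_alignment_loss` are hypothetical and do not come from the paper.

```python
from statistics import fmean, pvariance


def moment_gap(feats_a, feats_b):
    """Squared gap between per-channel mean and variance of two feature sets.

    feats_a, feats_b: lists of token vectors with the same channel count.
    Moment matching is an assumed stand-in; the paper's distance is not
    given in the abstract.
    """
    dims = len(feats_a[0])
    gap = 0.0
    for d in range(dims):
        col_a = [tok[d] for tok in feats_a]
        col_b = [tok[d] for tok in feats_b]
        gap += (fmean(col_a) - fmean(col_b)) ** 2
        gap += (pvariance(col_a) - pvariance(col_b)) ** 2
    return gap / dims


def multi_scale_alignment_loss(wli_layers, nbi_layers):
    """Average the moment gap over paired layers.

    Each argument holds one feature set per transformer layer, so the
    loss penalizes modality discrepancy at every scale, not only the last.
    """
    if len(wli_layers) != len(nbi_layers):
        raise ValueError("both modalities need the same number of layers")
    gaps = [moment_gap(a, b) for a, b in zip(wli_layers, nbi_layers)]
    return fmean(gaps)


# Toy usage: two layers, two tokens per layer, two channels per token.
wli = [[[0.0, 1.0], [2.0, 3.0]], [[1.0, 1.0], [3.0, 5.0]]]
nbi = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [3.0, 5.0]]]
loss = multi_scale_alignment_loss(wli, nbi)
```

In this toy input the first layer differs by a constant shift of 1 in every channel and the second layer is identical across modalities, so only the first layer contributes to the loss. In a real training setup a term of this kind would be minimized alongside the segmentation objective.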
