VisionLLM-based Multimodal Fusion Network for Glottic Carcinoma Early Detection

Zhaohui Jin; Yi Shuai; Yongcheng Li; Lingcong Cai; Yun Li; Huifen Liu; Xiaomao Fan

arXiv:2412.18124·cs.CV·December 25, 2024

TL;DR

This paper introduces MMGC-Net, a multimodal fusion network built on a vision large language model (VisionLLM) that integrates image and text data to improve early detection of glottic carcinoma, achieving state-of-the-art accuracy on the newly collected SYSU1H dataset.

Contribution

The paper presents a novel VisionLLM-based multimodal fusion network specifically designed for glottic carcinoma detection, leveraging a new dataset and advanced feature fusion techniques.

Findings

01 MMGC-Net outperforms previous models on the SYSU1H dataset.

02 Integrating image and text modalities improves detection accuracy.

03 The VisionLLM-based approach achieves state-of-the-art results.

Abstract

The early detection of glottic carcinoma is critical for improving patient outcomes, as it enables timely intervention, preserves vocal function, and significantly reduces the risk of tumor progression and metastasis. However, the similarity in morphology between glottic carcinoma and vocal cord dysplasia results in suboptimal detection accuracy. To address this issue, we propose a vision large language model-based (VisionLLM-based) multimodal fusion network for glottic carcinoma detection, known as MMGC-Net. By integrating image and text modalities, multimodal models can capture complementary information, leading to more accurate and robust predictions. In this paper, we collect a private real glottic carcinoma dataset named SYSU1H from the First Affiliated Hospital of Sun Yat-sen University, with 5,799 image-text pairs. We leverage an image encoder and additional Q-Former to extract…
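
The abstract mentions an image encoder, a Q-Former, and the fusion of image and text modalities, but the text is cut off before the details. As a rough illustration of the general idea only, the sketch below shows query-based cross-attention over image tokens followed by a simple concatenation with a text embedding. It is not the MMGC-Net architecture: the function names, dimensions, random weights, mean pooling, concatenation-based fusion, and two-class output are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, image_tokens):
    """A small set of query vectors attends over image patch tokens.

    Single head, no learned projections: a simplified stand-in for the
    query-based attention used in Q-Former-style modules (assumption).
    """
    d = queries.shape[-1]
    scores = queries @ image_tokens.T / np.sqrt(d)  # (num_queries, num_patches)
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ image_tokens                   # (num_queries, d)

def fuse_and_classify(image_tokens, text_embedding, queries, w, b):
    """Pool the attended visual features, concatenate with text, classify."""
    visual = query_cross_attention(queries, image_tokens).mean(axis=0)  # (d,)
    fused = np.concatenate([visual, text_embedding])                    # (2d,)
    logits = fused @ w + b                                              # (num_classes,)
    return softmax(logits)

# Hypothetical sizes and random values, for illustration only.
rng = np.random.default_rng(0)
d, num_patches, num_queries, num_classes = 16, 49, 4, 2
image_tokens = rng.normal(size=(num_patches, d))   # stand-in for image encoder output
text_embedding = rng.normal(size=d)                # stand-in for a text encoder output
queries = rng.normal(size=(num_queries, d))        # would be learned in practice
w = rng.normal(size=(2 * d, num_classes))
b = np.zeros(num_classes)

probs = fuse_and_classify(image_tokens, text_embedding, queries, w, b)
print(probs.shape, float(probs.sum()))
```

In a trained model the queries, projections, and classifier weights would be learned, and the fusion step could be considerably more elaborate than concatenation; the sketch only shows how a fixed number of queries can compress a variable number of image tokens before the two modalities are combined.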

Taxonomy

Topics: Head and Neck Cancer Studies