MOSMOS: Multi-organ segmentation facilitated by medical report supervision
Weiwei Tian, Xinyu Huang, Junlin Hou, Caiyue Ren, Longquan Jiang, Rui-Wei Zhao, Gang Jin, Yuejie Zhang, Daoying Geng

TL;DR
This paper introduces MOSMOS, a novel pre-training and fine-tuning framework that leverages medical report supervision to improve multi-organ segmentation across various datasets and models.
Contribution
The paper proposes a new framework combining contrastive learning and multi-label recognition to enhance fine-grained multi-organ segmentation using report supervision.
Findings
Effective across multiple datasets and modalities
Improves segmentation accuracy with report supervision
Generalizes to different network architectures
Abstract
Owing to a large amount of multi-modal data in modern medical systems, such as medical images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated incredible achievements in coarse-grained downstream tasks (i.e., medical classification, retrieval, and visual question answering). However, the problem of transferring knowledge learned from Med-VLP to fine-grained multi-organ segmentation tasks has barely been investigated. Multi-organ segmentation is challenging mainly due to the lack of large-scale fully annotated datasets and the wide variation in the shape and size of the same organ between individuals with different diseases. In this paper, we propose a novel pre-training & fine-tuning framework for Multi-Organ Segmentation by harnessing Medical repOrt Supervision (MOSMOS). Specifically, we first introduce global contrastive learning to maximally align the…
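The abstract is cut off before it describes the global contrastive objective, so the exact loss used by MOSMOS is not shown here. As a rough illustration only, a generic symmetric InfoNCE-style loss over paired image and report embeddings might look like the sketch below. The function name `global_contrastive_loss`, the temperature of 0.07, and the use of cosine similarity are assumptions for illustration, not details confirmed by the paper.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def global_contrastive_loss(image_emb, report_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical sketch, not the paper's code).

    image_emb, report_emb: arrays of shape (batch, dim), where row i of each
    array comes from the same image-report pair.
    """
    # L2-normalize so the dot product is a cosine similarity (assumption).
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix; matched pairs lie on the diagonal.
    logits = img @ txt.T / temperature
    idx = np.arange(logits.shape[0])

    # Image-to-report direction: each row should pick its own report.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Report-to-image direction: each column should pick its own image.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(4, 8))
    # Correctly paired embeddings should score a lower loss than mismatched ones.
    print(global_contrastive_loss(emb, emb))
    print(global_contrastive_loss(emb, emb[[1, 2, 3, 0]]))
```

In a real pre-training setup the two embedding matrices would come from an image encoder and a report (text) encoder, and the loss would be minimized with an autodiff framework rather than evaluated in NumPy.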
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging
Methods: Attention Is All You Need · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Batch Normalization · 1x1 Convolution · Residual Connection · Multi-Head Attention · Max Pooling
