Compound Figure Separation of Biomedical Images: Mining Large Datasets for Self-supervised Learning

Tianyuan Yao; Chang Qu; Jun Long; Quan Liu; Ruining Deng; Yuanhan Tian; Jiachen Xu; Aadarsh Jha; Zuhayr Asad; Shunxing Bao; Mengyang Zhao; Agnes B. Fogo; Bennett A. Landman; Haichun Yang; Catie Chang; Yuankai Huo

arXiv:2208.14357 · cs.CV · August 31, 2022

TL;DR

This paper introduces SimCFS, a novel framework for separating compound biomedical images into individual images without bounding box annotations, enhancing large-scale data collection for self-supervised learning in medical imaging.

Contribution

The study presents a resource-efficient, annotation-free method for compound figure separation, enabling better utilization of large unannotated biomedical image datasets for self-supervised learning.

Findings

01 Achieved state-of-the-art performance on the ImageCLEF 2016 dataset.

02 Models pretrained on the separated images improved downstream image classification accuracy.

03 The proposed method reduces reliance on extensive bounding box annotations.

Abstract

With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of large-scale image collections (even without annotations) for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting task-specific unannotated data at scale can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) include a considerable number of compound figures with subplots. To extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework without using the traditionally required detection bounding box annotations, with…

Taxonomy

Methods: Contrastive Learning