MONICA: Benchmarking on Long-tailed Medical Image Classification

Lie Ju; Siyuan Yan; Yukun Zhou; Yang Nan; Xiaodan Xing; Peibo Duan; Zongyuan Ge

arXiv:2410.02010·eess.IV·October 4, 2024

TL;DR

MONICA is a comprehensive benchmark and codebase for evaluating long-tailed medical image classification methods across multiple datasets and domains, aiming to standardize evaluation and foster progress.

Contribution

It provides a unified, well-structured benchmark with over 30 methods and 12 datasets, enabling fair comparison and detailed analysis in long-tailed medical image classification.

Findings

1. Effective evaluation of various methods across datasets
2. Insights into components contributing to performance
3. Guidance for future research in long-tailed medical learning

Abstract

Long-tailed learning is considered to be an extremely challenging problem in data imbalance learning. It aims to train well-generalized models from a large number of images that follow a long-tailed class distribution. In the medical field, many diagnostic imaging exams such as dermoscopy and chest radiography yield a long-tailed distribution of complex clinical findings. Recently, long-tailed learning in medical image analysis has garnered significant attention. However, the field currently lacks a unified, strictly formulated, and comprehensive benchmark, which often leads to unfair comparisons and inconclusive results. To help the community improve the evaluation and advance, we build a unified, well-structured codebase called Medical OpeN-source Long-taIled ClassifiCAtion (MONICA), which implements over 30 methods developed in relevant fields and evaluated on 12 long-tailed medical…
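To make the phrase "long-tailed class distribution" concrete, the following is a minimal sketch and is not taken from the MONICA codebase. It assumes an exponential decay profile for the per-class counts and shows inverse-frequency re-weighting as one generic baseline; the function names `long_tailed_counts` and `inverse_frequency_weights` are hypothetical, and the abstract does not say which of these choices, if any, MONICA uses.

```python
def long_tailed_counts(num_classes: int, max_count: int, imbalance_ratio: float) -> list[int]:
    """Per-class sample counts that decay exponentially from head to tail.

    Assumption for illustration: class k gets max_count * imbalance_ratio ** (-k / (K - 1))
    samples, so the head class has max_count samples and the tail class has
    max_count / imbalance_ratio samples.
    """
    if num_classes < 2:
        return [max_count] * num_classes
    counts = []
    for k in range(num_classes):
        frac = k / (num_classes - 1)  # 0.0 for the head class, 1.0 for the tail class
        counts.append(max(1, round(max_count * imbalance_ratio ** (-frac))))
    return counts


def inverse_frequency_weights(counts: list[int]) -> list[float]:
    """Generic re-weighting baseline: weight each class by total / (K * count).

    Rare classes receive larger weights; a perfectly balanced dataset gives 1.0 everywhere.
    """
    total = sum(counts)
    num_classes = len(counts)
    return [total / (num_classes * c) for c in counts]


# Hypothetical example: 8 classes, 1000 images in the head class, imbalance ratio 100.
counts = long_tailed_counts(num_classes=8, max_count=1000, imbalance_ratio=100)
weights = inverse_frequency_weights(counts)
observed_ratio = counts[0] / counts[-1]  # head count divided by tail count

for k, (c, w) in enumerate(zip(counts, weights)):
    print(f"class {k}: {c:5d} samples, loss weight {w:.3f}")
print(f"imbalance ratio: {observed_ratio:.0f}")
```

The per-class weights could be passed to a weighted loss during training; whether that helps on a given medical dataset is exactly the kind of question a shared benchmark is meant to answer.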

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

- A good overview of the datasets curated for this work
- Important contribution of decoupling the codebase
- A good overview of the method approaches
- Practically useful to AI researchers in medical imaging

Weaknesses

- It would help to expand the benchmark datasets and bring in a canonical set for each field, such as Camelyon for pathology. WILDS (medical subset) is a great example of a dataset to bring into this benchmarking codebase.
- ResNet-50 is used as a backbone, but the community has generally moved on to more complex backbones such as ConvNeXt / Swin or foundation model backbones for different datasets.
- Generally the community uses pretrained backbones rather than training the backbones from the s

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The paper attempts to provide a comprehensive benchmark for long-tailed medical image classification. The idea of integrating multiple existing methods and datasets into a unified platform could potentially be useful for researchers who want to compare various methodologies under a standardized framework.

Weaknesses

1. Motivation. The paper lacks sufficient justification for evaluating long-tailed problems specifically in medical imaging tasks. While the authors mention some motivations at the beginning, these arguments are not convincing. Is there a fundamental difference between long-tailed problems in medical imaging and those in conventional tasks? Would this difference necessitate different methodologies? Even if the data modalities and evaluation methods are distinct (e.g., balanced vs. imbalanced tes

Reviewer 03 · Rating 5 · Confidence 5

Strengths

Long-tailed learning is an extremely challenging problem, and this work can serve as a comprehensive and reproducible benchmark, encouraging further advancements in long-tailed medical image learning. It covers most of the strategies that deal with long-tailed problems and also includes 12 datasets from different application domains.

Weaknesses

This work doesn't introduce any new datasets or methods. It is a collection of datasets (multi-class or multi-label) that are already publicly available, with no justification for their selection given that many other long-tailed datasets of this kind are available. Also, the authors have changed some of the original datasets; this would not be useful if they don't share the modified datasets. They only tried ResNet for the tasks; it would be nicer to make comparisons with other models. Also the discussions on SSL models seem not

Code & Models

Repositories

pyjulie/monica
PyTorch · Official

Taxonomy

Topics: AI in cancer detection · COVID-19 diagnosis using AI · Brain Tumor Detection and Classification