Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation
Ziyang Song, Zelin Zang, Zuyao Chen, Xusheng Liang, Dong Yi, Jinlin Wu, Hongbin Liu, Jiebo Luo, Zhen. Lei

TL;DR
This paper introduces Anatomy-R1, a novel approach that improves anatomical reasoning in multimodal large language models through curriculum learning based on anatomical similarity and group diversity augmentation, leading to better performance on medical imaging benchmarks.
Contribution
It proposes two innovative methods—Anatomical Similarity Curriculum Learning and Group Diversity Question Augmentation—to enhance reasoning and knowledge sharing in MLLMs for medical imaging tasks.
Findings
Significant performance improvements on SGG-VQA and OmniMedVQA benchmarks.
Enhanced reasoning capabilities in medical image understanding.
Effective knowledge sharing across anatomical structures.
Abstract
Multimodal Large Language Models (MLLMs) have achieved impressive progress in natural image reasoning, yet their potential in medical imaging remains underexplored, especially in clinical anatomical surgical images. Anatomy understanding tasks demand precise understanding and clinically coherent answers, which are difficult to achieve due to the complexity of medical data and the scarcity of high-quality expert annotations. These challenges limit the effectiveness of conventional Supervised Fine-Tuning (SFT) strategies. While recent work has demonstrated that Group Relative Policy Optimization (GRPO) can enhance reasoning in MLLMs without relying on large amounts of data, we find two weaknesses that hinder GRPO's reasoning performance in anatomy recognition: 1) knowledge cannot be effectively shared between different anatomical structures, resulting in uneven information gain and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Topic Modeling
