DiagR1: A Vision-Language Model Trained via Reinforcement Learning for Digestive Pathology Diagnosis

Minxi Ouyang; Lianghui Zhu; Yaqing Bao; Qiang Huang; Jingli Ouyang; Tian Guan; Xitong Ling; Jiawen Li; Song Duan; Wenbin Dai; Li Zheng; Xuemei Zhang; Yonghong He

arXiv:2507.18433 · eess.IV · July 25, 2025

TL;DR

This paper introduces DiagR1, a vision-language model trained with reinforcement learning that improves gastrointestinal pathology diagnosis by enhancing reasoning, reducing errors, and increasing clinical relevance through a specialized dataset and prompt strategy.

Contribution

The paper presents a large-scale gastrointestinal pathology dataset and a novel prompt argumentation strategy, combined with reinforcement learning, to improve diagnostic reasoning and output quality.

Findings

1. Outperforms existing models in clinical relevance by 18.7%
2. Achieves 32.4% better structural completeness
3. Reduces diagnostic errors by 41.2%

Abstract

Multimodal large models have shown great potential in automating pathology image analysis. However, current multimodal models for gastrointestinal pathology are constrained by both data quality and reasoning transparency: pervasive noise and incomplete annotations in public datasets predispose vision-language models to factual hallucinations when generating diagnostic text, while the absence of explicit intermediate reasoning chains makes the outputs difficult to audit and thus less trustworthy in clinical practice. To address these issues, we construct a large-scale gastrointestinal pathology dataset containing both microscopic descriptions and diagnostic conclusions, and propose a prompt argumentation strategy that incorporates lesion classification and anatomical site information. This design guides the model to better capture image-specific features and maintain semantic…
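To make the prompt strategy more concrete, here is a minimal sketch of how lesion-class and anatomical-site context might be folded into an instruction before it reaches a vision-language model. It is not the authors' implementation: the `PathologyCase` record, its field names, the `build_augmented_prompt` helper, and the template wording are all hypothetical. The only grounded part is the requested output order (microscopic description, then diagnostic conclusion), which mirrors the two annotation types the dataset is said to contain.

```python
from dataclasses import dataclass


@dataclass
class PathologyCase:
    # Hypothetical record: field names are illustrative, not from the DiagR1 dataset.
    lesion_class: str     # assumed lesion-category label, e.g. "neoplastic"
    anatomical_site: str  # assumed site label, e.g. "colon"


def build_augmented_prompt(case: PathologyCase, base_instruction: str) -> str:
    """Prepend site and lesion-class context to a base instruction (hypothetical template)."""
    # Context lines carry the extra metadata the abstract says the prompt incorporates.
    context = (
        f"Anatomical site: {case.anatomical_site}.\n"
        f"Lesion category: {case.lesion_class}.\n"
    )
    # Ask for the two output parts the dataset annotates: description, then conclusion.
    output_spec = (
        "First give the microscopic description, "
        "then state the diagnostic conclusion."
    )
    return f"{context}{base_instruction}\n{output_spec}"


if __name__ == "__main__":
    case = PathologyCase(lesion_class="neoplastic", anatomical_site="colon")
    print(build_augmented_prompt(case, "Describe this H&E-stained image."))
```

Putting the metadata ahead of the instruction is one plausible way to condition generation on site and lesion type; the paper may order, phrase, or encode this information differently.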

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare