Advancing Multimodal Medical Capabilities of Gemini

Lin Yang; Shawn Xu; Andrew Sellergren; Timo Kohlberger; Yuchen Zhou; Ira Ktena; Atilla Kiraly; Faruk Ahmed; Farhad Hormozdiari; Tiam Jaroensri; Eric Wang; Ellery Wulczyn; Fayaz Jamil; Theo Guidroz; Chuck Lau; Siyuan Qiao; Yun Liu; Akshay Goel; Kendall Park; Arnav Agharwal; Nick George; Yang Wang; Ryutaro Tanno; David G. T. Barrett; Wei-Hung Weng; S. Sara Mahdavi; Khaled Saab; Tao Tu; Sreenivasa Raju Kalidindi; Mozziyar Etemadi; Jorge Cuadros; Gregory Sorensen; Yossi Matias; Katherine Chou; Greg Corrado; Joelle Barral; Shravya Shetty; David Fleet; S. M. Ali Eslami; Daniel Tse; Shruthi Prabhakara; Cory McLean; Dave Steiner; Rory Pilgrim; Christopher Kelly; Shekoofeh Azizi; Daniel Golden

arXiv:2405.03162 · cs.CV · May 7, 2024 · 28 citations

TL;DR

Med-Gemini advances multimodal medical AI by developing specialized models for diverse clinical data, achieving state-of-the-art results in report generation, classification, and risk prediction across multiple medical domains.

Contribution

This work introduces the Med-Gemini family, fine-tuned models optimized for medical data, demonstrating significant improvements in report generation, classification, and genetic risk prediction.

Findings

01. Med-Gemini-2D exceeds the previous best results in chest X-ray report generation.

02. Med-Gemini-3D is the first large multimodal model for 3D CT report generation, with 53% of its generated reports judged clinically acceptable.

03. Med-Gemini-2D surpasses baselines on 17 of 20 tasks, including classification and visual question answering (VQA).

Abstract

Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit the core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology, and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by absolute margins of 1% and 12%, respectively; 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report…
