Advancing Multimodal Medical Capabilities of Gemini
Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou,, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri,, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao,, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal

TL;DR
Med-Gemini advances multimodal medical AI by developing specialized models for diverse clinical data, achieving state-of-the-art results in report generation, classification, and risk prediction across multiple medical domains.
Contribution
This work introduces the Med-Gemini family, fine-tuned models optimized for medical data, demonstrating significant improvements in report generation, classification, and genetic risk prediction.
Findings
Med-Gemini-2D exceeds previous best in chest X-ray report generation.
First large multimodal model for 3D CT report generation with 53% clinically acceptable reports.
Med-Gemini-2D surpasses baselines in 17 of 20 tasks including classification and VQA.
Abstract
Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗google/medgemma-1.5-4b-itmodel· 86k dl· ♡ 53686k dl♡ 536
- 🤗google/medgemma-4b-itmodel· 170k dl· ♡ 925170k dl♡ 925
- 🤗google/medsiglip-448model· 22k dl· ♡ 12922k dl♡ 129
- 🤗unsloth/medgemma-27b-it-GGUFmodel· 4.4k dl· ♡ 384.4k dl♡ 38
- 🤗google/medgemma-4b-ptmodel· 1.1k dl· ♡ 1481.1k dl♡ 148
- 🤗google/medgemma-27b-itmodel· 107k dl· ♡ 330107k dl♡ 330
- 🤗pszemraj/medgemma-4b-it-hereticmodel· 46 dl· ♡ 546 dl♡ 5
- 🤗unsloth/medgemma-1.5-4b-it-GGUFmodel· 6.7k dl· ♡ 336.7k dl♡ 33
- 🤗unsloth/medgemma-4b-itmodel· 1.1k dl· ♡ 71.1k dl♡ 7
- 🤗unsloth/medgemma-4b-it-GGUFmodel· 11k dl· ♡ 6311k dl♡ 63
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBiomedical and Engineering Education · Software Engineering Techniques and Practices
