CNS-Obsidian: A Neurosurgical Vision-Language Model Built From Scientific Publications

Anton Alyakin; Jaden Stryker; Daniel Alexander Alber; Jin Vivian Lee; Karl L. Sangwon; Brandon Duderstadt; Akshay Save; David Kurland; Spencer Frome; Shrutika Singh; Jeff Zhang; Eunice Yang; Ki Yun Park; Cordelia Orillac; Aly A. Valliani; Sean Neifert; Albert Liu; Aneek Patel; Christopher Livia; Darryl Lau; Ilya Laufer; Peter A. Rozman; Eveline Teresa Hidalgo; Howard Riina; Rui Feng; Todd Hollon; Yindalon Aphinyanaphongs; John G. Golfinos; Laura Snyder; Eric Leuthardt; Douglas Kondziolka; Eric Karl Oermann

arXiv:2502.19546 · cs.AI · November 26, 2025

TL;DR

CNS-Obsidian is a neurosurgical vision-language model (VLM) trained on peer-reviewed literature. It shows potential for clinical decision support but has performance limitations relative to GPT-4o in real-world neurosurgical consultations.

Contribution

This work introduces CNS-Obsidian, a specialized neurosurgical VLM trained on scientific publications, and describes both its development and its evaluation in a clinical setting.

Findings

01 CNS-Obsidian matches GPT-4o on synthetic questions.

02 CNS-Obsidian achieves lower accuracy than GPT-4o on human-generated questions.

03 Both models include the correct diagnosis in about 60% of cases.

Abstract

General-purpose VLMs demonstrate impressive capabilities, but their opaque training on uncurated internet data poses critical limitations for high-stakes decision-making, such as in neurosurgery. We present CNS-Obsidian, a neurosurgical VLM trained on peer-reviewed literature, and demonstrate its clinical utility versus GPT-4o in a real-world setting. We compiled 23,984 articles from Neurosurgery Publications journals, yielding 78,853 figures and captions. Using GPT-4o and Claude Sonnet-3.5, we converted these into 263,064 training samples across three formats: instruction fine-tuning, multiple-choice questions, and differential diagnosis. We trained CNS-Obsidian, a fine-tune of the 34-billion parameter LLaVA-Next model. In a blinded, randomized trial at NYU Langone Health (Aug 30-Nov 30, 2024), neurosurgery consultations were assigned to either CNS-Obsidian or a HIPAA-compliant GPT-4o…
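To give a sense of scale for the dataset described in the abstract, the short sketch below derives two averages from the quoted counts. The variable names are illustrative, and the ratios are simple descriptive averages; the abstract does not say how figures are distributed across articles or how samples are split among the three formats.

```python
# Counts quoted in the abstract.
articles = 23_984
figures = 78_853            # figures with captions
training_samples = 263_064  # across all three formats

# The three sample formats named in the abstract.
formats = [
    "instruction fine-tuning",
    "multiple-choice questions",
    "differential diagnosis",
]

# Descriptive averages only; the per-format breakdown is not given
# in the abstract, so none is assumed here.
figures_per_article = figures / articles
samples_per_figure = training_samples / figures

print(f"{figures_per_article:.2f} figures per article on average")
print(f"{samples_per_figure:.2f} training samples per figure on average")
```

On average, each article contributes a little over three figures, and each figure yields a little over three training samples.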

Code & Models

Models

🤗 NYU-OLAB/LLaVA-Next-Med-OLAB