# Preparing for Vascular Surgery Board Certification: A Comparative Study Using Large Language Models

**Authors:** Sonal Kumar, George Y Tadros, Taylor E Collignon, Otto Montero, Sophia Bampoh, Morris Sasson, Alberto Lopez

PMC · DOI: 10.7759/cureus.83848 · 2025-05-10

## TL;DR

This study compares how accurately four AI tools (ChatGPT 3.5, Google Gemini, Microsoft Bing, and Claude 3.5) answer vascular surgery board-preparation questions, finding that Claude 3.5 performs best with 65.7% correct responses.

## Contribution

The study evaluates and compares the effectiveness of large language models in vascular surgery board exam preparation.

## Key findings

- Claude 3.5 achieved the highest accuracy (65.7%) in answering vascular surgery questions.
- Claude 3.5 was the only model whose accuracy varied significantly by discipline (p=0.001), with its highest scores in lower extremity (86%), dialysis access (80%), and cerebrovascular (77%) questions.
- Current LLMs do not fully meet the evolving needs of vascular surgery education.

## Abstract

Introduction and aim

Large language models (LLMs) are transforming medical education by offering innovative methods to enhance teaching and learning. Despite their demonstrated potential, research on their use in vascular surgery is limited. This study aimed to evaluate and compare the effectiveness of LLMs in preparing for vascular surgery board certification exams, exploring their potential as educational supplements.

Methods

We selected 269 text-only multiple-choice questions of 642 from the Vascular Education and Self-Assessment Program (VESAP) version 6. We excluded 143 image-based questions. One independent reviewer input questions into the following four AI tools: ChatGPT 3.5 (San Francisco, CA: OpenAI), Google Gemini (London, UK: Google DeepMind), Microsoft Bing (Redmond, WA: Microsoft), and Claude 3.5 (San Francisco, CA: Anthropic Inc.). Each question with answer choices was entered into an incognito window of the AI tools without any context. A chi-square test was used to assess if the percentage of correct answers varied by question difficulty and discipline, with a significance level of p<0.05. Data analysis was conducted using Stata 18.5 (StataCorp LLC: College Station, TX).
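The chi-square test named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' Stata analysis: the function names (`chi2_sf`, `chi2_independence`) are invented for this example, and the per-discipline counts are hypothetical placeholders, because the abstract does not report the underlying counts. The sketch computes a Pearson chi-square statistic for a discipline-by-outcome table and compares the resulting p-value with the 0.05 threshold.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square distribution (integer dof >= 1)."""
    z = x / 2.0
    # Closed-form starting point: Q(1, z) for even dof, Q(1/2, z) for odd dof.
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < dof / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_independence(table):
    """Pearson chi-square test of independence for a rows-by-columns count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# HYPOTHETICAL counts for one model: [correct, incorrect] per discipline.
# These numbers are illustrative only and are not taken from the study.
counts_by_discipline = {
    "discipline A": [30, 10],
    "discipline B": [22, 18],
    "discipline C": [15, 25],
}

stat, dof, p_value = chi2_independence(list(counts_by_discipline.values()))
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p_value:.4f}")
print("accuracy varies by discipline" if p_value < 0.05 else "no significant variation")
```

Running one such test per model, with disciplines as rows and correct/incorrect as columns, mirrors the structure of the per-model p-values reported in the Results. The p-value here is computed from the standard library alone so the sketch has no third-party dependencies.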

Results

Claude 3.5 achieved the highest overall accuracy with 65.7% correct responses, outperforming Google Gemini (55.3%), ChatGPT (55.0%), and Microsoft Bing (53.9%). While ChatGPT, Google Gemini, and Microsoft Bing did not show significant accuracy variations by discipline (p=0.548, p=0.145, and p=0.797, respectively), Claude 3.5 demonstrated significant performance differences across disciplines (p=0.001), scoring highest in lower extremity (86%), dialysis access (80%), cerebrovascular (77%), venous lymph (70%), and vascular medicine (68.9%).

Conclusion

Claude 3.5 outperformed other LLMs in solving Vascular Surgery Qualifying Examination version 6 (VSQE6) questions and shows promise as a supplementary tool in vascular surgery education. LLMs are well-versed in the topics of lower extremity vascular issues, dialysis access, and cerebrovascular conditions. However, current LLM capabilities do not fully meet the evolving needs of vascular surgery education. While traditional methods remain essential for vascular surgery, newer LLM versions may provide more substantial benefits in the future.

## Full-text entities

- **Diseases:** conditions (MESH:D020763), LLMs (MESH:D007806), disorders of peripheral blood vessels (MESH:D009383), VSQE (MESH:D000267), aortoiliac disease (MESH:D004194), cerebrovascular disease (MESH:D002561)
- **Species:** Homo sapiens (human, species) [taxon 9606]

---
Source: https://tomesphere.com/paper/PMC12148048