# Integrating GPT-4o Into Data Mining in Neurosurgery: Feasibility and Proof-of-Concept Study

**Authors:** Arthur Henrique Almeida Sales, Jürgen Beck, Jürgen Grauvogel

PMC · DOI: 10.2196/77114 · 2026-03-09

## TL;DR

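In a 10-patient proof-of-concept study, GPT-4o accurately extracted structured data from German-language neurosurgical reports on vestibular schwannoma, especially for simple variables, but needed prompt refinement for more complex information such as intraoperative complications and new postoperative deficits.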

## Contribution

The study provides a proof-of-concept evaluation of GPT-4o for extracting predefined structured variables from unstructured neurosurgical documentation, including the effect of targeted prompt refinement.

## Key findings

- GPT-4o achieved 100% accuracy for structured variables like patient ID and surgery date.
- Targeted prompt refinement raised accuracy for intraoperative complications and new postoperative deficits from 50% (5/10) with the zero-shot prompt to 90% to 100% in most cases.
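- Mean accuracy was highest for structured categorical variables (97.5%), intermediate for binary variables (80%), and lowest for conditional text variables (66.7%), but the differences were not statistically significant (P=.25).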

## Abstract

Large language models offer new possibilities for transforming unstructured clinical text into structured datasets. However, their performance in specialized and complex documentation environments, such as neurosurgery, remains insufficiently characterized. GPT-4o is a large language model with enhanced natural language capabilities, but its accuracy in extracting structured data from neurosurgical reports has not been systematically assessed.

This proof-of-concept study evaluated the feasibility and accuracy of GPT-4o for extracting predefined structured variables from unstructured neurosurgical reports of patients with vestibular schwannoma. Specific aims were to measure accuracy across variable types, assess the impact of prompt refinement, and explore the model’s potential utility for research-oriented data mining.

In this retrospective single-center study, 10 consecutive patients with histologically confirmed vestibular schwannoma who underwent surgery between August and December 2023 were included. Four anonymized German-language documents per patient (discharge, surgical, histopathology, and 3-month follow-up reports) were processed using GPT-4o. Seventeen variables were extracted using a standardized zero-shot prompt. Targeted prompt refinements were subsequently applied for variables with low baseline accuracy. Two board-certified neurosurgeons independently validated all outputs, with discrepancies resolved by a senior neurosurgeon. Accuracy metrics, 95% CIs (Wilson method), and descriptive comparisons between variable types were calculated.

GPT-4o achieved 100% accuracy for structured variables requiring minimal interpretation, including patient ID, date of birth, date of surgery, histopathological diagnosis, and World Health Organization grade. Several interpretative variables, such as symptoms at presentation, symptom type, symptom duration, extent of resection, and permanence of postoperative deficits, were also extracted with 100% accuracy. In contrast, intraoperative complications and new postoperative deficits were correctly identified in only 50% (5/10) of cases using the zero-shot prompt. After targeted prompt refinement, accuracy for these variables improved substantially, reaching 90% to 100% in most cases. The mean accuracy was highest for structured categorical variables (97.5%, SD 4.6%), intermediate for binary variables (80%, SD 27.4%), and lowest for conditional text variables (66.7%, SD 28.9%), without statistically significant differences (P=.25).
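The methods state that 95% CIs were calculated with the Wilson method. As an illustration of what that implies for a sample of 10 patients, the following minimal sketch computes a Wilson score interval for a proportion. The function name `wilson_interval` and its parameters are assumptions made for this example and are not taken from the paper; the counts 5/10 and 10/10 are the ones reported above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")

    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n

    # Standard Wilson score formula: shrunken center plus/minus half-width.
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Counts reported in the abstract: 5/10 (zero-shot) and 10/10.
    for correct, total in [(5, 10), (10, 10)]:
        low, high = wilson_interval(correct, total)
        print(f"{correct}/{total}: {low:.1%} to {high:.1%}")
```

Under these assumptions, 5/10 gives an interval of roughly 24% to 76%, and 10/10 gives a lower bound of roughly 72%. These values are computed by the sketch, not quoted from the paper, and they show how wide the uncertainty remains with only 10 cases per variable.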

GPT-4o demonstrated strong feasibility for structured data extraction from standardized neurosurgical reports, particularly for variables with limited semantic complexity. However, the high accuracy observed reflects a narrow and highly controlled context and should not be interpreted as evidence of general reliability across diverse clinical settings. Larger, multi-institutional, and multilingual studies are needed to determine broader applicability and potential clinical integration.

## Linked entities

- **Diseases:** vestibular schwannoma (MONDO:0001569)

## Full-text entities

- **Diseases:** vestibular schwannoma (MESH:D009464), postoperative (MESH:D019106)
- **Chemicals:** GPT-4o (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978928/full.md

---
Source: https://tomesphere.com/paper/PMC12978928