# Extracting structured data from unstructured breast imaging reports with transformer-based models

**Authors:** Mikel Carrilero-Mardones, Jorge Pérez-Martín, Francisco Javier Díez, Iñigo Bermejo Delgado

PMC · DOI: 10.3389/fdgth.2025.1718330 · 2026-01-09

## TL;DR

This paper compares five transformer-based models for converting unstructured breast imaging reports into structured data. BioGPT performed best on classification and was close to the top models on extractive question answering.

## Contribution

The study applies generative models such as BioGPT to multi-task extraction from medical reports, combining classification and extractive question answering in a single model, which BERT-like models cannot do.

## Key findings

- BioGPT outperformed BERT-based models in classification tasks, reaching 96.10% accuracy and a 90.30% macro F1 score (p=0.012 and p=0.017, respectively).
- BioGPT could perform classification and extractive question answering simultaneously, a capability unavailable in BERT-like models.
- Generative models show potential for efficient clinical data curation and integration into research workflows.

## Abstract

Structured clinical data is essential for research and informed decision-making, yet medical reports are frequently stored as unstructured free text. This study compared the performance of BERT-based and generative language models in converting unstructured breast imaging reports into structured, tabular data suitable for clinical and research applications.

A dataset of 286 anonymised breast imaging reports in Spanish was translated into English and used to evaluate five transformer-based models pre-trained on medical data: BlueBERT, BioBERT, BioMedBERT, BioGPT and ClinicalT5. Two natural language processing approaches were explored: classification of 19 categorical variables (e.g. diagnostic technique, report type, family history, BI-RADS category, tumour shape and margin) and extractive question answering of four entities (patient age, patient history, parenchymal distortion or asymmetries, and tumour size). Multiple fine-tuning strategies and input configurations were tested for each model, and performance was evaluated using accuracy and macro F1 scores.
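The two metrics named above can diverge sharply when a category is rare, which is why both are usually reported together. The following is a minimal sketch using the standard definitions of accuracy and macro F1; the function names and the toy labels (`"oval"`, `"irregular"`) are invented for illustration and are not taken from the paper's data or code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: nine common cases and one rare case.
y_true = ["oval"] * 9 + ["irregular"]
# A model that always predicts the majority class.
y_pred = ["oval"] * 10

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the majority-class predictor keeps a high accuracy while its macro F1 drops, because the missed rare class contributes an F1 of zero to the average.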

BioGPT demonstrated the best performance in classification tasks, achieving an overall accuracy of 96.10% and a macro F1 score of 90.30%. This was significantly better than BERT-based models (p=0.012 for accuracy and p=0.017 for F1), particularly in underrepresented categories such as tumour descriptors. In extractive question answering tasks, BioGPT achieved an average accuracy of 93.24%, which is slightly lower than that of BioMedBERT and ClinicalT5, but not significantly so. Notably, BioGPT could perform classification and extractive question answering simultaneously, which is a capability unavailable in BERT-like models.

Generative models, particularly BioGPT, offer a robust and scalable approach to automating the extraction of structured information from unstructured breast imaging reports. Their superior performance, combined with their ability to handle multiple tasks concurrently, highlights their potential to reduce the manual effort required for clinical data curation and to enable the efficient integration of imaging data into research and clinical workflows.

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Diseases:** tumour (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

The complete paper includes one figure with a caption: https://tomesphere.com/paper/PMC12827707/full.md

---
Source: https://tomesphere.com/paper/PMC12827707