Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs

Vinicius Anjos de Almeida; Sandro Saorin da Silva; Josimar Chire; Leonardo Vicenzi; Nícolas Henrique Borges; Helena Kociolek; Sarah Miriã de Castro Rocha; Frederico Nassif Gomes; Júlia Cristina Ferreira; Oge Marques; Lucas Emanuel Silva e Oliveira

arXiv:2603.26510·cs.CL·March 30, 2026

TL;DR

This study benchmarks BERT-based models and LLMs for Portuguese clinical NER. mmBERT-base achieves the best result (micro F1 = 0.76), and iterative stratification improves class balance under multilabel imbalance.

Contribution

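The paper benchmarks modern BERT models (BioBERTpt, BERTimbau, ModernBERT, and mmBERT) against LLMs such as GPT-5 and Gemini-2.5 for Portuguese clinical NER, and tests three strategies for multilabel class imbalance: iterative stratification, weighted loss, and oversampling.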

Findings

01

mmBERT-base achieved the highest micro F1 of 0.76.

02

Iterative stratification improved class balance and model performance.

03

Multilingual BERT models perform well for Portuguese clinical NER.
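
The micro F1 cited in finding 01 pools true positives, false positives, and false negatives across all entity classes before computing the score, so frequent classes weigh more than rare ones. The following is a minimal sketch of that computation, assuming each token carries a set of labels (a multilabel setup). The function name, the label names, and the toy data are hypothetical, and whether the paper scores at token or entity level is not stated here.

```python
from typing import Iterable, Set, Tuple


def micro_prf(
    gold: Iterable[Set[str]], pred: Iterable[Set[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over multilabel items.

    Counts are pooled over every item and every label before the
    ratios are taken, which is what makes the average "micro".
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in gold
        fp += len(p - g)   # labels predicted but absent from gold
        fn += len(g - p)   # gold labels the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: one label set per token.
gold = [{"Disorder"}, {"Procedure", "Finding"}, set(), {"Disorder"}]
pred = [{"Disorder"}, {"Procedure"}, {"Finding"}, set()]

precision, recall, f1 = micro_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # prints 0.667 0.5 0.571
```

In this toy case the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, giving precision 2/3, recall 1/2, and F1 = 4/7.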

Abstract

Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and…
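
Iterative stratification, one of the imbalance strategies named in the abstract, splits a multilabel dataset so that each fold receives roughly its share of every label, handling the rarest labels first. Below is a simplified sketch of the general technique, not the authors' implementation: the function name and toy data are hypothetical, and ties are broken deterministically by fold index rather than at random.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def iterative_stratification(
    labels: Sequence[Set[str]], ratios: Sequence[float]
) -> List[List[int]]:
    """Assign sample indices to folds, balancing each label across folds."""
    n = len(labels)
    folds: List[List[int]] = [[] for _ in ratios]
    want_total = [n * r for r in ratios]  # samples still wanted per fold

    # Index of unassigned samples per label.
    label_to_idx: Dict[str, Set[int]] = defaultdict(set)
    for i, labs in enumerate(labels):
        for lab in labs:
            label_to_idx[lab].add(i)

    # Examples of each label still wanted per fold.
    want_label = {
        lab: [len(idx) * r for r in ratios] for lab, idx in label_to_idx.items()
    }
    remaining = set(range(n))

    def assign(i: int, k: int) -> None:
        folds[k].append(i)
        remaining.discard(i)
        want_total[k] -= 1
        for lab in labels[i]:
            want_label[lab][k] -= 1
            label_to_idx[lab].discard(i)

    while True:
        active = {lab: idx for lab, idx in label_to_idx.items() if idx}
        if not active:
            break
        # Process the label with the fewest unassigned examples first.
        rare = min(active, key=lambda lab: (len(active[lab]), lab))
        for i in sorted(active[rare]):
            # Pick the fold that most needs this label, then the emptiest fold.
            k = max(
                range(len(ratios)),
                key=lambda k: (want_label[rare][k], want_total[k]),
            )
            assign(i, k)

    # Samples without any label are spread by remaining fold capacity.
    for i in sorted(remaining):
        k = max(range(len(ratios)), key=lambda k: want_total[k])
        assign(i, k)
    return folds


# Hypothetical toy data: label "B" is rare and always co-occurs with "A".
toy = [{"A"}, {"A"}, {"A"}, {"A"}, {"A", "B"}, {"A", "B"}, set(), set()]
folds = iterative_stratification(toy, [0.5, 0.5])
b_per_fold = [sum("B" in toy[i] for i in fold) for fold in folds]
```

Handling the rarest label first matters because its few examples are the easiest to lose to a single fold; in the toy data each fold ends up with exactly one of the two "B" samples.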
