Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines

Hugo Abonizio; Filipe Rocha Lopes; Roberto Lotufo; Rodrigo Nogueira

arXiv:2605.01077·cs.CL·May 5, 2026

TL;DR

The paper adapts Qwen2.5-14B-Instruct to Brazil's official clinical guidelines through continual pre-training on synthetic data followed by GRPO, introduces two benchmarks (HealthBench-BR and PCDT-QA), reports 83.9% and 85.4% on them respectively, and releases resources for Brazilian medical NLP research.

Contribution

The paper contributes a domain-adaptation pipeline (synthetic data generated from 178 official guidelines, continual pre-training, then GRPO), two new benchmarks, and datasets for Brazilian clinical NLP, and it reports improved performance over existing models.

Findings

01. Achieved 83.9% on HealthBench-BR
02. Achieved 85.4% on PCDT-QA
03. Generator diversity and reinforcement learning are crucial
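As a rough illustration of what the HealthBench-BR figure measures: the abstract describes the benchmark as balanced true/false clinical assertions, so the score can be read as plain accuracy against gold labels, with 50% as the chance level for a balanced set. The sketch below assumes model verdicts have already been parsed into booleans; the function name and the toy data are hypothetical, and the paper's actual evaluation harness is not shown on this page.

```python
def true_false_accuracy(predictions, gold):
    """Fraction of true/false verdicts that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Toy, made-up labels (not taken from HealthBench-BR): balanced gold set.
gold = [True, False, True, False]
predictions = [True, False, False, False]

accuracy = true_false_accuracy(predictions, gold)
print(f"accuracy = {accuracy:.1%}")
```

PCDT-QA is open-ended and scored by an LLM judge, so this exact-match style of scoring would not apply to it.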

Abstract

Brazil's Unified Health System (SUS) relies on official clinical guidelines that define diagnostic criteria, treatments, dosages, and monitoring procedures for over 200 million citizens. Yet current LLMs perform poorly on this guideline-specific knowledge, and no benchmark evaluates clinical recall grounded in Brazilian Portuguese protocols. We address this gap by adapting Qwen2.5-14B-Instruct to the Brazilian clinical domain. From 178 official guidelines (~5.4M tokens), we generate ~70M tokens of synthetic data in three formats (rephrases, wiki-style articles, and question-answer pairs) using four generator LLMs. We then apply continual pre-training followed by Group Relative Policy Optimization (GRPO). We introduce HealthBench-BR, with 1,780 balanced true/false clinical assertions, and PCDT-QA, with 890 open-ended clinical questions scored by an LLM judge. Our best model achieves…
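For readers unfamiliar with GRPO, its core idea is to sample several answers per prompt and score each one relative to its own group rather than against a learned value function. The sketch below shows only that group-relative advantage step, under stated assumptions: the rewards are made-up 0/1 values, the population standard deviation and the `eps` constant are illustrative choices, and nothing here reflects the authors' actual reward design or training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward by the mean and std of its sampling group.

    `rewards` holds one scalar per sampled answer to the same prompt.
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one clinical question:
# 1.0 = judged correct, 0.0 = judged incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every answer gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Answers that beat their group average get a positive advantage and are reinforced; those below it get a negative one. When every sample in a group receives the same reward, all advantages are zero and that prompt contributes no gradient.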

Code & Models

Repositories

hugoabonizio/clinical-protocols-br (GitHub)
