# Small language models applied in text summarization task of health-related news to improve public health audit: an experimental case study

**Authors:** Alysson Guimarães, Methanias Colaço Junior, Samuel Santana De Almeida, Gabriely Garcia Ferreira de Araújo, Raphael Silva Fontes, Helder Prado, Luca Pareja Credidio Freire Alves, Natan Matos, Ricardo Alexsandro de Medeiros Valentim, João Paulo Queiroz dos Santos

PMC · DOI: 10.3389/frai.2026.1708993 · Frontiers in Artificial Intelligence · 2026-02-05

## TL;DR

This controlled experiment evaluates small language models (SLMs) for summarizing health-related news in support of public health audits. The top-performing models preserved contextual meaning and essential information more effectively than human-written summaries.

## Contribution

The study evaluates small language models on audit-oriented summarization of health-related news, comparing machine-generated summaries with human-written ones, and reports that the best models preserve essential information more effectively than the human summaries.

## Key findings

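- The three best models, NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct, preserved contextual meaning and essential information more effectively than human-written summaries.
- The top-performing models showed consistent results across the 35 experimental runs and across ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore.
- The results suggest that SLMs can reduce information overload in the analytical phase of audits and help teams prepare faster for the operational stage.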
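
One way to quantify the run-to-run consistency noted in the findings above is to report the mean, sample standard deviation, and coefficient of variation of a metric over repeated runs. The sketch below is only an illustration under that assumption: the function name `run_consistency`, the model labels, and all scores are hypothetical, and the paper's actual consistency analysis may differ.

```python
from statistics import mean, stdev


def run_consistency(scores):
    """Summarize one model's scores on one metric across repeated runs."""
    if not scores:
        raise ValueError("at least one run score is required")
    avg = mean(scores)
    # Sample standard deviation needs at least two runs.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    # Coefficient of variation: spread relative to the mean (lower = steadier).
    cv = spread / avg if avg else 0.0
    return {"mean": avg, "stdev": spread, "cv": cv}


# Hypothetical scores for illustration only; these are NOT results from the paper.
example_runs = {
    "model_a": [0.41, 0.42, 0.40, 0.41],
    "model_b": [0.35, 0.48, 0.30, 0.45],
}

for name, scores in example_runs.items():
    stats = run_consistency(scores)
    print(f"{name}: mean={stats['mean']:.3f} "
          f"stdev={stats['stdev']:.3f} cv={stats['cv']:.3f}")
```

A lower coefficient of variation indicates that a model's score on a given metric changes little from run to run, which is the property a consistency check is after.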

## Abstract

Fraud and corruption are among the main crimes affecting public institutions, with the healthcare sector being particularly vulnerable due to its structural complexity, the coexistence of public and private providers, the large number of actors involved, the globalized nature of supply chains, the high financial costs, and the information asymmetry among stakeholders. These factors weaken healthcare systems, resulting in resource waste, reduced resilience during medical emergencies, and limited access to essential services.

This study aims to evaluate automatic text summarization methods by comparing the quality of machine-generated summaries with those produced by humans, from the perspective of Data Scientists and SUS Auditors, within the context of audits carried out by the National Department of Unified Health System (Sistema Único de Saúde—SUS) Auditing (AudSUS).

A controlled experiment was conducted to assess the performance of Small Language Models (SLMs) in summarization tasks, using the metrics ROUGE-N, ROUGE-L, BLEU, METEOR, and BERTScore. In addition, the consistency of results across 35 runs, their contribution to reducing information overload, and their pairwise performances were evaluated.
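
To make the overlap-based metrics named above concrete, the following is a minimal pure-Python sketch of ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence). It assumes simple lowercase whitespace tokenization and a single reference summary; the function names and the example sentences are illustrative, not taken from the paper, and published evaluations typically rely on an established library with stemming and other normalization.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(match, cand_total, ref_total):
    """Turn a match count into (precision, recall, F1)."""
    precision = match / cand_total if cand_total else 0.0
    recall = match / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if match else 0.0
    return precision, recall, f1


def rouge_n(reference, candidate, n=1):
    """ROUGE-N from clipped n-gram overlap between candidate and reference."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref & cand).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L from the longest common subsequence of the two token lists."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    return prf(lcs_length(ref, cand), len(cand), len(ref))


# Illustrative sentences only; not data from the study.
reference = "the audit found irregular payments in the hospital"
candidate = "the audit found irregular hospital payments"
print("ROUGE-1:", rouge_n(reference, candidate, 1))
print("ROUGE-2:", rouge_n(reference, candidate, 2))
print("ROUGE-L:", rouge_l(reference, candidate))
```

ROUGE-N rewards shared words or phrases regardless of position, while ROUGE-L also takes word order into account. Neither captures paraphrase, which is one reason embedding-based metrics such as BERTScore are often reported alongside them.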

The models NousResearch/Hermes-3-Llama-3.2-3B, Qwen/Qwen2.5-7B-Instruct, and meta-llama/Llama-3.2-3B-Instruct achieved the highest average performances across all metrics, standing out for their ability to preserve contextual meaning and synthesize essential information more effectively than human-generated summaries.

The findings highlight the potential of SLMs as tools to reduce information overload, thereby enhancing the effectiveness of the analytical phase of audits and enabling faster preparation of teams for the operational stage.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12916652/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12916652/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12916652/full.md

---
Source: https://tomesphere.com/paper/PMC12916652