Plain language adaptations of biomedical text using LLMs: Comparison of evaluation metrics

Primoz Kocbek; Leon Kopitar; Gregor Stiglic

arXiv:2512.16530·cs.CL·December 19, 2025

Open Access

TL;DR

This study compares different LLM-based methods for simplifying biomedical texts to improve health literacy, evaluating their performance with various quantitative and qualitative metrics.

Contribution

The paper develops and compares prompt-based, two-agent, and fine-tuning approaches to biomedical text simplification with LLMs, and finds gpt-4o-mini to be the strongest performer.

Findings

01 gpt-4o-mini outperformed other models

02 Fine-tuning approaches underperformed

03 G-Eval metric aligned well with qualitative assessments

Abstract

This study investigated the application of Large Language Models (LLMs) for simplifying biomedical texts to enhance health literacy. Using a public dataset that included plain language adaptations of biomedical abstracts, we developed and evaluated several approaches: a baseline approach using a prompt template, a two-AI-agent approach, and a fine-tuning (FT) approach. We selected the OpenAI gpt-4o and gpt-4o-mini models as baselines for further research. We evaluated our approaches with quantitative metrics (Flesch-Kincaid grade level, SMOG Index, SARI, BERTScore, and G-Eval) as well as with qualitative metrics, specifically 5-point Likert scales for simplicity, accuracy, completeness, and brevity. Results showed superior performance of gpt-4o-mini and underperformance of the FT approaches. G-Eval, an LLM-based quantitative metric, showed promising results, ranking the…
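Two of the readability metrics named in the abstract, Flesch-Kincaid grade level and the SMOG Index, are closed-form formulas over sentence, word, and syllable counts. The sketch below is illustrative only and is not the authors' implementation: the abstract does not say which tooling was used, and the helper names (`count_syllables`, `split_text`) and the vowel-group syllable heuristic are assumptions made here for the example. Dedicated readability libraries use more careful tokenization and syllable counting, so their scores will differ somewhat.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption for this sketch): count vowel runs
    and drop one for a silent trailing 'e' (but not for '-le' endings)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def split_text(text: str):
    """Naive sentence and word splitting on punctuation and letters."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    return sentences, words


def flesch_kincaid_grade(text: str) -> float:
    """FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences, words = split_text(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def smog_index(text: str) -> float:
    """SMOG = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    where polysyllables are words with three or more syllables."""
    sentences, words = split_text(text)
    if not sentences:
        raise ValueError("text must contain at least one sentence")
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    original = "Pharmacological intervention significantly ameliorated cardiovascular complications."
    adapted = "The drug helped the heart."
    for label, text in (("original", original), ("adapted", adapted)):
        print(label, round(flesch_kincaid_grade(text), 2), round(smog_index(text), 2))
```

Lower values on both scales indicate text that should be readable at a lower school grade, so a successful plain language adaptation is expected to score lower than its source. Neither formula checks whether the meaning was preserved, which is the gap that metrics such as SARI, BERTScore, G-Eval, and the Likert-scale ratings are meant to cover.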

Taxonomy

Topics: Text Readability and Simplification · Health Literacy and Information Accessibility · Artificial Intelligence in Healthcare and Education