Making Knowledge Accessible: Divergent Readability-Accuracy Strategies of Mistral and QWen in Biomedical Text Simplification
P. Bilha Githinji, Aikaterini Melliou, Zeming Liang, Lian Zhang, Peiwu Qin

TL;DR
This study compares two large language models, Mistral and QWen, in biomedical text simplification, analyzing their strategies and performance against human benchmarks.
Contribution
It provides an empirical analysis of how instruction-tuned LLMs differ in simplifying biomedical text, highlighting operational strategies and metric redundancies.
Findings
Mistral improves readability while maintaining discourse fidelity (BERTScore 0.91).
QWen achieves readability gains but shows a trade-off with accuracy (BERTScore 0.89).
Strong redundancies found among 21 evaluation metrics.
Abstract
The growing public demand for accessible biomedical information calls for scalable text simplification. While large language models (LLMs) offer solutions, they too struggle with balancing improved readability against preservation of meaning. This report empirically compares how two LLMs - instruction-tuned Mistral-Small 3 24B and the reasoning-augmented QWen2.5 32B- navigate this trade-off in biomedical text simplification, benchmarked against human performance. Our analysis highlights how each model applies distinct operational strategies when simplifying biomedical text. Mistral exhibits a tempered lexical simplification approach that consistently enhances readability across multiple metrics while preserving discourse fidelity (BERTScore: 0.91, statistically comparable to that of humans). In comparison, QWen also attains enhanced readability performance and a reasonable BERTScore of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
