Benchmarking GPT-5 for biomedical natural language processing

Yu Hou; Zaifu Zhan; Min Zeng; Yifan Wu; Shuang Zhou; Rui Zhang

arXiv:2509.04462 · cs.CL · October 24, 2025 · 2 cites

Open Access

TL;DR

This study benchmarks GPT-5 and GPT-4o on diverse biomedical NLP tasks, demonstrating GPT-5's superior performance, efficiency, and potential for deployment in complex biomedical applications.

Contribution

It extends a comprehensive benchmark to evaluate GPT-5 across multiple biomedical NLP tasks, highlighting its improved performance and cost-efficiency over GPT-4o.

Findings

01

GPT-5 outperforms GPT-4o on reasoning-intensive datasets

02

GPT-5 achieves better chemical NER and relation extraction scores

03

GPT-5 offers lower effective cost per correct prediction despite longer outputs
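
The third finding depends on how cost is normalized. The page does not give the exact formula, so the following is only a minimal sketch, assuming that "effective cost per correct prediction" means total token cost divided by the number of correct predictions. The function name, the model labels, and every number are hypothetical placeholders, not figures from the paper or from any price list.

```python
def cost_per_correct(n_correct, input_tokens, output_tokens,
                     price_in_per_m, price_out_per_m):
    """Total token cost divided by the number of correct predictions.

    Prices are expressed per one million tokens. This definition is an
    assumption made for illustration, not the paper's stated formula.
    """
    if n_correct <= 0:
        raise ValueError("n_correct must be positive")
    total_cost = (input_tokens / 1e6) * price_in_per_m \
               + (output_tokens / 1e6) * price_out_per_m
    return total_cost / n_correct


# Hypothetical placeholder values: model_b writes four times as many output
# tokens as model_a but is priced lower per token and answers more items
# correctly.
model_a = cost_per_correct(n_correct=600, input_tokens=500_000,
                           output_tokens=100_000,
                           price_in_per_m=2.0, price_out_per_m=8.0)
model_b = cost_per_correct(n_correct=800, input_tokens=500_000,
                           output_tokens=400_000,
                           price_in_per_m=1.0, price_out_per_m=4.0)

print(f"model_a: {model_a:.6f} per correct prediction")
print(f"model_b: {model_b:.6f} per correct prediction")
```

With these placeholder inputs, the model with longer outputs still comes out cheaper per correct answer, because both its lower unit price and its higher accuracy enter the ratio.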

Abstract

Biomedical literature and clinical narratives pose multifaceted challenges for natural language understanding, from precise entity extraction and document synthesis to multi-step diagnostic reasoning. This study extends a unified benchmark to evaluate GPT-5 and GPT-4o under zero-, one-, and five-shot prompting across five core biomedical NLP tasks (named entity recognition, relation extraction, multi-label document classification, summarization, and simplification) and nine expanded biomedical QA datasets covering factual knowledge, clinical reasoning, and multimodal visual understanding. Using standardized prompts, fixed decoding parameters, and consistent inference pipelines, we assessed model performance, latency, and token-normalized cost under official pricing. GPT-5 consistently outperformed GPT-4o, with the largest gains on reasoning-intensive datasets such as MedXpertQA and…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare