Capabilities of GPT-5 across critical domains: Is it the next breakthrough?
Georgios P. Georgiou

TL;DR
This study systematically compares GPT-4 and GPT-5 across five domains using expert human ratings. GPT-5 performs better in lesson planning, clinical diagnosis, research generation, and ethical reasoning, while the two models perform similarly in assignment evaluation, indicating GPT-5's potential as a domain-specific AI tool.
Contribution
One of the first systematic empirical comparisons of GPT-4 and GPT-5 across diverse practical domains using expert human ratings, demonstrating GPT-5's enhanced capabilities.
Findings
GPT-5 outperforms GPT-4 in lesson planning and clinical diagnosis.
GPT-5 shows improved research generation and ethical reasoning.
Both models perform similarly in assignment evaluation.
Abstract
The accelerated evolution of large language models has raised questions about their comparative performance across domains of practical importance. GPT-4 by OpenAI introduced advances in reasoning, multimodality, and task generalization, establishing itself as a valuable tool in education, clinical diagnosis, and academic writing, though it was accompanied by several flaws. Released in August 2025, GPT-5 incorporates a system-of-models architecture designed for task-specific optimization and, based on both anecdotal accounts and emerging evidence from the literature, demonstrates stronger performance than its predecessor in medical contexts. This study provides one of the first systematic comparisons of GPT-4 and GPT-5 using human raters from linguistics and clinical fields. Twenty experts evaluated model-generated outputs across five domains: lesson planning, assignment evaluation,…
