# Can ChatGPT write better scientific titles? A comparative evaluation of human-written and AI-generated titles

**Authors:** Paul Sebo, Bing Nie, Ting Wang, Renz Alvin Gabay, Aaron A. Funa

PMC · DOI: 10.12688/f1000research.173647.1 · 2025-12-30

## TL;DR

In a blinded study, 21 researchers rated GPT-4-generated scientific titles higher than the original human-written titles in perceived accuracy and appeal, and preferred the AI version in most paired comparisons.

## Contribution

The first blinded comparative evaluation of AI-generated versus human-written scientific titles using researcher ratings.

## Key findings

- AI-generated titles scored significantly higher than human-written titles in perceived accuracy (mean 7.9 vs. 6.7) and appeal (mean 7.1 vs. 6.7), both with p < 0.001.
- The odds of a researcher preferring the AI-generated title were 1.7 times higher (p = 0.001); 61.8% of 1,049 paired judgments favored the AI version.
- Inter-rater agreement was moderate to substantial (Gwet’s AC: 0.54–0.70).

## Abstract

Large language models (LLMs) such as GPT-4 are increasingly used in scientific writing, yet little is known about how AI-generated scientific titles are perceived by researchers in terms of quality.

This study aimed to compare AI-generated and human-written scientific titles on perceived accuracy, appeal, and overall preference.

We conducted a blinded comparative study with 21 researchers from diverse academic backgrounds. A random sample of 50 original titles was selected from 10 high-impact general internal medicine journals. For each title, an alternative version was generated using GPT-4.0. Each rater evaluated 50 pairs of titles, each pair consisting of one original and one AI-generated version, without knowing the source of the titles or the purpose of the study. For each pair, raters independently assessed both titles on perceived accuracy and appeal, and indicated their overall preference. We analyzed accuracy/appeal using Wilcoxon signed-rank tests and negative binomial models, preferences using McNemar’s test and mixed-effects logistic regression, and inter-rater agreement with Gwet’s AC.
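To make the agreement statistic named above more concrete, here is a minimal sketch of a chance-corrected agreement coefficient for categorical preferences. The abstract says only "Gwet’s AC", so the unweighted multi-rater AC1 variant is an assumption, and the function name `gwet_ac1` and the toy data are hypothetical rather than taken from the study.

```python
from collections import Counter

def gwet_ac1(ratings, categories):
    """Unweighted Gwet's AC1 for several raters and nominal categories.

    ratings: one list per item, holding the label each rater assigned.
    Assumption: every item was rated by at least two raters; items with
    fewer ratings are skipped entirely in this simplified sketch.
    """
    q = len(categories)
    n_items = 0
    agreement_sum = 0.0
    share_sum = {c: 0.0 for c in categories}
    for item in ratings:
        r = len(item)
        if r < 2:
            continue
        counts = Counter(item)
        # Fraction of rater pairs that agree on this item.
        agreement_sum += sum(k * (k - 1) for k in counts.values()) / (r * (r - 1))
        for c in categories:
            share_sum[c] += counts.get(c, 0) / r
        n_items += 1
    p_a = agreement_sum / n_items                      # observed agreement
    pi = [share_sum[c] / n_items for c in categories]  # mean category shares
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)       # chance agreement
    return (p_a - p_e) / (1 - p_e)

# Hypothetical toy data: 4 title pairs, 3 raters each, preferred version per rater.
toy = [["AI", "AI", "AI"], ["AI", "AI", "H"], ["H", "H", "H"], ["AI", "H", "AI"]]
print(gwet_ac1(toy, ["AI", "H"]))
```

The chance-agreement term depends on the average category shares rather than on each rater's own marginals, which is why this coefficient is often favored over kappa when one category dominates.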

AI-generated titles received significantly higher ratings for both perceived accuracy (mean=7.9 vs. 6.7, p-value <0.001) and appeal (mean=7.1 vs. 6.7, p-value <0.001) than human-written titles. The odds of preferring an AI-generated title were 1.7 times higher (p-value =0.001), with 61.8% of 1,049 paired judgments favoring the AI version. Inter-rater agreement was moderate to substantial (Gwet’s AC: 0.54–0.70).
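The preference figures above mix a percentage with an odds value, and the two are easy to conflate. The short sketch below works through the arithmetic. The count of AI-favoring judgments is back-calculated from the rounded 61.8% (an assumption, not a number reported in the abstract), and the 1.7 is presumably the model-based estimate from the mixed-effects logistic regression, so it need not equal the raw odds. The helper `odds_to_probability` is hypothetical.

```python
N_PAIRED_JUDGMENTS = 1049   # reported in the abstract
SHARE_FAVORING_AI = 0.618   # reported in the abstract (rounded)

# Back-calculated count: an assumption derived from the rounded percentage.
ai_wins = round(SHARE_FAVORING_AI * N_PAIRED_JUDGMENTS)
human_wins = N_PAIRED_JUDGMENTS - ai_wins

# Raw odds of preferring the AI title, ignoring clustering by rater and title.
raw_odds = ai_wins / human_wins

def odds_to_probability(odds):
    """Convert odds in favor of an outcome to the matching probability."""
    return odds / (1 + odds)

print(f"AI preferred in {ai_wins} judgments, human in {human_wins}")
print(f"raw odds: {raw_odds:.2f}")
print(f"probability implied by odds of 1.7: {odds_to_probability(1.7):.3f}")
```

Odds of 1.7 correspond to a preference probability of roughly 0.63, not to a 1.7-fold higher probability, which is why the result is best read as a statement about odds.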

AI-generated titles can surpass human-written titles in perceived accuracy, appeal, and preference, suggesting that LLMs may enhance the effectiveness of scientific communication. These findings support the responsible integration of AI tools in research.

## Full-text entities

- **Diseases:** LLMs (MESH:D007806)
- **Chemicals:** AC (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12982995/full.md

---
Source: https://tomesphere.com/paper/PMC12982995