# Distinguishing Between AI-Generated and Human-Written Electronic Residency Application Service (ERAS) Personal Statements in Otolaryngology

**Authors:** Rahul Menon, Donald Solomon, Mia Berenson, Vlad Kushnir, Yekaterina Shapiro

PMC · DOI: 10.7759/cureus.99202 · Cureus · 2025-12-14

## TL;DR

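This small, underpowered pilot study suggests that ChatGPT can produce otolaryngology residency personal statements that blinded reviewers rate similarly to applicant-written ones, while an AI detection tool (Scribbr) still flagged 93.6% of the AI-generated text.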

## Contribution

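The study compares five ChatGPT-generated and five applicant-written (2019) otolaryngology residency personal statements, using ratings from eight blinded reviewers (four otolaryngologists and four attorneys) and the output of one AI detection tool.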

## Key findings

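- AI-generated statements were rated similarly to human-written ones on readability, originality, persuasiveness, and interview desirability (all P > 0.05).
- The AI detection tool (Scribbr) correctly flagged 93.6% of ChatGPT-generated text and assigned human-written text an AI probability of 0% to 20% (mean 4%).
- The study was underpowered because of its small sample size (post hoc estimated power 14.1% for a medium effect size), which limits its ability to detect true differences between groups.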

## Abstract

Background

Generative AI tools, such as ChatGPT, raise concerns about authenticity and fairness in residency applications. Since the transition of USMLE Step 1 to pass/fail, residency programs have placed greater emphasis on qualitative components such as the personal statement. The integrity of this document is critical, as it informs interview invitations and final rank decisions. However, it remains unclear whether reviewers can reliably distinguish AI-generated from human-written content and how this may affect the residency selection process.

Objective

In this small pilot study, we evaluated the ability of generative AI, specifically ChatGPT 4.0 (OpenAI, Inc., San Francisco, CA, USA), to produce convincing personal statements for otolaryngology residency applications and explored whether a limited sample of expert reviewers could distinguish AI-generated from human-written content. We used 2019 applicant-written statements because they predated the widespread availability of generative AI, ensuring a true “human-only” comparison group.

Methods

In 2024, ChatGPT was given a sophisticated and detailed prompt to generate five otolaryngology residency personal statements. These were combined with five de-identified, applicant-written statements submitted in 2019. Four otolaryngologists from a single academic medical center and four attorneys, blinded to the origin of the statements, rated them using a standardized rubric for readability, originality, persuasiveness, and interview desirability. An AI detection tool (Scribbr, Amsterdam, The Netherlands) also analyzed each statement. Statistical analyses included paired t-tests and inter-rater reliability assessment.

Results

Eight blinded reviewers (four otolaryngologists and four attorneys) evaluated 10 personal statements (five AI-generated and five human-written). Mean ratings for readability (AI 3.63 ± 0.22 vs. human 3.53 ± 0.40), originality (3.65 ± 0.16 vs. 3.53 ± 0.39), persuasiveness (3.35 ± 0.76 vs. 3.43 ± 0.36), and interview desirability (3.40 ± 0.51 vs. 3.30 ± 0.58) showed no significant differences (all P > 0.05), but given the small sample size, the study was underpowered (post hoc estimated power 14.1% for medium effect size), limiting the ability to detect statistically significant differences. Inter-rater reliability was moderate (intraclass correlation coefficient = 0.66), indicating consistent scoring trends across reviewers. The AI detection tool correctly flagged 93.6% of ChatGPT-generated text and had an AI probability rating ranging from 0% to 20% with a mean of 4% in the human-written text. Reviewer profession (medical vs. legal) did not significantly influence scoring patterns. Although underpowered due to a small sample size, the findings suggest that ChatGPT-generated personal statements are comparable in quality and perceived authenticity to those written by applicants, yet remain largely detectable by current AI-detection algorithms.
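To put the reported group differences in context, the summary statistics above can be converted to standardized effect sizes. The following Python sketch is not part of the paper. It assumes the "±" values are standard deviations and that both groups are the same size (five statements each), and it uses the independent-groups, pooled-SD form of Cohen's d even though the study reports paired t-tests. The names `cohens_d`, `REPORTED`, and `effect_sizes` are illustrative.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d for two equal-sized groups, using the pooled standard deviation."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# (AI mean, AI SD, human mean, human SD) copied from the Results paragraph.
# Assumption: the "±" values are standard deviations, not standard errors.
REPORTED = {
    "readability": (3.63, 0.22, 3.53, 0.40),
    "originality": (3.65, 0.16, 3.53, 0.39),
    "persuasiveness": (3.35, 0.76, 3.43, 0.36),
    "interview desirability": (3.40, 0.51, 3.30, 0.58),
}


def effect_sizes(reported=REPORTED):
    """Return a dict mapping each rubric item to its AI-minus-human Cohen's d."""
    return {name: cohens_d(*stats) for name, stats in reported.items()}


if __name__ == "__main__":
    for name, d in effect_sizes().items():
        print(f"{name}: d = {d:+.2f}")
```

Under these assumptions, all four standardized differences are smaller in magnitude than the conventional "medium" benchmark of 0.5. Because the reported power was only 14.1% even for a medium effect, the non-significant P values are better read as inconclusive than as evidence that the two groups are equivalent.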

Conclusions

These findings suggest that ChatGPT can generate statements that may be difficult to distinguish from applicant-written work, but AI detection tools remain effective in identifying synthetic text. For residency programs, these results highlight the need for clear guidelines on AI use, integration of detection strategies, and ongoing research to understand how AI may affect perceptions of authenticity in applicant evaluation.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12799199/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12799199/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12799199/full.md

---
Source: https://tomesphere.com/paper/PMC12799199