Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction
Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa, Michael Zock, Kentaro Inui

TL;DR
This paper evaluates GPT-3.5 and GPT-4's effectiveness in grammatical error correction, revealing their strong performance and analyzing prompt strategies across major benchmarks with human assessments.
Contribution
It provides a detailed analysis of GPT models' GEC capabilities, including benchmark results, prompt comparisons, and human evaluation insights, which were previously underexplored.
Findings
GPT-4 achieves a new high score on JFLEG.
GPT models perform well in sentence-level GEC.
Human evaluations reveal differences in editing strategies.
Abstract
GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks. However, there is a relative lack of detailed published analysis of their performance on the task of grammatical error correction (GEC). To address this, we perform experiments testing the capabilities of a GPT-3.5 model (text-davinci-003) and a GPT-4 model (gpt-4-0314) on major GEC benchmarks. We compare the performance of different prompts in both zero-shot and few-shot settings, analyzing intriguing or problematic outputs encountered with different prompt formats. We report the performance of our best prompt on the BEA-2019 and JFLEG datasets, finding that the GPT models can perform well in a sentence-level revision setting, with GPT-4 achieving a new high score on the JFLEG benchmark. Through human evaluation experiments, we compare the GPT models' corrections to…
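The zero-shot versus few-shot comparison described in the abstract can be illustrated with a minimal sketch. The instruction wording, the `Input:`/`Output:` layout, and the `build_prompt` helper below are hypothetical and chosen only for illustration; they are not the prompts used in the paper, and no model API is called.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt wording is not reproduced here.
INSTRUCTION = "Correct the grammatical errors in the following sentence."


def build_prompt(sentence: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a GEC prompt: zero-shot if `examples` is empty, few-shot otherwise."""
    parts = [INSTRUCTION, ""]
    # Each (source, corrected) pair becomes one in-context demonstration.
    for source, corrected in examples:
        parts += [f"Input: {source}", f"Output: {corrected}", ""]
    # The sentence to be corrected always comes last, with the output left open.
    parts += [f"Input: {sentence}", "Output:"]
    return "\n".join(parts)


zero_shot = build_prompt("She go to school.")
few_shot = build_prompt(
    "She go to school.",
    examples=[("He have a car.", "He has a car.")],
)
```

The only difference between the two settings in this sketch is whether demonstration pairs precede the target sentence; the resulting string would then be sent to whichever model is being evaluated.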
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Discriminative Fine-Tuning · GPT-4
