Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels
Jianhao Yan, Pingchuan Yan, Yulong Chen, Jing Li, Xianchao Zhu, Yue Zhang

TL;DR
This paper systematically compares GPT-4's translation performance with human translators across multiple languages, domains, and expertise levels, revealing GPT-4's strengths and limitations in real-world translation tasks.
Contribution
It provides the first comprehensive evaluation of GPT-4 against human translators across languages, domains, and proficiency levels, highlighting its consistent quality and unique translation patterns.
Findings
GPT-4 performs comparably to junior human translators.
GPT-4 maintains consistent quality across resource-poor languages.
Human translators sometimes hallucinate or over-interpret context.
Abstract
This study presents a comprehensive evaluation of GPT-4's translation capabilities compared to human translators of varying expertise levels. Through systematic human evaluation using the MQM schema, we assess translations across three language pairs (Chinese-English, Russian-English, and Chinese-Hindi) and three domains (News, Technology, and Biomedical). Our findings reveal that GPT-4 achieves performance comparable to junior-level translators in terms of total errors, while still lagging behind senior translators. Unlike traditional Neural Machine Translation systems, which show significant performance degradation in resource-poor language directions, GPT-4 maintains consistent translation quality across all evaluated language pairs. Through qualitative analysis, we identify distinctive patterns in translation approaches:…
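A comparison "in terms of total errors" under MQM boils down to tallying annotated error spans per translator, optionally weighted by severity. The Python sketch below illustrates that bookkeeping under stated assumptions: the `ErrorSpan` record layout, the category labels, the toy annotations, and the severity weights (minor = 1, major = 5, critical = 10) are all hypothetical choices for illustration, not the paper's annotation data or scoring code. MQM deployments differ in the categories and weights they use.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity weights; real MQM setups choose their own values.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass(frozen=True)
class ErrorSpan:
    """One MQM-annotated error (hypothetical record layout)."""
    system: str    # who produced the translation, e.g. "GPT-4", "junior", "senior"
    category: str  # MQM-style category label, e.g. "accuracy/mistranslation"
    severity: str  # must be a key of SEVERITY_WEIGHTS


def total_errors(errors):
    """Count raw errors per system, ignoring severity."""
    return Counter(e.system for e in errors)


def weighted_scores(errors, weights=SEVERITY_WEIGHTS):
    """Sum severity-weighted penalties per system (lower is better)."""
    scores = Counter()
    for e in errors:
        scores[e.system] += weights[e.severity]
    return scores


def errors_by_category(errors, system):
    """Break down one system's errors by MQM category."""
    return Counter(e.category for e in errors if e.system == system)


# Toy annotations for illustration only (not data from the paper).
annotations = [
    ErrorSpan("GPT-4", "accuracy/mistranslation", "major"),
    ErrorSpan("GPT-4", "fluency/grammar", "minor"),
    ErrorSpan("junior", "accuracy/omission", "major"),
    ErrorSpan("junior", "fluency/punctuation", "minor"),
    ErrorSpan("senior", "style/awkward", "minor"),
]

if __name__ == "__main__":
    print(total_errors(annotations))
    print(weighted_scores(annotations))
    print(errors_by_category(annotations, "GPT-4"))
```

Reporting the raw count alongside a severity-weighted score and a per-category breakdown shows whether two translators with similar totals differ in the seriousness or kind of their errors, which is the sort of distinction a qualitative analysis would then examine.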
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Dense Connections · Label Smoothing · Layer Normalization · Dropout · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Softmax
