GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Erik Cambria

TL;DR
This survey comprehensively reviews existing assessments of ChatGPT and GPT-4 across various domains, highlighting their capabilities, limitations, and evaluation methods, and offers recommendations for future research in large language model evaluation.
Contribution
It provides a thorough synthesis of prior assessment studies of ChatGPT and GPT-4, analyzing their performance, evaluation techniques, and ethical considerations. Such a synthesis was lacking in previous literature.
Findings
ChatGPT and GPT-4 show strong language and reasoning abilities.
Current evaluation methods have limitations and need improvement.
Recommendations are provided for future assessment research.
Abstract
The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on their language and reasoning abilities, scientific knowledge, and ethical considerations. Furthermore, an examination of the existing evaluation methods is conducted, offering several recommendations for future research in evaluating large language models.
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
Methods: Attention Is All You Need · Linear Layer · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer · Multi-Head Attention · Layer Normalization · Absolute Position Encodings · Residual Connection
