Evaluating Large Language Models: A Comprehensive Survey
Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong

TL;DR
This survey provides a comprehensive overview of methods and benchmarks for evaluating large language models, covering their capabilities, alignment, and safety, and emphasizes responsible development to maximize societal benefits.
Contribution
It categorizes and reviews evaluation methodologies for LLMs across capabilities, alignment, and safety, and discusses building comprehensive evaluation platforms.
Findings
Reviewed diverse evaluation benchmarks and methodologies.
Highlighted the importance of safety and alignment assessments.
Provided a curated list of related evaluation research.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rapid progress of LLMs raises concerns about the potential emergence of superintelligent systems without adequate safeguards. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation of LLMs. This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
