AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Xiaotao Gu, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

TL;DR
AlignBench is a comprehensive benchmark designed to evaluate the alignment of Chinese large language models across multiple dimensions, utilizing human-verified data and an LLM-based evaluation approach for high reliability.
Contribution
This paper introduces AlignBench, the first multi-dimensional Chinese LLM alignment benchmark with a human-in-the-loop data curation pipeline and an LLM-as-Judge evaluation method.
Findings
AlignBench covers 683 real-scenario queries with verified references.
It employs a rule-calibrated LLM-as-Judge for reliable evaluation.
AlignBench has been adopted by developers of leading Chinese LLMs to assess alignment.
Abstract
Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, the effective evaluation of alignment for emerging Chinese LLMs is still largely unexplored. To fill this gap, we introduce AlignBench, a comprehensive multi-dimensional benchmark for evaluating LLMs' alignment in Chinese. We design a human-in-the-loop data curation pipeline, containing eight main categories, 683 real-scenario rooted queries and corresponding human-verified references. To ensure the correctness of references, each knowledge-intensive query is accompanied by evidence collected from reliable web sources (including URLs and quotations) by our annotators. For automatic evaluation, our benchmark employs a rule-calibrated multi-dimensional LLM-as-Judge (Zheng et al., 2023) approach with Chain-of-Thought to generate explanations and final…
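The multi-dimensional LLM-as-Judge idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dimension names, the 1-10 scale, the prompt wording, the output format, and the helper names `build_judge_prompt` and `parse_judgment`. The sketch only builds a grading prompt that asks for an explanation before the scores, and parses a judge's reply; it does not call any model.

```python
import re
from dataclasses import dataclass

# Hypothetical dimensions; the abstract does not list the real ones.
DIMENSIONS = ["correctness", "helpfulness", "clarity"]

# Matches lines such as "clarity: 7" (assumed output format).
SCORE_LINE = re.compile(r"^(\w+):\s*(\d+)\s*$", re.MULTILINE)


@dataclass
class Judgment:
    explanation: str
    dimension_scores: dict
    final_score: int


def build_judge_prompt(query: str, reference: str, answer: str) -> str:
    """Assemble a grading prompt that requests reasoning before scores."""
    dims = ", ".join(DIMENSIONS)
    return (
        "You are grading an assistant's answer to a user query.\n"
        f"Query: {query}\n"
        f"Reference answer: {reference}\n"
        f"Assistant answer: {answer}\n"
        "First explain your reasoning step by step. "
        f"Then rate each of: {dims} on a 1-10 scale.\n"
        "End with one line per dimension as 'name: score' "
        "and a last line 'final: score'."
    )


def parse_judgment(text: str) -> Judgment:
    """Extract the explanation, per-dimension scores, and final score."""
    matches = list(SCORE_LINE.finditer(text))
    scores = {m.group(1).lower(): int(m.group(2)) for m in matches}
    if "final" not in scores:
        raise ValueError("judge reply has no 'final: score' line")
    final = scores.pop("final")
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    for name, value in list(scores.items()) + [("final", final)]:
        if not 1 <= value <= 10:
            raise ValueError(f"score out of range for {name}: {value}")
    # Everything before the first score line is treated as the explanation.
    explanation = text[: matches[0].start()].strip()
    return Judgment(explanation, {d: scores[d] for d in DIMENSIONS}, final)
```

Asking for the explanation before the scores mirrors the Chain-of-Thought ordering the abstract mentions; rejecting replies with missing or out-of-range scores is one simple way to keep malformed judgments out of an aggregate.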
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
