RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models
Tianhao Shen, Sun Li, Quan Tu, Deyi Xiong

TL;DR
RoleEval is a bilingual benchmark designed to evaluate large language models' knowledge and reasoning about characters from diverse domains, highlighting differences in cultural and language-specific knowledge.
Contribution
This paper introduces RoleEval, a bilingual (Chinese-English) benchmark of 6,000 parallel multiple-choice questions that assesses the role knowledge of large language models across multiple domains.
Findings
GPT-4 outperforms other models on RoleEval-Global
Chinese LLMs excel on RoleEval-Chinese
Knowledge distribution varies across models and languages
Abstract
The rapid evolution of large language models necessitates effective benchmarks for evaluating their role knowledge, which is essential for establishing connections with the real world and providing more immersive interactions. This paper introduces RoleEval, a bilingual benchmark designed to assess the memorization, utilization, and reasoning capabilities of role knowledge. RoleEval comprises RoleEval-Global (including internationally recognized characters) and RoleEval-Chinese (including characters popular in China), with 6,000 Chinese-English parallel multiple-choice questions focusing on 300 influential people and fictional characters drawn from a variety of domains including celebrities, anime, comics, movies, TV series, games, and fiction. These questions cover basic knowledge and multi-hop reasoning abilities, aiming to systematically probe various aspects such as personal…
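To make the "Chinese-English parallel multiple-choice" format concrete, here is a minimal sketch of how one such item might be stored and how per-language accuracy could be computed. The `ParallelQuestion` schema, the field names, the language codes, and the placeholder items are all assumptions made for illustration; they are not taken from the RoleEval release, and the number of options per question is not assumed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ParallelQuestion:
    """One multiple-choice item stored in both languages (hypothetical schema)."""
    character: str
    text: dict[str, str]            # language code -> question text
    options: dict[str, list[str]]   # language code -> answer options, same order
    answer_index: int               # index of the correct option, shared by both languages


def accuracy_by_language(questions, predictions):
    """Return {language code: fraction of questions answered correctly}.

    `predictions` maps (question position, language code) -> chosen option index.
    A missing prediction counts as a wrong answer.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for position, question in enumerate(questions):
        for lang in question.text:
            total[lang] += 1
            if predictions.get((position, lang)) == question.answer_index:
                correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Placeholder items only; these are not real benchmark questions.
questions = [
    ParallelQuestion(
        character="Character A",
        text={"en": "placeholder question 1 (en)", "zh": "placeholder question 1 (zh)"},
        options={"en": ["w", "x", "y", "z"], "zh": ["w", "x", "y", "z"]},
        answer_index=1,
    ),
    ParallelQuestion(
        character="Character B",
        text={"en": "placeholder question 2 (en)", "zh": "placeholder question 2 (zh)"},
        options={"en": ["w", "x", "y", "z"], "zh": ["w", "x", "y", "z"]},
        answer_index=2,
    ),
]

# Hypothetical model answers: the second question is missed in English only.
predictions = {(0, "en"): 1, (0, "zh"): 1, (1, "en"): 0, (1, "zh"): 2}
scores = accuracy_by_language(questions, predictions)
```

Because every item exists in both languages with the same correct option, comparing `scores["en"]` and `scores["zh"]` on the same question set isolates the effect of the prompt language from the effect of the question content.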
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Machine Learning in Healthcare
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection · Byte Pair Encoding
