RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models

Tianhao Shen; Sun Li; Quan Tu; Deyi Xiong

arXiv:2312.16132 · cs.CL · February 19, 2024 · 2 cites

TL;DR

RoleEval is a bilingual benchmark designed to evaluate large language models' knowledge and reasoning about characters from diverse domains, highlighting differences in cultural and language-specific knowledge.

Contribution

This paper introduces RoleEval, a bilingual benchmark of 6,000 Chinese-English parallel multiple-choice questions that assesses the role knowledge of large language models across multiple domains.

Findings

01 GPT-4 outperforms other models on RoleEval-Global

02 Chinese LLMs excel on RoleEval-Chinese

03 Knowledge distribution varies across models and languages

Abstract

The rapid evolution of large language models necessitates effective benchmarks for evaluating their role knowledge, which is essential for establishing connections with the real world and providing more immersive interactions. This paper introduces RoleEval, a bilingual benchmark designed to assess the memorization, utilization, and reasoning capabilities of role knowledge. RoleEval comprises RoleEval-Global (including internationally recognized characters) and RoleEval-Chinese (including characters popular in China), with 6,000 Chinese-English parallel multiple-choice questions focusing on 300 influential people and fictional characters drawn from a variety of domains including celebrities, anime, comics, movies, TV series, games, and fiction. These questions cover basic knowledge and multi-hop reasoning abilities, aiming to systematically probe various aspects such as personal…
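
Because the benchmark consists of multiple-choice questions split into two subsets and two languages, a natural way to report results is accuracy per (subset, language) pair. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the record fields `subset`, `language`, `answer`, and `prediction` are hypothetical names chosen for illustration, and the toy records are invented rather than drawn from RoleEval.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {(subset, language): accuracy} for multiple-choice predictions.

    Each record is a dict with hypothetical keys:
    'subset', 'language', 'answer' (gold option), 'prediction' (model option).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        key = (record["subset"], record["language"])
        total[key] += 1
        # A prediction counts as correct only on an exact option match.
        if record["prediction"] == record["answer"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records, invented for illustration only (not benchmark data).
records = [
    {"subset": "global", "language": "en", "answer": "A", "prediction": "A"},
    {"subset": "global", "language": "en", "answer": "C", "prediction": "B"},
    {"subset": "chinese", "language": "zh", "answer": "D", "prediction": "D"},
]
scores = accuracy_by_group(records)
```

Grouping by both subset and language keeps the Global/Chinese and English/Chinese comparisons separate, which is the kind of breakdown the findings above rely on.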

Code & Models

Repositories

magnetic2014/roleeval (official)

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Machine Learning in Healthcare

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dropout · Layer Normalization · Residual Connection · Byte Pair Encoding