FAIRGAMER: Evaluating Social Biases in LLM-Based Video Game NPCs
Bingkang Shi, Jen-tse Huang, Long Luo, Tianyu Zong, Hongzhu Yi, Yuanxiang Wang, Songlin Hu, Xiaodan Zhang, Zhongjiang Yao

TL;DR
This paper introduces FairGamer, a benchmark for evaluating social biases in LLM-based video game NPCs across three interaction patterns (transaction, cooperation, and competition) and four bias types (class, race, age, and nationality). The evaluation finds that larger models tend to exhibit more severe biases.
Contribution
The paper presents FairGamer, described by the authors as the first benchmark for assessing social biases in LLM-driven NPCs. It comprises 12 evaluation tasks, a novel metric (FairMCV), and an evaluation of seven frontier LLMs.
Findings
Models show biased decision-making in NPC interactions.
Larger LLMs tend to have more severe social biases.
Grok-4-Fast exhibits the highest bias among the seven evaluated models (average FairMCV = 76.9%).
Abstract
Large Language Models (LLMs) have increasingly enhanced or replaced traditional Non-Player Characters (NPCs) in video games. However, these LLM-based NPCs inherit underlying social biases (e.g., race or class), posing fairness risks during in-game interactions. To address the limited exploration of this issue, we introduce FairGamer, the first benchmark to evaluate social biases across three interaction patterns: transaction, cooperation, and competition. FairGamer assesses four bias types (class, race, age, and nationality) across 12 distinct evaluation tasks using a novel metric, FairMCV. Our evaluation of seven frontier LLMs reveals that: (1) models exhibit biased decision-making, with Grok-4-Fast demonstrating the highest bias (average FairMCV = 76.9%); and (2) larger LLMs display more severe social biases, suggesting that increased model capacity inadvertently amplifies…
