Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang, Xiaoyuan Yi, Zhihua Wei, Ziang Xiao, Shu Wang, Xing Xie

TL;DR
This paper introduces GETA, a dynamic testing method that evolves with large language models to better assess their ethical boundaries and reduce evaluation biases caused by static benchmarks.
Contribution
GETA is a novel generative evolving testing approach that dynamically generates test items tailored to the capabilities of the LLM under test, addressing the evaluation chronoeffect of static benchmarks and improving assessment accuracy.
Findings
GETA can generate difficulty-tailored test items.
GETA provides more consistent evaluation results on unseen data.
GETA effectively addresses static benchmark limitations.
Abstract
Warning: Contains harmful model outputs. Despite significant advancements, the propensity of Large Language Models (LLMs) to generate harmful and unethical content poses critical challenges. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Although numerous benchmarks have been constructed to assess social bias, toxicity, and ethical issues in LLMs, those static benchmarks suffer from evaluation chronoeffect, in which, as models rapidly evolve, existing benchmarks may leak into training data or become saturated, overestimating ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach based on adaptive testing methods in measurement theory. Unlike traditional adaptive testing methods that rely on a static test item pool, GETA probes the underlying moral boundaries of LLMs by dynamically…
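For readers unfamiliar with the adaptive testing baseline the abstract contrasts against, the following is a minimal sketch of traditional adaptive testing over a static item pool. It is not GETA and does not reproduce the paper's method. Everything in it is an assumption made for illustration: the two-parameter logistic (2PL) item response model, the function names, the grid-search ability estimate, and the convention that "passing" an item means the model's response was judged acceptable.

```python
import math

# Illustrative sketch only: a 2PL item response model is ASSUMED here.
# Each item in the static pool is a (discrimination a, difficulty b) pair.

def p_pass(theta, a, b):
    """Probability that an examinee with ability theta passes an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information an item carries about ability near theta (2PL form)."""
    p = p_pass(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, pool, used):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: fisher_information(theta, *pool[i]))

def estimate_theta(responses, pool):
    """Grid-search maximum-likelihood ability estimate on [-4, 4].

    responses is a list of (item_index, passed) pairs.
    """
    grid = [x / 10.0 for x in range(-40, 41)]

    def log_likelihood(theta):
        total = 0.0
        for idx, passed in responses:
            p = p_pass(theta, *pool[idx])
            total += math.log(p if passed else 1.0 - p)
        return total

    return max(grid, key=log_likelihood)

def run_adaptive_test(pool, answer_fn, n_items=5, theta0=0.0):
    """Alternate between item selection and ability re-estimation.

    answer_fn(item_index) is a hypothetical callback returning True when the
    examinee (for example, an LLM whose response was judged acceptable)
    passes the item.
    """
    theta, used, responses = theta0, set(), []
    for _ in range(min(n_items, len(pool))):
        idx = select_item(theta, pool, used)
        used.add(idx)
        responses.append((idx, answer_fn(idx)))
        theta = estimate_theta(responses, pool)
    return theta, responses
```

In this sketch the pool never changes, so once a model passes every item the estimate simply sits at the top of the grid. That saturation is the kind of limitation of static item pools that the abstract describes.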
Taxonomy
Topics: Natural Language Processing Techniques · Computational and Text Analysis Methods · Topic Modeling
