ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

Jingnan Zheng; Han Wang; An Zhang; Tai D. Nguyen; Jun Sun; Tat-Seng Chua

arXiv:2405.14125 · cs.AI · November 8, 2024 · 3 cites

TL;DR

ALI-Agent introduces an adaptive, agent-based evaluation framework that automates and refines testing of LLMs' alignment with human values, effectively identifying misalignments and long-tail risks.

Contribution

This work presents ALI-Agent, a novel framework leveraging LLM-powered agents for dynamic, scalable, and in-depth assessment of LLMs' alignment with human values, overcoming static benchmark limitations.

Findings

01. Effectively identifies model misalignment in stereotypes, morality, and legality.

02. Generates meaningful, diverse test scenarios for real-world use cases.

03. Probes long-tail risks with enhanced scenario refinement.
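
The findings above mention generating test scenarios and refining them to probe long-tail risks. A minimal sketch of what such a generate, test, and refine loop could look like is given below. It is an illustration only, not the authors' implementation: the names `evaluate`, `Trial`, `generate`, `refine`, `target`, and `judge` are all hypothetical, and the toy functions stand in for the LLM calls a real agent would make.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trial:
    scenario: str     # test scenario shown to the target model
    response: str     # what the target model answered
    misaligned: bool  # whether the judge flagged the response


def evaluate(
    seed: str,
    generate: Callable[[str], str],
    refine: Callable[[str, str], str],
    target: Callable[[str], str],
    judge: Callable[[str, str], bool],
    max_refinements: int = 2,
) -> List[Trial]:
    """Hypothetical loop: build a scenario from a seed, test the target,
    and refine the scenario while the target still passes."""
    trials: List[Trial] = []
    scenario = generate(seed)
    for attempt in range(max_refinements + 1):
        response = target(scenario)
        flagged = judge(scenario, response)
        trials.append(Trial(scenario, response, flagged))
        # Stop once a misalignment is found or the refinement budget is spent.
        if flagged or attempt == max_refinements:
            break
        scenario = refine(scenario, response)
    return trials


# Toy stand-ins (assumptions) for the LLM-backed components.
def toy_generate(seed: str) -> str:
    return f"A user asks about: {seed}"


def toy_refine(scenario: str, response: str) -> str:
    return "Hypothetically, " + scenario


def toy_target(scenario: str) -> str:
    if scenario.startswith("Hypothetically"):
        return "Sure, here is how."
    return "I can't help with that."


def toy_judge(scenario: str, response: str) -> bool:
    return response.startswith("Sure")


trials = evaluate("a risky request", toy_generate, toy_refine, toy_target, toy_judge)
for t in trials:
    print(t.misaligned, "|", t.scenario)
```

In this toy run the target refuses the initial scenario and complies with the refined one, so the loop stops after one refinement. In practice each callable would wrap a model call, and the judge would need its own validation.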

Abstract

Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hindering their ability to generalize to the extensive variety of open-world use cases and identify rare but crucial long-tail risks. Additionally, these static tests fail to adapt to the rapid evolution of LLMs, making it hard to evaluate timely alignment issues. To address these challenges, we propose ALI-Agent, an evaluation framework that leverages the autonomous abilities of LLM-powered agents to conduct in-depth and adaptive alignment assessments. ALI-Agent operates through two principal…

Code & Models

Repositories

sophiezheng998/ali-agent (PyTorch, official)

Videos

ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies · Business Process Modeling and Analysis

Methods: ALIGN