CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao, Xiaoyuan Yi, Xing Xie

TL;DR
CLAVE introduces an adaptive, dual-model framework for evaluating the values reflected in LLM responses, balancing generalizability with alignment to human values while using minimal labeled data.
Contribution
The paper presents CLAVE, a novel framework combining large and small LLMs for robust, adaptable, reference-free value evaluation with minimal supervision.
Findings
Combining large and small models improves evaluation accuracy.
CLAVE effectively calibrates with fewer than 100 labels per value type.
Benchmark results highlight strengths and weaknesses of existing evaluators.
Abstract
The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing LLMs' values can help expose their misalignment, but relies on reference-free evaluators, e.g., fine-tuned LLMs or closed-source ones like GPT-4, to identify values reflected in generated responses. Nevertheless, these evaluators face two challenges in open-ended value evaluation: they should align with changing human value definitions with minimal annotation, against their own bias (adaptability), and detect varying value expressions and scenarios robustly (generalizability). To handle these challenges, we introduce CLAVE, a novel framework which integrates two complementary LLMs, a large one to extract high-level value concepts from a few human labels, leveraging its extensive knowledge and generalizability, and a smaller one fine-tuned on such concepts to better…
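The two-stage design described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: a keyword lookup stands in for the large LLM that extracts value concepts, and a per-concept vote counter stands in for the small fine-tuned model. All names (`CONCEPT_KEYWORDS`, `extract_concepts`, `ConceptClassifier`, `evaluate`) and the labels `"adheres"` / `"violates"` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Iterable, List, Tuple

# ASSUMPTION: a keyword table stands in for the large LLM that maps a
# response to high-level value concepts. The concepts are made up.
CONCEPT_KEYWORDS = {
    "respect_for_privacy": ("private", "consent", "personal data"),
    "harm_to_others": ("hurt", "attack", "steal"),
    "honesty": ("truth", "honest", "disclose"),
}


def extract_concepts(response: str) -> List[str]:
    """Stage 1 (stand-in for the large model): response -> value concepts."""
    text = response.lower()
    found = [
        concept
        for concept, keywords in CONCEPT_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    return sorted(found) or ["no_concept"]


class ConceptClassifier:
    """Stage 2 (stand-in for the small model): learns labels per concept."""

    def __init__(self) -> None:
        self.votes = defaultdict(Counter)

    def fit(self, examples: Iterable[Tuple[List[str], str]]) -> None:
        # Each human label is credited to every concept found in the response.
        for concepts, label in examples:
            for concept in concepts:
                self.votes[concept][label] += 1

    def predict(self, concepts: List[str], default: str = "unknown") -> str:
        # Sum the label votes of all concepts present in the response.
        tally = Counter()
        for concept in concepts:
            tally.update(self.votes.get(concept, {}))
        return tally.most_common(1)[0][0] if tally else default


def evaluate(response: str, classifier: ConceptClassifier) -> str:
    """Run both stages on an unseen response."""
    return classifier.predict(extract_concepts(response))


# A handful of human-labeled responses (hypothetical data).
labeled = [
    ("You should steal the password and attack the server.", "violates"),
    ("Always ask for consent before sharing personal data.", "adheres"),
    ("Tell the truth and disclose the risks.", "adheres"),
]

classifier = ConceptClassifier()
classifier.fit((extract_concepts(text), label) for text, label in labeled)
print(evaluate("Be honest with your users.", classifier))
```

The point of the split is that the classifier never sees raw text, only concepts, so a few labels can cover differently worded responses that share a concept. In a real system both stand-ins would be replaced by actual models.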
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Residual Connection · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Adam · Dropout · Multi-Head Attention · Dense Connections
