From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
Jinxian Qu, Qingqing Gu, Teng Chen, Luo Ji

TL;DR
This paper introduces a value-based framework using GraphRAG to align LLM-based agents with human social values, improving their decision-making and emotional self-awareness.
Contribution
The novel framework converts principles into value-based instructions, enhancing alignment and behavior prediction in LLM agents compared to prompt-based methods.
Findings
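Significant performance gains over prompt-based baselines (ECoT, Plan-and-Solve, and Metacognitive prompting) on the DAILYDILEMMAS benchmark.
Expected behaviors are defined using Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion, which form the basis of the evaluation.
The authors present the method as a basis for the emergence of self-emotion.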
Abstract
Wide application of LLM-based agents requires strong alignment with human social values. However, current work still shows deficiencies in self-cognition, dilemma decision-making, and self-emotion. To remedy this, we propose a novel value-based framework that employs GraphRAG to convert principles into value-based instructions and steers the agent to behave as expected by retrieving the suitable instruction for a specific conversation context. To measure the ratio of expected behaviors, we define those behaviors using two well-known theories, Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion. On the DAILYDILEMMAS benchmark, our method exhibits significant performance gains over prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting. Our method provides a basis for the emergence of self-emotion in…
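The retrieve-then-steer idea in the abstract can be illustrated with a toy sketch. This is not the authors' GraphRAG pipeline: the principles, value labels, keyword sets, and the function names `retrieve_instruction` and `expected_behavior_ratio` are all hypothetical, and simple keyword overlap stands in for graph-based retrieval. It only shows the general shape of mapping principles to value-based instructions, picking one for a given conversation context, and computing the share of behaviors that fall in an expected set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueInstruction:
    principle: str        # high-level principle (hypothetical example text)
    value: str            # value label the principle is mapped to
    instruction: str      # instruction handed to the agent
    keywords: frozenset   # toy retrieval keys; a stand-in for graph retrieval


# Hypothetical instruction store; the paper builds this with GraphRAG.
INSTRUCTIONS = [
    ValueInstruction(
        "Do not deceive others",
        "honesty",
        "State the facts plainly, even when it is uncomfortable.",
        frozenset({"lie", "truth", "hide", "secret"}),
    ),
    ValueInstruction(
        "Protect people from harm",
        "safety",
        "Prioritize the physical safety of everyone involved.",
        frozenset({"danger", "injury", "hurt", "risk"}),
    ),
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_instruction(context, instructions):
    """Return the instruction with the largest keyword overlap, or None."""
    tokens = tokenize(context)
    best = max(instructions, key=lambda ins: len(ins.keywords & tokens))
    return best if best.keywords & tokens else None


def expected_behavior_ratio(observed, expected):
    """Fraction of observed behavior labels that belong to the expected set."""
    if not observed:
        return 0.0
    return sum(1 for label in observed if label in expected) / len(observed)


if __name__ == "__main__":
    hit = retrieve_instruction(
        "Should I lie to my friend to hide the secret?", INSTRUCTIONS
    )
    print(hit.value if hit else "no matching instruction")
    print(expected_behavior_ratio(
        ["honesty", "greed", "safety", "honesty"], {"honesty", "safety"}
    ))
```

In a real system the retrieved instruction would be prepended to the agent's prompt, and the expected set would come from the paper's Maslow- and Plutchik-based definitions rather than the made-up labels used here.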
