Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach
Wei Lu, Amit Dhanda, Daniel L. Chen, Christian B. Hansen

TL;DR
This paper presents a supervised fine-tuning method that aligns large language model (LLM) agents with explicit economic and moral preferences, shifting their strategic behavior in economic games toward the specified utility.
Contribution
It introduces a fine-tuning approach that uses economic theory to shape LLM agent behavior according to explicit utility specifications (homo economicus and homo moralis).
Findings
Fine-tuning shifts strategic behavior towards specified economic preferences.
Aligned agents produce distinct equilibrium outcomes in moral dilemmas.
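The method reduces the systematic deviations from payoff-sensitive behavior seen in off-the-shelf agents, such as excessive cooperation and limited responsiveness to incentives.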
Abstract
As large language models (LLMs) increasingly act as autonomous agents in markets and organizations, their behavior in strategic environments becomes economically consequential. We document that off-the-shelf LLM agents exhibit systematic deviations from payoff-sensitive behavior in canonical economic games, including excessive cooperation and limited responsiveness to incentives. We introduce a supervised fine-tuning approach that aligns agent behavior with explicit economic preferences. Specifically, we generate optimal strategies under two stylized utility specifications, homo economicus, which maximizes self-interest, and homo moralis, which incorporates Kantian universalizability, and use these utility-implied reasoning and strategies to guide fine-tuning. Fine-tuning on a small, theory-driven synthetic dataset induces persistent and interpretable shifts in strategic behavior. In…
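The two utility specifications named in the abstract can be made concrete with a minimal sketch. It assumes the commonly used homo moralis form u(x, y) = (1 - κ)·π(x, y) + κ·π(x, x), where κ in [0, 1] weights the payoff an agent would get if the opponent universally adopted the agent's own action; κ = 0 recovers homo economicus. The payoff numbers, the names `PAYOFF`, `utility`, and `best_response`, and the choice of a one-shot prisoner's dilemma are illustrative assumptions, not details taken from the paper.

```python
# Illustrative one-shot prisoner's dilemma (assumed payoffs, not from the paper).
# PAYOFF[(mine, theirs)] is the material payoff to the player choosing `mine`.
PAYOFF = {
    ("C", "C"): 3.0,
    ("C", "D"): 0.0,
    ("D", "C"): 5.0,
    ("D", "D"): 1.0,
}
ACTIONS = ("C", "D")


def utility(mine, theirs, kappa):
    """Assumed homo moralis utility with morality weight kappa in [0, 1].

    kappa = 0 is homo economicus (pure material payoff);
    kappa = 1 is a pure Kantian who evaluates an action as if
    the opponent played the same action.
    """
    material = PAYOFF[(mine, theirs)]
    universalized = PAYOFF[(mine, mine)]
    return (1.0 - kappa) * material + kappa * universalized


def best_response(theirs, kappa):
    """Return the action maximizing utility against a fixed opponent action."""
    return max(ACTIONS, key=lambda a: utility(a, theirs, kappa))


if __name__ == "__main__":
    for kappa in (0.0, 0.75, 1.0):
        responses = {opp: best_response(opp, kappa) for opp in ACTIONS}
        print(f"kappa={kappa}: {responses}")
```

With these assumed payoffs, a homo economicus agent defects against either opponent action, while a sufficiently moral agent cooperates. Action labels of this kind, paired with the reasoning that produces them, are the sort of theory-implied targets a synthetic fine-tuning dataset could contain.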
