Dual Optimal: Make Your LLM Peer-like with Dignity
Xiangqi Wang, Yue Huang, Haomin Zhuang, Kehan Guo, Xiangliang Zhang

TL;DR
This paper introduces the Dignified Peer framework to develop language models that are trustworthy, empathetic, and peer-like, addressing issues of sycophancy and evasiveness in current models.
Contribution
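It presents the PersonaKnob dataset, a tolerant constrained Lagrangian DPO algorithm, and an Item Response Theory evaluation protocol, which target the data-supervision, objective-collapse, and evaluation-bias challenges of building a Dignified Peer.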
Findings
The approach reduces sycophantic behavior in LLMs.
It improves trustworthiness and empathy in model responses.
Empirical results show successful development of peer-like LLM agents.
Abstract
Current aligned language models exhibit a dual failure mode we term the Evasive Servant: they sycophantically validate flawed user beliefs while deflecting responsibility with boilerplate disclaimers. We propose the Dignified Peer framework, which counters servility with anti-sycophancy and trustworthiness, and mitigates evasiveness through empathy and creativity. Realizing this agent requires overcoming significant challenges in data supervision, objective collapse, and evaluation bias. We address these issues by introducing the PersonaKnob dataset, which features a compositional partial-order structure over multiple persona preferences. This data is used alongside a tolerant constrained Lagrangian DPO algorithm that dynamically balances all persona dimensions to prevent behavioral collapse. Additionally, we employ a psychometrically calibrated Item Response Theory evaluation protocol…
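To make the idea of "dynamically balancing persona dimensions" concrete, here is a minimal sketch of a generic Lagrangian-weighted DPO objective. It is not the paper's algorithm: the tolerance value, the learning rate, the function names, and the way the per-dimension losses are combined are all assumptions chosen for illustration. Only the four dimension names come from the abstract.

```python
import math

# Persona dimensions named in the abstract; everything else below is hypothetical.
DIMENSIONS = ["anti_sycophancy", "trustworthiness", "empathy", "creativity"]


def dpo_loss(policy_margin, ref_margin, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each margin is log p(chosen) - log p(rejected) under the policy or the
    reference model. Returns -log(sigmoid(z)) computed in a stable form.
    """
    z = beta * (policy_margin - ref_margin)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def lagrangian_objective(losses, lambdas, tolerance):
    """Mean loss plus multiplier-weighted constraint violations (assumed form).

    Each dimension k carries the constraint loss_k <= tolerance.
    """
    mean_loss = sum(losses.values()) / len(losses)
    penalty = sum(lambdas[k] * (losses[k] - tolerance) for k in losses)
    return mean_loss + penalty


def update_multipliers(lambdas, losses, tolerance, lr=0.5):
    """Projected dual ascent: a multiplier grows while its dimension violates
    the tolerance and shrinks toward zero once the constraint is satisfied."""
    return {
        k: max(0.0, lambdas[k] + lr * (losses[k] - tolerance))
        for k in lambdas
    }


if __name__ == "__main__":
    lambdas = {k: 0.0 for k in DIMENSIONS}
    # Made-up per-dimension losses: empathy is lagging behind the others.
    losses = {"anti_sycophancy": 0.3, "trustworthiness": 0.4,
              "empathy": 0.9, "creativity": 0.2}
    lambdas = update_multipliers(lambdas, losses, tolerance=0.5)
    print(lambdas)
    print(lagrangian_objective(losses, lambdas, tolerance=0.5))
```

In this sketch, a dimension whose loss stays above the tolerance accumulates weight over successive updates, so the optimizer cannot trade it away entirely in favor of the others, which is one common way to discourage collapse onto a single objective.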
