Can Large Language Models Make Everyone Happy?
Usman Naseem, Gautam Siddharth Kashyap, Ebad Shabbir, Sushant Kumar Ray, Abdullah Mohammad, Rafiq Ali

TL;DR
This paper introduces MisAlign-Profile, a comprehensive benchmark to measure and analyze the complex trade-offs in aligning large language models across safety, value, and cultural dimensions, addressing limitations of existing isolated benchmarks.
Contribution
It presents MISALIGNTRADE, a novel dataset and benchmark for systematically evaluating cross-dimensional misalignment trade-offs in LLMs, incorporating diverse domains and semantic types.
Findings
LLMs show 12%-34% misalignment trade-offs across dimensions.
Existing benchmarks lack cross-dimensional analysis.
MisAlign-Profile enables systematic evaluation of alignment trade-offs.
Abstract
Misalignment in Large Language Models (LLMs) refers to the failure to simultaneously satisfy safety, value, and cultural dimensions, leading to behaviors that diverge from human expectations in real-world settings where these dimensions must co-occur. Existing benchmarks, such as SAFETUNEBED (safety-centric), VALUEBENCH (value-centric), and WORLDVIEW-BENCH (culture-centric), primarily evaluate these dimensions in isolation and therefore provide limited insight into their interactions and trade-offs. More recent efforts, including MIB and INTERPRETABILITY BENCHMARK, based on mechanistic interpretability, offer valuable perspectives on model failures; however, they remain insufficient for systematically characterizing cross-dimensional trade-offs. To address these gaps, we introduce MisAlign-Profile, a unified benchmark for measuring misalignment trade-offs inspired by mechanistic…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI)
