Are Aligned Large Language Models Still Misaligned?
Usman Naseem, Gautam Siddharth Kashyap, Rafiq Ali, Ebad Shabbir, Sushant Kumar Ray, Abdullah Mohammad, Agrima Seth

TL;DR
This paper introduces Mis-Align Bench, a comprehensive benchmark for evaluating large language models across safety, value, and cultural misalignments, revealing significant challenges in achieving multi-dimensional alignment.
Contribution
The paper presents a new unified benchmark and dataset for multi-dimensional misalignment analysis in LLMs, addressing limitations of existing single-dimension benchmarks.
Findings
High coverage of individual dimensions (up to 97.6%)
False failure rate exceeds 50% in joint evaluations
Alignment scores range from 63% to 66% under combined conditions
Abstract
Misalignment in Large Language Models (LLMs) arises when model behavior diverges from human expectations and fails to simultaneously satisfy safety, value, and cultural dimensions, which must co-occur in real-world settings to solve a real-world query. Existing misalignment benchmarks, such as INSECURE CODE (safety-centric), VALUEACTIONLENS (value-centric), and CULTURALHERITAGE (culture-centric), rely on evaluating misalignment along individual dimensions, preventing simultaneous evaluation. To address this gap, we introduce Mis-Align Bench, a unified benchmark for analyzing misalignment across safety, value, and cultural dimensions. First, we construct SAVACU, an English misaligned-aligned dataset of 382,424 samples spanning 112 domains (or labels), by reclassifying prompts from the LLM-PROMPT-DATASET via taxonomy into 14 safety domains, 56 value domains, and 42 cultural domains using…
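The domain counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported figures (14 + 56 + 42 domains, 382,424 samples); the names `DOMAIN_COUNTS`, `total_domains`, and `dimension_shares` are hypothetical and are not part of the benchmark's code. The per-domain average assumes a uniform split, which the abstract does not state.

```python
# Figures taken from the abstract; all identifiers here are illustrative only.
DOMAIN_COUNTS = {"safety": 14, "value": 56, "cultural": 42}
TOTAL_SAMPLES = 382_424


def total_domains(counts):
    """Sum the per-dimension domain counts."""
    return sum(counts.values())


def dimension_shares(counts):
    """Fraction of all domains (not samples) that fall in each dimension."""
    total = total_domains(counts)
    return {dim: n / total for dim, n in counts.items()}


if __name__ == "__main__":
    n_domains = total_domains(DOMAIN_COUNTS)
    print(f"total domains: {n_domains}")
    for dim, share in dimension_shares(DOMAIN_COUNTS).items():
        print(f"{dim}: {share:.1%} of domains")
    # Uniform average only; the real per-domain distribution is not given.
    print(f"mean samples per domain: {TOTAL_SAMPLES / n_domains}")
```

The three dimension counts sum to the 112 domains the abstract reports, and value domains make up half of the taxonomy by count.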
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods
