Are Aligned Large Language Models Still Misaligned?

Usman Naseem; Gautam Siddharth Kashyap; Rafiq Ali; Ebad Shabbir; Sushant Kumar Ray; Abdullah Mohammad; Agrima Seth

arXiv:2602.11305·cs.CL·February 13, 2026

Open Access

TL;DR

This paper introduces Mis-Align Bench, a comprehensive benchmark for evaluating large language models across safety, value, and cultural misalignments, revealing significant challenges in achieving multi-dimensional alignment.

Contribution

The paper presents a new unified benchmark and dataset for multi-dimensional misalignment analysis in LLMs, addressing limitations of existing single-dimension benchmarks.

Findings

01. High coverage of individual dimensions (up to 97.6%)

02. False failure rate exceeds 50% in joint evaluations

03. Alignment scores range from 63% to 66% under combined conditions

Abstract

Misalignment in Large Language Models (LLMs) arises when model behavior diverges from human expectations and fails to simultaneously satisfy safety, value, and cultural dimensions, which must co-occur in real-world settings to solve a real-world query. Existing misalignment benchmarks, such as INSECURE CODE (safety-centric), VALUEACTIONLENS (value-centric), and CULTURALHERITAGE (culture-centric), evaluate misalignment along individual dimensions, which prevents simultaneous evaluation. To address this gap, we introduce Mis-Align Bench, a unified benchmark for analyzing misalignment across safety, value, and cultural dimensions. First, we construct SAVACU, an English misaligned-aligned dataset of 382,424 samples spanning 112 domains (or labels), by reclassifying prompts from the LLM-PROMPT-DATASET via taxonomy into 14 safety domains, 56 value domains, and 42 cultural domains using…
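As a rough illustration of the joint requirement described in the abstract, the sketch below treats a response as aligned only when it satisfies the safety, value, and cultural dimensions at the same time, and checks that the three stated domain counts add up to 112. The `Judgement` type, its field names, and both helper functions are hypothetical and are not taken from the paper; the benchmark's actual scoring procedure is not described in the text available here.

```python
from dataclasses import dataclass

# Domain counts per dimension, as stated in the abstract.
DOMAIN_COUNTS = {"safety": 14, "value": 56, "cultural": 42}


@dataclass(frozen=True)
class Judgement:
    """Hypothetical per-response verdicts; names are illustrative only."""
    safety: bool
    value: bool
    cultural: bool


def is_jointly_aligned(j: Judgement) -> bool:
    # Assumption: joint alignment means all three dimensions hold at once.
    return j.safety and j.value and j.cultural


def total_domains() -> int:
    # Sum of the per-dimension domain counts.
    return sum(DOMAIN_COUNTS.values())
```

Under this reading, a response that passes two dimensions but fails the third still counts as misaligned, which is the kind of case that single-dimension benchmarks cannot surface.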

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Computational and Text Analysis Methods