The Realignment Problem: When Right becomes Wrong in LLMs

Aakash Sen Sharma; Debdeep Sanyal; Manodeep Ray; Vivek Srivastava; Shirish Karande; Murari Mandal

arXiv:2511.02623·cs.CL·May 12, 2026

TL;DR

The paper introduces TRACE, a scalable framework that realigns large language models by recasting realignment as an optimization problem over existing data, so that models can track evolving alignment policies without new human annotations.

Contribution

TRACE transforms realignment into an optimization problem over existing data, reducing reliance on re-annotation and handling evolving alignment guidelines effectively.
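To make the idea of reusing existing data concrete, here is a minimal sketch of how existing preference pairs could be sorted against a revised policy. It is an illustration under stated assumptions, not the paper's algorithm: the `PreferencePair` type, the `complies` judge callable, and the `keep` / `flip` / `drop` buckets are all hypothetical names introduced here. In practice the judge would be a stronger model prompted with the new policy; a keyword check stands in for it below.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PreferencePair:
    """One existing annotation: which response was preferred under the OLD policy."""
    prompt: str
    chosen: str
    rejected: str


# Hypothetical judge signature: (prompt, response) -> True if the response
# complies with the NEW policy. This is an assumption, not an API from the paper.
PolicyJudge = Callable[[str, str], bool]


def triage(pairs: List[PreferencePair], complies: PolicyJudge) -> Dict[str, List[PreferencePair]]:
    """Sort existing pairs by how they relate to the new policy.

    keep: the old preferred response still complies, so the label is reused.
    flip: only the old rejected response complies, so the preference is swapped.
    drop: neither response complies, so the pair gives no usable preference.
    """
    buckets: Dict[str, List[PreferencePair]] = {"keep": [], "flip": [], "drop": []}
    for pair in pairs:
        if complies(pair.prompt, pair.chosen):
            buckets["keep"].append(pair)
        elif complies(pair.prompt, pair.rejected):
            # Swap chosen and rejected so the pair reflects the new policy.
            buckets["flip"].append(PreferencePair(pair.prompt, pair.rejected, pair.chosen))
        else:
            buckets["drop"].append(pair)
    return buckets


if __name__ == "__main__":
    # Toy judge (assumption): a response violates the new policy if it contains "forbidden".
    def toy_judge(prompt: str, response: str) -> bool:
        return "forbidden" not in response

    data = [
        PreferencePair("q1", "safe answer", "forbidden answer"),
        PreferencePair("q2", "forbidden answer", "safe answer"),
        PreferencePair("q3", "forbidden a", "forbidden b"),
    ]
    for name, bucket in triage(data, toy_judge).items():
        print(name, len(bucket))
```

The point of the sketch is only that no new human labels are collected: every output pair is derived from an existing annotation plus a policy check.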

Findings

01. Demonstrates robust realignment on multiple LLMs and datasets.

02. Maintains general utility while improving alignment with policy changes.

03. Operates effectively without additional human annotation.

Abstract

Post-training alignment of large language models (LLMs) relies on large-scale human annotations guided by policy specifications that change over time. Cultural shifts, value reinterpretations, and regulatory or industrial updates make static alignment increasingly brittle. As policies evolve, deployed models can diverge from current alignment objectives, creating an Alignment-Reality Gap that is difficult to audit or correct. Existing remediation typically requires re-annotation under revised guidelines, which introduces systematic challenges, including guideline ambiguity, annotator interpretation drift, and reduced consistency at scale. We introduce TRACE (Triage and Re-align by Alignment Conflict Evaluation), a framework that transforms realignment into a structured optimization problem over existing data without requiring fresh human annotation. Leveraging a stronger model as a…

Code & Models

Repositories

https://respailab.github.io/TRACE