VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai

TL;DR
VISA introduces a novel framework for fine-grained, precise value alignment in LLMs that mitigates the alignment tax and preserves semantic integrity, outperforming traditional fine-tuning and prompting methods.
Contribution
The paper presents VISA, a new closed-loop framework with a value detector, translator, and rewriter, trained via GRPO to balance value precision and semantic preservation in LLMs.
Findings
VISA achieves better value alignment with less semantic drift.
It outperforms standard fine-tuning and prompting baselines.
The approach maintains factual consistency and general capabilities.
Abstract
Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods like Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-grained attributes. In practice, fine-tuning LLMs on task-specific datasets to optimize value alignment inevitably incurs an alignment tax: the model's pre-calibrated value system drifts significantly due to latent bias absorption from training data, while the fine-tuning process also causes severe hallucinations and semantic information loss in generated responses. To address this, we propose VISA (Value Injection via Shielded Adaptation), a closed-loop framework designed to navigate this trade-off. VISA's architecture features a high-precision value detector, a semantic-to-value translator, and a core value-rewriter. The value-rewriter is trained via Group Relative Policy Optimization (GRPO)…
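As a rough illustration of the group-relative idea behind GRPO, the sketch below scores several candidate rewrites of one prompt and normalizes each reward against its own group. It is not the authors' implementation: the function names, the equal weighting, the use of population standard deviation, and the example scores are all assumptions made here for illustration, since the abstract states only that the rewriter is trained via GRPO and that the framework balances value precision against semantic preservation.

```python
from statistics import mean, pstdev


def combined_reward(value_score, semantic_score, weight=0.5):
    """Hypothetical scalar reward: a weighted mix of value precision and
    semantic preservation. The 0.5 weight is an assumption, not from the paper."""
    return weight * value_score + (1.0 - weight) * semantic_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and standard deviation of its
    own sampling group, so no separate value network is needed."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rewrites of one prompt, scored as
# (value_score, semantic_score), both assumed to lie in [0, 1].
candidates = [(0.9, 0.8), (0.4, 0.9), (0.7, 0.3), (0.6, 0.6)]
rewards = [combined_reward(v, s) for v, s in candidates]
advantages = group_relative_advantages(rewards)

for (v, s), a in zip(candidates, advantages):
    print(f"value={v:.1f} semantic={s:.1f} advantage={a:+.2f}")
```

Under these assumed scores, the rewrite that does well on both criteria receives the largest advantage, while the one that hits the target value but loses meaning receives the smallest. That is the kind of trade-off a combined reward would be meant to encode.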
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
