ProSocialAlign: Preference Conditioned Test Time Alignment in Language Models

Somnath Banerjee; Sayan Layek; Sayantan Adak; Mykola Pechenizkiy; Animesh Mukherjee; Rima Hazra

arXiv:2512.06515 · cs.CL · December 9, 2025

TL;DR

ProSocialAlign is a test-time, parameter-efficient framework that steers language model responses toward safe, empathetic behavior without retraining the base model, combining constrained generation with preference modeling.

Contribution

The paper introduces a modular approach that combines harm mitigation with preference-aware decoding to produce safer, better-aligned language model outputs at inference time.
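
The combination of harm mitigation and preference-aware decoding can be illustrated with a minimal sketch of one greedy decoding step under a lexicographic rule: candidate tokens flagged as harmful are removed first, and only then is a preference-weighted reward used to rank what remains. The harm flags, per-attribute reward scores, preference weights, the coefficient `beta`, and the function name `lexicographic_step` are all hypothetical stand-ins for the paper's learned components; this is not the authors' implementation.

```python
def lexicographic_step(logprobs, is_harmful, rewards, preference, beta=1.0):
    """Pick the next token with a two-stage (lexicographic) rule.

    logprobs:   dict token -> base-model log-probability
    is_harmful: dict token -> bool (hypothetical hard-constraint oracle)
    rewards:    dict token -> dict attribute -> score (hypothetical reward model)
    preference: dict attribute -> weight (hypothetical user preference vector)
    """
    # Stage 1 (hard constraint): drop every continuation flagged as harmful.
    safe = [t for t in logprobs if not is_harmful.get(t, False)]
    if not safe:
        return None  # no safe continuation; the caller must choose a fallback

    # Stage 2 (soft objective): within the safe set only, rank tokens by
    # base log-probability plus beta times the preference-weighted reward.
    def score(t):
        r = sum(preference.get(a, 0.0) * v for a, v in rewards.get(t, {}).items())
        return logprobs[t] + beta * r

    return max(safe, key=score)


logprobs = {"sure": -0.2, "I": -0.9, "no": -1.5}
is_harmful = {"sure": True}
rewards = {"I": {"empathy": 0.8, "helpfulness": 0.3},
           "no": {"empathy": 0.1, "helpfulness": 0.2}}
preference = {"empathy": 0.7, "helpfulness": 0.3}
print(lexicographic_step(logprobs, is_harmful, rewards, preference))  # prints: I
```

With these toy inputs, "sure" has the highest base log-probability but is flagged as harmful, so the step returns "I". Because the constraint is applied before any reward is computed, no reward value can bring a flagged token back into contention; without the flag, the same call would return "sure".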

Findings

1. Achieves state-of-the-art safety performance across benchmarks.
2. Reduces unsafe content leakage.
3. Improves alignment with human values.

Abstract

Current language model safety paradigms often fall short in emotionally charged or high-stakes settings, where refusal-only approaches may alienate users and naive compliance can amplify risk. We propose ProSocialAlign, a test-time, parameter-efficient framework that steers generation toward safe, empathetic, and value-aligned responses without retraining the base model. We formalize five human-centered objectives and cast safety as lexicographic constrained generation: first, applying hard constraints to eliminate harmful continuations; then optimizing for prosocial quality within the safe set. Our method combines (i) directional regulation, a harm-mitigation mechanism that subtracts a learned "harm vector" in parameter space, and (ii) preference-aware autoregressive reward modeling trained jointly across attributes with gradient conflict resolution, enabling fine-grained,…
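
The two mechanisms named in the abstract can be sketched under explicit assumptions. For directional regulation, the sketch assumes a task-arithmetic-style definition in which the harm vector is the parameter difference between a model fine-tuned on harmful data and the base model, and it is subtracted after scaling by a hypothetical coefficient `lam`. For gradient conflict resolution, the abstract does not name the procedure, so the sketch swaps in PCGrad-style projection ("gradient surgery"), which removes the component of one attribute's gradient that points against another's. Unlike the original PCGrad, which projects against the other gradients in random order, this version uses a fixed order. Parameters and gradients are plain Python lists to keep the example dependency-free; all function names are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened values


def harm_vector(theta_harm: Params, theta_base: Params) -> Params:
    """Assumed definition: tau_harm = theta_harm - theta_base, per parameter."""
    return {k: [h - b for h, b in zip(theta_harm[k], theta_base[k])] for k in theta_base}


def subtract_direction(theta: Params, tau: Params, lam: float) -> Params:
    """Directional regulation sketch: theta' = theta - lam * tau."""
    return {k: [p - lam * t for p, t in zip(theta[k], tau[k])] for k in theta}


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcgrad(grads: List[List[float]]) -> List[float]:
    """PCGrad-style conflict resolution over per-attribute gradients.

    For each gradient, remove its component along any other gradient it
    conflicts with (negative dot product), then sum the adjusted gradients.
    Projections are taken against the original gradients, in a fixed order.
    """
    combined = [0.0] * len(grads[0])
    for i, g in enumerate(grads):
        adjusted = list(g)
        for j, other in enumerate(grads):
            if i == j:
                continue
            d = dot(adjusted, other)
            if d < 0:  # conflict: project onto the normal plane of `other`
                scale = d / dot(other, other)
                adjusted = [a - scale * b for a, b in zip(adjusted, other)]
        combined = [c + a for c, a in zip(combined, adjusted)]
    return combined


if __name__ == "__main__":
    base = {"w": [1.0, 2.0]}
    harmful = {"w": [1.5, 1.0]}  # toy weights "fine-tuned on harmful data"
    tau = harm_vector(harmful, base)
    print(subtract_direction(base, tau, lam=1.0))  # {'w': [0.5, 3.0]}
    print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))       # [0.5, 1.5]
```

In the conflicting example, each projected gradient ends up orthogonal to the original gradient it conflicted with ([0.5, 0.5] against [-1.0, 1.0], and [0.0, 1.0] against [1.0, 0.0]); gradients that do not conflict are summed unchanged.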

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)