SDA: Steering-Driven Distribution Alignment for Open LLMs without Fine-Tuning

Wei Xia; Zhi-Hong Deng

arXiv:2511.16324·cs.CL·November 21, 2025

TL;DR

SDA is a training-free, model-agnostic framework that dynamically aligns open-source LLM outputs with human intent during inference, improving helpfulness, harmlessness, and honesty without retraining.

Contribution

SDA introduces a novel, lightweight, and resource-efficient method for aligning LLM behavior with human preferences during inference without fine-tuning.

Findings

01. Achieves 64.4% improvement in helpfulness

02. Increases honesty by 30%

03. Enhances harmlessness by 11.5% across 8 open-source LLMs

Abstract

With the rapid advancement of large language models (LLMs), their deployment in real-world applications has become increasingly widespread. LLMs are expected to deliver robust performance across diverse tasks, user preferences, and practical scenarios. However, as demands grow, ensuring that LLMs produce responses aligned with human intent remains a foundational challenge. In particular, aligning model behavior effectively and efficiently during inference, without costly retraining or extensive supervision, is both a critical requirement and a non-trivial technical endeavor. To address the challenge, we propose SDA (Steering-Driven Distribution Alignment), a training-free and model-agnostic alignment framework designed for open-source LLMs. SDA dynamically redistributes model output probabilities based on user-defined alignment instructions, enhancing alignment between model behavior…
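
The abstract says SDA redistributes output probabilities according to an alignment instruction, but the excerpt above is cut off before any rule is given. The sketch below is therefore only an illustration of the general idea of inference-time distribution steering, not the authors' formulation. It assumes two sets of next-token logits are available, one from the plain prompt and one from the prompt with an alignment instruction prepended, and it shifts the base logits along the direction the instruction induces. The function name `steer_distribution`, the strength parameter `alpha`, and the linear update rule are all hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to probabilities with the usual max-shift for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer_distribution(base_logits, instructed_logits, alpha=1.0):
    """Hypothetical steering rule (an assumption, not taken from the paper).

    Moves the base logits along the shift caused by the alignment instruction:
        steered = base + alpha * (instructed - base)
    alpha = 0 reproduces the base distribution, alpha = 1 reproduces the
    instructed distribution, and alpha > 1 amplifies the instruction's effect.
    """
    if len(base_logits) != len(instructed_logits):
        raise ValueError("logit vectors must cover the same vocabulary")
    steered = [
        b + alpha * (i - b) for b, i in zip(base_logits, instructed_logits)
    ]
    return softmax(steered)


# Toy three-token vocabulary: the instruction favors token 1 over token 0.
base = [2.0, 1.0, 0.0]
instructed = [1.0, 2.0, 0.0]

p_base = steer_distribution(base, instructed, alpha=0.0)
p_instructed = steer_distribution(base, instructed, alpha=1.0)
p_amplified = steer_distribution(base, instructed, alpha=2.0)
```

In this toy setting, raising `alpha` above 1 puts more probability on the token the instruction favors than the instructed prompt alone does. No model weights are changed, which is the sense in which such an approach is training-free.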

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques