How Does Prefix Matter in Reasoning Model Tuning?
Raj Vardhan Tomar, Preslav Nakov, Yuxia Wang

TL;DR
This paper investigates how including reasoning and safety prefixes in fine-tuning datasets influences model performance, finding that prefixes improve safety and reasoning but may slightly hinder factuality and coding tasks.
Contribution
The study shows that prefix conditioning can act as an implicit alignment signal: systematically including safety- and reasoning-oriented prefix sentences in fine-tuning data improves model safety and reasoning.
Findings
Prefix-conditioned SFT improves Safe@1 accuracy on adversarial benchmarks (WildJailbreak, StrongReject) by up to 6%.
Prefix inclusion enhances reasoning accuracy by 7% on GSM8K.
Prefix tokens like 'revised' stabilize reasoning trajectories.
Abstract
Recent alignment studies commonly remove introductory boilerplate phrases from supervised fine-tuning (SFT) datasets. This work challenges that assumption. We hypothesize that safety- and reasoning-oriented prefix sentences serve as lightweight alignment signals that can guide model decoding toward safer and more coherent responses. To examine this, we fine-tune three R1 series models across three core model capabilities: reasoning (mathematics, coding), safety, and factuality, systematically varying prefix inclusion from 0% to 100%. Results show that prefix-conditioned SFT improves both safety and reasoning performance, yielding up to +6% higher Safe@1 accuracy on adversarial benchmarks (WildJailbreak, StrongReject) and +7% improvement on GSM8K reasoning. However, factuality and coding tasks show marginal or negative effects, indicating that prefix-induced narrowing of the search…
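The abstract's core manipulation is varying the share of SFT examples that carry a prefix sentence, from 0% to 100%. A minimal sketch of building such splits follows. It is an illustration under stated assumptions, not the authors' pipeline: the prefix strings, the `build_sft_split` function, the prompt/response record format, the choice to prepend the prefix to the response, and the intermediate ratios in the demo are all hypothetical.

```python
import random
from typing import Dict, List

# Hypothetical prefix sentences; the paper's actual prefix wording is not given here.
SAFETY_PREFIX = "I need to make sure this response is safe and responsible."
REASONING_PREFIX = "Let me think through this step by step."


def build_sft_split(
    examples: List[Dict[str, str]],
    prefix: str,
    inclusion_ratio: float,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Return a copy of `examples` in which a fraction `inclusion_ratio`
    of responses (count rounded down) start with `prefix`.

    Assumption: each example is a dict with "prompt" and "response" keys,
    and the prefix is prepended to the response text.
    """
    if not 0.0 <= inclusion_ratio <= 1.0:
        raise ValueError("inclusion_ratio must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps each split reproducible
    n_prefixed = int(len(examples) * inclusion_ratio)
    chosen = set(rng.sample(range(len(examples)), n_prefixed))
    out = []
    for i, ex in enumerate(examples):
        response = ex["response"]
        if i in chosen:
            response = f"{prefix} {response}"
        out.append({"prompt": ex["prompt"], "response": response})
    return out


if __name__ == "__main__":
    # Toy data; the ratio grid below is illustrative, not the paper's.
    data = [{"prompt": f"q{i}", "response": f"a{i}"} for i in range(10)]
    for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
        split = build_sft_split(data, REASONING_PREFIX, ratio)
        n = sum(ex["response"].startswith(REASONING_PREFIX) for ex in split)
        print(f"ratio={ratio:.2f}: {n}/{len(split)} prefixed")
```

Sampling the prefixed subset at random, rather than prefixing the first k examples, avoids tying the prefix to any ordering in the source data, so each ratio differs mainly in how often the prefix appears.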
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
