Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

Zhengyang Zhao; Lu Ma; Wentao Zhang

arXiv:2605.08741 · cs.CL · May 12, 2026

TL;DR

This paper introduces On-Policy Harness Self-Distillation (OPHSD), a method that internalizes harness-augmented capabilities into language models to improve complex reasoning without relying on external workflows during inference.

Contribution

The paper proposes OPHSD, a self-distillation approach that internalizes harness capabilities into the model itself, outperforming strong baselines such as OPSD on reasoning tasks.

Findings

01. OPHSD outperforms strong baselines, e.g., +10.83% over OPSD on HMMT25.

02. Reattaching the harness during inference offers no benefits and can degrade performance.

03. Harnesses can serve as temporary training scaffolds, with benefits fed back into the base model.

Abstract

Inference-time harnesses substantially improve large language models on complex reasoning tasks. However, the intrinsic capabilities of the underlying model remain unchanged by the addition of these external workflows. To bridge this gap, we introduce On-Policy Harness Self-Distillation (OPHSD), which employs the harness-augmented current model as a teacher for self-distillation, thereby introducing extra supervisory signals from the harness beyond the training data. OPHSD internalizes task-specific harness capabilities into the student model, yielding robust generalizability and strong standalone performance across diverse reasoning tasks. Evaluated with a draft-verify harness for text classification and a plan-solve harness for mathematical reasoning, OPHSD consistently outperforms strong baselines (e.g., +10.83% over OPSD on HMMT25). Our analysis further indicates that reattaching…
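
The teacher-student setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`token_kl`, `harness_distill_loss`), the choice of KL(teacher || student) as the divergence, and the hand-written logits are all assumptions made for illustration. The only idea taken from the abstract is that the same model, run with the harness attached, scores a response sampled by the harness-free student, and the student is trained toward that distribution.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) at one token position.

    The KL direction is an assumption; the listing does not state
    which divergence the paper uses.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def harness_distill_loss(teacher_logits_seq, student_logits_seq):
    """Mean per-token divergence over one student-sampled response.

    teacher_logits_seq: logits from the model WITH the harness context,
                        evaluated on the student's own sampled tokens.
    student_logits_seq: logits from the same model WITHOUT the harness.
    Both are hypothetical inputs; a real setup would obtain them from
    two forward passes of the current model.
    """
    if len(teacher_logits_seq) != len(student_logits_seq):
        raise ValueError("teacher and student must score the same tokens")
    if not teacher_logits_seq:
        return 0.0
    per_token = [token_kl(t, s)
                 for t, s in zip(teacher_logits_seq, student_logits_seq)]
    return sum(per_token) / len(per_token)

# Toy example: a 2-token response over a 3-word vocabulary.
teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]  # harness-augmented pass
student = [[1.0, 1.0, 0.1], [0.2, 0.4, 0.3]]  # plain pass
loss = harness_distill_loss(teacher, student)
```

The loss is zero when the two passes agree on every token and grows as the harness shifts the teacher's predictions away from the student's, which is the extra supervisory signal the abstract refers to.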

Code & Models

Repositories

zzy1127/OPHSD-On-Policy-Harness-Self-Distillation
github
