What properties of reasoning supervision are associated with improved downstream model quality?

Mikołaj Langner, Dzmitry Pihulski, Jan Eliasz, Michał Rajkowski, Przemysław Kazienko, Maciej Piasecki, Jan Kocoń, Teddy Ferdinan

arXiv:2605.13290 · cs.AI · May 14, 2026

TL;DR

This paper investigates whether intrinsic data metrics, computed before any training, can predict how useful a reasoning dataset will be for fine-tuning large language models, reducing the need for costly trial-and-error validation.

Contribution

It introduces a suite of quantitative measures that correlate with downstream performance and shows that which measures predict data utility depends on model scale.

Findings

01. Intrinsic metrics correlate strongly and significantly with downstream model performance.

02. For smaller models, alignment-focused metrics are the main predictors of data utility.

03. Larger models benefit from redundancy and verbose reasoning traces.

Abstract

Validating training data for reasoning models typically requires expensive trial-and-error fine-tuning cycles. In this work, we investigate whether the utility of a reasoning dataset can be reliably predicted prior to training using intrinsic data metrics. We propose a suite of quantitative measures and evaluate their predictive power by fine-tuning 8B and 11B models on semantically distinct variants of a Polish reasoning dataset. Our analysis reveals that these intrinsic metrics demonstrate strong and significant correlations with downstream model performance. Crucially, we find that the predictors of utility are scale-dependent: smaller models rely on alignment-focused metrics to ensure precision, whereas larger models benefit from high redundancy, utilizing verbose traces to solve complex tasks. These findings establish a scale-aware framework for validating reasoning data, enabling…
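
As a minimal sketch of the kind of analysis the abstract describes, the Python below ranks candidate intrinsic metrics by how strongly they track downstream scores, computed separately for each model size. It assumes Spearman rank correlation as the association measure; the abstract does not name the statistic. The metric names `alignment` and `redundancy`, the helper functions, and every number are hypothetical placeholders, not the paper's code, metric definitions, or results.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_metrics(metric_table, scores):
    """Sort metrics by |rho| against downstream scores for one model size."""
    rhos = {name: spearman(vals, scores) for name, vals in metric_table.items()}
    return sorted(rhos.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy values for five dataset variants (NOT results from the paper).
metrics = {
    "alignment": [0.9, 0.7, 0.8, 0.4, 0.6],
    "redundancy": [0.2, 0.6, 0.1, 0.5, 0.9],
}
scores_by_size = {
    "8B": [62.0, 55.0, 60.0, 48.0, 51.0],
    "11B": [58.0, 64.0, 55.0, 61.0, 69.0],
}
for size, scores in scores_by_size.items():
    print(size, rank_metrics(metrics, scores))
```

Computing the ranking per model size, rather than pooling all scores, is what allows a scale-dependent pattern like the one reported to show up at all: in these made-up numbers, `alignment` tops the list for the 8B scores and `redundancy` for the 11B scores.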
