Do Post-Training Algorithms Actually Differ? A Controlled Study Across Model Scales Uncovers Scale-Dependent Ranking Inversions

Xiaoyi Li

arXiv:2603.19335·cs.LG·March 23, 2026

TL;DR

Using OXRL, a unified framework that implements 51 post-training algorithms, this study runs a controlled comparison of 8 algorithms across 4 model scales (0.5B to 7B). It finds that algorithm rankings are highly scale-dependent and that model scale has the largest impact on performance.

Contribution

Introduces OXRL, a unified framework for fair comparison of post-training algorithms, and provides the first large-scale analysis showing scale-dependent ranking inversions and task-specific algorithm leverage.

Findings

01. Algorithm rankings invert with model scale.

02. Loss function modifications have negligible effects.

03. Algorithm leverage varies significantly across tasks.

Abstract

Post-training alignment has produced dozens of competing algorithms (DPO, SimPO, KTO, GRPO, and others), yet practitioners lack controlled comparisons to guide algorithm selection. We present OXRL, a unified framework implementing 51 post-training algorithms with identical infrastructure, enabling the first large-scale apples-to-apples evaluation. Our study spans 8 algorithms across 4 model scales (0.5B to 7B), 3 evaluation domains, and a 20-variant DPO taxonomy (100 runs at 1.5B, 5 seeds each), totaling ~240 training runs on H100 GPUs. Three headline findings emerge. (1) Algorithm rankings are unstable across scale: at 1.5B, online RL (SGRPO) tops all methods at 58.0% ± 0.57 on GSM8K; by 7B, the worst small-scale method (SimPO) becomes the best (85.8%), a complete ranking inversion driven by model scale rather than LoRA regularization (confirmed via 2×2 factorial).…
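
To make the idea of a ranking inversion concrete, here is a minimal sketch of how one might check for it given per-method accuracies at two model scales. It is not taken from OXRL: the function names, the placeholder method labels, and the toy scores are all assumptions for illustration, and reading "complete inversion" as an exact reversal of the ordering is only one possible interpretation of the abstract's wording.

```python
from typing import Dict, List


def rank_methods(scores: Dict[str, float]) -> List[str]:
    """Return method names ordered from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


def is_complete_inversion(small: Dict[str, float], large: Dict[str, float]) -> bool:
    """True when the large-scale ranking is the exact reverse of the small-scale one."""
    if set(small) != set(large):
        raise ValueError("both scales must report the same methods")
    return rank_methods(large) == rank_methods(small)[::-1]


def kendall_tau(small: Dict[str, float], large: Dict[str, float]) -> float:
    """Kendall rank correlation (tau-a): +1 for identical order, -1 for full reversal."""
    if set(small) != set(large):
        raise ValueError("both scales must report the same methods")
    names = sorted(small)
    if len(names) < 2:
        raise ValueError("need at least two methods to compare rankings")
    concordant = discordant = 0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Positive product: the pair is ordered the same way at both scales.
            sign = (small[a] - small[b]) * (large[a] - large[b])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    total_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy numbers for illustration only; these are NOT results from the paper.
toy_small = {"method_a": 0.60, "method_b": 0.55, "method_c": 0.50}
toy_large = {"method_a": 0.80, "method_b": 0.82, "method_c": 0.85}

print(rank_methods(toy_small))                     # best-to-worst at the small scale
print(is_complete_inversion(toy_small, toy_large))
print(kendall_tau(toy_small, toy_large))
```

A rank correlation such as Kendall's tau gives a graded view of partial reshuffling as well, which can be more informative than a yes/no inversion check when only some methods swap places between scales.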

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques