Dr. Post-Training: A Data Regularization Perspective on LLM Post-Training
Pingbang Hu, Xueshen Liu, Z. Morley Mao, Jiaqi W. Ma

TL;DR
This paper introduces Dr. Post-Training, a novel framework that treats general training data as a regularizer to improve LLM post-training, outperforming existing data selection methods with flexible bias-variance tradeoffs.
Contribution
It reconceptualizes data selection as a data-induced regularizer, offering a new flexible framework and practical methods for LLM post-training that outperform state-of-the-art baselines.
Findings
The proposed methods outperform data selection baselines across SFT, RLHF, and RLVR tasks.
The proposed system optimizations keep overhead minimal at LLM scale.
The framework offers a richer design space with adjustable bias-variance tradeoffs.
Abstract
Data selection methods address a critical challenge in LLM post-training: effectively leveraging scarce, high-fidelity target data alongside abundant but imperfectly aligned general training data. In this work, we move beyond the data-selection framing and introduce Dr. Post-Training (Data-Regularized Post-Training), a novel framework that reconceptualizes general training data as a data-induced regularizer that prevents overfitting to the scarce target objective, rather than serving as a pool for selection. Specifically, at each training step, our framework constructs a feasible set of model update directions using the general training data and projects the model update direction specified by the scarce target data onto that feasible set. Standard training and existing data selection methods arise as special cases with different choices of the data-induced regularizer, and…
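The per-step projection described in the abstract can be illustrated with a small sketch. The abstract does not specify how the feasible set is built, so this example assumes, purely for illustration, that the feasible set is the linear span of per-sample gradients from the general data; the function name `project_onto_span` and the toy dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def project_onto_span(target_grad, general_grads):
    """Euclidean projection of target_grad onto span(rows of general_grads).

    ASSUMPTION: the feasible set is taken to be the linear span of the
    general-data gradients. The paper's actual feasible set may differ.
    """
    # Least squares: find coeffs minimizing ||general_grads.T @ coeffs - target_grad||_2
    coeffs, *_ = np.linalg.lstsq(general_grads.T, target_grad, rcond=None)
    # The projected update is the best approximation of target_grad in the span.
    return general_grads.T @ coeffs, coeffs


# Toy setup (hypothetical sizes): 4 general-data gradients, 10 parameters.
rng = np.random.default_rng(0)
general_grads = rng.normal(size=(4, 10))
target_grad = rng.normal(size=10)

update, coeffs = project_onto_span(target_grad, general_grads)

# The discarded component is orthogonal to every general-data gradient.
residual = target_grad - update
```

Under this assumed feasible set, the projected update is a weighted combination of general-data gradients, with the weights chosen to match the target gradient as closely as possible. If the general gradients happened to span the whole parameter space, the projection would return the target gradient unchanged.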
