SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization

Xiaole Su; Kasey Zhang; Andy Lyu

arXiv:2604.13515·cs.LG·April 16, 2026

TL;DR

This paper investigates how the overlap between the data used for supervised fine-tuning (SFT) and for Group Relative Policy Optimization (GRPO) affects Lean 4 autoformalization performance, and finds that lower overlap yields higher accuracy.

Contribution

It provides the first controlled study of SFT-GRPO data overlap, showing that disjoint data improves model performance and that evaluating both compilation and semantic correctness exposes gaps between the two.

Findings

01 Disjoint SFT and GRPO data outperform overlapping data at no extra cost.

02 Lower data overlap correlates with higher compilation and semantic accuracy.

03 Dual-metric evaluation uncovers significant compile-semantic gaps in models.
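
The dual-metric idea in the findings can be illustrated with a minimal sketch. It assumes each problem is sampled n times, that a sample counts as semantically correct only if it also compiles, and that pass@k is computed with the widely used unbiased combinatorial estimator; the source does not state which estimator the authors use. The function names and the `(n, n_compiled, n_semantic)` record layout are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct.

    Assumption: this is the standard combinatorial estimator, not
    necessarily the one used in the paper.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def compile_semantic_gap(results, k):
    """Average compile pass@k, semantic pass@k, and their difference.

    results: list of (n_samples, n_compiled, n_semantic) per problem
    (hypothetical layout; n_semantic <= n_compiled is assumed).
    """
    compile_scores = [pass_at_k(n, c, k) for n, c, _ in results]
    semantic_scores = [pass_at_k(n, s, k) for n, _, s in results]
    compile_rate = sum(compile_scores) / len(results)
    semantic_rate = sum(semantic_scores) / len(results)
    return compile_rate, semantic_rate, compile_rate - semantic_rate
```

A large positive third value means many outputs type-check in Lean 4 without faithfully formalizing the problem statement, which is the kind of gap a compile-only metric would hide.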

Abstract

Supervised fine-tuning (SFT) followed by Group Relative Policy Optimization (GRPO) is a common post-training recipe. We conduct a controlled ablation over SFT-GRPO data overlap, evaluating Qwen3-8B (thinking disabled) post-trained for Lean 4 autoformalization under six conditions that differ solely in training recipe: a base model, SFT-only, GRPO-only, and three SFT+GRPO configurations where 0%, 30%, or 100% of the GRPO prompts coincide with the SFT corpus. Keeping SFT and GRPO data disjoint consistently outperforms full overlap at zero additional compute cost. Evaluating on Gaokao-Formal and PutnamBench under both compile pass@k and semantic pass@k assessed by an LLM judge, we find that lower overlap is monotonically associated with higher compilation and semantic accuracy. At 0% overlap, GRPO yields a 10.4 percentage point semantic gain over SFT alone…
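
One way to picture the three SFT+GRPO conditions is a sampler that draws a fixed fraction of GRPO prompts from the SFT corpus and the rest from a disjoint pool. The sketch below is an illustration under that assumption, not the authors' data pipeline; `build_grpo_prompts`, its parameters, and the two-pool setup are hypothetical.

```python
import random


def build_grpo_prompts(sft_prompts, held_out_prompts, n_grpo, overlap, seed=0):
    """Sample n_grpo prompts so that a fraction `overlap` also appears in SFT.

    sft_prompts: prompts used for supervised fine-tuning.
    held_out_prompts: prompts never seen during SFT (assumed disjoint).
    overlap: target fraction in [0, 1], e.g. 0.0, 0.3, or 1.0.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")
    if set(sft_prompts) & set(held_out_prompts):
        raise ValueError("held-out pool must be disjoint from the SFT corpus")

    n_shared = round(overlap * n_grpo)
    n_fresh = n_grpo - n_shared
    if n_shared > len(sft_prompts) or n_fresh > len(held_out_prompts):
        raise ValueError("not enough prompts to reach the requested overlap")

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    prompts = rng.sample(sft_prompts, n_shared) + rng.sample(held_out_prompts, n_fresh)
    rng.shuffle(prompts)
    return prompts


def measured_overlap(grpo_prompts, sft_prompts):
    """Fraction of GRPO prompts that also occur in the SFT corpus."""
    sft_set = set(sft_prompts)
    return sum(p in sft_set for p in grpo_prompts) / len(grpo_prompts)
```

Holding `n_grpo` constant across conditions keeps the GRPO budget the same, so only the overlap fraction changes between runs.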
