TL;DR
This paper introduces Optimsyn, an influence-guided optimization framework that refines the rubrics used for synthetic data generation, improving downstream model performance across domains without task-specific tuning.
Contribution
It proposes a reinforcement learning approach that uses influence estimation to automatically optimize rubrics based on target model feedback, reducing reliance on expert-crafted heuristics.
Findings
Influence scores effectively measure synthetic data utility for training.
Optimized rubrics lead to consistent performance improvements across domains.
The framework generalizes well without domain-specific tuning.
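To make the idea of an influence score concrete, here is a minimal sketch using a common first-order approximation: the dot product between a training example's loss gradient and the mean loss gradient on a target set. This is an illustration only, not the paper's estimator, which this summary does not specify. The logistic-regression model, the function names (`example_gradient`, `influence_scores`), and the toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def example_gradient(w, x, y):
    """Gradient of the logistic loss for one example (x, y) with respect to w."""
    return (sigmoid(x @ w) - y) * x

def influence_scores(w, train_X, train_y, target_X, target_y):
    """First-order influence (assumed form): gradient dot product with the target set.

    A positive score means a small gradient step on that training example
    is expected to lower the target loss; a negative score means it raises it.
    """
    target_grad = np.mean(
        [example_gradient(w, x, y) for x, y in zip(target_X, target_y)], axis=0
    )
    return np.array(
        [example_gradient(w, x, y) @ target_grad for x, y in zip(train_X, train_y)]
    )

# Hypothetical toy data: three synthetic training examples, one target example.
w = np.zeros(2)
train_X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
train_y = np.array([1.0, 0.0, 1.0])
target_X = np.array([[1.0, 0.0]])
target_y = np.array([1.0])

scores = influence_scores(w, train_X, train_y, target_X, target_y)
print(scores)  # helpful, harmful, and neutral example, in that order
```

In this toy setup the first example agrees with the target and scores positive, the second contradicts it and scores negative, and the third touches an unrelated feature and scores zero. A rubric optimizer could, under these assumptions, use such scores as a reward signal for the data a rubric admits.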
Abstract
Large language models (LLMs) achieve strong downstream performance largely due to abundant supervised fine-tuning (SFT) data. However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law, and finance is scarce because expert curation is expensive, privacy constraints are strict, and label consistency is hard to ensure. Recent work uses synthetic data, typically by prompting a generator over domain documents and filtering outputs with handcrafted rubrics. Yet rubric design is expert-dependent, transfers poorly across domains, and is often optimized through a brittle heuristic loop of writing rubrics, synthesizing data, training, inspecting results, and manually guessing revisions. This process lacks reliable quantitative feedback about how a rubric affects downstream performance. We propose evaluating synthetic data by its training…