A Unified Understanding of Offline Data Selection and Online Self-refining Generation for Post-training LLMs
Quan Xiao, Tianyi Chen

TL;DR
This paper presents a unified optimization-based framework for offline data selection and online self-refining generation in large language models, improving task adaptation and data quality.
Contribution
It introduces a bilevel data selection approach and a unified understanding of data weighting, with theoretical validation and practical performance improvements.
Findings
The bilevel data selection framework is theoretically effective.
Combining offline data with online self-refining improves fine-tuning.
Experiments show enhanced quality and safety in LLM fine-tuning.
Abstract
Offline data selection and online self-refining generation, both of which enhance data quality, are crucial steps in adapting large language models (LLMs) to specific downstream tasks. We tackle offline data selection and online self-refining generation from an optimization perspective. Specifically, bilevel data selection is used for offline data selection with respect to the validation dataset, and we treat online self-refining generation as a model adaptation step that selects the model trained on current responses that best fits the validation data. Our framework offers a unified understanding of offline data selection and self-refining generation by assigning a learned data weight to each question and response, either explicitly or implicitly. For the first time, we theoretically demonstrate the effectiveness of the bilevel data selection framework and demonstrate its performance…
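As a rough illustration of what a bilevel data selection problem of this kind can look like, the sketch below writes out a generic weighted formulation. The notation ($w_i$, $\theta$, $\sigma$, $\ell$, $\mathcal{D}_{\mathrm{val}}$, $N$) is assumed here for illustration and is not taken from the paper, whose exact objective may differ.

```latex
% Generic bilevel data selection (illustrative notation, not the paper's own).
% Upper level: choose data weights w so that the trained model fits the validation set.
% Lower level: train model parameters theta on the weighted training samples.
\begin{aligned}
\min_{w \in \mathbb{R}^{N}} \quad
  & \frac{1}{|\mathcal{D}_{\mathrm{val}}|}
    \sum_{(x, y) \in \mathcal{D}_{\mathrm{val}}}
    \ell\bigl(\theta^{*}(w);\, x, y\bigr) \\
\text{s.t.} \quad
  & \theta^{*}(w) \in \arg\min_{\theta}
    \sum_{i=1}^{N} \sigma(w_i)\, \ell\bigl(\theta;\, x_i, y_i\bigr)
\end{aligned}
```

In this sketch, $\sigma$ is an assumed map from each learned weight to a nonnegative value (for example a sigmoid), so training samples that help validation performance can receive a larger effective weight.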
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
