How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective
Teng Xiao, Mingxiao Li, Yige Yuan, Huaisheng Zhu, Chao Cui, Vasant G. Honavar

TL;DR
This paper presents GSIL, a new self-imitation learning framework that efficiently aligns large language models with offline demonstration data, outperforming existing methods across multiple benchmarks.
Contribution
Introduction of GSIL, a generalized self-imitation learning method that simplifies and enhances offline alignment of large language models using density ratio estimates.
Findings
GSIL outperforms baseline methods on coding, reasoning, and instruction-following benchmarks.
Eliminates the need for adversarial training in imitation learning.
Provides a unified, efficient approach for offline model alignment.
Abstract
This paper introduces a novel generalized self-imitation learning (GSIL) framework, which effectively and efficiently aligns large language models with offline demonstration data. We develop GSIL by deriving a surrogate objective of imitation learning with density ratio estimates, facilitating the use of self-generated data and optimizing the imitation learning objective with simple classification losses. GSIL eliminates the need for complex adversarial training in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL encompasses a family of offline losses parameterized by a general class of convex functions for density ratio estimation and enables a unified view for alignment with demonstration data. Extensive experiments show that GSIL consistently and…
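The phrase "optimizing the imitation learning objective with simple classification losses" can be illustrated with a toy binary classifier that labels demonstration responses as positives and self-generated responses as negatives. The sketch below is a minimal illustration under stated assumptions, not the paper's exact loss: it assumes the classifier logit is a scaled log-ratio between the policy and a fixed reference model, and the names `classification_imitation_loss`, `demo_logratios`, `self_logratios`, and `beta` are hypothetical. The logistic loss used here is only one possible choice; the abstract describes a broader family of convex functions.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def classification_imitation_loss(demo_logratios, self_logratios, beta=1.0):
    """Logistic loss treating demonstrations as class 1 and
    self-generated samples as class 0.

    Assumption: each entry is log pi_theta(y|x) - log pi_ref(y|x)
    for one response, and beta * log-ratio serves as the logit.
    """
    # Push the log-ratio up on demonstration responses.
    pos = [-log_sigmoid(beta * r) for r in demo_logratios]
    # Push the log-ratio down on the model's own samples.
    neg = [-log_sigmoid(-beta * r) for r in self_logratios]
    return sum(pos) / len(pos) + sum(neg) / len(neg)


# Toy values: the loss drops once demonstrations score higher than self-samples.
untrained = classification_imitation_loss([0.0, 0.0], [0.0, 0.0])
separated = classification_imitation_loss([2.0, 1.5], [-1.0, -2.0])
```

In this toy setup no discriminator network is trained: the classifier is defined directly from the policy's own log-probabilities, which is one way to read the claim that adversarial training is not needed.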
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
