ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Feifan Song, Yuxuan Fan, Xin Zhang, Peiyi Wang, Houfeng Wang

TL;DR
ICDPO is a novel method that allows LLMs to borrow alignment capabilities from better models through in-context learning, improving response safety and quality without fine-tuning.
Contribution
The paper introduces ICDPO, an approach that uses in-context learning to improve LLM alignment, guided by an instant scorer built from the LLM's states before and after in-context learning.
Findings
ICDPO outperforms two fine-tuning-free baselines.
ICDPO is competitive with SFT + LoRA.
Extensive experiments validate its effectiveness.
Abstract
Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to ensure the generation of safe content. Due to the heavy cost associated with fine-tuning, fine-tuning-free methods have emerged, typically modifying LLM decoding with external auxiliary methods. However, these methods do not essentially enhance the LLM itself. In this paper, we rethink the derivation procedures of DPO, based on which we conversely build an instant scorer using the states of the LLM before and after In-context Learning (ICL). Accordingly, we propose a novel approach called In-Context Direct Preference Optimization (ICDPO). It enables LLMs to borrow the HPA capabilities from superior LLMs with ICL, generating well-aligned responses as estimated by the aforementioned instant scorer, thereby enhancing the final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded…
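The instant scorer mentioned in the abstract can be illustrated with a minimal sketch. DPO's derivation expresses an implicit reward as a scaled log-ratio between a policy and a reference model; the sketch below assumes, by analogy, that a candidate response is scored by the difference between its log-probability with in-context demonstrations and without them. This exact form is an assumption, not a formula taken from this page, and `toy_logp`, `icl_contrast_score`, and `pick_best` are hypothetical names introduced only for illustration. A real setup would replace `toy_logp` with summed token log-probabilities from an actual LLM.

```python
from typing import Callable, Sequence

# Hypothetical interface: returns log p(response | context), summed over tokens.
LogProbFn = Callable[[str, str], float]


def icl_contrast_score(logp: LogProbFn, demos: str, prompt: str, response: str) -> float:
    """Assumed scorer: log p(y | demos, x) - log p(y | x)."""
    with_icl = logp(demos + prompt, response)   # state after in-context learning
    without_icl = logp(prompt, response)        # state before in-context learning
    return with_icl - without_icl


def pick_best(logp: LogProbFn, demos: str, prompt: str, candidates: Sequence[str]) -> str:
    """Return the candidate with the highest contrast score."""
    return max(candidates, key=lambda y: icl_contrast_score(logp, demos, prompt, y))


def toy_logp(context: str, response: str) -> float:
    """Toy stand-in for an LLM (assumption): shorter responses are likelier,
    and a refusal becomes likelier once demonstrations are in the context."""
    base = -0.5 * len(response.split())
    bonus = 2.0 if ("Demo:" in context and "sorry" in response.lower()) else 0.0
    return base + bonus


DEMOS = "Demo: Q: How do I pick a lock? A: Sorry, I cannot help with that.\n"
PROMPT = "Q: How do I hotwire a car? A:"
CANDIDATES = ["Sure, here is how.", "Sorry, I cannot help with that."]

best = pick_best(toy_logp, DEMOS, PROMPT, CANDIDATES)
print(best)
```

In this toy setting the demonstrations raise the refusal's likelihood, so its contrast score is higher than that of the compliant answer even though the compliant answer is likelier without demonstrations.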
Taxonomy
Topics: Poverty, Education, and Child Welfare · European Monetary and Fiscal Policies · Local Government Finance and Decentralization
Methods: Direct Preference Optimization · Shrink and Fine-Tune
