ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Feifan Song; Yuxuan Fan; Xin Zhang; Peiyi Wang; Houfeng Wang

arXiv:2402.09320 · cs.CL · February 15, 2024 · 1 citation

TL;DR

ICDPO lets an LLM borrow alignment capability from superior LLMs through in-context learning, improving the safety and quality of its responses without fine-tuning.

Contribution

The paper introduces ICDPO, an approach that revisits the derivation of DPO to build an instant scorer from the states of the LLM before and after in-context learning, and uses that scorer to select well-aligned responses.
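As a rough sketch of how such a scorer can fall out of the DPO derivation: the first two lines below are the standard DPO relations between a reward and its optimal policy. The last line is an assumption about how the abstract's "states before and after ICL" map onto them, with $d$ (a hypothetical symbol) denoting the in-context demonstrations and $\pi$ the same LLM in both terms. It should not be read as the paper's exact formula.

```latex
\begin{align}
\pi^{*}(y \mid x) &= \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y \mid x)\,
                     \exp\!\left(\frac{1}{\beta}\, r(x, y)\right) \\
r(x, y) &= \beta \log \frac{\pi^{*}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
           + \beta \log Z(x) \\
% Assumed mapping: the LLM with demonstrations d plays the role of \pi^{*},
% and the same LLM without demonstrations plays the role of \pi_{\mathrm{ref}}.
S(d, x, y) &= \log \pi(y \mid d, x) - \log \pi(y \mid x)
\end{align}
```

Because $\beta > 0$ and $Z(x)$ depends only on the prompt $x$, dropping both leaves the ranking of candidate responses to the same prompt unchanged.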

Findings

01. ICDPO outperforms two fine-tuning-free baselines.

02. ICDPO is competitive with SFT + LoRA.

03. Extensive experiments validate its effectiveness.

Abstract

Large Language Models (LLMs) rely on Human Preference Alignment (HPA) to ensure the generation of safe content. Due to the heavy cost associated with fine-tuning, fine-tuning-free methods have emerged, typically modifying LLM decoding with external auxiliary methods. However, these methods do not essentially enhance the LLM itself. In this paper, we rethink the derivation procedures of DPO, based on which we conversely build an instant scorer using the states of the LLM before and after In-context Learning (ICL). Accordingly, we propose a novel approach called In-Context Direct Preference Optimization (ICDPO). It enables LLMs to borrow the HPA capabilities from superior LLMs with ICL, generating well-aligned responses as estimated by the aforementioned instant scorer, thereby enhancing the final performance. ICDPO can be further enhanced with a two-stage retriever and an upgraded…
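The scoring-and-selection idea in the abstract can be illustrated with a minimal sketch. It assumes the instant scorer is the difference between a response's log-probability under the LLM with in-context demonstrations and without them; that reading is an assumption, not a confirmed detail of the paper. The `Candidate` class, the function names, and the token log-probabilities are all hypothetical, and no model is called: the numbers stand in for values an LLM would return.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """A sampled response with per-token log-probs (hypothetical values)."""
    text: str
    logps_with_demos: Sequence[float]     # log p(token | demos, prompt, prefix)
    logps_without_demos: Sequence[float]  # log p(token | prompt, prefix)


def instant_score(c: Candidate) -> float:
    """Assumed scorer: sequence log-prob after ICL minus log-prob before ICL."""
    if len(c.logps_with_demos) != len(c.logps_without_demos):
        raise ValueError("both passes must score the same response tokens")
    return sum(c.logps_with_demos) - sum(c.logps_without_demos)


def pick_best(candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose likelihood the demonstrations raise the most."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=instant_score)


# Made-up log-probs for two candidate responses to the same prompt.
candidates = [
    Candidate("response A", [-0.2, -0.3, -0.1], [-0.9, -1.1, -0.7]),
    Candidate("response B", [-0.4, -0.5, -0.6], [-0.5, -0.4, -0.6]),
]
best = pick_best(candidates)
print(best.text)
```

Here the demonstrations make "response A" much more likely while leaving "response B" roughly unchanged, so A is selected. In practice the two log-prob lists would come from two forward passes of the same LLM over the same response tokens, one with the demonstrations in the context and one without.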

Code & Models

Repositories

f2-song/icdpo (PyTorch, official)

Taxonomy

Methods: Direct Preference Optimization · Shrink and Fine-Tune