Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment

Geyang Guo; Ranchi Zhao; Tianyi Tang; Wayne Xin Zhao; Ji-Rong Wen

arXiv:2311.04072 · cs.CL · April 16, 2024 · 1 citation

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper introduces FIGA, an improved alignment method for large language models that uses fine-grained quality signals from contrasting responses, enhancing alignment beyond imitation learning.

Contribution

The paper presents a refined alignment dataset and a novel loss function that incorporates token-level quality signals to better align LLMs.
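
To make the idea of a loss that uses token-level quality signals concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's actual loss: the function name `token_weighted_nll`, the sign convention, and the example weights are all hypothetical. Each token's log-probability is scaled by a per-token weight, so tokens judged good can be encouraged, tokens judged bad can be discouraged, and neutral tokens can be ignored.

```python
import math
from typing import Sequence


def token_weighted_nll(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted negative log-likelihood of one response.

    log_probs[t] is the model's log-probability of token t.
    weights[t] is a hypothetical per-token quality weight:
      > 0 raises the token's likelihood when the loss is minimized,
      < 0 lowers it, and 0 leaves the token out of the loss.
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Illustrative values only: the first token is rewarded, the second is ignored.
example_log_probs = [math.log(0.5), math.log(0.25)]
example_weights = [1.0, 0.0]
example_loss = token_weighted_nll(example_log_probs, example_weights)
```

Setting every weight to 1 recovers the ordinary SFT objective, which is one way to see how such a loss goes beyond plain imitation.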

Findings

1. FIGA outperforms baseline methods in alignment tasks.
2. Fine-grained signals improve the understanding of expected behaviors.
3. The approach enhances LLM alignment without relying solely on imitation learning.

Abstract

Alignment with human preference is a desired property of large language models (LLMs). Currently, the main alignment approach is based on reinforcement learning from human feedback (RLHF). Despite the effectiveness of RLHF, it is intricate to implement and train, thus recent studies explore how to develop alternative alignment approaches based on supervised fine-tuning (SFT). A major limitation of SFT is that it essentially does imitation learning, which cannot fully understand what are the expected behaviors. To address this issue, we propose an improved alignment approach named FIGA. Different from prior methods, we incorporate fine-grained (i.e., token or phrase level) quality signals that are derived by contrasting good and bad responses. Our approach has made two major contributions. Firstly, we curate a refined alignment dataset that pairs initial responses and the corresponding…
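
As a rough illustration of deriving token-level signals by contrasting a weaker response with a better one, the sketch below diffs two whitespace-tokenized strings using Python's standard `difflib`. The helper name `contrast_tokens`, the whitespace tokenization, and the example sentences are assumptions made for illustration; the abstract does not specify the procedure the authors use.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def contrast_tokens(initial: str, revised: str) -> Tuple[List[str], List[str]]:
    """Return (removed, added) tokens when going from `initial` to `revised`.

    Tokens only in the initial response are candidates for "bad" signals;
    tokens only in the revised response are candidates for "good" signals.
    Whitespace splitting is a simplification of real tokenization.
    """
    a = initial.split()
    b = revised.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    removed: List[str] = []
    added: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(a[i1:i2])  # present only in the initial response
        if tag in ("replace", "insert"):
            added.extend(b[j1:j2])  # present only in the revised response
    return removed, added


# Hypothetical example pair.
removed_tokens, added_tokens = contrast_tokens(
    "the answer is wrong", "the answer is right"
)
```

Tokens shared by both responses carry no contrastive signal in this sketch, which is why only the differing spans are returned.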

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. This paper is well-motivated, makes a timely contribution to the distribution shift problem in the Human-LLM Alignment field, and contributes a useful LLM-in-the-loop method to create a higher-quality dataset better for model fine-tuning.
2. The experiment results are comprehensive and solid enough to support the major claims of the paper.

Weaknesses

1. **Narrowed Application Range**: It seems the proposed dataset creation method is strongly biased by the rollout model checkpoint and the reward model used for identifying low-quality outputs. As each rollout model and reward model combination may have different failure modes, it is not clear if the practitioners want to switch to other rollout models and other reward models, how much this dataset can still be helpful. It can be an unreasonably large consumption of computation power if we need

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

- The paper is well written and easy to follow.
- The motivation of the paper is timely and important.
- Many in-depth analyses of the constructed dataset and different experiment settings.

Weaknesses

- The paper needs more comparison with works that do not leverage reward models (e.g. DPO [1]), to demonstrate the advantage over recent RLless methods
- The paper needs to specify what each performance metric is for the benchmark datasets.
- In Table 2, performance improvement is marginal compared to the best performing baseline method. Also, multiplying the Reward metric by 10 while leaving Vicuna, WizardLM at the original scale is misleading.
- In Figure 3, the win rate of FIGA is below 50% m

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 3

Strengths

--interesting, novel method for generating pairs of responses with localized improvements for fine-grained feedback
--i can definitely see the main intuition (distilling the qualities of a strong response into a more surface-level-similar response) as potentially inspiring future work

Weaknesses

--how do you collect the human-preferred response $Y$? (i.e., the one that you feed into ChatGPT together with $\hat{Y}$ to get $\tilde{Y}$). based on Fig1, i assume this is not generated by the same model as used to get $\hat{Y}$, but either provided with the dataset or generated by ChatGPT in a distillation-like setting? but since you have a threshold $\eta_2$ for filtering the quality of $Y$ anyway, i wonder if you could generate $Y$ using your model as well-- I think this would really streng

Code & Models

Repositories

rucaibox/figa
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Shrink and Fine-Tune