Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen

TL;DR
This paper introduces FIGA, an improved alignment method for large language models that uses fine-grained quality signals from contrasting responses, enhancing alignment beyond imitation learning.
Contribution
The paper presents a new dataset and a novel loss function that incorporate token-level quality signals for better alignment of LLMs.
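One way to picture a loss that uses token-level quality signals is a per-token weighted negative log-likelihood. The sketch below is illustrative only: this summary does not state the paper's actual loss, so the function name `token_weighted_nll`, the `weights` argument, and the numbers are all assumptions made for the example.

```python
import math


def token_weighted_nll(log_probs, weights):
    """Weighted negative log-likelihood over one response.

    log_probs: per-token log-probabilities the model assigns to the response.
    weights:   hypothetical per-token quality weights (assumption, not from
               the paper): positive encourages a token, negative discourages
               it, zero ignores it.
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Illustrative numbers only.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]

# All weights equal to 1 recovers the plain SFT-style loss.
uniform = token_weighted_nll(log_probs, [1.0, 1.0, 1.0])

# Only the second token contributes to the loss.
focused = token_weighted_nll(log_probs, [0.0, 1.0, 0.0])
```

With uniform weights every token is imitated equally; non-uniform weights let training concentrate on the tokens that a quality signal marks as important.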
Findings
FIGA outperforms baseline methods in alignment tasks.
Fine-grained signals improve the understanding of expected behaviors.
The approach enhances LLM alignment without relying solely on imitation learning.
Abstract
Alignment with human preference is a desired property of large language models (LLMs). Currently, the main alignment approach is based on reinforcement learning from human feedback (RLHF). Although RLHF is effective, it is intricate to implement and train, so recent studies explore alternative alignment approaches based on supervised fine-tuning (SFT). A major limitation of SFT is that it essentially performs imitation learning, so the model cannot fully understand what the expected behaviors are. To address this issue, we propose an improved alignment approach named FIGA. Unlike prior methods, we incorporate fine-grained (i.e., token- or phrase-level) quality signals that are derived by contrasting good and bad responses. Our approach makes two major contributions. First, we curate a refined alignment dataset that pairs initial responses and the corresponding…
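To make "contrasting good and bad responses" at the token level concrete, here is a minimal sketch that diffs an initial response against a revised one and collects the tokens that differ. It uses Python's standard `difflib` with whitespace tokenization as a stand-in; the abstract does not say which alignment procedure or tokenizer the authors use, and the function name `contrast_tokens` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def contrast_tokens(initial, revised):
    """Return (tokens only in the initial response, tokens only in the revised one).

    Whitespace tokenization and difflib alignment are assumptions made for
    illustration; they are not taken from the paper.
    """
    a, b = initial.split(), revised.split()
    changed_in_initial = []
    changed_in_revised = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Tokens replaced or deleted exist only in the initial response.
        if tag in ("replace", "delete"):
            changed_in_initial.extend(a[i1:i2])
        # Tokens replaced or inserted exist only in the revised response.
        if tag in ("replace", "insert"):
            changed_in_revised.extend(b[j1:j2])
    return changed_in_initial, changed_in_revised


bad, good = contrast_tokens(
    "The capital of France is Lyon",
    "The capital of France is Paris",
)
# bad == ["Lyon"], good == ["Paris"]
```

Tokens shared by both responses carry no contrast, so only the differing spans would receive a fine-grained signal under this reading.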
Peer Reviews
Decision: ICLR 2024 poster
1. This paper is well-motivated, makes a timely contribution to the distribution shift problem in the Human-LLM Alignment field, and contributes a useful LLM-in-the-loop method for creating a higher-quality dataset that is better suited to model fine-tuning.
2. The experimental results are comprehensive and solid enough to support the major claims of the paper.
1. **Narrowed Application Range**: The proposed dataset creation method seems strongly biased by the rollout model checkpoint and the reward model used for identifying low-quality outputs. As each rollout model and reward model combination may have different failure modes, it is unclear how helpful this dataset would remain if practitioners switch to other rollout models and reward models. It can be an unreasonably large consumption of computation power if we need
- The paper is well written and easy to follow.
- The motivation of the paper is timely and important.
- Many in-depth analyses of the constructed dataset and different experiment settings.
- The paper needs more comparison with works that do not leverage reward models (e.g., DPO [1]) to demonstrate its advantage over recent RL-free methods.
- The paper needs to specify what each performance metric is for the benchmark datasets.
- In Table 2, the performance improvement is marginal compared to the best-performing baseline method. Also, multiplying the Reward metric by 10 while leaving Vicuna and WizardLM at their original scale is misleading.
- In Figure 3, the win rate of FIGA is below 50% m
-- Interesting, novel method for generating pairs of responses with localized improvements for fine-grained feedback.
-- I can definitely see the main intuition (distilling the qualities of a strong response into a more surface-level-similar response) as potentially inspiring future work.
--how do you collect the human-preferred response $Y$? (i.e., the one that you feed into ChatGPT together with $\hat{Y}$ to get $\tilde{Y}$). based on Fig1, i assume this is not generated by the same model as used to get $\hat{Y}$, but either provided with the dataset or generated by ChatGPT in a distillation-like setting? but since you have a threshold $\eta_2$ for filtering the quality of $Y$ anyway, i wonder if you could generate $Y$ using your model as well-- I think this would really streng
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Shrink and Fine-Tune
