Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models
Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

TL;DR
This paper investigates the trade-off between instruction following and faithfulness in language models. It proposes ReSet, a rejection sampling method that outperforms vanilla multi-task learning while using less training data.
Contribution
The paper introduces ReSet (Rejection Sampling for Continued Self-instruction Tuning), a rejection sampling approach to aligning language models that outperforms vanilla multi-task learning while using less data.
Findings
ReSet outperforms vanilla multi-task learning.
Training on a smaller amount of high-quality data yields better results.
Instruction following and faithfulness trade off against each other as training objectives.
Abstract
Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we…
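The abstract names rejection sampling as the core of ReSet but, as excerpted here, does not spell out the procedure. The following is a minimal sketch of generic rejection sampling for building a fine-tuning set, not the authors' implementation. The names `rejection_sample`, `generate`, `score_instruction`, and `score_faithfulness`, the `num_samples` and `threshold` parameters, and the choice to combine the two scores with `min` are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    score_instruction: Callable[[str, str], float],
    score_faithfulness: Callable[[str, str], float],
    num_samples: int = 4,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep, per prompt, the best sampled response if it clears a threshold.

    All callables are hypothetical stand-ins: `generate` samples one response
    from the model, and the two scorers rate a (prompt, response) pair in
    [0, 1] for instruction following and for faithfulness to the context.
    """
    kept: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Draw several candidate responses for the same prompt.
        candidates = [generate(prompt) for _ in range(num_samples)]
        # Assumption: a candidate is only as good as its weaker score,
        # so neither objective can be traded away for the other.
        scored = [
            (min(score_instruction(prompt, c), score_faithfulness(prompt, c)), c)
            for c in candidates
        ]
        best_score, best_response = max(scored, key=lambda pair: pair[0])
        # Reject the prompt entirely if even the best candidate is too weak.
        if best_score >= threshold:
            kept.append((prompt, best_response))
    return kept
```

The returned (prompt, response) pairs would then serve as data for a further round of fine-tuning. Taking the minimum of the two scores is one simple way to demand that a kept response does well on both objectives; a weighted sum or separate thresholds would be equally plausible under these assumptions.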
Taxonomy
Topics: Natural Language Processing Techniques
