Aligning Language Models with Demonstrated Feedback

Omar Shaikh; Michelle S. Lam; Joey Hejna; Yijia Shao; Hyundong Cho; Michael S. Bernstein; Diyi Yang

arXiv:2406.00888 · cs.CL · April 22, 2025 · 1 citation

TL;DR

This paper introduces DITTO, a method that aligns language models to specific tasks using a small number of demonstrations as feedback, outperforming traditional methods in style and task alignment.

Contribution

The paper proposes DITTO, a novel online imitation learning approach that efficiently aligns LLM outputs to user demonstrations with minimal data.

Findings

1. DITTO outperforms few-shot prompting, supervised fine-tuning, and self-play methods by an average of 19% in win-rate.
2. The method effectively learns fine-grained style and task alignment across diverse domains.
3. User study confirms DITTO's superior customization capabilities for LLMs.

Abstract

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (< 10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. Concretely, DITTO operates by having an LLM generate examples that are presumed to be inferior to expert demonstrations. The…
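The comparison-data step described in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: every user demonstration is treated as preferred over every output sampled from the LLM or one of its intermediate checkpoints. The names `PreferencePair`, `build_pairs`, and `samples_by_checkpoint` are hypothetical and are not taken from the official repository; sampling from the model and the subsequent policy update are out of scope here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PreferencePair:
    """One comparison: `chosen` is presumed better than `rejected`."""
    prompt: str
    chosen: str
    rejected: str


def build_pairs(
    prompt: str,
    demonstrations: List[str],
    samples_by_checkpoint: Dict[int, List[str]],
) -> List[PreferencePair]:
    """Pair every demonstration with every model sample.

    Assumption (from the abstract only): a demonstration is preferred
    over any output of the LLM or of an intermediate checkpoint.
    """
    pairs = []
    for demo in demonstrations:
        # Iterate checkpoints in training order for a stable result.
        for step in sorted(samples_by_checkpoint):
            for sample in samples_by_checkpoint[step]:
                pairs.append(PreferencePair(prompt, demo, sample))
    return pairs


# Toy data: two demonstrations, three samples across two checkpoints.
pairs = build_pairs(
    "Write a short email declining a meeting.",
    ["demo A", "demo B"],
    {0: ["draft 0a", "draft 0b"], 1: ["draft 1a"]},
)
print(len(pairs))  # 2 demonstrations x 3 samples = 6 pairs
```

Pairs of this shape could then be passed to any preference-optimization objective. Which objective DITTO uses is not stated in the truncated abstract above, so the sketch leaves it open.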

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- The paper proposes a data-efficient training method that enables LLMs to follow expert demonstrations. The Reinforcement Learning from Human Feedback (RLHF) data can be continuously generated by simply comparing expert demonstrations with the intermediate model's responses. This approach can also be seen as a blend of Reinforcement Learning from AI Feedback (RLAIF) and RLHF, making it a reasonable and effective method.
- The authors demonstrate the performance improvements of DITTO-trained models usin

Weaknesses

- The authors did not investigate potential side effects, such as performance degradation on other benchmark datasets, after training with DITTO. Since the LLM is fine-tuned exclusively on targeted demonstrations, there’s a risk of significant performance drops in broader tasks. It is essential to preserve the LLM's original knowledge and abilities while adjusting its output to align with specific styles and preferences.
- They also overlook the computational inefficiency of iterative training in

Reviewer 02 · Rating 5 · Confidence 4

Strengths

The paper introduces DITTO, a novel method designed to guide LLMs toward specific settings for effective customization, achieving sample efficiency with fewer than 10 demonstrations. DITTO outperforms strong baselines, including SFT and GPT-4 with few-shot prompting. Additionally, a detailed user study further reinforces the reliability of DITTO.

Weaknesses

1. The static experiments in Section 4.1 are not particularly convincing. Have you considered testing additional baselines or employing other automatic evaluation methods, such as calculating sentence embedding similarity to compare styles?
2. Have you evaluated DITTO on more benchmarks or tested its generalization ability? I noticed that only three authors were used for validation or testing. Can the DITTO method generalize to tasks beyond writing?

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- DITTO introduces a new approach to user-specific alignment by using a small set of demonstrations to generate online comparison data. This is innovative and practical for settings where data collection is costly.
- The paper provides a strong theoretical justification for DITTO, grounding it in online imitation learning. The derivation explains why DITTO can outperform traditional methods like SFT in low-data scenarios.
- The paper completes various experiments, demonstrating DITTO’s effective

Weaknesses

- Limited exploration is done into how DITTO scales to broader and more diverse tasks that may require a more generalized alignment. This is seen in how the experiments primarily focus on a small number of demonstrations.
- DITTO’s approach heavily relies on the quality of user-provided demonstrations. If demonstrations are unclear or poorly constructed, the alignment could suffer. This could limit DITTO’s real-world applicability when high-quality demonstrations are not readily available.
- The

Code & Models

Repositories

SALT-NLP/demonstrated-feedback
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: ALIGN