A Framework for Fine-Tuning LLMs using Heterogeneous Feedback

Ryan Aponte (1), Ryan A. Rossi (2), Shunan Guo (2), Franck Dernoncourt (2), Tong Yu (2), Xiang Chen (2), Subrata Mitra (2), Nedim Lipka (2) ((1) Carnegie Mellon University, (2) Adobe Research)

arXiv:2408.02861·cs.CL·August 7, 2024

TL;DR

This paper introduces a framework for fine-tuning large language models using diverse feedback types, unifying data formats and selecting high-quality subsets to enhance performance across multiple tasks.

Contribution

The authors propose a novel framework that combines heterogeneous feedback into a unified format and extracts high-quality subsets, improving LLM fine-tuning effectiveness.

Findings

1. Unified feedback data improves fine-tuning efficiency.
2. High-quality subset selection enhances model performance.
3. The framework benefits multiple areas, such as instruction following and bias reduction.

Abstract

Large language models (LLMs) have been applied to a wide range of tasks, including text summarization, web navigation, and chatbots. They have benefitted from supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) following unsupervised pretraining. The datasets used for these methods can be difficult to collect, limited in scope, and variable in sample quality. Additionally, datasets can vary extensively in supervision format, from numerical to binary to multi-dimensional with many different values. We present a framework for fine-tuning LLMs using heterogeneous feedback, which has two main components. First, we combine the heterogeneous feedback data into a single supervision format, compatible with methods like SFT and RLHF. Next, given this unified feedback dataset, we extract a high-quality and diverse subset to obtain performance increases potentially exceeding the…
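The two components described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: it assumes the unified supervision format is a pairwise preference (prompt, chosen, rejected), and it stands in a simple margin threshold for "high-quality" and a round-robin over prompts for "diverse". All names (`PreferencePair`, `from_numeric`, `from_binary`, `from_multidim`, `select_subset`) and the 1 to 5 score scale are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PreferencePair:
    """Assumed unified format: one response preferred over another."""
    prompt: str
    chosen: str
    rejected: str
    margin: float  # strength of the preference, normalized to [0, 1]


def from_numeric(prompt, scored, scale=(1.0, 5.0)):
    """Numerical feedback: scored is a list of (response, score)."""
    lo, hi = scale
    pairs = []
    for (ra, sa), (rb, sb) in combinations(scored, 2):
        if sa == sb:
            continue  # ties carry no preference signal
        chosen, rejected = (ra, rb) if sa > sb else (rb, ra)
        pairs.append(PreferencePair(prompt, chosen, rejected, abs(sa - sb) / (hi - lo)))
    return pairs


def from_binary(prompt, labeled):
    """Binary feedback: labeled is a list of (response, is_good)."""
    good = [r for r, ok in labeled if ok]
    bad = [r for r, ok in labeled if not ok]
    return [PreferencePair(prompt, g, b, 1.0) for g in good for b in bad]


def from_multidim(prompt, rated, weights, scale=(1.0, 5.0)):
    """Multi-dimensional feedback: collapse each rating dict to a weighted mean."""
    total = sum(weights.values())
    scored = [(r, sum(weights[d] * s[d] for d in weights) / total) for r, s in rated]
    return from_numeric(prompt, scored, scale)


def select_subset(pairs, k, min_margin=0.25):
    """Keep pairs with margin >= min_margin, then take up to k of them,
    cycling over prompts so that no single prompt dominates the subset."""
    by_prompt = {}
    for p in sorted(pairs, key=lambda p: p.margin, reverse=True):
        if p.margin >= min_margin:
            by_prompt.setdefault(p.prompt, []).append(p)
    selected = []
    queues = list(by_prompt.values())
    while queues and len(selected) < k:
        next_round = []
        for q in queues:
            if len(selected) == k:
                break
            selected.append(q.pop(0))  # strongest remaining pair for this prompt
            if q:
                next_round.append(q)
        queues = next_round
    return selected
```

A pairwise format is used here because it can feed preference-based training directly, and the `chosen` responses alone can serve as SFT targets. A real pipeline would likely replace the margin threshold and the per-prompt round-robin with learned quality scores and embedding-based diversity measures.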

Taxonomy

Topics: Iterative Learning Control Systems · Advanced Control Systems Design · Control Systems in Engineering

Methods: Shrink and Fine-Tune