Suri: Multi-constraint Instruction Following for Long-form Text Generation

Chau Minh Pham; Simeng Sun; Mohit Iyyer

arXiv:2406.19371 · cs.CL · October 3, 2024

TL;DR

This paper introduces Suri, a dataset of 20K human-written long-form texts paired with multi-constraint instructions, and I-ORPO, an alignment method that avoids the need to collect human preference judgments on long-form texts.

Contribution

The work presents the Suri dataset and a new alignment method, I-ORPO, and demonstrates long-form constrained text generation that fares better in human preference evaluations.

Findings

01 Models generate ~5K tokens of long text without quality loss.

02 Suri-I-ORPO outperforms other models in human evaluations.

03 Proposed methods enable better multi-constraint adherence in long-form generation.

Abstract

Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7b-Instruct-v0.2. The…
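The idea of taking negative feedback from a corrupted instruction rather than from a dispreferred response can be sketched in a few lines. The abstract does not give the loss, so the following is only an illustration under stated assumptions: it follows the published ORPO formulation (a negative log-likelihood term plus a weighted log-odds-ratio term) and swaps the usual "chosen vs. rejected response" pair for "same gold text scored under the correct vs. the corrupted instruction". The function names, the weight `lam`, and the toy log-probabilities are hypothetical and are not taken from the paper or its repository.

```python
import math

def avg_log_prob(token_log_probs):
    """Length-normalized log-likelihood of a response (mean token log-prob)."""
    return sum(token_log_probs) / len(token_log_probs)

def log_odds(avg_lp):
    """log(p / (1 - p)) with p = exp(avg_lp); requires avg_lp < 0."""
    return avg_lp - math.log1p(-math.exp(avg_lp))

def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z))."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def i_orpo_loss(lp_given_correct, lp_given_corrupted, lam=0.1):
    """Sketch of an I-ORPO-style objective (assumed form, not the authors' code).

    lp_given_correct:   token log-probs of the gold text given the correct instruction
    lp_given_corrupted: token log-probs of the SAME gold text given the corrupted instruction
    lam:                hypothetical weight on the odds-ratio term
    """
    a_w = avg_log_prob(lp_given_correct)
    a_l = avg_log_prob(lp_given_corrupted)
    nll = -a_w                                   # supervised term on the correct pair
    ratio = log_odds(a_w) - log_odds(a_l)        # log odds ratio between the two instructions
    return nll + lam * neg_log_sigmoid(ratio)

# Toy values: the gold text is much less likely under the corrupted instruction.
separated = i_orpo_loss([-0.5, -0.4], [-2.0, -2.1])
# Toy values: the model cannot tell the two instructions apart.
confused = i_orpo_loss([-0.5, -0.4], [-0.5, -0.4])
```

In this sketch the penalty term shrinks as the model assigns the gold text higher odds under the correct instruction than under the corrupted one, so `separated` comes out lower than `confused`. In a real training loop the token log-probabilities would come from the model being fine-tuned rather than from hand-written lists.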

Code & Models

Repositories

chtmp223/suri
PyTorch · Official

Datasets

chtmp223/suri
dataset · 107 dl

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Direct Preference Optimization · Balanced Selection · Shrink and Fine-Tune