Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham, Simeng Sun, Mohit Iyyer

TL;DR
This paper introduces Suri, a dataset and methods for training language models to follow multiple complex constraints in long-form text generation, addressing challenges in preference data collection.
Contribution
The work presents the Suri dataset and I-ORPO, a novel alignment method, and demonstrates effective constrained long-form text generation with improved human preference ratings.
Findings
The fine-tuned models generate long-form text of roughly 5K tokens without quality loss.
Suri-I-ORPO outperforms other models in human evaluations.
Proposed methods enable better multi-constraint adherence in long-form generation.
Abstract
Existing research on instruction following largely focuses on tasks with simple instructions and short responses. In this work, we explore multi-constraint instruction following for generating long-form text. We create Suri, a dataset with 20K human-written long-form texts paired with LLM-generated backtranslated instructions that contain multiple complex constraints. Because of prohibitive challenges associated with collecting human preference judgments on long-form texts, preference-tuning algorithms such as DPO are infeasible in our setting; thus, we propose Instructional ORPO (I-ORPO), an alignment method based on the ORPO algorithm. Instead of receiving negative feedback from dispreferred responses, I-ORPO obtains negative feedback from synthetically corrupted instructions generated by an LLM. Using Suri, we perform supervised and I-ORPO fine-tuning on Mistral-7b-Instruct-v0.2. The…
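The idea of taking negative feedback from a corrupted instruction rather than from a dispreferred response can be illustrated with a minimal sketch. It assumes the ORPO-style objective (a supervised term plus a weighted log-odds-ratio term) carries over with the same response scored under the original and the corrupted instruction. The function names, the `lam` weight and its default, and the use of average per-token log-probabilities as inputs are illustrative assumptions, not the authors' implementation.

```python
import math


def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)), where p = exp(avg_logprob).

    avg_logprob is assumed to be the average per-token log-probability of
    the response, so p is a length-normalized likelihood in (0, 1).
    """
    if avg_logprob >= 0.0:
        raise ValueError("avg_logprob must be negative so that p < 1")
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def i_orpo_loss(avg_logp_original: float,
                avg_logp_corrupted: float,
                lam: float = 0.1) -> float:
    """Sketch of an I-ORPO-style loss for one training example.

    avg_logp_original:  avg log-prob of the human-written text given the
                        original (backtranslated) instruction.
    avg_logp_corrupted: avg log-prob of the SAME text given the
                        synthetically corrupted instruction.
    lam:                hypothetical weight on the odds-ratio term.
    """
    # Supervised term: negative log-likelihood under the original instruction.
    nll = -avg_logp_original
    # Odds-ratio term: prefer the text under the original instruction.
    log_odds_ratio = log_odds(avg_logp_original) - log_odds(avg_logp_corrupted)
    # -log(sigmoid(z)) equals softplus(-z).
    odds_ratio_term = softplus(-log_odds_ratio)
    return nll + lam * odds_ratio_term
```

Under this sketch, the loss shrinks as the text becomes less likely under the corrupted instruction than under the original one, which is the signal a preference pair would otherwise supply.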
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Direct Preference Optimization · Balanced Selection · Shrink and Fine-Tune
