LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints

Thomas Palmeira Ferraz; Kartik Mehta; Yu-Hsiang Lin; Haw-Shiuan Chang; Shereen Oraby; Sijia Liu; Vivek Subramanian; Tagyoung Chung; Mohit Bansal; Nanyun Peng

arXiv:2410.06458·cs.CL·October 10, 2024

TL;DR

This paper introduces RealInstruct, a benchmark for evaluating LLMs on real-world multi-constraint instructions, and proposes DeCRIM, a self-correction pipeline that significantly improves model performance in following complex instructions.

Contribution

The paper presents the first real-world multi-constraint instruction benchmark and a novel DeCRIM self-correction method that enhances LLMs' ability to follow complex constraints.
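
The summary above names the three stages of DeCRIM (decompose, critique, refine) but does not describe their internals. The following is a minimal sketch of one plausible reading of that loop, not the authors' implementation. The function `self_correct`, its parameters, the `max_rounds` cap, and all `toy_*` helpers are hypothetical names introduced for illustration; in practice each callable would presumably wrap an LLM call.

```python
from typing import Callable, List


def self_correct(
    instruction: str,
    generate: Callable[[str], str],
    decompose: Callable[[str], List[str]],
    critique: Callable[[str, str], bool],
    refine: Callable[[str, str, List[str]], str],
    max_rounds: int = 3,  # assumption: a fixed cap on refinement rounds
) -> str:
    """Decompose-critique-refine loop (illustrative sketch only)."""
    response = generate(instruction)
    # Decompose: split the instruction into individual constraints.
    constraints = decompose(instruction)
    for _ in range(max_rounds):
        # Critique: collect the constraints the current response violates.
        unmet = [c for c in constraints if not critique(response, c)]
        if not unmet:
            break
        # Refine: rewrite the response using the list of unmet constraints.
        response = refine(instruction, response, unmet)
    return response


# Toy stand-ins so the sketch runs without any model (all hypothetical).
def toy_generate(instruction: str) -> str:
    return "Mondays, am I right? #mood"


def toy_decompose(instruction: str) -> List[str]:
    return ["no hashtag"]


def toy_critique(response: str, constraint: str) -> bool:
    # Returns True when the single toy constraint is satisfied.
    return "#" not in response


def toy_refine(instruction: str, response: str, unmet: List[str]) -> str:
    # Drops every whitespace-separated token that starts with '#'.
    return " ".join(w for w in response.split() if not w.startswith("#"))


result = self_correct(
    "Write a funny social media post with no hashtag",
    toy_generate, toy_decompose, toy_critique, toy_refine,
)
print(result)
```

Passing the stages in as callables keeps the loop independent of any particular model or feedback source, which matches the summary's distinction between weaker and stronger feedback without committing to how either is produced.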

Findings

01. GPT-4 fails to meet at least one constraint in over 21% of instructions.

02. DeCRIM improves Mistral's performance by 7.3% on RealInstruct.

03. With strong feedback, open-source LLMs with DeCRIM can outperform GPT-4.
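
The first finding is an instruction-level statistic: an instruction counts as failed if any one of its constraints is unmet. A small sketch of that calculation follows, assuming per-constraint pass/fail judgments are already available. The function name `instruction_failure_rate` and the sample data are made up for illustration and are not taken from the paper.

```python
from typing import List


def instruction_failure_rate(results: List[List[bool]]) -> float:
    """Fraction of instructions with at least one unmet constraint.

    `results[i][j]` is True when constraint j of instruction i was satisfied.
    """
    if not results:
        raise ValueError("results must contain at least one instruction")
    failed = sum(1 for flags in results if not all(flags))
    return failed / len(results)


# Made-up judgments for four instructions (not data from the paper).
sample = [
    [True, True],        # all constraints met
    [True, False],       # one constraint missed, so the instruction fails
    [True],              # single constraint, met
    [True, True, True],  # all constraints met
]
rate = instruction_failure_rate(sample)
print(f"{rate:.0%} of instructions miss at least one constraint")
```

Because a single miss fails the whole instruction, this rate can be noticeably higher than the share of individual constraints that are violated.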

Abstract

Instruction following is a key capability for LLMs. However, recent studies have shown that LLMs often struggle with instructions containing multiple constraints (e.g. a request to create a social media post "in a funny tone" with "no hashtag"). Despite this, most evaluations focus solely on synthetic data. To address this, we introduce RealInstruct, the first benchmark designed to evaluate LLMs' ability to follow real-world multi-constrained instructions by leveraging queries real users asked AI assistants. We also investigate model-based evaluation as a cost-effective alternative to human annotation for this task. Our findings reveal that even the proprietary GPT-4 model fails to meet at least one constraint on over 21% of instructions, highlighting the limitations of state-of-the-art models. To address the performance gap between open-source and proprietary models, we propose the…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing · Natural Language Processing Techniques

Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings