FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models

Yuxin Jiang; Yufei Wang; Xingshan Zeng; Wanjun Zhong; Liangyou Li; Fei Mi; Lifeng Shang; Xin Jiang; Qun Liu; Wei Wang

arXiv:2310.20410·cs.CL·June 6, 2024·1 cites

TL;DR

FollowBench is a benchmark that evaluates how well large language models follow multi-level, fine-grained constraints across five constraint types (Content, Situation, Style, Format, and Example), exposing current limitations and pointing to directions for improvement.

Contribution

This paper introduces FollowBench, a novel multi-level, fine-grained constraints benchmark for LLMs, with a new mechanism to assess incremental constraint adherence.

Findings

01. LLMs struggle with complex, multi-level constraints
02. Current models show weaknesses in following diverse constraint types
03. Benchmark provides a new standard for evaluating constraint-following ability

Abstract

The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response follows constraints stated in the instruction. To fill this research gap, in this paper, we propose FollowBench, a Multi-level Fine-grained Constraints Following Benchmark for LLMs. FollowBench comprehensively includes five different types (i.e., Content, Situation, Style, Format, and Example) of fine-grained constraints. To enable a precise constraint following estimation on diverse difficulties, we introduce a Multi-level mechanism that incrementally adds a single constraint to the initial instruction at each increased level. To assess whether LLMs' outputs have satisfied every individual constraint, we propose to prompt strong LLMs with…
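The multi-level mechanism described in the abstract can be illustrated with a minimal sketch. It assumes that each level is formed by appending one more constraint to the previous level's instruction. The function name `build_levels`, the example instruction, and the example constraints are hypothetical and are not taken from the benchmark, which may phrase its leveled instructions differently.

```python
from typing import List


def build_levels(initial_instruction: str, constraints: List[str]) -> List[str]:
    """Return one instruction per level; level k carries the first k constraints.

    Assumption: constraints are added by plain concatenation. This is an
    illustrative simplification, not the benchmark's actual construction.
    """
    levels = []
    current = initial_instruction
    for constraint in constraints:
        # Each level adds exactly one constraint on top of the previous level.
        current = f"{current} {constraint}"
        levels.append(current)
    return levels


# Hypothetical example instruction and constraints.
levels = build_levels(
    "Write a short product description for a water bottle.",
    [
        "Use at most 50 words.",
        "Write in a formal tone.",
        "Format the answer as a bulleted list.",
    ],
)

for level, instruction in enumerate(levels, start=1):
    print(f"Level {level}: {instruction}")
```

Because every level contains all constraints of the levels below it, a model's responses can be compared level by level to see at which point constraint following starts to break down.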

Code & Models

Repositories

yjiangcm/followbench
PyTorch · Official

Datasets

YuxinJiang/FollowBench
dataset · 249 downloads

Videos

FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Focus