StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following

Jinnan Li; Jinzhe Li; Yue Wang; Yi Chang; Yuan Wu

arXiv:2502.14494 · cs.CL · June 2, 2025

TL;DR

StructFlowBench is a new benchmark for evaluating multi-turn instruction following in language models. It emphasizes the structural dependencies between dialogue turns and shows that current models still struggle to understand these structures.

Contribution

This work presents the first benchmark focused on structural flow modeling in multi-turn dialogues. It is built on a framework of six inter-turn relationships that supports both model evaluation and customized dialogue-flow generation.
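
To make the idea of a "structural flow" concrete, here is a minimal sketch that represents a dialogue as a list of turns, each tagged with its relationship to an earlier turn, and checks that the flow is well-formed. It is an illustration under stated assumptions, not the authors' code: the six relation names are recalled from the paper and are not listed on this page, so treat them as an assumption. `Relation`, `Turn`, `validate_flow`, and the validation rules themselves are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Relation(Enum):
    # ASSUMPTION: relation names recalled from the paper; verify against it.
    FOLLOW_UP = "follow-up"
    REFINEMENT = "refinement"
    RECALL = "recall"
    SUMMARY = "summary"
    EXPANSION = "expansion"
    UNRELATEDNESS = "unrelatedness"


@dataclass
class Turn:
    prompt: str
    relation: Optional[Relation] = None  # None only for the opening turn
    target: Optional[int] = None  # index of the earlier turn this one builds on


def validate_flow(turns: List[Turn]) -> List[str]:
    """Return structural problems in a dialogue flow (empty list if well-formed).

    Hypothetical rules: the opening turn has no relation, every later turn
    has one, and every relation except UNRELATEDNESS points at an earlier turn.
    """
    problems: List[str] = []
    for i, turn in enumerate(turns):
        if i == 0:
            if turn.relation is not None:
                problems.append("turn 0: opening turn cannot have a relation")
            continue
        if turn.relation is None:
            problems.append(f"turn {i}: missing relation")
        elif turn.relation is Relation.UNRELATEDNESS:
            if turn.target is not None:
                problems.append(f"turn {i}: unrelated turn should not have a target")
        elif turn.target is None or not 0 <= turn.target < i:
            problems.append(f"turn {i}: target {turn.target} is not an earlier turn")
    return problems


# Example flow: an opening request, a refinement of it, then a topic switch.
flow = [
    Turn("Draft a product announcement."),
    Turn("Make it more formal.", Relation.REFINEMENT, target=0),
    Turn("What is a haiku?", Relation.UNRELATEDNESS),
]
print(validate_flow(flow))  # []
```

Keeping the relation on each turn, rather than in a separate graph, makes it straightforward to walk a flow template in order and generate or evaluate one turn at a time.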

Findings

1. Current models show significant gaps in understanding dialogue structures.
2. The benchmark enables tailored dialogue flow generation for specific scenarios.
3. Systematic evaluation of 13 LLMs highlights deficiencies in multi-turn comprehension.

Abstract

Multi-turn instruction-following capability constitutes a core competency of large language models (LLMs) in real-world applications. Existing evaluation benchmarks predominantly focus on fine-grained constraint satisfaction and domain-specific capability assessment, yet overlook the crucial structural dependencies between dialogue turns that distinguish multi-turn from single-turn interactions. These structural dependencies not only reflect user intent but also establish an essential second dimension for instruction-following evaluation beyond constraint satisfaction. To address this gap, we propose StructFlowBench, a multi-turn instruction-following benchmark with structural flow modeling. The benchmark defines an innovative structural flow framework with six fundamental inter-turn relationships. These relationships introduce novel structural constraints for model evaluation and…
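
The abstract treats structural constraints as a second evaluation dimension alongside ordinary constraint satisfaction. The sketch below shows one simple way to report the two dimensions separately: a satisfaction rate for each dimension, computed from per-check judgments. It is a hypothetical illustration and not the paper's scoring method. The function name `satisfaction_by_dimension`, the dimension labels `"constraint"` and `"structural"`, and the example judgments are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def satisfaction_by_dimension(checks: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of satisfied checks per evaluation dimension.

    Each check is a (dimension, satisfied) pair, e.g. ("constraint", True)
    for a fine-grained constraint or ("structural", False) for an
    inter-turn one. Dimension labels are hypothetical.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for dimension, satisfied in checks:
        totals[dimension] += 1
        hits[dimension] += int(satisfied)
    return {dim: hits[dim] / totals[dim] for dim in totals}


# Hypothetical judgments for one multi-turn dialogue.
checks = [
    ("constraint", True),   # e.g. a length limit was met
    ("constraint", False),  # e.g. a required format was missed
    ("structural", True),   # e.g. a refinement kept the earlier topic
    ("structural", True),
]
print(satisfaction_by_dimension(checks))  # {'constraint': 0.5, 'structural': 1.0}
```

Reporting the dimensions separately, rather than as one blended score, makes it visible when a model follows individual constraints well but loses track of how turns relate to each other.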

Code & Models

Repositories

mlgroupjlu/structflowbench (PyTorch, official)

Datasets

Jinnan/StructFlowBench (dataset, 34 downloads)

Videos

StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following (Underline)

Taxonomy

Topics: Advanced Data Storage Technologies

Methods: Focus