SEQUOR: A Multi-Turn Benchmark for Realistic Constraint Following

Beatriz Canaverde; Duarte M. Alves; José Pombal; Giuseppe Attanasio; André F. T. Martins

arXiv:2605.06353·cs.CL·May 11, 2026

TL;DR

SEQUOR is a benchmark for evaluating how well models adhere to constraints in long multi-turn conversations. It shows that instruction-following accuracy degrades as conversations grow longer.

Contribution

The paper presents SEQUOR, an automatic benchmark for assessing constraint adherence in long-horizon multi-turn conversations. It consists of simulated persona-driven interactions built with constraints extracted from real-world conversations.

Findings

01. Even with a single constraint, instruction-following accuracy drops by more than 11% as the conversation grows longer.

02. Accuracy decreases by over 40% when models must follow multiple constraints simultaneously.

03. Accuracy declines by over 9% when constraints are added or replaced.

Abstract

In a conversation, a helpful assistant must reliably follow user directives, even as they refine, modify, or contradict earlier requests. Yet most instruction-following benchmarks focus on single-turn or short multi-turn scenarios, leaving open how well models handle long-horizon instruction-following tasks. To bridge this gap, we present SEQUOR, an automatic benchmark for evaluating constraint adherence in long multi-turn conversations. SEQUOR consists of simulated persona-driven interactions built with constraints extracted from real-world conversations. Our results show that even when following a single constraint, instruction-following accuracy consistently decreases as the conversation grows longer, with drops exceeding 11%. This decline becomes larger when models have to follow multiple constraints simultaneously, reducing their accuracy by over 40%. In scenarios where constraints…
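
To make the notion of accuracy decreasing with conversation length concrete, the sketch below shows one way such a trend could be computed from per-turn pass/fail checks. It is an illustrative assumption, not the SEQUOR evaluation pipeline: the `accuracy_by_turn` helper, the `(turn_index, followed)` record format, and the toy data are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_turn(records):
    """Aggregate hypothetical (turn_index, followed) pairs into {turn: accuracy}.

    `followed` is True when the response at that turn satisfied the active
    constraint. This record format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for turn, followed in records:
        totals[turn] += 1
        passed[turn] += int(followed)
    return {turn: passed[turn] / totals[turn] for turn in sorted(totals)}


# Toy data (invented): four checks at turn 1, four checks at turn 10.
records = [
    (1, True), (1, True), (1, True), (1, True),
    (10, True), (10, True), (10, True), (10, False),
]

acc = accuracy_by_turn(records)
# Absolute drop in accuracy between the earliest and latest turn.
drop = acc[1] - acc[10]
print(acc, drop)
```

Comparing the first and last turn in this way gives an absolute drop in accuracy; whether the figures quoted in the abstract are absolute or relative drops is not stated in the text above.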
