TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities

Victoria Graf; Valentina Pyatkin; Nouha Dziri; Nathan Lambert; Hannaneh Hajishirzi

arXiv:2603.16759 · cs.CL · March 18, 2026

TL;DR

This paper introduces TurnWise, which pairs a benchmark (TurnWiseEval) with a synthetic data pipeline (TurnWiseData) to evaluate and improve multi-turn language model capabilities. It shows that training with multi-turn data improves multi-turn chat performance by 12%.

Contribution

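The paper presents TurnWiseEval, a multi-turn benchmark that is directly comparable to single-turn chat evaluation, and TurnWiseData, a pipeline for scalable generation of synthetic multi-turn training data. Together they address the single-turn focus of current open training and evaluation data.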

Findings

01 Training with multi-turn data improves multi-turn chat performance by 12%.

02 As few as 10k multi-turn conversations are enough to boost multi-turn capabilities.

03 Multi-turn training is essential for strong multi-turn conversational skills.

Abstract

Multi-turn conversations are a common and critical mode of language model interaction. However, current open training and evaluation data focus on single-turn settings, failing to capture the additional dimension of these longer interactions. To understand this multi-/single-turn gap, we first introduce a new benchmark, TurnWiseEval, for multi-turn capabilities that is directly comparable to single-turn chat evaluation. Our evaluation isolates multi-turn specific conversational ability through pairwise comparison to equivalent single-turn settings. We additionally introduce our synthetic multi-turn data pipeline TurnWiseData which allows the scalable generation of multi-turn training data. Our experiments with Olmo 3 show that training with multi-turn data is vital to achieving strong multi-turn chat performance, and that including as little as 10k multi-turn conversations during…
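
The abstract describes a pairwise comparison between multi-turn responses and their equivalent single-turn settings, but does not say how those comparisons are scored. The following is a minimal sketch of one common way to aggregate pairwise judgments into a win rate. The function name, the label strings ("multi", "single", "tie"), and the tie-counts-as-half convention are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

# Hypothetical labels for a judge's verdict on one pairwise comparison:
#   "multi"  -> the multi-turn response was preferred
#   "single" -> the equivalent single-turn response was preferred
#   "tie"    -> no preference
VALID_LABELS = {"multi", "single", "tie"}


def pairwise_win_rate(judgments):
    """Return the multi-turn win rate, counting each tie as half a win.

    This is an illustrative aggregation only; the paper's actual
    scoring rule is not given in the abstract.
    """
    unknown = set(judgments) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not judgments:
        raise ValueError("at least one judgment is required")

    counts = Counter(judgments)
    wins = counts["multi"] + 0.5 * counts["tie"]
    return wins / len(judgments)


if __name__ == "__main__":
    example = ["multi", "single", "tie", "multi"]
    print(pairwise_win_rate(example))
```

Under this convention, a win rate of 0.5 would mean the multi-turn responses are judged on par with their single-turn equivalents, and values below 0.5 would indicate a multi-turn gap.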

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques