TL;DR
This paper introduces MDS, a dialogue-level data selection framework that improves the quality of multi-turn dialogue datasets for instruction tuning by scoring entire conversations based on coverage, reliability, and consistency.
Contribution
The paper presents a multi-turn dialogue selection method that outperforms single-turn selectors, dialogue-level LLM scorers, and heuristic baselines, improving dataset quality for instruction tuning.
Findings
MDS outperforms strong baselines on three multi-turn benchmarks.
MDS achieves the best overall rank across reference-free and reference-based metrics.
MDS is more robust on long conversations under the same training budget.
Abstract
Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage that performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage that evaluates within-dialogue reliability through entity-grounded topic grounding and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines…
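Illustrative sketch
The bin-wise selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how dialogues are mapped to bins in the user-query trajectory space or how the local structural signals are combined into a score, so `bin_of`, `score_of`, `per_bin`, and the toy `bin` and `score` fields below are all hypothetical placeholders. The sketch only shows the general pattern of grouping dialogues into bins and keeping the highest-scoring ones from each bin, so that coverage is preserved while redundant dialogues are dropped.

```python
from collections import defaultdict


def select_per_bin(dialogues, bin_of, score_of, per_bin=1):
    """Keep the top `per_bin` dialogues from each bin, ranked by score.

    `bin_of` and `score_of` are assumed callables; the paper's actual
    binning and scoring functions are not given in the abstract.
    """
    bins = defaultdict(list)
    for dialogue in dialogues:
        bins[bin_of(dialogue)].append(dialogue)

    selected = []
    for key in sorted(bins):
        # Highest score first within the bin.
        ranked = sorted(bins[key], key=score_of, reverse=True)
        selected.extend(ranked[:per_bin])
    return selected


# Toy data: the bin ids and scores are made up for illustration only.
toy_dialogues = [
    {"id": "a", "bin": 0, "score": 0.9},
    {"id": "b", "bin": 0, "score": 0.4},
    {"id": "c", "bin": 1, "score": 0.2},
    {"id": "d", "bin": 1, "score": 0.7},
    {"id": "e", "bin": 2, "score": 0.5},
]

kept = select_per_bin(
    toy_dialogues,
    bin_of=lambda d: d["bin"],
    score_of=lambda d: d["score"],
    per_bin=1,
)
kept_ids = [d["id"] for d in kept]
```

In this toy run each of the three bins contributes its single best dialogue, so near-duplicates that share a bin compete with each other rather than crowding out dialogues from sparser regions.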
