NC-Bench: An LLM Benchmark for Evaluating Conversational Competence

Robert J. Moore; Sungeun An; Farhan Ahmed; Jay Pankaj Gala

arXiv:2601.06426·cs.CL·March 10, 2026

TL;DR

NC-Bench is a new benchmark for evaluating large language models' conversational skills based on natural conversation structure, covering basic, retrieval-augmented, and complex interaction patterns.

Contribution

The paper introduces a theory-grounded, extensible framework for assessing LLMs' conversational competence beyond traditional content-focused benchmarks.

Findings

1. Models excel at answering but struggle with repair tasks.
2. Performance varies across interaction types and complexity.
3. Complex multi-turn requests remain challenging for models.

Abstract

The Natural Conversation Benchmark (NC-Bench) introduces a new approach to evaluating the general conversational competence of large language models (LLMs). Unlike prior benchmarks that focus on the content of model behavior, NC-Bench focuses on the form and structure of natural conversation. Grounded in the IBM Natural Conversation Framework (NCF), NC-Bench comprises three distinct sets: (1) the basic set evaluates fundamental sequence management practices, such as answering inquiries, repairing responses, and closing conversational pairs; (2) the retrieval-augmented generation (RAG) set applies the same sequence management patterns as the first set but incorporates information-seeking via RAG; (3) the complex request set extends to requests involving more intricate sequence management patterns. Each set tests a model's ability to produce contextually appropriate conversational actions…

Code & Models

Datasets

ibm-research/nc-bench
dataset · 100 dl

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning