CoReflect: Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement

Yunzhe Li; Richie Yueqi Feng; Tianxin Wei; Chin-Chia Hsu

arXiv:2601.12208·cs.CL·January 21, 2026

Open Access

TL;DR

CoReflect introduces an adaptive, iterative framework for evaluating conversational AI, combining simulation and rubric refinement to better capture diverse dialogue behaviors with minimal human input.

Contribution

It presents a novel co-evolutionary process that jointly improves dialogue simulation and evaluation rubrics, enabling scalable and self-refining conversational system assessment.

Findings

01. Automates rubric refinement through dialogue analysis.

02. Enhances evaluation coverage of diverse conversational behaviors.

03. Reduces human effort in the evaluation process.

Abstract

Evaluating conversational systems in multi-turn settings remains a fundamental challenge. Conventional pipelines typically rely on manually defined rubrics and fixed conversational context, a static approach that limits coverage and fails to capture the diverse, emergent behaviors of dialogue models. To address this, we introduce CoReflect (Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement), which unifies dialogue simulation and evaluation into an adaptive, iterative process. CoReflect employs a conversation planner that generates structured templates to guide a user simulator through diverse, goal-directed dialogues. Subsequently, a reflective analyzer processes these dialogues to identify systematic behavioral patterns and automatically refine the evaluation rubrics. Crucially, the insights from the conversation analysis are fed back into the…
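
The iterative process described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: every name here (`plan_templates`, `simulate_dialogue`, `analyze`, `coreflect_loop`, `toy_system`) is hypothetical, the planner, simulator, and analyzer are simple rule-based stand-ins for what would presumably be model-driven components, and routing the analyzer's insights to the planner on the next round is an assumption rather than something the abstract text above states.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Rubric criteria and analyzer insights accumulated across rounds."""
    rubric: list
    insights: list = field(default_factory=list)


def plan_templates(insights):
    """Hypothetical planner: one base template plus one probe per insight."""
    templates = ["user asks a multi-step question"]
    templates += [f"probe: {insight}" for insight in insights]
    return templates


def simulate_dialogue(template, system):
    """Hypothetical user simulator: one user turn followed by the system reply."""
    user_turn = f"[{template}]"
    return [("user", user_turn), ("system", system(user_turn))]


def analyze(dialogues, rubric):
    """Hypothetical reflective analyzer (toy rule, not from the paper):
    if any system reply has fewer than three words, propose a criterion
    about reply detail unless the rubric already contains it."""
    new_criteria, new_insights = [], []
    criterion = "replies are sufficiently detailed"
    for dialogue in dialogues:
        for role, text in dialogue:
            if role == "system" and len(text.split()) < 3:
                if criterion not in rubric and criterion not in new_criteria:
                    new_criteria.append(criterion)
                    new_insights.append("terse replies")
    return new_criteria, new_insights


def coreflect_loop(system, rubric, rounds=3):
    """Plan, simulate, analyze, refine; stop early once the rubric is stable."""
    state = State(rubric=list(rubric))  # copy so the caller's list is untouched
    for _ in range(rounds):
        templates = plan_templates(state.insights)
        dialogues = [simulate_dialogue(t, system) for t in templates]
        new_criteria, new_insights = analyze(dialogues, state.rubric)
        if not new_criteria:
            break
        state.rubric.extend(new_criteria)
        state.insights.extend(new_insights)  # assumed feedback path to the planner
    return state


def toy_system(user_turn):
    """Stand-in dialogue system that always answers tersely."""
    return "Okay."


if __name__ == "__main__":
    final = coreflect_loop(toy_system, ["answers the question"])
    print(final.rubric)
    print(final.insights)
```

In this toy run the first round surfaces the terse-reply pattern and adds a criterion, the second round plans an extra probe from that insight, and the loop stops once the analyzer proposes nothing new.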

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions