CoReflect: Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement
Yunzhe Li, Richie Yueqi Feng, Tianxin Wei, Chin-Chia Hsu

TL;DR
CoReflect introduces an adaptive, iterative framework for evaluating conversational AI, combining simulation and rubric refinement to better capture diverse dialogue behaviors with minimal human input.
Contribution
It presents a novel co-evolutionary process that jointly improves dialogue simulation and evaluation rubrics, enabling scalable and self-refining conversational system assessment.
Findings
Automates rubric refinement through dialogue analysis.
Enhances evaluation coverage of diverse conversational behaviors.
Reduces human effort in the evaluation process.
Abstract
Evaluating conversational systems in multi-turn settings remains a fundamental challenge. Conventional pipelines typically rely on manually defined rubrics and fixed conversational contexts, a static approach that limits coverage and fails to capture the diverse, emergent behaviors of dialogue models. To address this, we introduce CoReflect (Conversational Evaluation via Co-Evolutionary Simulation and Reflective Rubric Refinement), which unifies dialogue simulation and evaluation into an adaptive, iterative process. CoReflect employs a conversation planner that generates structured templates to guide a user simulator through diverse, goal-directed dialogues. Subsequently, a reflective analyzer processes these dialogues to identify systematic behavioral patterns and automatically refine the evaluation rubrics. Crucially, the insights from the conversation analysis are fed back into the…
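The plan, simulate, analyze, refine cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`State`, `plan_templates`, `simulate_dialogue`, `analyze`, `coreflect_loop`, `toy_system`) is hypothetical, the components are deterministic stand-ins for what would presumably be model-driven modules, and the analyzer's rule (flagging empty replies) is a toy placeholder. The abstract is truncated before it says where the analysis insights are fed back; the sketch assumes they go to the planner, which is only a guess based on the "co-evolutionary" framing.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    rubric: list                                  # current evaluation criteria
    focus: list = field(default_factory=list)     # hints passed to the planner (assumed)


def plan_templates(state):
    """Hypothetical planner: one template per goal, carrying any focus hints."""
    goals = ["book a flight", "cancel an order"]  # illustrative goals only
    return [{"goal": g, "probe": list(state.focus)} for g in goals]


def simulate_dialogue(template, system):
    """Hypothetical user simulator: one user turn for the goal and one per probe."""
    user_turns = [template["goal"]] + template["probe"]
    return [(turn, system(turn)) for turn in user_turns]


def analyze(dialogues, rubric):
    """Toy reflective analyzer: proposes a criterion when a reply is empty
    and the rubric does not yet cover that behavior."""
    criterion = "handles unanswerable requests"
    new_criteria = []
    for dialogue in dialogues:
        for _user_turn, reply in dialogue:
            if not reply.strip() and criterion not in rubric + new_criteria:
                new_criteria.append(criterion)
    return new_criteria


def coreflect_loop(system, rounds=3):
    """Iterate until the analyzer proposes nothing new or rounds run out."""
    state = State(rubric=["stays on task"])
    for _ in range(rounds):
        templates = plan_templates(state)
        dialogues = [simulate_dialogue(t, system) for t in templates]
        new_criteria = analyze(dialogues, state.rubric)
        if not new_criteria:
            break                                  # rubric is stable
        state.rubric.extend(new_criteria)          # refine the rubric
        state.focus.extend(new_criteria)           # assumed feedback to the planner
    return state


def toy_system(turn):
    """Stand-in dialogue system that fails silently on cancellations."""
    return "" if "cancel" in turn else "Sure, I can help with: " + turn


final_state = coreflect_loop(toy_system)
print(final_state.rubric)
```

In this toy run the first round surfaces the silent failure, the rubric gains one criterion, and the second round finds nothing new, so the loop stops early. A real system would replace each stub with a learned or prompted component and a richer stopping rule.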
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
