SKG-Eval: Stateful Evaluation of Multi-Turn Dialogue via Incremental Semantic Knowledge Graphs
Avijit Shil, Suman Samui

TL;DR
SKG-Eval is a structured, interpretable framework that evaluates multi-turn dialogue systems with incremental semantic knowledge graphs. It detects long-range issues such as contradiction and topic drift, and aligns more closely with human judgments.
Contribution
The paper presents SKG-Eval, a novel framework that models dialogue as an evolving semantic knowledge graph, improving long-range inconsistency detection and interpretability over existing methods.
Findings
Achieves higher correlation with human judgments across benchmarks.
Effectively detects long-range contradictions and topic drifts.
Provides explicit contradiction certificates and deterministic scores.
Abstract
Evaluating multi-turn dialogue systems remains challenging because response quality depends not only on the current prompt, but also on previously established entities, claims, and conversational commitments. Existing automatic evaluators, including LLM-as-a-judge frameworks and embedding-based metrics, largely rely on flat or turn-isolated representations, making them less effective at detecting long-range issues such as contradiction, topic drift, and entity inconsistency. To address this, we propose SKG-Eval, a quasi-deterministic and interpretable framework that models dialogue as an evolving Semantic Knowledge Graph (SKG) of entities, relations, and commitments across turns. The framework incrementally updates the graph through structured triple extraction and computes three complementary signals: (i) local relevance, measuring alignment with the current prompt and optional…
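To make the idea of an incrementally updated graph concrete, here is a minimal sketch, not the authors' implementation. It assumes triples have already been extracted upstream and are passed in by hand. The names `Triple`, `Certificate`, `DialogueGraph`, and `functional_relations` are hypothetical, as is the rule that a contradiction is a second, different object for a relation that admits only one object per subject.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Certificate:
    """Explicit record of two conflicting triples and the turns they came from."""
    earlier_turn: int
    later_turn: int
    earlier: Triple
    later: Triple


@dataclass
class DialogueGraph:
    # Assumption: relations listed here allow one object per subject.
    functional_relations: frozenset = frozenset({"lives_in", "age"})
    # Maps (subject, relation) to a list of (turn, Triple) in insertion order.
    facts: dict = field(default_factory=dict)

    def add_turn(self, turn, triples):
        """Merge one turn's triples into the graph and return any conflicts."""
        certificates = []
        for triple in triples:
            history = self.facts.setdefault((triple.subject, triple.relation), [])
            if triple.relation in self.functional_relations:
                for earlier_turn, earlier in history:
                    if earlier.obj != triple.obj:
                        certificates.append(
                            Certificate(earlier_turn, turn, earlier, triple)
                        )
            history.append((turn, triple))
        return certificates


graph = DialogueGraph()
first = graph.add_turn(1, [Triple("user", "lives_in", "Paris")])
second = graph.add_turn(2, [Triple("user", "likes", "jazz")])
# Six turns later the speaker states a different city, which conflicts with turn 1.
late = graph.add_turn(7, [Triple("user", "lives_in", "Berlin")])
```

Because the graph persists across turns, the conflict at turn 7 is traced back to turn 1 regardless of how many turns separate them, and the returned certificate names both triples. The check is a plain dictionary lookup, so the same input always yields the same result.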
