VISTA: Verification In Sequential Turn-based Assessment
Ashley Lewis, Andrew Perrault, Eric Fosler-Lussier, Michael White

TL;DR
VISTA is a new framework for evaluating conversational AI's factual accuracy by verifying atomic claims and tracking consistency across multi-turn dialogues, improving hallucination detection.
Contribution
VISTA introduces claim-level verification and sequential assessment of factuality, addressing a limitation of existing metrics, which either evaluate isolated responses or treat unverifiable content as errors.
Findings
VISTA outperforms FACTSCORE and LLM-as-Judge in hallucination detection.
Human evaluation shows VISTA's decomposition improves annotator agreement.
VISTA reveals inconsistencies in existing dialogue factuality benchmarks.
Abstract
Hallucination--defined here as generating statements unsupported or contradicted by available evidence or conversational context--remains a major obstacle to deploying conversational AI systems in settings that demand factual reliability. Existing metrics either evaluate isolated responses or treat unverifiable content as errors, limiting their use for multi-turn dialogue. We introduce VISTA (Verification In Sequential Turn-based Assessment), a framework for evaluating conversational factuality through claim-level verification and sequential consistency tracking. VISTA decomposes each assistant turn into atomic factual claims, verifies them against trusted sources and dialogue history, and categorizes unverifiable statements (subjective, contradicted, lacking evidence, or abstaining). Across eight large language models and four dialogue factuality benchmarks (AIS, BEGIN, FAITHDIAL, and…
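The pipeline described in the abstract (decompose each turn into claims, verify them against evidence and dialogue history, categorize what cannot be verified) can be illustrated with a minimal sketch. This is not the authors' implementation. The sentence-splitting decomposer, the exact-match evidence lookup, the phrase-based rules, and all names (`Label`, `decompose`, `classify`, `assess_dialogue`) are hypothetical stand-ins chosen for illustration; a real system would presumably use a language model or retrieval for these steps.

```python
from __future__ import annotations

from enum import Enum


class Label(Enum):
    # Category names follow the abstract; VERIFIED is an assumed name
    # for claims that are supported by evidence or dialogue history.
    VERIFIED = "verified"
    SUBJECTIVE = "subjective"
    CONTRADICTED = "contradicted"
    LACKING_EVIDENCE = "lacking evidence"
    ABSTAINING = "abstaining"


def decompose(turn: str) -> list[str]:
    """Toy stand-in for claim decomposition: one claim per sentence."""
    return [s.strip().lower() for s in turn.split(".") if s.strip()]


def classify(claim: str, evidence: dict[str, bool], history: set[str]) -> Label:
    """Toy stand-in for verification.

    `evidence` maps a claim to True (supported) or False (contradicted);
    claims absent from it have no evidence either way.
    """
    if claim.startswith("i don't know"):
        return Label.ABSTAINING
    if claim.startswith("i think"):
        return Label.SUBJECTIVE
    if claim in history or evidence.get(claim) is True:
        return Label.VERIFIED
    if evidence.get(claim) is False:
        return Label.CONTRADICTED
    return Label.LACKING_EVIDENCE


def assess_dialogue(
    turns: list[str], evidence: dict[str, bool]
) -> list[list[tuple[str, Label]]]:
    """Label every claim in every assistant turn, in order."""
    history: set[str] = set()  # claims verified in earlier turns
    report = []
    for turn in turns:
        labeled = [(c, classify(c, evidence, history)) for c in decompose(turn)]
        # Carry verified claims forward so later turns can rely on them.
        history.update(c for c, label in labeled if label is Label.VERIFIED)
        report.append(labeled)
    return report


turns = [
    "Paris is the capital of France. I think it is lovely.",
    "Paris is in Spain. The city has 400 parks. I don't know its exact area.",
]
evidence = {"paris is the capital of france": True, "paris is in spain": False}

for i, labeled in enumerate(assess_dialogue(turns, evidence), start=1):
    for claim, label in labeled:
        print(f"turn {i}: {label.value:16s} {claim}")
```

Processing turns in order and keeping a running `history` is what makes the assessment sequential in this sketch: each turn is judged against what earlier turns already established, not in isolation.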
