GSAR: Typed Grounding for Hallucination Detection and Recovery in Multi-Agent LLMs
Federico A. Kamelhar

TL;DR
GSAR is a framework for evaluating and improving how well claims in multi-agent LLM systems are grounded in evidence: it categorizes claims, weights evidence by type, and guides recovery decisions under compute constraints.
Contribution
GSAR is presented as the first groundedness framework to combine evidence-typed scoring with tiered recovery and explicit compute budgeting for multi-agent LLMs.
Findings
GSAR achieves consistent evaluation across multiple LLM judges.
Ablation studies confirm the importance of complementary evidence in grounding.
GSAR outperforms existing methods like Vectara HHEM-2.1-Open in grounding accuracy.
Abstract
Autonomous multi-agent LLM systems are increasingly deployed to investigate operational incidents and produce structured diagnostic reports. Their trustworthiness hinges on whether each claim is grounded in observed evidence rather than model-internal inference. Existing groundedness evaluators (binary classifiers, LLM-as-judge scalars, self-correction loops) treat supporting evidence as interchangeable and emit a single signal that offers no principled control over downstream action. We present GSAR, a grounding-evaluation and replanning framework that (i) partitions claims into a four-way typology (grounded, ungrounded, contradicted, complementary), giving first-class standing to non-redundant alternative perspectives; (ii) assigns evidence-type-specific weights reflecting epistemic strength; (iii) computes an asymmetric contradiction-penalised weighted groundedness score; and (iv)…
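The abstract is cut off before it gives the scoring formula, so the following is only a minimal sketch of what a contradiction-penalised, evidence-weighted groundedness score could look like. The evidence types, their weights, the unit weight for claims with no evidence, the `contradiction_penalty` parameter, and the ratio itself are all assumptions made for illustration. None of them are taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Label(Enum):
    """The four-way claim typology named in the abstract."""
    GROUNDED = "grounded"
    UNGROUNDED = "ungrounded"
    CONTRADICTED = "contradicted"
    COMPLEMENTARY = "complementary"


# Hypothetical evidence types and weights (assumption: the abstract lists none).
EVIDENCE_WEIGHTS = {"tool_output": 1.0, "log_excerpt": 0.8, "model_inference": 0.3}
NO_EVIDENCE_WEIGHT = 1.0  # assumed weight for claims with no evidence attached


@dataclass(frozen=True)
class Claim:
    text: str
    label: Label
    evidence_type: Optional[str] = None  # None means no evidence was found


def groundedness_score(claims, contradiction_penalty=2.0):
    """Weighted share of supported claims; contradictions count extra.

    Illustrative formula only: grounded and complementary claims add their
    weight to the numerator and denominator, ungrounded claims add their
    weight to the denominator, and contradicted claims add their weight
    times `contradiction_penalty` to the denominator.
    """
    supported = 0.0
    total = 0.0
    for claim in claims:
        weight = EVIDENCE_WEIGHTS.get(claim.evidence_type, NO_EVIDENCE_WEIGHT)
        if claim.label in (Label.GROUNDED, Label.COMPLEMENTARY):
            supported += weight
            total += weight
        elif claim.label is Label.CONTRADICTED:
            total += contradiction_penalty * weight
        else:  # Label.UNGROUNDED
            total += weight
    return supported / total if total else 0.0


report = [
    Claim("Disk filled at 02:14", Label.GROUNDED, "tool_output"),
    Claim("Retry storm also raised latency", Label.COMPLEMENTARY, "log_excerpt"),
    Claim("No deploy happened that night", Label.CONTRADICTED, "tool_output"),
    Claim("Root cause was a kernel bug", Label.UNGROUNDED),
]
score = groundedness_score(report)
```

For the hypothetical report above, the supported weight is 1.8 and the penalised total is 4.8, so the score is about 0.375. With `contradiction_penalty=1.0` the contradicted claim would cost no more than an ungrounded one, which is the symmetry that an asymmetric penalty is meant to break.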
