Semantic Grounding Index: Geometric Bounds on Context Engagement in RAG Systems
Javier Marín

TL;DR
The paper introduces the Semantic Grounding Index (SGI), a geometric measure in embedding space that detects when RAG system responses are semantically lazy or hallucinated, with strong empirical validation and theoretical backing.
Contribution
It proposes the SGI metric based on angular distances, providing a new, theoretically grounded tool for assessing response engagement in RAG systems.
Findings
SGI effectively detects semantic laziness in RAG responses.
SGI's discriminative power increases with question-context angular separation.
SGI scores correlate with response length and question brevity.
Abstract
When retrieval-augmented generation (RAG) systems hallucinate, what geometric trace does this leave in embedding space? We introduce the Semantic Grounding Index (SGI), defined as the ratio of angular distances from the response to the question versus the context on the unit hypersphere. Our central finding is semantic laziness: hallucinated responses remain angularly proximate to questions rather than departing toward retrieved contexts. On HaluEval (n=5,000), we observe large effect sizes (Cohen's d ranging from 0.92 to 1.28) across five embedding models with mean cross-model correlation r=0.85. Crucially, we derive from the spherical triangle inequality that SGI's discriminative power should increase with question-context angular separation, a theoretical prediction confirmed empirically: effect size rises monotonically from d=0.61 -low…
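The ratio described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `angular_distance` and `sgi`, the `eps` guard, and the toy 2-D vectors are all assumptions, and the orientation of the ratio (response-to-question angle divided by response-to-context angle) is inferred from the abstract's wording. Under that reading, a response that stays close to the question yields a low score.

```python
import numpy as np


def angular_distance(u, v):
    """Angle in radians between two vectors, i.e. their geodesic
    distance after projection onto the unit hypersphere."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against floating-point drift outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def sgi(question, context, response, eps=1e-12):
    """Assumed form: theta(response, question) / theta(response, context).
    `eps` is a hypothetical guard against division by zero."""
    theta_rq = angular_distance(response, question)
    theta_rc = angular_distance(response, context)
    return theta_rq / (theta_rc + eps)


# Toy 2-D "embeddings" (hypothetical): question and context are 90 degrees apart.
q = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])
lazy = np.array([np.cos(np.radians(10)), np.sin(np.radians(10))])      # near the question
grounded = np.array([np.cos(np.radians(80)), np.sin(np.radians(80))])  # near the context

print(sgi(q, c, lazy), sgi(q, c, grounded))
```

In this toy setup the "lazy" response sits 10 degrees from the question and 80 degrees from the context, so its score falls well below 1, while the "grounded" response scores well above 1. Because angular distance obeys the triangle inequality, the two angles can differ by at most the question-context angle, which is consistent with the abstract's claim that separation between question and context limits how discriminative the ratio can be.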
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Deception detection and forensic psychology
