Evergreen: Efficient Claim Verification for Semantic Aggregates
Alexander W. Lee, Benjamin Han, Shayak Sen, Sam Yeom, Ugur Cetintemel, Anupam Datta

TL;DR
Evergreen verifies claims in semantic aggregates efficiently by avoiding unnecessary LLM calls, cutting cost and latency while maintaining high accuracy on real-world datasets.
Contribution
Evergreen recasts claim verification as a semantic query processing task: each claim is compiled into a declarative semantic verification query and executed, with tailored optimizations and provenance capture, on the same engine that produced the aggregate.
Findings
Achieves F1 = 1.00 with a strong LLM while reducing cost by 3.2x and latency by 4.0x.
Outperforms LLM-as-a-judge baseline in F1 at 48x lower cost and 2.3x lower latency.
Matches F1 with a weaker LLM at 63x lower cost and 4.2x lower latency.
Abstract
With recent semantic query processing engines, semantic aggregation has become a primitive operator, enabling the reduction of a relation into a natural language aggregate using an LLM. However, the resulting semantic aggregate may contain claims that are not grounded in the underlying relation. Verifying such claims is challenging: they often involve quantifiers, groupings, and comparisons over relations that far exceed LLM context windows and require a costly combination of semantic and symbolic processing. We present Evergreen, a system that recasts claim verification as a semantic query processing task with tailored optimizations and provenance capture. Evergreen compiles each claim into a declarative semantic verification query and executes it on the same engine that produced the aggregate. To reduce cost and latency, Evergreen avoids unnecessary LLM calls through…
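To make the idea of verifying a quantified claim over a relation concrete, here is a minimal Python sketch. It is not Evergreen's implementation: the abstract is cut off before it names the system's actual techniques, so the early exit shown here is only a generic illustration of skipping model calls once the verdict is decided. The function `verify_at_least`, the keyword-based `mentions_battery` predicate (a stand-in for an LLM call), and the sample reviews are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def verify_at_least(
    rows: Sequence[str],
    predicate: Callable[[str], bool],
    threshold: int,
) -> Tuple[bool, List[str], int]:
    """Check the claim "at least `threshold` rows satisfy `predicate`".

    Returns (verdict, supporting_rows, predicate_calls). The supporting
    rows act as simple provenance for a positive verdict.
    """
    supporting: List[str] = []
    calls = 0
    for i, row in enumerate(rows):
        remaining = len(rows) - i
        # Even if every remaining row matched, the claim would still fail.
        if len(supporting) + remaining < threshold:
            break
        calls += 1  # each predicate call stands in for one LLM call
        if predicate(row):
            supporting.append(row)
            # The claim is already established; later rows cannot change it.
            if len(supporting) >= threshold:
                break
    return len(supporting) >= threshold, supporting, calls


# Hypothetical relation and a keyword stand-in for a semantic predicate.
reviews = [
    "battery dies fast",
    "great screen",
    "battery life is superb",
    "battery ok",
    "love the camera",
]


def mentions_battery(review: str) -> bool:
    return "battery" in review.lower()


verdict, evidence, calls = verify_at_least(reviews, mentions_battery, 2)
print(verdict, evidence, calls)
```

In this toy run the claim "at least 2 reviews mention the battery" is confirmed after three predicate calls rather than five, and the two matching rows are returned as evidence. A real semantic predicate would be an LLM call, which is where skipped evaluations translate into lower cost and latency.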
