GLEAN: Grounded Lightweight Evaluation Anchors for Contamination-Aware Tabular Reasoning
Qizhi Wang

TL;DR
GLEAN is a lightweight, contamination-aware evaluation protocol for small models on tabular reasoning tasks, providing diagnostics and error attribution under hardware constraints.
Contribution
The paper introduces GLEAN, an evaluation framework that integrates contamination-aware probes, weak-supervision governance, retrieval-reasoning diagnostics, and structured error attribution for small models on tabular reasoning benchmarks.
Findings
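Using Squall gold SQL as an executable anchor (95.2% execution), GLEAN assigns a deterministic error taxonomy (L0-L4 plus L0.5 context miss).
The taxonomy reveals a stable error-mode separation: TAPEX errors skew toward grounding (L3), while TAPAS errors skew toward hallucination/abstention (L2/L0).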
Retrieval Recall@K can saturate even when end-to-end accuracy remains limited.
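The gap between retrieval recall and end-to-end accuracy can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the row-index representation of evidence, and the three toy examples are all assumptions made up for illustration.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(ranked_rows: Sequence[int], gold_rows: Set[int], k: int) -> float:
    """Fraction of gold evidence rows that appear among the top-k retrieved rows."""
    if not gold_rows:
        # Assumption: an example with no gold rows counts as fully recalled.
        return 1.0
    top_k = set(ranked_rows[:k])
    return len(top_k & gold_rows) / len(gold_rows)


def mean_recall(examples: List[Dict], k: int) -> float:
    """Average Recall@K over all examples."""
    return sum(recall_at_k(e["ranked"], e["gold"], k) for e in examples) / len(examples)


def accuracy(examples: List[Dict]) -> float:
    """Exact-match accuracy of predicted answers against reference answers."""
    return sum(e["pred"] == e["answer"] for e in examples) / len(examples)


# Hypothetical toy data: row indices ranked by a retriever, gold evidence rows,
# the model's predicted answer, and the reference answer.
examples = [
    {"ranked": [2, 0, 5, 1], "gold": {0, 2}, "pred": "12", "answer": "12"},
    {"ranked": [3, 1, 4, 0], "gold": {1}, "pred": "7", "answer": "9"},
    {"ranked": [0, 4, 2, 3], "gold": {0, 4}, "pred": "", "answer": "1998"},
]

for k in (1, 2, 4):
    print(f"Recall@{k} = {mean_recall(examples, k):.2f}")
print(f"End-to-end accuracy = {accuracy(examples):.2f}")
```

In this toy data, recall reaches its ceiling at K=2 and stays there at K=4, while only one of the three answers is correct. Retrieving the right rows is therefore not sufficient on its own, which is why retrieval and reasoning are diagnosed separately.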
Abstract
Tabular reasoning benchmarks mix semantic inference, numerical computation, and brittle table formatting, yet evaluations for small models remain vulnerable to contamination, dataset artifacts, and retrieval failures. We propose GLEAN, a lightweight evaluation protocol that integrates contamination-aware probes, weak-supervision governance, retrieval-reasoning diagnostics, and structured error attribution under tight hardware constraints. We evaluate across TabFact, WTQ via Squall, TableBench, RobuT, and SciTab under a 16GB GPU budget. Using Squall gold SQL as an executable anchor (95.2% execution), GLEAN assigns a deterministic error taxonomy (L0-L4 plus L0.5 context miss) and reveals a stable error-mode separation: TAPEX errors skew toward grounding (L3) while TAPAS errors skew toward hallucination/abstention (L2/L0). We validate evidence-row heuristics against SQL-derived rows on…
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Research Data Management Practices
