Streaming Model Cascades for Semantic SQL
Paweł Liskowski, Kyle Schmaus

TL;DR
This paper introduces two adaptive streaming cascade algorithms, SUPG-IT and GAMCAL, for cost-effective semantic SQL inference using large language models, enabling independent per-partition processing with formal guarantees.
Contribution
The paper presents novel streaming, per-partition cascade algorithms that operate without inter-worker communication, extending existing frameworks to distributed, cost-sensitive semantic SQL inference.
Findings
Both algorithms achieve F1 > 0.95 on all datasets.
GAMCAL is the stronger choice in cost-sensitive scenarios, achieving higher F1 per oracle call.
SUPG-IT provides formal guarantees on precision and recall.
Abstract
Modern data warehouses extend SQL with semantic operators that invoke large language models on each qualifying row, but the per-row inference cost is prohibitive at scale. Model cascades reduce this cost by routing most rows through a fast proxy model and delegating uncertain cases to an expensive oracle. Existing frameworks, however, require global dataset access and optimize a single quality metric, limiting their applicability in distributed systems where data is partitioned across independent workers. We present two adaptive cascade algorithms designed for streaming, per-partition execution in which each worker processes its partition independently without inter-worker communication. SUPG-IT extends the SUPG statistical framework to streaming execution with iterative threshold refinement and joint precision-recall guarantees. GAMCAL replaces user-specified quality targets with a…
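The proxy-and-oracle routing described in the abstract can be illustrated with a minimal Python sketch. This is a generic two-threshold cascade applied to a single partition, not the paper's SUPG-IT or GAMCAL algorithm. The function name `cascade_partition`, the fixed thresholds `low` and `high`, and the toy proxy and oracle are all hypothetical and chosen only for illustration; the paper's algorithms adapt their decisions during streaming rather than using fixed cutoffs.

```python
from typing import Callable, Iterable, List, Tuple


def cascade_partition(
    rows: Iterable[str],
    proxy: Callable[[str], float],
    oracle: Callable[[str], bool],
    low: float = 0.2,   # assumed cutoff: at or below this, trust the proxy's "no"
    high: float = 0.8,  # assumed cutoff: at or above this, trust the proxy's "yes"
) -> Tuple[List[bool], int]:
    """Label one partition using only local state.

    Returns the per-row labels and the number of oracle calls made.
    """
    labels: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy(row)  # cheap model: confidence that the row matches
        if score >= high:
            labels.append(True)
        elif score <= low:
            labels.append(False)
        else:
            # Uncertain case: delegate to the expensive model.
            oracle_calls += 1
            labels.append(oracle(row))
    return labels, oracle_calls


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_proxy(row: str) -> float:
    return {"great product": 0.95, "terrible": 0.05}.get(row, 0.5)


def toy_oracle(row: str) -> bool:
    return "good" in row or "great" in row


labels, calls = cascade_partition(
    ["great product", "terrible", "pretty good overall"],
    toy_proxy,
    toy_oracle,
)
print(labels, calls)
```

Because the function reads nothing beyond its own partition, each worker could run it independently. In this toy run only the third row falls between the two cutoffs, so it is the only one sent to the oracle.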
