100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Yeounoh Chung; Rushabh Desai; Jian He; Yu Xiao; Thibaud Hottelier; Yves-Laurent Kom Samo; Pushkar Khadilkar; Xianshun Chen; Sam Idicula; Fatma Özcan; Alon Halevy; Yannis Papakonstantinou

arXiv:2603.15970·cs.DB·April 16, 2026

TL;DR

This paper evaluates a proxy model approach that cuts the cost and latency of AI queries in SQL by more than 100x for the semantic filter operator while maintaining accuracy, making AI queries practical for scalable data analysis.

Contribution

It provides an extensive evaluation of a recent proxy model-based method for AI query approximation, which achieves over 100x cost and latency reduction for the semantic filter operator with preserved accuracy.

Findings

01 >100x cost and latency reduction for semantic filter operator

02 Proxy models preserve or improve accuracy across datasets

03 OLAP and HTAP architectures enable scalable online AI queries

Abstract

Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, which let users specify functions and conditions in SQL that are evaluated by LLMs, significantly broadening the kinds of queries one can express over the combination of structured and unstructured data. LLMs offer remarkable semantic reasoning capabilities, making them an essential tool for complex and nuanced queries that blend structured and unstructured data. While extremely powerful, these AI queries can become prohibitively costly when invoked thousands of times. This paper provides an extensive evaluation of a recent AI query approximation approach that enables low-cost analytics and database applications to benefit from AI queries. The approach delivers >100x cost and latency reduction for the semantic filter operator and also important gains for semantic…
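The abstract does not describe how the proxy models are built, so the following is only a minimal sketch of the general idea behind approximating a semantic filter, not the paper's method. It assumes a pattern in which an expensive LLM labels a small sample of rows, a cheap model is fit to those labels, and the cheap model then filters the remaining rows. The names `expensive_llm_filter`, `train_proxy`, and `proxy_filter` are hypothetical, the "LLM" is a keyword stub, and the token-weight classifier stands in for whatever lightweight model a real system would use.

```python
from collections import Counter


def expensive_llm_filter(text: str) -> bool:
    """Hypothetical stand-in for an LLM call such as 'is this review positive?'.

    A real system would call a model endpoint here; this stub uses keywords
    and counts its invocations so the saving in calls is visible.
    """
    expensive_llm_filter.calls += 1
    return any(word in text.lower() for word in ("great", "love", "excellent"))


expensive_llm_filter.calls = 0


def train_proxy(sample: list[str]) -> Counter:
    """Fit token weights from LLM-labeled rows (assumed proxy, not the paper's).

    Each distinct token in a row gets +1 if the LLM accepted the row, else -1.
    """
    weights: Counter = Counter()
    for text in sample:
        label = expensive_llm_filter(text)
        for token in set(text.lower().split()):
            weights[token] += 1 if label else -1
    return weights


def proxy_filter(weights: Counter, text: str) -> bool:
    """Cheap approximation of the filter: accept if the summed weight is positive."""
    return sum(weights[token] for token in set(text.lower().split())) > 0


# Small labeled sample: the only rows that reach the expensive filter.
sample_rows = [
    "great phone love it",
    "terrible battery broke fast",
    "excellent screen great value",
    "awful support terrible app",
]
# Remaining rows: filtered by the proxy alone.
remaining_rows = [
    "love the camera",
    "battery is awful",
    "great sound",
    "broke after a week",
]

proxy_weights = train_proxy(sample_rows)
proxy_results = [proxy_filter(proxy_weights, row) for row in remaining_rows]
llm_calls = expensive_llm_filter.calls
```

In this toy setup the expensive filter runs once per sampled row and never on the remaining rows, which is where a cost and latency saving would come from as the table grows. How closely the proxy tracks the LLM depends on the sample and the model chosen, which is the kind of trade-off the paper's evaluation addresses.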
