The Case for Instance-Optimized LLMs in OLAP Databases

Bardia Mohammadi; Laurent Bindschaedler

arXiv:2507.04967·cs.DB·July 8, 2025

TL;DR

This paper introduces IOLM-DB, a system that generates lightweight, query-specific LLMs so that LLM-enhanced database analytics stay accurate and efficient when processing millions to billions of rows.

Contribution

IOLM-DB generates a specialized, compressed LLM for each query from representative data samples, reducing model footprints by up to 76% and increasing throughput by up to 3.31× in OLAP systems.

Findings

01

Model footprints reduced by up to 76%.

02

Throughput increased by up to 3.31×.

03

Accuracy is maintained under aggressive compression (quantization, sparsification, and structural pruning).

Abstract

Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale (processing millions to billions of rows) remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized models tailored to each query's specific needs using representative data samples. IOLM-DB reduces model footprints by up to 76% and increases throughput by up to 3.31× while maintaining accuracy through aggressive compression techniques, including quantization, sparsification, and structural pruning. We further show how our approach enables higher parallelism on existing hardware and seamlessly…
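
To make two of the compression techniques named in the abstract concrete, here is a minimal sketch of magnitude-based sparsification followed by symmetric int8 quantization on a toy weight list. It is not IOLM-DB's implementation: the function names (`sparsify`, `quantize_int8`, `dequantize`, `footprint_bytes`), the per-tensor scaling scheme, and the idealized footprint formula (which ignores the index overhead of sparse storage) are all assumptions chosen for illustration.

```python
def sparsify(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization, so that w is approximately scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


def footprint_bytes(num_weights, bits_per_weight, sparsity=0.0):
    """Idealized storage cost (assumption: zeroed weights cost nothing to store)."""
    return num_weights * (1.0 - sparsity) * bits_per_weight / 8


# Toy example: 8 weights, drop the smallest 25%, then quantize to int8.
weights = [0.5, -1.27, 0.01, 0.8, -0.02, 0.3, 1.0, -0.6]
pruned = sparsify(weights, sparsity=0.25)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)

baseline = footprint_bytes(len(weights), bits_per_weight=16)
compressed = footprint_bytes(len(weights), bits_per_weight=8, sparsity=0.25)
print(f"footprint reduction: {1 - compressed / baseline:.1%}")
```

The reduction printed here reflects only this toy configuration and should not be read as a reproduction of the paper's reported figures. In a query-specific setting, the accuracy of the compressed model would be checked against the original on a representative data sample before it is used.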
