The Case for Instance-Optimized LLMs in OLAP Databases
Bardia Mohammadi, Laurent Bindschaedler

TL;DR
This paper introduces IOLM-DB, a system that generates lightweight, query-specific LLMs to make LLM-enhanced database analytics accurate and efficient at large scale.
Contribution
IOLM-DB proposes a method for generating a specialized, compressed LLM for each query, which reduces model size and increases throughput in OLAP systems.
Findings
Model footprints reduced by up to 76%.
Throughput increased by up to 3.31×.
Accuracy maintained under aggressive compression (quantization, sparsification, and structural pruning).
Abstract
Large Language Models (LLMs) can enhance analytics systems with powerful data summarization, cleaning, and semantic transformation capabilities. However, deploying LLMs at scale -- processing millions to billions of rows -- remains prohibitively expensive in computation and memory. We present IOLM-DB, a novel system that makes LLM-enhanced database queries practical through query-specific model optimization. Instead of using general-purpose LLMs, IOLM-DB generates lightweight, specialized models tailored to each query's specific needs using representative data samples. IOLM-DB reduces model footprints by up to 76% and increases throughput by up to 3.31× while maintaining accuracy through aggressive compression techniques, including quantization, sparsification, and structural pruning. We further show how our approach enables higher parallelism on existing hardware and seamlessly…
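
As a rough illustration of two of the compression techniques the abstract names, the Python sketch below applies symmetric int8 quantization and magnitude-based sparsification to a plain list of weights. It is not IOLM-DB's implementation. The function names (quantize_int8, dequantize, magnitude_prune, storage_bytes) and the toy weights are assumptions made for illustration only, and the numbers are not taken from the paper.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point weights."""
    return [v * scale for v in q]


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def storage_bytes(n_weights: int, bits_per_weight: int) -> int:
    """Dense storage cost, ignoring metadata such as scales or sparse indices."""
    return n_weights * bits_per_weight // 8


# Hypothetical toy weights, not from any real model.
weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
pruned = magnitude_prune(weights, sparsity=0.5)
```

In this toy setting, storing weights as int8 instead of 16-bit floats halves the dense footprint (storage_bytes(n, 8) versus storage_bytes(n, 16)), and the per-weight quantization error is bounded by half a quantization step. How much compression a given query tolerates before accuracy degrades is the empirical question the paper addresses; this sketch does not model that.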
