Optimizing LLM Queries in Relational Data Analytics Workloads

Shu Liu; Asim Biswal; Amog Kamsetty; Audrey Cheng; Luis Gaspar Schroeder; Liana Patel; Shiyi Cao; Xiangxi Mo; Ion Stoica; Joseph E. Gonzalez; Matei Zaharia

arXiv:2403.05821 · cs.LG · April 10, 2025 · 6 cites

Open Access

TL;DR

This paper introduces algorithms that reorder the rows, and the fields within each row, of input tables in relational analytics workloads to maximize KV cache reuse, reducing both LLM inference cost and job completion time.

Contribution

The paper presents novel reordering algorithms that improve KV cache efficiency for LLM queries in relational data analytics, a previously underexplored area.

Findings

1. Up to 3.4x faster job completion times
2. 32% cost savings on LLM inference
3. Effective for diverse LLM-based queries

Abstract

Batch data analytics is a growing application for Large Language Models (LLMs). LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets. However, LLM inference is highly costly and slow: for example, an NVIDIA L4 GPU running Llama3-8B can only process 6 KB of text per second, taking about a day to handle 15 GB of data; processing a similar amount of data costs around $10K on OpenAI's GPT-4o. In this paper, we propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads. Our key contribution is developing efficient algorithms for reordering the rows and the fields within each row of an input table to maximize key-value (KV) cache reuse when performing LLM serving. As such, our approach can be easily applied to existing analytics systems and…
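To make the reordering idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it uses a simple illustrative heuristic (put fields with the fewest distinct values first, then sort rows) and counts shared leading field values between consecutive rows as a rough proxy for KV cache prefix reuse. Real serving systems match prefixes at the token level, and the function names, field names, and sample table below are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def shared_prefix_fields(prev: Sequence[str], cur: Sequence[str]) -> int:
    """Count the leading field values that two serialized rows have in common."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n


def prefix_hits(rows: List[Row], field_order: Sequence[str]) -> int:
    """Sum of leading field values each row shares with the row just before it.

    This is only a proxy for cache reuse: it assumes a row can reuse the
    cached prefix of the immediately preceding request.
    """
    hits = 0
    prev = None
    for row in rows:
        cur = [row[f] for f in field_order]
        if prev is not None:
            hits += shared_prefix_fields(prev, cur)
        prev = cur
    return hits


def reorder(rows: List[Row], fields: Sequence[str]) -> Tuple[List[str], List[Row]]:
    """Illustrative heuristic (an assumption, not the authors' method):
    place low-cardinality fields first, then sort rows by that field order
    so rows with equal leading values become adjacent."""
    order = sorted(
        fields,
        key=lambda f: (len({r[f] for r in rows}), list(fields).index(f)),
    )
    sorted_rows = sorted(rows, key=lambda r: tuple(r[f] for f in order))
    return order, sorted_rows


# Hypothetical sample table: every review is unique, products repeat.
fields = ["review", "product", "category"]
rows = [
    {"review": "great", "product": "A", "category": "toys"},
    {"review": "bad", "product": "B", "category": "toys"},
    {"review": "ok", "product": "A", "category": "toys"},
    {"review": "meh", "product": "B", "category": "toys"},
]

baseline = prefix_hits(rows, fields)            # 0: the unique field comes first
new_order, new_rows = reorder(rows, fields)     # category, product, review
improved = prefix_hits(new_rows, new_order)     # 5 shared leading values
print(baseline, improved)
```

In this toy table the original layout shares nothing between consecutive prompts because the unique `review` field leads each row, while the reordered layout lets adjacent rows share the `category` and often the `product` values as a common prefix.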

Taxonomy

Topics: Data Mining Algorithms and Applications · Semantic Web and Ontologies · Advanced Database Systems and Queries