LLM-Powered Proactive Data Systems

Sepanta Zeighami; Yiming Lin; Shreya Shankar; Aditya Parameswaran

arXiv:2502.13016·cs.DB·February 19, 2025

Open Access

TL;DR

This paper advocates for proactive data systems powered by LLMs that understand, rework, and optimize user inputs and data, moving beyond reactive, black-box approaches to improve efficiency and correctness.

Contribution

It introduces a new proactive framework for data systems leveraging LLMs to understand and manipulate data and queries, enabling more intelligent and user-aware operations.

Findings

01. Proposed a proactive data system framework using LLMs.
02. Demonstrated improved efficiency in real-world tasks.
03. Outlined future research directions in proactive data management.

Abstract

With the power of LLMs, we now have the ability to query data that was previously impossible to query, including text, images, and video. However, despite this enormous potential, most present-day data systems that leverage LLMs are reactive, reflecting our community's desire to map LLMs to known abstractions. Most data systems treat LLMs as an opaque black box that operates on user inputs and data as is, optimizing them much like any other approximate, expensive UDFs, in conjunction with other relational operators. Such data systems do as they are told, but fail to understand and leverage what the LLM is being asked to do (i.e., the underlying operations, which may be error-prone), the data the LLM is operating on (e.g., long, complex documents), or what the user really needs. They don't take advantage of the characteristics of the operations and/or the data at hand, or ensure…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies