TL;DR
This paper introduces a unified taxonomy and a comprehensive benchmark for Large Language Model-Enhanced Relational Operators (LROs), analyzing their design, implementation, and performance across diverse datasets.
Contribution
It establishes a unified taxonomy for LROs, creates the LROBench benchmark suite, and provides empirical insights and best practices for designing effective LRO systems.
Findings
LROs can be categorized into Select, Match, Impute, Cluster, and Order.
LROBench includes 290 single-LRO and 60 multi-LRO queries across 27 databases.
Empirical evaluation reveals key design choices and performance trade-offs.
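To make the operator view concrete, here is a minimal sketch of two of the five categories, Select and Impute, written as functions that take rows in and return rows out. It is an illustration under stated assumptions, not the paper's or LROBench's implementation: the names llm_select, llm_impute, and fake_llm, the prompt wording, and the callable LLM interface are all hypothetical, and fake_llm is a deterministic stand-in for a real model call.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical LLM interface (an assumption): prompt text in, answer text out.
LLM = Callable[[str], str]


def llm_select(rows: List[Row], predicate: str, llm: LLM) -> List[Row]:
    """Select-style operator: keep rows the model judges to satisfy a
    natural-language predicate. Input and output are both lists of rows."""
    kept = []
    for row in rows:
        prompt = f"Row: {row}\nDoes this row satisfy: {predicate}? Answer yes or no."
        if llm(prompt).strip().lower().startswith("yes"):
            kept.append(row)
    return kept


def llm_impute(rows: List[Row], column: str, llm: LLM) -> List[Row]:
    """Impute-style operator: fill empty values in one column, leaving the
    input rows unmodified."""
    filled = []
    for row in rows:
        if row.get(column) in (None, ""):
            prompt = f"Row: {row}\nGive the most likely value for '{column}'."
            row = {**row, column: llm(prompt).strip()}  # copy, do not mutate
        filled.append(row)
    return filled


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if "Answer yes or no" in prompt:
        return "yes" if "great" in prompt else "no"
    if "most likely value" in prompt:
        return "France" if "Paris" in prompt else "unknown"
    return ""


rows = [
    {"id": "1", "review": "great battery life", "city": "Paris", "country": ""},
    {"id": "2", "review": "stopped working in a week", "city": "Paris", "country": "France"},
]

positive = llm_select(rows, "the review is positive", fake_llm)
completed = llm_impute(rows, "country", fake_llm)
```

Because each function consumes and produces plain rows, the two can be chained, for example llm_select(llm_impute(rows, "country", fake_llm), "the review is positive", fake_llm), which is roughly the shape a multi-operator query would take in this sketch.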
Abstract
With the development of large language models (LLMs), numerous studies integrate LLMs through operator-like components to enhance relational data processing tasks, e.g., filters with semantic predicates, knowledge-augmented table imputation, reasoning-driven entity matching, and more challenging semantic query processing. These components invoke LLMs while preserving a relational input/output interface; we refer to them as LLM-Enhanced Relational Operators (LROs). From an operator perspective, however, existing LROs suffer from fragmented definitions, divergent implementation strategies, and inadequate evaluation benchmarks. To this end, we first establish a unified LRO taxonomy that aligns existing LROs and categorizes them into Select, Match, Impute, Cluster, and Order, along with their operands and implementation variants. Second, we design LROBench, a…
