QUEST: Query Optimization in Unstructured Document Analysis
Zhaoze Sun, Qiyan Deng, Chengliang Chai, Kaisen Jin, Xinyu Guo, Han Han, Ye Yuan, Guoren Wang, Lei Cao

TL;DR
QUEST introduces novel optimization strategies for unstructured document analysis using LLMs, significantly reducing costs and improving accuracy by employing index-based retrieval, evidence augmentation, and document-specific query plans.
Contribution
The paper presents a new query optimization framework, QUEST, tailored for LLM-based unstructured document analysis, with strategies to minimize extraction costs and enhance retrieval accuracy.
Findings
Achieves cost savings ranging from 30% to 6x over baselines.
Improves F1 score by 10%-27%.
Demonstrates effectiveness on three real-world datasets.
Abstract
Recently, researchers have started building large language model (LLM)-powered data systems that let users analyze unstructured text documents as if working with a database, because LLMs are very effective at extracting attributes from documents. In such systems, LLM-based extraction operations constitute the performance bottleneck of query execution due to the high monetary cost and slow LLM inference. Existing systems typically borrow the query optimization principles popular in relational databases to produce query execution plans, which unfortunately are ineffective in minimizing LLM cost. To fill this gap, we propose QUEST, which features a set of novel optimization strategies for unstructured document analysis. First, we introduce an index-based strategy to minimize the cost of each extraction operation. With this index, QUEST quickly retrieves the text segments…
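To make the idea of index-based retrieval concrete, here is a minimal sketch in Python. It is not QUEST's actual index, which this summary does not describe in detail. It assumes a plain keyword inverted index over sentence-level segments, and every name in it (split_segments, build_index, retrieve, the sample document) is hypothetical. The point it illustrates is only that an index can narrow a document down to a few relevant segments before any LLM extraction call is made.

```python
import re
from collections import defaultdict


def split_segments(text):
    """Split a document into sentence-level segments (an assumed granularity)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(s):
    """Lowercase and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", s.lower())


def build_index(segments):
    """Build an inverted index mapping each token to the ids of segments containing it."""
    index = defaultdict(set)
    for i, seg in enumerate(segments):
        for tok in tokenize(seg):
            index[tok].add(i)
    return index


def retrieve(index, segments, keywords, top_k=2):
    """Return up to top_k segments, ranked by how many keyword tokens they contain."""
    scores = defaultdict(int)
    for kw in keywords:
        for tok in set(tokenize(kw)):
            for i in index.get(tok, ()):
                scores[i] += 1
    # Higher score first; ties broken by original segment order.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return [segments[i] for i in ranked[:top_k]]


# Hypothetical document and attribute keywords, for illustration only.
doc = (
    "Alice Smith joined Acme in 2019. "
    "Her annual salary is 120000 dollars. "
    "She lives in Boston. "
    "The office has a gym."
)
segments = split_segments(doc)
index = build_index(segments)

# Only the retrieved segments would be passed to an LLM extraction call.
relevant = retrieve(index, segments, ["salary", "annual"], top_k=1)
print(relevant)
```

In this toy setup, an extraction for a salary attribute would send one of the four segments to the LLM instead of the whole document, which is the kind of per-operation cost reduction the abstract refers to. A real system would likely use a more robust relevance signal than exact keyword overlap.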
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Big Data and Digital Economy
