Task Cascades for Efficient Unstructured Data Processing
Shreya Shankar, Sepanta Zeighami, Aditya Parameswaran

TL;DR
This paper introduces task cascades, a flexible framework for unstructured data processing that optimizes cost and accuracy by decomposing operations and selecting relevant document portions, outperforming traditional model cascades.
Contribution
The paper proposes a generalized task cascades framework that leverages LLMs to generate and select task decompositions and document segments, significantly improving efficiency over standard cascades.
Findings
Reduces cost by an average of 36% across eight real-world tasks, evaluated at a 90% accuracy target.
Provides an extension with statistical accuracy guarantees.
Abstract
Modern database systems allow users to query or process unstructured text or document columns using LLM-powered functions. Users can express an operation in natural language (e.g., "identify if this review mentions billing issues"), with the system executing the operation on each document, in a row-by-row fashion. One way to reduce cost on a batch of documents is to employ the model cascade framework: a cheap proxy model processes each document, and only uncertain cases are escalated to a more accurate, expensive oracle. However, model cascades miss important optimization opportunities; for example, often only part of a document is needed to answer a query, or other related, but simpler operations (e.g., "is the review sentiment negative?", "does the review mention money?") can be handled by cheap models more effectively than the original operation, while still being correlated with it.…
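The model cascade baseline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `toy_proxy` and `toy_oracle` functions, the confidence scores, and the `threshold` value are all hypothetical stand-ins for real LLM calls. The 0.9 confidence threshold here is unrelated to the paper's 90% accuracy target.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical signatures: a proxy returns (label, confidence in [0, 1]);
# an oracle returns only a label.
Proxy = Callable[[str], Tuple[bool, float]]
Oracle = Callable[[str], bool]


def model_cascade(docs: Iterable[str], proxy: Proxy, oracle: Oracle,
                  threshold: float = 0.9) -> Tuple[List[bool], int]:
    """Label each document, escalating low-confidence cases to the oracle.

    Returns the labels and the number of oracle calls made.
    """
    labels: List[bool] = []
    oracle_calls = 0
    for doc in docs:
        label, confidence = proxy(doc)
        if confidence < threshold:
            # The proxy is uncertain: pay for the expensive model.
            label = oracle(doc)
            oracle_calls += 1
        labels.append(label)
    return labels, oracle_calls


def toy_proxy(doc: str) -> Tuple[bool, float]:
    """Stand-in for a cheap model: keyword match with made-up confidences."""
    text = doc.lower()
    if "billing" in text:
        return True, 0.95
    if "charge" in text or "refund" in text:
        return True, 0.6  # plausible but uncertain
    return False, 0.92


def toy_oracle(doc: str) -> bool:
    """Stand-in for an expensive, more accurate model."""
    text = doc.lower()
    return any(word in text for word in ("billing", "charged", "refund"))


reviews = [
    "Great app, but the billing page double-counted my plan.",
    "I was charged twice and never got a refund.",
    "Battery life is excellent.",
    "The charge cable broke after a week.",
]
labels, oracle_calls = model_cascade(reviews, toy_proxy, toy_oracle)
print(labels, oracle_calls)
```

In this toy run only the two reviews on which the proxy is uncertain reach the oracle. The cascade still sends the full document and the original operation to the proxy, which is the limitation the abstract points to.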
Taxonomy
Topics: Advanced Database Systems and Queries · Graph Theory and Algorithms · Data Quality and Management
