Abacus: A Cost-Based Optimizer for Semantic Operator Systems
Matthew Russo, Chunwei Liu, Sivaprasad Sudhir, Gerardo Vitagliano, Michael Cafarella, Tim Kraska, Samuel Madden

TL;DR
Abacus is a cost-based optimizer for semantic operator systems. It chooses how to physically implement each semantic operator, using a small amount of validation data and prior knowledge about operator performance, to improve output quality while lowering cost and latency.
Contribution
The paper introduces an extensible optimizer that balances cost, quality, and latency in semantic operator systems using minimal validation data and LLM judgments.
Findings
Achieves 6.7%-39.4% better quality than the next best system
Reduces cost by up to 10.8x
Reduces latency by up to 3.4x
Abstract
LLMs enable an exciting new class of data processing applications over large collections of unstructured documents. Several new programming frameworks have enabled developers to build these applications by composing them out of semantic operators: a declarative set of AI-powered data transformations with natural language specifications. These include LLM-powered maps, filters, joins, etc. used for document processing tasks such as information extraction, summarization, and more. While systems of semantic operators have achieved strong performance on benchmarks, they can be difficult to optimize. An optimizer for this setting must determine how to physically implement each semantic operator in a way that optimizes the system globally. Existing optimizers are limited in the number of optimizations they can apply, and most (if not all) cannot optimize system quality, cost, or latency…
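The abstract's core problem, choosing a physical implementation for each semantic operator so that the whole system is optimized, can be illustrated with a toy sketch. This is not Abacus's algorithm. It is a brute-force enumeration under stated assumptions: the operator names, the candidate implementations, and every quality, cost, and latency figure are hypothetical, and plan quality is modeled as the product of per-operator qualities (which assumes independent errors).

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Impl:
    """One hypothetical physical implementation of a semantic operator."""
    name: str
    quality: float  # estimated fraction of correct outputs, in [0, 1]
    cost: float     # estimated dollars per record
    latency: float  # estimated seconds per record


# Hypothetical candidates for a two-operator pipeline (filter, then map).
# All figures are made up for illustration only.
CANDIDATES = {
    "filter": [
        Impl("small-model", quality=0.80, cost=0.001, latency=0.5),
        Impl("large-model", quality=0.95, cost=0.010, latency=2.0),
    ],
    "map": [
        Impl("small-model", quality=0.70, cost=0.002, latency=0.8),
        Impl("large-model", quality=0.90, cost=0.020, latency=3.0),
    ],
}


def estimate(plan):
    """Estimate (quality, cost, latency) for a sequence of implementations.

    Assumption: errors are independent, so qualities multiply;
    costs and latencies add because operators run one after another.
    """
    quality = 1.0
    for impl in plan:
        quality *= impl.quality
    cost = sum(impl.cost for impl in plan)
    latency = sum(impl.latency for impl in plan)
    return quality, cost, latency


def best_plan(candidates, max_cost):
    """Return (assignment, estimate) with the highest estimated quality
    among plans whose estimated cost is within max_cost, or None."""
    best = None
    for plan in product(*candidates.values()):
        est = estimate(plan)
        if est[1] > max_cost:
            continue  # over budget
        if best is None or est[0] > best[1][0]:
            best = (dict(zip(candidates, plan)), est)
    return best


if __name__ == "__main__":
    result = best_plan(CANDIDATES, max_cost=0.015)
    if result is not None:
        assignment, (quality, cost, latency) = result
        for op, impl in assignment.items():
            print(f"{op}: {impl.name}")
        print(f"quality={quality:.3f} cost={cost:.3f} latency={latency:.1f}")
```

Exhaustive enumeration grows exponentially with the number of operators and needs trustworthy per-implementation estimates, which is part of why the abstract describes these systems as difficult to optimize.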
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
