The Design of an LLM-powered Unstructured Analytics System
Eric Anderson, Jonathan Fritz, Austin Lee, Bohou Li, Mark Lindblad, Henry Lindeman, Alex Meyer, Parth Parmar, Tanvi Ranade, Mehul A. Shah, Benjamin Sowell, Dan Tecuci, Vinayak Thapliyal, Matt Welsh

TL;DR
This paper presents Aryn, an LLM-powered unstructured analytics system that enables natural language queries over large document collections, combining semantic planning, document processing, and explainability to improve accuracy and trust.
Contribution
The paper introduces Aryn, a novel system integrating LLMs, a declarative processing engine, and explainability features for scalable unstructured data analytics.
Findings
Aryn achieves higher accuracy than RAG on real-world report analysis.
The system effectively converts raw documents into analyzable formats.
Explainability features enhance user trust in LLM-based analytics.
Abstract
LLMs demonstrate an uncanny ability to process unstructured data, and as such, have the potential to go beyond search and run complex, semantic analyses at scale. We describe the design of an unstructured analytics system, Aryn, and the tenets and use cases that motivate its design. With Aryn, users specify queries in natural language and the system automatically determines a semantic plan and executes it to compute an answer from a large collection of unstructured documents. At the core of Aryn is Sycamore, a declarative document processing engine that provides a reliable distributed abstraction called DocSets. Sycamore allows users to analyze, enrich, and transform complex documents at scale. Aryn includes Luna, a query planner that translates natural language queries to Sycamore scripts, and DocParse, which takes raw PDFs and document images and converts them to DocSets for…
Taxonomy
Topics: Neural Networks and Applications
Methods: Byte Pair Encoding · Attention Dropout · Linear Layer · Softmax · Dense Connections · Linear Warmup With Linear Decay · Dropout · Attention Is All You Need · WordPiece
