Analytical Queries for Unstructured Data
Daniel Kang

TL;DR
This monograph reviews recent advances in data management systems for unstructured data, with a focus on video analytics. It highlights three challenges (executing queries efficiently, expressing queries, and handling ML errors) and discusses solutions that use ML models.
Contribution
It provides a comprehensive overview of recent techniques addressing the challenges of querying unstructured data with ML, including query formulation, optimization, and error mitigation.
Findings
User-defined functions enable flexible query expression.
Approximate query processing reduces ML model costs.
Outlier and drift detection improve ML model reliability.
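To illustrate how approximate query processing can cut ML cost, here is a minimal sketch, assuming the query is "what fraction of video frames contain a car?" and that an expensive model is available as a per-frame yes/no function. The names `estimate_fraction` and `oracle` are hypothetical, and the estimator is plain uniform sampling with a normal-approximation confidence interval, not necessarily the estimator used by the systems the monograph surveys.

```python
import math
import random
from typing import Callable, Sequence


def estimate_fraction(
    frames: Sequence[int],
    oracle: Callable[[int], bool],
    sample_size: int,
    z: float = 1.96,
    seed: int = 0,
) -> tuple[float, float]:
    """Estimate the fraction of frames for which `oracle` returns True.

    `oracle` is a hypothetical stand-in for an expensive ML model (e.g. an
    object detector). Only `sample_size` frames are passed to it, so the
    ML cost scales with the sample rather than with the whole video.
    Returns (estimate, half_width) of an approximate confidence interval.
    """
    rng = random.Random(seed)
    # Uniform random sample of frames, drawn without replacement.
    sample = rng.sample(list(frames), sample_size)
    hits = sum(1 for frame in sample if oracle(frame))
    p_hat = hits / sample_size

    # Standard error of a sample proportion, with a finite-population
    # correction because we sample without replacement.
    n, total = sample_size, len(frames)
    fpc = math.sqrt((total - n) / (total - 1)) if total > 1 else 0.0
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n) * fpc
    return p_hat, half_width


if __name__ == "__main__":
    # Toy "detector": pretends every 10th frame contains a car.
    def fake_detector(frame: int) -> bool:
        return frame % 10 == 0

    estimate, half_width = estimate_fraction(range(100_000), fake_detector, 2_000)
    print(f"fraction ~ {estimate:.3f} +/- {half_width:.3f}")
```

The design choice here is the usual accuracy/cost trade-off: the model runs on 2,000 frames instead of 100,000, and the interval half-width shrinks roughly as one over the square root of the sample size.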
Abstract
Unstructured data, in the form of text, images, video, and audio, is produced at exponentially higher rates. In tandem, machine learning (ML) methods have become increasingly powerful at analyzing unstructured data. Modern ML methods can now detect objects in images, understand actions in videos, and even classify complex legal texts based on legal intent. Combined, these trends make it increasingly feasible for analysts and researchers to automatically understand the "real world." However, there are major challenges in deploying these techniques: 1) executing queries efficiently given the expense of ML methods, 2) expressing queries over bespoke forms of data, and 3) handling errors in ML methods. In this monograph, we discuss challenges and advances in data management systems for unstructured data using ML, with a particular focus on video analytics. Using ML to answer queries…
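One common way to catch errors caused by data drift (challenge 3 above) is to compare the distribution of a model output, such as per-frame detector confidence, in a recent window against a reference window collected when the model was known to work. The sketch below uses a two-sample Kolmogorov-Smirnov test with its asymptotic critical value; the function names are hypothetical, and the choice of the KS test is an assumption for illustration, not a method attributed to the monograph.

```python
import math
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a, b = sorted(reference), sorted(current)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples (handles ties).
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def drift_detected(
    reference: Sequence[float], current: Sequence[float], alpha: float = 0.05
) -> bool:
    """Flag drift when the KS statistic exceeds the asymptotic critical
    value c(alpha) * sqrt((n + m) / (n * m))."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


if __name__ == "__main__":
    # Hypothetical confidence scores: validation period vs. a shifted window.
    reference_scores = [i / 100 for i in range(100)]
    recent_scores = [0.5 + i / 200 for i in range(100)]
    print(drift_detected(reference_scores, recent_scores))
```

A flagged window does not say which predictions are wrong; it only signals that the model's inputs or outputs no longer resemble the reference period, so its results deserve re-validation.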
Taxonomy
Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications · Advanced Database Systems and Queries
