Towards General-Purpose Data Discovery: A Programming Languages Approach
Andrew Kang, Yashnil Saha, Sainyam Galhotra

TL;DR
This paper introduces TQL, a domain-specific language for data discovery that draws on programming languages research to express discovery queries and characterize them formally. The language is backed by a formal algebraic model and a prototype implementation.
Contribution
It presents TQL, a new formal language for data discovery, together with its algebraic model, ImpRAT, as a step toward general-purpose data discovery tools.
Findings
Formal characterization of TQL using ImpRAT
Implementation of a modular prototype system
Enhanced expressiveness for data discovery queries
Abstract
Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal language, and a corresponding implementation, for characterizing and solving generic discovery queries. To this end, we present TQL, a domain-specific language for data discovery designed to leverage the results of programming languages research in both its syntax and semantics. In this paper, we fully and formally characterize the core language through an algebraic model, Imperative Relational Algebra with Types (ImpRAT), and implement a modular proof-of-concept system prototype.
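To make the general idea of a "relational algebra with types" concrete, here is a minimal Python sketch. It assumes a simplified setting in which every table carries a schema of column types and the operators reject ill-typed inputs. It is not TQL syntax and not the paper's ImpRAT semantics: neither is described on this page, and the imperative aspects of ImpRAT are not modeled at all. The `Table` class, the `project`/`select`/`join` operators, the `joinable` discovery query, and the sample tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical simplification: a schema maps column names to Python types.
Schema = dict[str, type]
Row = dict[str, Any]


@dataclass
class Table:
    name: str
    schema: Schema
    rows: tuple  # tuple of Row dicts

    def __post_init__(self) -> None:
        # Type-check every row against the schema when the table is built.
        for row in self.rows:
            if set(row) != set(self.schema):
                raise TypeError(f"{self.name}: row columns {sorted(row)} do not match schema")
            for col, ty in self.schema.items():
                if not isinstance(row[col], ty):
                    raise TypeError(f"{self.name}.{col}: expected {ty.__name__}")


def project(t: Table, cols: list[str]) -> Table:
    # Projection with bag semantics (duplicates kept, as in SQL).
    missing = [c for c in cols if c not in t.schema]
    if missing:
        raise TypeError(f"project: unknown columns {missing}")
    schema = {c: t.schema[c] for c in cols}
    rows = tuple({c: r[c] for c in cols} for r in t.rows)
    return Table(f"project({t.name})", schema, rows)


def select(t: Table, pred: Callable[[Row], bool]) -> Table:
    # Selection keeps the schema unchanged and filters rows.
    return Table(f"select({t.name})", dict(t.schema), tuple(r for r in t.rows if pred(r)))


def join(left: Table, right: Table) -> Table:
    # Natural join that refuses to equate columns of different types.
    shared = [c for c in left.schema if c in right.schema]
    for c in shared:
        if left.schema[c] is not right.schema[c]:
            raise TypeError(
                f"join: column {c!r} is {left.schema[c].__name__} vs {right.schema[c].__name__}"
            )
    schema = {**left.schema, **right.schema}
    rows = tuple(
        {**l, **r}
        for l in left.rows
        for r in right.rows
        if all(l[c] == r[c] for c in shared)
    )
    return Table(f"join({left.name}, {right.name})", schema, rows)


def joinable(query: Table, repo: list[Table], col: str) -> list[str]:
    # Hypothetical discovery query: repository tables sharing `col` with the same type.
    ty = query.schema[col]
    return [t.name for t in repo if t.schema.get(col) is ty]


cities = Table("cities", {"city": str, "pop": int},
               ({"city": "Boston", "pop": 650000}, {"city": "Austin", "pop": 960000}))
weather = Table("weather", {"city": str, "temp": float}, ({"city": "Boston", "temp": 12.5},))
codes = Table("codes", {"city": int, "code": str}, ({"city": 1, "code": "BOS"},))

print(joinable(cities, [weather, codes], "city"))  # ['weather']
big = select(join(cities, weather), lambda r: r["pop"] > 500000)
print(project(big, ["city", "temp"]).rows)  # ({'city': 'Boston', 'temp': 12.5},)
```

Here the type information does double duty: it rules out meaningless operations (joining `cities` with `codes` raises `TypeError` because `city` is a string in one table and an integer in the other), and it also drives a simple discovery query over a repository of tables.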
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
