TorchQL: A Programming Framework for Integrity Constraints in Machine Learning

Aaditya Naik; Adam Stein; Yinjun Wu; Mayur Naik; Eric Wong

arXiv:2308.06686 · cs.DB · October 17, 2024

Open Access

TL;DR

TorchQL is a flexible programming framework that enables scalable, expressive integrity constraint checks in machine learning applications, improving correctness and efficiency across diverse use cases.

Contribution

The paper introduces TorchQL, a framework that combines relational algebra with functional programming for specifying and checking integrity constraints in ML workflows.

Findings

01. Up to 13x faster query execution compared to Pandas and MongoDB.

02. Queries are up to 40% shorter than native Python.

03. User study shows TorchQL is intuitive for Python developers.

Abstract

Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad-hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries to specify and check integrity constraints over machine learning models and datasets. It seamlessly integrates relational algebra with functional programming to allow for highly expressive queries using only eight intuitive operators. We evaluate TorchQL on diverse use-cases including finding critical temporal inconsistencies in objects detected across video frames in autonomous driving, finding data imputation errors in time-series medical records, finding data labeling errors in real-world images, and evaluating…
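
To make the idea of an integrity constraint over model outputs concrete, here is a minimal sketch in plain Python of the temporal-consistency use case mentioned in the abstract. It does not use TorchQL's operators or API, which are not shown on this page; the `detections` data and the `flicker_violations` helper are hypothetical and exist only for illustration. The assumed constraint is that an object detected in both the previous and the next video frame should also be detected in the current one.

```python
# Hypothetical per-frame detections: (frame_index, set of detected object labels).
# This is illustrative data, not output from any real model or from TorchQL.
detections = [
    (0, {"car", "pedestrian"}),
    (1, {"car"}),
    (2, {"car", "pedestrian"}),
    (3, {"car", "pedestrian"}),
]


def flicker_violations(frames):
    """Return (frame_index, label) pairs that violate the assumed constraint:
    a label present in both neighbouring frames is missing from this frame."""
    by_index = dict(frames)
    violations = []
    for idx in sorted(by_index):
        # Only frames with both a predecessor and a successor can be checked.
        if idx - 1 in by_index and idx + 1 in by_index:
            expected = by_index[idx - 1] & by_index[idx + 1]
            missing = expected - by_index[idx]
            violations.extend((idx, label) for label in sorted(missing))
    return violations


violations = flicker_violations(detections)
print(violations)
```

In this toy data the pedestrian disappears in frame 1 and reappears in frame 2, so that frame is flagged. A query framework would let such a check be stated declaratively over whole datasets rather than as a hand-written loop.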

Taxonomy

Topics: Distributed and Parallel Computing Systems · Advanced Database Systems and Queries