Automating Data Science Pipelines with Tensor Completion
Shaan Pakala, Bryce Graw, Dawon Ahn, Tam Dinh, Mehnaz Tabassum Mahin, Vassilis Tsotras, Jia Chen, Evangelos E. Papalexakis

TL;DR
This paper models key data science pipeline operations as tensor completion problems, evaluating methods and proposing adaptations to automate and improve hyperparameter tuning, neural architecture search, and query estimation.
Contribution
It introduces a tensor completion framework for automating various data science tasks and proposes domain-inspired adaptations and ensemble methods for improved performance.
Findings
State-of-the-art tensor completion methods are effective for data science pipeline tasks.
Domain-inspired adaptations improve tensor completion accuracy.
The proposed ensemble technique achieves superior results across multiple datasets.
Abstract
Hyperparameter optimization is an essential component of many data science pipelines and typically entails exhaustive, time- and resource-consuming computations to explore the combinatorial search space. Other key operations in data science pipelines exhibit the same properties. Important examples are neural architecture search, where the goal is to identify the best design choices for a neural network, and query cardinality estimation, where, given different predicate values for a SQL query, the goal is to estimate the size of the output. In this paper, we abstract away those essential components of data science pipelines and model them as instances of tensor completion, where each variable of the search space corresponds to one mode of the tensor, and the goal is to identify all missing entries of the tensor, corresponding to all combinations of…
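To make the modeling step in the abstract concrete, here is a minimal sketch of tensor completion on a hypothetical hyperparameter grid. It is not the paper's method: it assumes a plain CP (low-rank) model fitted by masked alternating least squares, and the three modes (learning rate, batch size, depth), the synthetic "accuracy" values, and the helper names `khatri_rao`, `reconstruct`, and `complete` are all illustrative assumptions.

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product; rows follow C-order over the input modes."""
    out = mats[0]
    for F in mats[1:]:
        out = np.einsum("ir,jr->ijr", out, F).reshape(-1, out.shape[1])
    return out

def reconstruct(factors):
    """Rebuild the full tensor from CP factor matrices (one per mode)."""
    shape = tuple(F.shape[0] for F in factors)
    return khatri_rao(factors).sum(axis=1).reshape(shape)

def complete(tensor, mask, rank=2, sweeps=50, lam=1e-3, seed=0):
    """Fit a CP model to the observed entries only (mask == True) with
    ridge-regularized alternating least squares, then predict every entry."""
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(n, rank)) for n in tensor.shape]
    for _ in range(sweeps):
        for mode in range(tensor.ndim):
            # Unfold data and mask so that rows index the current mode.
            X = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
            M = np.moveaxis(mask, mode, 0).reshape(tensor.shape[mode], -1)
            K = khatri_rao([F for m, F in enumerate(factors) if m != mode])
            for i in range(X.shape[0]):
                Ki = K[M[i]]  # design rows for the observed entries of row i
                gram = Ki.T @ Ki + lam * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, Ki.T @ X[i, M[i]])
    return reconstruct(factors)

# Synthetic stand-in for a results grid: 5 learning rates x 4 batch sizes x 3 depths.
# The values are made up (rank-1 by construction), not measurements from the paper.
rng = np.random.default_rng(1)
truth = np.einsum("i,j,k->ijk",
                  rng.uniform(0.5, 1.0, size=5),
                  rng.uniform(0.5, 1.0, size=4),
                  rng.uniform(0.5, 1.0, size=3))
mask = rng.random(truth.shape) < 0.6      # configurations actually evaluated
estimate = complete(truth, mask, rank=1)  # predictions for all configurations
best = np.unravel_index(np.argmax(estimate), estimate.shape)
print("predicted best configuration (mode indices):", best)
```

Each axis of `truth` plays the role of one search-space variable, `mask` marks the configurations that were actually run, and `estimate` fills in the rest so the most promising configuration can be read off without evaluating the full grid.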
Taxonomy
Topics: Computational Physics and Python Applications · Tensor decomposition and applications
