PRESISTANT: Learning based assistant for data pre-processing
Besim Bilalli, Alberto Abelló, Tomàs Aluja-Banet, Robert Wrembel

TL;DR
PRESISTANT is a learning-based assistant that recommends data pre-processing operators to non-experts by predicting their impact on classification performance, thereby improving analysis outcomes.
Contribution
This work introduces PRESISTANT, a tool using Random Forests to rank pre-processing operators based on their effect on classifier accuracy, aiding non-expert users.
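The ranking idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the meta-features, operator names, and accuracy deltas in `HISTORY` are invented, the helper names `predict_impact` and `rank_operators` are hypothetical, and a simple nearest-neighbour average stands in for the Random Forest predictor so the example needs only the standard library.

```python
from math import dist

# Hypothetical meta-dataset: (meta-features of a past dataset, operator name,
# observed change in classifier accuracy after applying that operator).
# All values are made up for illustration only.
HISTORY = [
    ((0.9, 0.1), "discretize", 0.04),
    ((0.8, 0.2), "discretize", 0.02),
    ((0.1, 0.9), "discretize", -0.03),
    ((0.9, 0.1), "normalize", -0.01),
    ((0.2, 0.8), "normalize", 0.05),
    ((0.1, 0.9), "normalize", 0.03),
]


def predict_impact(meta_features, operator, history=HISTORY, k=2):
    """Predict the accuracy change an operator would cause on a new dataset.

    Stand-in predictor (assumption): averages the observed impact on the k
    most similar past datasets. A Random Forest would replace this function.
    """
    rows = [
        (dist(meta_features, feats), delta)
        for feats, op, delta in history
        if op == operator
    ]
    rows.sort(key=lambda row: row[0])  # closest past datasets first
    nearest = rows[:k]
    if not nearest:
        return 0.0  # no evidence for this operator: assume no impact
    return sum(delta for _, delta in nearest) / len(nearest)


def rank_operators(meta_features, operators):
    """Return (operator, predicted impact) pairs, most beneficial first."""
    scored = [(op, predict_impact(meta_features, op)) for op in operators]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Rank two candidate operators for a new dataset described by two meta-features.
ranking = rank_operators((0.85, 0.15), ["discretize", "normalize"])
for operator, impact in ranking:
    print(f"{operator}: predicted accuracy change {impact:+.3f}")
```

A non-expert user would then try the operators from the top of the list first; operators with a negative predicted impact can be skipped.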
Findings
PRESISTANT effectively improves classification accuracy for non-expert users.
The tool successfully predicts the impact of pre-processing operators across multiple classifiers.
Extensive evaluations demonstrate the usefulness of PRESISTANT in real-world scenarios.
Abstract
Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative, or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim at providing assistance to non-expert…
Taxonomy
Methods: Logistic Regression
