Guided Data Repair
Mohamed Yakout (Purdue University), Ahmed K. Elmagarmid (Qatar Computing Research Institute), Jennifer Neville (Purdue University), Mourad Ouzzani (Purdue University), Ihab F. Ilyas (University of Waterloo)

TL;DR
GDR is a framework that combines user feedback and machine learning to improve data repair processes, reducing user effort while enhancing data quality.
Contribution
It introduces a novel approach that integrates active learning and value of information to prioritize repairs and minimize user involvement in data cleaning.
Findings
Significant data quality improvement demonstrated empirically.
Effective trade-off between user effort and data quality achieved.
Adaptive refinement of the repair model based on user feedback.
Abstract
In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify the utility of each group using the decision-theory concept of value of information (VOI). We then apply active learning to order updates within a group based on their ability to improve the learned model. User feedback is used to repair the database and to adaptively refine the training set for the model.…
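The two-level ranking described in the abstract (groups ordered by utility, then updates ordered within each group) can be illustrated with a minimal Python sketch. It is not the paper's formulation: the `Update` fields, the expected-benefit stand-in for VOI, and the entropy-based uncertainty score used for the active-learning step are all assumptions chosen for illustration, and the sample data is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Update:
    cell: str         # hypothetical cell identifier, e.g. "t2.city"
    new_value: str    # suggested replacement value
    p_correct: float  # assumed model estimate that the update is correct
    benefit: float    # assumed quality gain if the update is correct


def group_utility(updates):
    # Stand-in for VOI (an assumption): expected benefit summed over the group.
    return sum(u.p_correct * u.benefit for u in updates)


def uncertainty(u):
    # Binary entropy of the model's estimate; highest when p_correct is 0.5.
    # Used here as a simple uncertainty-sampling score (an assumption).
    p = u.p_correct
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_for_consultation(groups):
    # Order groups by utility, then order updates inside each group so the
    # ones the model is least sure about are shown to the user first.
    ranked = sorted(groups.items(),
                    key=lambda kv: group_utility(kv[1]),
                    reverse=True)
    return [(name, sorted(ups, key=uncertainty, reverse=True))
            for name, ups in ranked]


# Hypothetical candidate repairs, grouped by the value they suggest.
groups = {
    "zip=46825": [Update("t3.zip", "46825", 0.6, 1.0)],
    "city=Chicago": [
        Update("t1.city", "Chicago", 0.9, 1.0),
        Update("t2.city", "Chicago", 0.5, 1.0),
    ],
}

ranked = rank_for_consultation(groups)
for name, updates in ranked:
    print(name, [u.cell for u in updates])
```

In this sketch the "city=Chicago" group is consulted first because its summed expected benefit is higher, and within it the update with the estimate closest to 0.5 comes first, since user feedback on that update would be the most informative for the model.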
Taxonomy
Topics: Data Quality and Management
