Data Curation with Deep Learning [Vision]
Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, AnHai Doan

TL;DR
This paper explores how deep learning innovations can enhance data curation processes, aiming to reduce human effort and improve efficiency in managing big data.
Contribution
It provides an overview of deep learning's potential to transform data curation and identifies key research opportunities at this intersection.
Findings
Deep learning can automate data discovery and cleaning tasks.
Current solutions are not keeping up with the ever-changing data ecosystem because they often require substantial human effort.
Research opportunities exist for integrating deep learning into data curation workflows.
Abstract
Data curation - the process of discovering, integrating, and cleaning data - is one of the oldest, hardest, yet inevitable data management problems. Despite decades of efforts from both researchers and practitioners, it is still one of the most time consuming and least enjoyable work of data scientists. In most organizations, data curation plays an important role so as to fully unlock the value of big data. Unfortunately, the current solutions are not keeping up with the ever-changing data ecosystem, because they often require substantially high human cost. Meanwhile, deep learning is making strides in achieving remarkable successes in multiple areas, such as image recognition, natural language processing, and speech recognition. In this vision paper, we explore how some of the fundamental innovations in deep learning could be leveraged to improve existing data curation solutions and to…
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Research Data Management Practices
