Improving Data Quality through Deep Learning and Statistical Models
Wei Dai, Kenji Yoshigoe, William Parsley

TL;DR
This paper proposes a data quality framework that combines deep learning with statistical models to detect outliers and improve data accuracy, aiming to overcome the slow, experience-driven nature of traditional quality control methods.
Contribution
The authors introduce a new integrated framework leveraging deep learning and statistical models for enhanced data quality control.
Findings
Effective outlier detection in salary data
Improved data quality through deep learning techniques
Framework applicable to various data types
Abstract
Traditional data quality control methods rely on users' experience or previously established business rules, which limits performance and makes the process time-consuming and less accurate than desired. Using deep learning, we can leverage computing resources and advanced techniques to overcome these challenges and provide greater value to users. In this paper, we first review relevant work and discuss machine learning techniques, tools, and statistical quality models. Second, we propose a novel data quality framework based on deep learning and a statistical model algorithm for assessing data quality. Third, we use salary data from an open dataset published by the state of Arkansas to demonstrate how to identify outliers and how to improve data quality via deep learning. Finally, we discuss future work.
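As a rough illustration of the statistical side of outlier identification mentioned in the abstract, the sketch below flags salary values with the common interquartile-range (IQR) rule. This is an assumption chosen for illustration, not the paper's algorithm: the function name `flag_outliers_iqr`, the multiplier `k=1.5`, and the sample salaries are all hypothetical and are not taken from the Arkansas dataset.

```python
from statistics import quantiles


def flag_outliers_iqr(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Illustrative only: the IQR rule stands in for whatever statistical
    model the paper actually uses.
    """
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical annual salaries with one implausibly large entry.
salaries = [41_000, 43_500, 45_000, 47_200, 48_000, 52_000, 55_000, 410_000]
print(flag_outliers_iqr(salaries))  # -> [410000]
```

A rule like this needs no training data, which is why it is often used as a baseline before a learned model is applied to the same records.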
