Improving Data Quality through Deep Learning and Statistical Models

Wei Dai; Kenji Yoshigoe; William Parsley

arXiv:1810.07132·cs.AI·October 17, 2018

TL;DR

This paper proposes a data quality framework that combines deep learning with statistical models to identify outliers and improve data accuracy, and demonstrates it on salary data from an open dataset published by the state of Arkansas.

Contribution

The authors introduce a new integrated framework leveraging deep learning and statistical models for enhanced data quality control.

Findings

01. Effective outlier detection in salary data

02. Improved data quality through deep learning techniques

03. Framework applicable to various data types

Abstract

Traditional data quality control methods rely on users' experience or previously established business rules, which limits performance, is very time-consuming, and yields lower-than-desirable accuracy. Deep learning lets us leverage computing resources and advanced techniques to overcome these challenges and provide greater value to users. In this paper, we first review relevant work and discuss machine learning techniques, tools, and statistical quality models. Second, we propose a novel data quality framework based on deep learning and statistical models for assessing data quality. Third, we use salary data from an open dataset published by the state of Arkansas to demonstrate how to identify outliers and improve data quality via deep learning. Finally, we discuss future work.
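The abstract does not say which statistical model the framework uses, so the following is only a minimal sketch of the general idea of flagging outlier salaries statistically. It swaps in Tukey's interquartile-range rule as a stand-in; the function name `flag_outliers_iqr`, the multiplier `k`, and the salary figures are all illustrative assumptions, not taken from the paper or the Arkansas dataset.

```python
from statistics import quantiles


def flag_outliers_iqr(values, k=1.5):
    """Return the values lying outside Tukey's fences.

    Hypothetical helper for illustration: the fences are
    [Q1 - k*IQR, Q3 + k*IQR], with quartiles computed by
    statistics.quantiles (default 'exclusive' method).
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up salary figures; the last one mimics a data-entry error
# (an extra digit), the kind of record a quality check should surface.
salaries = [41000, 43500, 39800, 45200, 42100, 44000, 40500, 410000]
print(flag_outliers_iqr(salaries))  # → [410000]
```

A rule like this needs no training data, which is why it is often used as a baseline; a learned model, as the paper proposes, could in principle also catch records that are implausible only in combination with other fields.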
