Identifying Excessively Rounded or Truncated Data
Kevin H. Knuth, J. Patrick Castle, and Kevin R. Wheeler

TL;DR
This paper presents a simple method, based on optimal histogram binning, for detecting when digitization effects in data are large enough to have caused information loss, so that the problem can be caught before the data analysis stage.
Contribution
It introduces a novel, straightforward technique to identify excessive rounding or truncation in digitized data using optimal histogram binning.
Findings
Effective detection of digitization artifacts in data sets
Ability to identify when digitization impacts data structure
Flags irreversible information loss from truncation before the data analysis stage
Abstract
All data are digitized, and hence are essentially integers rather than true real numbers. Ordinarily this causes no difficulties since the truncation or rounding usually occurs below the noise level. However, in some instances, when the instruments or data delivery and storage systems are designed with less than optimal regard for the data or the subsequent data analysis, the effects of digitization may be comparable to important features contained within the data. In these cases, information has been irrevocably lost in the truncation process. While there exist techniques for dealing with truncated data, we propose a straightforward method that will allow us to detect this problem before the data analysis stage. It is based on an optimal histogram binning algorithm that can identify when the statistical structure of the digitization is on the order of the statistical structure of the…
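The idea in the abstract can be illustrated with a minimal sketch. It assumes the relative log posterior for the number of equal-width bins from Knuth's optimal binning work (a piecewise-constant density model), which this page does not reproduce, so treat the formula as an assumption rather than a quotation. The helper names (`bin_counts`, `log_posterior`, `best_bin_count`, `looks_excessively_rounded`), the search limit `max_bins`, and the "best bin count sits at the edge of the search range" rule are all hypothetical choices made for illustration, not the authors' stated criterion. For well-resolved data the posterior should peak at a modest number of bins; for coarsely rounded data it keeps rising as the bins become narrow enough to isolate each discrete value.

```python
import math
from statistics import NormalDist


def bin_counts(data, m):
    """Counts for m equal-width bins spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    counts = [0] * m
    for x in data:
        # The maximum value is clamped into the last bin.
        k = min(int((x - lo) / (hi - lo) * m), m - 1)
        counts[k] += 1
    return counts


def log_posterior(data, m):
    """Relative log posterior for m bins (assumed form, see the lead-in)."""
    n = len(data)
    return (
        n * math.log(m)
        + math.lgamma(m / 2)
        - m * math.lgamma(0.5)
        - math.lgamma(n + m / 2)
        + sum(math.lgamma(c + 0.5) for c in bin_counts(data, m))
    )


def best_bin_count(data, max_bins=100):
    """Number of bins in 1..max_bins with the highest log posterior."""
    scores = {m: log_posterior(data, m) for m in range(1, max_bins + 1)}
    return max(scores, key=scores.get)


def looks_excessively_rounded(data, max_bins=100):
    """Hypothetical heuristic: the posterior is still climbing at max_bins."""
    return best_bin_count(data, max_bins) == max_bins


# Deterministic stand-in for a Gaussian sample: 256 evenly spaced quantiles.
dist = NormalDist()
smooth = [dist.inv_cdf((i + 0.5) / 256) for i in range(256)]

# The same values rounded to the nearest integer, i.e. a rounding step of
# one standard deviation, which leaves only seven distinct values.
coarse = [float(round(x)) for x in smooth]

smooth_best = best_bin_count(smooth)
coarse_best = best_bin_count(coarse)
print("well-resolved data, best bin count:", smooth_best)
print("coarsely rounded data, best bin count:", coarse_best)
```

In this sketch the rounded sample drives the preferred bin count to the top of the search range, while the unrounded sample settles on a much smaller value. In practice one would probably inspect the whole log-posterior curve rather than rely on a single threshold, since the point at which rounding becomes "excessive" depends on the sample size and on the step relative to the structure in the data.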
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Computational Physics and Python Applications
