Should data ever be thrown away? Pooling interval-censored data sets with different precision
Krasymyr Tretiak, Scott Ferson

TL;DR
This paper uses simulation to investigate when pooling data sets of differing quality, in particular imprecise and precise measurements, is beneficial in engineering applications, and when only the precise measurements should be included.
Contribution
It provides a comparative analysis of pooling versus excluding imprecise data based on mathematical representations of imprecision and simulation results.
Findings
Pooling is advantageous when the uncertainty of the low-quality data is below a threshold.
Excluding imprecise data can be justified if doing so does not significantly increase sampling uncertainty.
The choice depends on the balance between data imprecision and the reduction of sampling uncertainty.
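The trade-off in these findings can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: each measurement is an interval guaranteed to contain the true value, the mean of interval data is the interval between the means of the lower and upper endpoints, and overall uncertainty is scored by a crude heuristic (interval-mean width plus the width of a normal-approximation confidence interval computed from midpoints). The function names, the true distribution (mean 10, standard deviation 1), and the scoring rule are all hypothetical choices made for illustration.

```python
import math
import random


def interval_mean(intervals):
    """Mean of interval data: the interval [mean of lower ends, mean of upper ends]."""
    n = len(intervals)
    lo = sum(a for a, _ in intervals) / n
    hi = sum(b for _, b in intervals) / n
    return lo, hi


def total_width(intervals, z=1.96):
    """Heuristic uncertainty score (an assumption, not the paper's metric):
    width of the interval mean (imprecision) plus the width of a
    normal-approximation confidence interval from midpoints (sampling)."""
    n = len(intervals)
    lo, hi = interval_mean(intervals)
    mids = [(a + b) / 2 for a, b in intervals]
    m = sum(mids) / n
    var = sum((x - m) ** 2 for x in mids) / (n - 1)
    sampling_width = 2 * z * math.sqrt(var / n)
    return (hi - lo) + sampling_width


def simulate(n_precise, n_imprecise, half_width, seed=0):
    """Return (score using precise data only, score using pooled data)."""
    rng = random.Random(seed)

    def measure(hw):
        x = rng.gauss(10.0, 1.0)          # hypothetical true value
        shift = rng.uniform(-hw, hw)      # off-centre interval that still contains x
        return (x + shift - hw, x + shift + hw)

    precise = [measure(0.01) for _ in range(n_precise)]
    imprecise = [measure(half_width) for _ in range(n_imprecise)]
    return total_width(precise), total_width(precise + imprecise)


if __name__ == "__main__":
    for hw in (0.02, 0.5, 5.0):
        only_precise, pooled = simulate(10, 990, hw)
        print(f"half-width {hw}: precise only {only_precise:.3f}, pooled {pooled:.3f}")
```

Under this scoring rule, adding many slightly imprecise intervals shrinks the sampling term faster than it widens the interval mean, while adding very wide intervals does the opposite. Where the crossover falls depends on the sample sizes and on the heuristic itself.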
Abstract
Data quality is an important consideration in many engineering applications and projects. Data collection procedures do not always involve careful utilization of the most precise instruments and strictest protocols. As a consequence, data are invariably affected by imprecision and sometimes by sharply varying levels of quality. Different mathematical representations of imprecision have been suggested, including a classical approach to censored data which is considered optimal when the proposed error model is correct, and a weaker approach called interval statistics based on partial identification that makes fewer assumptions. Maximizing the quality of statistical results is often crucial to the success of many engineering projects, and a natural question that arises is whether data of differing qualities should be pooled together or we should include only precise measurements…
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Numerical Methods and Algorithms · Scientific Measurement and Uncertainty Evaluation
