A computational study on imputation methods for missing environmental data
Paul Dixneuf, Fausto Errico, Mathias Glaus

TL;DR
This study compares imputation methods for missing environmental data, finding that missForest generally outperforms MICE and KNN, especially in mixed-type datasets, and demonstrates its application in wastewater treatment monitoring.
Contribution
The paper provides a comprehensive computational comparison of missForest, MICE, and KNN for imputing missing environmental data, highlighting missForest's superior accuracy in mixed datasets.
Findings
missForest outperforms MICE and KNN in imputation accuracy
missForest reduces imputation error by up to 150% relative to MICE and KNN in mixed datasets
KNN is the fastest method among those tested
Abstract
Data acquisition and recording in the form of databases are routine operations. The process of collecting data, however, may experience irregularities, resulting in databases with missing data. Missing entries can impair an analysis and, consequently, the associated decision-making process. This paper focuses on databases that record information related to the natural environment. Given the broad spectrum of recorded activities, these databases are typically of mixed nature. It is therefore relevant to evaluate the performance of missing data processing methods with this characteristic in mind. In this paper we investigate the performance of several missing data imputation methods and their application to the problem of missing environmental data. A computational study was performed to compare the method missForest (MF) with two other imputation methods, namely Multivariate…
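As a rough illustration of the kind of experiment the abstract describes, the sketch below hides known cells in a small mixed-type table, fills them with a simple k-nearest-neighbour imputer, and scores the result. It is not the authors' implementation. The Gower-style distance, the function names (`column_ranges`, `gower_distance`, `knn_impute`, `score`), and the two error measures (RMSE for numeric cells, misclassification rate for categorical cells) are all assumptions made for this example.

```python
import math
from collections import Counter


def column_ranges(rows, numeric):
    """Observed range of each numeric column; None marks a categorical column."""
    ranges = []
    for j, is_num in enumerate(numeric):
        if not is_num:
            ranges.append(None)
            continue
        seen = [r[j] for r in rows if r[j] is not None]
        ranges.append(max(seen) - min(seen) if seen else 0.0)
    return ranges


def gower_distance(a, b, ranges):
    """Mean per-feature dissimilarity over features observed in both rows."""
    total, count = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue
        if r is None:  # categorical: 0 if equal, 1 otherwise
            total += 0.0 if x == y else 1.0
        else:  # numeric: absolute difference scaled by the column range
            total += abs(x - y) / r if r > 0 else 0.0
        count += 1
    return total / count if count else math.inf


def knn_impute(rows, numeric, k=3):
    """Fill each None from the k nearest rows that observe that column."""
    ranges = column_ranges(rows, numeric)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [d for n, d in enumerate(rows) if n != i and d[j] is not None]
            donors.sort(key=lambda d: gower_distance(row, d, ranges))
            nearest = [d[j] for d in donors[:k]]
            if not nearest:
                continue  # no donor observes this column; leave the gap
            if numeric[j]:
                filled[i][j] = sum(nearest) / len(nearest)  # mean of neighbours
            else:
                filled[i][j] = Counter(nearest).most_common(1)[0][0]  # mode
    return filled


def score(truth, imputed, masked, numeric):
    """RMSE over masked numeric cells and error rate over masked categorical cells."""
    sq, n_num, wrong, n_cat = 0.0, 0, 0, 0
    for i, j in masked:
        if numeric[j]:
            sq += (truth[i][j] - imputed[i][j]) ** 2
            n_num += 1
        else:
            wrong += truth[i][j] != imputed[i][j]
            n_cat += 1
    rmse = math.sqrt(sq / n_num) if n_num else 0.0
    error_rate = wrong / n_cat if n_cat else 0.0
    return rmse, error_rate
```

To use it, start from a complete table, set a chosen list of `(row, column)` cells to `None`, call `knn_impute` on the masked copy, and pass the original table, the imputed table, and the masked cell list to `score`. Any other imputer with the same input and output shape could be scored the same way, which is what makes a side-by-side comparison possible.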
Taxonomy
Topics: Soil and Water Nutrient Dynamics · Statistical Methods and Bayesian Inference · Soil Geostatistics and Mapping
