Exploring the Limitations of kNN Noisy Feature Detection and Recovery for Self-Driving Labs
Qiuyu Shi, Kangming Li, Yao Fehlis, Runze Zhang, Daniel Persaud, Robert Black, Jason Hattrick-Simpers

TL;DR
This paper presents a model-agnostic workflow for detecting and recovering noisy features in self-driving laboratories, improving data quality for materials discovery through systematic analysis of noise effects and kNN imputation performance.
Contribution
It introduces a novel automated framework for identifying and correcting noisy features in SDL datasets, with comprehensive analysis of factors affecting recoverability and a benchmark of kNN imputation.
Findings
High-intensity noise and large training datasets make noisy features easier to detect and correct.
Low-intensity noise is harder to handle, but more clean training data can mitigate this.
Feature distribution impacts recoverability.
Abstract
Self-driving laboratories (SDLs) have shown promise to accelerate materials discovery by integrating machine learning with automated experimental platforms. However, errors in the capture of input parameters may corrupt the features used to model system performance, compromising current and future campaigns. This study develops an automated workflow to systematically detect noisy features, determine sample-feature pairings that can be corrected, and finally recover the correct feature values. A systematic study is then performed to examine how dataset size, noise intensity, noise type, and feature value distribution affect the detectability and recoverability of noisy features on both density functional theory (DFT) and SDL datasets. In general, high-intensity noise and large training datasets are conducive to the detection and correction of noisy features. Low-intensity noise…
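The recovery step named in the abstract can be illustrated with a minimal kNN imputation sketch. This is not the authors' implementation: the function name `knn_impute`, the synthetic data, the choice of k, and the assumption that the noisy feature has already been flagged by a separate detection step are all hypothetical and chosen only for illustration.

```python
import math
import random


def knn_impute(clean_rows, sample, noisy_idx, k=3):
    """Replace sample[noisy_idx] with the mean of that feature over the
    k clean rows closest to the sample in the remaining features.

    Hypothetical helper for illustration; assumes noisy_idx is already known.
    """
    def dist(row):
        # Euclidean distance that ignores the flagged (noisy) feature.
        return math.sqrt(sum((row[j] - sample[j]) ** 2
                             for j in range(len(sample)) if j != noisy_idx))

    neighbours = sorted(clean_rows, key=dist)[:k]
    repaired = list(sample)
    repaired[noisy_idx] = sum(r[noisy_idx] for r in neighbours) / len(neighbours)
    return repaired


# Synthetic clean data (an assumption): feature 2 depends on features 0 and 1,
# so neighbours in features 0 and 1 carry information about feature 2.
random.seed(0)
clean = []
for _ in range(200):
    a, b = random.random(), random.random()
    clean.append([a, b, a + b])

true_sample = [0.4, 0.3, 0.7]
noisy_sample = [0.4, 0.3, 0.7 + 5.0]  # high-intensity noise on feature 2

repaired = knn_impute(clean, noisy_sample, noisy_idx=2, k=5)
error_before = abs(noisy_sample[2] - true_sample[2])
error_after = abs(repaired[2] - true_sample[2])
print(f"error before: {error_before:.3f}, after: {error_after:.3f}")
```

In this toy setting the corrupted feature is recoverable because it is strongly related to the other features and the clean set is dense near the sample. If the flagged feature were independent of the others, the neighbour average would carry little information about its true value.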
Taxonomy
Topics: Fault Detection and Control Systems · Anomaly Detection Techniques and Applications · Industrial Vision Systems and Defect Detection
