Peoples Water Data: Enabling Reliable Field Data Generation and Microbial Contamination Screening in Household Drinking Water
Suzan Kagan, Shira Spigelman, Sankar Sudhir, Thalappil Pradeep, Hadas Mamane

TL;DR
This study presents a machine-learning framework to predict E. coli contamination in household drinking water using low-cost indicators, aiming to improve water safety monitoring in resource-limited settings.
Contribution
The paper introduces a scalable, AI-supported decision-support tool for microbiological risk assessment in decentralized water testing environments.
Findings
Predicted E. coli presence in household point-of-use drinking water from low-cost physicochemical and contextual indicators.
Enhanced data reliability through AI-supported field implementation.
Provided a scalable approach for microbiological screening in low-resource areas.
Abstract
Unsafe drinking water remains a major public health concern globally, particularly in low-resource regions where routine microbiological surveillance is limited. Although Escherichia coli is the internationally recognized indicator of fecal contamination, laboratory-based testing is often inaccessible at scale. In this study, we developed and evaluated a two-stage machine-learning framework for predicting E. coli presence in decentralized household point-of-use drinking water in Chennai, India, using low-cost physicochemical and contextual indicators. The dataset comprised 3,023 samples collected under the Peoples Water Data initiative; after harmonization, technical cleaning, and outlier screening, 2,207 valid samples were retained. This framework provides a scalable decision-support tool for prioritizing microbiological testing in resource-constrained environments and addresses an…
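The sample counts in the abstract imply how much data the cleaning steps removed. A minimal sketch of that arithmetic follows; the variable names are illustrative and not taken from the paper, and only the two counts come from the abstract.

```python
# Sample counts as reported in the abstract.
collected = 3023  # samples collected under the Peoples Water Data initiative
retained = 2207   # valid samples after harmonization, cleaning, and outlier screening

# Derived quantities (illustrative names, not the authors').
removed = collected - retained
retention_pct = 100 * retained / collected

print(f"removed:  {removed} samples ({100 - retention_pct:.1f}%)")
print(f"retained: {retained} samples ({retention_pct:.1f}%)")
```

Roughly three quarters of the collected samples survive cleaning, so any reported model performance refers to the retained subset rather than to all field measurements.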
