A Data Science Approach to Understanding Residential Water Contamination in Flint
Alex Chojnacki, Chengyu Dai, Arya Farahi, Guangsha Shi, Jared Webb, Daniel T. Zhang, Jacob Abernethy, Eric Schwartz

TL;DR
This paper uses a large dataset of water tests from Flint to predict lead contamination levels, identify risk factors, and develop tools to inform residents, with broader implications for water safety analysis.
Contribution
It introduces a predictive model for lead contamination based on home attributes and incorporates community-sourced data into accessible risk assessment tools.
Findings
Elevated lead risks can be weakly predicted from home attributes.
Factors influencing testing behavior are identified.
A web and mobile app provides risk assessments to residents.
Abstract
When the residents of Flint learned that lead had contaminated their water system, the local government made water-testing kits available to them free of charge. The city government published the results of these tests, creating a valuable dataset that is key to understanding the causes and extent of the lead contamination event in Flint. This is the nation's largest dataset on lead in a municipal water system. In this paper, we predict the lead contamination for each household's water supply, and we study several related aspects of Flint's water troubles, many of which generalize well beyond this one city. For example, we show that elevated lead risks can be (weakly) predicted from observable home attributes. Then we explore the factors associated with elevated lead. These risk assessments were developed in part via a crowdsourced prediction challenge at the University of Michigan.…
