Data-driven Lake Water Quality Forecasting for Time Series with Missing Data using Machine Learning
Rishit Chatterjee, Tahiya Chowdhury

TL;DR
This study develops a machine learning approach to forecast lake water quality using irregular, incomplete time series data, identifying minimal data and feature requirements for accurate predictions to guide efficient monitoring.
Contribution
It introduces a joint feasibility function that determines minimal sampling and feature sets needed for accurate lake water quality forecasting with missing data.
Findings
Ridge regression gives the best mean test performance among the six candidate models.
Approximately 176 training samples per lake, on average, suffice to reach within 5% of full-history accuracy.
A four-feature subset matches the performance of the full feature set.
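One way to read the sample-size finding is as a threshold search: train on only the most recent n samples, measure test error, and report the smallest n whose error stays within 5% of the error obtained with the full history. The sketch below illustrates that reading only. The function name, the `error_at` callable, and the error values are hypothetical, and treating "within 5%" as a relative bound on test error is an assumption, not the authors' stated definition.

```python
from typing import Callable, Sequence


def min_samples_within_tolerance(
    error_at: Callable[[int], float],
    sizes: Sequence[int],
    tol: float = 0.05,
) -> int:
    """Return the smallest training size whose test error is within
    a relative tolerance `tol` of the full-history error.

    `error_at(n)` is assumed to train on the n most recent samples
    (e.g. `train[-n:]`) and return a test error such as nMAE.
    The largest entry in `sizes` is treated as the full history.
    """
    ordered = sorted(sizes)
    full_error = error_at(ordered[-1])
    threshold = (1.0 + tol) * full_error  # assumed reading of "within 5%"
    for n in ordered:
        if error_at(n) <= threshold:
            return n
    return ordered[-1]  # unreachable in practice: full history always qualifies


# Made-up error curve for illustration only (not results from the paper).
illustrative_errors = {20: 0.40, 40: 0.30, 80: 0.26, 160: 0.25}
smallest_n = min_samples_within_tolerance(illustrative_errors.__getitem__,
                                          list(illustrative_errors))
```

With these made-up numbers the threshold is 1.05 × 0.25 = 0.2625, so `smallest_n` is 80. In a per-lake setting the same search would be run for each lake and the resulting sizes averaged.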
Abstract
Volunteer-led lake monitoring yields irregular, seasonal time series with many gaps arising from ice cover, weather-related access constraints, and occasional human errors, complicating forecasting and early warning of harmful algal blooms. We study Secchi Disk Depth (SDD) forecasting on a 30-lake, data-rich subset drawn from three decades of in situ records collected across Maine lakes. Missingness is handled via Multiple Imputation by Chained Equations (MICE), and we evaluate performance with a normalized Mean Absolute Error (nMAE) metric for cross-lake comparability. Among six candidates, ridge regression provides the best mean test performance. Using ridge regression, we then quantify the minimal sample size, showing that under a backward, recent-history protocol, the model reaches within 5% of full-history accuracy with approximately 176 training samples per lake on average. We…
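The abstract does not spell out how nMAE is normalized. As a minimal sketch, the version below divides the mean absolute error by the mean of the observed values, which makes errors comparable between lakes with very different typical Secchi depths. That normalizer is an assumption (range or standard deviation are other common choices), and the function name `nmae` is illustrative.

```python
from typing import Sequence


def nmae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error divided by the mean observed value.

    The choice of normalizer (mean of y_true) is an assumption;
    the source only states that the metric is normalized.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    scale = sum(y_true) / len(y_true)
    if scale == 0:
        raise ValueError("mean of observed values is zero; cannot normalize")
    return mae / scale


# Two hypothetical lakes with the same absolute error but different depths:
shallow = nmae([2.0, 2.0], [1.5, 2.5])    # MAE 0.5 on a 2 m mean
deep = nmae([10.0, 10.0], [9.5, 10.5])    # MAE 0.5 on a 10 m mean
```

Here the same 0.5 m absolute error yields an nMAE of 0.25 for the shallow lake and 0.05 for the deep one, which is the kind of cross-lake comparability a normalized metric is meant to provide.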
Taxonomy
Topics: Aquatic Ecosystems and Phytoplankton Dynamics · Hydrological Forecasting Using AI · Oceanographic and Atmospheric Processes
