Categorical data as a stone guest in a data science project for predicting defective water meters
Giovanni Delnevo, Marco Roccetti, Luca Casini

TL;DR
This study developed a deep learning classifier to predict water meter failures from large-scale data. Adding categorical data did not significantly improve prediction accuracy, which highlights the importance of domain expertise in data science.
Contribution
The paper demonstrates the limited impact of categorical data on prediction accuracy in water meter failure detection and emphasizes the need for domain knowledge in feature importance assessment.
Findings
Prediction accuracy exceeded 80% with continuous data.
Adding categorical data did not significantly improve performance.
The results highlight the importance of domain expertise in judging feature relevance.
Abstract
After a one-year-long research effort in the field, we developed a machine learning-based classifier tailored to predict whether a mechanical water meter would fail with the passage of time and intensive use. A recurrent deep neural network (RNN) was trained with data extrapolated from 15 million readings of water consumption, gathered from 1 million meters. The data we used for training were essentially of two types: continuous and categorical. Categorical data can take on one of a limited and fixed number of possible values, on the basis of some qualitative property, while the continuous data, in this case, are the values of the measurements, taken at the meters, of the quantity of consumed water (cubic meters). In this paper, we want to discuss the fact that while the prediction accuracy of our RNN has exceeded 80% on average, based on the use of continuous…
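To make the continuous/categorical distinction concrete, here is a minimal sketch of how the two kinds of data might be combined into one input sequence for a recurrent model. The abstract does not say how the authors encoded categorical features; one-hot encoding is an assumption here, and the attribute name (`MATERIALS`), its values, and the readings are hypothetical illustrations, not data from the study.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value` over a fixed list of `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def build_sequence(readings, category, categories):
    """Append the same one-hot vector to every continuous reading.

    Each time step becomes [reading, *one_hot], so a sequence model sees
    the static categorical attribute alongside each measurement.
    """
    encoded = one_hot(category, categories)
    return [[r] + encoded for r in readings]


# Hypothetical categorical attribute with a fixed set of possible values.
MATERIALS = ["brass", "plastic", "composite"]

# Hypothetical consumption readings in cubic meters for a single meter.
sequence = build_sequence([12.0, 15.5, 19.25], "plastic", MATERIALS)
```

Dropping the one-hot columns leaves the continuous-only input, which is the comparison the abstract describes: the same model trained with and without the categorical part.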
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Water Systems and Optimization · Machine Learning and Data Classification
