Probabilistic Models for Anomaly Detection in Remote Sensor Data Streams
Ethan W. Dereszynski, Thomas G. Dietterich

TL;DR
This paper presents a Dynamic Bayesian Network model for detecting sensor failures in ecological data streams, aiming to automate data cleaning and improve accuracy in identifying valid sensor observations.
Contribution
It introduces a novel probabilistic model that combines temperature variation patterns with a fault detection mechanism for sensor data cleaning.
Findings
Model achieves precision and recall comparable to those of domain experts
Experiments validate the model's effectiveness on historical data
System is being deployed for real-time data cleaning
Abstract
Remote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors ranging from corrupt communications to partial or total sensor failures. This means that the raw data stream must be cleaned before it can be used by domain scientists. In our application environment, the H.J. Andrews Experimental Forest, this data cleaning is performed manually. This paper introduces a Dynamic Bayesian Network model for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15-minute time resolution. The model combines an accurate distribution of long-term and short-term temperature variations with a single…
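To make the idea of separating sensor failures from valid readings concrete, here is a minimal sketch of forward filtering over a hidden binary sensor state (working or broken). It is not the paper's model. The function names, the Gaussian observation likelihoods, the transition probabilities, and the `expected` baseline (standing in for a learned temperature model) are all assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_sensor_state(readings, expected,
                        std_working=1.0, std_broken=20.0,
                        p_stay_working=0.99, p_stay_broken=0.9,
                        prior_working=0.99):
    """Return P(sensor working | readings up to t) for each time step t.

    Hypothetical two-state filter: a working sensor reads close to the
    expected temperature (narrow Gaussian); a broken sensor can read almost
    anything (wide Gaussian). All parameter values are illustrative.
    """
    belief = prior_working
    posteriors = []
    for obs, mu in zip(readings, expected):
        # Predict: propagate the belief through the state transition model.
        pred_working = (belief * p_stay_working
                        + (1.0 - belief) * (1.0 - p_stay_broken))
        # Update: weight each state by how well it explains the reading.
        like_working = gaussian_pdf(obs, mu, std_working)
        like_broken = gaussian_pdf(obs, mu, std_broken)
        numerator = pred_working * like_working
        denominator = numerator + (1.0 - pred_working) * like_broken
        # If both likelihoods underflow to zero, keep the predicted belief.
        belief = numerator / denominator if denominator > 0.0 else pred_working
        posteriors.append(belief)
    return posteriors


def flag_anomalies(posteriors, threshold=0.5):
    """Mark a reading as anomalous when P(working) drops below threshold."""
    return [p < threshold for p in posteriors]
```

For example, with readings `[10.0, 10.5, 60.0, 11.0]` against an expected baseline of `[10.0, 10.4, 10.8, 11.2]`, only the third reading falls below the 0.5 threshold. The sticky transition probabilities let the belief recover once readings return to the expected range.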
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Anomaly Detection Techniques and Applications · Data Quality and Management
