tsrobprep - an R package for robust preprocessing of time series data
Michał Narajewski, Jens Kley-Holsteg, Florian Ziel

TL;DR
The tsrobprep R package offers robust, efficient methods for cleaning and preprocessing time series data, including missing value imputation and outlier detection, tailored for energy system datasets.
Contribution
It introduces model-based imputation and clustering-based outlier detection methods that are robust, tunable, and easy to apply in a single step.
Findings
Effective handling of missing data and outliers in energy time series.
Probabilistic outlier detection with cause attribution.
Single-function preprocessing simplifies data cleaning.
Abstract
Data cleaning is a crucial part of every data analysis exercise. Yet, the currently available R packages do not provide fast and robust methods for cleaning and preparation of time series data. The open source package tsrobprep introduces efficient methods for handling missing values and outliers using model-based approaches. For data imputation, a probabilistic replacement model is proposed, which may consist of autoregressive components and external inputs. For outlier detection, a clustering algorithm based on finite mixture modelling is introduced, which considers time series properties in terms of the gradient and the underlying seasonality as features. The procedure returns a probability that each observation is an outlier, as well as a specific cause for an outlier assignment in terms of the provided feature space. The methods are robust and fully tunable.…
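To make the idea of an autoregressive replacement model concrete, here is a minimal sketch in plain Python. It is not the tsrobprep API and not the authors' method. It assumes a single seasonal lag, and it uses ordinary least squares as a simple stand-in for the probabilistic model described in the abstract. The names `fit_seasonal_ar` and `impute_missing` and the toy series are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_seasonal_ar(y, season):
    """Fit y[t] = a + b * y[t - season] by ordinary least squares.

    Only pairs where both values are observed (not None) are used.
    OLS is an assumption of this sketch, not the model from the paper.
    """
    pairs = [
        (y[t - season], y[t])
        for t in range(season, len(y))
        if y[t] is not None and y[t - season] is not None
    ]
    if not pairs:
        raise ValueError("no fully observed (lagged, current) pairs to fit on")
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in pairs)
    b = sxy / sxx if sxx > 0 else 0.0  # constant lagged values: use the mean only
    a = y_bar - b * x_bar
    return a, b


def impute_missing(y, season):
    """Return a copy of y with None entries replaced by model predictions.

    Values are filled in time order, so a prediction may build on an
    earlier imputed value. Entries without a usable seasonal lag fall
    back to the mean of the observed values.
    """
    a, b = fit_seasonal_ar(y, season)
    fallback = mean(v for v in y if v is not None)
    filled = list(y)
    for t, value in enumerate(filled):
        if value is None:
            if t >= season and filled[t - season] is not None:
                filled[t] = a + b * filled[t - season]
            else:
                filled[t] = fallback
    return filled


# Toy series with period 4 and one missing observation at index 6.
series = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, None, 4.0, 1.0, 2.0, 3.0, 4.0]
filled = impute_missing(series, season=4)
print(filled[6])
```

In this sketch the gap is filled from the value one season earlier, scaled by the fitted coefficients. A probabilistic replacement model as described in the abstract would additionally quantify uncertainty and could include external inputs, both of which are left out here.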
