An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning
Matthew L. Weiss, Joseph McDonald, David Bestor, Charles Yee, Daniel Edelman, Michael Jones, Andrew Prout, Andrew Bowne, Lindsey McEvoy, Vijay Gadepally, Siddharth Samsi

TL;DR
This paper evaluates low overhead preprocessing techniques for multi-channel time series data of varying lengths (the alignment problem) and reports classification accuracy above 95% for downstream machine learning on the aligned data.
Contribution
It introduces three low overhead preprocessing methods for aligning multi-channel time series data, achieving over 95% classification accuracy and outperforming previous approaches.
Findings
Achieved over 95% classification accuracy.
Outperformed previous methods by 5%.
Proposed low overhead alignment techniques.
Abstract
In this paper we address the application of pre-processing techniques to multi-channel time series data with varying lengths, which we refer to as the alignment problem, for downstream machine learning. The misalignment of multi-channel time series data may occur for a variety of reasons, such as missing data, varying sampling rates, or inconsistent collection times. We consider multi-channel time series data collected from the MIT SuperCloud High Performance Computing (HPC) center, where different job start times and varying run times of HPC jobs result in misaligned data. This misalignment makes it challenging to build AI/ML approaches for tasks such as compute workload classification. Building on previous supervised classification work with the MIT SuperCloud Dataset, we address the alignment problem via three broad, low overhead approaches: sampling a fixed subset from a full time…
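A minimal sketch of the one approach the abstract names before it is cut off, sampling a fixed subset from a full time series, may help make the alignment problem concrete. Everything here is an illustrative assumption rather than the authors' implementation: the helper name `sample_fixed_window`, the list-of-rows layout (time steps by channels), the choice of a contiguous window, and zero-padding at the end for series shorter than the target length.

```python
import random
from typing import List, Optional

Series = List[List[float]]  # rows are time steps, columns are channels


def sample_fixed_window(
    series: Series,
    length: int,
    offset: Optional[int] = None,
    pad_value: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Series:
    """Return exactly `length` time steps from `series` (hypothetical helper).

    Longer series: take a contiguous window starting at `offset`
    (a random valid offset if none is given).
    Shorter series: keep every step and pad the end with `pad_value`
    (an assumption made for this sketch); `offset` is ignored.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not series:
        raise ValueError("series must contain at least one time step")

    n_steps = len(series)
    n_channels = len(series[0])

    if n_steps >= length:
        max_offset = n_steps - length
        if offset is None:
            offset = (rng or random.Random()).randint(0, max_offset)
        if not 0 <= offset <= max_offset:
            raise ValueError("offset out of range for this series")
        return [row[:] for row in series[offset:offset + length]]

    padding = [[pad_value] * n_channels for _ in range(length - n_steps)]
    return [row[:] for row in series] + padding


# Two toy "jobs" with different run times and two channels each.
job_a = [[float(t), float(t) * 10.0] for t in range(8)]
job_b = [[float(t), float(t) * 10.0] for t in range(3)]

# After sampling, both jobs have the same shape: 5 time steps, 2 channels.
aligned = [
    sample_fixed_window(job_a, 5, offset=0),
    sample_fixed_window(job_b, 5, offset=0),
]
```

Under these assumptions, every job ends up with the same number of time steps regardless of its run time, so the results can be stacked into one fixed-shape batch for a classifier. The window position and the padding rule are design choices that a real pipeline would need to tune.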
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
