An Evaluation of Classification and Outlier Detection Algorithms
Victoria J. Hodge, Jim Austin

TL;DR
This paper compares the accuracy of six fast classification and outlier detection algorithms on temporal data, providing heuristics for selecting the best method based on task and data characteristics.
Contribution
It offers a systematic evaluation of rapid algorithms for classification and outlier detection in time-series data, with practical heuristics for algorithm choice.
Findings
Gradient Boosting Machines excel in classification.
No single algorithm is best for outlier detection, although Gradient Boosting Machines and Random Forest perform well.
Heuristics can guide algorithm selection based on data and task.
Abstract
This paper evaluates the accuracy of classification and outlier detection algorithms on temporal data. We focus on algorithms that train and classify rapidly and can be used for systems that need to incorporate new data regularly. Hence, we compare the accuracy of six fast algorithms using a range of well-known time-series datasets. The analyses demonstrate that the choice of algorithm is task- and data-specific, but that we can derive heuristics for choosing. Gradient Boosting Machines are generally best for classification. There is no single winner for outlier detection, although Gradient Boosting Machines (again) and Random Forest perform better than the other algorithms. Hence, we recommend running evaluations of a number of algorithms using our heuristics.
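The recommendation to run evaluations of several candidate algorithms can be illustrated with a small comparison harness. The sketch below is not the authors' experimental setup: the synthetic two-class series, the two stand-in classifiers (nearest centroid and 1-nearest-neighbour), and every function name are assumptions made for illustration, chosen so the example runs with the Python standard library alone. In practice the candidates would be the fast algorithms of interest, such as a Gradient Boosting Machine or Random Forest implementation from a machine-learning library.

```python
import random
from statistics import mean


def make_windows(n_per_class=60, length=20, seed=0):
    """Hypothetical toy data: noisy flat series (label 0) vs. noisy upward trends (label 1)."""
    rng = random.Random(seed)
    data = []
    for label, slope in ((0, 0.0), (1, 0.15)):
        for _ in range(n_per_class):
            series = [slope * t + rng.gauss(0, 1.0) for t in range(length)]
            data.append((series, label))
    rng.shuffle(data)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class NearestCentroid:
    """Stand-in classifier: assign each series to the closest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, l in zip(X, y) if l == label]
            self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
                for x in X]


class OneNearestNeighbour:
    """Stand-in classifier: copy the label of the closest training series."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        return [self.y[min(range(len(self.X)), key=lambda i: sq_dist(x, self.X[i]))]
                for x in X]


def cross_val_accuracy(make_model, data, k=5):
    """Mean accuracy over k folds; a fresh model is trained for each fold."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = make_model().fit([x for x, _ in train], [y for _, y in train])
        preds = model.predict([x for x, _ in test])
        scores.append(mean(1.0 if p == y else 0.0 for p, (_, y) in zip(preds, test)))
    return mean(scores)


def compare(candidates, data, k=5):
    """Return (name, accuracy) pairs sorted from best to worst."""
    results = {name: cross_val_accuracy(factory, data, k)
               for name, factory in candidates.items()}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    candidates = {"nearest_centroid": NearestCentroid, "1nn": OneNearestNeighbour}
    for name, acc in compare(candidates, make_windows()):
        print(f"{name}: {acc:.3f}")
```

Any object exposing `fit` and `predict` can be added to the `candidates` dictionary, so the same loop can rank a larger pool of algorithms on each dataset and task before one is chosen.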
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems · Artificial Immune Systems Applications
