# catch22: CAnonical Time-series CHaracteristics

**Authors:** Carl H Lubba, Sarab S Sethi, Philip Knaute, Simon R Schultz, Ben D Fulcher, and Nick S Jones

arXiv: 1901.10200 · 2019-02-05

## TL;DR

The paper introduces catch22, a compact set of 22 interpretable time-series features selected from a filtered hctsa library of 4791 features. The reduced set cuts computation time roughly 1000-fold at an average cost of about 7% in classification accuracy.

## Contribution

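The paper presents a method for inferring a small feature set from a large library such that the selected features (i) perform strongly across a given collection of time-series classification problems and (ii) are minimally redundant, enabling efficient feature-based time-series classification.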
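The two selection criteria can be illustrated with a generic greedy filter. This is a minimal sketch and not the authors' pipeline, which this summary does not describe in detail. It assumes each candidate feature already has a performance score (for example, mean accuracy across datasets) and a vector of values over a common set of time series. The names `select_features`, `scores`, `values`, and `max_abs_corr` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_features(scores, values, max_features, max_abs_corr=0.8):
    """Greedy filter: visit features from best to worst score and keep a
    feature only if it is not strongly correlated with any feature kept so far.

    scores: dict mapping feature name -> performance score (assumed given)
    values: dict mapping feature name -> list of feature values over the
            same time series, used to measure redundancy
    """
    selected = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if len(selected) >= max_features:
            break
        redundant = any(
            abs(pearson(values[name], values[kept])) > max_abs_corr
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy example: "b" scores well but nearly duplicates "a", so it is skipped.
toy_scores = {"a": 0.90, "b": 0.85, "c": 0.70}
toy_values = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.1],
    "c": [1.0, -1.0, 1.0, -1.0],
}
chosen = select_features(toy_scores, toy_values, max_features=2)
```

The correlation threshold trades off the two criteria: a lower value enforces less redundancy but may discard high-scoring features.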

## Key findings

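- catch22 reduces the feature set from 4791 features (a filtered version of the hctsa library) to 22.
- The reduction is associated with an approximately 1000-fold decrease in computation time and near-linear scaling with time-series length, at an average reduction in classification accuracy of 7%.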
- Applicable across diverse scientific and industrial domains.

## Abstract

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
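To make the idea of a feature-based representation concrete, the sketch below maps a time series to a short vector of interpretable numbers. The three features are illustrative stand-ins for the property families named in the abstract (linear autocorrelation, successive differences, and outliers). They are not the catch22 definitions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(x):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    m, s = mean(x), pstdev(x)
    if s == 0:
        raise ValueError("constant series cannot be z-scored")
    return [(v - m) / s for v in x]


def lag1_autocorr(z):
    """Lag-1 autocorrelation of a z-scored series (biased estimator)."""
    return sum(a * b for a, b in zip(z, z[1:])) / len(z)


def mean_abs_diff(z):
    """Mean absolute difference between successive values."""
    return sum(abs(b - a) for a, b in zip(z, z[1:])) / (len(z) - 1)


def frac_beyond(z, k=2.0):
    """Fraction of points more than k standard deviations from the mean."""
    return sum(1 for v in z if abs(v) > k) / len(z)


def feature_vector(x):
    """Map a raw time series to a 3-element illustrative feature vector."""
    z = zscore(x)
    return [lag1_autocorr(z), mean_abs_diff(z), frac_beyond(z)]


# A strictly alternating series: strongly anti-correlated at lag 1,
# large successive differences, and no outliers.
alternating = [1, -1] * 10
fv = feature_vector(alternating)
```

Each feature here is a single pass over the series, so the cost of this toy vector grows linearly with series length. Series of different lengths map to vectors of the same size, which is what lets a standard classifier or clustering method operate on them.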

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.10200/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1901.10200/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1901.10200/full.md

---
Source: https://tomesphere.com/paper/1901.10200