Statistical Industry Classification

Zura Kakushadze; Willie Yu

arXiv:1607.04883 · q-fin.PM · January 1, 2019

Open Access

TL;DR

This paper presents algorithms and source code for constructing multilevel statistical industry classifications, and for improving traditional fundamental classifications with data-driven clustering, aimed at practical use in quantitative trading.

Contribution

It introduces complete algorithms and source code for statistical industry classification, including hybrid methods that improve existing fundamental classifications.

Findings

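1. Backtests suggest that the choice of what to cluster makes a sizable difference to results.

2. The algorithms build multilevel classifications and fix both the number of clusters at each level and the number of levels.

3. Hybrid classifications improve on off-the-shelf fundamental classifications.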

Abstract

We give complete algorithms and source code for constructing (multilevel) statistical industry classifications, including methods for fixing the number of clusters at each level (and the number of levels). Under the hood there are clustering algorithms (e.g., k-means). However, what should we cluster? Correlations? Returns? The answer turns out to be neither and our backtests suggest that these details make a sizable difference. We also give an algorithm and source code for building "hybrid" industry classifications by improving off-the-shelf "fundamental" industry classifications by applying our statistical industry classification methods to them. The presentation is intended to be pedagogical and geared toward practical applications in quantitative trading.
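As a rough illustration of the idea in the abstract, the sketch below builds a two-level nested classification with plain k-means in Python. It is not the authors' source code. The abstract says that neither raw correlations nor raw returns are the right thing to cluster but does not say what is, so the normalization used here (demeaned returns divided by volatility) is only an assumed placeholder. The function names (`normalize`, `kmeans`, `multilevel`), the synthetic data, and the choice to form coarser levels by clustering the finer level's centers are all hypothetical.

```python
import random
from statistics import mean, pstdev


def normalize(series):
    """Demean a return series and divide by its volatility (assumed choice,
    not taken from the paper). Requires a non-constant series."""
    mu, sigma = mean(series), pstdev(series)
    return [(r - mu) / sigma for r in series]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means. Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [mean(col) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


def multilevel(returns, ks, seed=0):
    """Build nested levels. `ks` lists cluster counts from finest to coarsest.
    Each coarser level clusters the centers of the level below it (assumed)."""
    points = [normalize(r) for r in returns]
    group = list(range(len(points)))  # stock -> index of its current item
    levels = []
    for k in ks:
        labels, points = kmeans(points, k, seed=seed)
        group = [labels[g] for g in group]
        levels.append(group)
    return levels


# Synthetic example: 8 stocks driven by 2 hidden factors over 60 periods.
rng = random.Random(1)
factors = [[rng.gauss(0, 1) for _ in range(60)] for _ in range(2)]
returns = [[factors[i % 2][t] + 0.3 * rng.gauss(0, 1) for t in range(60)]
           for i in range(8)]
fine, coarse = multilevel(returns, ks=[4, 2])
print("fine level:  ", fine)
print("coarse level:", coarse)
```

Because each coarser level is obtained by clustering the centers of the finer one, every fine cluster sits inside exactly one coarse cluster, which is the nesting one expects of sub-industries within industries. The cluster counts `[4, 2]` are fixed by hand here; the paper describes methods for fixing them, which this sketch does not attempt.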

Taxonomy

Topics: Data Mining Algorithms and Applications · Complex Systems and Time Series Analysis · Advanced Statistical Methods and Models