Missing Values Handling for Machine Learning Portfolios

Andrew Y. Chen; Jack McCoy

arXiv:2207.13071 · stat.ME · January 15, 2024 · 20 cites

TL;DR

This paper investigates the structure of missing data in financial return predictors and finds that simple cross-sectional mean imputation performs well compared with more complex methods, because missingness occurs in large blocks and cross-sectional correlations are small.

Contribution

It characterizes the origins of missingness in financial predictors and evaluates the effectiveness of different missing value handling techniques in machine learning portfolios.

Findings

01 Simple mean imputation performs well compared to EM methods.

02 Missingness occurs in large blocks organized by time and source.

03 Sophisticated imputations can introduce noise and reduce performance.
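
A short sketch of why small correlations limit what a sophisticated imputation can add. It assumes, for illustration only (this page does not state the model), that predictors are jointly Gaussian with mean $\mu$ and covariance $\Sigma$, partitioned into observed ($o$) and missing ($m$) entries:

```latex
\mathbb{E}[x_m \mid x_o] = \mu_m + \Sigma_{mo}\,\Sigma_{oo}^{-1}\,(x_o - \mu_o),
\qquad
\Sigma_{mo} \approx 0 \;\Longrightarrow\; \mathbb{E}[x_m \mid x_o] \approx \mu_m .
```

Under this assumption, when the covariances between missing and observed predictors are close to zero, the conditional expectation collapses toward the mean. A model-based imputation then adds little beyond mean imputation, yet it still requires estimating $\Sigma$, which is one possible source of the estimation noise mentioned above.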

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
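
Cross-sectional mean imputation, as described in the abstract, can be sketched in a few lines of pandas. This is a minimal illustration and not the authors' code: the function name, the long-format layout (one row per stock and date), and the column names `date`, `bm`, and `mom` are all hypothetical.

```python
import numpy as np
import pandas as pd


def impute_cross_sectional_mean(panel, date_col, predictor_cols):
    """Fill each missing predictor value with that predictor's mean
    across all stocks observed on the same date (hypothetical helper)."""
    out = panel.copy()
    # Per-date mean of each predictor; NaNs are skipped when averaging.
    means = out.groupby(date_col)[predictor_cols].transform("mean")
    # fillna aligns on index and columns, so only missing cells change.
    out[predictor_cols] = out[predictor_cols].fillna(means)
    return out


# Toy panel: five stock-date rows, two predictors (made-up values).
panel = pd.DataFrame(
    {
        "date": ["2020-01", "2020-01", "2020-01", "2020-02", "2020-02"],
        "bm": [1.0, np.nan, 3.0, 2.0, np.nan],
        "mom": [np.nan, 0.5, 1.5, np.nan, np.nan],
    }
)
filled = impute_cross_sectional_mean(panel, "date", ["bm", "mom"])
print(filled)
```

In this toy panel `mom` is missing for every stock in `2020-02`, so no cross-sectional mean exists for that date and those cells stay missing. This is the block-by-time pattern the abstract describes, and a real pipeline would need a separate rule for such blocks.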

Code & Models

Repositories

jack-mccoy/missing_data
tfOfficial

Taxonomy

Topics: Financial Markets and Investment Strategies · Forecasting Techniques and Applications · Financial Risk and Volatility Modeling