High-Dimensional Learning in Finance
Hasan Fallahgoul

TL;DR
This paper investigates the theoretical limits of high-dimensional machine learning models in finance. It establishes lower bounds on learnability and argues that observed out-of-sample successes likely stem from low-complexity artifacts rather than true high-dimensional learning.
Contribution
It provides new theoretical insights into the limitations of high-dimensional models in finance, including kernel approximation effects and lower bounds on learnability.
Findings
Within-sample standardization in Random Fourier Features alters the Gaussian kernel approximation, replacing the shift-invariant kernel with a training-set dependent one.
Lower bounds show practical sample sizes are insufficient for reliable high-dimensional learning.
Observed out-of-sample success likely results from low-complexity artifacts.
Abstract
Recent advances in machine learning have shown promising results for financial prediction using large, over-parameterized models. This paper provides theoretical foundations and empirical validation for understanding when and how these methods achieve predictive success. I examine two key aspects of high-dimensional learning in finance. First, I prove that within-sample standardization in Random Fourier Features implementations fundamentally alters the underlying Gaussian kernel approximation, replacing shift-invariant kernels with training-set dependent alternatives. Second, I establish information-theoretic lower bounds that identify when reliable learning is impossible no matter how sophisticated the estimator. A detailed quantitative calibration of the polynomial lower bound shows that with typical parameter choices, e.g., 12,000 features, 12 monthly observations, and R-square 2-3%,…
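The first claim in the abstract can be illustrated numerically. The sketch below is not the paper's implementation: it assumes paired cosine/sine Random Fourier Features and assumes that "within-sample standardization" means dividing each feature by its training-sample standard deviation. The function names (`rff`, `plain_kernel`, `standardized_kernel`), the input dimension, and the feature count are hypothetical choices made for illustration; only the training-sample size of 12 echoes the abstract.

```python
import numpy as np

def rff(X, W):
    """Map rows of X to paired cos/sin random Fourier features."""
    proj = X @ W  # shape (n_samples, n_frequencies)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[1])

def plain_kernel(x, y, W):
    """Kernel estimate from unstandardized features; depends only on x - y."""
    return float(rff(x[None, :], W)[0] @ rff(y[None, :], W)[0])

def standardized_kernel(x, y, W, X_train):
    """Kernel estimate after dividing each feature by its training-sample
    standard deviation (an assumed form of within-sample standardization)."""
    scale = rff(X_train, W).std(axis=0)
    zx = rff(x[None, :], W)[0] / scale
    zy = rff(y[None, :], W)[0] / scale
    return float(zx @ zy)

rng = np.random.default_rng(0)
d, n_freq = 5, 2000                      # illustrative sizes, not the paper's
W = rng.normal(size=(d, n_freq))         # Gaussian frequencies -> Gaussian kernel
x, y, shift = rng.normal(size=(3, d))

train_a = rng.normal(size=(12, d))       # 12 observations, as in the abstract
train_b = 3.0 * rng.normal(size=(12, d)) # a second, differently scaled sample

# Unstandardized features: shifting both points leaves the estimate unchanged.
plain_xy = plain_kernel(x, y, W)
plain_shifted = plain_kernel(x + shift, y + shift, W)

# Standardized features: the estimate moves with the shift and the training set.
std_a = standardized_kernel(x, y, W, train_a)
std_a_shifted = standardized_kernel(x + shift, y + shift, W, train_a)
std_b = standardized_kernel(x, y, W, train_b)

print("plain:       ", plain_xy, plain_shifted)
print("standardized:", std_a, std_a_shifted, std_b)
```

In this sketch the plain estimate is identical for the original and shifted pairs, because cos(a)cos(b) + sin(a)sin(b) = cos(a − b) depends only on the difference of the inputs. Once each feature is rescaled by a training-sample statistic, that identity no longer applies, so the standardized estimate varies with both the shift and the choice of training set.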
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Stock Market Forecasting Methods · Generative Adversarial Networks and Image Synthesis
