Investigation of robustness and numerical stability of multiple regression and PCA in modeling world development data
Chen Ye Gan

TL;DR
This paper evaluates the robustness and numerical stability of multiple regression and PCA when applied to complex, high-dimensional world development data, revealing limitations but also some qualitative insights.
Contribution
It systematically assesses the numerical stability of regression and PCA on complex datasets, highlighting their limitations and potential for qualitative analysis.
Findings
Both methods show poor numerical stability.
Limited variance capture by PCA and regression.
Qualitative insights are still obtainable.
Abstract
Multiple regression and PCA are popular methods for modeling labelled and unlabelled data, respectively, and have been used in research on a vast number of datasets. In this investigation, we attempt to push the limits of these two methods by fitting them to world development data, a dataset notorious for its complexity and high dimensionality. We assess the robustness and numerical stability of both methods using their matrix condition number and their ability to capture variance in the dataset. The results indicate poor performance from both methods from a numerical standpoint, yet certain qualitative insights can still be captured.
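The two diagnostics named in the abstract, the matrix condition number and the share of variance captured, can be illustrated with a minimal sketch. This is not the paper's code: the data below are synthetic stand-ins (not world development data), and the helper names `condition_number`, `regression_r2`, and `pca_explained_variance_ratio` are hypothetical, chosen only for this example. It assumes NumPy is available.

```python
import numpy as np

def condition_number(X):
    """2-norm condition number: largest singular value over the smallest."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return s[0] / s[-1]

def regression_r2(X, y):
    """Fraction of the variance in y explained by an ordinary least-squares fit."""
    A = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # SVD-based least squares
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def pca_explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2                             # proportional to component variances
    return var / var.sum()

# Synthetic stand-in data (an assumption of this sketch, not the paper's dataset).
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=(n, 3))
# The fourth feature is almost a copy of the first, which makes X ill-conditioned.
X = np.column_stack([base, base[:, 0] + 1e-6 * rng.normal(size=n)])
y = base @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

cond = condition_number(X)
r2 = regression_r2(X, y)
evr = pca_explained_variance_ratio(X)

print(f"condition number: {cond:.3e}")
print(f"regression R^2:   {r2:.3f}")
print("PCA explained variance ratio:", np.round(evr, 4))
```

A large condition number signals that small perturbations in the data can produce large changes in the fitted coefficients, while a low R^2 or a flat explained-variance profile signals that the model captures little of the dataset's variance. Both diagnostics are computed from the singular values here so that they share one numerical basis.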
Taxonomy
Topics: Water Resources and Sustainability
