Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery
Jianqing Fan, Runze Li

TL;DR
This paper reviews the statistical challenges of high-dimensional data in various fields and proposes a unified framework using penalized likelihood methods for variable selection and feature extraction.
Contribution
It provides a comprehensive overview of high-dimensional statistical challenges and introduces a unified penalized likelihood approach for variable selection across diverse applications.
Findings
Penalized likelihood methods effectively address high-dimensional variable selection.
Model estimation remains reliable when dimensionality is not excessively large.
Theoretical properties, such as persistence in risk minimization, are established.
Abstract
Technological innovations have revolutionized the process of scientific research and knowledge discovery. The availability of massive data and challenges from frontiers of research and development have reshaped statistical thinking, data analysis and theoretical studies. The challenges of high-dimensionality arise in diverse fields of sciences and the humanities, ranging from computational biology and health studies to financial engineering and risk management. In all of these fields, variable selection and feature extraction are crucial for knowledge discovery. We first give a comprehensive overview of statistical challenges with high dimensionality in these diverse disciplines. We then approach the problem of variable selection and feature extraction using a unified framework: penalized likelihood methods. Issues relevant to the choice of penalty functions are addressed. We…
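A minimal sketch of how a penalty function drives variable selection, under stated assumptions. It assumes a least-squares loss with an orthonormal design, in which case the penalized problem splits into one-dimensional problems of the form 0.5*(z - b)^2 + penalty(b), where z is the unpenalized estimate of a coefficient. The two penalties shown (L1 and a hard-thresholding penalty), the function names, and the numbers are illustrative choices, not taken from the summary above.

```python
# Illustrative sketch only: penalties, names, and data are assumptions
# made for this example, not the paper's own procedure.

def soft_threshold(z, lam):
    """Minimizer over b of 0.5*(z - b)**2 + lam*abs(b) (L1 penalty)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def hard_threshold(z, lam):
    """Minimizer over b of 0.5*(z - b)**2 + 0.5*lam**2*(b != 0)."""
    return z if abs(z) > lam else 0.0

def penalized_estimates(z_values, lam, rule):
    """Apply a thresholding rule to each unpenalized coefficient estimate."""
    return [rule(z, lam) for z in z_values]

# Hypothetical unpenalized estimates for five candidate variables.
ols = [3.0, -0.4, 0.2, -2.5, 0.9]
lam = 1.0

soft = penalized_estimates(ols, lam, soft_threshold)
hard = penalized_estimates(ols, lam, hard_threshold)

# Variables whose penalized coefficient is nonzero are the ones selected.
selected = [j for j, b in enumerate(soft) if b != 0.0]
```

Both rules set small coefficients exactly to zero, which is what makes the penalized fit a selection method. They differ in the retained coefficients: the L1 rule shrinks them by lam, while the hard-thresholding rule leaves them unchanged. This is one concrete sense in which the choice of penalty function matters.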
Taxonomy
Topics: Face and Expression Recognition · Statistical Methods and Inference · Gene expression and cancer classification
