Feature Selection: A Data Perspective
Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, Huan Liu

TL;DR
This paper provides a comprehensive overview of recent advances in feature selection, emphasizing data perspectives and categorizing algorithms for various data types, with an open-source repository for evaluation.
Contribution
It offers a structured survey of feature selection algorithms across different data types and introduces an open-source repository for benchmarking and evaluation.
Findings
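Categorizes feature selection methods for conventional data into four main groups: similarity-based, information-theoretical-based, sparse-learning-based, and statistical-based.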
Reviews algorithms for conventional, structured, heterogeneous, and streaming data.
Provides an open-source repository for algorithm evaluation.
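As a rough illustration of what a filter-style feature selection step looks like on conventional data, the sketch below ranks features by the Fisher score, a well-known criterion that favors features whose class means are far apart relative to their within-class variance. It is a minimal pure-Python sketch and is not taken from the paper's repository; the function names `fisher_scores` and `select_top_k` and the toy data are assumptions made for this example.

```python
from statistics import fmean


def fisher_scores(X, y):
    """Return one Fisher score per feature (higher = more discriminative).

    X: list of samples, each a list of numeric feature values.
    y: list of class labels, same length as X.
    Hypothetical helper for illustration only.
    """
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted within-class variance
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            n_c = len(values)
            mean_c = fmean(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        if within == 0.0:
            # Constant within every class: perfectly separable if the class
            # means differ, uninformative if the feature is constant overall.
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (made up): feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0],
     [3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
y = [0, 0, 0, 1, 1, 1]

print(select_top_k(X, y, 1))
```

On this toy data the first feature ranks highest, because its class means differ while the second feature has the same mean in both classes. Each feature is scored independently, so a criterion like this is cheap on high-dimensional data but does not account for redundancy between features.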
Abstract
Feature selection, as a data preprocessing strategy, has been proven to be effective and efficient in preparing data (especially high-dimensional data) for various data mining and machine learning problems. The objectives of feature selection include: building simpler and more comprehensible models, improving data mining performance, and preparing clean, understandable data. The recent proliferation of big data has presented some substantial challenges and opportunities to feature selection. In this survey, we provide a comprehensive and structured overview of recent advances in feature selection research. Motivated by current challenges and opportunities in the era of big data, we revisit feature selection research from a data perspective and review representative feature selection algorithms for conventional data, structured data, heterogeneous data and streaming data.…
Taxonomy
Topics: Gene expression and cancer classification · Face and Expression Recognition · Machine Learning and Data Classification
