Large-scale Online Feature Selection for Ultra-high Dimensional Sparse Data
Yue Wu, Steven C. H. Hoi, Tao Mei, Nenghai Yu

TL;DR
This paper introduces a second-order online feature selection method that is highly scalable and efficient for ultra-high dimensional sparse data streams, significantly outperforming existing approaches in speed and efficacy.
Contribution
The paper presents a novel second-order online feature selection algorithm using a MaxHeap approach, improving efficiency and scalability over existing methods for large-scale sparse data.
Findings
Successfully processed a 1-billion-dimensional dataset in 8 minutes
Outperformed traditional batch methods in speed by orders of magnitude
Demonstrated effectiveness on synthetic datasets with extreme sparsity
Abstract
Feature selection with large-scale high-dimensional data is important yet very challenging in machine learning and data mining. Online feature selection is a promising new paradigm that is more efficient and scalable than batch feature selection methods, but the existing online approaches usually fall short in their inferior efficacy as compared with batch approaches. In this paper, we present a novel second-order online feature selection scheme that is simple yet effective, very fast and extremely scalable to deal with large-scale ultra-high dimensional sparse data streams. The basic idea is to improve the existing first-order online feature selection methods by exploiting second-order information for choosing the subset of important features with high confidence weights. However, unlike many second-order learning methods that often suffer from extra high computational cost, we devise a…
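The idea in the abstract (keep only the features whose weights the learner is most confident about) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' algorithm: the abstract is truncated before it describes their construction, so everything below is an assumption. It assumes a diagonal confidence-weighted (AROW-style) update, where each feature carries a variance `sigma[i]` that shrinks as the feature is observed, and it keeps the `budget` features with the smallest variance. The class name `SecondOrderOFS`, the parameter `r`, and the use of `heapq.nsmallest` in place of an incrementally maintained max-heap are all hypothetical choices made for brevity.

```python
import heapq


class SecondOrderOFS:
    """Sketch: online feature selection with per-feature confidence.

    Assumption: diagonal AROW-style updates; low variance = high confidence.
    """

    def __init__(self, budget, r=1.0):
        self.budget = budget  # maximum number of nonzero weights to keep
        self.r = r            # regularization constant (hypothetical default)
        self.w = {}           # sparse weights: feature index -> weight
        self.sigma = {}       # per-feature variance; unseen features count as 1.0

    def predict(self, x):
        # x is a sparse example: {feature index: value}
        score = sum(self.w.get(i, 0.0) * v for i, v in x.items())
        return 1 if score >= 0 else -1

    def update(self, x, y):
        # y is the label in {-1, +1}
        margin = y * sum(self.w.get(i, 0.0) * v for i, v in x.items())
        if margin >= 1.0:
            return  # no hinge loss, so nothing to update
        conf = sum(self.sigma.get(i, 1.0) * v * v for i, v in x.items())
        beta = 1.0 / (conf + self.r)
        alpha = (1.0 - margin) * beta
        for i, v in x.items():
            s = self.sigma.get(i, 1.0)
            self.w[i] = self.w.get(i, 0.0) + alpha * s * y * v
            self.sigma[i] = s - beta * s * s * v * v  # variance only shrinks
        self._truncate()

    def _truncate(self):
        # Keep the `budget` weighted features with the smallest variance.
        if len(self.w) <= self.budget:
            return
        keep = set(heapq.nsmallest(self.budget, self.w, key=lambda i: self.sigma[i]))
        for i in list(self.w):
            if i not in keep:
                del self.w[i]
```

Each update touches only the nonzero entries of the example, which is what makes this style of method attractive for sparse streams. The truncation step here rescans all currently weighted features; a heap maintained across updates would avoid that rescan, at the cost of more bookkeeping than this sketch shows.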
Taxonomy
Topics: Face and Expression Recognition · Machine Learning in Bioinformatics · Gene expression and cancer classification
