Online Feature Screening for Data Streams with Concept Drift

Mingyuan Wang; Adrian Barbu

arXiv:2104.02883·stat.ML·April 8, 2021

TL;DR

This paper introduces online feature screening methods for high-dimensional streaming data with sparsity and concept drift, and reports faster processing than offline screening and better detection of true features on classification tasks.

Contribution

The proposed online screening methods handle streaming data with concept drift and produce the same feature importance as their offline versions, with faster speed and less storage.

Findings

01. The online methods produce the same feature importance as their offline versions, at faster speed.

02. Online screening combined with model adaptation detects true features better.

03. Advantages include reduced computation and storage, and improved accuracy.

Abstract

Screening methods for feature selection are often used as a preprocessing step to reduce the number of variables before the training step. Traditional screening methods focus only on complete high-dimensional datasets. Modern datasets not only have higher dimension and larger sample size, but also exhibit properties such as streaming input, sparsity, and concept drift. Consequently, a considerable number of online feature selection methods have been introduced in recent years to handle these kinds of problems. Online screening methods are one category of online feature selection methods. The methods proposed in this research are capable of handling all three situations mentioned above. Our study focuses on classification datasets. Our experiments show that the proposed methods can generate the same feature importance as their offline versions with faster speed and less storage…
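
To make the claim that an online screener can reproduce offline feature importance more concrete, here is a minimal sketch. It rests on several assumptions that do not come from the abstract: the screening statistic is a two-class t-score, the stream arrives in mini-batches, and concept drift is handled by a simple exponential forgetting factor. The names `OnlineTScoreScreener`, `offline_t_scores`, `make_stream`, and the `decay` parameter are hypothetical, and this is not presented as the authors' implementation. It only illustrates that a statistic built from per-class counts, sums, and sums of squares can be updated one batch at a time without storing past data.

```python
import math
import random


class OnlineTScoreScreener:
    """Running per-class sums for a two-class t-score (illustrative only)."""

    def __init__(self, n_features, decay=1.0):
        self.p = n_features
        self.decay = decay  # assumption: 1.0 = no forgetting, <1.0 down-weights old batches
        # Per class (0 and 1): weighted count, sum of x, sum of x^2.
        self.n = [0.0, 0.0]
        self.s1 = [[0.0] * n_features for _ in range(2)]
        self.s2 = [[0.0] * n_features for _ in range(2)]

    def update(self, X, y):
        """Fold one mini-batch (list of rows, list of 0/1 labels) into the sums."""
        if self.decay != 1.0:
            for c in range(2):
                self.n[c] *= self.decay
                self.s1[c] = [v * self.decay for v in self.s1[c]]
                self.s2[c] = [v * self.decay for v in self.s2[c]]
        for row, label in zip(X, y):
            self.n[label] += 1.0
            s1, s2 = self.s1[label], self.s2[label]
            for j, v in enumerate(row):
                s1[j] += v
                s2[j] += v * v

    def scores(self):
        """Absolute t-score per feature, computed from the running sums only."""
        if min(self.n) == 0.0:
            raise ValueError("both classes must be observed before scoring")
        out = []
        for j in range(self.p):
            mean, var = [], []
            for c in range(2):
                m = self.s1[c][j] / self.n[c]
                mean.append(m)
                # Population variance, clipped at 0 to absorb round-off.
                var.append(max(self.s2[c][j] / self.n[c] - m * m, 0.0))
            denom = math.sqrt(var[0] / self.n[0] + var[1] / self.n[1])
            out.append(abs(mean[0] - mean[1]) / denom if denom > 0 else 0.0)
        return out

    def top_k(self, k):
        """Indices of the k highest-scoring features."""
        s = self.scores()
        return sorted(range(self.p), key=lambda j: -s[j])[:k]


def offline_t_scores(X, y):
    """Two-pass reference computation over the complete dataset."""
    out = []
    for j in range(len(X[0])):
        groups = [[r[j] for r, l in zip(X, y) if l == c] for c in (0, 1)]
        means = [sum(g) / len(g) for g in groups]
        vars_ = [sum((v - m) ** 2 for v in g) / len(g) for g, m in zip(groups, means)]
        denom = math.sqrt(vars_[0] / len(groups[0]) + vars_[1] / len(groups[1]))
        out.append(abs(means[0] - means[1]) / denom if denom > 0 else 0.0)
    return out


def make_stream(n, p, informative, seed=0):
    """Synthetic two-class data; features in `informative` shift with the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in informative:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y
```

With `decay=1.0`, feeding the data in batches and calling `scores()` gives the same values as `offline_t_scores` on the full dataset, up to floating-point round-off, while storing only three numbers per feature per class. Setting `decay` below 1.0 is one generic way to let the ranking follow a drifting stream. Whether the paper uses this mechanism is not stated in the abstract.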

Taxonomy

Methods: Feature Selection