A principle feature analysis
Tim Breitenbach, Lauritz Rasbach, Chunguang Liang, Patrick Jahnke

TL;DR
This paper introduces a framework for identifying and reducing features by detecting dependencies, leading to simpler, noise-reduced models that improve prediction accuracy and interpretability in data science applications.
Contribution
The work presents a novel method to detect linear and non-linear dependencies among features, enabling effective model reduction and improved modeling across various domains.
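A correlation coefficient alone does not detect non-linear dependencies. The minimal Python sketch below (not the authors' method) compares Pearson correlation with a plug-in mutual information estimate computed on binned data. The function name `binned_mutual_information`, the choice of 10 equal-width bins and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def binned_mutual_information(x, y, bins=10):
    """Plug-in estimate of mutual information (in nats) between two
    continuous variables after discretizing each into equal-width bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    mask = p_xy > 0  # skip empty cells, where p*log(p) contributes 0
    expected = p_x @ p_y  # outer product: joint under independence
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / expected[mask])))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
cases = {
    "linear": 2.0 * x + 0.05 * rng.normal(size=x.size),
    "quadratic": x ** 2 + 0.05 * rng.normal(size=x.size),
    "independent": rng.uniform(-1.0, 1.0, size=x.size),
}

results = {}
for name, y in cases.items():
    r = float(np.corrcoef(x, y)[0, 1])  # Pearson correlation
    mi = binned_mutual_information(x, y)
    results[name] = (r, mi)
    print(f"{name:12s} pearson={r:+.3f} mutual_info={mi:.3f}")
```

For the quadratic relation one would expect a Pearson coefficient close to zero. The mutual information should nevertheless stay clearly above the small positive bias seen for the independent pair. Binned estimates depend on the number of bins and the sample size, so in practice the bin choice matters.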
Findings
Reduced features from 2154 to 161 in data center classification, with improved accuracy.
Identified 9 key genes from 9513 in gene expression data to distinguish cell clusters.
Framework structures feature dependencies, aiding classical and machine learning models.
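One way to structure feature dependencies is as a graph: each feature is a node, and an edge joins two features judged dependent. The minimal Python sketch below assumes a boolean pairwise dependency matrix is already available from some test. It groups features into connected components with a breadth-first search. The function name `dependency_groups`, the matrix `D` and the grouping-by-components rule are illustrative assumptions, not the authors' procedure.

```python
from collections import deque

def dependency_groups(dependent):
    """Group feature indices into connected components of an undirected
    dependency graph. dependent[i][j] is True if features i and j were
    judged stochastically dependent by some pairwise test (assumed given)."""
    n = len(dependent)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search from an unvisited feature.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            i = queue.popleft()
            component.append(i)
            for j in range(n):
                if j != i and dependent[i][j] and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(component))
    return groups

# Hypothetical example: features 0-1-2 form a chain of dependencies,
# feature 3 is isolated, features 4 and 5 depend on each other.
D = [
    [False, True,  False, False, False, False],
    [True,  False, True,  False, False, False],
    [False, True,  False, False, False, False],
    [False, False, False, False, False, False],
    [False, False, False, False, False, True],
    [False, False, False, False, True,  False],
]
print(dependency_groups(D))  # [[0, 1, 2], [3], [4, 5]]
```

Grouping by connected components places features 0 and 2 together even though they are linked only through feature 1. A stricter alternative would look for cliques or test dependencies conditionally.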
Abstract
A key task of data science is to identify relevant features linked to certain output variables that are supposed to be modeled or predicted. To obtain a small but meaningful model, it is important to find stochastically independent variables that capture all the information necessary to model or predict the output variables sufficiently. Therefore, in this work we introduce a framework to detect linear and non-linear dependencies between different features. As we will show, features that are actually functions of other features do not carry additional information. Consequently, a model reduction that neglects such features conserves the relevant information, reduces noise and thus improves the quality of the model. Furthermore, a smaller model makes it easier to adopt a model of a given system. In addition, the approach structures dependencies within all the considered features. This…
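For discrete (or already discretized) data, the idea that a feature which is a function of other features adds no information has a direct test. A feature is redundant when each observed combination of the retained features co-occurs with exactly one of its values, so its conditional entropy given them is zero. The Python sketch below applies this check greedily, in column order. The functions `is_function_of` and `reduce_features` and the toy data are hypothetical, and this is not the paper's algorithm.

```python
def is_function_of(target, sources):
    """Return True if, in the observed data, each combination of values in
    `sources` co-occurs with exactly one value of `target`, i.e. target
    behaves like a deterministic function of sources (H(target|sources)=0).
    With no sources, this holds only for a constant target."""
    mapping = {}
    for row_index, value in enumerate(target):
        key = tuple(column[row_index] for column in sources)
        if mapping.setdefault(key, value) != value:
            return False
    return True

def reduce_features(columns):
    """Greedily keep a feature only if it is not a function of the features
    kept so far. `columns` maps feature names to equally long value lists."""
    kept = []
    for name, values in columns.items():
        if not is_function_of(values, [columns[k] for k in kept]):
            kept.append(name)
    return kept

# Hypothetical toy data with six observations.
columns = {
    "a": [0, 1, 2, 0, 1, 2],
    "b": [0, 0, 1, 1, 0, 1],
    "a_squared": [0, 1, 4, 0, 1, 4],  # function of a
    "a_plus_b": [0, 1, 3, 1, 1, 3],   # function of (a, b)
    "c": [1, 0, 0, 1, 1, 0],          # not determined by (a, b)
}
print(reduce_features(columns))  # ['a', 'b', 'c']
```

The greedy result depends on column order. The exact check also assumes noise-free data with enough samples. If almost every combination of retained values is unique, nearly any feature will look like a function of them and be dropped, so a practical version would need coarser binning or a statistical test.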
Taxonomy
Topics: Machine Learning and Data Classification · Gene Regulatory Network Analysis · Gene expression and cancer classification
