Unsupervised Feature Construction for Improving Data Representation and Semantics
Marian-Andrei Rizoiu, Julien Velcin, Stéphane Lallich

TL;DR
This paper introduces two unsupervised algorithms that construct human-understandable features as combinations of the initial features. The resulting feature sets are less correlated and capture hidden relations in the dataset, which improves the data representation.
Contribution
It presents novel unsupervised methods for creating interpretable feature conjunctions that enhance data description and reveal hidden relations, balancing informativeness and complexity.
Findings
Generated feature sets have lower correlations.
Features capture hidden relations in data.
Approaches work across multiple datasets.
Abstract
Feature-based format is the main data representation format used by machine learning algorithms. When the features do not properly describe the initial data, performance starts to degrade. Some algorithms address this problem by internally changing the representation space, but the newly-constructed features are rarely comprehensible. We seek to construct, in an unsupervised way, new features that are more appropriate for describing a given dataset and, at the same time, comprehensible for a human user. We propose two algorithms that construct the new features as conjunctions of the initial primitive features or their negations. The generated feature sets have reduced correlations between features and succeed in catching some of the hidden relations between individuals in a dataset. For example, a feature like would be true for non-urban images…
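The idea of building new features as conjunctions of primitive features or their negations can be illustrated with a small sketch. This is not the authors' algorithm: it is a minimal illustration that assumes binary features, uses the Pearson correlation to pick the most correlated pair, and replaces that pair with three conjunctions. The function names (`pearson`, `most_correlated_pair`, `combine_pair`) and the toy image-label data are hypothetical and introduced only for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length 0/1 columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def most_correlated_pair(features):
    """Return (name_a, name_b, r) for the pair with the largest absolute correlation."""
    best = None
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if best is None or abs(r) > abs(best[2]):
            best = (a, b, r)
    return best


def combine_pair(features, a, b):
    """Replace features a and b with the conjunctions a AND b, a AND NOT b, NOT a AND b."""
    fa, fb = features[a], features[b]
    out = {k: v for k, v in features.items() if k not in (a, b)}
    out[f"{a} AND {b}"] = [int(x and y) for x, y in zip(fa, fb)]
    out[f"{a} AND NOT {b}"] = [int(x and not y) for x, y in zip(fa, fb)]
    out[f"NOT {a} AND {b}"] = [int((not x) and y) for x, y in zip(fa, fb)]
    return out


# Toy data (invented for illustration): one row per image, one column per label.
data = {
    "sky":      [1, 1, 1, 0, 0, 1],
    "building": [0, 0, 1, 1, 1, 0],
    "water":    [1, 0, 0, 0, 1, 1],
}

a, b, r = most_correlated_pair(data)
new_features = combine_pair(data, a, b)
for name, column in new_features.items():
    print(name, column)
```

On this toy data the pair `sky` and `building` is the most strongly (negatively) correlated, so it is replaced by three conjunctions that remain readable as logical expressions. A full method would repeat this step and would need a stopping rule that trades informativeness against feature complexity, which this sketch leaves out.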
