Unsupervised Feature Construction for Improving Data Representation and Semantics

Marian-Andrei Rizoiu; Julien Velcin; Stéphane Lallich

arXiv:1512.05467·cs.AI·December 18, 2015

TL;DR

This paper introduces two unsupervised algorithms for constructing human-understandable, more informative features by combining initial features, reducing correlations, and capturing hidden relations in datasets, thereby improving data representation.

Contribution

It presents novel unsupervised methods for creating interpretable feature conjunctions that enhance data description and reveal hidden relations, balancing informativeness and complexity.

Findings

01

Generated feature sets have lower correlations.

02

Features capture hidden relations in data.

03

Approaches work across multiple datasets.

Abstract

Feature-based format is the main data representation format used by machine learning algorithms. When the features do not properly describe the initial data, performance starts to degrade. Some algorithms address this problem by internally changing the representation space, but the newly-constructed features are rarely comprehensible. We seek to construct, in an unsupervised way, new features that are more appropriate for describing a given dataset and, at the same time, comprehensible for a human user. We propose two algorithms that construct the new features as conjunctions of the initial primitive features or their negations. The generated feature sets have reduced correlations between features and succeed in catching some of the hidden relations between individuals in a dataset. For example, a feature like $sky \land \neg building \land panorama$ would be true for non-urban images…
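To make the idea of a constructed feature concrete, the sketch below evaluates a conjunction of boolean primitive features and their negations, mirroring the sky / building / panorama example from the abstract. It is only an illustration of what such a feature computes, not the authors' construction algorithms: the toy dataset, the `conjunction` helper, and the `(name, negated)` literal encoding are all assumptions introduced here.

```python
import numpy as np

# Hypothetical toy dataset: each row is an image, each column a boolean primitive feature.
feature_names = ["sky", "building", "panorama"]
X = np.array([
    [1, 0, 1],  # open landscape
    [1, 1, 1],  # city skyline
    [1, 0, 0],  # close-up that happens to include sky
    [0, 1, 0],  # street scene
], dtype=bool)


def conjunction(X, names, literals):
    """Evaluate a conjunction of literals over all rows of X.

    Each literal is a (feature_name, negated) pair; this encoding is an
    illustrative assumption, not a format taken from the paper.
    """
    result = np.ones(X.shape[0], dtype=bool)
    for name, negated in literals:
        column = X[:, names.index(name)]
        result &= ~column if negated else column
    return result


# sky AND (NOT building) AND panorama
non_urban = conjunction(
    X,
    feature_names,
    [("sky", False), ("building", True), ("panorama", False)],
)
print(non_urban.tolist())
```

On this toy data only the first row satisfies all three literals, so the new feature singles out the open-landscape image while remaining readable as a logical formula.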
