SparCA: Sparse Compressed Agglomeration for Feature Extraction and Dimensionality Reduction

Leland Barnard; Farwa Ali; Hugo Botha; David T. Jones

arXiv:2302.10776 · cs.LG · February 22, 2023

TL;DR

SparCA is a new hierarchical feature grouping and compression method that produces interpretable features across diverse data types, performing well on supervised tasks without hyperparameter tuning.

Contribution

Introduces SparCA, a novel hyperparameter-free dimensionality reduction technique that generalizes across data types and enhances interpretability and performance.

Findings

01 Effective on synthetic and real-world datasets

02 Produces highly interpretable features

03 Achieves strong supervised learning performance

Abstract

The most effective dimensionality reduction procedures produce interpretable features from the raw input space while also providing good performance for downstream supervised learning tasks. For many methods, this requires optimizing one or more hyperparameters for a specific task, which can limit generalizability. In this study we propose sparse compressed agglomeration (SparCA), a novel dimensionality reduction procedure that involves a multistep hierarchical feature grouping, compression, and feature selection process. We demonstrate the characteristics and performance of the SparCA method across heterogeneous synthetic and real-world datasets, including images, natural language, and single-cell gene expression data. Our results show that SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream…

Code & Models

Repositories

Neurology-AI-Program/sparca (Official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Single-cell and spatial transcriptomics · COVID-19 diagnosis using AI

Methods: Principal Components Analysis · Feature Selection