Sparse clustering via the Deterministic Information Bottleneck algorithm

Efthymios Costa; Ioanna Papatsouma; Angelos Markos

arXiv:2601.20628·stat.ML·April 14, 2026

TL;DR

This paper introduces an information-theoretic framework for sparse clustering that performs feature weighting and clustering jointly, targeting data whose cluster structure is confined to a subset of the features.

Contribution

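It presents a sparse clustering method based on the Deterministic Information Bottleneck algorithm. Simulations on synthetic data show it to be a competitive alternative to existing clustering algorithms for sparse data, and its effectiveness is demonstrated on a real-world genomics data set.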

Findings

01 Performs effectively in simulations on synthetic data

02 Competitive with existing clustering algorithms for sparse data

03 Applied to a real-world genomics data set

Abstract

Cluster analysis relates to the task of assigning objects into groups which ideally present some desirable characteristics. When a cluster structure is confined to a subset of the feature space, traditional clustering techniques face unprecedented challenges. We present an information theoretic framework that overcomes the problems associated with sparse data, allowing for joint feature weighting and clustering. Our proposal constitutes a competitive alternative to existing clustering algorithms for sparse data, as demonstrated through simulations on synthetic data. The effectiveness of our method is established by an application on a real-world genomics data set.
