HBIC: A Biclustering Algorithm for Heterogeneous Datasets

Adán José-García; Julie Jacques; Clément Chauvet; Vincent Sobanski; Clarisse Dhaenens

arXiv:2408.13217·cs.LG·August 26, 2024

TL;DR

HBIC is a novel biclustering algorithm designed for heterogeneous datasets with mixed data types, enabling the discovery of meaningful biclusters in complex real-world data.

Contribution

The paper introduces HBIC, a biclustering method capable of handling numeric, binary, and categorical data, with a two-stage process for bicluster generation and selection.

Findings

01. Successfully identified high-quality biclusters in synthetic benchmarks.

02. Demonstrated effectiveness on biomedical clinical data.

03. Outperformed existing biclustering approaches on heterogeneous datasets.

Abstract

Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data, including numeric, binary, and categorical data. The approach comprises two stages: bicluster generation and bicluster model selection. In the initial stage, several candidate biclusters are generated iteratively by adding and removing rows and columns based on the frequency of values in the original matrix. In the second stage, we introduce two approaches for selecting the most suitable biclusters by considering their size and…
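To make the two-stage idea concrete, here is a minimal toy sketch in Python. It is not the HBIC algorithm and not the py-hbic API: the function name `grow_bicluster`, the `min_rows` threshold, the greedy column order, and the assumption that numeric columns have already been discretized into labels are all illustrative choices. The generation step keeps rows that share the most frequent value of each added column; the selection step ranks candidates by size only, since the second selection criterion is cut off in the abstract above and is therefore not modelled.

```python
from collections import Counter

def grow_bicluster(data, seed_col, min_rows=2):
    """Toy frequency-based bicluster generation (hypothetical helper, not HBIC)."""
    n_cols = len(data[0])
    # Seed: rows sharing the most frequent value of the seed column.
    value, _ = Counter(row[seed_col] for row in data).most_common(1)[0]
    rows = [i for i, row in enumerate(data) if row[seed_col] == value]
    cols = {seed_col}
    for j in range(n_cols):
        if j in cols:
            continue
        # Most frequent value of column j among the rows kept so far.
        val_j, count = Counter(data[i][j] for i in rows).most_common(1)[0]
        if count >= min_rows:
            # Add the column and remove rows that disagree with its value.
            rows = [i for i in rows if data[i][j] == val_j]
            cols.add(j)
    return rows, sorted(cols)

# Mixed data: a discretized numeric column, a binary column, a categorical column.
data = [
    ["low", 1, "A"],
    ["low", 1, "A"],
    ["low", 0, "B"],
    ["high", 1, "A"],
    ["high", 0, "C"],
]

# Stage 1 (generation): one candidate per seed column, duplicates collapsed.
candidates = {
    (tuple(rows), tuple(cols))
    for rows, cols in (grow_bicluster(data, s) for s in range(len(data[0])))
}

# Stage 2 (selection): keep the largest candidate by area (rows x columns).
best = max(candidates, key=lambda b: len(b[0]) * len(b[1]))
print(best)
```

On this small matrix every seed column leads to the same candidate, so the selection step is trivial; with larger data the candidate set would typically contain overlapping biclusters of different sizes.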

Code & Models

Repositories

clementchauvet/py-hbic (Official)

Taxonomy

Topics: Neural Networks and Applications