A Hash-based Co-Clustering Algorithm for Categorical Data

Fabricio Olivetti de França

arXiv:1407.7753 · cs.LG · July 30, 2014


TL;DR

This paper introduces a novel hash-based co-clustering algorithm for categorical data that efficiently finds meaningful clusters by leveraging Locality Sensitive Hashing, addressing challenges of feature importance and multiple cluster interpretations.

Contribution

The paper presents a new co-clustering method using Locality Sensitive Hashing to improve clustering quality and scalability for categorical data.
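The summary does not spell out how HBLCoClust applies Locality Sensitive Hashing, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the standard MinHash banding scheme (a swapped-in, well-known LSH family, not necessarily the paper's). Each categorical sample is encoded as a set of `attribute=value` tokens, and samples whose signatures agree on a whole band land in the same bucket, which makes them candidates for a co-cluster. The function names `to_tokens`, `token_hash`, `minhash_signature`, and `lsh_candidates`, as well as the parameters `num_bands` and `rows_per_band`, are hypothetical.

```python
import hashlib
import random
from collections import defaultdict


def to_tokens(row):
    """Encode a categorical sample as a set of 'attribute=value' tokens,
    so two samples share a token exactly when they share a feature value."""
    return {f"{i}={v}" for i, v in enumerate(row)}


def token_hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed; each seed
    plays the role of one random hash function in the MinHash family."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, seeds):
    """One minimum hash value per seed; the fraction of matching positions
    between two signatures approximates the Jaccard similarity of the sets."""
    return [min(token_hash(s, t) for t in tokens) for s in seeds]


def lsh_candidates(rows, num_bands=4, rows_per_band=2, seed=0):
    """Group samples whose signatures agree on at least one whole band.

    Each band is used as a bucket key, so one pass over the data suffices
    (linear in the number of samples for fixed band settings). Buckets
    holding two or more samples are returned as candidate co-clusters.
    """
    rng = random.Random(seed)
    seeds = [rng.getrandbits(32) for _ in range(num_bands * rows_per_band)]
    buckets = defaultdict(set)
    for idx, row in enumerate(rows):
        sig = minhash_signature(to_tokens(row), seeds)
        for b in range(num_bands):
            band = tuple(sig[b * rows_per_band:(b + 1) * rows_per_band])
            buckets[(b, band)].add(idx)
    # Deduplicate: the same group found in several bands is kept once.
    return {frozenset(m) for m in buckets.values() if len(m) > 1}


data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("red", "small", "square"),
]
for group in sorted(lsh_candidates(data), key=sorted):
    print(sorted(group))
```

Identical samples (0 and 1, 2 and 3) always share every band and therefore always end up together. Samples with no feature values in common (such as 0 and 2) only collide through a 64-bit hash collision. The partially overlapping sample 4 may or may not join a group, depending on the drawn hashes. Increasing `rows_per_band` makes buckets stricter, and adding bands raises the chance that moderately similar samples meet.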

Findings

1. Capable of finding high-quality co-clusters across various datasets
2. Scales linearly with dataset size
3. Effective in handling feature importance and multiple cluster interpretations

Abstract

Much real-life data is described by categorical attributes and comes without a prior classification. A common data mining method for extracting information from this type of data is clustering, which groups together samples that are more similar to each other than to the remaining samples. Categorical data, however, pose two challenges for information extraction. First, the similarity between two objects is usually calculated by counting the features they have in common, which ignores any importance weighting of those features. Second, if the data can be divided differently according to different subsets of the features, the algorithm may find clusters whose meanings differ from each other, which makes post-analysis difficult. Co-clustering of categorical data is the technique that tries to find subsets of samples that share a subset of features. By doing so, not only may a sample belong to more than one…
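The two difficulties named in the abstract can be made concrete with a small sketch. It assumes that samples are equal-length tuples of category values. The functions `overlap_similarity` and `common_attributes` and the weight vector are hypothetical illustrations and are not taken from the paper. The first function counts matching attributes, optionally weighted. The second returns the subset of features that a given subset of samples shares, which is the pairing a co-cluster describes.

```python
def overlap_similarity(a, b, weights=None):
    """Similarity of two categorical samples as the number of attributes on
    which they agree; optional per-attribute weights replace the count of 1."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of attributes")
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


def common_attributes(rows, members):
    """Attributes (index -> value) on which every sample in `members` agrees;
    a co-cluster pairs a subset of samples with such a subset of features."""
    first = rows[members[0]]
    return {i: v for i, v in enumerate(first)
            if all(rows[m][i] == v for m in members)}


rows = [
    ("red", "small", "round"),   # sample 0
    ("red", "large", "round"),   # sample 1
    ("red", "small", "square"),  # sample 2
]

# Unweighted: sample 0 looks equally similar to samples 1 and 2.
print(overlap_similarity(rows[0], rows[1]), overlap_similarity(rows[0], rows[2]))  # 2.0 2.0
# Hypothetical weights that treat shape as the most important attribute.
w = (0.5, 1.0, 2.0)
print(overlap_similarity(rows[0], rows[1], w), overlap_similarity(rows[0], rows[2], w))  # 2.5 1.5
# Different sample subsets share different feature subsets.
print(common_attributes(rows, [0, 1]))  # {0: 'red', 2: 'round'}
print(common_attributes(rows, [0, 2]))  # {0: 'red', 1: 'small'}
```

The plain count cannot tell samples 1 and 2 apart from the viewpoint of sample 0, while the weighted version can. The last two calls show how grouping by different feature subsets yields groups with different meanings.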


Code & Models

Repositories

folivetti/HBLCoClust (official)


Taxonomy

Topics: Text and Document Classification Technologies · Face and Expression Recognition