Mixed Data Clustering Survey and Challenges

Guillaume Guerard; Sonia Djebali

arXiv:2512.03070 · cs.LG · December 4, 2025


Open Access

TL;DR

This paper surveys the challenges of mixed-data clustering in big data, introduces a new pretopological clustering method, and compares it with existing algorithms to evaluate its effectiveness and interpretability.

Contribution

The paper presents a novel pretopological clustering approach tailored to mixed data and benchmarks it against classical and existing methods.
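As background for the pretopological approach: a pretopological space equips a set E with a pseudoclosure a(·) that is extensive (A ⊆ a(A)) and maps the empty set to itself. Unlike a topological closure, it need not be idempotent, so applying it repeatedly until the set stops growing yields a closed set F(A). The sketch below is a minimal illustration under assumptions of our own and is not the authors' method. It assumes a precomputed dissimilarity matrix, a hypothetical k-nearest-neighbour neighbourhood V(x), and the pseudoclosure a(A) = A ∪ {x : V(x) ∩ A ≠ ∅}. The names `knn_neighborhoods`, `pseudoclosure` and `closure` are hypothetical.

```python
from typing import List, Set


def knn_neighborhoods(dist: List[List[float]], k: int) -> List[Set[int]]:
    """V(x): indices of the k nearest other points of x.

    Assumption: a kNN neighbourhood is only one possible choice; ties are
    broken by the smaller index so the result is deterministic.
    """
    n = len(dist)
    hoods = []
    for x in range(n):
        others = sorted((j for j in range(n) if j != x), key=lambda j: (dist[x][j], j))
        hoods.append(set(others[:k]))
    return hoods


def pseudoclosure(A: Set[int], hoods: List[Set[int]]) -> Set[int]:
    """a(A) = A ∪ {x : V(x) ∩ A ≠ ∅}.

    Extensive (A ⊆ a(A)), isotone, and a(∅) = ∅, but not idempotent in general.
    """
    return set(A) | {x for x, v in enumerate(hoods) if v & A}


def closure(A: Set[int], hoods: List[Set[int]]) -> Set[int]:
    """F(A): apply the pseudoclosure until it reaches a fixed point.

    Terminates because each step only adds points from a finite set.
    """
    current = set(A)
    while True:
        nxt = pseudoclosure(current, hoods)
        if nxt == current:
            return current
        current = nxt


# Toy usage: five points on a line, dissimilarity = absolute difference.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
hoods = knn_neighborhoods(dist, k=1)
print(closure({0}, hoods))  # {0, 1, 2}
print(closure({2}, hoods))  # {2}
```

With k = 1, point 2 lies in the closure of {0}, but 0 does not lie in the closure of {2}, because no point has 2 as its nearest neighbour. Non-symmetric neighbourhoods of this kind produce closures of singletons that overlap without coinciding, a structure a pretopological clustering can work with that a plain distance threshold would not produce.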

Findings

01. The proposed method shows competitive performance on mixed data.

02. Hierarchical and explainable algorithms enhance interpretability.

03. Benchmarking reveals both the strengths and the limitations of the new approach.

Abstract

The advent of the big data paradigm has transformed how industries manage and analyze information, ushering in an era of unprecedented data volume, velocity, and variety. Within this landscape, mixed-data clustering has become a critical challenge, requiring innovative methods that can effectively exploit heterogeneous data types, including numerical and categorical variables. Traditional clustering techniques, typically designed for homogeneous datasets, often struggle to capture the additional complexity introduced by mixed data, underscoring the need for approaches specifically tailored to this setting. Hierarchical and explainable algorithms are particularly valuable in this context, as they provide structured, interpretable clustering results that support informed decision-making. This paper introduces a clustering method grounded in pretopological spaces. In addition, benchmarking…
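Gower's dissimilarity is a classical way to compare records that mix numerical and categorical variables, and it makes the abstract's central difficulty concrete. It is shown here only as a familiar baseline and is not necessarily a measure used in the paper. Each numerical attribute contributes its absolute difference scaled by the column range, each categorical attribute contributes 0 on a match and 1 otherwise, and the contributions are averaged. The sketch below assumes equal attribute weights and no missing values. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def numeric_ranges(rows: Sequence[Sequence[Value]], is_categorical: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numerical column; 0.0 placeholder for categorical columns."""
    ranges = []
    for j, cat in enumerate(is_categorical):
        if cat:
            ranges.append(0.0)
        else:
            col = [float(row[j]) for row in rows]
            ranges.append(max(col) - min(col))
    return ranges


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   is_categorical: Sequence[bool], ranges: Sequence[float]) -> float:
    """Gower dissimilarity in [0, 1]: mean of per-attribute dissimilarities.

    Assumes equal weights for all attributes and no missing values.
    """
    total = 0.0
    for x, y, cat, r in zip(a, b, is_categorical, ranges):
        if cat:
            # Categorical attribute: simple matching (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        else:
            # Numerical attribute: absolute difference scaled by the column range;
            # a constant column (range 0) contributes nothing.
            total += abs(float(x) - float(y)) / r if r > 0 else 0.0
    return total / len(is_categorical)


# Toy usage: (age, colour, income) records.
rows = [(25.0, "red", 50000.0), (35.0, "blue", 60000.0), (45.0, "red", 80000.0)]
is_cat = [False, True, False]
ranges = numeric_ranges(rows, is_cat)              # [20.0, 0.0, 30000.0]
print(gower_distance(rows[0], rows[1], is_cat, ranges))  # (0.5 + 1 + 1/3) / 3 ≈ 0.611
```

Because every attribute is mapped to [0, 1] before averaging, no single numerical column dominates the comparison simply because of its units. Its limits are that categorical mismatches are all-or-nothing and that weights must be chosen by hand, which is part of why dedicated mixed-data methods are needed.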

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Stochastic Gradient Optimization Techniques