Nonparametric Clustering of Mixed Data Using Modified Chi-square Tests
Yawen Xu, Xin Gao, Xiaogang Wang

TL;DR
This paper introduces a non-parametric method for clustering mixed continuous and discrete data. It detects cluster patterns with modified Chi-square tests, avoids the need for a global distance function, and outperforms the benchmark method AutoClass in simulation studies.
Contribution
The paper presents a novel non-parametric clustering approach for mixed data that leverages local neighborhood analysis and modified Chi-square tests, without requiring a global distance metric.
Findings
Outperforms the benchmark method AutoClass in simulation studies
Holds this advantage across the various simulation settings tested
No need for a global distance function
Abstract
We propose a non-parametric method to cluster mixed data containing both continuous and discrete random variables. The product space of continuous and categorical sample spaces is approximated locally by analyzing neighborhoods with cluster patterns. Cluster patterns on the product space are detected using a modified Chi-square test. The proposed method does not impose a global distance function, which can be difficult to specify in practice. Results from simulation studies show that the proposed method outperformed the benchmark method, AutoClass, in various settings.
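The abstract does not spell out how the Chi-square test is modified, so the following is only a rough sketch of the general idea of testing a local neighborhood for a cluster pattern. It substitutes the ordinary Pearson Chi-square statistic against a uniform null for the authors' modified test. The function name `uniformity_statistic`, the equal-width binning of the continuous variable, and the sample data are all assumptions made for illustration and do not come from the paper.

```python
from collections import Counter
from itertools import product


def uniformity_statistic(points, lo, hi, n_bins, categories):
    """Pearson chi-square statistic for a neighborhood against a uniform null.

    points: list of (x, category) pairs with lo <= x <= hi.
    Returns (statistic, degrees_of_freedom).
    Assumption: this is the standard Pearson test, not the paper's modified one.
    """
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, cat in points:
        # Clamp so that x == hi falls in the last bin.
        b = min(int((x - lo) / width), n_bins - 1)
        counts[(b, cat)] += 1

    # Cells of the product space: (bin of continuous value, category).
    cells = list(product(range(n_bins), categories))
    expected = len(points) / len(cells)  # equal mass per cell under the null
    stat = sum((counts[c] - expected) ** 2 / expected for c in cells)
    return stat, len(cells) - 1


# Hypothetical neighborhood: 40 points, most of them in one cell.
neighborhood = (
    [(0.1, "a")] * 25 + [(0.2, "b")] * 5 + [(0.8, "a")] * 5 + [(0.9, "b")] * 5
)
stat, df = uniformity_statistic(neighborhood, 0.0, 1.0, 2, ["a", "b"])

CRITICAL_3DF_05 = 7.815  # 0.95 quantile of the chi-square distribution, 3 df
has_pattern = stat > CRITICAL_3DF_05
print(stat, df, has_pattern)  # 30.0 3 True
```

A statistic well above the critical value suggests the points in the neighborhood concentrate in particular cells of the product space rather than spreading evenly. Note that the decision here depends only on cell counts, so no distance between a continuous value and a category is ever computed.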
Taxonomy
Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Mining Algorithms and Applications
