# Semi-supervised model-based clustering with controlled clusters leakage

**Authors:** Marek Śmieja, Łukasz Struski, Jacek Tabor

arXiv: 1705.01877 · 2017-05-05

## TL;DR

This paper introduces C3L, a semi-supervised Gaussian mixture model for partially labeled data. A user-defined leakage level bounds how far the resulting clustering may deviate from the initial categorization, so the model recovers natural subgroups of the given categories.

## Contribution

The paper presents a novel semi-supervised clustering model with leakage control, along with theoretical analysis and an efficient optimization algorithm.

## Key findings

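- C3L retrieves natural subgroups of the given categories in partially categorized data.
- A user-defined leakage level caps the maximal inconsistency between the initial categorization and the resulting clustering.
- In the reported experiments, C3L finds a high-quality clustering model, and the authors state it can also improve the results of less flexible techniques such as projection pursuit clustering.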

## Abstract

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discovering meaningful groups in partially classified data.
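
To make the idea of "inconsistency between initial categorization and resulting clustering" concrete, the sketch below computes a simple empirical proxy: the fraction of points that land in a cluster dominated by a different category. This is an illustrative assumption, not the paper's formal leakage definition (which is given in the full text), and the function name `empirical_leakage` and the 0.2 threshold are hypothetical.

```python
from collections import Counter, defaultdict


def empirical_leakage(categories, clusters):
    """Fraction of points whose cluster is dominated by another category.

    Illustrative proxy only; NOT the leakage definition used by C3L.
    """
    if len(categories) != len(clusters) or not categories:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count how many points of each category fall into each cluster.
    members = defaultdict(Counter)
    for cat, clu in zip(categories, clusters):
        members[clu][cat] += 1

    # Assign every cluster to the category contributing most of its points.
    owner = {clu: counts.most_common(1)[0][0] for clu, counts in members.items()}

    # A point "leaks" if its cluster belongs to a category other than its own.
    leaked = sum(1 for cat, clu in zip(categories, clusters) if owner[clu] != cat)
    return leaked / len(categories)


# Toy data: two initial categories, four clusters (subgroups).
categories = ["a", "a", "a", "a", "b", "b", "b", "b"]
clusters = [0, 0, 1, 2, 2, 2, 3, 3]

leakage = empirical_leakage(categories, clusters)
within_budget = leakage <= 0.2  # hypothetical user-chosen leakage level
```

In this toy example, one point of category `a` sits in a cluster dominated by `b`, so the proxy evaluates to 1/8. A leakage level of zero would force every cluster to stay inside a single category, while larger values allow clusters to follow the data distribution across category boundaries.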

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.01877/full.md

## Figures

29 figures with captions in the complete paper: https://tomesphere.com/paper/1705.01877/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1705.01877/full.md

---
Source: https://tomesphere.com/paper/1705.01877