# Semi-supervised cross-entropy clustering with information bottleneck constraint

**Authors:** Marek Śmieja, Bernhard C. Geiger

arXiv: 1705.01601 · 2017-11-15

## TL;DR

This paper introduces CEC-IB, a semi-supervised clustering method that combines cross-entropy clustering with the information bottleneck, effectively balancing model accuracy, simplicity, and consistency with partial user-provided labels.

## Contribution

The paper presents a novel semi-supervised clustering algorithm, CEC-IB, that integrates cross-entropy clustering with the information bottleneck, offering robustness, automatic cluster number determination, and applicability to hierarchical data.

## Key findings

- CEC-IB performs comparably to Gaussian mixture models in semi-supervised tasks.
- CEC-IB is faster than GMM and more robust to noisy labels.
- CEC-IB can discover natural subgroups when the side information is derived from the top levels of a hierarchical clustering.

## Abstract

In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.
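The three-way trade-off described in the abstract can be illustrated with a small sketch. This summary does not reproduce the paper's cost function, so the code below is an assumption-laden illustration rather than the authors' formulation: the first two terms follow the usual cross-entropy clustering cost (cluster-weight code length plus the entropy of a Gaussian fitted to each cluster), while the consistency term, the entropy of the known labels inside each cluster weighted by a hypothetical parameter `beta`, is a stand-in chosen for illustration. The function name `cec_ib_style_cost` and the convention that `-1` marks an unlabeled point are likewise assumptions.

```python
import numpy as np

def cec_ib_style_cost(X, clusters, labels, beta=1.0):
    """Illustrative three-term clustering cost (lower is better).

    X        : (n, d) data matrix
    clusters : (n,) integer cluster assignment
    labels   : (n,) partial side information, -1 means "unlabeled" (assumed convention)
    beta     : hypothetical weight of the label-consistency term
    """
    n, d = X.shape
    total = 0.0
    for k in np.unique(clusters):
        mask = clusters == k
        p = mask.sum() / n  # fraction of points in this cluster

        # Accuracy: differential entropy of the maximum-likelihood Gaussian.
        # A cluster needs more than d points, otherwise the covariance is singular.
        cov = np.atleast_2d(np.cov(X[mask].T, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        gauss_entropy = 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * logdet

        # Consistency (assumed form): entropy of the known labels in the cluster.
        known = labels[mask]
        known = known[known >= 0]
        if known.size:
            _, counts = np.unique(known, return_counts=True)
            q = counts / counts.sum()
            label_entropy = -(q * np.log(q)).sum()
        else:
            label_entropy = 0.0

        # Simplicity: -log(p) is the code length of the cluster identifier.
        total += p * (-np.log(p) + gauss_entropy + beta * label_entropy)
    return total
```

Under these assumptions, a clustering whose clusters each contain a single known label pays no consistency penalty regardless of `beta`, whereas a clustering that mixes labels is penalized in proportion to `beta`. Setting `beta=0` reduces the sketch to an unsupervised CEC-style cost.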

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.01601/full.md

## Figures

71 figures with captions in the complete paper: https://tomesphere.com/paper/1705.01601/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/1705.01601/full.md

---
Source: https://tomesphere.com/paper/1705.01601