# Clustering without Over-Representation

**Authors:** Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian

arXiv: 1905.12753 · 2019-05-31

## TL;DR

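This paper studies clustering of color-labeled points under the constraint that no color is over-represented in any cluster. It gives an LP-rounding algorithm with provable guarantees for the general problem, a simpler combinatorial algorithm for the case where no color may hold an absolute majority in a cluster, and experiments on real-world data.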

## Contribution

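The paper presents algorithms with provable guarantees for clustering under a constraint that prevents color over-representation: a linear programming approach (find a fractional solution, then round it) for the most general version, and a simpler combinatorial method for the special case where no color has an absolute majority in any cluster.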

## Key findings

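- For the most general version of the problem, an algorithm that rounds a fractional linear-programming solution comes with provable performance guarantees.
- For the special case where no color has an absolute majority in any cluster, a simpler combinatorial algorithm also has provable guarantees.
- Experiments on real-world data show that the algorithms find good clusterings without over-representation.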

## Abstract

In this paper we consider clustering problems in which each point is endowed with a color. The goal is to cluster the points to minimize the classical clustering cost but with the additional constraint that no color is over-represented in any cluster. This problem is motivated by practical clustering settings, e.g., in clustering news articles where the color of an article is its source, it is preferable that no single news source dominates any cluster. For the most general version of this problem, we obtain an algorithm that has provable guarantees of performance; our algorithm is based on finding a fractional solution using a linear program and rounding the solution subsequently. For the special case of the problem where no color has an absolute majority in any cluster, we obtain a simpler combinatorial algorithm also with provable guarantees. Experiments on real-world data show that our algorithms are effective in finding good clustering without over-representation.
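To make the constraint concrete, the following is a minimal sketch of how one might check whether a given clustering over-represents a color. It is not the paper's algorithm. The function names and the threshold parameter `alpha` are assumptions introduced here for illustration; the abstract does not define a specific threshold. Setting `alpha=0.5` corresponds to the special case in which no color holds an absolute majority in any cluster.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def max_color_fraction(
    assignment: Sequence[int], colors: Sequence[Hashable]
) -> Dict[int, float]:
    """Return, for each cluster id, the largest share held by any single color.

    `assignment[i]` is the cluster id of point i and `colors[i]` is its color.
    Both names are hypothetical, chosen for this sketch.
    """
    if len(assignment) != len(colors):
        raise ValueError("assignment and colors must have the same length")

    # Count how many points of each color fall into each cluster.
    per_cluster = defaultdict(Counter)
    for cluster_id, color in zip(assignment, colors):
        per_cluster[cluster_id][color] += 1

    # Share of the most frequent color within each cluster.
    return {
        cluster_id: max(counts.values()) / sum(counts.values())
        for cluster_id, counts in per_cluster.items()
    }


def is_over_represented(
    assignment: Sequence[int], colors: Sequence[Hashable], alpha: float = 0.5
) -> bool:
    """True if some color exceeds a fraction `alpha` of some cluster.

    `alpha` is an assumed threshold; 0.5 means "no absolute majority".
    """
    fractions = max_color_fraction(assignment, colors)
    return any(frac > alpha for frac in fractions.values())


if __name__ == "__main__":
    # News-article style example: colors stand for sources.
    sources = ["a", "a", "b", "b"]
    print(is_over_represented([0, 0, 0, 1], sources))  # cluster 0 is 2/3 source "a"
    print(is_over_represented([0, 1, 0, 1], sources))  # each cluster is split evenly
```

A check like this only validates a candidate clustering. Finding a low-cost clustering that satisfies the constraint is the harder problem the paper addresses.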

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12753/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1905.12753/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1905.12753/full.md

---
Source: https://tomesphere.com/paper/1905.12753