# Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

**Authors:** Napat Rujeerapaiboon, Kilian Schindler, Daniel Kuhn, Wolfram Wiesemann

arXiv: 1705.07837 · 2019-01-11

## TL;DR

This paper combines outlier detection with cardinality-constrained K-means clustering. It tackles the resulting joint problem with conic (semidefinite and linear programming) relaxations, which makes the clustering more robust to outliers and gives direct control over cluster sizes.

## Contribution

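It formulates joint outlier detection and clustering as a mixed-integer linear program (MILP) and derives tractable semidefinite and linear programming relaxations. It also proposes deterministic rounding schemes whose solutions are provably optimal in the MILP whenever a cluster separation condition holds.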
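For reference, the underlying combinatorial problem can be written as follows. This is a standard statement in our own notation, not a reproduction of the paper's MILP. The notation is: datapoints $x_1, \dots, x_N$, a prescribed number $n_0$ of outliers, and prescribed cluster sizes $n_1, \dots, n_K$ with $n_0 + n_1 + \dots + n_K = N$. The outlier set $C_0$ contributes nothing to the objective.

$$
\min_{C_0, C_1, \dots, C_K} \; \sum_{k=1}^{K} \sum_{i \in C_k} \Big\| x_i - \frac{1}{n_k} \sum_{j \in C_k} x_j \Big\|_2^2
\quad \text{s.t.} \quad C_0 \sqcup C_1 \sqcup \dots \sqcup C_K = \{1, \dots, N\}, \quad |C_k| = n_k \;\; (k = 0, \dots, K).
$$

The per-cluster cost can also be written without centroids, using the identity

$$
\sum_{i \in C_k} \Big\| x_i - \frac{1}{n_k} \sum_{j \in C_k} x_j \Big\|_2^2 = \frac{1}{2 n_k} \sum_{i \in C_k} \sum_{j \in C_k} \| x_i - x_j \|_2^2 .
$$

Because $n_k$ is given in advance, the weights $1/(2 n_k)$ are constants. The objective therefore depends only on which pairs of points share a cluster.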

## Key findings

- Proposes a MILP formulation for combined outlier detection and clustering.
- Develops semidefinite and linear programming relaxations and deterministic rounding schemes whose solutions are optimal when a cluster separation condition holds.
- Addresses cluster balance and outlier sensitivity issues in K-means.
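The outlier sensitivity mentioned above is easy to see with a single cluster, where one far-away point drags the mean. The sketch below is a toy illustration under our own assumptions. The helper names (`centroid`, `sq_dist`, `trimmed_centroid`) are hypothetical. The trimming loop is a simple alternating heuristic for K = 1, not the paper's conic-optimization method.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_centroid(points, n_outliers, max_iter=10):
    """Heuristic for K = 1: alternate between computing the centroid of the
    kept points and discarding the n_outliers points farthest from it."""
    c = centroid(points)
    for _ in range(max_iter):
        kept = sorted(points, key=lambda p: sq_dist(p, c))[: len(points) - n_outliers]
        new_c = centroid(kept)
        if new_c == c:  # centroid unchanged, so further iterations would repeat
            break
        c = new_c
    return c


data = [(0, 0), (1, 0), (0, 1), (1, 1), (100, 100)]
print(centroid(data))             # plain mean, pulled toward (100, 100)
print(trimmed_centroid(data, 1))  # (0.5, 0.5) once the outlier is set aside
```

A single trimmed point is enough here to recover the centre of the four nearby points. The joint formulation generalizes this idea to K clusters with prescribed sizes.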

## Abstract

Plain vanilla K-means clustering has proven to be successful in practice, yet it suffers from outlier sensitivity and may produce highly unbalanced clusters. To mitigate both shortcomings, we formulate a joint outlier detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset, treating the cluster cardinalities as a given input. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to feasible solutions for the MILP. We also prove that these solutions are optimal in the MILP if a cluster separation condition holds.
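To make the problem statement concrete, the following sketch solves tiny instances exactly by enumeration. It is a brute-force reference only. It does not implement the paper's MILP, relaxations, or rounding schemes, and its function names are hypothetical. The search assigns `cardinalities[k]` points to each cluster in turn and leaves the remaining `n_outliers` points in the cost-free outlier cluster. Its running time grows combinatorially with the number of points, which motivates tractable relaxations for realistic sizes.

```python
from itertools import combinations


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    if not points:
        return 0.0
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))


def brute_force_clustering(points, n_outliers, cardinalities):
    """Exhaustively choose n_outliers outliers and K clusters of the given
    sizes so that the total within-cluster SSE is minimal (tiny inputs only)."""
    if n_outliers + sum(cardinalities) != len(points):
        raise ValueError("cluster sizes plus outliers must equal the number of points")
    best = {"cost": float("inf"), "outliers": None, "clusters": None}

    def search(remaining, k, clusters, cost):
        if cost >= best["cost"]:  # costs are nonnegative, so this branch cannot win
            return
        if k == len(cardinalities):
            # Whatever is left forms the outlier cluster, which incurs no cost.
            best.update(cost=cost, outliers=tuple(remaining), clusters=list(clusters))
            return
        for chosen in combinations(remaining, cardinalities[k]):
            rest = [i for i in remaining if i not in chosen]
            search(rest, k + 1, clusters + [chosen], cost + sse([points[i] for i in chosen]))

    search(list(range(len(points))), 0, [], 0.0)
    return best["cost"], best["outliers"], best["clusters"]


cost, outliers, clusters = brute_force_clustering(
    [(0, 0), (0, 1), (10, 0), (10, 1), (50, 50)], n_outliers=1, cardinalities=[2, 2]
)
print(cost, outliers, clusters)  # 1.0 (4,) [(0, 1), (2, 3)]
```

The far-away point `(50, 50)` lands in the outlier cluster. The two remaining pairs each form a cluster of the prescribed size 2.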

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.07837/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1705.07837/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1705.07837/full.md

---
Source: https://tomesphere.com/paper/1705.07837