# Information theoretical clustering is hard to approximate

**Authors:** Ferdinando Cicalese, Eduardo Laber

arXiv: 1812.07075 · 2019-11-19

## TL;DR

This paper proves that clustering to minimize the Entropy impurity measure does not admit a PTAS, even when all input vectors have the same $\ell_1$ norm.

## Contribution

It proves that impurity-based clustering with the Entropy impurity measure does not admit a PTAS, resolving a question left open in previous work [Chaudhuri and McGregor COLT 08; Ackermann et al. ECCC 11].

## Key findings

- Clustering under the Entropy impurity measure does not admit a PTAS.
- The hardness result holds even when all vectors have the same $\ell_1$ norm.
- The result adds to the so far very limited set of approximation and inapproximability results for impurity-based clustering, in contrast to metric-based clustering.

## Abstract

An impurity measure $I: \mathbb{R}^d \to \mathbb{R}^+$ is a function that maps a $d$-dimensional vector ${\bf v}$ to a non-negative value $I({\bf v})$ so that the more homogeneous ${\bf v}$ is, with respect to the values of its coordinates, the larger its impurity. A well-known example of an impurity measure is the Entropy impurity. We study the problem of clustering based on impurity measures. Let $V$ be a collection of $n$ $d$-dimensional vectors with non-negative components. Given $V$ and an impurity measure $I$, the goal is to find a partition ${\mathcal P}$ of $V$ into $k$ groups $V_1,\ldots,V_k$ so as to minimize the sum of the impurities of the groups in ${\mathcal P}$, i.e., $I({\mathcal P})= \sum_{i=1}^{k} I\bigg(\sum_{ {\bf v} \in V_i} {\bf v} \bigg).$ Impurity minimization has been widely used as a quality assessment measure in probability distribution clustering (KL-divergence) as well as in categorical clustering. However, in contrast to the case of metric-based clustering, the current knowledge of impurity-measure-based clustering in terms of approximation and inapproximability results is very limited. Here, we contribute to changing this scenario by proving that for the Entropy impurity measure the problem does not admit a PTAS even when all vectors have the same $\ell_1$ norm. This result solves a question that remained open in previous work on this topic [Chaudhuri and McGregor COLT 08; Ackermann et al. ECCC 11].
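The objective in the abstract can be made concrete with a small sketch. The abstract does not spell out the Entropy impurity, so the code below assumes the commonly used weighted form $I_{Ent}({\bf v}) = \|{\bf v}\|_1 \cdot H({\bf v}/\|{\bf v}\|_1)$, with $H$ the Shannon entropy in bits. The function names `entropy_impurity` and `partition_impurity` and the toy data are illustrative and not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def entropy_impurity(v: Vector) -> float:
    """Assumed form: ||v||_1 times the Shannon entropy (bits) of v / ||v||_1."""
    total = sum(v)
    if total == 0:
        return 0.0
    # ||v||_1 * sum_i (v_i/||v||_1) * log2(||v||_1 / v_i), with zero terms skipped
    return sum(x * math.log2(total / x) for x in v if x > 0)


def partition_impurity(groups: Sequence[Sequence[Vector]]) -> float:
    """I(P): sum over groups of the impurity of the group's coordinate-wise sum."""
    result = 0.0
    for group in groups:
        if not group:
            continue  # an empty group contributes nothing
        summed = [sum(coords) for coords in zip(*group)]
        result += entropy_impurity(summed)
    return result


# Toy instance: four 2-dimensional vectors, all with the same l1 norm (2).
V = [(2, 0), (0, 2), (2, 0), (0, 2)]

pure = [[V[0], V[2]], [V[1], V[3]]]   # each group sums to a one-sided vector
mixed = [[V[0], V[1]], [V[2], V[3]]]  # each group sums to the uniform vector (2, 2)

print(partition_impurity(pure))   # 0.0
print(partition_impurity(mixed))  # 8.0
```

The `pure` partition groups identical vectors, so every group sum is concentrated on one coordinate and contributes zero impurity. The `mixed` partition produces perfectly homogeneous group sums and therefore the largest impurity for this instance. The sketch only evaluates a given partition; it does not search for an optimal one.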

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07075/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07075/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1812.07075/full.md

---
Source: https://tomesphere.com/paper/1812.07075