# K-means Algorithm over Compressed Binary Data

**Authors:** Elsa Dupraz

arXiv: 1701.03403 · 2018-01-18

## TL;DR

This paper proposes running K-means clustering directly on compressed binary sensor data, without reconstructing the original measurements. It derives approximate error probabilities for the K-means steps in the compressed domain and uses Monte Carlo simulations to show that clustering requires a lower coding rate than full reconstruction.

## Contribution

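The paper applies K-means directly to compressed data without decoding, derives approximate expressions for the error probabilities of the K-means steps in the compressed domain, and shows that clustering is possible at a lower coding rate than the rate needed to reconstruct all the measurements.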

## Key findings

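- The approximate error-probability expressions show that K-means in the compressed domain recovers the clusters of the original domain.
- Monte Carlo simulations confirm the accuracy of the approximate error probabilities.
- The coding rate needed to perform K-means clustering in the compressed domain is lower than the rate needed to reconstruct all the measurements.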

## Abstract

We consider a network of binary-valued sensors with a fusion center. The fusion center has to perform K-means clustering on the binary data transmitted by the sensors. In order to reduce the amount of data transmitted within the network, the sensors compress their data with a source coding scheme based on binary sparse matrices. We propose to apply the K-means algorithm directly over the compressed data without reconstructing the original sensor measurements, in order to avoid potentially complex decoding operations. We provide approximate expressions for the error probabilities of the K-means steps in the compressed domain. From these expressions, we show that applying the K-means algorithm in the compressed domain makes it possible to recover the clusters of the original domain. Monte Carlo simulations illustrate the accuracy of the approximate error probabilities, and show that the coding rate needed to perform K-means clustering in the compressed domain is lower than the rate needed to reconstruct all the measurements.
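The scheme described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the matrix construction (`sparse_binary_matrix`, fixed `row_weight`), the Hamming-distance assignment, the majority-vote centroid update, and all parameter values are assumptions chosen for illustration, since this summary does not specify them. Each sensor sends `u = Hx (mod 2)`, which is `m` bits instead of `n`, and the fusion center clusters the compressed vectors without decoding.

```python
import numpy as np

def sparse_binary_matrix(m, n, row_weight, rng):
    """Random m x n binary matrix with `row_weight` ones per row (hypothetical construction)."""
    H = np.zeros((m, n), dtype=np.uint8)
    for i in range(m):
        H[i, rng.choice(n, size=row_weight, replace=False)] = 1
    return H

def compress(X, H):
    """Compress each row x of X to u = H x (mod 2): m bits instead of n."""
    return ((X.astype(np.int64) @ H.T.astype(np.int64)) % 2).astype(np.uint8)

def binary_kmeans(U, k, n_iter, rng):
    """K-means on binary vectors (assumed variant): Hamming assignment, majority-vote update."""
    centroids = U[rng.choice(len(U), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Hamming distance between every vector and every centroid
        dist = (U[:, None, :] != centroids[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = U[labels == j]
            if len(members):  # keep the old centroid if the cluster is empty
                centroids[j] = (2 * members.sum(axis=0) > len(members)).astype(np.uint8)
    return labels, centroids

# Toy data (illustrative values): k binary centroids, each bit flipped with probability p
rng = np.random.default_rng(0)
n, m, k, per_cluster, p = 200, 100, 2, 50, 0.05
true_centroids = rng.integers(0, 2, size=(k, n), dtype=np.uint8)
noise = (rng.random((k * per_cluster, n)) < p).astype(np.uint8)
X = np.repeat(true_centroids, per_cluster, axis=0) ^ noise

H = sparse_binary_matrix(m, n, row_weight=3, rng=rng)
U = compress(X, H)                      # what the sensors transmit
labels, _ = binary_kmeans(U, k, n_iter=10, rng=rng)
print("coding rate m/n:", m / n)
print("cluster sizes:", np.bincount(labels, minlength=k))
```

Because the compression is linear over GF(2), the compressed version of `x1 XOR x2` equals the XOR of the two compressed vectors, so Hamming distances between compressed vectors are a function of the difference between the original vectors. This is one intuition for why clustering can work without decoding; the paper's error-probability analysis is what makes it precise.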

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.03403/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1701.03403/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1701.03403/full.md

---
Source: https://tomesphere.com/paper/1701.03403