# Contraction Clustering (RASTER): A Very Fast Big Data Algorithm for Sequential and Parallel Density-Based Clustering in Linear Time, Constant Memory, and a Single Pass

**Authors:** Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, Mats Jirstrand

arXiv: 1907.03620 · 2026-01-27

## TL;DR

RASTER is a novel, highly efficient density-based clustering algorithm designed for big data, offering linear time complexity, constant memory use, and single-pass processing, suitable for both sequential and parallel execution.

## Contribution

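The paper introduces RASTER, a two-step density-based clustering method (a contraction step that projects objects onto tiles, followed by an agglomeration step that groups tiles into clusters). It runs in a single pass with linear time complexity and constant memory requirements, which makes it usable on datasets where many existing clustering algorithms are infeasible.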

## Key findings

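- A sequential implementation of RASTER performs significantly better than various standard clustering algorithms.
- With 8 cores, the parallel implementation is about four times faster than on one core.
- On a contemporary workstation, a Rust implementation processes a batch of 500 million points with 1 million clusters in less than 50 seconds on one core.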

## Abstract

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering algorithms are infeasible due to their high memory requirements and/or unfavorable runtime complexity. In contrast, Contraction Clustering (RASTER) is a single-pass algorithm for identifying density-based clusters with linear time complexity. Due to its favorable runtime and the fact that its memory requirements are constant, this algorithm is highly suitable for big data applications where the amount of data to be processed is huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. This algorithm is extremely fast in both sequential and parallel execution. Our quantitative evaluation shows that a sequential implementation of RASTER performs significantly better than various standard clustering algorithms. Furthermore, the parallel speedup is significant: on a contemporary workstation, an implementation in Rust processes a batch of 500 million points with 1 million clusters in less than 50 seconds on one core. With 8 cores, the algorithm is about four times faster.
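The two steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Rust implementation. It assumes 2D points, and the function names `contract` and `agglomerate`, the parameters `precision`, `threshold`, and `min_size`, and the 8-neighborhood adjacency rule are all illustrative assumptions rather than details confirmed by this summary.

```python
from collections import Counter
import math


def contract(points, precision=1, threshold=2):
    """Contraction (sketch): project 2D points onto grid tiles, keep dense tiles.

    `precision` (decimal digits kept) and `threshold` (minimum points per
    tile) are hypothetical parameters chosen for this illustration.
    """
    scale = 10 ** precision
    # Single pass over the points: each point increments one tile counter.
    counts = Counter(
        (math.floor(x * scale), math.floor(y * scale)) for x, y in points
    )
    return {tile for tile, n in counts.items() if n >= threshold}


def agglomerate(tiles, min_size=1):
    """Agglomeration (sketch): group adjacent tiles into clusters.

    Adjacency is assumed to be the 8-neighborhood of a tile; `min_size`
    (minimum tiles per cluster) is likewise an assumption.
    """
    remaining = set(tiles)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, stack = {start}, [start]
        while stack:
            tx, ty = stack.pop()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (tx + dx, ty + dy)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        cluster.add(neighbor)
                        stack.append(neighbor)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters


# Made-up sample data: two dense regions and one isolated point.
points = [
    (0.11, 0.12), (0.13, 0.14),   # tile (1, 1)
    (0.21, 0.12), (0.25, 0.18),   # tile (2, 1), adjacent to (1, 1)
    (5.01, 5.02), (5.03, 5.04),   # tile (50, 50)
    (9.90, 9.90),                 # lone point, below the threshold
]
clusters = agglomerate(contract(points))
```

In this sketch the contraction step touches each point exactly once, and the counter grows with the number of distinct tiles rather than the number of points. The agglomeration step then works only on the surviving tiles, so its cost does not depend on the raw input size.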

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.03620/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1907.03620/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1907.03620/full.md

---
Source: https://tomesphere.com/paper/1907.03620