# Algorithms to compute the Burrows-Wheeler Similarity Distribution

**Authors:** Felipe A. Louza, Guilherme P. Telles, Simon Gog, Liang Zhao

arXiv: 1903.10583 · 2020-09-10

## TL;DR

This paper presents two new algorithms that compute similarity measures based on the Burrows-Wheeler transform for all pairs of strings in a collection, reducing running time while keeping a small memory footprint.

## Contribution

The authors present practical and theoretical improvements for computing the Burrows-Wheeler similarity distribution across all string pairs in a collection.

## Key findings

- Both algorithms work from a single BWT computed for the concatenation of all strings, which reduces running time.
- Compressed data structures keep the memory footprint small.
- Experiments on real and artificial datasets support both the time and the memory claims.

## Abstract

The Burrows-Wheeler transform (BWT) is a well-studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more their symbols are intermixed in the transformation, the more the strings are similar. In this article we present two new algorithms to compute similarity measures based on the BWT for string collections. In particular, we present practical and theoretical improvements to the computation of the Burrows-Wheeler similarity distribution for all pairs of strings in a collection. Our algorithms take advantage of the BWT computed for the concatenation of all strings, and use compressed data structures that allow reducing the running time with a small memory footprint, as shown by a set of experiments with real and artificial datasets.
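
The intermixing idea can be illustrated with a small Python sketch. It is not the paper's algorithm: it sorts suffixes naively instead of using a BWT with compressed data structures, and it assumes the usual run-length formulation of the similarity distribution, in which each sorted suffix is labelled by its source string and the lengths of the resulting runs are counted. The function names (`document_array`, `run_length_distribution`, `all_pairs`) are hypothetical, and the treatment of string terminators is a simplifying assumption that may differ from the paper's.

```python
from collections import Counter
from itertools import combinations, groupby
from math import log2


def document_array(strings):
    """Source-string index of every suffix of the collection, in sorted order.

    Naive construction for illustration only. Assumption: each string ends
    with an implicit terminator that sorts below every symbol, and equal
    suffixes from different strings are ordered by string index.
    """
    suffixes = [(s[i:], d) for d, s in enumerate(strings) for i in range(len(s) + 1)]
    return [d for _, d in sorted(suffixes)]


def run_length_distribution(da, a, b):
    """Distribution of run lengths when only suffixes of strings a and b are kept."""
    labels = [d for d in da if d == a or d == b]
    runs = [len(list(group)) for _, group in groupby(labels)]
    counts = Counter(runs)
    return {k: c / len(runs) for k, c in sorted(counts.items())}


def expectation(dist):
    """Mean run length: close to 1 when the two strings are well intermixed."""
    return sum(k * p for k, p in dist.items())


def entropy(dist):
    """Shannon entropy (in bits) of the run-length distribution."""
    return -sum(p * log2(p) for p in dist.values())


def all_pairs(strings):
    """Run-length distribution for every pair, reusing one sorted order."""
    da = document_array(strings)  # built once for the whole collection
    return {
        (a, b): run_length_distribution(da, a, b)
        for a, b in combinations(range(len(strings)), 2)
    }
```

For the collection `["abab", "abab", "cccc"]`, the two identical strings alternate perfectly in the sorted order, so every run has length 1. Pairing either of them with `"cccc"` yields longer runs, because suffixes over disjoint alphabets cluster together. The sorted order is built once and filtered per pair, which mirrors, in a very simplified way, the idea of working from one transform of the concatenated collection.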

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.10583/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1903.10583/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1903.10583/full.md

---
Source: https://tomesphere.com/paper/1903.10583