Compressed Communication Complexity of Hamming Distance
Shiori Mitsuya, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai, Masayuki Takeda

TL;DR
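This paper presents a randomized public-coin protocol for computing the Hamming distance between two strings that are each compressed with LZ77 without self-references. The complexity analysis relies on Crochemore's C-factorization and Rytter's AVL-grammar, and the paper also shows that LZ77 sizes are not monotonic under prefix removal.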
Contribution
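It presents a randomized public-coin protocol for the Hamming distance of strings compressed with LZ77 without self-references, and shows that LZ77 sizes (with or without self-references) can grow by a factor of 4/3 when a prefix of the string is removed.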
Findings
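The two parties jointly compute the Hamming distance while holding their strings in LZ77-compressed form (without self-references).
LZ77 size, with or without self-references, can increase by a factor of 4/3 when a prefix of the string is removed.
The complexity analysis uses Crochemore's C-factorization and Rytter's AVL-grammar.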
Abstract
We consider the communication complexity of the Hamming distance of two strings. Bille et al. [SPIRE 2018] considered the communication complexity of the longest common prefix (LCP) problem in the setting where the two parties have their strings in a compressed form, i.e., represented by the Lempel-Ziv 77 factorization (LZ77) with/without self-references. We present a randomized public-coin protocol for a joint computation of the Hamming distance of two strings represented by LZ77 without self-references. While our scheme is heavily based on Bille et al.'s LCP protocol, our complexity analysis is original which uses Crochemore's C-factorization and Rytter's AVL-grammar. As a byproduct, we also show that LZ77 with/without self-references are not monotonic in the sense that their sizes can increase by a factor of 4/3 when a prefix of the string is removed.
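To make the objects in the abstract concrete, the following is a minimal Python sketch of a greedy LZ77 factorization without self-references and of the Hamming distance on uncompressed strings. It is not the paper's protocol: it works on plain strings held by one party and involves no communication or randomness. The function names (lz77_no_self_ref, hamming_distance, sizes_before_and_after) are hypothetical, and the factorization variant used here (each phrase is the longest prefix of the remaining text that occurs entirely inside the already-factorized part, or a single fresh character) is an assumption; the paper's exact definition may differ.

```python
def lz77_no_self_ref(s: str) -> list[str]:
    """Greedy LZ77-style factorization without self-references (assumed variant).

    Each phrase is the longest prefix of the unprocessed text that occurs
    entirely within the already-factorized prefix s[:i]; if there is no such
    occurrence, the phrase is a single fresh character.
    Naive implementation intended for small inputs only.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        length = 0
        # Extend the candidate phrase while it still occurs inside s[:i].
        while i + length < n and s[i:i + length + 1] in s[:i]:
            length += 1
        length = max(length, 1)  # a fresh character forms a phrase of length 1
        factors.append(s[i:i + length])
        i += length
    return factors


def hamming_distance(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def sizes_before_and_after(s: str, k: int) -> tuple[int, int]:
    """Phrase counts of s and of s with its first k characters removed."""
    return len(lz77_no_self_ref(s)), len(lz77_no_self_ref(s[k:]))


if __name__ == "__main__":
    print(lz77_no_self_ref("abababab"))
    print(hamming_distance("karolin", "kathrin"))
    print(sizes_before_and_after("abababab", 1))
```

The helper sizes_before_and_after can be used to look for strings whose phrase count grows after a prefix is removed, which is the non-monotonicity property the abstract describes. This sketch does not reproduce the paper's construction that achieves the 4/3 factor.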
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and automata theory · DNA and Biological Computing
