Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve
Amir Abboud, Arturs Backurs, Karl Bringmann, Marvin Künnemann

TL;DR
This paper investigates the complexity of analyzing compressed data directly, establishing bounds and optimality results for various problems under common compression schemes, and introduces a framework for conditional lower bounds.
Contribution
It provides a unified framework for proving lower bounds on analyzing compressed data and determines the optimality of decompress-and-solve for several fundamental problems.
Findings
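The known upper bounds for LCS and Pattern Matching with Wildcards on compressed inputs are optimal, up to lower-order factors, under the Strong Exponential Time Hypothesis (SETH).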
Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding under the k-Clique conjecture.
Decompress-and-solve is not optimal for Disjointness: the paper gives a new algorithm that beats it.
Abstract
Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size $n$ of data that originally has size $N$, and we want to solve a problem with time complexity $T(\cdot)$. The naive strategy of "decompress-and-solve" gives time $T(N)$, whereas "the gold standard" is time $T(n)$: to analyze the compression as efficiently as if the original data was small. We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of…
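To make the gap between "decompress-and-solve" and working directly on the compression concrete, here is a minimal Python sketch for Disjointness (do two equal-length bit strings share a position where both are 1?). It assumes run-length encoding as the compression scheme, purely for illustration; this is not the compression model or the algorithm from the paper, and the names `Run`, `decompress`, `disjoint_decompress_and_solve`, and `disjoint_on_runs` are hypothetical. The first function takes time proportional to the uncompressed length, the second only to the number of runs.

```python
from typing import List, Tuple

# Assumption: a bit string is stored as runs (bit, run_length), run_length > 0.
Run = Tuple[int, int]


def decompress(runs: List[Run]) -> List[int]:
    """Expand a run-length encoding into the explicit bit list (size N)."""
    out: List[int] = []
    for bit, length in runs:
        out.extend([bit] * length)
    return out


def disjoint_decompress_and_solve(a: List[Run], b: List[Run]) -> bool:
    """Baseline: decompress both inputs, then scan all N positions."""
    xa, xb = decompress(a), decompress(b)
    return not any(p == 1 and q == 1 for p, q in zip(xa, xb))


def disjoint_on_runs(a: List[Run], b: List[Run]) -> bool:
    """Merge the two run lists without expanding them.

    Each loop iteration finishes at least one run, so the loop executes
    at most len(a) + len(b) times, independent of the uncompressed length.
    """
    i = j = 0
    rem_a = a[0][1] if a else 0  # positions left in the current run of a
    rem_b = b[0][1] if b else 0  # positions left in the current run of b
    while i < len(a) and j < len(b):
        # The current runs overlap in at least one position.
        if a[i][0] == 1 and b[j][0] == 1:
            return False
        step = min(rem_a, rem_b)  # skip the whole overlap at once
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            if i < len(a):
                rem_a = a[i][1]
        if rem_b == 0:
            j += 1
            if j < len(b):
                rem_b = b[j][1]
    return True


if __name__ == "__main__":
    a = [(0, 1_000_000), (1, 1)]
    b = [(1, 1_000_000), (0, 1)]
    print(disjoint_on_runs(a, b), disjoint_decompress_and_solve(a, b))
```

In this toy setting the run-merging version touches four runs while the baseline expands about two million bits. Run-length encoding is far weaker than the compression schemes the abstract refers to, so the sketch only illustrates the $T(N)$ versus $T(n)$ distinction, not the difficulty of the problems studied in the paper.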
