The Complexity of the Co-Occurrence Problem
Philip Bille, Inge Li Gørtz, Tord Stordalen

TL;DR
This paper analyzes the co-occurrence problem in terms of a new parameter, gives a simple data structure with optimal space and matching query time, and simplifies existing solutions using intuitive combinatorial arguments.
Contribution
It presents a simple, space-optimal data structure for the co-occurrence problem, analyzed in terms of a new parameter d, and gives a simpler account of the bounds achieved by prior work.
Findings
The data structure uses O(d) space and answers queries in O(log log d) time.
O(d) space is proven to be optimal for the problem.
The bounds match the state of the art, with tight space complexity.
Abstract
Let S be a string of length n over an alphabet Σ and let Q be a subset of Σ of size q. The 'co-occurrence problem' is to construct a compact data structure that supports the following query: given an integer w, return the number of length-w substrings of S that contain each character of Q at least once. This is a natural string problem with applications to, e.g., data mining, natural language processing, and DNA analysis. The state of the art is an O(√(nq)) space data structure that, with some minor additions, supports queries in O(log log n) time [CPM 2021]. Our contributions are as follows. Firstly, we analyze the problem in terms of a new, natural parameter d, giving a simple data structure that uses O(d) space and supports queries in O(log log d) time. The preprocessing algorithm does a single pass over S, runs in expected…
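To make the query in the abstract concrete, here is a minimal Python sketch of a naive baseline that answers one co-occurrence query by sliding a window of length w over the string. This is not the paper's data structure: it takes time linear in the string length per query and stores nothing between queries. The function name `count_cooccurrences` and its parameter names are assumptions chosen for illustration.

```python
from collections import Counter


def count_cooccurrences(s: str, q: set, w: int) -> int:
    """Count the length-w substrings of s that contain every character of q.

    Naive sliding-window baseline for illustration only; it is not the
    compact data structure described in the paper.
    """
    n = len(s)
    if w <= 0 or w > n:
        return 0

    counts = Counter()  # occurrences of each q-character in the current window
    covered = 0         # how many distinct q-characters the window contains
    total = 0

    for i, ch in enumerate(s):
        # Extend the window to include position i.
        if ch in q:
            counts[ch] += 1
            if counts[ch] == 1:
                covered += 1

        # Drop the character that has just fallen out of the window.
        if i >= w:
            out = s[i - w]
            if out in q:
                counts[out] -= 1
                if counts[out] == 0:
                    covered -= 1

        # The window s[i-w+1 .. i] is complete once i >= w - 1.
        if i >= w - 1 and covered == len(q):
            total += 1

    return total
```

For example, with the string "abcab" and the character set {a, b}, the windows of length 2 are "ab", "bc", "ca", "ab", so `count_cooccurrences("abcab", {"a", "b"}, 2)` counts two of them. A compact data structure of the kind the abstract describes would precompute enough information to avoid rescanning the string on every query.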
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
