Position Heaps for Cartesian-tree Matching on Strings and Tries

Akio Nishimoto; Noriki Fujisato; Yuto Nakashima; Shunsuke Inenaga

arXiv:2106.01595·cs.DS·August 17, 2021


TL;DR

This paper introduces the Cartesian-tree Position Heap (CPH), an indexing structure for Cartesian-tree pattern matching on strings and tries. On strings it answers queries in O(m(σ + log(min{h, m})) + occ) time and is built in O(n log σ) time.

Contribution

The paper presents the CPH, a new index for Cartesian-tree pattern matching on strings and tries, together with query and construction algorithms for both settings.

Findings

1. Supports pattern matching in O(m(σ + log(min{h, m})) + occ) time for strings.

2. Supports pattern matching in O(m(σ^2 + log(min{h, m})) + occ) time for tries.

3. Constructs the CPH in O(n log σ) time for strings and O(Nσ) time for tries.

Abstract

Cartesian-tree pattern matching is a recently introduced pattern matching scheme that detects fragments of a sequential data stream whose structure is similar to that of a query pattern. Formally, Cartesian-tree pattern matching seeks all substrings $S'$ of the text string $S$ such that the Cartesian tree of $S'$ and that of a query pattern $P$ coincide. In this paper, we present a new indexing structure for this problem called the Cartesian-tree Position Heap (CPH). Let $n$ be the length of the input text string $S$, $m$ the length of a query pattern $P$, and $\sigma$ the alphabet size. We show that the CPH of $S$, denoted $CPH(S)$, supports pattern matching queries in $O(m(\sigma + \log(\min\{h, m\})) + occ)$ time with $O(n)$ space, where $h$ is the height of the CPH and $occ$ is the number of pattern occurrences. We show how to build $CPH(S)$ in $O(n \log…
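To make the matching condition in the abstract concrete, here is a minimal Python sketch of a naive baseline. It is not the CPH and not the authors' algorithm. It assumes the standard parent-distance encoding of a Cartesian tree (with the leftmost minimum as the root, two sequences have the same Cartesian tree exactly when their encodings are equal) and simply re-encodes every length-$m$ window of the text, which takes $O(nm)$ time. The function names `parent_distance` and `ct_match_naive` are hypothetical.

```python
def parent_distance(s):
    """Parent-distance encoding of a sequence.

    pd[i] = i - j for the nearest j < i with s[j] <= s[i], or 0 if no such j.
    Assumption: ties are broken so that the leftmost minimum is the root.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(s):
        # Discard earlier positions holding strictly larger values.
        while stack and s[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def ct_match_naive(text, pattern):
    """Return all start positions i such that text[i:i+m] and pattern
    have the same Cartesian tree (compared via parent-distance encodings)."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]
```

For example, `ct_match_naive([5, 3, 4, 6, 1, 2], [2, 1, 3])` returns `[0, 3]`: the windows `[5, 3, 4]` and `[6, 1, 2]` both have their minimum in the middle, with one element on each side, just like the pattern. An index such as the CPH exists to avoid this per-window recomputation.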


Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing