A compressed dynamic self-index for highly repetitive text collections
Takaaki Nishimoto, Yoshimasa Takabatake, Yasuo Tabei

TL;DR
This paper introduces TST-index, a new compressed dynamic self-index for highly repetitive texts that significantly improves pattern search speed while supporting dynamic updates.
Contribution
The paper presents the first compressed dynamic self-index that combines fast pattern search with dynamic update capabilities for highly repetitive texts.
Findings
TST-index significantly speeds up pattern search, addressing the slow short-pattern search of signature encoding.
TST-index supports dynamic updates in highly repetitive texts.
Experiments on a benchmark dataset of highly repetitive texts show significantly improved pattern search.
Abstract
We present a novel compressed dynamic self-index for highly repetitive text collections. Signature encoding is a compressed dynamic self-index for highly repetitive texts, but it has a major drawback: pattern search is slow for short patterns. We address this drawback by leveraging an idea behind the truncated suffix tree, and present TST-index, the first compressed dynamic self-index that supports both fast pattern search and dynamic updates of the index for highly repetitive texts. Experiments on a benchmark dataset of highly repetitive texts show that TST-index significantly improves pattern search.
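To give a feel for the truncated suffix tree idea mentioned in the abstract, here is a minimal, uncompressed sketch: a trie over every suffix of the text, cut off at depth q, so that any pattern of length at most q is located by a single walk from the root. This is not the TST-index itself. The class name `TruncatedSuffixTrie`, the method `locate`, and the parameter `q` are assumptions made for illustration, and the sketch omits both compression and dynamic updates.

```python
class TruncatedSuffixTrie:
    """Illustrative q-truncated suffix trie (hypothetical, uncompressed, static)."""

    def __init__(self, text, q):
        self.q = q
        self.root = {"children": {}, "positions": []}
        for start in range(len(text)):
            node = self.root
            # Insert at most q characters of the suffix beginning at `start`.
            for ch in text[start:start + q]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "positions": []}
                )
                # Each node records where its path label occurs in the text.
                node["positions"].append(start)

    def locate(self, pattern):
        """Return start positions of `pattern`, which must have length 1..q."""
        if not pattern or len(pattern) > self.q:
            raise ValueError("pattern length must be between 1 and q")
        node = self.root
        for ch in pattern:
            node = node["children"].get(ch)
            if node is None:
                return []  # pattern does not occur in the text
        return node["positions"]


index = TruncatedSuffixTrie("abracadabra", q=3)
print(index.locate("abr"))  # [0, 7]
print(index.locate("a"))    # [0, 3, 5, 7, 10]
```

Lookup cost in this sketch depends only on the pattern length and the number of reported positions, which is why a depth-limited structure suits short patterns. Longer patterns would need a separate mechanism, which the sketch does not attempt.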
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Natural Language Processing Techniques
