INSTRUCT: Space-Efficient Structure for Indexing and Complete Query Management of String Databases
Sourav Dutta, Arnab Bhattacharya

TL;DR
INSTRUCT is a space-efficient data structure designed for indexing and managing large string databases, supporting various search types with improved space and query efficiency.
Contribution
The paper introduces INSTRUCT, a novel, space-efficient structure that supports comprehensive string queries and dynamic updates, outperforming existing solutions in space and speed.
Findings
INSTRUCT reduces memory usage by nearly 50% compared to existing structures.
It achieves faster query times for prefix, suffix, and substring searches.
It supports efficient insertion and deletion of strings.
Abstract
The tremendous expanse of search engines, dictionary and thesaurus storage, and other text mining applications, combined with the popularity of readily available scanning devices and optical character recognition tools, has necessitated efficient storage, retrieval and management of massive text databases for various modern applications. For such applications, we propose a novel data structure, INSTRUCT, for efficient storage and management of sequence databases. Our structure uses bit vectors for reusing the storage space for common triplets, and hence, has a very low memory requirement. INSTRUCT efficiently handles prefix and suffix search queries in addition to the exact string search operation by iteratively checking the presence of triplets. We also propose an extension of the structure to handle substring search efficiently, albeit with an increase in the space requirements. This…
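The abstract's idea of answering prefix and suffix queries by iteratively checking triplets against bit vectors can be illustrated with a small sketch. This is not the INSTRUCT structure itself, whose layout is not given in this summary. It is a simplified positional filter under assumed design choices: the class name `TripletFilter`, the per-offset bit vectors, and the method names are all hypothetical. Because it records only which triplets occur at which offset, it can report false positives, and queries shorter than three characters pass trivially.

```python
from collections import defaultdict


def triplets(s):
    """Return (start offset, 3-character window) pairs for every overlapping triplet in s."""
    return [(i, s[i:i + 3]) for i in range(len(s) - 2)]


class TripletFilter:
    """Hypothetical positional triplet filter; an illustration, not the paper's structure."""

    def __init__(self):
        self.triplet_id = {}          # triplet -> bit position, shared by all strings
        self.fwd = defaultdict(int)   # offset from string start -> bit vector (Python int)
        self.bwd = defaultdict(int)   # offset from string end   -> bit vector (Python int)

    def _id(self, t, create=False):
        if t not in self.triplet_id:
            if not create:
                return None
            self.triplet_id[t] = len(self.triplet_id)
        return self.triplet_id[t]

    def insert(self, s):
        n = len(s)
        for i, t in triplets(s):
            bit = 1 << self._id(t, create=True)
            self.fwd[i] |= bit            # common triplets reuse the same bit
            self.bwd[n - 3 - i] |= bit

    def _check(self, table, pairs):
        # Every triplet of the query must be present at its expected offset.
        for off, t in pairs:
            tid = self._id(t)
            if tid is None or not (table.get(off, 0) >> tid) & 1:
                return False
        return True

    def may_have_prefix(self, p):
        return self._check(self.fwd, triplets(p))

    def may_have_suffix(self, q):
        n = len(q)
        return self._check(self.bwd, [(n - 3 - i, t) for i, t in triplets(q)])


f = TripletFilter()
for word in ("database", "datatype"):
    f.insert(word)
print(f.may_have_prefix("data"), f.may_have_suffix("base"), f.may_have_prefix("base"))
```

A `False` answer from this sketch is definitive, while a `True` answer only means that no triplet check ruled the query out. A complete index would need additional bookkeeping to confirm matches and to support deletion.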
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing
