Efficient tree-structured categorical retrieval
Djamal Belazzougui, Gregory Kucherov

TL;DR
This paper introduces efficient data structures and algorithms for fast retrieval of documents within a category tree based on pattern matching, optimizing for space and query time in hierarchical classification systems.
Contribution
It presents novel solutions for categorical document retrieval in tree-structured categories, balancing space efficiency and query performance.
Findings
Achieves $O(|p|+t)$ query time, where $|p|$ is the pattern length and $t$ is the number of categorical units reported, with space proportional to document length and category tree size.
Provides alternative solutions with reduced space at the cost of slightly increased query time.
Offers practical methods for hierarchical document retrieval in taxonomy-based systems.
Abstract
We study a document retrieval problem in a new framework where text documents are organized in a {\em category tree} with a pre-defined number of categories. This situation occurs e.g. with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern and a category (level in the category tree), we wish to efficiently retrieve the \emph{categorical units} containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them achieves $O(|p|+t)$ query time, with a space bound (in bits) expressed in terms of the total length of the documents, the size of the alphabet used in the documents, and the total number of nodes in the category tree. Another solution uses bits of…
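To make the query in the abstract concrete, here is a minimal Python sketch of the problem statement only. It is a naive baseline that scans every document, not the compact index the paper proposes, and it makes no attempt at the $O(|p|+t)$ bound. The `Node` class, the function names, and the toy taxonomy are all hypothetical, and the sketch assumes that a "category" is a depth in the tree (root at depth 0) and that a categorical unit is a node at that depth.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a hypothetical category tree; documents may hang off any node."""
    name: str
    children: list["Node"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def contains_pattern(node: Node, pattern: str) -> bool:
    """True if some document in the subtree rooted at `node` contains `pattern`."""
    if any(pattern in doc for doc in node.documents):
        return True
    return any(contains_pattern(child, pattern) for child in node.children)


def categorical_units(root: Node, pattern: str, level: int) -> list[str]:
    """Names of the nodes at depth `level` whose subtree contains `pattern`."""
    frontier = [root]
    for _ in range(level):
        # Descend one level: replace each node by its children.
        frontier = [child for node in frontier for child in node.children]
    return [node.name for node in frontier if contains_pattern(node, pattern)]


# Toy taxonomy (made-up data): class -> family, with short sequences as documents.
root = Node("Life", children=[
    Node("Mammalia", children=[
        Node("Felidae", documents=["ACGTAC"]),
        Node("Canidae", documents=["TTGACA"]),
    ]),
    Node("Aves", children=[
        Node("Corvidae", documents=["GGTACC"]),
    ]),
])

print(categorical_units(root, "TAC", 1))  # units at the class level
print(categorical_units(root, "GAC", 2))  # units at the family level
```

The same pattern gives different answers at different levels: each matching unit is reported once, however many of its documents contain the pattern. A real index would avoid rescanning the text on every query.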
Taxonomy
Topics: Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques · Machine Learning and Algorithms
