On the Reproducibility of Experiments of Indexing Repetitive Document Collections
Antonio Fariña, Miguel A. Martínez-Prieto, Francisco Claude, Gonzalo Navarro, Juan J. Lastra-Díaz, Nicola Prezza, and Diego Seco

TL;DR
This paper provides a detailed framework and reproducibility package for replicating experiments on indexing techniques for highly repetitive document collections, ensuring transparency and reproducibility of prior research.
Contribution
It introduces uiHRDC, a comprehensive framework and package enabling exact replication of indexing experiments on repetitive collections, enhancing reproducibility in this research area.
Findings
Framework facilitates exact replication of experiments
Provides detailed experimental setup and parameters
Includes reproducibility package for ease of use
Abstract
This work introduces a companion reproducible paper that aims to allow the exact replication of the methods, experiments, and results discussed in a previous work [5]. In that parent paper, we proposed a wide variety of techniques for compressing indexes, all of which exploit the fact that highly repetitive collections are formed mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, with precise details about the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Image Retrieval and Classification Techniques
