TL;DR
This paper introduces Bag of Bags (BoB), a novel image representation for manuscript join retrieval that outperforms traditional Bag of Words methods by using fragment-specific vocabularies and set-to-set distance comparisons.
Contribution
The paper proposes a new adaptive visual vocabulary method, BoB, with a mass-weighted variant and a two-stage retrieval pipeline, improving accuracy and efficiency in manuscript fragment retrieval.
Findings
BoB (Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, versus 0.74 and 0.80 for the strongest BoW baseline (a reported 6.1% improvement).
Mass-weighted BoB-OT provides formal bounds on approximation deviation.
Two-stage pipeline balances retrieval accuracy and computational cost.
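For readers unfamiliar with the two metrics quoted in the findings, the following is a minimal sketch of how Hit@k and MRR are commonly computed for a retrieval task. It is not the paper's evaluation code: the function names and the convention that a query with no relevant result contributes 0 are assumptions made for illustration.

```python
from typing import Hashable, Optional, Sequence, Set


def first_hit_rank(ranked_ids: Sequence[Hashable],
                   relevant_ids: Set[Hashable]) -> Optional[int]:
    """Return the 1-based rank of the first relevant item, or None if absent."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant_ids:
            return rank
    return None


def hit_at_k(rankings: Sequence[Sequence[Hashable]],
             relevant: Sequence[Set[Hashable]],
             k: int = 1) -> float:
    """Fraction of queries whose first relevant item appears in the top k."""
    hits = 0
    for ranked, rel in zip(rankings, relevant):
        rank = first_hit_rank(ranked, rel)
        if rank is not None and rank <= k:
            hits += 1
    return hits / len(rankings)


def mean_reciprocal_rank(rankings: Sequence[Sequence[Hashable]],
                         relevant: Sequence[Set[Hashable]]) -> float:
    """Mean of 1/rank of the first relevant item (0 when none is retrieved)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        rank = first_hit_rank(ranked, rel)
        if rank is not None:
            total += 1.0 / rank
    return total / len(rankings)
```

In the join-retrieval setting, each query's relevant set would hold the other fragments of the same join, and each ranking would be the retrieval list returned for that query.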
Abstract
A join is a set of manuscript fragments identified as originally emanating from the same manuscript. We study manuscript join retrieval: Given a query image of a fragment, retrieve other fragments originating from the same physical manuscript. We propose Bag of Bags (BoB), an image-level representation that replaces the global-level visual codebook of classical Bag of Words (BoW) with a fragment-specific vocabulary of local visual words. Our pipeline trains a sparse convolutional autoencoder on binarized fragment patches, encodes connected components from each page, clusters the resulting embeddings with per-image k-means, and compares images using set-to-set distances between their local vocabularies. Evaluated on fragments from the Cairo Genizah, the best BoB variant (viz. Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, compared to 0.74 and 0.80, respectively, for the strongest BoW…
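The last two steps of the pipeline described in the abstract (per-image k-means, then a set-to-set distance between local vocabularies) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the embeddings are assumed to come from an already-trained encoder, the k-means is a plain Lloyd's iteration, and the Chamfer distance is written in one common symmetric form (sum of the two mean nearest-neighbor distances), which may differ from the exact variant used in the paper. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: Sequence[Vector], k: int,
           iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; returns k centroids (one image's local vocabulary)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points (requires k <= len(points)).
    centroids = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def chamfer(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Symmetric Chamfer distance: mean nearest-neighbor distance in each direction."""
    a_to_b = sum(min(euclidean(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(euclidean(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def rank_gallery(query_vocab: Sequence[Vector],
                 gallery_vocabs: Sequence[Sequence[Vector]]) -> List[int]:
    """Return gallery indices sorted by increasing Chamfer distance to the query."""
    distances = [chamfer(query_vocab, vocab) for vocab in gallery_vocabs]
    return sorted(range(len(gallery_vocabs)), key=lambda i: distances[i])
```

Because each image carries its own centroids, no global codebook is shared across fragments; images are compared only through the set-to-set distance between their vocabularies.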
