Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval

Sharva Gogawale; Gal Grudka; Daria Vasyutinsky-Shapira; Omer Ventura; Berat Kurar-Barakat; Nachum Dershowitz

arXiv:2604.08138 · cs.CV · April 14, 2026

TL;DR

This paper introduces Bag of Bags (BoB), a novel image representation for manuscript join retrieval that outperforms traditional Bag of Words methods by using fragment-specific vocabularies and set-to-set distance comparisons.
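To illustrate the set-to-set comparison mentioned above, here is a symmetric Chamfer distance between two local vocabularies. This is a minimal sketch under stated assumptions, not the authors' code. It assumes each vocabulary is a list of equal-length embedding vectors compared with Euclidean distance, and that the two directed terms are averaged and then summed. The paper's exact normalization (sum vs. mean, squared vs. plain distances) is not given here. The names `chamfer_distance` and `rank_gallery` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# One fragment's local visual words (hypothetical layout: a list of vectors).
Vocabulary = Sequence[Sequence[float]]


def chamfer_distance(vocab_a: Vocabulary, vocab_b: Vocabulary) -> float:
    """Symmetric Chamfer distance between two sets of visual words.

    Each word in one set is matched to its nearest word in the other set;
    the mean match distance is taken in both directions and the two are summed.
    """
    if not vocab_a or not vocab_b:
        raise ValueError("both vocabularies must be non-empty")
    a_to_b = sum(min(math.dist(a, b) for b in vocab_b) for a in vocab_a) / len(vocab_a)
    b_to_a = sum(min(math.dist(b, a) for a in vocab_a) for b in vocab_b) / len(vocab_b)
    return a_to_b + b_to_a


def rank_gallery(query: Vocabulary, gallery: Dict[str, Vocabulary]) -> List[str]:
    """Order gallery fragments by increasing Chamfer distance to the query."""
    return sorted(gallery, key=lambda name: chamfer_distance(query, gallery[name]))


if __name__ == "__main__":
    query = [[0.0, 0.0], [1.0, 0.0]]
    gallery = {"frag_x": [[0.0, 0.0], [1.0, 1.0]], "frag_y": [[5.0, 5.0]]}
    print(chamfer_distance(query, gallery["frag_x"]))  # 1.0
    print(rank_gallery(query, gallery))  # ['frag_x', 'frag_y']
```

Unlike a BoW histogram, this comparison needs no shared global codebook. Each fragment keeps its own words, and only nearest-neighbour matches between the two sets matter.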

Contribution

The paper proposes BoB, an adaptive visual-vocabulary method, together with a mass-weighted variant (BoB-OT) and a two-stage retrieval pipeline, improving both accuracy and efficiency in manuscript fragment retrieval.
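The fragment-specific vocabulary comes from clustering each image's own component embeddings with k-means. The sketch below uses plain Lloyd's iterations with a seeded random initialization. The value of `k`, the initialization scheme, Euclidean distance, and the definition of a word's "mass" as the fraction of the fragment's connected components assigned to it are all assumptions for illustration, not details confirmed by the paper. The function name `local_vocabulary` is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def local_vocabulary(
    embeddings: Sequence[Sequence[float]], k: int, iters: int = 50, seed: int = 0
) -> Tuple[List[List[float]], List[float]]:
    """Cluster one fragment's component embeddings into k local visual words.

    Returns (centroids, masses); masses[j] is the fraction of the fragment's
    connected components assigned to word j (an assumed notion of "mass").
    """
    points = [[float(x) for x in p] for p in embeddings]
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of components")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial words
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each component goes to its nearest current word.
        new_assign = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assign == assign:
            break  # converged: assignments did not change
        assign = new_assign
        # Update step: move each word to the mean of its assigned components.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    masses = [assign.count(j) / len(points) for j in range(k)]
    return centroids, masses


if __name__ == "__main__":
    words, masses = local_vocabulary([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
    print(sorted(words), masses)
```

Because every image is clustered independently, two fragments can end up with different numbers of words and different word positions. That is why a set-to-set distance, rather than a histogram comparison, is needed downstream.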

Findings

01. BoB (Chamfer) achieves 78% Hit@1 and 84% MRR, outperforming the strongest BoW baseline by 6.1%.

02. The mass-weighted variant, BoB-OT, comes with formal bounds on its approximation deviation.

03. The two-stage pipeline balances retrieval accuracy against computational cost.
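The accuracy/cost trade-off in finding 03 can be illustrated with a generic shortlist-then-rerank search. This is a sketch under explicit assumptions: the first stage here collapses each vocabulary to its mean word vector (a hypothetical cheap descriptor, `mean_word`), and only the shortlisted candidates are reranked with the Chamfer distance. The paper's actual first-stage representation and shortlist size are not specified in this summary. `two_stage_search` and `shortlist` are illustrative names.

```python
import math
from typing import Dict, List

Vocab = List[List[float]]


def chamfer(a: Vocab, b: Vocab) -> float:
    """Symmetric Chamfer distance (mean nearest-word distance, both directions)."""
    ab = sum(min(math.dist(x, y) for y in b) for x in a) / len(a)
    ba = sum(min(math.dist(y, x) for x in a) for y in b) / len(b)
    return ab + ba


def mean_word(vocab: Vocab) -> List[float]:
    """Collapse a vocabulary to one vector (hypothetical cheap descriptor)."""
    return [sum(col) / len(vocab) for col in zip(*vocab)]


def two_stage_search(query: Vocab, gallery: Dict[str, Vocab], shortlist: int = 50) -> List[str]:
    # Stage 1: a cheap single-vector comparison per gallery fragment builds a
    # shortlist. In practice the gallery descriptors would be precomputed offline.
    q = mean_word(query)
    coarse = sorted(gallery, key=lambda name: math.dist(q, mean_word(gallery[name])))
    candidates = coarse[:shortlist]
    # Stage 2: the expensive set-to-set comparison runs only on the shortlist.
    return sorted(candidates, key=lambda name: chamfer(query, gallery[name]))


if __name__ == "__main__":
    query = [[0.0, 0.0], [2.0, 0.0]]
    gallery = {
        "A": [[0.0, 0.0], [2.0, 0.0]],
        "B": [[1.0, 0.0], [1.0, 0.0]],
        "C": [[10.0, 10.0]],
    }
    print(two_stage_search(query, gallery, shortlist=2))  # ['A', 'B']
```

In the example, fragments A and B have the same mean word as the query, so the first stage cannot tell them apart. The Chamfer rerank separates them. The shortlist size controls the trade-off: a larger value recovers more true matches that the coarse stage ranked too low, but runs more expensive comparisons.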

Abstract

A join is a set of manuscript fragments identified as originally emanating from the same manuscript. We study manuscript join retrieval: Given a query image of a fragment, retrieve other fragments originating from the same physical manuscript. We propose Bag of Bags (BoB), an image-level representation that replaces the global-level visual codebook of classical Bag of Words (BoW) with a fragment-specific vocabulary of local visual words. Our pipeline trains a sparse convolutional autoencoder on binarized fragment patches, encodes connected components from each page, clusters the resulting embeddings with per-image k-means, and compares images using set-to-set distances between their local vocabularies. Evaluated on fragments from the Cairo Genizah, the best BoB variant (viz. Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, compared to 0.74 and 0.80, respectively, for the strongest BoW…
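The reported Hit@1 and MRR can be read as follows. For each query fragment, Hit@1 checks whether the top-ranked result belongs to the same join, and the reciprocal rank is 1/r for the rank r of the first same-join fragment. Both values are averaged over queries. The sketch below uses these standard definitions. The evaluation details (for example, how the query itself or queries with no retrieved partner are handled) are assumptions, and `hit_at_1_and_mrr` is a hypothetical name.

```python
from typing import Iterable, Sequence, Set, Tuple


def hit_at_1_and_mrr(results: Iterable[Tuple[Sequence[str], Set[str]]]) -> Tuple[float, float]:
    """Compute Hit@1 and mean reciprocal rank over a set of queries.

    Each item is (ranking, same_join): the retrieved fragment IDs in rank
    order, and the IDs of the other fragments in the query's join.
    """
    hits, rr_total, n_queries = 0, 0.0, 0
    for ranking, same_join in results:
        n_queries += 1
        reciprocal_rank = 0.0  # stays 0 if no join partner is retrieved
        for rank, fragment_id in enumerate(ranking, start=1):
            if fragment_id in same_join:
                reciprocal_rank = 1.0 / rank
                break
        if reciprocal_rank == 1.0:
            hits += 1  # the top-ranked result belongs to the same join
        rr_total += reciprocal_rank
    if n_queries == 0:
        raise ValueError("no queries given")
    return hits / n_queries, rr_total / n_queries


if __name__ == "__main__":
    runs = [
        (["a", "b"], {"a"}),            # partner at rank 1
        (["x", "y"], {"y"}),            # partner at rank 2
        (["p", "q", "r", "s"], {"s"}),  # partner at rank 4
        (["m"], {"z"}),                 # partner not retrieved
    ]
    print(hit_at_1_and_mrr(runs))  # (0.25, 0.4375)
```

MRR also rewards near-misses at rank 2 or 3, which is why it is higher than Hit@1 for both BoB and the BoW baseline.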

Code & Models

Repository: TAU-CH/midrash_bob (GitHub)
