Image-text matching for large-scale book collections
Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, and Ernest Valveny

TL;DR
This paper presents a pipeline for matching the books visible in a set of images against a catalogue at the scale of a whole library collection. It combines book-spine segmentation, OCR, and a two-stage matching procedure, and is evaluated on a newly published dataset of annotated bookshelf images.
Contribution
It introduces a two-stage matching method, in which fast matching on CLIP embeddings is refined by either the Hungarian Algorithm or a BERT-based model, along with a new dataset for large-scale book collection image-text matching.
Findings
BERT-based matching outperforms fuzzy string matching
Hungarian Algorithm improves matching accuracy
Matching becomes harder as the target set grows
Abstract
We address the problem of detecting and mapping all books in a collection of images to entries in a given book catalogue. Instead of performing independent retrieval for each book detected, we treat the image-text mapping problem as a many-to-many matching process, looking for the best overall match between the two sets. We combine a state-of-the-art segmentation method (SAM) to detect book spines and extract book information using a commercial OCR. We then propose a two-stage approach for text-image matching, where CLIP embeddings are used first for fast matching, followed by a second slower stage to refine the matching, employing either the Hungarian Algorithm or a BERT-based model trained to cope with noisy OCR input and partial text matches. To evaluate our approach, we publish a new dataset of annotated bookshelf images that covers the whole book collection of a public library in…
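The difference between independent retrieval and matching the two sets as a whole can be illustrated with a small sketch. This is not the authors' implementation: the score matrix is made up for illustration, the function names are hypothetical, and the exhaustive search over permutations stands in for the Hungarian Algorithm (it finds the same optimal one-to-one assignment, but is only practical for a handful of items).

```python
from itertools import permutations

# Hypothetical similarity scores: rows are detected book spines,
# columns are catalogue entries. In practice these could come from
# any text-image or text-text similarity model.
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.30, 0.20],
    [0.10, 0.20, 0.70],
]


def independent_retrieval(scores):
    """Each spine picks its highest-scoring catalogue entry on its own."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def global_assignment(scores):
    """One-to-one assignment maximising the total score.

    Exhaustive search, used here only as a stand-in for the Hungarian
    Algorithm. Assumes there are at least as many columns as rows.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    best = max(
        permutations(range(n_cols), n_rows),
        key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


print(independent_retrieval(scores))  # [0, 0, 2]: two spines claim entry 0
print(global_assignment(scores))      # [1, 0, 2]: every entry used once
```

With independent retrieval, the first two spines both select catalogue entry 0. The global assignment gives entry 0 to the second spine, which has no good alternative, and moves the first spine to its second-best entry, raising the total score.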
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Mathematics, Computing, and Information Processing
Methods: Lib · Contrastive Language-Image Pre-training
