Image-text matching for large-scale book collections

Artemis Llabrés, Arka Ujjal Dey, Dimosthenis Karatzas, and Ernest Valveny

arXiv:2407.19812·cs.CV·June 30, 2025

TL;DR

This paper presents a pipeline for matching the books in a large collection of images to entries in a book catalogue. It combines book-spine segmentation (SAM), commercial OCR, and a two-stage matching procedure, and is evaluated on a new dataset of annotated bookshelf images.

Contribution

It introduces a two-stage matching method (fast matching with CLIP embeddings, then refinement with either the Hungarian Algorithm or a BERT-based model) along with a new dataset for large-scale book collection image-text matching.

Findings

01. BERT-based matching outperforms fuzzy string matching

02. The Hungarian Algorithm improves matching accuracy

03. Limitations increase with larger target sets

Abstract

We address the problem of detecting and mapping all books in a collection of images to entries in a given book catalogue. Instead of performing independent retrieval for each book detected, we treat the image-text mapping problem as a many-to-many matching process, looking for the best overall match between the two sets. We combine a state-of-the-art segmentation method (SAM) to detect book spines and extract book information using a commercial OCR. We then propose a two-stage approach for text-image matching, where CLIP embeddings are used first for fast matching, followed by a second slower stage to refine the matching, employing either the Hungarian Algorithm or a BERT-based model trained to cope with noisy OCR input and partial text matches. To evaluate our approach, we publish a new dataset of annotated bookshelf images that covers the whole book collection of a public library in…
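The difference between independent retrieval and a joint match over the two sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the similarity scores are invented for illustration (in the paper's setting they would come from CLIP embeddings or the BERT-based model), the function names are hypothetical, and the joint match is found by exhaustive search rather than the Hungarian Algorithm. On an input this small, exhaustive search reaches the same optimum that the Hungarian Algorithm finds in polynomial time.

```python
from itertools import permutations

# Hypothetical similarity scores (rows: detected spines, columns: catalogue
# entries). These numbers are made up for illustration only.
SIM = [
    [0.90, 0.85, 0.10],
    [0.80, 0.20, 0.15],
    [0.30, 0.25, 0.70],
]


def independent_retrieval(sim):
    """Pick the highest-scoring catalogue entry for each spine on its own."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def joint_assignment(sim):
    """Find the one-to-one assignment with the highest total score.

    Exhaustive search over all assignments; only practical for tiny inputs.
    Assumes there are at least as many catalogue entries as spines.
    """
    n_spines, n_entries = len(sim), len(sim[0])
    best = max(
        permutations(range(n_entries), n_spines),
        key=lambda cols: sum(sim[i][j] for i, j in enumerate(cols)),
    )
    return list(best)


independent = independent_retrieval(SIM)
joint = joint_assignment(SIM)
print("independent:", independent)
print("joint:      ", joint)
```

With these scores, independent retrieval sends the first two spines to the same catalogue entry, while the joint assignment gives each spine a distinct entry by accepting a slightly lower score for the first spine.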

Code & Models

Repositories

llabres/library-dataset (PyTorch, official)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Mathematics, Computing, and Information Processing

Methods: Lib · Contrastive Language-Image Pre-training