Exploring text-to-image generation for historical document image retrieval

Melissa Cote; Alexandra Branzan Albu

arXiv:2507.20934 · cs.CV · July 29, 2025

TL;DR

This paper explores using text-to-image AI generation to create query images from textual attribute descriptions, enhancing historical document retrieval by bridging attribute-based and query-by-example methods.

Contribution

The paper introduces T2I-QBE, an approach that uses generative AI to produce query images for conventional query-by-example retrieval. It is applied to historical documents, and the authors present it as the first demonstration that the approach is viable.

Findings

01 T2I-QBE improves retrieval effectiveness on historical documents.

02 Generative AI can produce relevant query images from text prompts.

03 First application of T2I generation for document image retrieval.

Abstract

Attribute-based document image retrieval (ABDIR) was recently proposed as an alternative to query-by-example (QBE) searches, the dominant document image retrieval (DIR) paradigm. One drawback of QBE searches is that they require sample query documents on hand that may not be available. ABDIR aims to offer users a flexible way to retrieve document images based on memorable visual features of document contents, describing document images with combinations of visual attributes determined via convolutional neural network (CNN)-based binary classifiers. We present an exploratory study of the use of generative AI to bridge the gap between QBE and ABDIR, focusing on historical documents as a use case for their diversity and uniqueness in visual features. We hypothesize that text-to-image (T2I) generation can be leveraged to create query document images using text prompts based on ABDIR-like…
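
The idea described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that attribute phrases are joined into a text prompt, that some T2I model (not shown) turns the prompt into a query image, and that the query and archive images are compared as feature vectors by cosine similarity. The function names, the prompt template, the attribute phrases, and the toy vectors are all hypothetical.

```python
import math


def build_prompt(attributes):
    """Join ABDIR-like attribute phrases into one text prompt (hypothetical template)."""
    cleaned = [a.strip() for a in attributes if a.strip()]
    return "A scanned historical document page with " + ", ".join(cleaned)


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_example(query_vec, archive):
    """Return archive document ids sorted by descending similarity to the query."""
    return sorted(archive, key=lambda doc_id: cosine(query_vec, archive[doc_id]), reverse=True)


# Hypothetical attribute phrases a user might remember about a document.
prompt = build_prompt(["handwritten text", "a decorated initial", "marginal notes"])

# Assumption: stand-in for the feature vector of an image generated from `prompt`.
# The T2I generation and feature extraction steps are deliberately omitted.
query_vec = [0.9, 0.1, 0.4]

# Toy archive: document id -> feature vector (made-up values for illustration).
archive = {
    "doc_a": [0.8, 0.2, 0.5],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}

ranking = rank_by_example(query_vec, archive)
print(prompt)
print(ranking)
```

In this sketch the generated image only serves as the query for an ordinary QBE search, so the retrieval step itself does not need to know that the query came from a text prompt.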
