Exploring text-to-image generation for historical document image retrieval
Melissa Cote, Alexandra Branzan Albu

TL;DR
This paper explores using text-to-image AI generation to create query images from textual attribute descriptions, enhancing historical document retrieval by bridging attribute-based and query-by-example methods.
Contribution
It introduces T2I-QBE, a novel approach combining generative AI with traditional retrieval, specifically applied to historical documents, demonstrating its viability for the first time.
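The flow described above (attribute description, then a generated query image, then a query-by-example search) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template, the function names `attributes_to_prompt` and `t2i_qbe_search`, the `generate_features` callable (standing in for "generate an image from the prompt, then embed it"), and the use of cosine similarity for ranking are all hypothetical choices made for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def attributes_to_prompt(attributes: Sequence[str]) -> str:
    """Join attribute phrases into one text prompt (hypothetical template)."""
    return "a historical document image with " + ", ".join(attributes)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def t2i_qbe_search(
    attributes: Sequence[str],
    generate_features: Callable[[str], Sequence[float]],
    collection: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank collection documents against a query built from attributes.

    `generate_features` is an assumed stand-in for a T2I model followed by
    an image feature extractor; it maps a prompt to a feature vector.
    `collection` maps document ids to precomputed feature vectors.
    """
    prompt = attributes_to_prompt(attributes)
    query_vec = generate_features(prompt)
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Passing the generator and feature extractor as a single callable keeps the sketch independent of any particular T2I model or embedding network; in practice the two steps would be separate components.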
Findings
T2I-QBE improves retrieval effectiveness on historical documents.
Generative AI can produce relevant query images from text prompts.
This is the first application of T2I generation to document image retrieval.
Abstract
Attribute-based document image retrieval (ABDIR) was recently proposed as an alternative to query-by-example (QBE) searches, the dominant document image retrieval (DIR) paradigm. One drawback of QBE searches is that they require sample query documents on hand that may not be available. ABDIR aims to offer users a flexible way to retrieve document images based on memorable visual features of document contents, describing document images with combinations of visual attributes determined via convolutional neural network (CNN)-based binary classifiers. We present an exploratory study of the use of generative AI to bridge the gap between QBE and ABDIR, focusing on historical documents as a use case for their diversity and uniqueness in visual features. We hypothesize that text-to-image (T2I) generation can be leveraged to create query document images using text prompts based on ABDIR-like…
