Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Hassan El-Hajj, Matteo Valleriani

TL;DR
This paper introduces a pipeline utilizing foundation models like GroundDINO and SAM to extract and evaluate visual data from historical documents, aiding dataset creation for humanities research.
Contribution
It presents a novel sequential approach for extracting visual elements from historical texts using text-image prompts and foundation models, addressing data scarcity in humanities datasets.
Findings
The pipeline retrieves a significant portion of the visual data in historical documents.
The choice of text prompt affects the resulting detections.
Potential for improved dataset creation in humanities.
Abstract
In this paper, we present a pipeline for image extraction from historical documents using foundation models, and evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity. The motivation for this approach stems, on the one hand, from the high interest of historians in visual elements printed alongside historical texts and, on the other hand, from the relative lack of well-annotated datasets within the humanities when compared to other domains. We propose a sequential approach that relies on GroundDINO and Meta's Segment-Anything-Model (SAM) to retrieve a significant portion of visual data from historical documents that can then be used for downstream development tasks and dataset creation. We also evaluate the effect of different linguistic prompts on the resulting detections.
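The sequential approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `detector` and `segmenter` callables are hypothetical stand-ins for a text-prompted detector such as GroundDINO and a box-prompted segmenter such as SAM, their signatures are assumptions made for illustration, and the `box_threshold` default of 0.35 is an arbitrary placeholder value. The sketch only shows the control flow: each text prompt yields candidate boxes, low-confidence boxes are discarded, and each remaining box is passed to the segmenter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    prompt: str          # text prompt that produced this detection
    box: Box             # bounding box returned by the detector
    score: float         # detector confidence for the box
    mask: object = None  # segmentation result for the box


# Hypothetical interfaces (assumptions, not the real GroundDINO or SAM APIs):
#   detector(image, prompt) -> list of (box, score) pairs
#   segmenter(image, box)   -> mask for the region inside the box
Detector = Callable[[object, str], List[Tuple[Box, float]]]
Segmenter = Callable[[object, Box], object]


def extract_visual_elements(
    image: object,
    prompts: Sequence[str],
    detector: Detector,
    segmenter: Segmenter,
    box_threshold: float = 0.35,  # illustrative value, not taken from the paper
) -> Dict[str, List[Detection]]:
    """Run detection, then segmentation, once per text prompt."""
    results: Dict[str, List[Detection]] = {}
    for prompt in prompts:
        kept: List[Detection] = []
        for box, score in detector(image, prompt):
            if score < box_threshold:
                continue  # drop low-confidence boxes before segmenting
            kept.append(Detection(prompt, box, score, segmenter(image, box)))
        results[prompt] = kept
    return results


def detections_per_prompt(results: Dict[str, List[Detection]]) -> Dict[str, int]:
    """Count retained detections per prompt, a crude way to compare prompts."""
    return {prompt: len(dets) for prompt, dets in results.items()}
```

Keeping the two models behind plain callables lets the same loop be rerun with different prompt lists, so the per-prompt counts can be compared against annotated pages when those are available.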
Taxonomy
Topics: Image Retrieval and Classification Techniques · Digital Humanities and Scholarship · Handwritten Text Recognition Techniques
