Integrating Visual and Textual Inputs for Searching Large-Scale Map Collections with CLIP

Jamie Mahowald; Benjamin Charles Germain Lee

arXiv:2410.01190·cs.IR·October 3, 2024

TL;DR

This paper demonstrates how multimodal CLIP embeddings can enable natural language, visual, and combined searches in large-scale map collections, enhancing exploration beyond traditional metadata-based methods.

Contribution

It introduces a multimodal search framework for maps using CLIP, including a fine-tuning dataset and code, advancing interactive map exploration techniques.

Findings

1. Effective multimodal search results for map queries
2. Identification of strengths and limitations of CLIP-based search
3. Open-source code for reproducibility and further research

Abstract

Despite the prevalence and historical importance of maps in digital collections, current methods of navigating and exploring map collections are largely restricted to catalog records and structured metadata. In this paper, we explore the potential for interactively searching large-scale map collections using natural language inputs ("maps with sea monsters"), visual inputs (i.e., reverse image search), and multimodal inputs (an example map + "more grayscale"). As a case study, we adopt 562,842 images of maps publicly accessible via the Library of Congress's API. To accomplish this, we use the multimodal Contrastive Language-Image Pre-training (CLIP) machine learning model to generate embeddings for these maps, and we develop code to implement exploratory search capabilities with these input strategies. We present results for example searches created in consultation with staff in the…
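
The three input strategies named in the abstract (text, image, and combined) can all be framed as nearest-neighbor lookups in a shared embedding space. The sketch below illustrates that idea only; it is not the authors' code. The vectors are random placeholders standing in for CLIP embeddings, the function names (`normalize`, `cosine`, `search`, `combine`) and the `text_weight` parameter are hypothetical, and blending the two inputs as a weighted sum of unit vectors is an assumption rather than the method confirmed by the paper. Plain Python is used to keep the example dependency-free.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search(query, collection, k=5):
    """Return the indices of the k embeddings most similar to the query."""
    scores = [(cosine(query, emb), idx) for idx, emb in enumerate(collection)]
    scores.sort(reverse=True)
    return [idx for _, idx in scores[:k]]


def combine(image_vec, text_vec, text_weight=0.5):
    """Blend an image and a text embedding into one query (assumed scheme)."""
    img, txt = normalize(image_vec), normalize(text_vec)
    mixed = [(1.0 - text_weight) * i + text_weight * t for i, t in zip(img, txt)]
    return normalize(mixed)


# Placeholder data: random vectors stand in for real CLIP embeddings.
random.seed(0)
DIM = 64  # illustrative size, not the dimension of any particular CLIP model
map_embeddings = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
example_map = map_embeddings[42]                         # "visual input"
text_query = [random.gauss(0, 1) for _ in range(DIM)]    # "natural language input"

image_hits = search(example_map, map_embeddings)         # reverse image search
text_hits = search(text_query, map_embeddings)           # text search
mixed_hits = search(combine(example_map, text_query, text_weight=0.3),
                    map_embeddings)                      # example map + text
```

In a real pipeline the placeholder vectors would be replaced by embeddings produced by a CLIP image encoder and text encoder, and `text_weight` would control how strongly the text modifier pulls results away from the example map.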

Code & Models

Repositories

j-mahowald/clip-loc-maps
PyTorch · Official

Taxonomy

Topics: Geographic Information Systems Studies · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques

Methods: Lib · Contrastive Language-Image Pre-training