Linking Representations with Multimodal Contrastive Learning
Abhishek Arora, Xinmei Yang, Shao-Yu Jheng, Melissa Dell

TL;DR
This paper introduces CLIPPINGS, a multimodal contrastive learning approach that aligns vision and language embeddings to improve record linkage, especially in noisy OCR scenarios, outperforming traditional string matching methods.
Contribution
The study develops a multimodal contrastive learning framework that uses both image crops and OCR'ed texts for record linkage, and reports performance gains over string matching methods.
Findings
CLIPPINGS outperforms string matching in linking historical Japanese firms.
Self-supervised multimodal models surpass traditional string matching techniques.
Multimodal pre-training improves the vision-only encoder's performance, even when only a single modality is used at inference.
Abstract
Many applications require linking individuals, firms, or locations across datasets. Most widely used methods, especially in social science, do not employ deep learning, with record linkage commonly approached using string matching techniques. Moreover, existing methods do not exploit the inherently multimodal nature of documents. In historical record linkage applications, documents are typically noisily transcribed by optical character recognition (OCR). Linkage with just OCR'ed texts may fail due to noise, whereas linkage with just image crops may also fail because vision models lack language understanding (e.g., of abbreviations or other different ways of writing firm names). To leverage multimodal learning, this study develops CLIPPINGS (Contrastively LInking Pooled Pre-trained Embeddings). CLIPPINGS aligns symmetric vision and language bi-encoders, through contrastive language-image…
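To make the linkage idea concrete, here is a minimal sketch of how pooled image and text embeddings could be used to match a record against candidates. It is an illustration under stated assumptions, not the authors' implementation: the function names (`l2_normalize`, `pool`, `link`), the choice of mean pooling, the cosine-similarity nearest-neighbor search, and the toy 3-dimensional vectors are all hypothetical. In practice the embeddings would come from the aligned vision and language encoders.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def pool(image_emb, text_emb):
    """Assumed pooling: average the unit-length image and text
    embeddings of one record, then renormalize."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return l2_normalize([(a + b) / 2 for a, b in zip(img, txt)])


def link(query, candidates):
    """Return (index, cosine similarity) of the closest candidate.
    Inputs are unit vectors, so the dot product is the cosine."""
    sims = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


# Toy embeddings standing in for encoder outputs (hypothetical values).
query = pool([0.9, 0.1, 0.0], [0.8, 0.2, 0.1])
candidates = [
    pool([0.0, 1.0, 0.0], [0.1, 0.9, 0.0]),
    pool([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    pool([0.0, 0.0, 1.0], [0.0, 0.1, 0.9]),
]
best_index, best_sim = link(query, candidates)
```

Because both modalities contribute to the pooled vector, a noisy OCR transcription can be partly offset by the image crop, and vice versa. That is the intuition the abstract describes.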
Taxonomy
Topics: Topic Modeling · Handwritten Text Recognition Techniques · Digital and Traditional Archives Management
