Constructing Image-Text Pair Dataset from Books

Yamato Okamoto; Haruto Toyonaga; Yoshihisa Ijiri; Hirokatsu Kataoka

arXiv:2310.01936·cs.CV·October 4, 2023

TL;DR

This paper presents a pipeline for creating image-text datasets from digitized books using OCR, object detection, and layout analysis, enabling machine learning to extract insights from archival images and texts.

Contribution

The paper introduces a dataset construction pipeline that automatically extracts image-text pairs from digitized books for use in machine learning.

Findings

01. Effective image-text retrieval demonstrated

02. Pipeline successfully applied to old photo books

03. Enables autonomous insight extraction from archives

Abstract

Digital archiving is becoming widespread owing to its effectiveness in protecting valuable books and providing knowledge to many people electronically. In this paper, we propose a novel approach to leverage digital archives for machine learning. If we can fully utilize such digitized data, machine learning has the potential to uncover unknown insights and ultimately acquire knowledge autonomously, just as humans do by reading books. As a first step, we design a dataset construction pipeline comprising an optical character reader (OCR), an object detector, and a layout analyzer for the autonomous extraction of image-text pairs. In our experiments, we apply our pipeline to old photo books to construct an image-text pair dataset, showing its effectiveness in image-text retrieval and insight extraction.
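
The abstract names three components (OCR, object detector, layout analyzer) but does not spell out how an image is matched to its text. As a rough illustration only, the sketch below assumes the detector and OCR have already produced bounding boxes for one page, and pairs each image region with the nearest text block by center distance. The nearest-block rule, the `Box` and `TextBlock` types, and the `pair_images_with_text` function are all hypothetical stand-ins, not the authors' layout analyzer.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


@dataclass(frozen=True)
class TextBlock:
    """One OCR result: where the text sits and what it says (hypothetical type)."""
    box: Box
    text: str


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two boxes."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def pair_images_with_text(image_boxes, text_blocks, max_distance=float("inf")):
    """Pair each image box with its nearest text block.

    Assumption: proximity on the page stands in for real layout analysis.
    Images with no text block within max_distance are left unpaired.
    """
    pairs = []
    for image_box in image_boxes:
        if not text_blocks:
            break
        nearest = min(text_blocks, key=lambda t: center_distance(image_box, t.box))
        if center_distance(image_box, nearest.box) <= max_distance:
            pairs.append((image_box, nearest.text))
    return pairs


# Toy page: two photos, each with a caption directly beneath it.
photos = [Box(0, 0, 100, 100), Box(200, 0, 300, 100)]
captions = [
    TextBlock(Box(0, 105, 100, 125), "Harbor at dawn"),
    TextBlock(Box(200, 105, 300, 125), "Market street"),
]
pairs = pair_images_with_text(photos, captions)
```

In practice the boxes would come from an object detector and an OCR engine run on each scanned page, and a real layout analyzer would likely consider reading order and column structure rather than distance alone.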

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications