On Web-based Visual Corpus Construction for Visual Document Understanding

Donghyun Kim; Teakgyu Hong; Moonbin Yim; Yoonsik Kim; Geewook Kim

arXiv:2211.03256 · cs.CV · May 3, 2023

TL;DR

This paper introduces Webvicob, a web-based tool that builds large-scale, multilingual visual corpora from Wikipedia HTML data to improve visual document understanding models, especially for resource-scarce languages.

Contribution

Webvicob is a novel dataset generator that constructs extensive visual corpora from raw web data, improving VDU model training and performance on downstream tasks.

Findings

01. Generated datasets improve VDU model accuracy
02. Over 13% improvement on DocVQA with Webvicob data
03. Public implementation available for community use

Abstract

In recent years, research on visual document understanding (VDU) has grown significantly, with a particular emphasis on the development of self-supervised learning methods. However, one of the significant challenges faced in this field is the limited availability of publicly accessible visual corpora or extensive collections of images with detailed text annotations, particularly for non-Latin or resource-scarce languages. To address this challenge, we propose Web-based Visual Corpus Builder (Webvicob), a dataset generator engine capable of constructing large-scale, multilingual visual corpora from raw Wikipedia HTML dumps. Our experiments demonstrate that the data generated by Webvicob can be used to train robust VDU models that perform well on various downstream tasks, such as DocVQA and post-OCR parsing. Furthermore, when using a dataset of 1 million images generated by Webvicob, we…
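As a rough illustration of the first step the abstract describes (turning raw HTML into text annotations), the sketch below pulls visible text blocks out of an HTML string using only the Python standard library. It is not Webvicob's implementation: the names `VisibleTextExtractor` and `extract_annotations` and the record layout are hypothetical, and a real visual corpus also needs rendered page images and bounding boxes, which this sketch omits.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect visible text per block-level element (hypothetical helper).

    Simplifications: script/style content is skipped, only a few block
    tags are recognized, and nested blocks of the same tag are not handled.
    """

    SKIP = {"script", "style"}
    BLOCK = {"p", "h1", "h2", "h3", "li", "td", "th"}

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished annotation records
        self._skip_depth = 0    # > 0 while inside <script> or <style>
        self._current = None    # (tag, [text pieces]) for the open block

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK and self._current is None:
            self._current = (tag, [])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif self._current is not None and tag == self._current[0]:
            # Join the pieces and normalize runs of whitespace.
            text = " ".join("".join(self._current[1]).split())
            if text:
                # Whitespace splitting is an assumption that suits
                # space-delimited scripts only, not e.g. Chinese or Japanese.
                self.blocks.append(
                    {"tag": tag, "text": text, "words": text.split()}
                )
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current is not None:
            self._current[1].append(data)


def extract_annotations(html):
    """Return a list of {"tag", "text", "words"} records for one page."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `extract_annotations` on a single page's HTML returns one record per recognized block. Pairing each record with pixel coordinates would require rendering the page, for example in a headless browser, which is outside the scope of this sketch.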

Code & Models

Repositories

clovaai/webvicob (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization