A Multi-Modal Multilingual Benchmark for Document Image Classification

Yoshinari Fujinuma; Siddharth Varia; Nishant Sankaran; Srikar Appalaraju; Bonan Min; Yogarshi Vyas

arXiv:2310.16356 · cs.CL · October 26, 2023 · 1 citation

TL;DR

This paper introduces two new multilingual datasets for document image classification and evaluates existing Document AI models on them, revealing limitations in cross-lingual transfer that future work can address.

Contribution

The paper presents two curated multilingual datasets, WIKI-DOC and MULTIEURLEX-DOC, and conducts a comprehensive evaluation of Document AI models in new multi-label and zero-shot cross-lingual settings.
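To make the multi-label, zero-shot cross-lingual setting concrete, here is a minimal sketch of how such an evaluation could be scored. It assumes a model trained only on a source language and tested on held-out documents in other languages. The names `micro_f1`, `zero_shot_scores`, the `predict` callable, and the `lang`/`labels` fields are hypothetical, and micro-averaged F1 is an illustrative metric choice, not necessarily the one reported in the paper.

```python
from collections import defaultdict


def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of gold and predicted label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def zero_shot_scores(examples, predict, source_lang="en"):
    """Score each target language separately, skipping the training language.

    `examples` is an iterable of dicts with hypothetical keys "lang" and
    "labels"; `predict` is any callable returning an iterable of labels.
    """
    by_lang = defaultdict(lambda: ([], []))
    for ex in examples:
        if ex["lang"] == source_lang:
            continue  # zero-shot: only unseen languages are evaluated
        gold, pred = by_lang[ex["lang"]]
        gold.append(set(ex["labels"]))
        pred.append(set(predict(ex)))
    return {lang: micro_f1(g, p) for lang, (g, p) in by_lang.items()}
```

Reporting one score per target language, rather than a single pooled number, is what lets gaps between typologically close and distant languages show up.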

Findings

1. Existing models show limited cross-lingual transfer capabilities.
2. Multilingual Document AI models struggle with typologically distant languages.
3. New datasets facilitate future research in document image classification.

Abstract

Document image classification is different from plain-text document classification and consists of classifying a document by understanding the content and structure of documents such as forms and emails. We show that the only existing dataset for this task (Lewis et al., 2006) has several limitations, and we introduce two newly curated multilingual datasets, WIKI-DOC and MULTIEURLEX-DOC, that overcome these limitations. We further undertake a comprehensive study of popular visually-rich document understanding (Document AI) models in previously untested settings in document image classification, such as 1) multi-label classification and 2) zero-shot cross-lingual transfer. Experimental results show limitations of multilingual Document AI models on cross-lingual transfer across typologically distant languages. Our datasets and findings open the door for future…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies · Music and Audio Processing