MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis

Konstantin Bulatov; Ekaterina Emelianova; Daniil Tropin; Natalya Skoryukina; Yulia Chernyshova; Alexander Sheshkus; Sergey Usilin; Zuheng Ming; Jean-Christophe Burie; Muhammad Muzzamil Luqman; Vladimir V. Arlazarov

arXiv:2107.00396 · cs.CV · March 23, 2022

TL;DR

The paper introduces MIDV-2020, a large, diverse benchmark dataset for identity document analysis, including various document types, conditions, and annotations, to facilitate research in document recognition and fraud prevention.

Contribution

It provides the largest publicly available dataset with rich annotations for complex identity document analysis tasks, addressing previous dataset limitations.

Findings

01. Baseline results for document detection and recognition tasks.

02. Rich annotations enable comprehensive evaluation of recognition algorithms.

03. Dataset supports diverse document types and conditions.

Abstract

Identity document recognition is an important sub-field of document analysis, which deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation, given photos, scans, or video frames of an identity document capture. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, due to the subject matter being protected by security requirements. The few datasets of identity documents that are available lack diversity of document types, capturing conditions, or variability of document field values. In addition, the published datasets were typically designed only for a subset of document recognition problems, not for complex identity document analysis. In this paper, we present a dataset MIDV-2020…
