TL;DR
This paper introduces MIDV-500, a video dataset of 500 clips covering 50 identity document types, designed to advance research on identity document analysis and recognition on mobile devices. It provides ground-truth annotations and baseline evaluations.
Contribution
The paper presents MIDV-500, the first publicly available dataset designed specifically for identity document recognition in video. It also reports baseline evaluation results for face detection, text line recognition, and document field data extraction.
Findings
Existing face detection methods perform variably on the dataset.
Text line recognition accuracy varies across document types.
Document data extraction methods show potential but need improvement.
Abstract
A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few datasets are useful for associated subtasks, but more specialized datasets are required to support a more comprehensive scientific and technical approach to identity document recognition. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), which consists of 500 video clips of 50 different identity document types with ground truth and enables research on a wide range of document analysis problems. The paper presents characteristics of the dataset and evaluation results for existing methods of face detection, text line recognition, and document field data extraction. Since an important feature of identity documents is their…
