TL;DR
NumtaDB is a publicly available dataset of more than 85,000 handwritten Bengali digit images. It was assembled to minimize biases from geographical location, gender, and age so that it can serve as a benchmark for developing and comparing digit recognition algorithms.
Contribution
The paper introduces NumtaDB, a large, publicly available dataset of handwritten Bengali digits. It documents how the dataset was collected and curated and reports its key statistics.
Findings
The dataset contains more than 85,000 images.
Biases originating from geographical location, gender, and age are minimized.
The dataset is intended as a public benchmark for Bengali handwritten digit recognition research.
Abstract
To benchmark Bengali digit recognition algorithms, a large, publicly available dataset is required that is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset consisting of more than 85,000 images of handwritten Bengali digits, has been assembled. This paper documents the collection and curation process of the numerals along with the salient statistics of the dataset.
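One basic statistic for a digit dataset is how its images are spread across the ten digit classes. The sketch below tallies per-digit counts and class shares from a labels CSV. It assumes a hypothetical file layout with one row per image and the label (0-9) stored in a column named "digit". That layout is an illustration only and is not taken from the paper, so adjust the column name to match whatever labels file you actually have.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Hypothetical column name: assumes one row per image with the digit label
# (0-9) stored in a column called "digit". Not taken from the paper.
LABEL_COLUMN = "digit"


def digit_counts(csv_file: TextIO, label_column: str = LABEL_COLUMN) -> Counter:
    """Count how many images carry each digit label."""
    reader = csv.DictReader(csv_file)
    counts: Counter = Counter()
    for row in reader:
        counts[int(row[label_column])] += 1
    return counts


def summarize(counts: Counter) -> dict:
    """Return the total image count, per-digit shares, and an imbalance ratio."""
    total = sum(counts.values())
    per_digit = [counts.get(d, 0) for d in range(10)]
    shares = {d: (per_digit[d] / total if total else 0.0) for d in range(10)}
    # Largest class size divided by smallest; infinite if some digit is absent.
    smallest = min(per_digit)
    imbalance = max(per_digit) / smallest if smallest else float("inf")
    return {"total": total, "shares": shares, "imbalance": imbalance}


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real labels file.
    sample = io.StringIO("filename,digit\na0.png,0\na1.png,1\na2.png,1\na3.png,9\n")
    counts = digit_counts(sample)
    stats = summarize(counts)
    print(stats["total"])  # 4
    print(counts[1])  # 2
```

An imbalance ratio near 1 indicates roughly equal class sizes. A large ratio suggests that accuracy should be reported per class rather than only in aggregate.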
