NumtaDB - Assembled Bengali Handwritten Digits

Samiul Alam; Tahsin Reasat; Rashed Mohammad Doha; Ahmed Imtiaz Humayun

arXiv:1806.02452 · cs.CV · June 8, 2018

TL;DR

NumtaDB is a publicly available dataset of more than 85,000 handwritten Bengali digit images, assembled to limit biases from geographical location, gender, and age and to support benchmarking of digit recognition algorithms.

Contribution

The paper introduces NumtaDB, a large, publicly available dataset of Bengali handwritten digits, including its collection, curation process, and key statistics.

Findings

01. Dataset contains over 85,000 images.

02. Biases from location, gender, and age are minimized.

03. Provides a valuable resource for Bengali digit recognition research.

Abstract

To benchmark Bengali digit recognition algorithms, a large publicly available dataset is required which is free from biases originating from geographical location, gender, and age. With this aim in mind, NumtaDB, a dataset consisting of more than 85,000 images of hand-written Bengali digits, has been assembled. This paper documents the collection and curation process of numerals along with the salient statistics of the dataset.
