CNN training with graph-based sample preselection: application to handwritten character recognition

Frédéric Rayar; Masanori Goto; Seiichi Uchida

arXiv:1712.02122·cs.LG·March 7, 2018

TL;DR

This paper introduces a graph-based sample preselection method for CNN training that reduces dataset size without sacrificing accuracy, demonstrated on large handwritten character recognition datasets.

Contribution

It proposes a novel graph-structured data preselection approach for CNN training, improving efficiency in large-scale handwritten character recognition tasks.

Findings

01

Preselection reduces dataset size significantly.

02

Recognition accuracy remains stable despite data reduction.

03

Effective on datasets with hundreds of thousands of images.

Abstract

In this paper, we present a study on sample preselection in large training data sets for CNN-based classification. To do so, we structure the input data set in a network representation, namely the Relative Neighbourhood Graph, and then extract some vectors of interest. The proposed preselection method is evaluated in the context of handwritten character recognition, using two data sets of up to several hundred thousand images. It is shown that the graph-based preselection can reduce the training data set without degrading the recognition accuracy of a shallow CNN model that is not pretrained.
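
To make the graph-based idea concrete, here is a minimal sketch of a Relative Neighbourhood Graph (RNG) and one possible preselection rule. The RNG definition used is the standard one: two points are linked unless some third point is strictly closer to both of them than they are to each other. The selection rule is an assumption made for illustration, since the abstract does not say how the "vectors of interest" are chosen: here a sample is kept if it has at least one RNG neighbour with a different class label. The function names, the Euclidean distance, and the brute-force construction are also illustrative choices, not the authors' implementation.

```python
import numpy as np


def relative_neighbourhood_graph(X):
    """Return the RNG edges over the rows of X as a set of pairs (i, j), i < j."""
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # A third point r blocks the edge when it is strictly closer
            # to both i and j than i and j are to each other.
            # r = i and r = j never block, since max(...) equals D[i, j] there.
            blocked = np.maximum(D[i], D[j]) < D[i, j]
            if not blocked.any():
                edges.add((i, j))
    return edges


def preselect_boundary_samples(X, y):
    """Hypothetical rule: keep samples with an RNG neighbour of another class."""
    keep = set()
    for i, j in relative_neighbourhood_graph(X):
        if y[i] != y[j]:
            keep.update((i, j))
    return sorted(keep)


# Toy usage: two well-separated 1-D clusters with labels 0 and 1.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
selected = preselect_boundary_samples(X, y)
```

In this toy case only the two samples facing each other across the class boundary are kept, and the reduced set `X[selected], y[selected]` would be what is passed on to CNN training. The brute-force construction above costs cubic time in the number of samples, so it is only suitable for small examples; data sets with hundreds of thousands of images would need a faster or approximate construction.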

Taxonomy

Topics: Advanced Neural Network Applications · Face and Expression Recognition · Handwritten Text Recognition Techniques