CNN training with graph-based sample preselection: application to handwritten character recognition
Frédéric Rayar, Masanori Goto, Seiichi Uchida

TL;DR
This paper introduces a graph-based sample preselection method for CNN training that reduces dataset size without sacrificing accuracy, demonstrated on large handwritten character recognition datasets.
Contribution
It proposes a novel graph-structured data preselection approach for CNN training, improving efficiency in large-scale handwritten character recognition tasks.
Findings
Preselection reduces dataset size significantly.
Recognition accuracy remains stable despite data reduction.
Effective on datasets with hundreds of thousands of images.
Abstract
In this paper, we present a study on sample preselection in large training data sets for CNN-based classification. To do so, we structure the input data set as a network representation, namely the Relative Neighbourhood Graph, and then extract some vectors of interest. The proposed preselection method is evaluated in the context of handwritten character recognition using two data sets of up to several hundred thousand images. It is shown that graph-based preselection can reduce the training data set without degrading the recognition accuracy of a shallow, non-pretrained CNN model.
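The abstract does not spell out how the Relative Neighbourhood Graph (RNG) is built or which vectors count as "of interest", so the following is only a minimal sketch under stated assumptions. It uses the standard RNG definition: samples p and q are linked unless some third sample r is strictly closer to both of them than they are to each other. It then assumes, hypothetically, that the vectors of interest are the samples with at least one RNG neighbour of a different class (i.e. samples near class boundaries). The names `rng_edges` and `preselect` are illustrative, the samples are taken to be plain feature vectors (e.g. flattened pixel values), and the brute-force construction is cubic in the number of samples, so it is far too slow for data sets of the size reported in the abstract.

```python
import math
from itertools import combinations


def rng_edges(points):
    """Brute-force Relative Neighbourhood Graph (O(n^3), illustration only).

    Returns the list of index pairs (p, q), p < q, that are joined in the RNG.
    """
    n = len(points)
    # Pairwise Euclidean distances between all samples.
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    edges = []
    for p, q in combinations(range(n), 2):
        # (p, q) is an RNG edge unless some other sample r is strictly closer
        # to both p and q than p and q are to each other (their "lune" is empty).
        blocked = any(
            max(d[p][r], d[q][r]) < d[p][q]
            for r in range(n)
            if r != p and r != q
        )
        if not blocked:
            edges.append((p, q))
    return edges


def preselect(points, labels):
    """Keep samples with at least one RNG neighbour of another class.

    This boundary-based selection criterion is an assumption for illustration,
    not necessarily the exact rule used in the paper.
    """
    keep = set()
    for p, q in rng_edges(points):
        if labels[p] != labels[q]:
            keep.update((p, q))
    return sorted(keep)


if __name__ == "__main__":
    # Six evenly spaced 2-D samples on a line, three per class.
    pts = [(float(i), 0.0) for i in range(6)]
    lbl = [0, 0, 0, 1, 1, 1]
    print(rng_edges(pts))       # consecutive pairs only
    print(preselect(pts, lbl))  # the two samples at the class boundary
```

In the usage example, the RNG of evenly spaced collinear points links only consecutive samples, so the single cross-class edge is (2, 3) and `preselect` keeps indices `[2, 3]`. The retained subset would then serve as the reduced training set for the CNN.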
Taxonomy
Topics: Advanced Neural Network Applications · Face and Expression Recognition · Handwritten Text Recognition Techniques
