Evaluation of Neural Network Classification Systems on Document Stream

Joris Voerman; Aurelie Joseph; Mickael Coustaty; Vincent Poulain d'Andecy; Jean-Marc Ogier

arXiv:2007.07547 · cs.CV · July 16, 2020

TL;DR

This paper evaluates neural-network-based document classification in realistic industrial scenarios. It reveals significant performance drops for underrepresented classes and highlights the need for adaptation.

Contribution

The paper analyzes the efficiency of NN-based document classification under sub-optimal, real-world training conditions. It compares one image-based and two text-based approaches across several challenging scenarios.

Findings

01

Performance drops significantly in realistic cases

02

NN systems overfit well-represented classes

03

Underrepresented classes are poorly classified
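Findings 02 and 03 describe an effect that a single overall accuracy figure can hide. The following is illustrative only and is not the authors' evaluation code. It is a plain-Python helper that reports overall accuracy next to per-class recall and macro-averaged F1, which exposes a model that simply favours well-represented classes. The function name `per_class_report` and the class labels are hypothetical.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy, per-class recall and macro-averaged F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1  # correct prediction for this class
        else:
            fp[pred] += 1   # the predicted class gets a false positive
            fn[truth] += 1  # the true class gets a false negative
    accuracy = sum(tp.values()) / len(y_true)
    recall, f1 = {}, {}
    for c in labels:
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall[c] = r
        f1[c] = 2 * p * r / (p + r) if p + r else 0.0
    # Macro-F1 weights every class equally, however rare it is in the stream.
    macro_f1 = sum(f1.values()) / len(labels)
    return accuracy, recall, macro_f1


# Hypothetical imbalanced stream and a classifier that always predicts the majority class.
y_true = ["invoice"] * 90 + ["contract"] * 8 + ["letter"] * 2
y_pred = ["invoice"] * 100
accuracy, recall, macro_f1 = per_class_report(y_true, y_pred)
print(accuracy, recall, round(macro_f1, 2))
```

On this hypothetical stream of 90 invoices, 8 contracts and 2 letters, the always-`invoice` classifier reaches 0.9 accuracy. However, its recall on both minority classes is 0.0, and its macro-F1 is only about 0.32.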

Abstract

One major drawback of state-of-the-art Neural Network (NN)-based approaches for document classification is the large number of training samples required to obtain an efficient classification: the minimum is around one thousand annotated documents per class. In many cases, it is very difficult, if not impossible, to gather this number of samples in real industrial processes. In this paper, we analyse the efficiency of NN-based document classification systems in a sub-optimal training case, based on the situation of a company document stream. We evaluated three different approaches, one based on image content and two on textual content. The evaluation was divided into four parts: a reference case, to assess the performance of the system in the lab; two cases that each simulate a specific difficulty linked to document stream processing; and a realistic case…
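The abstract does not say how the sub-optimal training case was constructed. Purely as a hypothetical illustration, and not as the authors' protocol, one simple way to emulate a company stream is to cap the number of training samples kept for selected classes, so that those classes fall well short of roughly one thousand annotated documents. The helper name `cap_per_class` and its parameters are assumptions made for this sketch.

```python
import random
from collections import defaultdict


def cap_per_class(samples, caps, default_cap=None, seed=0):
    """Subsample (document, label) pairs so each label keeps at most its cap.

    caps maps a label to its maximum number of kept samples; labels missing
    from caps use default_cap (None means "keep everything").
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_label = defaultdict(list)
    for doc, label in samples:
        by_label[label].append((doc, label))
    kept = []
    for label, group in by_label.items():
        cap = caps.get(label, default_cap)
        if cap is not None and len(group) > cap:
            group = rng.sample(group, cap)  # random subset without replacement
        kept.extend(group)
    rng.shuffle(kept)  # avoid a label-sorted order in the training set
    return kept


# Hypothetical balanced training set: 1000 documents for each of three classes.
train = [(f"doc-{c}-{i}", c) for c in ("invoice", "contract", "letter") for i in range(1000)]
# Keep "letter" as an underrepresented class with only 20 examples.
reduced = cap_per_class(train, {"letter": 20})
print(len(reduced))
```

Training on `reduced` and scoring on an untouched test set would then show how much the capped class degrades. Setting `default_cap` shrinks every class at once instead.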
