# Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

**Authors:** Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki

arXiv: 1704.03557 · 2018-03-28

## TL;DR

This paper evaluates very deep CNN architectures (GoogLeNet, VGG, ResNet) and transfer learning strategies for document image classification, including pretraining on 400,000 document images, and reports 91.13% accuracy on Tobacco-3482 and 90.97% on RVL-CDIP.

## Contribution

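The paper makes three contributions: it investigates very deep networks (GoogLeNet, VGG, ResNet) with transfer learning from natural images, it proposes transfer learning from a large set of 400,000 document images, and it analyzes how the amount of training data and other parameters affect classification performance.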

## Key findings

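- Achieved 91.13% accuracy on Tobacco-3482, compared with 77.6% for earlier approaches, a relative error reduction of more than 60%.
- Achieved 90.97% accuracy on the large-scale RVL-CDIP dataset, a relative error reduction of 11.5%.
- Analyzed how the amount of training data (document images) and other parameters affect classification performance.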
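The error reductions quoted above are relative, not absolute: they measure what fraction of the previous error was removed. The sketch below reproduces that arithmetic from the accuracies given in the abstract. The function names are illustrative, not from the paper. The summary does not state a baseline for RVL-CDIP, so the second helper back-calculates one from the rounded 11.5% figure; treat that result as an approximate inference.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed; accuracies are in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy (percent) implied by a new accuracy and a relative reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    new_err = 100.0 - new_acc
    return 100.0 - new_err / (1.0 - reduction)


# Tobacco-3482: 77.6% (earlier approaches) vs. 91.13% (this paper).
tobacco = relative_error_reduction(77.6, 91.13)
print(f"Tobacco-3482 relative error reduction: {tobacco:.1%}")  # 60.4%

# RVL-CDIP: 90.97% accuracy with a stated 11.5% relative reduction.
# The baseline is inferred here, not quoted from the source.
rvl_baseline = implied_baseline_accuracy(90.97, 0.115)
print(f"RVL-CDIP implied baseline accuracy: {rvl_baseline:.1f}%")  # 89.8%
```

In absolute terms the Tobacco-3482 error drops from 22.4% to 8.87%, which is why the title can speak of cutting the error by more than half.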

## Abstract

We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half. Existing approaches, such as the DeepDocClassifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold: First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from real images). Second, it proposes transfer learning from a huge set of document images, i.e. 400,000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters to the classification abilities. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP dataset. We achieve an accuracy of 91.13% for the Tobacco-3482 dataset while earlier approaches reach only 77.6%. Thus, a relative error reduction of more than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is achieved, corresponding to a relative error reduction of 11.5%.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.03557/full.md

## Figures

37 figures with captions in the complete paper: https://tomesphere.com/paper/1704.03557/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1704.03557/full.md

---
Source: https://tomesphere.com/paper/1704.03557