# Fast convergence rates of deep neural networks for classification

**Authors:** Yongdai Kim, Ilsang Ohn, Dongha Kim

arXiv: 1812.03599 · 2019-06-19

## TL;DR

This paper shows that ReLU deep neural network classifiers trained with the hinge loss achieve fast convergence rates under three different assumptions on the true model, given a carefully chosen architecture, and that classifiers trained with cross-entropy achieve a fast rate when the conditional class probabilities of most data are close to 0 or 1.

## Contribution

The paper derives convergence rates for ReLU DNN classifiers trained with the hinge loss under three assumptions on the true model, gives a fast-rate result for classifiers trained by minimizing the cross-entropy, and reports a small numerical study comparing the two losses.

## Key findings

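- DNN classifiers with ReLU activation trained with the hinge loss achieve fast convergence rates in three cases: a smooth decision boundary, a smooth conditional class probability, and the margin condition (the probability of inputs near the decision boundary is small).
- These rates require a carefully selected architecture (number of layers, number of nodes, and sparsity).
- DNN classifiers trained by minimizing the cross-entropy achieve a fast convergence rate when the conditional class probabilities of most data are sufficiently close to either 0 or 1.
- A small numerical study compares the hinge loss and cross-entropy to confirm the theoretical explanation.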
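For orientation, the quantity such convergence rates usually bound is the excess misclassification risk of the learned score function. The block below writes it out in standard textbook notation. The symbols $\eta$, $C^*$, $\mathcal{E}$, $C$, and $q$ are assumptions introduced here for illustration, and the low-noise inequality is the Tsybakov-type condition common in the literature, shown only as one usual way to formalize "few inputs near the decision boundary"; the paper's exact conditions may be stated differently.

$$
\eta(x) = \Pr(Y = 1 \mid X = x), \qquad C^*(x) = \operatorname{sign}\bigl(2\eta(x) - 1\bigr),
$$

$$
\mathcal{E}(f) = \Pr\{Y \neq \operatorname{sign} f(X)\} - \Pr\{Y \neq C^*(X)\},
$$

$$
\Pr\{\,|2\eta(X) - 1| \le t\,\} \le C\, t^{q} \quad \text{for all sufficiently small } t > 0 .
$$

In this setting, a rate is typically called "fast" when $\mathcal{E}(\hat f_n)$ decays faster than $n^{-1/2}$ in the sample size $n$.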

## Abstract

We derive the fast convergence rates of a deep neural network (DNN) classifier with the rectified linear unit (ReLU) activation function learned using the hinge loss. We consider three cases for a true model: (1) a smooth decision boundary, (2) smooth conditional class probability, and (3) the margin condition (i.e., the probability of inputs near the decision boundary is small). We show that the DNN classifier learned using the hinge loss achieves fast convergence rates for all three cases provided that the architecture (i.e., the number of layers, number of nodes and sparsity) is carefully selected. An important implication is that DNN architectures are very flexible for use in various cases without much modification. In addition, we consider a DNN classifier learned by minimizing the cross-entropy, and show that the DNN classifier achieves a fast convergence rate under the condition that the conditional class probabilities of most data are sufficiently close to either 1 or 0. This assumption is not unusual for image recognition because human beings are extremely good at recognizing most images. To confirm our theoretical explanation, we present the results of a small numerical study conducted to compare the hinge loss and cross-entropy.
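The two losses the abstract compares can be sketched in a few lines. The following is a minimal illustration, assuming labels in {-1, +1} and a real-valued network score; the function names, the one-hidden-layer network, and its weights are hypothetical and are not taken from the paper, whose architectures are deeper and sparsity-constrained.

```python
import math


def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def cross_entropy_loss(y, score):
    """Logistic form of cross-entropy, log(1 + exp(-y * score)), y in {-1, +1}."""
    z = -y * score
    # Numerically stable evaluation of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def tiny_relu_net(x, w1, b1, w2, b2):
    """Scalar-input network with one hidden ReLU layer (illustrative only)."""
    hidden = [relu(w * x + b) for w, b in zip(w1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def empirical_risk(loss, data, score_fn):
    """Average loss of score_fn over (x, y) pairs."""
    return sum(loss(y, score_fn(x)) for x, y in data) / len(data)


# Hypothetical weights chosen so the network computes f(x) = relu(x) - relu(-x) = x.
def score_fn(x):
    return tiny_relu_net(x, w1=[1.0, -1.0], b1=[0.0, 0.0], w2=[1.0, -1.0], b2=0.0)


data = [(2.0, 1), (-0.5, -1)]
hinge_risk = empirical_risk(hinge_loss, data, score_fn)
ce_risk = empirical_risk(cross_entropy_loss, data, score_fn)
```

Both sample points are classified correctly, yet the hinge loss is exactly zero only for the point whose margin `y * score` is at least 1, while the cross-entropy stays strictly positive for every finite score.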

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.03599/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1812.03599/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1812.03599/full.md

---
Source: https://tomesphere.com/paper/1812.03599