Overfitting Mechanism and Avoidance in Deep Neural Networks
Shaeke Salman, Xiuwen Liu

TL;DR
This paper investigates the causes of overfitting in deep neural networks. It identifies continuous gradient updating and the scale sensitivity of the cross-entropy loss as key factors. It then proposes a consensus-based classification method that improves accuracy and reduces overfitting, especially when training data are limited.
Contribution
The paper introduces a novel consensus-based classification algorithm. The algorithm mitigates overfitting by reducing extrinsic factors and avoiding overgeneralization, and it outperforms traditional ensemble methods.
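The following is a minimal sketch of consensus-based classification. It assumes that "consensus" means unanimous agreement among independently trained models, and that the classifier abstains when the models disagree. The paper's exact rule may differ. The helper names `consensus_predict` and `selective_accuracy` are hypothetical. Each input is a list of per-model predicted class indices.

```python
from typing import Optional, Sequence, Tuple


def consensus_predict(predictions: Sequence[int]) -> Optional[int]:
    """Return the label all models agree on, or None to abstain.

    Assumption (hypothetical rule): consensus means unanimous agreement;
    the paper's actual criterion may differ.
    """
    if not predictions:
        return None
    first = predictions[0]
    return first if all(p == first for p in predictions) else None


def selective_accuracy(
    per_sample_preds: Sequence[Sequence[int]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy on accepted samples, fraction of samples accepted)."""
    accepted = correct = 0
    for preds, y in zip(per_sample_preds, labels):
        label = consensus_predict(preds)
        if label is None:  # models disagree: treat the input as ambiguous and abstain
            continue
        accepted += 1
        correct += int(label == y)
    coverage = accepted / len(labels) if labels else 0.0
    accuracy = correct / accepted if accepted else 0.0
    return accuracy, coverage
```

Consider per-model predictions `[[1, 1], [2, 3], [0, 0]]` with labels `[1, 2, 1]`. The second sample is rejected because the models disagree. The function returns an accuracy of 0.5 at a coverage of 2/3. Abstaining trades coverage for accuracy on the accepted inputs, so the sketch reports both numbers.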
Findings
The proposed method achieves 95% accuracy on MNIST with only 1000 samples.
Consensus among multiple models reduces extrinsic factors exponentially.
The algorithm effectively avoids overgeneralization on ambiguous inputs.
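One way to read the "exponential" claim rests on an idealized assumption that is ours, not necessarily the paper's. Suppose each of k models independently misclassifies a given input with the same probability ε. A wrong consensus requires every model to err, so its probability shrinks geometrically in k:

```latex
P(\text{consensus on a wrong label})
  \le P(\text{all } k \text{ models err})
  = \prod_{i=1}^{k} \varepsilon = \varepsilon^{k},
\qquad \text{e.g. } \varepsilon = 0.1,\ k = 3 \;\Rightarrow\; \varepsilon^{k} = 10^{-3}.
```

Networks trained on the same data usually make correlated errors, so this independence bound is optimistic.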
Abstract
Assisted by the availability of data and high-performance computing, deep learning techniques have achieved breakthroughs and empirically surpassed human performance in difficult tasks, including object recognition, speech recognition, and natural language processing. As they are being used in critical applications, understanding the underlying mechanisms of their successes and limitations is imperative. In this paper, we show that overfitting, one of the fundamental issues in deep neural networks, is due to continuous gradient updating and the scale sensitivity of the cross-entropy loss. By separating samples into correctly and incorrectly classified ones, we show that the two groups behave very differently: the loss decreases on the correctly classified samples and increases on the incorrectly classified ones. Furthermore, by analyzing dynamics during training, we propose a consensus-based classification algorithm that…
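The following plain-Python sketch illustrates the two effects the abstract names. It rests on our own interpretation that "scale sensitivity" refers to multiplying the logits by a factor s. Under that reading, scaling up the logits lowers the cross-entropy loss of a correctly classified sample and raises the loss of a misclassified one. Separately, the gradient with respect to the logits equals softmax(z) minus the one-hot label. That gradient never vanishes for finite logits, so gradient updates do not stop on their own. The logits and the helper names (`cross_entropy`, `grad_wrt_logits`, `scaled_losses`) are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def grad_wrt_logits(logits: Sequence[float], label: int) -> List[float]:
    """Gradient of cross_entropy w.r.t. the logits: softmax(logits) - one_hot(label)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total - (1.0 if j == label else 0.0) for j, e in enumerate(exps)]


def scaled_losses(
    logits: Sequence[float], label: int, scales: Sequence[float]
) -> List[float]:
    """Loss of the same sample after multiplying every logit by each scale s."""
    return [cross_entropy([s * z for z in logits], label) for s in scales]


# Illustrative logits for a 3-class problem; class 0 has the largest logit.
logits = [2.0, 1.0, 0.5]
scales = [1.0, 2.0, 4.0, 8.0]

# Label 0: the sample is classified correctly, so scaling up drives the loss toward 0.
correct = scaled_losses(logits, 0, scales)
# Label 1: the sample is misclassified, so scaling up makes the loss grow.
wrong = scaled_losses(logits, 1, scales)

print([round(v, 4) for v in correct])
print([round(v, 4) for v in wrong])

# Even at the largest scale every gradient component is nonzero.
print(grad_wrt_logits([8.0 * z for z in logits], 0))
```

Here the scale factor s stands in for the growth in logit magnitude that continued training can produce. It is a simplification chosen for illustration, not a reproduction of the paper's analysis.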
Taxonomy
Topics: Neural Networks and Applications · Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting
