Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

Carlo Baldassi; Christian Borgs; Jennifer Chayes; Alessandro Ingrosso; Carlo Lucibello; Luca Saglietti; Riccardo Zecchina

arXiv:1605.06444 · stat.ML · December 2, 2016

TL;DR

This paper provides theoretical and numerical evidence that the weight space of neural networks with discrete weights contains rare but extremely dense and accessible regions of configurations. It introduces the "robust ensemble" measure, and algorithms derived from it, to steer learning toward these regions.

Contribution

It introduces the robust ensemble measure and a general algorithmic framework for better optimization in neural networks, especially with discrete weights.

Findings

01

The robust ensemble suppresses trapping by isolated configurations and amplifies the role of dense regions.

02

Algorithms targeting dense regions improve performance.

03

The approach applies to various optimization problems.

Abstract

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute…
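To make the idea of a measure that favors dense regions concrete, here is a minimal brute-force sketch on a tiny binary perceptron. It assumes the robust ensemble weights each reference configuration W* by exp(y·Φ(W*)), where Φ(W*) = log Σ_W exp(−βE(W) − γ·d(W, W*)) is a local free entropy and d is the Hamming distance. That functional form, the toy problem, and every name below (`toy_perceptron`, `local_free_entropy`, `robust_ensemble`) are assumptions made for illustration; none of them appears in the abstract excerpt above, and this is not the authors' implementation.

```python
import itertools
import numpy as np

def toy_perceptron(n=7, p=5, seed=0):
    """Hypothetical toy problem: enumerate all 2**n binary weight vectors."""
    rng = np.random.default_rng(seed)
    patterns = rng.choice([-1, 1], size=(p, n))   # random +/-1 inputs
    labels = rng.choice([-1, 1], size=p)          # random +/-1 targets
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    # E(W) = number of misclassified patterns (n is odd, so no ties occur)
    energies = ((labels * (configs @ patterns.T)) <= 0).sum(axis=1)
    return configs, energies

def local_free_entropy(energies, configs, beta, gamma):
    """Phi(W*) = log sum_W exp(-beta*E(W) - gamma*d(W, W*)) for every W*."""
    n = configs.shape[1]
    # Hamming distance between +/-1 vectors: d = (n - W.W') / 2
    dist = (n - configs @ configs.T) / 2
    expo = -beta * energies[None, :] - gamma * dist
    # numerically stable log-sum-exp over W (columns)
    m = expo.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(expo - m).sum(axis=1, keepdims=True))).ravel()

def robust_ensemble(energies, configs, beta, gamma, y):
    """Normalized weights proportional to exp(y * Phi(W*)) (assumed form)."""
    phi = local_free_entropy(energies, configs, beta, gamma)
    logp = y * phi
    logp -= logp.max()            # avoid overflow before exponentiating
    prob = np.exp(logp)
    return prob / prob.sum()

if __name__ == "__main__":
    configs, energies = toy_perceptron()
    prob = robust_ensemble(energies, configs, beta=2.0, gamma=0.5, y=5)
    best = prob.argmax()
    print("most probable reference:", configs[best], "errors:", energies[best])
```

Two limits help check the sketch. With γ = 0 the reference W* decouples from W, so Φ is constant and the measure is uniform. With very large γ and y = 1, only W = W* contributes, and the measure reduces to the ordinary Gibbs weight exp(−βE(W*)). Intermediate γ and larger y reward references surrounded by many low-error configurations, which is the qualitative behavior the abstract attributes to the robust ensemble.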
