Exploiting Heavy Tails in Training Times of Multilayer Perceptrons: A Case Study with the UCI Thyroid Disease Database
Manuel Cebrian, Ivan Cantador

TL;DR
This paper models multilayer perceptron training as a Las Vegas algorithm, shows that its training time on the UCI Thyroid Disease Database follows a heavy-tailed distribution, and demonstrates how restart strategies can significantly reduce the expected training time.
Contribution
It provides empirical evidence that training times follow a heavy-tailed distribution and applies two simple restart strategies that exploit this to reduce expected training time.
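A common way to check for a heavy tail (in the sense used for randomized search, P(T ≥ x) ~ C·x^(-α) with 0 < α < 2) is to plot the empirical survival function on log-log axes and look for a straight line. This page does not say which diagnostic the authors used, so the following is only a minimal sketch of that standard check; the helper `tail_slope` and its `tail_fraction` parameter are hypothetical names introduced here.

```python
import bisect
import math


def tail_slope(samples, tail_fraction=0.5):
    """Least-squares slope of log P(T >= x) versus log x over the largest samples.

    For a Pareto-type tail P(T >= x) ~ C * x**(-alpha), the log-log survival
    plot is roughly linear with slope -alpha. alpha < 2 means infinite
    variance; alpha < 1 means infinite mean. Samples must be positive.
    """
    xs = sorted(samples)
    n = len(xs)
    start = int(n * (1 - tail_fraction))
    log_x, log_s = [], []
    for i in range(start, n):
        if i > start and xs[i] == xs[i - 1]:
            continue  # keep one point per distinct value
        at_least = n - bisect.bisect_left(xs, xs[i])  # number of samples >= xs[i]
        log_x.append(math.log(xs[i]))
        log_s.append(math.log(at_least / n))
    if len(log_x) < 2:
        raise ValueError("need at least two distinct values in the tail")
    mean_x = sum(log_x) / len(log_x)
    mean_s = sum(log_s) / len(log_s)
    cov = sum((a - mean_x) * (b - mean_s) for a, b in zip(log_x, log_s))
    var = sum((a - mean_x) ** 2 for a in log_x)
    return cov / var


# Synthetic check (not data from the paper): exact Pareto quantiles with alpha = 0.8.
alpha, n = 0.8, 1000
pareto = [((n + 1) / (n + 1 - i)) ** (1 / alpha) for i in range(1, n + 1)]
print(round(-tail_slope(pareto), 3))  # 0.8
```

A light-tailed (e.g. exponential) sample would instead bend downward on this plot, so the fitted slope keeps getting steeper as `tail_fraction` shrinks.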
Findings
Heavy tails observed in training time distribution.
Restart strategies reduce expected training time by up to 40%.
Full knowledge of the distribution yields a 40% reduction; no knowledge yields 9% to 23%.
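When the run-time distribution is fully known, Luby, Sinclair and Zuckerman (1993) showed that restarting with a single fixed cutoff t is optimal, with expected total time E[min(T, t)] / P(T ≤ t). The following is a minimal sketch that estimates this from observed training times and picks the best cutoff among them. It assumes restarts are free and each restart draws fresh random weights independently; the function names are hypothetical, and the page does not state that the authors computed their full-knowledge cutoff exactly this way.

```python
import math


def expected_time_with_cutoff(samples, cutoff):
    """Estimated expected total time of the fixed-cutoff restart strategy.

    Each attempt runs until it reaches the target error or until `cutoff`
    time units elapse, then restarts. One attempt costs min(T, cutoff) and
    succeeds with probability P(T <= cutoff).
    """
    successes = sum(1 for x in samples if x <= cutoff)
    if successes == 0:
        return math.inf  # no observed run finishes within the cutoff
    # (sum / n) / (successes / n) simplifies to sum / successes.
    return sum(min(x, cutoff) for x in samples) / successes


def best_fixed_cutoff(samples):
    """Return (cutoff, expected_time) minimizing the estimate over observed values.

    Between consecutive observed values the success probability is constant
    while the per-attempt cost grows, so the optimum is at an observed value.
    The largest value is equivalent to never restarting.
    """
    candidates = sorted(set(samples))
    best = min(candidates, key=lambda t: expected_time_with_cutoff(samples, t))
    return best, expected_time_with_cutoff(samples, best)


# Illustrative toy data, not from the paper: four fast runs and one very slow run.
runs = [1, 1, 1, 1, 100]
print(expected_time_with_cutoff(runs, max(runs)))  # no restarts: 20.8
print(best_fixed_cutoff(runs))                     # (1, 1.25)
```

The toy example shows why a heavy tail makes restarts pay off: one rare, very long run dominates the mean, and cutting it short costs far less than waiting for it.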
Abstract
The random initialization of the weights of a multilayer perceptron makes it possible to model its training process as a Las Vegas algorithm, i.e. a randomized algorithm that stops when a required training error is reached and whose execution time is a random variable. This modeling is used to perform a case study on a well-known pattern recognition benchmark: the UCI Thyroid Disease Database. Empirical evidence is presented that the training time probability distribution exhibits heavy-tailed behavior, meaning that a large share of the probability mass falls on long executions. This fact is exploited to reduce the training time cost by applying two simple restart strategies. The first assumes full knowledge of the distribution and yields a 40% reduction in expected time with respect to training without restarts. The second assumes no knowledge of the distribution and yields a reduction ranging from 9% to 23%.
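For the no-knowledge case, a standard choice is Luby, Sinclair and Zuckerman's universal restart sequence 1, 1, 2, 1, 1, 2, 4, 1, ..., whose expected time is within a logarithmic factor of the optimal fixed cutoff for any distribution. The abstract does not name the strategy the authors used, so this is only a sketch of that well-known scheme; `luby`, `run_with_luby`, the `unit` scale (e.g. epochs) and the `draw_run_time` callback are hypothetical names introduced here.

```python
from typing import Callable, Tuple


def luby(i: int) -> int:
    """i-th term (1-indexed) of Luby et al.'s universal restart sequence."""
    if i < 1:
        raise ValueError("index must be >= 1")
    k = 1
    while (1 << k) - 1 < i:  # find k with 2**(k-1) - 1 < i <= 2**k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)  # the sequence repeats its prefix


def run_with_luby(draw_run_time: Callable[[], float], unit: float = 1) -> Tuple[float, int]:
    """Restart with cutoffs unit * luby(i) until one attempt succeeds.

    draw_run_time() stands in for one training run from fresh random weights
    and returns how long it would need to reach the target error.
    Returns (total time spent, number of attempts).
    """
    total, i = 0, 1
    while True:
        cutoff = unit * luby(i)
        t = draw_run_time()
        if t <= cutoff:
            return total + t, i
        total += cutoff  # the attempt is abandoned at its cutoff
        i += 1


print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
times = iter([5, 5, 5, 5, 5, 5, 3])  # illustrative run times, not from the paper
print(run_with_luby(lambda: next(times)))  # (11, 7)
```

Because the sequence needs no parameters beyond `unit`, it suits the case where nothing is known about the distribution; in practice the result depends on how `unit` is chosen.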
Taxonomy
Topics: Neural Networks and Applications · Blind Source Separation Techniques · Face and Expression Recognition
