Judging Adam: Studying the Performance of Optimization Methods on ML4SE Tasks

Dmitry Pasechnyuk; Anton Prazdnichnykh; Mikhail Evtikhiev; Timofey Bryksin

arXiv:2303.03540·cs.SE·March 8, 2023·1 cites

Open Access

TL;DR

This paper evaluates the performance of various optimization algorithms on source code-related deep learning tasks, revealing significant differences in effectiveness and highlighting RAdam as a consistently strong choice.

Contribution

It provides the first comprehensive benchmark of optimizers on ML4SE tasks, demonstrating RAdam's superior performance and urging the community to reconsider default optimizer choices.

Findings

01. Optimizer choice significantly affects model quality, with up to two-fold score differences.

02. RAdam and its Lookahead variant outperform other optimizers on source code tasks.

03. The study highlights the need for further research on optimizer performance in code-related machine learning.
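
Finding 02 refers to RAdam wrapped in the Lookahead envelope. As a rough illustration of what that combination does, here is a minimal pure-Python sketch, assuming the commonly published update rules for both methods. It is not the authors' code: the function names `radam_step` and `lookahead_minimize`, the toy quadratic objective, and all hyperparameter values are hypothetical choices made for this example only.

```python
import math


def radam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one RAdam update to a list of floats and return the new list.

    `state` holds the step counter "t" and the moment estimates "m" and "v".
    """
    state["t"] += 1
    t = state["t"]
    # Length of the approximated simple moving average (rectification term).
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * t * beta2**t / (1.0 - beta2**t)
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        m_hat = state["m"][i] / (1.0 - beta1**t)
        # Threshold 4 is assumed here; some implementations use another value.
        if rho_t > 4.0:
            v_hat = math.sqrt(state["v"][i] / (1.0 - beta2**t))
            r_t = math.sqrt(
                ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
            )
            p = p - lr * r_t * m_hat / (v_hat + eps)
        else:
            # Variance estimate is not yet reliable: plain momentum step.
            p = p - lr * m_hat
        new_params.append(p)
    return new_params


def lookahead_minimize(grad_fn, init, steps=2000, k=5, alpha=0.5, lr=0.1):
    """Run RAdam as the inner (fast) optimizer inside a Lookahead loop.

    Every `k` fast steps, the slow weights move a fraction `alpha` toward
    the fast weights, and the fast weights are reset to the slow weights.
    """
    slow = list(init)
    fast = list(init)
    state = {"t": 0, "m": [0.0] * len(init), "v": [0.0] * len(init)}
    for step in range(1, steps + 1):
        fast = radam_step(fast, grad_fn(fast), state, lr=lr)
        if step % k == 0:
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return slow


def toy_loss(w):
    """Hypothetical objective with its minimum at (3, -1)."""
    return (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2


def toy_grad(w):
    """Gradient of `toy_loss`."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


if __name__ == "__main__":
    final = lookahead_minimize(toy_grad, [0.0, 0.0])
    print("final weights:", final, "loss:", toy_loss(final))
```

In a real training setup the same idea would normally come from a deep learning framework's optimizer classes rather than hand-written loops; the sketch only shows how the slow and fast weights interact.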

Abstract

Solving a problem with a deep learning model requires researchers to optimize the loss function with a certain optimization method. The research community has developed more than a hundred different optimizers, yet there is scarce data on optimizer performance in various tasks. In particular, none of the benchmarks test the performance of optimizers on source code-related problems. However, existing benchmark data indicates that certain optimizers may be more efficient for particular domains. In this work, we test the performance of various optimizers on deep learning models for source code and find that the choice of an optimizer can have a significant impact on the model quality, with up to two-fold score differences between some of the relatively well-performing optimizers. We also find that RAdam optimizer (and its modification with the Lookahead envelope) is the best optimizer that…

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Parallel Computing and Optimization Techniques

Methods: None · Test · Lookahead · Adam · RAdam