Lookaround Optimizer: $k$ steps around, 1 step average
Jiangtao Zhang, Shunyu Liu, Jie Song, Tongtian Zhu, Zhengqi Xu, Mingli Song

TL;DR
Lookaround is an SGD-based optimizer that repeatedly trains several copies of a network from a common starting point, each on differently augmented data, and then averages their weights. The authors report that this leads to flatter minima and better generalization in deep learning models.
Contribution
Introduces an optimizer that alternates an around step ($k$ training steps per network) with a single average step, with the stated aim of increasing diversity between the networks while keeping their weights close together.
Findings
Reported to outperform state-of-the-art methods on CIFAR and ImageNet benchmarks.
Effective for both CNNs and ViTs.
Theoretically justified by convergence analysis.
Abstract
Weight Average (WA) is an active research topic due to its simplicity in ensembling deep networks and the effectiveness in promoting generalization. Existing weight average approaches, however, are often carried out along only one training trajectory in a post-hoc manner (i.e., the weights are averaged after the entire training process is finished), which significantly degrades the diversity between networks and thus impairs the effectiveness. In this paper, inspired by weight average, we propose Lookaround, a straightforward yet effective SGD-based optimizer leading to flatter minima with better generalization. Specifically, Lookaround iterates two steps during the whole training period: the around step and the average step. In each iteration, 1) the around step starts from a common point and trains multiple networks simultaneously, each on transformed data by a different data…
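The around/average loop described in the abstract can be sketched on a toy problem. This is a minimal illustration and not the authors' implementation: the 1-D least-squares loss, the three augmentations, the hyperparameter values, and the names `sgd_step` and `lookaround` are all assumptions made for the example, and the branches run one after another here rather than simultaneously.

```python
import random


def sgd_step(w, batch, lr):
    """One SGD step on the 1-D least-squares loss 0.5 * (w * x - y) ** 2."""
    grad = sum((w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def lookaround(data, augmentations, w0=0.0, k=5, rounds=50,
               lr=0.5, batch_size=4, seed=0):
    """Toy Lookaround-style loop: k steps around, then 1 step average.

    Hypothetical helper for illustration; all defaults are assumed values.
    """
    rng = random.Random(seed)
    w = w0
    for _ in range(rounds):
        branch_weights = []
        # Around step: every branch starts from the same common point w
        # and takes k SGD steps on data transformed by its own augmentation.
        for augment in augmentations:
            w_branch = w
            for _ in range(k):
                batch = [augment(x, y) for x, y in rng.sample(data, batch_size)]
                w_branch = sgd_step(w_branch, batch, lr)
            branch_weights.append(w_branch)
        # Average step: the mean of the branch weights becomes the next
        # common starting point.
        w = sum(branch_weights) / len(branch_weights)
    return w


# Noise-free toy data generated by y = 3 * x (assumed for the example).
data = [(i / 10.0, 3.0 * i / 10.0) for i in range(-10, 11)]

# Assumed augmentations; each one keeps the relation y = 3 * x intact.
augmentations = [
    lambda x, y: (x, y),              # identity
    lambda x, y: (-x, -y),            # sign flip
    lambda x, y: (0.5 * x, 0.5 * y),  # rescale
]

w_final = lookaround(data, augmentations)
print(w_final)
```

Because averaging happens after every $k$ steps rather than once at the end of training, the branches never drift far apart before they are merged, which is the contrast with post-hoc weight averaging that the abstract draws.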
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Machine Learning and Data Classification
