Lookaround Optimizer: $k$ steps around, 1 step average

Jiangtao Zhang; Shunyu Liu; Jie Song; Tongtian Zhu; Zhengqi Xu; Mingli Song

arXiv:2306.07684 · cs.CV · November 3, 2023 · 2 cites

TL;DR

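Lookaround is an SGD-based optimizer that repeatedly trains several copies of a network from a shared starting point, each on a differently augmented version of the data, and then averages their weights. The authors report that this leads to flatter minima and better generalization in deep learning models.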
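The loop described in the TL;DR can be sketched in plain Python. This is a minimal toy illustration under stated assumptions, not the authors' implementation: the "networks" are weight vectors of a linear least-squares model, and the helper names (`sgd_step`, `average`, `lookaround`), the three hypothetical augmentations, and the choice to give every copy the same mini-batch under its own transform are all assumptions made for the example. Each round runs `k` SGD steps per copy from a shared point (the around step) and then replaces that point with the element-wise mean of the copies (the average step).

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Example = Tuple[Vector, float]
Augment = Callable[[Vector], Vector]


def sgd_step(w: Vector, batch: Sequence[Example], lr: float) -> Vector:
    """One SGD step on half the mean squared error of a linear model y ~ w.x."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def average(weights: Sequence[Vector]) -> Vector:
    """Element-wise mean of several weight vectors (the average step)."""
    return [sum(col) / len(weights) for col in zip(*weights)]


def lookaround(w0: Vector, data: Sequence[Example], augments: Sequence[Augment],
               k: int, rounds: int, lr: float, batch_size: int,
               seed: int = 0) -> Vector:
    """Hypothetical toy version of the around/average loop."""
    rng = random.Random(seed)
    center = list(w0)
    for _ in range(rounds):
        # Around step: one copy per augmentation, all starting from `center`.
        copies = [list(center) for _ in augments]
        for _ in range(k):
            # Assumption: all copies see the same mini-batch, each under its own transform.
            batch = rng.sample(data, batch_size)
            copies = [sgd_step(w, [(aug(x), y) for x, y in batch], lr)
                      for w, aug in zip(copies, augments)]
        # Average step: the mean of the copies becomes the next common starting point.
        center = average(copies)
    return center


# Toy data: y = 2*x0 - x1 with no label noise.
data_rng = random.Random(1)
data = []
for _ in range(64):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, 2.0 * x[0] - 1.0 * x[1]))

# Hypothetical "augmentations" suitable for a linear model.
augments = [
    lambda x: list(x),                  # identity
    lambda x: [1.02 * xi for xi in x],  # mild rescaling
    lambda x: [xi + 0.01 for xi in x],  # mild shift
]

w = lookaround([0.0, 0.0], data, augments, k=5, rounds=40, lr=0.5, batch_size=8)
print(w)  # expected to land near [2.0, -1.0]
```

The linear model keeps the sketch dependency-free; in practice each copy would be a full network trained in a deep learning framework. The parameter `k` is the number of around steps between consecutive averages, matching the "$k$ steps around, 1 step average" of the title.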

Contribution

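It introduces an optimizer that alternates an around step (training several networks in parallel from a common point, each under a different data augmentation, which increases diversity between them) with an average step (averaging their weights after every $k$ around steps, which keeps the networks close together in weight space, i.e. weight locality), with the goal of better generalization.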
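Why locality matters for weight averaging can be seen on a one-dimensional toy loss. The function below is a hypothetical example chosen for illustration, not taken from the paper, and the names `loss` and `mean_weight` are likewise illustrative: the loss has two separate minima, at w = -1 and w = +1. Averaging two weights on either side of the same minimum (diverse but local) gives a point with lower loss than either copy, whereas averaging weights from different basins (diverse but not local) lands on the barrier between them.

```python
from typing import Sequence


def loss(w: float) -> float:
    """Hypothetical 1-D loss with two separate minima, at w = -1 and w = +1."""
    return (w * w - 1.0) ** 2


def mean_weight(ws: Sequence[float]) -> float:
    """Plain weight averaging of scalar 'networks'."""
    return sum(ws) / len(ws)


local = [0.9, 1.1]  # diverse, but both in the basin around w = +1
far = [-1.0, 1.0]   # diverse, but in different basins

for name, ws in [("local", local), ("far", far)]:
    individual = [round(loss(w), 4) for w in ws]
    merged = loss(mean_weight(ws))
    print(f"{name}: individual losses {individual}, loss of average {merged:.4f}")
```

In this toy, the "local" pair averages to the minimum itself while each copy sits on a slope; the "far" pair averages to w = 0, the top of the barrier. Starting every around step from a shared point and averaging frequently is meant to keep Lookaround's copies in the first situation rather than the second.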

Findings

01

Outperforms state-of-the-art methods on CIFAR and ImageNet benchmarks.

02

Effective for both CNNs and ViTs.

03

Theoretically justified by convergence analysis.

Abstract

Weight Average (WA) is an active research topic due to its simplicity in ensembling deep networks and its effectiveness in promoting generalization. Existing weight-averaging approaches, however, are often carried out along only one training trajectory in a post-hoc manner (i.e., the weights are averaged after the entire training process is finished), which significantly degrades the diversity between networks and thus impairs their effectiveness. In this paper, inspired by weight averaging, we propose Lookaround, a straightforward yet effective SGD-based optimizer that leads to flatter minima with better generalization. Specifically, Lookaround alternates between two steps throughout training: the around step and the average step. In each iteration, 1) the around step starts from a common point and trains multiple networks simultaneously, each on data transformed by a different data…
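For contrast, the post-hoc scheme the abstract argues against can be sketched in the spirit of stochastic weight averaging (SWA): a single noisy SGD trajectory runs on a toy quadratic, snapshots are collected after a burn-in period, and they are averaged only once training has finished, so the average never feeds back into the updates. The loss, the Gaussian noise model, and the names `noisy_grad` and `post_hoc_average` and their hyperparameters are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Tuple


def noisy_grad(w: float, target: float, rng: random.Random) -> float:
    """Gradient of 0.5 * (w - target)**2 plus Gaussian noise (a stand-in for mini-batch noise)."""
    return (w - target) + rng.gauss(0.0, 1.0)


def post_hoc_average(target: float, lr: float, burn_in: int,
                     n_snapshots: int, seed: int = 0) -> Tuple[float, float]:
    """Run one SGD trajectory and average its snapshots after training ends."""
    rng = random.Random(seed)
    w = 0.0
    snapshots: List[float] = []
    for step in range(burn_in + n_snapshots):
        w -= lr * noisy_grad(w, target, rng)
        if step >= burn_in:
            snapshots.append(w)  # every snapshot comes from the same trajectory
    # Averaging happens once, after training; the trajectory itself is never reset.
    return w, sum(snapshots) / len(snapshots)


final_w, averaged_w = post_hoc_average(target=3.0, lr=0.1, burn_in=200, n_snapshots=1000)
print(f"last iterate: {final_w:.3f}  post-hoc average: {averaged_w:.3f}")
```

Consecutive snapshots here are strongly correlated: each step keeps a factor of 1 - lr = 0.9 of the previous deviation from the target, which is one concrete way to read the abstract's point that single-trajectory averaging limits diversity. In Lookaround, by contrast, the averaged weights become the starting point of the next around step.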

Videos

Lookaround Optimizer: $k$ steps around, 1 step average · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Machine Learning and Data Classification