Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models

Yili Wang; Kaixiong Zhou; Ninghao Liu; Ying Wang; Xin Wang

arXiv:2406.13137 · cs.LG · June 21, 2024 · 2 cites

TL;DR

This paper introduces GraphSAM, an efficient algorithm for molecular graph transformer training that reduces computational costs of sharpness-aware minimization while maintaining or improving generalization performance.

Contribution

The paper proposes GraphSAM, a method that approximates SAM's perturbation gradient using gradients from previous steps. It also proves that GraphSAM's loss landscape is tightly bounded, which supports both the efficiency and the generalization claims.
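The gradient-reuse idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy quadratic loss, the helper names `grad`, `perturb`, and `train`, and the rule of reusing the previous updating gradient at every step after the first are all assumptions made for illustration. The published algorithm may refresh or smooth the approximation differently.

```python
import math

def grad(w):
    # Analytic gradient of a toy quadratic loss 0.5 * (w0^2 + 10 * w1^2).
    # The loss is hypothetical and stands in for a real model's loss.
    return [w[0], 10.0 * w[1]]

def perturb(w, direction, rho):
    # Move w by a distance rho along the normalized direction.
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    return [wi + rho * di / norm for wi, di in zip(w, direction)]

def train(w, steps=20, lr=0.05, rho=0.05):
    grad_calls = 0
    prev_update_grad = None
    for _ in range(steps):
        if prev_update_grad is None:
            # First step: no history yet, so compute the perturbation gradient.
            direction = grad(w)
            grad_calls += 1
        else:
            # Later steps (assumed rule): reuse the previous updating gradient
            # as the perturbation direction instead of computing a new one.
            direction = prev_update_grad
        # Updating gradient, evaluated at the perturbed weights.
        g_update = grad(perturb(w, direction, rho))
        grad_calls += 1
        # Plain gradient descent as the base optimizer.
        w = [wi - lr * gi for wi, gi in zip(w, g_update)]
        prev_update_grad = g_update
    return w, grad_calls

w_final, grad_calls = train([1.0, 1.0])
```

In this sketch, 20 training steps need 21 gradient evaluations, whereas a scheme that computes both gradients at every step would need 40.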

Findings

01. GraphSAM reduces training time compared to traditional SAM.

02. GraphSAM improves generalization performance on multiple datasets.

03. Theoretical analysis shows that GraphSAM's loss landscape is tightly bounded.

Abstract

Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate the sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires two sequential gradient computations during the optimization of each step: one to obtain the perturbation gradient and the other to obtain the updating gradient. Compared with the base optimizer (e.g., Adam), SAM doubles the time overhead due to the additional perturbation gradient. By dissecting the theory of SAM and observing the training gradient of the molecular graph transformer, we propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models. There are two key factors that contribute to this result: (i) gradient approximation: we use the updating…
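The two sequential gradient computations that the abstract attributes to SAM can be sketched as follows. This is a minimal illustration, assuming a toy quadratic loss and plain gradient descent as the base optimizer rather than Adam. The names `loss`, `grad`, and `sam_step` and the values of `lr` and `rho` are hypothetical and do not come from the paper or its repository.

```python
import math

def loss(w):
    # Toy quadratic loss, chosen only for illustration.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss.
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM step; returns the new weights and the number of gradient calls."""
    # Pass 1: perturbation gradient at the current weights.
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0
    eps = [rho * x / norm for x in g]
    # Pass 2: updating gradient at the perturbed weights w + eps.
    g_update = grad([wi + ei for wi, ei in zip(w, eps)])
    # Apply the updating gradient to the original (unperturbed) weights.
    w_next = [wi - lr * gi for wi, gi in zip(w, g_update)]
    return w_next, 2

w = [1.0, 1.0]
w_next, n_grad_calls = sam_step(w)
```

Each call to `sam_step` evaluates the gradient twice, compared with once for the base optimizer alone, which is the doubled overhead the abstract refers to.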

Code & Models

Repositories

YL-wang/GraphSAM (PyTorch, official)

Taxonomy

Topics: Machine Learning in Materials Science · Molecular Junctions and Nanostructures · CO2 Reduction Techniques and Catalysts

Methods: Attention Is All You Need · Softmax · Layer Normalization · Laplacian EigenMap · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer