Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang

TL;DR
This paper introduces GraphSAM, an efficient algorithm for molecular graph transformer training that reduces computational costs of sharpness-aware minimization while maintaining or improving generalization performance.
Contribution
The paper proposes GraphSAM, a novel method that approximates SAM's perturbation gradient using previous gradients and proves its loss landscape is tightly bounded, enhancing efficiency and generalization.
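To make the idea of reusing previous gradients concrete, here is a minimal sketch on a toy quadratic loss. It is an illustration under stated assumptions, not the paper's exact algorithm: the names `CountingGrad` and `approx_sam_train`, the plain SGD update, and the choice to reuse the previous step's updating gradient unchanged as the perturbation direction are all hypothetical simplifications.

```python
import numpy as np


class CountingGrad:
    """Gradient of the toy loss 0.5 * ||w||^2 that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, w):
        self.calls += 1
        return w.copy()


def approx_sam_train(w, grad_fn, steps=5, lr=0.1, rho=0.05):
    """SAM-style training that reuses the previous updating gradient
    as the perturbation direction (assumed simplification)."""
    prev_update_grad = None
    for _ in range(steps):
        if prev_update_grad is None:
            # First step: no history yet, so compute the perturbation gradient.
            direction = grad_fn(w)
        else:
            # Later steps: approximate it with the last updating gradient.
            direction = prev_update_grad
        eps = rho * direction / (np.linalg.norm(direction) + 1e-12)
        # Single fresh gradient per step, taken at the perturbed weights.
        prev_update_grad = grad_fn(w + eps)
        w = w - lr * prev_update_grad
    return w


grad_fn = CountingGrad()
w0 = np.array([1.0, -2.0])
w_final = approx_sam_train(w0, grad_fn, steps=5)
```

In this sketch, five steps need six gradient evaluations (two on the first step, one on each later step), where a method that computes both gradients on every step would need ten.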
Findings
GraphSAM reduces training time compared to traditional SAM.
It improves generalization performance on multiple datasets.
Theoretical analysis shows that GraphSAM's loss landscape is tightly bounded.
Abstract
Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate the sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires two sequential gradient computations during the optimization of each step: one to obtain the perturbation gradient and the other to obtain the updating gradient. Compared with the base optimizer (e.g., Adam), SAM doubles the time overhead due to the additional perturbation gradient. By dissecting the theory of SAM and observing the training gradient of the molecular graph transformer, we propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models. There are two key factors that contribute to this result: (i) gradient approximation: we use the updating…
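The two sequential gradient computations that the abstract attributes to standard SAM can be sketched as follows. This is a minimal illustration on a toy quadratic loss, assuming a plain SGD update in place of Adam; the function names `loss`, `grad`, and `sam_step` and the values of `lr` and `rho` are hypothetical.

```python
import numpy as np


def loss(w):
    # Toy quadratic loss standing in for the model's training loss.
    return 0.5 * float(np.dot(w, w))


def grad(w):
    # Analytic gradient of the toy loss.
    return w.copy()


def sam_step(w, lr=0.1, rho=0.05):
    # 1) Perturbation gradient, evaluated at the current weights.
    g_perturb = grad(w)
    eps = rho * g_perturb / (np.linalg.norm(g_perturb) + 1e-12)
    # 2) Updating gradient, evaluated at the perturbed weights.
    g_update = grad(w + eps)
    # Base optimizer step (plain SGD here as a stand-in).
    return w - lr * g_update


w = np.array([1.0, -2.0])
w_next = sam_step(w)
```

Each call to `sam_step` evaluates `grad` twice, which is the source of the doubled per-step cost relative to the base optimizer.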
Taxonomy
Topics: Machine Learning in Materials Science · Molecular Junctions and Nanostructures · CO2 Reduction Techniques and Catalysts
Methods: Attention Is All You Need · Softmax · Layer Normalization · Laplacian EigenMap · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam · Linear Layer
