TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng

TL;DR
TRAM combines trust-region and sharpness-aware minimization techniques to enhance out-of-domain generalization in vision and language tasks by reducing curvature in both parameter space and function space.
Contribution
It introduces a novel fine-tuning algorithm, TRAM, that unifies trust-region and SAM strategies to improve domain transfer and representation robustness.
Findings
TRAM outperforms existing SAM and trust-region methods across vision and language tasks.
TRAM achieves superior domain transfer, especially in challenging anticorrelated domain scenarios.
Minimal additional computation is required compared to previous sharpness-aware methods.
Abstract
Sharpness-aware minimization (SAM) reports improving domain generalization by reducing the loss surface curvature in the parameter space. However, generalization during fine-tuning is often more dependent on the transferability of representations in the function space. Trust-region methods (TR) target this goal by regularizing representation curvature to reduce catastrophic forgetting of pre-trained task-agnostic information while adopting task-specific skills. We consider unifying these strategies for low curvature in both parameter space and function space to improve out-of-domain (OOD) generalization. We propose Trust Region Aware Minimization (TRAM), a SAM algorithm fine-tuning for low parameter sharpness and smooth, informative representations preserving pre-trained structure. TRAM uses a trust region bound to inform the SAM adversarial neighborhood, introducing an awareness of…
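To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a standard SAM update (ascend to a perturbed point within radius `rho`, then descend using the gradient taken there) and one hypothetical way to let a function-space distance set that radius. Everything here is an assumption made for illustration: the toy logistic-regression model, the use of a mean KL divergence between the pre-trained (`w_ref`) and current predictions as the trust-region distance, the `rho_floor` parameter, and all function names.

```python
import math

def predict(w, x):
    # Toy logistic-regression "model"; stands in for a fine-tuned network.
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, data):
    # Mean binary cross-entropy over (features, label) pairs.
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(w, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def grad(w, data):
    # Analytic gradient of the mean cross-entropy loss.
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g

def function_space_distance(w, w_ref, data):
    # ASSUMPTION: mean KL(p_ref || p_current) between Bernoulli predictions,
    # used here as a stand-in for a trust-region distance measure.
    eps = 1e-12
    total = 0.0
    for x, _ in data:
        p, q = predict(w_ref, x), predict(w, x)
        total += p * math.log((p + eps) / (q + eps))
        total += (1 - p) * math.log((1 - p + eps) / (1 - q + eps))
    return total / len(data)

def sam_step(w, data, rho, lr):
    # Standard SAM: perturb along the normalized gradient by radius rho,
    # then update the original weights with the gradient at that point.
    g = grad(w, data)
    scale = rho / (math.sqrt(sum(gi * gi for gi in g)) + 1e-12)
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    g_adv = grad(w_adv, data)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def trust_region_sam_step(w, w_ref, data, lr, rho_floor=1e-3):
    # HYPOTHETICAL: the SAM radius is set by the function-space distance
    # from the reference model, with a small floor so it is never zero.
    rho = max(function_space_distance(w, w_ref, data), rho_floor)
    return sam_step(w, data, rho, lr), rho

# Illustrative data and "pre-trained" reference weights (made up).
data = [([1.0, 0.5], 1), ([0.2, -1.0], 0), ([-0.7, 0.3], 0), ([1.5, 1.0], 1)]
w_ref = [0.1, 0.1]
w = list(w_ref)
for _ in range(10):
    w, rho = trust_region_sam_step(w, w_ref, data, lr=0.5)
```

In this sketch the radius starts at the floor, because the model initially matches its reference, and changes as the predictions drift away from the reference. That is only one possible reading of "a trust region bound to inform the SAM adversarial neighborhood"; the paper's exact formulation should be taken from the paper itself.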
Peer Reviews
Decision·ICLR 2024 spotlight
- The paper did a good job summarizing the existing approaches and how the proposed method builds on top of them.
- The experimental settings are detailed and reasonable.
- Experiments seem quite comprehensive, at least for the settings considered in this work.
- It's quite remarkable that the proposed methods achieve the best performance across different fine-tuning tasks.
- See the question section below.
1. The proposed method is intuitive and well-motivated. The combination of SAM and Trust region methods is reasonable and interesting.
2. Extensive experiments on multiple NLP tasks demonstrate the effectiveness of the proposed method.

1. Theoretical motivation for unifying SAM and Trust region methods is not provided.
2. Some results have high variance across runs. More runs may better characterize the performance.
**Strengths**
1. The paper is clearly written and easy to follow.
2. I think the paper aims to contribute to SAM from a very interesting perspective, i.e. fine-tuning techniques. Considering that fine-tuning has become a nearly necessary procedure in NLP tasks, the paper may provide some promising guidance for future work.
3. Combining the proposed method with Fisher-SAM can reduce the extra forward-propagation count to the same count as in vanilla SAM.

**Weakness**
1. The core of the proposed method is to adaptively change the neighbourhood radius in SAM (or ASAM) based on a certain distance measure. This does not quite follow the idea of Trust Region Regularization, which adds an additional constraint on top of the loss according to the measure. More accurately, they are two different things. I also could not find a clear explanation of why using such a distance as the neighbourhood radius provides the "Trust". Several questions arise: what does the
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
Methods: Segment Anything Model · Attentive Walk-Aggregating Graph Neural Network · Sharpness-Aware Minimization
