Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization

Ramnath Kumar, Kushal Majmundar, Dheeraj Nagaraj, Arun Sai Suggala

arXiv:2306.09222·cs.LG·October 15, 2024·2 cites

Open Access

TL;DR

This paper introduces Re-weighted Gradient Descent (RGD), a new optimization method based on distributionally robust optimization that dynamically adjusts sample importance to enhance deep neural network training across various tasks.

Contribution

The paper proposes RGD, a simple, efficient, and broadly compatible optimization technique that improves neural network performance by incorporating distributionally robust principles with dynamic sample re-weighting.

Findings

01. Achieves state-of-the-art results on multiple benchmarks
02. Improves performance on out-of-domain generalization tasks
03. Demonstrates compatibility with popular optimizers like SGD and Adam

Abstract

We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample re-weighting. Leveraging insights from distributionally robust optimization (DRO) with Kullback-Leibler divergence, our method dynamically assigns importance weights to training data during each optimization step. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the effectiveness of RGD on various learning tasks, including supervised learning, meta-learning, and out-of-domain generalization. Notably, RGD achieves state-of-the-art results on diverse benchmarks, with improvements of +0.7% on DomainBed, +1.44% on tabular classification, +1.94% on GLUE with BERT, and +1.01% on ImageNet-1K with ViT.
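
To make the idea of per-step importance weights concrete, here is a minimal sketch in plain Python. It assumes the weights are proportional to exp(loss / temperature), which is the form the KL-penalized DRO objective suggests; the abstract does not state the exact weighting rule, so this should not be read as the authors' implementation. The function names, the `temperature` and `clip` parameters, and the toy least-squares problem are all hypothetical and chosen only for illustration.

```python
import math


def rgd_weights(losses, temperature=1.0, clip=5.0):
    """Per-sample weights proportional to exp(loss / temperature).

    Assumption: exponential weighting is the form suggested by a
    KL-penalized DRO objective; `temperature` and `clip` are
    hypothetical knobs, not values taken from the paper.
    """
    # Clip losses so a single outlier cannot dominate the batch.
    scaled = [min(loss, clip) / temperature for loss in losses]
    # Subtract the max before exponentiating for numerical stability.
    top = max(scaled)
    unnorm = [math.exp(s - top) for s in scaled]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def reweighted_sgd_step(w, b, batch, lr=0.1, temperature=1.0):
    """One re-weighted gradient step for 1-D least squares y ~ w*x + b."""
    residuals = [w * x + b - y for x, y in batch]
    losses = [0.5 * r * r for r in residuals]
    weights = rgd_weights(losses, temperature)
    # Weights are treated as constants: no gradient flows through them.
    grad_w = sum(q * r * x for q, r, (x, _) in zip(weights, residuals, batch))
    grad_b = sum(q * r for q, r in zip(weights, residuals))
    return w - lr * grad_w, b - lr * grad_b


if __name__ == "__main__":
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # y = 2x + 1
    w, b = 0.0, 0.0
    for _ in range(2000):
        w, b = reweighted_sgd_step(w, b, data)
    print(round(w, 3), round(b, 3))
```

Because the re-weighting only rescales per-sample gradients before they are summed, the same weights could in principle be passed to any base optimizer's update rule, which is consistent with the compatibility claim above. Harder (higher-loss) samples receive larger weights, and lowering `temperature` sharpens that emphasis.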

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Weight Decay · Residual Connection · Softmax · Dropout · Stochastic Gradient Descent