Stochastic Re-weighted Gradient Descent via Distributionally Robust Optimization
Ramnath Kumar, Kushal Majmundar, Dheeraj Nagaraj, Arun Sai Suggala

TL;DR
This paper introduces Re-weighted Gradient Descent (RGD), a new optimization method based on distributionally robust optimization that dynamically adjusts sample importance to enhance deep neural network training across various tasks.
Contribution
The paper proposes RGD, a simple, efficient, and broadly compatible optimization technique that improves neural network performance by incorporating distributionally robust principles with dynamic sample re-weighting.
Findings
Achieves state-of-the-art results on multiple benchmarks
Improves performance on out-of-domain generalization tasks
Demonstrates compatibility with popular optimizers like SGD and Adam
Abstract
We present Re-weighted Gradient Descent (RGD), a novel optimization technique that improves the performance of deep neural networks through dynamic sample re-weighting. Leveraging insights from distributionally robust optimization (DRO) with Kullback-Leibler divergence, our method dynamically assigns importance weights to training data during each optimization step. RGD is simple to implement, computationally efficient, and compatible with widely used optimizers such as SGD and Adam. We demonstrate the effectiveness of RGD on various learning tasks, including supervised learning, meta-learning, and out-of-domain generalization. Notably, RGD achieves state-of-the-art results on diverse benchmarks, with improvements of +0.7% on DomainBed, +1.44% on tabular classification, +1.94% on GLUE with BERT, and +1.01% on ImageNet-1K with ViT.
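To make the re-weighting idea concrete, here is a minimal NumPy sketch of one update step on a toy least-squares problem. It is not the authors' implementation. It assumes the standard dual form of KL-constrained DRO, in which each sample's gradient is weighted in proportion to exp(loss / tau). The helper names `rgd_weights` and `rgd_step`, the temperature `tau`, the clipping threshold `clip`, and the choice to normalize the weights to sum to 1 are all illustrative assumptions that the abstract does not specify.

```python
import numpy as np

def rgd_weights(losses, tau=1.0, clip=2.0):
    """Hypothetical helper: weights proportional to exp(loss / tau).

    Clipping the scaled loss from above is an assumed safeguard that keeps
    the exponential bounded; normalizing to sum to 1 is also an assumption.
    """
    scaled = np.minimum(losses / tau, clip)
    weights = np.exp(scaled)
    return weights / weights.sum()

def rgd_step(w, X, y, lr=0.1, tau=1.0, clip=2.0):
    """One re-weighted gradient step for the loss 0.5 * (x @ w - y)**2."""
    residual = X @ w - y                      # shape (n,)
    losses = 0.5 * residual ** 2              # per-sample losses
    weights = rgd_weights(losses, tau, clip)  # harder samples get larger weight
    per_sample_grads = residual[:, None] * X  # shape (n, d)
    grad = weights @ per_sample_grads         # weighted sum of sample gradients
    return w - lr * grad, weights

# Toy usage: the second sample has the larger loss, so it dominates the step.
X = np.array([[1.0], [1.0]])
y = np.array([0.0, 2.0])
w0 = np.zeros(1)
w1, weights = rgd_step(w0, X, y)
```

Because the weights only rescale per-sample gradients, the resulting `grad` could in principle be handed to any first-order optimizer in place of the plain mini-batch gradient, which is one way to read the abstract's compatibility claim for SGD and Adam.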
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Weight Decay · Residual Connection · Softmax · Dropout · Stochastic Gradient Descent
