Distributionally Robust Optimization via Generative Ambiguity Modeling
Jiaqi Wen, Jianyi Yang

TL;DR
This paper introduces a novel generative ambiguity set for Distributionally Robust Optimization, improving robustness and out-of-distribution generalization by modeling adversarial distributions with generative models.
Contribution
It proposes GAS-DRO, a tractable DRO method using generative models for ambiguity sets, and demonstrates its theoretical convergence and empirical superiority.
Findings
GAS-DRO achieves better OOD generalization in ML tasks.
The method is theoretically proven to converge.
Empirical results show improved robustness over existing methods.
Abstract
This paper studies Distributionally Robust Optimization (DRO), a fundamental framework for enhancing the robustness and generalization of statistical learning and optimization. An effective ambiguity set for DRO must contain distributions that remain consistent with the nominal distribution while being diverse enough to account for a variety of potential scenarios. Moreover, it should lead to tractable DRO solutions. To this end, we propose generative model-based ambiguity sets that capture various adversarial distributions beyond the nominal support space while maintaining consistency with the nominal distribution. Building on this generative ambiguity modeling, we propose DRO with Generative Ambiguity Set (GAS-DRO), a tractable DRO algorithm that solves the inner maximization over the parameterized generative model space. We formally establish the stationary convergence performance of…
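The min-max structure described in the abstract can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not the paper's algorithm: the "generative model" is a one-parameter Gaussian N(theta, 1), the reconstruction loss is a squared error against the nominal data, the constraint is replaced by a fixed penalty weight `lam`, and the names `gas_dro_toy`, `lam`, `lr`, and `n_gen` are hypothetical.

```python
import numpy as np

def gas_dro_toy(data, lam=4.0, lr=0.05, steps=2000, n_gen=512, seed=0):
    """Toy gradient descent-ascent: decision w vs. generator N(theta, 1).

    Penalized objective (an assumed relaxation of a constrained problem):
        F(w, theta) = mean((w - z)^2) - lam * J(theta),  z ~ P_theta,
    with reconstruction loss J(theta) = 0.5 * mean((data - theta)^2).
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_gen)    # fixed noise, reparameterized samples
    nominal_mean = float(np.mean(data))
    w, theta = 0.0, nominal_mean          # start the generator at the nominal fit
    for _ in range(steps):
        z = theta + noise                 # samples drawn from P_theta
        residual = np.mean(w - z)
        grad_w = 2.0 * residual                                   # dF/dw
        grad_theta = -2.0 * residual - lam * (theta - nominal_mean)  # dF/dtheta
        theta += lr * grad_theta          # adversary: ascent on generator parameter
        w -= lr * grad_w                  # learner: descent on the decision
    return w, theta

# Usage: nominal data drawn around 3.0 (synthetic, for illustration only).
data = np.random.default_rng(1).normal(3.0, 1.0, size=1000)
w, theta = gas_dro_toy(data)
print(f"decision w = {w:.3f}, adversarial theta = {theta:.3f}")
```

In this toy the objective is convex in `w` and, for `lam > 2`, concave in `theta`, so the simultaneous updates settle at a saddle point near the nominal mean. The point of the sketch is only the update pattern: the adversary moves in generator-parameter space, not directly over distributions.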
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper identifies real problems with existing DRO methods. φ-divergence restricts support. Wasserstein distance is hard to optimize. 2. Theorems 1 and 2 provide convergence guarantees. Lemma 1 connects reconstruction loss to KL divergence. 3. GAS-DRO achieves 63.7% improvement over baseline ML. It outperforms existing DRO methods by significant margins.
1. The paper claims to be the "first to model ambiguity sets in parameterized space of likelihood-based generative models" (page 2). This is incorrect. Michel et al. (2021), "Modeling the Second Player in Distributionally Robust Optimization" (ICLR 2021), already proposed Parametric DRO (P-DRO), which: (a) uses neural generative models q_ψ for the adversary, (b) parameterizes the uncertainty set with model weights ψ, (c) uses likelihood-based Transformers (evaluates log q_ψ(x,y) in Equation 8), (d) solves same min-max ga
1. The paper introduces a generative-model-based ambiguity set for DRO, which allows considering distributions outside the original support while maintaining similarity to the nominal distribution. The approach requires few prior assumptions, addresses the limitation of KL-divergence ambiguity (no support shift), and improves over Wasserstein ambiguity by providing a tractable, parameterized search space. 2. Empirical OOD Performance Gains: GAS-DRO demonstrates state-of-the-art OO
1. This paper investigates DRO, which is widely considered an approach for discriminative tasks. The method instead uses a generative model to address the classification problem, but determining the ambiguity set itself requires a sampling operation. Therefore, I wonder how much extra computational cost this method introduces compared to normal plug-and-play baselines. For instance, how much longer (in terms of training time or iterations) does GAS-DRO take compared to standard
- The core idea of formulating the ambiguity set over the parameters of a generative model and, crucially, using the *reconstruction loss* $J(\theta, P_0) \le \epsilon$ as the constraint (Eq. 7) is a highly novel and elegant contribution. It reframes the problem from an intractable search over distributions to a tractable optimization over model parameters. - The method provides a compelling solution to the fundamental tension between $\phi$-divergence DRO (no support shift) and Wasserstein-D
- The theoretical link (Lemma 1) between the proposed constraint $J(\theta, P_0) \le \epsilon$ and traditional divergence-based ambiguity sets is a one-way bound ($J \le \epsilon \implies D_{KL}(P_0 || P_\theta)$ is bounded). This ensures distributions in GAS are "consistent," but it's not clear if the set is sufficiently expressive. It's possible that a "worst-case" distribution $P^*$ that is close to $P_0$ (in a standard metric) might not be representable by any $P_\theta$ in the GAS, even wit
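For reference, the objects the reviewers discuss can be written out as follows. This is a sketch reconstructed from the review text only: the constraint $J(\theta, P_0) \le \epsilon$ and the direction of Lemma 1 come from the reviews, while the decision variable $w$, the loss $\ell$, the sample $z$, the set name $\mathcal{P}_{\mathrm{GAS}}$, and the unspecified bound $C(\epsilon)$ are assumed notation and may differ from the paper's.

```latex
% Generative ambiguity set: generator parameters with small reconstruction loss
\mathcal{P}_{\mathrm{GAS}}(\epsilon)
  = \bigl\{\, P_\theta \;:\; J(\theta, P_0) \le \epsilon \,\bigr\}

% DRO over the ambiguity set (w, \ell, z are assumed notation)
\min_{w} \; \max_{\theta \,:\, J(\theta, P_0) \le \epsilon}
  \; \mathbb{E}_{z \sim P_\theta}\bigl[\ell(w; z)\bigr]

% One-way bound described for Lemma 1 (C(\epsilon) is a placeholder, not the paper's constant)
J(\theta, P_0) \le \epsilon
  \;\Longrightarrow\;
  D_{KL}(P_0 \,\|\, P_\theta) \le C(\epsilon)
```

The implication runs in one direction only, which is the reviewer's point: membership in the set guarantees closeness to $P_0$, but closeness to $P_0$ does not guarantee that a distribution is representable as some $P_\theta$ in the set.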
Taxonomy
Topics: Risk and Portfolio Optimization · Stochastic Gradient Optimization Techniques · Advanced Multi-Objective Optimization Algorithms
