DADA: Dual Averaging with Distance Adaptation

Mohammad Moshtaghifar; Anton Rodomanov; Daniil Vankov; Sebastian Stich

arXiv:2501.10258·math.OC·April 22, 2026

TL;DR

DADA is a universal gradient method that adapts its coefficients to the observed gradients and to the distance between the iterates and the starting point, so it solves a broad range of convex optimization problems without prior knowledge of problem-specific parameters.

Contribution

Introduces DADA, a novel dual averaging algorithm that dynamically adapts to problem structure, eliminating the need for problem-specific parameters and prior knowledge.

Findings

01 Works across a broad spectrum of convex problems, including both nonsmooth and smooth functions.

02 Applies to both unconstrained and constrained problems without prior knowledge of the number of iterations or the target accuracy.

03 Eliminates the need for problem-specific parameter tuning.

Abstract

We present a novel universal gradient method for solving convex optimization problems. Our algorithm, Dual Averaging with Distance Adaptation (DADA), is based on the classical scheme of dual averaging and dynamically adjusts its coefficients based on observed gradients and the distance between iterates and the starting point, eliminating the need for problem-specific parameters. DADA is a universal algorithm that simultaneously works for a broad spectrum of problem classes, provided the local growth of the objective function around its minimizer can be bounded. Particular examples of such problem classes are nonsmooth Lipschitz functions, Lipschitz-smooth functions, Hölder-smooth functions, functions with high-order Lipschitz derivative, quasi-self-concordant functions, and $(L_0, L_1)$-smooth functions. Crucially, DADA is applicable to both unconstrained and constrained problems, even…
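To make the idea of dual averaging with a distance-based scale concrete, here is a minimal toy sketch in Python. It is not the paper's coefficient rule: the function name `dual_averaging_distance_adapted`, the parameter `r_eps`, and the choice `beta = sqrt(sum of squared gradient norms) / r_bar` are all assumptions made for illustration. The sketch only shows the two ingredients the abstract names: a classical unconstrained dual-averaging step anchored at the starting point, and a scale `r_bar` driven by the largest distance from the starting point observed so far.

```python
import math


def dual_averaging_distance_adapted(subgrad, x0, num_iters=200, r_eps=0.1):
    """Toy dual averaging with a distance-based scale (illustrative only).

    The coefficient rule below is a hypothetical choice, not the one
    analyzed in the paper.
    """
    x = list(x0)
    z = [0.0] * len(x0)   # running sum of observed (sub)gradients
    g_sq_sum = 0.0        # running sum of squared gradient norms
    r_bar = r_eps         # running estimate of the distance scale
    iterates = [list(x0)]
    for _ in range(num_iters):
        g = subgrad(x)
        z = [zi + gi for zi, gi in zip(z, g)]
        g_sq_sum += sum(gi * gi for gi in g)
        # Distance adaptation: largest distance from x0 seen so far.
        r_bar = max(r_bar, math.dist(x, x0))
        if g_sq_sum == 0.0:
            break  # only zero gradients observed: x0 is already stationary
        # Unconstrained dual-averaging step: minimizing
        #   <z, x> + (beta / 2) * ||x - x0||^2
        # over x gives x = x0 - z / beta.
        beta = math.sqrt(g_sq_sum) / r_bar  # assumed (hypothetical) rule
        x = [x0i - zi / beta for x0i, zi in zip(x0, z)]
        iterates.append(list(x))
    return iterates


# Nonsmooth Lipschitz test problem: f(x) = |x - 1|, minimized at x = 1.
def f(x):
    return abs(x[0] - 1.0)


def subgrad(x):
    d = x[0] - 1.0
    return [1.0 if d > 0 else -1.0 if d < 0 else 0.0]


iterates = dual_averaging_distance_adapted(subgrad, [0.0])
print("initial objective:", f(iterates[0]))
print("final objective:  ", f(iterates[-1]))
```

Note that the loop never receives a Lipschitz constant, a step size, or the distance to the minimizer; the only user-chosen quantity is the small initial scale `r_eps`. In this toy version the scale can overshoot early on before the iterates settle, so it should be read as an illustration of the mechanism rather than a tuned method.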


Videos

DADA: Dual Averaging with Distance Adaptation · SlidesLive