Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval

Wenhao Yang; Sifan Yang; Lijun Zhang

arXiv:2505.19491·cs.LG·May 27, 2025

Open Access 3 Reviews

TL;DR

This paper introduces a new online convex optimization algorithm that adaptively achieves uniform discounted regret across a continuous range of discount factors without prior knowledge of the discount parameter.

Contribution

The paper proposes a novel analysis showing that smoothed OGD combined with Discounted-Normal-Predictor achieves uniform regret bounds over all discount factors, adapting to unknown environments.

Findings

01

Achieves $O\left(\sqrt{\frac{\log T}{1-\lambda}}\right)$ discounted regret uniformly for all discount factors.

02

Uses multiple OGD instances with aggregation via Discounted-Normal-Predictor.

03

Demonstrates effective combination of decisions from experts with different discount factors.
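The setup in the findings above, several OGD instances that each target a different discount factor, can be sketched roughly as follows. This is only an illustration under stated assumptions: the geometric grid of discount factors, the quadratic losses, the step size $\sqrt{1-\lambda}$, and the names `ogd_decisions`, `discounted_loss`, and `best_expert` are all hypothetical. The Discounted-Normal-Predictor aggregation step is deliberately omitted because this page does not describe it; the sketch only scores each expert under a chosen discount factor, which hints at why experts judged by different metrics are awkward to combine.

```python
import numpy as np

def ogd_decisions(targets, eta):
    """Projected OGD on f_t(x) = 0.5 * (x - z_t)^2 over [-1, 1] (assumed loss family)."""
    x, out = 0.0, []
    for z in targets:
        out.append(x)                                   # play x_t, then observe z_t
        x = float(np.clip(x - eta * (x - z), -1.0, 1.0))
    return np.array(out)

def discounted_loss(targets, decisions, lam):
    """Sum over t of lam^(T-t) * f_t(x_t); the most recent round has weight 1."""
    T = len(targets)
    weights = lam ** np.arange(T - 1, -1, -1)
    return float(np.dot(weights, 0.5 * (decisions - targets) ** 2))

T = 1000
# Hypothetical grid: 1 - lam = 1/2, 1/4, ..., 2^-9 (an assumption, not the paper's grid)
grid = 1.0 - 2.0 ** -np.arange(1, int(np.log2(T)) + 1)
# Non-stationary environment: the target shifts abruptly halfway through
targets = np.where(np.arange(T) < T // 2, -0.5, 0.5)
# One OGD expert per grid point, each with its own (assumed) step size
experts = {lam: ogd_decisions(targets, np.sqrt(1.0 - lam)) for lam in grid}

def best_expert(eval_lam):
    """Grid point whose expert has the smallest discounted loss under eval_lam."""
    return min(experts, key=lambda lam: discounted_loss(targets, experts[lam], eval_lam))

for eval_lam in (0.5, 0.9, 0.998):
    print(eval_lam, best_expert(eval_lam))
```

Each expert can be scored under any evaluation discount factor, and the ranking under one factor need not match the ranking under another.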

Abstract

Reflecting the greater significance of recent history over the distant past in non-stationary environments, $\lambda$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $\lambda$ is given, online gradient descent (OGD) with an appropriate step size achieves an $O(1/\sqrt{1-\lambda})$ discounted regret. However, the value of $\lambda$ is often not predetermined in real-world scenarios. This gives rise to a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we affirmatively answer this question by providing a novel analysis to demonstrate that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-\lambda)})$ discounted regret, holding for all values of $\lambda$ across a continuous interval simultaneously.…
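To make the quantity in the abstract concrete, here is a minimal Python sketch. It assumes the standard definition of $\lambda$-discounted regret at round $T$, namely $\sum_{t=1}^{T} \lambda^{T-t} f_t(x_t) - \min_{x} \sum_{t=1}^{T} \lambda^{T-t} f_t(x)$, which this page does not spell out. The quadratic losses, the domain $[-1, 1]$, the step size $\eta = \sqrt{1-\lambda}$, and all function names are illustrative assumptions rather than the paper's setup, and the code runs plain OGD with a known $\lambda$, not SOGD.

```python
import numpy as np

def discount_weights(T, lam):
    """Weights lam^(T-t) for t = 1..T, so the latest round has weight 1."""
    return lam ** np.arange(T - 1, -1, -1)

def ogd_quadratic(targets, eta, lo=-1.0, hi=1.0):
    """Projected OGD on f_t(x) = 0.5 * (x - z_t)^2 over [lo, hi]; returns x_1..x_T."""
    x, decisions = 0.0, []
    for z in targets:
        decisions.append(x)                    # commit to x_t before seeing z_t
        grad = x - z                           # gradient of the quadratic loss at x_t
        x = float(np.clip(x - eta * grad, lo, hi))
    return np.array(decisions)

def discounted_regret(targets, decisions, lam):
    """Discounted loss of the decisions minus that of the best fixed point."""
    w = discount_weights(len(targets), lam)
    # For these quadratic losses the discounted minimizer is the weighted mean
    # of the targets, which stays inside [-1, 1] whenever the targets do.
    comparator = np.dot(w, targets) / w.sum()
    loss_alg = 0.5 * (decisions - targets) ** 2
    loss_cmp = 0.5 * (comparator - targets) ** 2
    return float(np.dot(w, loss_alg - loss_cmp))

rng = np.random.default_rng(0)
T, lam = 2000, 0.99
# Slowly drifting targets with noise, clipped to the domain
targets = np.clip(0.5 * np.sin(np.arange(T) / 200.0)
                  + 0.1 * rng.standard_normal(T), -1.0, 1.0)
eta = np.sqrt(1.0 - lam)                       # assumed step size scaling
decisions = ogd_quadratic(targets, eta)
print(discounted_regret(targets, decisions, lam))
```

Because the weights $\lambda^{T-t}$ decay geometrically, only roughly the last $1/(1-\lambda)$ rounds contribute meaningfully to the total.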

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Addresses a clear open problem in discounted OCO by providing a discounted regret bound that holds uniformly over a range of lambda, which the learner does not know a priori. The technical challenges of the problem are clearly presented: prior aggregation frameworks needed to operate under a uniform performance metric, whereas in this paper each expert has a different metric. Provides a step-by-step derivation of the motivation behind the algorithm design, which is insightful. Provides rigorous theore

Weaknesses

- I encourage the authors to provide **more concrete** motivation for the problem of unknown lambda: why is it practically relevant, beyond mere theoretical interest?
- As mentioned in the intro, part of the motivation is that the user’s preference might change, indicating a time-varying lambda. Can the paper be extended to the time-varying lambda case? i.e. the learner has some feedback signal that is indicative of lambda, and can adapt itself to optimize the regret with

Reviewer 02 · Rating 4 · Confidence 3

Strengths

Originality: The paper creatively applies the DNP algorithm to the discounted optimization problem and removes the need to know the discount factor. Quality: The submission seems technically correct. Experiments are a plus. The algorithms and the experimental settings appear reproducible. Clarity: The submission is clear in general. Significance: Novel theoretical findings are obtained in the form of discounted regret results for unknown discount factors.

Weaknesses

I am leaning towards rejection, for the following reasons. Utilizing DNP-cu seems to help in arriving at a clean result, but it seems any combiner could have worked. I am not sure that the issue of different discounted performance measures, as explained in Figure 1, is as great as advertised. Since the combined $\lambda$ values have a difference of $1/T$, the regret redundancy propagating due to mismatches seems to be finite at each combination node, similar to DNP-cu. Aside from that, exponentially g

Reviewer 03 · Rating 8 · Confidence 4

Strengths

* The paper studies an important open problem: that of achieving discounted regret bounds with an unknown $\lambda$. The paper does a decent job of motivating that in some settings $\lambda$ is truly unknown and is not just a tunable hyperparameter.
* In contrast to typical aggregation mechanism in meta algorithms with multiple instances of an online gradient descent or experts algorithm, this work relies on the less well-known mechanism of discounted normal predictor with conservative upda

Weaknesses

* There is a related line of work on online convex optimization with unbounded memory (Kumar, Dean, Kleinberg, NeurIPS 2023). One special case is $\rho$-discounted infinite memory, where the loss in each round depends on the entire history of decisions and each past decision is weighted by a geometric factor of $\rho$. This paper does not discuss similarities and differences from this line of work.
* Algorithm 3 is hard to read - I had to keep jumping around to look at the algorithm and at the

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Parking Systems Research