Discounted Online Convex Optimization: Uniform Regret Across a Continuous Interval
Wenhao Yang, Sifan Yang, Lijun Zhang

TL;DR
This paper introduces a new online convex optimization algorithm that adaptively achieves uniform discounted regret across a continuous range of discount factors without prior knowledge of the discount parameter.
Contribution
The paper proposes a novel analysis showing that smoothed OGD (SOGD) combined with Discounted-Normal-Predictor achieves a discounted regret bound that holds uniformly over all discount factors in a continuous interval, without knowing the discount factor in advance.
Findings
Achieves $O\left(\sqrt{\frac{\log T}{1-\lambda}}\right)$ discounted regret uniformly for all discount factors.
Uses multiple OGD instances with aggregation via Discounted-Normal-Predictor.
Demonstrates effective combination of decisions from experts with different discount factors.
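The structure described in the findings above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's algorithm: the geometric grid `discount_grid`, the `OGDExpert` class, the step size `D * sqrt(1 - lam) / G`, and the pairwise chain `combine_chain` are all hypothetical names and choices, and the fixed weights of 0.5 are placeholders standing in for whatever weights Discounted-Normal-Predictor would produce (that predictor is not reproduced here).

```python
import math

def discount_grid(T):
    """Hypothetical grid of discount factors with 1 - lambda_i = 2**(-i)."""
    n = max(1, math.ceil(math.log2(T)))
    return [1.0 - 2.0 ** (-i) for i in range(1, n + 1)]

class OGDExpert:
    """Projected OGD on [-1, 1], tuned for one discount factor (assumed tuning)."""
    def __init__(self, lam, D=2.0, G=1.0, x0=0.0):
        self.lam = lam
        self.eta = D * math.sqrt(1.0 - lam) / G  # larger lam -> smaller step
        self.x = x0

    def predict(self):
        return self.x

    def update(self, grad):
        # Gradient step followed by projection back onto [-1, 1].
        self.x = max(-1.0, min(1.0, self.x - self.eta * grad))

def combine_chain(points, weights):
    """Chain of pairwise convex combinations: y_i = (1 - w_i) y_{i-1} + w_i x_i."""
    y = points[0]
    for x, w in zip(points[1:], weights):
        y = (1.0 - w) * y + w * x
    return y

# Toy run on f_t(x) = |x - z| with a fixed target z (assumption for illustration).
T = 1024
experts = [OGDExpert(lam) for lam in discount_grid(T)]
weights = [0.5] * (len(experts) - 1)  # placeholder weights, NOT the DNP update
z = 0.3
for _ in range(T):
    points = [e.predict() for e in experts]
    y = combine_chain(points, weights)      # decision played this round
    for e, x in zip(experts, points):
        e.update((x > z) - (x < z))         # each expert uses its own subgradient
```

Because every combination step is convex, the played decision `y` always stays inside the feasible interval, whatever weights the aggregator supplies.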
Abstract
Reflecting the greater significance of recent history over the distant past in non-stationary environments, $\lambda$-discounted regret has been introduced in online convex optimization (OCO) to gracefully forget past data as new information arrives. When the discount factor $\lambda$ is given, online gradient descent (OGD) with an appropriate step size achieves an $O(1/\sqrt{1-\lambda})$ discounted regret. However, the value of $\lambda$ is often not predetermined in real-world scenarios. This gives rise to a significant open question: is it possible to develop a discounted algorithm that adapts to an unknown discount factor? In this paper, we affirmatively answer this question by providing a novel analysis to demonstrate that smoothed OGD (SOGD) achieves a uniform $O(\sqrt{\log T/(1-\lambda)})$ discounted regret, holding for all values of $\lambda$ across a continuous interval simultaneously.…
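The known-discount-factor case in the abstract can be checked numerically on a toy problem. The sketch below assumes the common definition of discounted regret against a fixed comparator $u$, namely $\sum_{t=1}^{T} \lambda^{T-t}\,(f_t(x_t) - f_t(u))$, together with a domain of diameter $D$ and subgradients bounded by $G$. Under those assumptions the standard OGD analysis with constant step size $\eta = D\sqrt{1-\lambda}/G$ gives a bound of $DG/\sqrt{1-\lambda}$. The losses $f_t(x) = |x - z_t|$, the interval $[-1, 1]$, and all function names are illustrative choices, not taken from the paper.

```python
import math
import random

def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

def run_ogd(targets, eta, x0=0.0):
    """Projected OGD on f_t(x) = |x - z_t|; returns the played points x_1..x_T."""
    xs, x = [], x0
    for z in targets:
        xs.append(x)
        g = (x > z) - (x < z)        # a subgradient of |x - z| at x
        x = project(x - eta * g)
    return xs

def discounted_regret(xs, targets, u, lam):
    """sum_t lam**(T - t) * (f_t(x_t) - f_t(u)), with t running from 1 to T."""
    T = len(xs)
    return sum(
        lam ** (T - t) * (abs(x - z) - abs(u - z))
        for t, (x, z) in enumerate(zip(xs, targets), start=1)
    )

random.seed(0)
T, lam = 500, 0.95
D, G = 2.0, 1.0                      # diameter of [-1, 1]; bound on |subgradient|
# Toy non-stationary targets: the minimizer jumps from -0.5 to 0.5 halfway through.
targets = [(-0.5 if t < T // 2 else 0.5) + random.uniform(-0.1, 0.1)
           for t in range(T)]

eta = D * math.sqrt(1.0 - lam) / G   # step size from the textbook analysis
xs = run_ogd(targets, eta)
bound = D * G / math.sqrt(1.0 - lam)

for u in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(u, discounted_regret(xs, targets, u, lam), "<=", bound)
```

Note that the step size depends on $\lambda$, which is exactly why an unknown discount factor is a problem for a single OGD instance.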
Peer Reviews
Decision·ICLR 2026 Poster
Addresses a clear open problem in discounted OCO by providing a discounted regret bound that holds over a uniform range of lambda. The learner does not know lambda a priori. The technical challenges of the problem are clearly presented. Prior aggregation frameworks needed to operate under a uniform performance metric, whereas in this paper each expert has a different metric. Provides a step-by-step derivation of the motivation behind the algorithm design, which is insightful. Provides rigorous theore
- I encourage the authors to provide **more concrete** motivations on the relevance of the problem of unknown lambda, why it is practically relevant other than the mere theoretic interest.
- As mentioned in the intro, part of the motivation is that the user’s preference might change, indicating a time-varying lambda. Can the paper be extended to the time-varying lambda case? i.e. the learner has some feedback signal that is indicative of lambda, and can adapt itself to optimize the regret with
Originality: The paper creatively applies the DNP algorithm to the discounted optimization problem and removes the need to know the discount factor. Quality: The submission seems technically correct. Experiments are a plus. The algorithms and the experimental settings appear reproducible. Clarity: The submission is clear in general. Significance: Novel theoretical findings are obtained, in the form of discounted regret results for unknown discount factors.
I am leaning towards rejection. Below are the reasons. Utilizing DNP-cu seems to help in arriving at a clean result, but it seems any combiner could have worked. I am not sure if the issue of different discounted performance measures as explained in Figure 1 is as great as advertised. Since the combined $\lambda$ values have a difference of $1/T$, the regret redundancy propagating due to mismatches seems to be finite at each combination node, similar to DNP-cu. Aside from that, exponentially g
* The paper studies an important open problem: that of achieving discounted regret bounds with an unknown $\lambda$. The paper does a decent job of motivating that in some settings $\lambda$ is truly unknown and is not just a tunable hyperparameter.
* In contrast to typical aggregation mechanism in meta algorithms with multiple instances of an online gradient descent or experts algorithm, this work relies on the less well-known mechanism of discounted normal predictor with conservative upda
* There is a related line of work on online convex optimization with unbounded memory (Kumar, Dean, Kleinberg, NeurIPS 2023). One special case is $\rho$-discounted infinite memory, where the loss in each round depends on the entire history of decisions and each past decision is weighted by a geometric factor of $\rho$. This paper does not discuss similarities and differences from this line of work.
* Algorithm 3 is hard to read - I had to keep jumping around to look at the algorithm and at the
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Smart Parking Systems Research
