Online Newton Step Algorithm with Estimated Gradient
Binbin Liu, Jundong Li, Yunquan Song, Xijun Liang, Ling Jian, Huan Liu

TL;DR
This paper introduces ONSEG, a second-order online learning algorithm that leverages estimated gradients to achieve faster convergence in bandit feedback scenarios, improving regret bounds over previous methods.
Contribution
The paper develops ONSEG, extending the Online Newton Step algorithm with expected gradient estimation, reducing the expected regret bound from $O(T^{5/6})$ to $O(T^{2/3})$, and demonstrating empirical advantages.
Findings
ONSEG reduces expected regret from $O(T^{5/6})$ to $O(T^{2/3})$.
The algorithm outperforms existing methods on multiple real-world datasets.
Second-order information accelerates convergence in bandit online learning.
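For a rough sense of scale, the two rates $T^{5/6}$ and $T^{2/3}$ can be compared at a fixed horizon. This is only an illustration: it ignores the constants and problem-dependent factors that the $O(\cdot)$ notation hides, and the horizon $T = 10^6$ is an arbitrary choice.

```latex
\begin{align*}
T = 10^{6}: \quad & T^{5/6} = 10^{5}, \qquad T^{2/3} = 10^{4}, \\
\text{ratio:} \quad & \frac{T^{5/6}}{T^{2/3}} = T^{1/6} = 10, \\
\text{average per round:} \quad & \frac{T^{5/6}}{T} = T^{-1/6} = 0.1, \qquad \frac{T^{2/3}}{T} = T^{-1/3} = 0.01 .
\end{align*}
```

Both rates are sublinear, so the average regret per round vanishes in either case; the smaller exponent makes it vanish faster.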
Abstract
Online learning with limited information feedback (bandit) tries to solve the problem where an online learner receives partial feedback information from the environment in the course of learning. Under this setting, Flaxman et al. [8] extended Zinkevich's classical Online Gradient Descent (OGD) algorithm [29] by proposing the Online Gradient Descent with Expected Gradient (OGDEG) algorithm. Specifically, it uses a simple trick to approximate the gradient of the loss function by evaluating it at a single point and bounds the expected regret as $O(T^{5/6})$ [8], where the number of rounds is $T$. Meanwhile, past research efforts have shown that compared with the first-order algorithms, second-order online learning algorithms such as Online Newton Step (ONS) [11] can significantly accelerate the convergence rate of traditional online learning algorithms. Motivated by this,…
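The single-point trick mentioned in the abstract can be sketched as follows. This is a minimal illustration of the one-point estimator commonly attributed to Flaxman et al., g = (d / delta) * f(x + delta * u) * u with u drawn uniformly from the unit sphere; it is not the ONSEG algorithm itself. The function names, the quadratic loss, and the values of `delta` and the sample count are assumptions chosen for the example.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def one_point_gradient_estimate(loss, x, delta, rng):
    """Estimate the gradient from a single loss evaluation.

    Returns (d / delta) * loss(x + delta * u) * u. In expectation this equals
    the gradient of a smoothed version of the loss, not the exact gradient.
    """
    d = len(x)
    u = random_unit_vector(d, rng)
    value = loss([xi + delta * ui for xi, ui in zip(x, u)])  # the only query
    return [(d / delta) * value * ui for ui in u]


def quadratic_loss(x):
    # Hypothetical loss for illustration; its true gradient is 2 * x.
    return sum(c * c for c in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
n = 100000  # assumed sample count, large because single estimates are noisy

# Average many independent estimates to show they centre on the gradient.
avg = [0.0] * len(x)
for _ in range(n):
    g = one_point_gradient_estimate(quadratic_loss, x, 0.5, rng)
    avg = [a + gi / n for a, gi in zip(avg, g)]

print("averaged estimate:", avg)
print("true gradient:    ", [2.0 * c for c in x])
```

Each individual estimate is very noisy, which is why the average over many draws is shown. In the bandit setting the learner gets only one such evaluation per round, and that noise is what the expected-regret analysis has to absorb.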
Taxonomy
Topics: Advanced Bandit Algorithms Research · Advanced Optimization Algorithms Research · Metaheuristic Optimization Algorithms Research
