Online Non-Additive Path Learning under Full and Partial Information
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Holakou Rahmanian, Manfred K. Warmuth

TL;DR
This paper introduces new online algorithms for non-additive path learning across various information settings, leveraging an automaton approach to adapt additive algorithms, with applications to ensemble structured prediction and non-additive gains.
Contribution
The paper proposes novel algorithms for non-additive path learning in full, semi-bandit, and bandit settings, using a context-dependent automaton to adapt existing additive algorithms.
Findings
Algorithms with favorable regret guarantees
Effective automaton-based approach for non-additive gains
Efficient implementation of EXP3 for non-additive rewards
Abstract
We study the problem of online path learning with non-additive gains, which is a central problem appearing in several applications, including ensemble structured prediction. We present new online algorithms for path learning with non-additive count-based gains for the three settings of full information, semi-bandit and full bandit with very favorable regret guarantees. A key component of our algorithms is the definition and computation of an intermediate context-dependent automaton that enables us to use existing algorithms designed for additive gains. We further apply our methods to the important application of ensemble structured prediction. Finally, beyond count-based gains, we give an efficient implementation of the EXP3 algorithm for the full bandit setting with an arbitrary (non-additive) gain.
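To make the full bandit setting with a non-additive gain concrete, here is a minimal sketch of the standard EXP3 update run over explicitly enumerated paths. It is not the efficient implementation the abstract refers to: enumeration is exponential in path length and is used only to show why a path gain may not decompose over edges. The toy graph, the `bigram_gain` function, and all other names are assumptions introduced for illustration.

```python
import math
import random


def enumerate_paths(graph, source, sink):
    """Return every source-to-sink path in a DAG as a tuple of edge labels."""
    if source == sink:
        return [()]
    paths = []
    for label, nxt in graph.get(source, []):
        for rest in enumerate_paths(graph, nxt, sink):
            paths.append((label,) + rest)
    return paths


def bigram_gain(path, target):
    """Hypothetical non-additive gain in [0, 1]: the fraction of the target's
    distinct bigrams that also occur in the path. It depends on pairs of
    consecutive edges, so it is not a sum of per-edge gains."""
    target_bigrams = set(zip(target, target[1:]))
    if not target_bigrams:
        return 0.0
    path_bigrams = set(zip(path, path[1:]))
    return len(path_bigrams & target_bigrams) / len(target_bigrams)


def exp3(paths, gain_fn, rounds, gamma, rng):
    """Plain EXP3 with each path treated as one arm (naive, not efficient).
    Only the gain of the chosen path is observed in each round."""
    k = len(paths)
    log_w = [0.0] * k  # log-weights, kept in log space to avoid overflow
    total_gain = 0.0
    probs = [1.0 / k] * k
    for _ in range(rounds):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        w_sum = sum(w)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1.0 - gamma) * wi / w_sum + gamma / k for wi in w]
        i = rng.choices(range(k), weights=probs)[0]
        g = gain_fn(paths[i])
        total_gain += g
        # Importance-weighted estimate of the gain of the chosen path.
        log_w[i] += gamma * (g / probs[i]) / k
    return probs, total_gain


# Toy DAG (assumed): three positions, two candidate labels per position.
graph = {
    0: [("a", 1), ("b", 1)],
    1: [("a", 2), ("b", 2)],
    2: [("a", 3), ("b", 3)],
}
target = ("a", "b", "a")
paths = enumerate_paths(graph, 0, 3)
probs, total_gain = exp3(
    paths,
    lambda p: bigram_gain(p, target),
    rounds=2000,
    gamma=0.1,
    rng=random.Random(0),
)
```

Treating every path as a separate arm keeps the update simple, but the number of arms grows exponentially with path length, which is the cost an efficient implementation has to avoid.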
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
