Reinforcement learning from comparisons: Three alternatives is enough, two is not
Benoit Laslier, Jean-Francois Laslier

TL;DR
This paper studies a reinforcement learning approach based on pairwise comparisons that need not be transitive. It shows that reinforcing the best among three randomly drawn options converges to the optimal solution, whereas the simpler process that reinforces the winner of a random pair may cycle.
Contribution
It introduces a reinforcement urn model that is proven to converge when three random alternatives are considered at each step, highlighting the importance of comparing three options rather than two in non-transitive comparison settings.
Findings
Reinforcing the best among three options converges to the optimal solution.
Reinforcing only pairwise winners may lead to cycling and lack of convergence.
The model provides a theoretical foundation for multi-alternative reinforcement learning.
Abstract
The paper deals with the problem of finding the best alternatives on the basis of pairwise comparisons when these comparisons need not be transitive. In this setting, we study a reinforcement urn model. We prove convergence to the optimal solution when reinforcement of a winning alternative occurs each time after considering three random alternatives. The simpler process, which reinforces the winner of a random pair, does not always converge: it may cycle.
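The two processes described in the abstract can be explored with a small simulation. The sketch below is illustrative only and is not taken from the paper. It assumes an urn that starts with one ball per alternative, draws made with replacement in proportion to ball counts, a winner defined as the drawn alternative that beats the most other distinct drawn alternatives, and ties (for example a three-way cycle) broken uniformly at random. The names `pick_winner`, `simulate`, and `beats` are hypothetical.

```python
import random

def pick_winner(sample, beats, rng):
    """Pick the alternative in `sample` that beats the most other distinct
    alternatives in the sample. Ties are broken uniformly at random
    (an assumption of this sketch, not a rule stated in the source)."""
    distinct = sorted(set(sample))
    scores = {a: sum(beats[a][b] for b in distinct if b != a) for a in distinct}
    best = max(scores.values())
    return rng.choice([a for a in distinct if scores[a] == best])

def simulate(beats, k, steps, seed=0):
    """Run a reinforcement urn: at each step draw k alternatives in proportion
    to their ball counts (with replacement, an assumption), then add one ball
    for the winner. Returns the final ball counts."""
    rng = random.Random(seed)
    n = len(beats)
    counts = [1] * n  # assumed initial urn: one ball per alternative
    for _ in range(steps):
        sample = rng.choices(range(n), weights=counts, k=k)
        counts[pick_winner(sample, beats, rng)] += 1
    return counts

# beats[i][j] is True when alternative i beats alternative j.
# Non-transitive example (rock-paper-scissors): 0 beats 1, 1 beats 2, 2 beats 0.
cyclic = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]

if __name__ == "__main__":
    for k in (2, 3):
        counts = simulate(cyclic, k=k, steps=20000, seed=1)
        total = sum(counts)
        print(k, [round(c / total, 3) for c in counts])
```

Calling `simulate` with `k=2` reinforces the winner of a random pair, and `k=3` reinforces the winner among three random draws. Tracking the shares over time, rather than only the final counts, is one way to look for the cycling behaviour that the abstract attributes to the pairwise process.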
Taxonomy
Topics: Behavioral and Psychological Studies · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
