Reinforcement learning from comparisons: Three alternatives is enough, two is not

Benoit Laslier; Jean-Francois Laslier

arXiv:1301.5734 · math.OC · January 25, 2013 · 2 cites

Open Access

TL;DR

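This paper studies a reinforcement urn model for choosing among alternatives on the basis of pairwise comparisons that need not be transitive. It shows that reinforcing the winner among three randomly drawn alternatives converges to the optimal solution, whereas the simpler process that reinforces the winner of a random pair does not always converge and may cycle.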

Contribution

It studies a reinforcement urn model and proves convergence to the optimal solution when each reinforcement step considers three randomly drawn alternatives. The result highlights that, when comparisons are non-transitive, sampling three alternatives rather than two is what makes the process converge.

Findings

01. Reinforcing the best among three options converges to the optimal solution.

02. Reinforcing only pairwise winners may lead to cycling and lack of convergence.

03. The model provides a theoretical foundation for multi-alternative reinforcement learning.

Abstract

The paper deals with the problem of finding the best alternatives on the basis of pairwise comparisons when these comparisons need not be transitive. In this setting, we study a reinforcement urn model. We prove convergence to the optimal solution when reinforcement of a winning alternative occurs each time after considering three random alternatives. The simpler process, which reinforces the winner of a random pair, does not always converge: it may cycle.
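
The two processes described in the abstract can be compared in a small simulation. The sketch below is not the authors' construction; it rests on several assumptions made only for illustration: the urn starts with one ball per alternative, balls are drawn with replacement, comparisons are encoded in a hypothetical boolean matrix `beats`, and a drawn alternative is reinforced only when it beats every other alternative in the sample (nothing is added when the drawn alternatives form a cycle). The names `local_winner` and `simulate` are illustrative.

```python
import random


def local_winner(sample, beats):
    """Return the drawn alternative that beats all other drawn alternatives.

    `beats[a][b]` is True when alternative a wins the comparison against b.
    Returns None when no drawn alternative beats all the others (a cycle).
    This tie-breaking rule is an assumption, not taken from the paper.
    """
    for candidate in set(sample):
        others = [x for x in sample if x != candidate]
        if all(beats[candidate][x] for x in others):
            return candidate
    return None


def simulate(beats, steps, sample_size=3, seed=0):
    """Run a reinforcement urn for `steps` draws and return the ball counts.

    Assumed setup: one ball per alternative at the start, sampling with
    replacement in proportion to the current counts, and one ball added
    for the local winner of each sample (if there is one).
    """
    rng = random.Random(seed)
    n = len(beats)
    counts = [1] * n
    for _ in range(steps):
        sample = rng.choices(range(n), weights=counts, k=sample_size)
        winner = local_winner(sample, beats)
        if winner is not None:
            counts[winner] += 1
    return counts


if __name__ == "__main__":
    # A non-transitive example: 0 beats 2, 1 beats 0, 2 beats 1.
    cycle = [
        [False, False, True],
        [True, False, False],
        [False, True, False],
    ]
    for k in (2, 3):
        counts = simulate(cycle, steps=10_000, sample_size=k, seed=1)
        total = sum(counts)
        print(k, [round(c / total, 3) for c in counts])
```

Calling `simulate` with `sample_size=2` gives the pairwise process and `sample_size=3` the three-alternative process, so the two can be run side by side on the same comparison matrix and their urn compositions tracked over time.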

Taxonomy

Topics: Behavioral and Psychological Studies · Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics