Learning Permutations with Sinkhorn Policy Gradient

Patrick Emami; Sanjay Ranka

arXiv:1805.07010 · cs.LG · May 21, 2018 · 40 cites

TL;DR

The paper introduces Sinkhorn Policy Gradient (SPG), an actor-critic method that learns policies over permutation matrices through a temperature-controlled, differentiable Sinkhorn layer. The layer allows end-to-end training on tasks such as sorting, TSP, and matching, and SPG is more data-efficient than the baseline methods on matching.

Contribution

The paper proposes the SPG algorithm together with an actor-critic architecture in which a temperature-controlled Sinkhorn layer decouples representation learning of the state space from the highly structured action space of permutations.

Findings

1. SPG performs competitively on sorting, TSP, and matching tasks.
2. SPG is more data-efficient than baseline methods on matching tasks.
3. The Sinkhorn layer enables end-to-end training of permutation policies.

Abstract

Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uniquely decouples representation learning of the state space from the highly-structured action space of permutations with a temperature-controlled Sinkhorn layer. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean…
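
The abstract's "temperature-controlled Sinkhorn layer" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository uses PyTorch); the function name `sinkhorn`, the parameters `temperature` and `n_iters`, and the log-space formulation are assumptions chosen for illustration. The sketch divides a square score matrix by a temperature and then alternately normalizes rows and columns, which yields an approximately doubly stochastic matrix, that is, a continuous relaxation of a permutation matrix.

```python
import numpy as np


def _logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dimensions."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(scores, temperature=1.0, n_iters=50):
    """Hypothetical Sinkhorn layer: relax a square score matrix toward a permutation.

    Lower temperatures push the output closer to a hard permutation matrix;
    higher temperatures give a smoother, more uniform matrix.
    """
    log_p = np.asarray(scores, dtype=float) / temperature
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # make each row sum to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # make each column sum to 1
    return np.exp(log_p)


# Illustrative scores only; in an actor-critic setup these would come from a network.
example_scores = np.array([
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.4, 1.2],
])
soft = sinkhorn(example_scores, temperature=1.0)    # smooth relaxation
sharp = sinkhorn(example_scores, temperature=0.05)  # close to a permutation matrix
```

Because every step is a differentiable operation, gradients can flow through the normalization back to whatever produced the scores, which is the property the abstract relies on for end-to-end training.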

Code & Models

Repositories

pemami4911/sinkhorn-policy-gradient.pytorch (PyTorch)

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics