Learning Permutations with Sinkhorn Policy Gradient
Patrick Emami, Sanjay Ranka

TL;DR
The paper introduces Sinkhorn Policy Gradient (SPG), a method for learning policies over permutations. A differentiable, temperature-controlled Sinkhorn layer enables end-to-end training on tasks such as sorting and the traveling salesman problem (TSP), with better data efficiency than baseline methods on matching tasks.
Contribution
The paper proposes the SPG algorithm together with an actor-critic architecture in which a temperature-controlled Sinkhorn layer decouples representation learning of the state space from the highly structured action space of permutations.
Findings
SPG performs competitively on sorting, TSP, and matching tasks.
SPG is more data-efficient than baseline methods on matching tasks.
The Sinkhorn layer enables end-to-end training of permutation policies.
Abstract
Many problems at the intersection of combinatorics and computer science require solving for a permutation that optimally matches, ranks, or sorts some data. These problems usually have a task-specific, often non-differentiable objective function that data-driven algorithms can use as a learning signal. In this paper, we propose the Sinkhorn Policy Gradient (SPG) algorithm for learning policies on permutation matrices. The actor-critic neural network architecture we introduce for SPG uniquely decouples representation learning of the state space from the highly-structured action space of permutations with a temperature-controlled Sinkhorn layer. The Sinkhorn layer produces continuous relaxations of permutation matrices so that the actor-critic architecture can be trained end-to-end. Our empirical results show that agents trained with SPG can perform competitively on sorting, the Euclidean…
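The temperature-controlled Sinkhorn layer mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the standard Sinkhorn operator (alternating row and column normalization of exp(scores / temperature)), computed in log space for numerical stability. The function name `sinkhorn`, the helper `logsumexp`, and the parameters `temperature` and `n_iters` are hypothetical names chosen for illustration.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis, keeping dims."""
    m = np.max(a, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))


def sinkhorn(scores, temperature=1.0, n_iters=50):
    """Map a square score matrix to an approximately doubly stochastic matrix.

    Assumption: the standard Sinkhorn operator, i.e. repeated row and
    column normalization of exp(scores / temperature). Lower temperatures
    push the output closer to a hard permutation matrix.
    """
    log_p = np.asarray(scores, dtype=np.float64) / temperature
    for _ in range(n_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # normalize rows
        log_p = log_p - logsumexp(log_p, axis=0)  # normalize columns
    return np.exp(log_p)


# Illustrative inputs (not data from the paper).
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
soft = sinkhorn(scores, temperature=1.0)

# Scores that strongly favor the identity assignment.
near_identity = sinkhorn(10.0 * np.eye(3), temperature=1.0)
hard_assignment = near_identity.argmax(axis=1)
```

Because every step is differentiable, gradients can flow through the relaxed matrix to whatever network produces `scores`, which is what makes end-to-end training possible. Taking the row-wise `argmax` as above is only a quick way to read off an assignment when the output is already close to a permutation; it is not guaranteed to yield a valid permutation in general.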
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics
