Stochastic Optimization of Sorting Networks via Continuous Relaxations
Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon

TL;DR
This paper introduces NeuralSort, a continuous relaxation of sorting operators that enables gradient-based optimization of permutation-based problems in machine learning.
Contribution
NeuralSort provides a differentiable relaxation of sorting, allowing end-to-end training and stochastic optimization over permutations using reparameterized gradients.
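The relaxation can be sketched in a few lines. As we understand the paper's construction, take a score vector s of length n and let A_s[j][k] = |s_j - s_k|. Row i (1-indexed) of the relaxed matrix is softmax(((n + 1 - 2i) s - A_s 1) / τ). As the temperature τ → 0, this approaches the permutation matrix that sorts s in decreasing order. The pure-Python function below, including its name `neural_sort`, is an illustrative sketch and not the authors' released code.

```python
import math
from typing import List, Sequence

def neural_sort(s: Sequence[float], tau: float = 1.0) -> List[List[float]]:
    """Relaxed permutation matrix for sorting s in decreasing order (sketch).

    Row i (1-indexed) is softmax(((n + 1 - 2i) * s - A_s @ 1) / tau),
    where A_s[j][k] = |s_j - s_k|.
    """
    n = len(s)
    # b[j] = sum_k |s_j - s_k|, i.e. the row sums of A_s.
    b = [sum(abs(sj - sk) for sk in s) for sj in s]
    rows = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * s[j] - b[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max before exp for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        rows.append([e / z for e in exps])  # each row sums to one
    return rows

s = [1.0, 3.0, 2.0]
p_hat = neural_sort(s, tau=0.1)
# The row-wise argmax recovers the descending-order permutation.
perm = [max(range(len(row)), key=row.__getitem__) for row in p_hat]
print(perm)  # [1, 2, 0]
# Multiplying the relaxed matrix by s gives a soft sorted vector.
soft_sorted = [sum(p * x for p, x in zip(row, s)) for row in p_hat]
print(soft_sorted)  # approximately [3.0, 2.0, 1.0]
```

Each row is a softmax, so it is row-stochastic, and for distinct scores each row has a single largest entry. This matches the "unimodal row-stochastic" description in the abstract. A practical version would use an autodiff tensor library so gradients flow through `s`.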
Findings
Enables gradient-based optimization of sorting operations.
Applies to learning semantic orderings of high-dimensional objects.
Extends k-nearest neighbors with differentiable sorting.
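One way to make k-nearest neighbors differentiable is to replace the hard "pick the k closest points" step with the top k rows of a relaxed sorting matrix. The sketch below rests on our own assumptions, not the paper's exact loss. Scores are negative squared Euclidean distances. The objective is the average mass that the top k rows place on candidates sharing a target label. The helper name `soft_knn_score` is hypothetical, and `neural_sort` is repeated so the block runs on its own.

```python
import math
from typing import List, Sequence

def neural_sort(s: Sequence[float], tau: float) -> List[List[float]]:
    # Relaxed descending-sort matrix: row i = softmax(((n + 1 - 2i) s - A_s 1) / tau).
    n = len(s)
    b = [sum(abs(sj - sk) for sk in s) for sj in s]
    out = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * s[j] - b[j]) / tau for j in range(n)]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

def soft_knn_score(query: Sequence[float], candidates: Sequence[Sequence[float]],
                   labels: Sequence[int], target: int, k: int, tau: float = 1.0) -> float:
    # Hypothetical helper: soft fraction of the k nearest candidates labelled `target`.
    # Assumed score: negative squared Euclidean distance, so nearer points rank higher.
    scores = [-sum((q - c) ** 2 for q, c in zip(query, cand)) for cand in candidates]
    p_hat = neural_sort(scores, tau)
    match = [1.0 if y == target else 0.0 for y in labels]
    # Row j of p_hat is a distribution over which candidate is the (j+1)-th nearest.
    top_k_mass = sum(sum(p * m for p, m in zip(p_hat[j], match)) for j in range(k))
    return top_k_mass / k

query = [0.0, 0.0]
candidates = [[0.1, 0.0], [5.0, 5.0], [0.0, 0.2], [4.0, 4.0]]
labels = [1, 0, 1, 0]
# Close to 1.0: both of the two nearest candidates carry label 1.
print(soft_knn_score(query, candidates, labels, target=1, k=2, tau=0.01))
```

In training, one would maximize this score with respect to the embedding network that produces the query and candidate vectors. Equivalently, one would minimize its negation. Every step is a smooth function of those vectors, which is what the relaxation buys. This pure-Python version only shows the forward computation.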
Abstract
Sorting input objects is an important step in many machine learning pipelines. However, the sorting operator is non-differentiable with respect to its inputs, which prohibits end-to-end gradient-based optimization. In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct arg max. This relaxation permits straight-through optimization of any computational graph involving a sorting operation. Further, we use this relaxation to enable gradient-based stochastic optimization over the combinatorially large space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce family of distributions over permutations. We demonstrate the usefulness of our framework on three tasks that require…
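The Plackett-Luce estimator mentioned in the abstract relies on a reparameterized sampler. A standard way to draw from a Plackett-Luce distribution with positive scores s is to add independent Gumbel(0, 1) noise to log s and sort in decreasing order, and we believe this matches the paper's setup. The relaxed estimator then replaces that hard sort with the NeuralSort relaxation of the perturbed scores. The sketch below shows only the hard, reparameterized sampling step. The name `sample_plackett_luce` and its explicit-noise argument `u` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

def sample_plackett_luce(scores: Sequence[float],
                         u: Optional[Sequence[float]] = None) -> List[int]:
    """Return one permutation (item indices, best first) drawn from PL(scores).

    The sample is a deterministic function of (scores, u), where the noise u
    is drawn independently of the scores: this is the reparameterization.
    """
    if u is None:
        # Uniform noise in (0, 1); clamp away from 0 so the logs below stay finite.
        u = [max(random.random(), 1e-300) for _ in scores]
    # Gumbel(0, 1) noise by inverse CDF: g = -log(-log(u)).
    perturbed = [math.log(s) - math.log(-math.log(ui)) for s, ui in zip(scores, u)]
    # Sorting the perturbed log-scores in decreasing order yields a PL sample.
    return sorted(range(len(scores)), key=lambda i: perturbed[i], reverse=True)

random.seed(0)
scores = [1.0, 5.0, 2.0]
draws = [sample_plackett_luce(scores) for _ in range(20000)]
first_is_1 = sum(d[0] == 1 for d in draws) / len(draws)
# Under PL, item 1 comes first with probability 5 / (1 + 5 + 2) = 0.625.
print(first_is_1)
```

Because the noise does not depend on the scores, gradients with respect to s can pass through the perturbed scores. The non-differentiable part is the hard `sorted` call, and that is the step the NeuralSort relaxation is meant to replace.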
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Bayesian Methods and Mixture Models · Machine Learning and Algorithms
