Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision
Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

TL;DR
This paper introduces differentiable sorting networks that enable end-to-end training of neural networks with sorting supervision. They mitigate vanishing gradients and scale to sorting large sets.
Contribution
It proposes relaxations of sorting networks, applied to odd-even and in particular bitonic networks, that make sorting and ranking supervision more stable and scalable.
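Bitonic networks scale better largely because they are shallow. As a rough illustration, the minimal sketch below generates the comparator layers of a standard textbook bitonic sorting network for n = 2^k inputs and applies hard conditional swaps. It assumes n is a power of two and is not the authors' code. `bitonic_layers` and `apply_network` are hypothetical names.

```python
import random


def bitonic_layers(n):
    """Comparator layers of a bitonic sorting network for n = 2**k inputs.

    Each layer is a list of (lo, hi) index pairs; a comparator writes the
    minimum of its two inputs to position lo and the maximum to position hi.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    layers = []
    size = 2
    while size <= n:                # length of the bitonic sequences being merged
        stride = size // 2
        while stride >= 1:          # one layer per half-cleaner stage
            layer = []
            for i in range(n):
                j = i ^ stride      # partner index differs in exactly one bit
                if j > i:
                    ascending = (i & size) == 0
                    layer.append((i, j) if ascending else (j, i))
            layers.append(layer)
            stride //= 2
        size *= 2
    return layers


def apply_network(x, layers):
    """Apply hard conditional swaps (exact min/max) layer by layer."""
    x = list(x)
    for layer in layers:
        for lo, hi in layer:
            a, b = x[lo], x[hi]
            x[lo], x[hi] = min(a, b), max(a, b)
    return x


layers = bitonic_layers(1024)
print(len(layers))  # 55
random.seed(0)
values = [random.random() for _ in range(1024)]
assert apply_network(values, layers) == sorted(values)
```

The network has log2(n)·(log2(n) + 1)/2 layers, which is 55 for n = 1024. An odd-even transposition network needs n layers, so 1024 for the same input size. Fewer layers mean fewer relaxed swaps for gradients to pass through. This is consistent with the claim that bitonic networks train stably on large sets. In a differentiable version, each hard min/max above would be replaced by a relaxed swap.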
Findings
Outperforms existing relaxations of sorting operations.
Enables stable training on input sets of up to 1024 elements.
Mitigates vanishing gradients and extensive blurring in networks with many layers.
Abstract
Sorting and ranking supervision is a method for training neural networks end-to-end based on ordering constraints. That is, the ground truth order of sets of samples is known, while their absolute values remain unsupervised. For that, we propose differentiable sorting networks by relaxing their pairwise conditional swap operations. To address the problems of vanishing gradients and extensive blurring that arise with larger numbers of layers, we propose mapping activations to regions with moderate gradients. We consider odd-even as well as bitonic sorting networks, which outperform existing relaxations of the sorting operation. We show that bitonic sorting networks can achieve stable training on large input sets of up to 1024 elements.
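The core idea of relaxing each pairwise conditional swap can be sketched in a few lines. The version below is a minimal illustration under our own assumptions, not the authors' implementation.

It uses a logistic sigmoid σ with inverse temperature beta:

- The relaxed min of (a, b) is a·σ(b − a) + b·σ(a − b).
- The relaxed max is a·σ(a − b) + b·σ(b − a).

These swaps are stacked into an odd-even transposition network. The sketch also accumulates the relaxed permutation matrix P that ranking supervision would compare against the ground-truth order.

The optional `lam` argument applies φ(d) = d / (|d|^lam + eps) to each difference before the sigmoid. This is an assumed form of the "mapping activations to regions with moderate gradients" step. `soft_odd_even_sort` is a hypothetical name.

```python
import math


def sigmoid(x, beta):
    """Numerically stable logistic sigmoid with inverse temperature beta."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_odd_even_sort(x, beta=10.0, lam=None, eps=1e-10):
    """Relaxed odd-even transposition network (hypothetical helper).

    Returns (values, P): relaxed ascending values and a doubly stochastic
    relaxed permutation matrix with values[k] == sum_m P[k][m] * x[m].
    If lam is given, each difference d is first mapped by the assumed
    phi(d) = d / (|d|**lam + eps), which shrinks large differences.
    """
    n = len(x)
    vals = [float(v) for v in x]
    P = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for layer in range(n):  # n layers suffice for odd-even transposition
        for i in range(layer % 2, n - 1, 2):
            a, b = vals[i], vals[i + 1]
            d = b - a
            if lam is not None:
                d = d / (abs(d) ** lam + eps)
            s = sigmoid(d, beta)  # near 1 keeps the order, near 0 swaps
            vals[i] = s * a + (1 - s) * b        # relaxed min
            vals[i + 1] = (1 - s) * a + s * b    # relaxed max
            # Apply the same 2x2 mixing to the rows of P so vals == P x holds.
            row_i, row_j = P[i], P[i + 1]
            P[i] = [s * p + (1 - s) * q for p, q in zip(row_i, row_j)]
            P[i + 1] = [(1 - s) * p + s * q for p, q in zip(row_i, row_j)]
    return vals, P


vals, P = soft_odd_even_sort([3.0, 1.0, 2.0], beta=50.0)
print([round(v, 3) for v in vals])  # [1.0, 2.0, 3.0]
soft_vals, soft_P = soft_odd_even_sort([3.0, 1.0, 2.0], beta=1.0)
```

With beta = 50 the result is effectively a hard sort. With beta = 1 the values are blurred toward one another while their sum is preserved, because each relaxed swap only redistributes mass between two positions. A training loss could then compare P with the ground-truth permutation matrix, for example via a row-wise cross-entropy. That loss choice is likewise an assumption here.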
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Advanced Neural Network Applications
