From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces
Zikai Xie, Linjiang Chen

TL;DR
This paper introduces the Merge Kernel, a new scalable permutation representation for Bayesian Optimization that outperforms existing methods in high-dimensional permutation spaces, enabling efficient large-scale optimization tasks.
Contribution
The paper proposes the Merge Kernel, leveraging divide-and-conquer sorting algorithms to create scalable, efficient permutation representations for Bayesian Optimization in high dimensions.
Findings
The Merge Kernel achieves lower complexity than exhaustive pairwise comparison, with no information loss.
It outperforms the Mallows kernel in high-dimensional settings.
It demonstrates effectiveness on large-scale permutation benchmarks.
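To make the complexity gap concrete, the sketch below counts the binary comparisons a plain top-down merge sort performs and sets that against the number of item pairs an exhaustive pairwise scheme must examine. It is only an illustration of the two comparison budgets, not the paper's kernel construction; the function names `merge_sort_comparisons` and `pairwise_comparisons` are hypothetical.

```python
from math import comb


def merge_sort_comparisons(seq):
    """Top-down merge sort. Returns (sorted_list, number_of_comparisons)."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, c_left = merge_sort_comparisons(seq[:mid])
    right, c_right = merge_sort_comparisons(seq[mid:])

    merged = []
    i = j = 0
    count = c_left + c_right
    while i < len(left) and j < len(right):
        count += 1  # one binary comparison per loop iteration
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest is appended without comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def pairwise_comparisons(n):
    """Number of unordered item pairs, n * (n - 1) / 2."""
    return comb(n, 2)


if __name__ == "__main__":
    n = 1024
    _, used = merge_sort_comparisons(list(range(n, 0, -1)))
    print("merge sort comparisons:", used)
    print("pairwise comparisons:  ", pairwise_comparisons(n))
```

Merge sort needs on the order of n log n comparisons while the pairwise count grows quadratically, which is the scaling difference the findings refer to.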
Abstract
Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on exhaustive pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel, which leverages the divide-and-conquer structure of merge sort to produce a compact representation that achieves the lowest possible complexity with no information…
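The pairwise view of the Mallows kernel described in the abstract can be sketched as follows. This is a minimal illustration, assuming a permutation is stored as a list where `perm[i]` is the rank of item `i`, and assuming the common form k(p, q) = exp(-lam * d), with d the Kendall tau distance (the number of discordant pairs). The names `pairwise_features`, `kendall_tau_distance`, `mallows_kernel`, and the parameter `lam` are illustrative and not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_features(perm):
    """One +1/-1 feature per item pair (i, j), i < j.

    +1 if item i is ranked before item j, else -1. The vector has
    n * (n - 1) / 2 entries, which is the dense representation that
    becomes impractical for large n.
    """
    return [1 if perm[i] < perm[j] else -1
            for i, j in combinations(range(len(perm)), 2)]


def kendall_tau_distance(p, q):
    """Number of item pairs that p and q order differently."""
    return sum(a != b for a, b in zip(pairwise_features(p), pairwise_features(q)))


def mallows_kernel(p, q, lam=1.0):
    """Assumed form: exp(-lam * Kendall tau distance)."""
    return math.exp(-lam * kendall_tau_distance(p, q))


if __name__ == "__main__":
    identity = [0, 1, 2, 3]
    reversed_perm = [3, 2, 1, 0]
    print(len(pairwise_features(identity)))
    print(kendall_tau_distance(identity, reversed_perm))
    print(mallows_kernel(identity, reversed_perm, lam=0.5))
```

Every pair of items contributes one feature, so the representation grows quadratically in the permutation length; this is the cost a merge-sort-derived encoding aims to avoid.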
Peer Reviews
Decision · ICLR 2026 Poster
1. This paper addresses a critical challenge for Bayesian Optimization over permutations, showing potential in efficient permutation encoding and searching.
2. The idea is novel and interesting. The authors are the first to incorporate traditional sorting algorithms into permutation kernel design, forming a powerful and general framework. It provides a new paradigm for kernel functions over permutation spaces.
3. The motivation is clear and easy to follow. The proposed general framework interp
1. While the authors propose a kernel based on merge sort and achieve competitive performance, the generality of the choice of sorting algorithm could be discussed in more detail, especially for stochastic sorting algorithms that perform a non-fixed number of binary comparisons. In other words, since the authors make connections between the traditional sorting algorithms and the permutation kernel, readers may be interested in how different properties of sorting algorithms affect the perfo
The paper introduces what is, to my knowledge, a novel connection between sorting algorithms and permutation kernels. This connection is very interesting, original, and useful. The fact that the primary existing permutation kernel falls out as a special case in this framework is very interesting and provides strong validity for thinking about permutation kernels this way. The paper is clear and well written.
The empirical evaluation is not completely satisfying.
- Of five problems, only 2 show significant differences between the methods. Is the conclusion that most of the time it doesn't matter which kernel you use? I recommend trying to expand the set of problems further to include more with clear differences in performance. The paper hypothesizes that the Merge Kernel does better in high-dimensional spaces, but I did not find that very convincing based on a signal of just 2 of 5 problems havi
The method is very well-motivated and performs well on the high-dimensional setting, and stays competitive on the smaller dimensional tasks. I’m also happy the authors acknowledged the lack of right-invariance of the method, which doesn’t invalidate the embedding but at first blush looks a little surprising. Their discussion of sacrificing this property for the sake of a fast method is convincing.
I’m somewhat surprised the authors didn’t benchmark with other notions of permutation distance; for example, they mention Spearman’s footrule on page 5, and there are other notions like the Cayley distance. The results would, I think, be a bit stronger if it were demonstrated that a new notion of distance between permutations is strictly necessary to get better performance in Bayesian optimization. The lack of right invariance is still concerning, mainly as it makes it very difficult to interpret
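For readers unfamiliar with the alternatives named in this review, the sketch below computes the Kendall tau distance, Spearman's footrule, and the Cayley distance for permutations stored as lists of 0-based values, and checks right-invariance on one example. Right-invariance is taken here to mean d(p∘t, q∘t) = d(p, q), with (p∘t)(i) = p[t[i]]; conventions vary, so treat that as an assumption. All function names are illustrative, and none of this reproduces the paper's experiments.

```python
def kendall_tau(p, q):
    """Number of index pairs that p and q order differently."""
    n = len(p)
    return sum((p[i] < p[j]) != (q[i] < q[j])
               for i in range(n) for j in range(i + 1, n))


def spearman_footrule(p, q):
    """Sum of absolute differences between corresponding entries."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cayley(p, q):
    """Minimum number of transpositions turning p into q.

    Computed as n minus the number of cycles of p composed with q inverse.
    """
    n = len(p)
    q_inv = [0] * n
    for i, v in enumerate(q):
        q_inv[v] = i
    r = [p[q_inv[i]] for i in range(n)]
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = r[k]
    return n - cycles


def compose(p, t):
    """Right composition: (p∘t)(i) = p[t[i]]."""
    return [p[t[i]] for i in range(len(p))]


if __name__ == "__main__":
    p, q, t = [2, 0, 3, 1], [0, 1, 2, 3], [1, 3, 0, 2]
    for dist in (kendall_tau, spearman_footrule, cayley):
        before = dist(p, q)
        after = dist(compose(p, t), compose(q, t))
        print(dist.__name__, before, after)
```

Under this convention all three classical distances give the same value before and after composing both permutations with `t`, which is the property the reviewers note the Merge Kernel gives up in exchange for speed.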
Taxonomy
Topics: Medical Image Segmentation Techniques · Algorithms and Data Compression · Machine Learning and Algorithms
