Hybridizing Non-dominated Sorting Algorithms: Divide-and-Conquer Meets Best Order Sort
Margarita Markina, Maxim Buzdalov

Abstract
Many production-grade algorithms benefit from combining an asymptotically efficient algorithm for solving big problem instances, by splitting them into smaller ones, and an asymptotically inefficient algorithm with a very small implementation constant for solving small subproblems. A well-known example is stable sorting, where mergesort is often combined with insertion sort to achieve a constant but noticeable speed-up.
We apply this idea to non-dominated sorting. Namely, we combine the divide-and-conquer algorithm, which has the currently best known asymptotic runtime of $O(N (\log N)^{M-1})$, with the Best Order Sort algorithm, which has the runtime of $O(N^2 M)$ but demonstrates the best practical performance out of quadratic algorithms.
Empirical evaluation shows that the hybrid’s running time is typically not worse than that of either original algorithm, while for large numbers of points it outperforms them by at least 20%. For smaller numbers of objectives, the speedup can be as large as four times.
1 Introduction
Many real-world optimization problems are multiobjective, that is, they require maximizing or minimizing several objectives, which are often conflicting. These problems most often do not have a single solution, but instead feature many incomparable solutions, which trade one objective for another. It is often not known a priori which solution will be chosen, as decisions of this sort are often recommended to be made late, as the decision maker can learn more about the problem [1]. This encourages finding a set of diverse incomparable solutions, which is a problem often approached by multiobjective evolutionary algorithms.
In the realm of scaling-independent preference-less, and thus general-purpose, evolutionary multiobjective algorithms, three paradigms currently seem to prevail [1]: Pareto-based, indicator-based, and decomposition-based approaches. Although there exist well-known decomposition-based [22] and indicator-based [27, 25, 28] algorithms, the majority of modern algorithms are Pareto-based [6, 5, 4, 26].
Most Pareto-based algorithms belong to one of several large groups according to how solutions are selected or ranked: the algorithms which maintain non-dominated solutions [4, 13, 3], perform non-dominated sorting [6, 5, 7], use domination count [9], or use domination strength [26]. In this research we concentrate on non-dominated sorting, as some popular algorithms make use of it [6, 5].
Non-dominated sorting assigns ranks to solutions in the following way: the non-dominated solutions get rank 0, and the solutions which are dominated only by solutions of rank at most $i$ get rank $i + 1$. In the original work [20], this procedure was performed in $O(N^3 M)$, where $N$ is the population size and $M$ is the number of objectives. This was later improved to $O(N^2 M)$ in [6].
As the quadratic complexity is still quite large, both from theoretical and practical points of view, many researchers concentrated on improving practical running times [23, 24, 8, 11, 19, 16, 21], however, without improving the worst-case complexity. Jensen was the first to adapt the earlier result of Kung et al. [14], who solved the problem of finding non-dominated solutions in $O(N (\log N)^{\max(1, M-2)})$, to non-dominated sorting. This algorithm has the worst-case complexity of $O(N (\log N)^{M-1})$. However, this algorithm could not handle coinciding objective values, which was corrected in subsequent works [10, 2]. A more efficient algorithm for non-dominated sorting, or finding layers of maxima, exists for three dimensions [17], whose complexity is $O(N (\log \log N)^2)$ with the use of randomized data structures, or $O(N (\log \log N)^3)$ for deterministic ones. However, whether this algorithm is useful in practice is still an open question.
A large number of available algorithms for non-dominated sorting opens the question of algorithm selection [18]. What is more, a family of $O(N^2 M)$ algorithms for non-dominated sorting resembles a family of quadratic algorithms for comparison-based sorting, and the $O(N (\log N)^{M-1})$ non-dominated sorting algorithms seem to take up the niche of $O(N \log N)$ sorting algorithms (such as mergesort, heapsort, and randomized versions of quicksort).
For comparison-based sorting, the quadratic algorithms are often much simpler and demonstrate better performance on small data, while asymptotically better algorithms take over starting from certain problem sizes. If the latter algorithm is built using a divide-and-conquer scheme, it becomes possible to choose better algorithms for subproblems: if a subproblem, due to its size, can be solved faster using a quadratic algorithm, then it should be done, otherwise the divide-and-conquer algorithm decomposes the problem further. For example, most stable sorting algorithms from standard libraries are currently implemented using mergesort or TimSort, while for data fragments smaller than a certain threshold (32 in the current implementation of sorting in Java, see http://grepcode.com/file_/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/TimSort.java), the quadratic insertion sort algorithm with binary search lookup is used.
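The hybridization pattern described above can be illustrated with a minimal Python sketch. It is not the TimSort implementation: it uses a plain top-down mergesort and a plain insertion sort without binary search, and the function names and the `cutoff` parameter are assumptions made for this example only.

```python
def insertion_sort(items):
    """Quadratic, stable sort with a very small constant factor."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one step to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def hybrid_merge_sort(items, cutoff=32):
    """Mergesort that hands small subproblems to insertion sort.

    `cutoff` is an illustrative threshold, not a tuned value.
    """
    cutoff = max(cutoff, 1)  # guarantees that the recursion terminates
    if len(items) <= cutoff:
        return insertion_sort(items)
    mid = len(items) // 2
    left = hybrid_merge_sort(items[:mid], cutoff)
    right = hybrid_merge_sort(items[mid:], cutoff)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Taking from the left part on ties keeps the sort stable.
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The only decision point is the size check at the top of `hybrid_merge_sort`, which is the same kind of switch that the hybrid non-dominated sorting algorithm makes per subproblem.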
This inspired us to apply a similar idea to non-dominated sorting. For the “outer” divide-and-conquer algorithm, we use the only available algorithm family of this sort [12, 10, 2]. For the quadratic algorithm to solve smaller subproblems, we adapt Best Order Sort [19], as it was shown to typically outperform other quadratic algorithms. Our result is a hybrid algorithm which primarily uses the divide-and-conquer strategy and decides when to switch to Best Order Sort using a formula which depends on the number of points in the subproblem and the number of remaining objectives to consider.
This is a full version of the paper with the same name which was accepted as a poster to the GECCO conference in 2017.
The rest of the paper is structured as follows. In Section 2, we give the necessary definitions and describe the algorithms we put together: the divide-and-conquer algorithm and Best Order Sort. Section 3 describes our hybridizing approach, which includes the changes necessary to introduce to Best Order Sort to serve as the subproblem solver, and the analysis of preliminary experiments which established the formula used to switch between the algorithms. Section 4 gives the main body of our experimental studies, including their analysis. Section 5 concludes.
2 Preliminaries
In the following, we assume that all points are different, which enables us to call any unordered collection of points a set. This is not true in general. However, all equal points receive the same rank, so implementations are free, depending on their needs, either to discard a point if there is an equal one, or to keep all equal points in the same entity and run algorithms on these entities instead, or to work directly with equal points with some additional algorithmic care. None of these precautions change the worst-case algorithmic complexity.
2.1 Definitions
We use capital Latin letters to denote sets of points, as well as the global constants $N$ (the number of points) and $M$ (the number of objectives), while small Latin letters are used for single points, standalone objectives and rank values, and small Greek letters are used for mappings. The value of the $k$-th objective of a point $p$ is denoted as $p_k$.
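In the rest of the paper we assume, without loss of generality, that we solve a multiobjective minimization problem with the number of objectives equal to $M$. In this case, the Pareto dominance relation is defined on two points $a$ and $b$ in the objective space as follows:
$$a \preceq b \iff \forall i \in \{1, \ldots, M\}: a_i \le b_i,$$
$$a \prec b \iff a \preceq b \text{ and } \exists i \in \{1, \ldots, M\}: a_i < b_i,$$
where $\prec$ is called the strict dominance and $\preceq$ is the weak dominance.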
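As a small illustration, the two dominance relations for minimization can be written in Python as follows. Points are assumed to be tuples of equal length, and the function names are chosen for this sketch only.

```python
def weakly_dominates(a, b):
    """True if a is not worse than b in every objective (minimization)."""
    return all(x <= y for x, y in zip(a, b))


def strictly_dominates(a, b):
    """True if a weakly dominates b and is better in at least one objective."""
    return weakly_dominates(a, b) and any(x < y for x, y in zip(a, b))
```

For instance, `(1, 2)` weakly dominates itself but does not strictly dominate itself, while `(1, 3)` and `(2, 2)` are incomparable.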
Non-dominated sorting is a procedure which, for a given set $P$ of points in the $M$-dimensional objective space, assigns each point $p \in P$ an integer rank $\mathrm{rank}(p)$, such that:
$$\mathrm{rank}(p) = \max\bigl(\{0\} \cup \{1 + \mathrm{rank}(q) \mid q \in P,\ q \prec p\}\bigr).$$
In other words, the rank of a point which is not dominated by any other point is zero, and the rank of any other point is one plus the maximum rank among the points which dominate it.
Following the convention from [15], we call the set of all points with the given rank $i$ a non-domination level $L_i$:
$$L_i = \{p \in P \mid \mathrm{rank}(p) = i\}.$$
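The definitions of ranks and non-domination levels translate directly into a quadratic reference procedure. The sketch below is only a baseline for checking faster algorithms, not any of the algorithms discussed in this paper. It assumes distinct points given as tuples, and the names `naive_ranks` and `levels` are illustrative.

```python
def dominates(a, b):
    """Strict Pareto dominance for minimization."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def naive_ranks(points):
    """Compute ranks by definition with O(N^2 M) objective comparisons."""
    # Any dominator of a point is lexicographically smaller than the point,
    # so ranks of all dominators are final when the point is processed.
    order = sorted(range(len(points)), key=lambda i: points[i])
    rank = [0] * len(points)
    for pos, i in enumerate(order):
        for j in order[:pos]:
            if dominates(points[j], points[i]):
                rank[i] = max(rank[i], rank[j] + 1)
    return rank


def levels(rank):
    """Group point indices into non-domination levels by rank."""
    result = {}
    for index, r in enumerate(rank):
        result.setdefault(r, []).append(index)
    return result
```

On the points `(1, 1), (2, 2), (1, 2), (3, 0), (2, 3)` this yields ranks `[0, 2, 1, 0, 3]`, since `(1, 1)` and `(3, 0)` are non-dominated and the remaining points form a chain.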
2.2 The Divide-and-Conquer Approach
The divide-and-conquer approach dates back to 1975, when Kung et al. proposed a multidimensional divide-and-conquer algorithm for finding the maxima of a set of vectors [14], which, in the realm of evolutionary computation, corresponds to the set of non-dominated points, or to points with rank zero. The complexity of this algorithm is $O(N (\log N)^{\max(1, M-2)})$, which we shorten to $O(N (\log N)^{M-2})$ for clarity.
This algorithm can be used to implement non-dominated sorting in the following manner: first we determine the points with rank zero, then we remove these points and run the algorithm again on the remaining points (which yields points with rank one), and we repeat this until no points are left. However, the worst-case complexity of this approach is $O(N^2 (\log N)^{M-2})$. In contrast, fast non-dominated sorting, shipped with the original NSGA-II of Deb et al. [6], has a better $O(N^2 M)$ complexity.
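The level-by-level peeling scheme can be sketched as follows. As a stand-in for the algorithm of Kung et al., this example uses a simple quadratic filter for the non-dominated points, so it only shows the outer loop and not the stated complexity. Points are assumed to be distinct tuples, and all names are illustrative.

```python
def dominates(a, b):
    """Strict Pareto dominance for minimization."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points):
    """Quadratic stand-in for a fast maxima-finding algorithm."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sort_by_peeling(points):
    """Assign ranks by repeatedly removing the current non-dominated set."""
    remaining = list(points)
    rank = {}
    current_rank = 0
    while remaining:
        front = set(non_dominated(remaining))
        for p in front:
            rank[p] = current_rank
        remaining = [p for p in remaining if p not in front]
        current_rank += 1
    return rank
```

When the points form a single chain, every pass removes one point only, which is what makes the worst case of this scheme expensive.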
The divide-and-conquer approach was generalized to perform non-dominated sorting by Jensen [12], shortly after NSGA-II appeared. The algorithm from [12] solves the problem in $O(N (\log N)^{M-1})$, which is much faster for small values of $M$, as well as for large values of $N$, than fast non-dominated sorting. However, this algorithm was designed with an assumption that no two points have equal objectives, which is often not the case, especially in discrete optimization, and is known to produce wrong results when this assumption is violated. This problem was overcome by Fortin et al. [10], who proposed modifications of this algorithm to always produce correct results. The average complexity was proven to be the same, but the worst-case complexity was left at $O(N^2 M)$. Finally, Buzdalov et al. [2] introduced further modifications to achieve the worst-case time complexity of $O(N (\log N)^{M-1})$.
We shall now briefly illustrate the working principles of this approach. At any moment of time, the algorithm maintains, for every point $p$, a lower bound on its rank, which is initially set to zero. The reason for this lower bound can be explained as follows: at any moment of time, we have performed a subset of necessary objective comparisons, which impose approximations of ranks of the affected points. These approximations are of course lower bounds of the real ranks.
To ease the notation, in the following we use neither the term “lower bound of the rank” nor a dedicated symbol for it. Instead, we will say “current rank” for the current state of the lower bound of a certain point, which possibly coincides with the real rank, and “final rank” when we know that the lower bound coincides with the real rank.
One of the main properties of the algorithm is that whenever a comparison of $p_k$ and $q_k$ is performed for the first time, where $p$ and $q$ are points and $k$ is the objective, then the following holds:
- for all objectives $j$ such that $j > k$, it holds that $p_j \le q_j$, that is, $p$ weakly dominates $q$ in objectives $k+1, \ldots, M$;
- the rank of $p$ is known and final, that is, all comparisons necessary to determine the rank of $p$ have already been done.
The top-level concept is the procedure $\mathrm{HelperA}(S, k)$, which takes a set of points $S$ sorted lexicographically (where non-zero lower bounds are possibly known for some of the points from $S$) and makes sure all necessary comparisons between the objectives $1, \ldots, k$ of these points are performed. This procedure is called only when all necessary comparisons of points $p$ and $q$, such that $p \notin S$ and $q \in S$, have already been performed. To perform non-dominated sorting of a set $P$ with $M$ objectives, one should run $\mathrm{HelperA}(P, M)$.
For $k = 2$, it calls a sweep line based algorithm $\mathrm{SweepA}(S)$, which runs in $O(|S| \log |S|)$ and which we will cover later. If there are at most two points in $S$, it performs their direct comparison and updates the rank of the second point if necessary. If all values of the objective $k$ are the same in the entire $S$, it directly calls $\mathrm{HelperA}(S, k - 1)$. Otherwise, it divides $S$ into three parts using the objective $k$: the part $S_L$ with lower values, the part $S_M$ with median values, and the part $S_H$ with higher values.
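The three-way split by the median value of one objective can be sketched as below. This is a simplified illustration: it finds the median by sorting, whereas an implementation aiming at the stated complexity would use linear-time selection. The function name and the zero-based objective index `k` are assumptions of the example.

```python
def split_by_median(points, k):
    """Split points into (lower, median, higher) parts by objective index k.

    The relative order of points inside each part is preserved, so a
    lexicographically sorted input yields lexicographically sorted parts.
    """
    values = sorted(p[k] for p in points)
    median = values[len(values) // 2]
    lower = [p for p in points if p[k] < median]
    middle = [p for p in points if p[k] == median]
    higher = [p for p in points if p[k] > median]
    return lower, middle, higher
```

The median part is never empty, and all its points share the same value of the chosen objective, which is why the recursion on it can drop that objective.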
It is clear that ranks of points in $S_L$ depend on ranks of points in neither $S_M$ nor $S_H$, and also that ranks in $S_M$ do not depend on $S_H$. The algorithm first calls $\mathrm{HelperA}(S_L, k)$, which results in finding the exact ranks in $S_L$, because all necessary comparisons with points from outside $S$ on the right side and other points on the left side have been performed before this call.
Next comes the set $S_M$, but the ranks of these points still need to be updated using the set $S_L$ (and nothing more). To do this, the algorithm calls another procedure, $\mathrm{HelperB}(S_L, S_M, k - 1)$, whose meaning is to update the ranks of points from the second argument using the first argument and objectives in $1, \ldots, k - 1$. Then it calls $\mathrm{HelperA}(S_M, k - 1)$, as all other necessary comparisons have been done, and all values for the objective $k$ are equal in $S_M$. It then proceeds with $\mathrm{HelperB}(S_L \cup S_M, S_H, k - 1)$ and finishes with $\mathrm{HelperA}(S_H, k)$.
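The $\mathrm{HelperB}(L, R, k)$ procedure, as follows from the short description above, shall perform all the necessary comparisons between points $l \in L$ on the left and $r \in R$ on the right, provided that in objectives $k+1, \ldots, M$ it holds that $l \preceq r$, and all ranks in $L$ are final. For $k = 2$, it, again, runs a sweep line procedure $\mathrm{SweepB}(L, R)$. If $|L| = 1$ or $|R| = 1$, a straightforward pairwise comparison is performed. If the maximum value of the objective $k$ in $L$ does not exceed the minimum value in $R$, it calls $\mathrm{HelperB}(L, R, k - 1)$. Otherwise, it chooses a median of the objective $k$ in $L \cup R$ and then, similarly to HelperA, splits $L$ into $L_L$, $L_M$ and $L_H$, and also splits $R$ into $R_L$, $R_M$ and $R_H$. Following the same logic as in HelperA, it performs the following recursive calls:
- $\mathrm{HelperB}(L_L, R_L, k)$;
- $\mathrm{HelperB}(L_L, R_M, k - 1)$;
- $\mathrm{HelperB}(L_M, R_M, k - 1)$;
- $\mathrm{HelperB}(L_L \cup L_M, R_H, k - 1)$;
- $\mathrm{HelperB}(L_H, R_H, k)$.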
The remaining parts to explain are $\mathrm{SweepA}(S)$ and $\mathrm{SweepB}(L, R)$. The SweepA procedure uses a sweep line approach. Points from the set $S$ are processed in lexicographical order using the first two objectives. At the same time, the procedure maintains a binary search tree which contains the last seen representative points for each non-domination level. When the next point is processed, this binary search tree is traversed to determine the highest level which still contains a point dominating the point in question, and then the rank of this point is updated correspondingly. After that, this point is inserted into the tree: it becomes the last representative of its non-domination level and possibly throws out some of the other representatives, which have no more chance to determine the rank of any point on their own. An example is shown in Fig. 1.
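A simplified version of the sweep line idea for two objectives can be sketched in Python. The sketch makes several assumptions that differ from the full SweepA procedure: all points are distinct, all current ranks start at zero, and a sorted list with binary search replaces the binary search tree. It stores, for each level, the smallest second objective seen so far, which is the second objective of the last representative of that level.

```python
from bisect import bisect_right


def sweep_a(points):
    """Non-dominated sorting of distinct 2-objective points (minimization)."""
    # Lexicographic order guarantees that every dominator of a point
    # is processed before the point itself.
    order = sorted(range(len(points)), key=lambda i: points[i])
    min_second = []  # min_second[r]: smallest 2nd objective seen on level r
    rank = [0] * len(points)
    for i in order:
        y = points[i][1]
        # Levels whose representative has a 2nd objective <= y dominate
        # the point; min_second is non-decreasing, so bisect applies.
        r = bisect_right(min_second, y)
        rank[i] = r
        if r == len(min_second):
            min_second.append(y)
        else:
            min_second[r] = y  # the point becomes the level's representative
    return rank
```

Each point costs one binary search, so the whole sweep takes time proportional to $|S| \log |S|$ including the initial sorting.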
The SweepB procedure works in a similar way. The sweep line goes over the union of sets, $L \cup R$, however, the tree is built of the points from $L$ only, and rank updates are performed with points from $R$ only.
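The running times of SweepA and SweepB are $O(|S| \log |S|)$ and $O((|L| + |R|) \log (|L| + |R|))$, respectively. From the well-known theory of solving recurrence relations, and from the strategies of creating subproblems, it follows that the running time of $\mathrm{HelperA}(S, k)$ is $O(|S| (\log |S|)^{k-1})$, and of $\mathrm{HelperB}(L, R, k)$ it is $O((|L| + |R|) (\log (|L| + |R|))^{k-1})$.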
2.3 Best Order Sort
The Best Order Sort algorithm was proposed in [19]. It aims at removing as many comparisons as possible. To do this, it sorts all points by all objectives, thus constructing $M$ sorted lists of points $Q_1, \ldots, Q_M$, and processes the points in the following order: first, all first points in the lists ($Q_1[1], \ldots, Q_M[1]$), then all second points ($Q_1[2], \ldots, Q_M[2]$), then all third points, et cetera, until every point is processed at least once.
When a point $p$ is processed for the first time, assume it happens in the list of the $j$-th objective, its rank has to be determined. The key fact is that only the points which precede $p$ in $Q_j$ can dominate $p$, because all other points have a greater value of the $j$-th objective. Thus, it makes sense to compare $p$ only with the points that precede it in $Q_j$.
To further decrease the number of comparisons, it is worth noting that, when a certain point $p$ is processed in objective $j$, all subsequent new points, that is, the points which will be processed for the first time, will have a value of the $j$-th objective which is not smaller than the one of $p$. This means that the objective $j$ can be safely removed from the list of objectives to test when some other point is checked for being dominated by $p$.
The algorithm maintains a set of objectives $C_p$ to consider for every point $p$. Initially, $C_p = \{1, \ldots, M\}$. Whenever a point $p$ is processed in the list of the $j$-th objective, $j$ is removed from $C_p$. Whenever a point $q$ is checked for being dominated by $p$, only the objectives from $C_p$ need to be considered.
Finally, to determine the rank using fewer comparisons, the points, which have been already considered in each objective list and have been assigned ranks, are stored in separate lists, where each list corresponds to a rank. To determine the rank of the next point, one can perform either a linear scan (starting with rank zero and increasing ranks by one) or binary search for the rank. As the number of points in rank lists cannot be non-trivially bounded, both ways have the worst-case complexity of a single search of , where is the number of points in all lists.
Best Order Sort features two phases: the pre-sorting phase, which takes $O(M N \log N)$, and the domination scanning phase. The complexity of the latter, in the worst case, is $O(M N^2)$, but can be smaller under various conditions. For instance, when all points are non-dominating, the points have a chance to arrange such that the first processed points are unique, which means that every such point is tested against points on average, which results in running time.
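The description above can be condensed into a compact Python sketch. It is an illustration rather than the reference implementation from [19]: it assumes distinct points given as tuples, breaks ties in each sorted list lexicographically so that a dominator always precedes the points it dominates, and finds the rank by a linear scan over rank lists. All names are chosen for this example.

```python
def best_order_sort(points):
    """Return the rank of every point (minimization, distinct points)."""
    n = len(points)
    if n == 0:
        return []
    m = len(points[0])
    # Pre-sorting phase: one sorted list of point indices per objective.
    orders = [sorted(range(n), key=lambda i, j=j: (points[i][j], points[i]))
              for j in range(m)]
    remaining = [set(range(m)) for _ in range(n)]  # objectives still to test
    rank = [None] * n
    rank_lists = [[] for _ in range(m)]  # rank_lists[j][r]: seen in list j
    unranked = n

    def dominated_by(t, s):
        # Objectives outside remaining[t] are known not to be worse in t.
        return all(points[t][o] <= points[s][o] for o in remaining[t])

    # Domination scanning phase.
    for pos in range(n):
        for j in range(m):
            s = orders[j][pos]
            remaining[s].discard(j)
            if rank[s] is None:
                r = 0
                by_rank = rank_lists[j]
                # Linear scan: stop at the first level without a dominator.
                while r < len(by_rank) and any(dominated_by(t, s)
                                               for t in by_rank[r]):
                    r += 1
                rank[s] = r
                unranked -= 1
            while len(rank_lists[j]) <= rank[s]:
                rank_lists[j].append([])
            rank_lists[j][rank[s]].append(s)
        if unranked == 0:
            break
    return rank
```

For the points `(1, 1, 1), (2, 2, 2), (1, 2, 3), (3, 1, 2), (2, 3, 3)` the sketch returns `[0, 1, 1, 1, 2]`, and all ranks are already known after four of the five rounds.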
3 Hybridizing the Algorithms
Our hybridization scheme is similar to that of production-grade sorting algorithms tuned for performance. As the top-level algorithm, we use the divide-and-conquer algorithm. For each subproblem it decides, using a certain heuristic, whether to continue using the divide-and-conquer strategy or to run Best Order Sort for this subproblem. In turn, Best Order Sort runs uninterrupted until it solves the assigned subproblem.
Two problems need to be solved for this scheme to work. First, the original Best Order Sort algorithm cannot be straightforwardly applied to solve subproblems, because subproblems may feature non-zero lower bounds for ranks of some points, which appear from comparisons of these points with other points, which are out of the scope of the current subproblem. It also does not support working with two point sets in order to serve as a back-end of HelperB.
Second, the particular kind of heuristic to determine when to run Best Order Sort is unclear. The main problem with it is that it should have a low computation complexity: at most , because otherwise evaluation of this heuristic worsens the complexity of the divide-and-conquer algorithm. This means we cannot perform any complicated analysis, such as, for instance, principal component analysis, to predict which algorithm is best.
In this section we address these two problems, which determines the shape of our hybridization approach.
3.1 Adaptation of Best Order Sort
When working as a part of the divide-and-conquer algorithm, Best Order Sort can be called instead of either HelperA or HelperB. In the first case, it needs to assign final ranks to a set of points $S$ using the first $k$ objectives ($k > 2$, as SweepA, due to its simplicity, works faster than Best Order Sort under any conditions), provided that all other necessary comparisons have already been performed, and consequently every point $p$ has a current rank, which is a lower bound of its real rank. The only difference to the original Best Order Sort is that some current ranks can be non-zero. This is easily compensated by checking only rank lists with ranks greater than or equal to the current rank of $p$, and thus updating the rank only if the update increases it.
The HelperB case is slightly more involved. If Best Order Sort is called within $\mathrm{HelperB}(L, R, k)$, then ranks of points from $L$ are already known, and it is necessary to perform comparisons between points from $L$ and points from $R$ to update the current ranks of points from $R$ using the first $k$ objectives. In this case, all points are merged and are processed altogether. However, for points from $L$ the rank is not updated (that is, the rank lists are never checked), instead they go directly to the corresponding rank lists. On the contrary, the rank update procedure is executed on points from $R$, but they are never added to rank lists.
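The HelperB adaptation can be illustrated with the following sketch, which is one possible reading of the description above and not the authors' implementation. It assumes that points are tuples, that every point of `left` is not worse than every point of `right` in all objectives beyond the first `k`, and that ranks in `left` are final, so weak dominance in the first `k` objectives is sufficient for a rank update. Rank lists are scanned from the highest rank downwards and only while they are not below the current rank. The function name and parameters are hypothetical.

```python
def bos_helper_b(left, left_rank, right, right_rank, k):
    """Return updated ranks of `right` using `left` and the first k objectives."""
    # Each entry: (point, is_left, index inside its own set).
    entries = ([(p, True, i) for i, p in enumerate(left)]
               + [(p, False, i) for i, p in enumerate(right)])
    n = len(entries)
    # Ties: lexicographic on the first k objectives, then left before right.
    orders = [sorted(range(n),
                     key=lambda e, j=j: (entries[e][0][j],
                                         entries[e][0][:k],
                                         not entries[e][1]))
              for j in range(k)]
    remaining = [set(range(k)) for _ in range(n)]
    new_rank = list(right_rank)
    seen = [False] * len(right)
    pending = len(right)
    rank_lists = [{} for _ in range(k)]  # per objective: rank -> left entries
    for pos in range(n):
        for j in range(k):
            e = orders[j][pos]
            point, is_left, idx = entries[e]
            remaining[e].discard(j)
            if is_left:
                # Left points only populate rank lists, ranks stay untouched.
                rank_lists[j].setdefault(left_rank[idx], []).append(e)
            elif not seen[idx]:
                seen[idx] = True
                pending -= 1
                for r in sorted(rank_lists[j], reverse=True):
                    if r < new_rank[idx]:
                        break  # lower ranks cannot increase the current rank
                    if any(all(entries[t][0][o] <= point[o]
                               for o in remaining[t])
                           for t in rank_lists[j][r]):
                        new_rank[idx] = r + 1
                        break
        if pending == 0:
            break
    return new_rank
```

Re-sorting the rank keys on every update keeps the sketch short; an implementation tuned for speed would keep the rank lists in an indexable structure instead.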
These changes are quite small, so the correctness of Best Order Sort in the changed conditions follows straightforwardly from the correctness of the original algorithm [19]. The worst-case complexity of the HelperB case is .
3.2 Design of the Switch Heuristic
To understand the possible kind of the heuristic algorithm to use for deciding whether to use Best Order Sort for a certain subproblem, we conducted a series of preliminary experiments. In these experiments, we considered a series of datasets, where every dataset had points with objectives and was generated either by uniformly random objective sampling (from the hypercube) or by sampling from a hyperplane (which yields a dataset with exactly one non-domination level). Then we ran the divide-and-conquer algorithm on each of these datasets and recorded all subproblems created during the run. After that, we measured the running times of both the divide-and-conquer algorithm and Best Order Sort on all these subproblems.
Fig. 2 shows an example of such an experiment. In this figure, a point above the abscissa axis means that for the corresponding subproblem the divide-and-conquer algorithm took less time than Best Order Sort, while a point below zero means the opposite. One can clearly see in Fig. 2 that Best Order Sort behaves best, compared to the divide-and-conquer algorithm, for subproblem sizes which are neither too small nor too large.
As the similar effect has been noticed for all other datasets as well, we attempted to deduce formulas for the left and right bounds of the higher efficiency range of Best Order Sort. The following empirically constructed formulas were found to fit our data rather well: and , where is the current number of first objectives to consider, is the left bound of the range, and is the right bound. Fig. 3 shows the plot of the left bound formula and the actual left bounds in datasets with three non-domination levels, Fig. 4 does the same for the right bound formula and datasets with twenty levels. The fitting quality is the same for all other considered datasets.
As a result, the hybrid algorithm switches to Best Order Sort whenever the number of points and the number of considered objectives in the current subproblem satisfy:
[TABLE]
4 Experiments
The main part of experiments was organized as follows. For every combination of:
- •
numbers of points where ;
- •
numbers of objectives ;
- •
numbers of non-domination levels ;
ten datasets were created, and running times of all considered algorithms (the divide-and-conquer algorithm, Best Order Sort, and the hybrid algorithm) were measured.
The results are presented as bar plots for all considered numbers of objectives and non-domination levels. Each bar plot features a section for every considered number of points, consisting of the following three bars: $T_{BOS}/T_{DC}$, $T_{DC}/T_{DC}$, $T_{H}/T_{DC}$, where $T_{BOS}$ is the running time of Best Order Sort, $T_{DC}$ is the running time of the divide-and-conquer algorithm, and $T_{H}$ is the running time of the proposed hybrid algorithm. The bars for Best Order Sort are blue, and the bars for the hybrid algorithm are brown. Every bar has an average, minimum and maximum value (for the second bar, plotting $T_{DC}/T_{DC}$, the average is always one). Whenever a bar’s average is greater than one, that is, it points up, the corresponding algorithm is slower than divide-and-conquer, and if it is faster, then the bar points down.
From these plots, one can immediately spot the characteristic behavior of Best Order Sort: it is typically better at smaller numbers of points, then it gradually becomes worse (for and , this tendency is seen the best). For somewhat higher dimensions ( and ), the lower bound of the Best Order Sort efficiency interval can be seen. For the highest considered dimension, $M = 30$, Best Order Sort demonstrates no significant improvement over the divide-and-conquer algorithm.
The hybrid algorithm tends to perform at least as good as the best of the two algorithms up to . Starting from , it features a somewhat suboptimal performance at the middle problem sizes while still capturing the best behavior at small sizes and getting better than all other algorithms close to .
In fact, the hybrid is always better than its parts for big numbers of points. For , the average speedup compared to the best of the parts can be as large as when , and never seen to get less than in all other considered datasets.
5 Conclusion
We presented a hybrid algorithm for non-dominated sorting which initially runs a divide-and-conquer algorithm, however, when the size of a certain subproblem seems to be suitable, it solves this subproblem using another approach, Best Order Sort. For this to work, we slightly adapted Best Order Sort, so that it can perform non-dominated sorting in a more general setup, which needs to solve the divide-and-conquer subproblems. We also composed a heuristic rule for when to switch to Best Order Sort, which is based solely on the dimensions of a subproblem.
Our algorithm generally performs at least as well as its parts, except for certain ranges around the switchpoint between the algorithms at higher dimensions. This indicates that our heuristic on when to switch is not perfect yet and has room for improvement. Nevertheless, for the wide range of testing data (3 to 30 objectives, 1 to 20 non-domination levels) our algorithm performs at least 20% better than the best of its parts for large numbers of points (such as ), and the speedup can be up to 4x for smaller numbers of objectives. In a sense, this means that our hybridization scheme is rather robust.
References
[1] D. Brockhoff and T. Wagner. GECCO 2016 tutorial on evolutionary multiobjective optimization. In Proceedings of Genetic and Evolutionary Computation Conference Companion, pages 201–227, 2016.
[2] M. Buzdalov and A. Shalyto. A provably asymptotically fast version of the generalized Jensen algorithm for non-dominated sorting. In Parallel Problem Solving from Nature – PPSN XIII, number 8672 in Lecture Notes in Computer Science, pages 528–537. Springer, 2014.
[3] C. Coello Coello and G. Toscano Pulido. A micro-genetic algorithm for multiobjective optimization. In Proceedings of International Conference on Evolutionary Multi-Criterion Optimization, number 1993 in Lecture Notes in Computer Science, pages 126–140. 2001.
[4] D. W. Corne, N. R. Jerram, J. D. Knowles, and M. J. Oates. PESA-II: Region-based selection in evolutionary multiobjective optimization. In Proceedings of Genetic and Evolutionary Computation Conference, pages 283–290. Morgan Kaufmann Publishers, 2001.
[5] K. Deb and H. Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: Solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18(4):577–601, 2013.
[6] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
[7] M. Erickson, A. Mayer, and J. Horn. The niched Pareto genetic algorithm 2 applied to the design of groundwater remediation systems. In Proceedings of International Conference on Evolutionary Multi-Criterion Optimization, number 1993 in Lecture Notes in Computer Science, pages 681–695. 2001.
[8] H. Fang, Q. Wang, Y.-C. Tu, and M. F. Horstemeyer. An efficient non-dominated sorting method for evolutionary algorithms. Evolutionary Computation, 16(3):355–384, 2008.
