Constructing Clustering Transformations
Steffen Borgwardt, Charles Viss

TL;DR
This paper introduces methods for transforming one clustering into another using linear programming and network theory, providing a new metric for clustering differences and bounds on the related partition polytopes.
Contribution
It presents a novel approach for clustering transformation based on a clustering-difference graph and elementary moves, along with bounds on the circuit diameter of partition polytopes.
Findings
Developed a clustering-difference graph model for transformations.
Provided methods for decomposing transformations into elementary moves.
Established bounds on the circuit diameter of partition polytopes.
Affiliation (both authors): University of Colorado Denver
Abstract
Clustering is one of the fundamental tasks in data analytics and machine learning. In many situations, different clusterings of the same data set become relevant. For example, different algorithms for the same clustering task may return dramatically different solutions. We are interested in applications in which one clustering has to be transformed into another; e.g., when a gradual transition from an old solution to a new one is required.
In this paper, we devise methods for constructing such a transition based on linear programming and network theory. We use a so-called clustering-difference graph to model the desired transformation and provide methods for decomposing the graph into a sequence of elementary moves that accomplishes the transformation. These moves are equivalent to the edge directions, or circuits, of the underlying partition polytopes. Therefore, in addition to a conceptually new metric for measuring the distance between clusterings, we provide new bounds on the circuit diameter of these partition polytopes.
Keywords: partitioning, clustering, polyhedra, circuits, diameter, linear programming
MSC: 52B05, 90C05, 90C08, 90C27
1 Introduction and Preliminaries
Clusterings of large data sets play an important role in data analytics, machine learning, and informed decision-making in general. In many applications, there exists a desired clustering corresponding to an optimal solution to an optimization problem. However, directly implementing such a solution can be challenging – instead, a gradual sequence of transitions which transforms an initial, sub-optimal solution into the improved clustering is desired.
Consider the example in land consolidation from [3, 4, 5]. In a Bavarian agricultural region, 471 lots are cultivated by 7 different farmers. The initial distribution of the ownership of the lots, depicted in Figure 1(a), is quite problematic – the scattered, small lots result in large transportation overhead and prohibit the use of heavy machinery. To address this, the authors worked with the Bavarian State to facilitate a voluntary land exchange among the farmers in which the boundaries of the lots would remain the same while the cultivation rights for the lots would be redistributed.
The combinatorial redistribution of lots can be modeled as a clustering problem: a set of data points (the lots) must be partitioned into clusters (the farmers) under the restriction that each farmer keeps his original total value of land. This restriction makes it provably hard to determine an optimal redistribution of the lots, but it is possible to compute an approximation of the global optimum based on linear programming over projections of transportation polytopes [4]. A computed solution in which the lots form large, connected sections of land is depicted in Figure 1(b).
However, such a radical redistribution of lots among the farmers (more than 70% of the lots change ownership from Figure 1(a) to 1(b)) cannot realistically take place all at once. Crop rotations, required machinery, and other processes for farming stability must be respected. Therefore, the farmers requested a “best” way to gradually implement the proposed changes over the course of several years. In this paper, we propose methods for constructing such a transformation between clusterings based on linear programming and network theory.
The need for the construction of an efficient gradual transition to a new, given clustering also arises in many other applications. For example, an insurance company may want to gradually transition their customers to a new clustering of premium classes. In other situations, there are snapshots of the same data set at different times and the goal is to devise a model that explains how the data gradually changed over time.
In general, we consider partitions of a data set $X = \{x_1, \dots, x_n\}$ into $k$ labeled clusters $C_1, \dots, C_k$, where each item is assigned to exactly one of the clusters. We call such a partition $C = (C_1, \dots, C_k)$ a $k$-clustering of $X$, or simply a clustering of $X$ when $k$ is clear from context. As is the case in most clustering applications, we assume that the number of items is significantly greater than the number of clusters; i.e., $n \gg k$.
We additionally consider situations in which upper and lower bounds are given for the sizes of the clusters. Such bounds may arise directly from an application itself or may be introduced to guide clustering algorithms to return sufficiently balanced solutions. Specifically, let $\kappa^-, \kappa^+ \in \mathbb{Z}^k$ with $\kappa^- \le \kappa^+$ be given. A bounded-size $k$-clustering of $X$ with respect to $\kappa^-$ and $\kappa^+$ then satisfies $\kappa_i^- \le |C_i| \le \kappa_i^+$ for $i = 1, \dots, k$. This concept generalizes the fixed-size $k$-clusterings from [2], in which each cluster contains a fixed number of items (i.e., a bounded-size $k$-clustering with $\kappa^- = \kappa^+$). The classical transportation problem and the assignment problem, along with their related polytopes, are well-studied topics in optimization corresponding to fixed-size clusterings [1, 13, 21].
For a data set $X$ and cluster size bounds $\kappa^-, \kappa^+$, the bounded-size partition polytope $PP_{\kappa^\pm}$ models the set of all bounded-size $k$-clusterings of $X$ with respect to $\kappa^-$ and $\kappa^+$ [11]. Specifically, for $i = 1, \dots, k$ and $j = 1, \dots, n$, let $y_{ij}$ indicate whether or not cluster $i$ receives item $x_j$ in a $k$-clustering of $X$. Then $PP_{\kappa^\pm}$ is given by the following system of constraints:
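$$
\begin{aligned}
\sum_{i=1}^{k} y_{ij} &= 1, && j = 1, \dots, n,\\
\kappa_i^- \le \sum_{j=1}^{n} y_{ij} &\le \kappa_i^+, && i = 1, \dots, k,\\
y_{ij} &\ge 0, && i = 1, \dots, k,\ j = 1, \dots, n.
\end{aligned}
$$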
Since these constraints form a totally unimodular matrix, the right-hand side is integral, and each $y_{ij}$ is implicitly bounded between 0 and 1, $PP_{\kappa^\pm}$ is in fact a 0/1-polytope whose vertices correspond to the feasible $k$-clusterings of $X$. This polytope generalizes the fixed-size partition polytope from [2], which is the special case $\kappa^- = \kappa^+$. It is also an instance of the bounded-shape partition polytope from [7] when the items of $X$ are taken to be the standard basis vectors of $\mathbb{R}^n$.
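As a small illustration of this constraint system, the sketch below encodes a clustering as a list `labels`, where `labels[j]` is the (0-based) cluster of item $x_j$, builds the corresponding 0/1 matrix $y$, and tests the three constraint families. This is our own sketch; the helper names are hypothetical and not taken from [11].

```python
def assignment_matrix(labels, k):
    """Return y as a k x n list of lists: y[i][j] = 1 iff item j lies in cluster i."""
    n = len(labels)
    return [[1 if labels[j] == i else 0 for j in range(n)] for i in range(k)]


def in_bounded_size_polytope(y, kappa_minus, kappa_plus):
    """Check sum_i y_ij = 1 for every item j, kappa_i^- <= sum_j y_ij <= kappa_i^+
    for every cluster i, and y_ij >= 0 for all entries."""
    k, n = len(y), len(y[0])
    each_item_once = all(sum(y[i][j] for i in range(k)) == 1 for j in range(n))
    sizes = [sum(row) for row in y]
    sizes_ok = all(lo <= s <= hi for lo, s, hi in zip(kappa_minus, sizes, kappa_plus))
    nonnegative = all(entry >= 0 for row in y for entry in row)
    return each_item_once and sizes_ok and nonnegative


# Five items in three clusters with sizes (2, 1, 2).
y = assignment_matrix([0, 0, 1, 2, 2], k=3)
print(in_bounded_size_polytope(y, [1, 1, 1], [2, 2, 2]))  # True
print(in_bounded_size_polytope(y, [1, 1, 1], [2, 2, 1]))  # False: cluster 2 (0-based) is too large
```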
In the fixed-size partition polytope, the edges also have a combinatorial interpretation: two vertices share an edge if and only if the corresponding clusterings differ by a single cyclical move of items among the clusters, formally defined in Section 2. This fact can be used to prove new bounds on the combinatorial diameter of the polytope (the maximum length of a shortest edge walk between any pair of vertices) and to provide practical methods for constructing transformations between fixed-size clusterings [2]. In this paper, we generalize these methods to the bounded-size partition polytope $PP_{\kappa^\pm}$. Although the edges of this polytope have a more technical characterization, its circuits correspond to a set of natural cyclical and sequential moves of items among the clusters [11]. However, constructing transformations using these two types of moves is significantly more challenging than using only cyclical moves, because sequential moves change the sizes of the underlying clusters while cyclical moves do not.
Circuits, introduced as the elementary vectors of a subspace by Rockafellar [24], play a fundamental role in the theory of linear programming. For a general polyhedron $P = \{\mathbf{x} \in \mathbb{R}^n \colon A\mathbf{x} = \mathbf{b},\ B\mathbf{x} \le \mathbf{d}\}$, the set of circuits of $P$ consists of all $\mathbf{g} \in \ker(A) \setminus \{\mathbf{0}\}$, normalized to coprime integer components, for which $B\mathbf{g}$ is support-minimal over $\{B\mathbf{x} \colon \mathbf{x} \in \ker(A) \setminus \{\mathbf{0}\}\}$. Geometrically, circuits correspond to all potential edge directions of $P$ as the right-hand side vectors $\mathbf{b}$ and $\mathbf{d}$ vary. This implies that the set of circuits serves as a universal test set for any linear program over $P$ [15]. Hence, circuits are used as the step directions in several augmentation algorithms for solving linear programs [10, 12, 16, 17].
Additionally, for polyhedra such as $PP_{\kappa^\pm}$ that are defined by totally unimodular matrices, circuits have combinatorial interpretations in terms of the underlying problem. The support-minimality of circuits implies that, in a sense, steps taken in circuit directions are as simple as possible while maintaining feasibility. In combination with highly structured problems from combinatorial optimization, these steps become particularly intuitive and are guaranteed to visit only integral points [11].
Therefore, in this paper, we choose the circuits of $PP_{\kappa^\pm}$ as the elementary moves for transforming $k$-clusterings. In building such transformations using these circuits, we construct circuit walks between the corresponding vertices in the polytope. As a generalization of edge walks, circuit walks are of interest due to their relationship to the combinatorial diameter of polyhedra [6, 8, 9]. The circuit distance between two vertices is the minimum number of steps needed to join them by a circuit walk; since every edge walk is a circuit walk, it is a lower bound on the combinatorial distance between the vertices. The related circuit diameter of a polyhedron, the maximum circuit distance between any pair of its vertices, then serves as a lower bound on its combinatorial diameter. In particular, circuit diameters provide insight into the polynomial Hirsch Conjecture, one of the fundamental open questions in linear programming. See [19] for a survey of this field of study.
To build transformations between clusterings, we use one of the main tools from [2] for analyzing fixed-size clusterings. Given two $k$-clusterings $C$ and $C'$ of the same data set, the clustering-difference graph $D(C, C')$ is a directed graph that models the transfers of items required for transforming $C$ into $C'$. In the context of fixed-size clusterings, such a graph decomposes into directed cycles. Sets of cyclical moves can then be integrated together in order to bound the combinatorial distance between vertices in the fixed-size partition polytope [2].
We generalize this graph-theoretic approach for constructing transformations between clusterings to the context of general and bounded-size $k$-clusterings. In these situations, both cycles and paths in a clustering-difference graph correspond to circuits of the related polytopes. Integrating these different types of moves becomes much more technically challenging than integrating only cyclical moves, since sequential moves alter the sizes of the underlying clusters. In Section 2, we show how this is possible for certain combinations of cyclical and sequential moves, providing various double-moves which reduce the number of steps needed for transforming clusterings (Theorems 2.1 and 2.2). Next, we prove in Section 3 how these double-moves lead to an upper bound (Theorem 3.1) on the so-called transformation distance between $k$-clusterings, a relaxation of the circuit distance. In Section 4, we then prove the implications of this bound for the circuit diameter of the bounded-size partition polytope (Theorem 4.1). We end with a brief discussion of future directions of research in Section 5.
2 Moves and Double-Moves for Transforming Clusterings
Let $C$ and $C'$ be two $k$-clusterings of the same data set $X$. We recall from [2] the definition of the clustering-difference graph $D(C, C')$ from $C$ to $C'$, a graph-theoretic model for the difference between the two clusterings.
Definition 1 (Clustering-difference Graph)
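For two $k$-clusterings $C = (C_1, \dots, C_k)$ and $C' = (C'_1, \dots, C'_k)$ of a data set $X$, the clustering-difference graph from $C$ to $C'$ is a directed arc-labeled multigraph $D(C, C') = (V, E)$ with vertex set $V = \{1, \dots, k\}$ and edge set $E$, where an edge $(i, j)$ with label $x \in X$ belongs to $E$ if and only if $x \in C_i$ and $x \in C'_j$ for $i \ne j$.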
Thus, the edges of $D(C, C')$ describe all single-item transfers needed to transform $C$ into $C'$. The number of edges equals the number of items whose cluster assignment differs between $C$ and $C'$: if an edge $(i, j)$ with label $x$ belongs to $D(C, C')$, item $x$ must be moved from cluster $i$ to cluster $j$ as part of the transformation. Note that this allows for parallel edges in $D(C, C')$, but all edges have different labels. We refer to the transfers of a clustering-difference graph consisting of a single directed cycle as a cyclical move of items among the clusters. Similarly, the set of transfers from a directed path is referred to as a sequential move of items. See Figure 2 for examples of these two types of moves with corresponding clustering-difference graphs.
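The definition translates directly into code. In the following sketch (our own, with hypothetical function names), a clustering is a list whose $x$-th entry is the 0-based cluster of item $x$, and each edge of $D(C, C')$ is a triple `(i, j, x)`. The helper `move_type` tests whether a set of transfers forms a single directed cycle (a cyclical move) or a single directed path (a sequential move).

```python
from collections import Counter


def clustering_difference_graph(old, new):
    """Edges (i, j, x) of D(C, C'): item x lies in cluster i under `old`
    and must move to cluster j under `new`."""
    return [(i, j, x) for x, (i, j) in enumerate(zip(old, new)) if i != j]


def move_type(edges):
    """Return 'cyclical' or 'sequential' if the labeled edges form a single
    directed cycle or a single directed path, and None otherwise."""
    if not edges:
        return None
    out_deg = Counter(i for i, _, _ in edges)
    in_deg = Counter(j for _, j, _ in edges)
    vertices = set(out_deg) | set(in_deg)
    if any(out_deg[v] > 1 or in_deg[v] > 1 for v in vertices):
        return None
    succ = {i: j for i, j, _ in edges}
    starts = [v for v in vertices if in_deg[v] == 0]
    if len(starts) > 1:
        return None
    start = starts[0] if starts else edges[0][0]
    seen, v = {start}, start
    while v in succ and succ[v] not in seen:
        v = succ[v]
        seen.add(v)
    if seen != vertices:
        return None  # more than one component
    return "sequential" if starts else "cyclical"


def apply_move(labels, move):
    """Apply the transfers (src, dst, x) of one move to a copy of `labels`."""
    result = list(labels)
    for src, dst, x in move:
        assert result[x] == src, f"item {x} is not in cluster {src}"
        result[x] = dst
    return result


old, new = [0, 0, 1, 2], [1, 0, 2, 0]
edges = clustering_difference_graph(old, new)
print(edges)                          # [(0, 1, 0), (1, 2, 2), (2, 0, 3)]
print(move_type(edges))               # cyclical
print(apply_move(old, edges) == new)  # True
```

In this example the clustering-difference graph is the directed cycle through clusters 0, 1, 2, so a single cyclical move transforms one clustering into the other.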
The clustering-difference graph plays an important role in the analysis of bounded-size and fixed-size partition polytopes. For the fixed-size partition polytope, a clustering-difference graph decomposes into directed cycles, since every cluster keeps its size and hence every vertex of the graph has equal indegree and outdegree. Two vertices of the polytope then share an edge if and only if the corresponding clustering-difference graph consists of a single directed cycle. This characterization has been used to construct edge walks between vertices of the polytope using sequences of cyclical moves, resulting in upper bounds on the combinatorial diameter [2].
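The degree argument can be checked on small examples: for every vertex, indegree minus outdegree in the clustering-difference graph equals the change in size of the corresponding cluster, so clusterings with equal cluster sizes yield a graph in which every vertex is balanced. A minimal sketch, assuming clusterings are lists of 0-based cluster indices and edges are triples `(i, j, x)` (the helper names are hypothetical):

```python
from collections import Counter


def clustering_difference_graph(old, new):
    """Edges (i, j, x): item x moves from cluster i (in old) to cluster j (in new)."""
    return [(i, j, x) for x, (i, j) in enumerate(zip(old, new)) if i != j]


def degree_imbalance(edges, k):
    """Indegree minus outdegree of every vertex of the clustering-difference graph."""
    out_deg = Counter(i for i, _, _ in edges)
    in_deg = Counter(j for _, j, _ in edges)
    return [in_deg[v] - out_deg[v] for v in range(k)]


def size_difference(old, new, k):
    """|C'_i| - |C_i| for every cluster i."""
    return [new.count(i) - old.count(i) for i in range(k)]


old, new = [0, 1, 1], [0, 0, 2]
edges = clustering_difference_graph(old, new)
print(degree_imbalance(edges, 3), size_difference(old, new, 3))  # [1, -2, 1] [1, -2, 1]

# Equal cluster sizes (a fixed-size situation): every vertex is balanced.
print(degree_imbalance(clustering_difference_graph([0, 1, 2], [1, 2, 0]), 3))  # [0, 0, 0]
```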
Although the edges of the bounded-size partition polytope have a more complicated characterization, its circuits analogously correspond to clustering-difference graphs consisting of either a single directed cycle or a single directed path [11]. Hence, devising sequences of cyclical and sequential moves for transforming $C$ into $C'$ can be interpreted as constructing a circuit walk between the corresponding vertices of the polytope. As we will show in Sections 3 and 4, this allows us to bound the circuit distance between vertices and the related circuit diameter of the bounded-size partition polytope.
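To make the correspondence with the circuit definition concrete, a move can be turned into the difference $\mathbf{g}$ of the assignment matrices $y$ after and before the move. The sketch below is our own illustration (the helper names are hypothetical and indices are 0-based). It confirms that such a $\mathbf{g}$ lies in the kernel of the equality constraints $\sum_i y_{ij} = 1$ and reports its row sums, i.e., the change in each cluster size: zero everywhere for a cyclical move, and $-1$ and $+1$ at the two ends of a sequential move. It does not test support-minimality.

```python
def direction(move, k, n):
    """Return g = y(after) - y(before) as a k x n matrix for a move given as
    transfers (src, dst, x): item x leaves cluster src and enters cluster dst."""
    g = [[0] * n for _ in range(k)]
    for src, dst, x in move:
        g[src][x] -= 1
        g[dst][x] += 1
    return g


def in_kernel_of_assignment_constraints(g):
    """True iff every item column of g sums to 0, i.e. sum_i g_ij = 0 for all j."""
    return all(sum(column) == 0 for column in zip(*g))


def cluster_size_change(g):
    """Row sums of g: the change in the size of each cluster caused by the move."""
    return [sum(row) for row in g]


cyclical = [(0, 1, 0), (1, 2, 1), (2, 0, 2)]  # clusters 0 -> 1 -> 2 -> 0
sequential = [(0, 1, 0), (1, 2, 1)]           # clusters 0 -> 1 -> 2
print(cluster_size_change(direction(cyclical, 3, 3)))    # [0, 0, 0]
print(cluster_size_change(direction(sequential, 3, 2)))  # [-1, 0, 1]
```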
Therefore, we are interested in constructing a transformation from $C$ to $C'$ using as few of these cyclical and sequential moves as possible. We call the minimum number of required moves the transformation distance from $C$ to $C'$.
Definition 2 (Transformation Distance)
For $k$-clusterings $C$ and $C'$ of the same data set, the transformation distance $\tau(C, C')$ is the minimum number of cyclical and sequential moves needed to transform $C$ into $C'$.
Hence, $\tau(C, C')$ is a relaxation of the circuit distance between the corresponding vertices of the bounded-size partition polytope: it never exceeds the circuit distance, and the two distances are equal as long as no cluster size constraints are violated during a sequence of moves that achieves $\tau(C, C')$.
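The gap between the two distances comes from intermediate clusterings: a sequence of moves only yields a circuit walk in the bounded-size partition polytope if every clustering along the way respects the size bounds. The hypothetical helper below (our sketch, not from the paper) checks exactly this for a given sequence of moves, each a list of transfers `(src, dst, x)`.

```python
def walk_respects_bounds(labels, moves, kappa_minus, kappa_plus):
    """Apply `moves` in order and report whether the start clustering and every
    intermediate clustering satisfy kappa_i^- <= |C_i| <= kappa_i^+."""
    k = len(kappa_minus)

    def feasible(current):
        sizes = [current.count(i) for i in range(k)]
        return all(lo <= s <= hi for lo, s, hi in zip(kappa_minus, sizes, kappa_plus))

    current = list(labels)
    if not feasible(current):
        return False
    for move in moves:
        for src, dst, x in move:
            assert current[x] == src, f"item {x} is not in cluster {src}"
            current[x] = dst
        if not feasible(current):
            return False
    return True


# Swap two items between clusters 0 and 1 while both sizes are fixed to 1.
one_cyclical = [[(0, 1, 0), (1, 0, 1)]]
two_sequential = [[(0, 1, 0)], [(1, 0, 1)]]
print(walk_respects_bounds([0, 1], one_cyclical, [1, 1], [1, 1]))    # True
print(walk_respects_bounds([0, 1], two_sequential, [1, 1], [1, 1]))  # False
```

Both sequences reach the same clustering, but the second one passes through a clustering in which cluster 1 holds two items, which leaves the fixed-size polytope.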
A naive approach for bounding $\tau(C, C')$ is to decompose $D(C, C')$ into paths and cycles and then simply apply the corresponding moves individually to perform the clustering transformation. However, even when such a decomposition uses the fewest possible parts, $\tau(C, C')$ can be significantly smaller than the number of parts. For example, Figure 3(a) depicts a case in which $D(C, C')$ consists of four vertex-disjoint cycles, which trivially implies $\tau(C, C') \le 4$. Nevertheless, due to the fact that any permutation can be expressed as the product of two cyclic permutations [1], only two cyclical moves are needed to transform $C$ into $C'$, implying $\tau(C, C') \le 2$. For simplicity, we will use the term disjoint in place of vertex-disjoint throughout the remainder of the paper. Whenever two components of a clustering-difference graph are not vertex-disjoint, we will say that they intersect.
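The naive approach needs some decomposition of $D(C, C')$ into paths and cycles. One way to obtain one is an Eulerian-style walk: start a walk at each vertex with more outgoing than incoming edges, cut out a cycle whenever the walk revisits a vertex, and finally split the balanced remainder into cycles. This is our own sketch (the paper does not prescribe an algorithm, and the function name is hypothetical). Since each part can be applied as its own move, the number of parts is an upper bound on $\tau(C, C')$, and as Figure 3(a) shows, it can be far from tight.

```python
from collections import Counter, defaultdict


def decompose(edges):
    """Split the labeled edges (i, j, x) of a clustering-difference graph into
    simple directed paths and simple directed cycles (greedy, not optimal)."""
    remaining = defaultdict(list)  # vertex -> unused outgoing edges
    for e in edges:
        remaining[e[0]].append(e)
    surplus = Counter()  # outdegree - indegree
    for i, j, _ in edges:
        surplus[i] += 1
        surplus[j] -= 1
    paths, cycles = [], []

    def walk(start):
        """Follow unused edges from `start`; cut out a cycle whenever a vertex
        repeats, and return the simple path that is left over."""
        trail, pos, v = [], {start: 0}, start  # pos[u] = trail length when u was entered
        while remaining[v]:
            e = remaining[v].pop()
            trail.append(e)
            v = e[1]
            if v in pos:  # closed a cycle v -> ... -> v
                cut = pos[v]
                cycles.append(trail[cut:])
                trail = trail[:cut]
                pos = {u: p for u, p in pos.items() if p <= cut}
            else:
                pos[v] = len(trail)
        return trail

    # Every path starts at a vertex whose outdegree exceeds its indegree.
    for v in list(surplus):
        while surplus[v] > 0:
            path = walk(v)
            paths.append(path)
            surplus[v] -= 1
            surplus[path[-1][1]] += 1
    # The rest is balanced, so walks from it only produce cycles.
    for v in list(remaining):
        while remaining[v]:
            walk(v)
    return paths, cycles


paths, cycles = decompose([(0, 1, 0), (1, 2, 1), (2, 1, 2), (1, 3, 3)])
print(paths)   # [[(0, 1, 0), (1, 3, 3)]]
print(cycles)  # [[(1, 2, 1), (2, 1, 2)]]
```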
Proposition 1 ([1], Lemma 7 in [2])
Let $C, C'$ be $k$-clusterings for which $D(C, C')$ decomposes into disjoint cycles. Then $\tau(C, C') \le 2$.
In [2], Proposition 1 is used to integrate disjoint cyclical moves into what we will call a double-move: a sequence of two moves which results in the desired changes to the underlying clustering-difference graph. See Figure 3 for a visualization of this double-move. In a first cyclical move of items, depicted in Figure 3(b), the transfers corresponding to all but one edge from each cycle are correctly applied – the items corresponding to the remaining edges are temporarily sent to incorrect destinations across the cycles. However, a second cyclical move, depicted in Figure 3(c), can then be used to send each of these misplaced items to its correct destination, completing the clustering transformation.
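The double-move of Proposition 1 is easy to construct explicitly. In the sketch below (our own illustration with hypothetical names, not code from [2]), each cycle is a list of labeled edges $(u, v, x)$ in cyclic order, and the last edge of each cycle is taken as its distinguished edge $e_i = (u_i, v_i)$; by symmetry, any other choice of edge works as well. The first cyclical move sends the item of $e_i$ to $v_{i+1}$ (indices taken cyclically), and the second one forwards it from $v_{i+1}$ to $v_i$.

```python
def double_move_for_disjoint_cycles(cycles):
    """Two cyclical moves that together apply every edge of >= 2 disjoint cycles,
    each given as a list of labeled edges (u, v, x) in cyclic order."""
    assert len(cycles) >= 2, "a single cycle is already one cyclical move"
    c = len(cycles)
    chosen = [cycle[-1] for cycle in cycles]  # e_i = (u_i, v_i) with item x_i
    first, second = [], []
    for i, cycle in enumerate(cycles):
        u_i, v_i, x_i = chosen[i]
        v_next = chosen[(i + 1) % c][1]
        first.extend(cycle[:-1])           # Z_i minus e_i: v_i -> ... -> u_i
        first.append((u_i, v_next, x_i))   # x_i is temporarily sent to v_{i+1}
        second.append((v_next, v_i, x_i))  # v_{i+1} forwards x_i to v_i
    return first, second


def is_single_cycle(move):
    """True iff the transfers form one directed cycle through all their edges."""
    succ = {}
    for u, v, _ in move:
        if u in succ:
            return False
        succ[u] = v
    start = move[0][0]
    v, steps = succ[start], 1
    while v != start and v in succ and steps <= len(move):
        v, steps = succ[v], steps + 1
    return v == start and steps == len(move)


def apply_move(labels, move):
    """Apply transfers (src, dst, x) to a copy of `labels`."""
    result = list(labels)
    for src, dst, x in move:
        assert result[x] == src, f"item {x} is not in cluster {src}"
        result[x] = dst
    return result


old, new = [0, 1, 2, 3, 4], [1, 0, 3, 4, 2]
cycles = [[(0, 1, 0), (1, 0, 1)], [(2, 3, 2), (3, 4, 3), (4, 2, 4)]]
first, second = double_move_for_disjoint_cycles(cycles)
print(is_single_cycle(first), is_single_cycle(second))    # True True
print(apply_move(apply_move(old, first), second) == new)  # True
```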
In the following lemmas and the upcoming Theorems 2.1 and 2.2, we show how similar double-moves can be used to integrate both cyclical and sequential moves when transforming general $k$-clusterings. This becomes more technically challenging due to the different natures of the two types of moves: cyclical moves do not alter the size of any underlying cluster, while sequential moves alter the sizes of exactly two clusters by one. First, generalizing Proposition 1, we show how to integrate sets of disjoint cycles and paths from a clustering-difference graph. Note that the bound on $\tau(C, C')$ in the following lemma depends only on the number of paths in the decomposition and not on the number of cycles.
Lemma 1
Let $C, C'$ be $k$-clusterings for which $D(C, C')$ decomposes into disjoint cycles and paths. Then $\tau(C, C') \le \max\{2, p\}$, where $p$ is the number of paths in the decomposition.
Proof
Let $p$ denote the number of directed paths and $c$ the number of directed cycles in the decomposition of $D(C, C')$. When $p = 0$, the result follows from Proposition 1, so assume $p \ge 1$. We may also assume $c \ge 1$, else the $p$ sequential moves could simply be applied individually.
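First suppose $p = 2$. Let $P_1$ and $P_2$ denote the paths of $D(C, C')$ and let $Z_1, \dots, Z_c$ denote the cycles. For $i = 1, \dots, c$, select any edge $e_i = (u_i, v_i)$ from cycle $Z_i$. In addition, let $s_1$ denote the tail (first vertex) of $P_1$, $s_2$ the tail of $P_2$, and $w_1$ the neighbor of $s_1$ on $P_1$. See Figure 4(a). We will apply a double-move consisting of two sequential moves to transform clustering $C$ into $C'$.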
For the first sequential move, first introduce an edge from $s_1$ to $v_1$, sending to $v_1$ the item from $s_1$ intended for $w_1$. After traveling along this edge, follow $Z_1$ from $v_1$ to $u_1$. Next, if $c > 1$, introduce an edge from $u_1$ to $v_2$, whose item is the one associated with $e_1$, and then travel along $Z_2$ to $u_2$. Repeat for $i = 3, \dots, c$ until $u_c$ is reached. Complete the move by introducing an edge from $u_c$ to $s_2$, whose item is the one associated with $e_c$, and then traveling along path $P_2$. See the blue edges in Figure 4(b) for a visualization of this move.
Note that this first sequential move applies all transfers given by the cycles $Z_1, \dots, Z_c$ with the exception of those corresponding to the edges $e_1, \dots, e_c$: the item from $u_i$ intended for $v_i$ is temporarily sent to the wrong cluster. The move also applies all transfers given by $P_2$ but does not correctly apply any transfer from $P_1$. However, all misplaced items can be corrected and all remaining transfers can be applied via a single, second sequential move, as seen in Figure 4(c). First, send from $s_2$ to $v_c$ the item received from $u_c$. Next, for $i = c, c-1, \dots, 2$, send from $v_i$ to $v_{i-1}$ the item received from $u_{i-1}$. Once $v_1$ is reached, send from $v_1$ to $w_1$ the item received from $s_1$. Finish the move by following the remaining edges of $P_1$. After this second sequential move is applied, the transformation from $C$ to $C'$ is complete; it follows that $\tau(C, C') \le 2$. See Figure 4(d) for a visualization of these moves integrated together into a double-move.
Next suppose $p > 2$. We can apply the double-move from the previous case to remove all cycles and any two paths from $D(C, C')$. The remaining $p - 2$ paths can then be applied via individual sequential moves. Hence, $\tau(C, C') \le 2 + (p - 2) = p$.
Lastly, suppose $p = 1$. Again let $P$ denote the single path of $D(C, C')$, let $s$ denote the tail of $P$, let $w$ denote the neighbor of $s$ on $P$, and choose an edge $e_i = (u_i, v_i)$ from $Z_i$ for $i = 1, \dots, c$. We will apply a sequential move followed by a cyclical move to transform $C$ into $C'$. See Figure 5 for a visualization of the double-move.
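For the first move, as in the previous case, first introduce an edge from $s$ to $v_1$ (carrying the item from $s$ intended for $w$) and then travel along $Z_1$ from $v_1$ to $u_1$. If $c > 1$, then for $i = 2, \dots, c$, introduce an edge from $u_{i-1}$ to $v_i$ and follow $Z_i$ to $u_i$, until $u_c$ is reached. Complete the sequential move by introducing an edge from $u_c$ to $w$ and then following the remaining edges of $P$.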
The second, cyclical move corrects all items sent to incorrect destinations by the first move. Namely, first follow the edge from $w$ to $v_c$, sending to $v_c$ the item received from $u_c$. Next, for $i = c, c-1, \dots, 2$, follow the edges from $v_i$ to $v_{i-1}$ (each carrying the item received from $u_{i-1}$) until $v_1$ is reached. Complete the move by following the edge from $v_1$ to $w$, sending the item received from $s$. This completes the transformation from $C$ to $C'$, so $\tau(C, C') \le 2$.
Therefore, in each case, $\tau(C, C') \le \max\{2, p\}$.
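The $p = 2$ case of this proof can be traced mechanically. The sketch below (our own; the function names are hypothetical) builds the two sequential moves as described, assuming $c \ge 1$, vertex-disjoint components, paths given as lists of labeled edges $(u, v, x)$ in order, and cycles given as such lists in cyclic order with the last edge serving as $e_i$. The usage example checks on one small instance that both moves are single directed paths and that together they turn $C$ into $C'$.

```python
def lemma1_two_paths(path1, path2, cycles):
    """Two sequential moves applying all edges of disjoint paths P_1, P_2 and
    cycles Z_1, ..., Z_c (c >= 1), following the p = 2 case of Lemma 1."""
    assert cycles, "with no cycles, apply the two sequential moves individually"
    s1, w1, x_s1 = path1[0]  # first edge of P_1
    s2 = path2[0][0]         # tail of P_2
    chosen = [cycle[-1] for cycle in cycles]  # e_i = (u_i, v_i) with item x_i
    c = len(cycles)

    first = [(s1, chosen[0][1], x_s1)]  # s_1 sends w_1's item to v_1
    for i, cycle in enumerate(cycles):
        first.extend(cycle[:-1])        # v_i -> ... -> u_i
        u_i, _, x_i = chosen[i]
        target = chosen[i + 1][1] if i + 1 < c else s2
        first.append((u_i, target, x_i))  # x_i goes to v_{i+1}, or to s_2 if i = c
    first.extend(path2)                 # all transfers of P_2

    second = [(s2, chosen[-1][1], chosen[-1][2])]  # s_2 forwards x_c to v_c
    for i in range(c - 1, 0, -1):                  # v_i forwards x_{i-1} to v_{i-1}
        second.append((chosen[i][1], chosen[i - 1][1], chosen[i - 1][2]))
    second.append((chosen[0][1], w1, x_s1))        # v_1 forwards s_1's item to w_1
    second.extend(path1[1:])                       # remaining transfers of P_1
    return first, second


def is_single_path(move):
    """True iff the transfers form one directed path with no repeated vertex."""
    succ = {u: v for u, v, _ in move}
    heads = {v for _, v, _ in move}
    if len(succ) != len(move) or len(heads) != len(move):
        return False
    starts = [u for u in succ if u not in heads]
    if len(starts) != 1:
        return False
    v, steps = starts[0], 0
    while v in succ:
        v, steps = succ[v], steps + 1
    return steps == len(move)


def apply_move(labels, move):
    """Apply transfers (src, dst, x) to a copy of `labels`."""
    result = list(labels)
    for src, dst, x in move:
        assert result[x] == src, f"item {x} is not in cluster {src}"
        result[x] = dst
    return result


old = [0, 1, 3, 5, 6, 7, 8, 9]
new = [1, 2, 4, 6, 5, 8, 9, 7]
P1, P2 = [(0, 1, 0), (1, 2, 1)], [(3, 4, 2)]
cycles = [[(5, 6, 3), (6, 5, 4)], [(7, 8, 5), (8, 9, 6), (9, 7, 7)]]
first, second = lemma1_two_paths(P1, P2, cycles)
print(is_single_path(first), is_single_path(second))      # True True
print(apply_move(apply_move(old, first), second) == new)  # True
```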
Even when paths and cycles are not completely disjoint, the corresponding moves may still be integrated together into a double-move. In the following lemma, we show that if a path intersects at most one cycle out of a collection of disjoint cycles, only two moves are required to apply all corresponding transfers.
Lemma 2
Let $C, C'$ be $k$-clusterings for which $D(C, C')$ consists of disjoint cycles and a single path $P$, which intersects at most one of the cycles. Then $\tau(C, C') \le 2$.
Proof
Let $Z_1, \dots, Z_c$ denote the cycles of $D(C, C')$. If $P$ does not intersect any of the cycles, we can apply Lemma 1, so assume $P$ intersects $Z_1$. We may also assume $c \ge 2$, else the sequential and cyclical moves corresponding to $P$ and $Z_1$ could simply be applied individually. Let $q_0, q_1, \dots, q_m$ denote (in order) the vertices of $P$, and select an edge $e_i = (u_i, v_i)$ from each cycle $Z_i$. Assume $e_1$ is chosen such that $v_1 = q_j$ is the first vertex of $P$ that also belongs to $Z_1$. (Note that it is possible to have $j = 0$ or $j = m$, but both of those cases are still covered by the following construction.) We will apply two sequential moves to transform $C$ into $C'$. See Figure 6 for a visualization of $D(C, C')$ along with the double-move.
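For the first sequential move, start at $q_0$ and follow $P$ to $q_j = v_1$. Then follow the path formed by joining $Z_1 - e_1, \dots, Z_c - e_c$ with the introduced edges $(u_1, v_2), \dots, (u_{c-1}, v_c)$, where the edge $(u_i, v_{i+1})$ carries the item associated with $e_i$, terminating the move at $u_c$. Hence, the move reduces the size of the cluster corresponding to $q_0$, as desired, but it also increases the size of the cluster corresponding to $u_c$.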
To correct this, apply a second sequential move starting at $u_c$ and terminating at $q_m$. First follow the edges $(u_c, v_c), (v_c, v_{c-1}), \dots, (v_2, v_1)$ to correct items misplaced across cycles. Next, since $v_1 = q_j$, follow along the vertices $q_j, q_{j+1}, \dots, q_m$ to complete the move. Since $P$ only intersects $Z_1$, no vertices are repeated in this second sequential move, and it indeed corresponds to a single directed path in the clustering-difference graph. All transfers corresponding to the original edges of $D(C, C')$ have then been correctly applied.
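The construction in this proof can also be traced in code. The sketch below (our own; the function names are hypothetical) assumes that the cycle met by $P$ is listed first and that all other components are disjoint, rotates that cycle so that its last edge $e_1$ ends at the first vertex $q_j$ of $P$ on it, and returns the two sequential moves. The usage example also shows that the cluster at which the first move terminates temporarily holds one extra item.

```python
def apply_move(labels, move):
    """Apply transfers (src, dst, x) to a copy of `labels`."""
    result = list(labels)
    for src, dst, x in move:
        assert result[x] == src, f"item {x} is not in cluster {src}"
        result[x] = dst
    return result


def is_single_path(move):
    """True iff the transfers form one directed path with no repeated vertex."""
    succ = {u: v for u, v, _ in move}
    heads = {v for _, v, _ in move}
    if len(succ) != len(move) or len(heads) != len(move):
        return False
    starts = [u for u in succ if u not in heads]
    if len(starts) != 1:
        return False
    v, steps = starts[0], 0
    while v in succ:
        v, steps = succ[v], steps + 1
    return steps == len(move)


def lemma2_double_move(path, cycles):
    """Two sequential moves for Lemma 2: `path` (edges in order) meets only
    cycles[0]; the remaining cycles are disjoint from everything (c >= 2)."""
    assert len(cycles) >= 2
    path_vertices = [path[0][0]] + [v for _, v, _ in path]  # q_0, ..., q_m
    z1_vertices = {u for u, _, _ in cycles[0]}
    j = next(i for i, q in enumerate(path_vertices) if q in z1_vertices)
    # Rotate Z_1 so that its last edge e_1 = (u_1, v_1) has head v_1 = q_j.
    r = next(i for i, (_, v, _) in enumerate(cycles[0]) if v == path_vertices[j])
    cycles = [cycles[0][r + 1:] + cycles[0][:r + 1]] + cycles[1:]
    chosen = [cycle[-1] for cycle in cycles]
    c = len(cycles)

    first = list(path[:j])                       # q_0 -> ... -> q_j = v_1
    for i, cycle in enumerate(cycles):
        first.extend(cycle[:-1])                 # v_i -> ... -> u_i
        if i + 1 < c:                            # u_i sends x_i to v_{i+1}
            first.append((chosen[i][0], chosen[i + 1][1], chosen[i][2]))
    second = [chosen[-1]]                        # e_c = (u_c, v_c)
    for i in range(c - 1, 0, -1):                # v_i forwards x_{i-1} to v_{i-1}
        second.append((chosen[i][1], chosen[i - 1][1], chosen[i - 1][2]))
    second.extend(path[j:])                      # q_j -> ... -> q_m
    return first, second


path = [(0, 1, 0), (1, 2, 1), (2, 3, 2)]          # vertices 0, 1, 2, 3
cycles = [[(1, 5, 3), (5, 4, 4), (4, 1, 5)],      # meets the path only at vertex 1
          [(6, 7, 6), (7, 6, 7)]]
old = [0, 1, 2, 1, 5, 4, 6, 7]
new = [1, 2, 3, 5, 4, 1, 7, 6]
first, second = lemma2_double_move(path, cycles)
middle = apply_move(old, first)
print(is_single_path(first), is_single_path(second))  # True True
print(apply_move(middle, second) == new)              # True
print(old.count(7), middle.count(7))                  # 1 2
```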
Note that in the double-move used for Lemma 2, the first sequential move temporarily increases the size of the cluster corresponding to $u_c$. For general $k$-clusterings this is not an issue, but for bounded-size $k$-clusterings this could violate the upper bound on the size of that cluster. Unfortunately, this increase in cluster size is unavoidable for certain configurations of the clustering-difference graph. Consider the example in Figure 7. If only two moves are to be used to transform $C$ into $C'$, one vertex of the graph in Figure 7 must correctly send an item to a second vertex in each move. Hence, the second vertex may not temporarily receive an item from either of two other vertices in the first move, and the cluster size corresponding to one of these two vertices must be temporarily increased. Therefore, although $\tau(C, C') \le 2$, the circuit distance between the corresponding vertices in a related bounded-size partition polytope may be 3 if the cluster sizes corresponding to these two vertices are already at their upper bounds. We further address this issue in Section 4.
When a path intersects multiple cycles from a set of disjoint cycles, integrating the corresponding moves as in the double-move from Lemma 2 becomes more challenging, since we can no longer guarantee that the second sequential move corresponds to a single directed path in the underlying clustering-difference graph. However, we can ensure that transfers corresponding to at least the first and last edges of the path are applied in conjunction with the cycles. We show in the upcoming theorem that, given any path $P$ and a set $\mathcal{Z}$ of disjoint cycles, a double-move can be used to correctly apply all transfers from $\mathcal{Z}$ while decreasing the cluster size corresponding to the first vertex of $P$ and increasing the cluster size corresponding to the last vertex of $P$. Furthermore, such a double-move will then allow us to completely integrate $P$ with $\mathcal{Z}$ as long as $P$ does not intersect the cycles of $\mathcal{Z}$ more than three times.
Recall that in a clustering-difference graph, the outdegree of a vertex is equal to the number of items which must be moved out of the corresponding cluster to perform the clustering transformation. Note that in order for the outdegree to be reduced, a correct item must be sent from the cluster to a new destination; however, this destination cluster need not actually be the other endpoint of the corresponding edge. On the other hand, the indegree of a vertex gives the number of items which must be moved into the corresponding cluster. For the indegree to be reduced, a correct item must be received by the vertex, but it does not matter which cluster actually sends the item. We call the minimum of the indegree and outdegree of a vertex the shared degree of that vertex in the clustering-difference graph. Hence, applying a set of disjoint cyclical moves reduces the shared degree of all covered vertices by one. When integrating a path with these cyclical moves, this reduction in shared degree should still occur in order to make all desired improvements to the underlying clustering-difference graph. In the following theorem, we provide four different double-moves to accomplish this task. The type of double-move to use depends on the intersection points of the path with the cycles.
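Shared degrees are straightforward to compute from the edge list. The sketch below (our own; the names are hypothetical, clusters are 0-based) illustrates the claim about cyclical moves on a small example: the graph consists of a 2-cycle between clusters 0 and 1 and a path from cluster 1 to cluster 2, and applying the cyclical move lowers the shared degree of both vertices on the cycle by one.

```python
from collections import Counter


def clustering_difference_graph(old, new):
    """Edges (i, j, x): item x moves from cluster i (in old) to cluster j (in new)."""
    return [(i, j, x) for x, (i, j) in enumerate(zip(old, new)) if i != j]


def shared_degrees(edges, k):
    """min(indegree, outdegree) of every vertex of the clustering-difference graph."""
    out_deg = Counter(i for i, _, _ in edges)
    in_deg = Counter(j for _, j, _ in edges)
    return [min(in_deg[v], out_deg[v]) for v in range(k)]


old, new = [0, 1, 1], [1, 0, 2]
print(shared_degrees(clustering_difference_graph(old, new), 3))          # [1, 1, 0]
after_cycle = [1, 0, 1]  # items 0 and 1 swapped by the cyclical move
print(shared_degrees(clustering_difference_graph(after_cycle, new), 3))  # [0, 0, 0]
```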
Theorem 2.1
Let $C, C'$ be $k$-clusterings with clustering-difference graph $D(C, C')$, let $\mathcal{Z}$ be a set of disjoint cycles in $D(C, C')$, and let $P$ be a path in $D(C, C')$ that is edge-disjoint from $\mathcal{Z}$. There exists a double-move which accomplishes all of the following:
1. It correctly applies all transfers from $\mathcal{Z}$.
2. It reduces the cluster size corresponding to the first vertex of $P$ by sending a correct item.
3. It increases the cluster size corresponding to the last vertex of $P$ by receiving a correct item.
4. It decreases the shared degree of each vertex covered by $\mathcal{Z}$ by at least one.
Proof
If or if is disjoint from , we can apply Lemma 2, so assume . Let denote the first vertex of covered by , and let denote, respectively, the second-to-last and last vertex of covered by . Note that we have when intersects only twice. Similarly, if intersects only once, we let . We treat four exhaustive cases regarding the distribution of , , and across the cycles of .
Case 1: and belong to different cycles of and belongs to the same cycle as . We apply a cyclical move followed by a sequential move to perform all necessary transfers in . See Figure 8 for a visualization of the double-move.
Without loss of generality, assume and . Choose an edge from each cycle . However, choose and so that and . For the cyclical move, first introduce an edge from to whose item corresponds to the item sent from in . (Note that we might have , in which case the edge already exists in .) Next, follow from to . Introduce an edge from to whose item corresponds to that of , and then travel along until is reached. Repeat for the remaining cycles by introducing edges and traveling along until is reached to complete the cyclical move.
Next, start a sequential move at , following to . Use the edge . Then follow the edges for until is reached to correct items misplaced across the cycles. Complete the sequential move by following from to and then to .
We now prove that the desired changes have been made to the underlying . Clearly the cluster size corresponding to is decreased and the cluster size corresponding to is increased via the second sequential move. Furthermore, each item sent by or received by is correct. As in the double-move from Figure 3, all edges from are applied through the combination of the two moves. Thus, it suffices to show that the shared degree of each vertex covered by has been reduced by at least one. The only interesting cases are , , and . In the first cyclical move, both receives and sends a correct item, reducing its shared degree by one. In the following sequential move, either only sends a correct item (if ) or both sends and receives a correct item, so the net reduction in shared degree of is at least one. Similarly, sends and receives a correct item in the first move and then either receives a correct item or both receives and sends a correct item in the second move.
Finally, consider . The vertex receives a possibly incorrect item from but also sends a correct item to its neighbor on via the first cyclical move, leaving its shared degree, at worst, unchanged. In the second sequential move, receives a correct item originating from and then sends a correct item to the following vertex on . Thus, the shared degree of is also reduced by at least one, as desired.
Case 2: and belong to different cycles of and does not belong to the same cycle as . ( may or may not belong to the same cycle as , or we may have .) We apply a sequential move followed by a cyclical move to perform the necessary transfers. See Figure 9 for a visualization of the double-move.
Assume and . Choose an edge from each such that , , and for . First, travel along from to . Then for , travel along and follow an introduced edge . Finish the sequential move by following from to and then following from to .
Start constructing the following cyclical move at using the edge . Then for , follow the edges until is reached to correct items misplaced across the cycles. Now introduce (if needed) an edge from to whose item corresponds to the item sent from in . Follow this edge to , and then complete the cyclical move by following from to . Note that this is indeed a single cyclical move since for .
The first sequential move alters the cluster sizes corresponding to and through correct transfers as desired, and again the edges of are all applied through the combination of the two moves. It suffices to show that the double-move reduces the shared degree of all vertices covered by . However, this again follows from the argument used in the previous case: although may receive an incorrect item from , it sends away two correct items and its shared degree is reduced by at least one.
Case 3: and belong to the same cycle of but belongs to a different cycle. Assume and . We apply a cyclical move followed by a sequential move to perform the necessary transfers. See Figure 10 for a visualization of the double-move.
Choose an edge from each cycle such that and . For the cyclical move, first travel along from to . Then for , introduce and follow an edge and follow to . Once is reached, follow from to to complete the cyclical move.
Start the sequential move by following from to . Next, introduce (if needed) an edge from to whose item corresponds to the item sent from in . Follow this edge and then follow the edge . Next, for , follow the edge to correct items misplaced among the cycles. Once is reached, finish the sequential move by following from to . All clusters are then changed as desired using the arguments from Case 1.
Case 4: , , and belong to the same cycle in (allowing for either or ). We apply two sequential moves similar to those of Lemma 2 to perform all necessary transfers. See Figure 11 for a visualization of the double-move.
Assume . Choose an edge from each cycle , choosing so that . For the first sequential move, first follow from to and then travel along to . Then for , introduce the edge and travel along . Terminate once is reached.
For the second sequential move, begin at via edge . Then for , travel along the edge to correct items misplaced among the cycles. Once is reached, introduce (if needed) an edge from to whose item corresponds to the item sent from in . Follow this edge and then finish the move by following from to and then to . All clusters are then changed as desired by again using the arguments from Case 1.
In the proof of Theorem 2.1, we observe an important implication: if a path intersects a set of disjoint cycles at most three times, then all transfers corresponding to the cycles and the path can be correctly applied using one of the double-moves from the theorem.
Theorem 2.2
Let be -clusterings where consists of a set of disjoint cycles and a path . If intersects at most three times, then .
Proof
Let . As in the proof of Theorem 2.1, consider , , and : the first, second-to-last, and last vertices of which are covered by . If these are the only vertices of covered by , then all edges of can be applied in any of the four double-moves from Theorem 2.1. To see this, note that since no vertices on between and are covered by , we can follow all corresponding edges of when sending an item from to . As in the proof of the theorem, the same holds for all edges between and . Hence, all transfers corresponding to the edges from both and are correctly applied through the appropriate double-move.
Of course, since the number of times a path intersects a set of cycles is at most the number of vertices in , this implies that a path with at most three vertices can always be integrated with using one of the double-moves from Theorem 2.1.
Corollary 1
Let be -clusterings where consists of a set of disjoint cycles and a path with at most three vertices. Then .
3 Bounds on the Transformation Distance
In this section, we use the double-moves from Section 2 to prove upper bounds on the transformation distance between clusterings based on certain properties of the related .
Given any two -clusterings with clustering-difference graph , our goal is to transform into using as few cyclical and sequential moves as possible. Recall the fundamental difference between these two types of moves: cyclical moves transfer items among the clusters while preserving the original cluster sizes; on the other hand, sequential moves transfer items while increasing the size of one cluster and decreasing the size of another. This motivates a decomposition of into two parts corresponding to these different types of moves.
Definition 3 (Path-Cycle Decomposition)
Let be -clusterings with clustering-difference graph . For , let denote , the change in the size of cluster between and . A path-cycle decomposition of is a decomposition of into two parts: a set containing directed paths and a graph which decomposes into directed cycles.
For any path-cycle decomposition of , the paths of adjust the cluster sizes of to those of and the edges of apply any remaining transfers. Such a decomposition is easy to find: greedily construct directed paths in which begin at excess vertices, those vertices satisfying , and terminate at deficit vertices, those satisfying , and add the paths to . Once there are no excess or deficit vertices left in , the remaining edges in the graph form . Alternatively, greedily remove directed cycles from to build , and the leftover edges will decompose into . Note that we can store either as a set of directed paths or as a graph which decomposes into paths. In either case, the number of paths from excess to deficit vertices in does not depend on the chosen decomposition, and it gives a lower bound on the transformation distance between the clusterings.
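The greedy construction just described can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' implementation: the clustering-difference graph is given as a hypothetical list of `(u, v, item)` triples, one edge per item that must move from cluster `u` to cluster `v`, and a vertex is treated as an excess vertex when its outdegree exceeds its indegree. Each walk is extended along unused edges until it reaches a vertex with remaining deficit; whenever a walk revisits a vertex, the closed sub-walk is moved to the cycle part so that every returned path is simple.

```python
from collections import defaultdict


def path_cycle_decomposition(edges):
    """Greedy path-cycle decomposition of a clustering-difference graph.

    edges: list of (u, v, item) triples, one per item moving from cluster u to v.
    Returns (paths, cycle_edges): each path is a list of edges from an excess
    vertex to a deficit vertex; cycle_edges is the balanced remainder.
    """
    out_edges = defaultdict(list)
    balance = defaultdict(int)  # outdegree minus indegree
    for u, v, item in edges:
        out_edges[u].append((u, v, item))
        balance[u] += 1
        balance[v] -= 1
    excess = {w: b for w, b in balance.items() if b > 0}
    deficit = {w: -b for w, b in balance.items() if b < 0}

    paths, cycle_edges = [], []
    for start in excess:
        while excess[start] > 0:
            verts, path = [start], []
            while True:
                # An unused out-edge always exists here (degree-balance argument).
                edge = out_edges[verts[-1]].pop()
                path.append(edge)
                w = edge[1]
                if w in verts:
                    # The walk closed a cycle: move it to the cycle part.
                    i = verts.index(w)
                    cycle_edges.extend(path[i:])
                    del path[i:]
                    del verts[i + 1:]
                else:
                    verts.append(w)
                if deficit.get(w, 0) > 0:
                    break
            excess[start] -= 1
            deficit[w] -= 1
            paths.append(path)

    # Every vertex is now balanced, so the unused edges decompose into cycles.
    for remaining in out_edges.values():
        cycle_edges.extend(remaining)
    return paths, cycle_edges
```

A walk only stops at a vertex with remaining deficit, and a vertex without remaining deficit always has an unused outgoing edge when the walk enters it, so the `pop()` call cannot fail. The balanced remainder returned as `cycle_edges` can then be split into directed cycles by any standard method.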
Lemma 3
Let be -clusterings of the same data set. Then , where .
Proof
By definition, is the change in cluster size needed to transform into . Note that the sum is therefore even. Cyclical moves do not change the size of any clusters while sequential moves change the size of exactly two clusters by one. Hence, at least sequential moves are required in order to change the cluster sizes of to those of .
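The counting argument in this proof is easy to check numerically. The sketch below assumes each clustering is given as a list of cluster indices `0, ..., k-1`, one per item (a representation chosen here for illustration), and takes the size change of a cluster to be its size in the second clustering minus its size in the first. Because both clusterings partition the same items, the signed changes sum to zero, so the sum of their absolute values is even and half of it is an integer lower bound on the number of sequential moves.

```python
def size_changes(labels_old, labels_new, k):
    """delta[i] = (size of cluster i in the new clustering) - (size in the old one)."""
    delta = [0] * k
    for i in labels_old:
        delta[i] -= 1
    for j in labels_new:
        delta[j] += 1
    return delta


def sequential_move_lower_bound(labels_old, labels_new, k):
    """Half the total absolute size change: each sequential move changes
    exactly two cluster sizes by one, and cyclical moves change none."""
    delta = size_changes(labels_old, labels_new, k)
    total = sum(abs(d) for d in delta)
    # sum(delta) == 0 because both clusterings cover the same items, so total is even.
    assert total % 2 == 0
    return total // 2
```

For example, moving two items from cluster 0 to cluster 1 in a 3-clustering gives size changes `[-2, 2, 0]` and a lower bound of two sequential moves.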
Given a path-cycle decomposition of , a straightforward approach for transforming into is to separately apply the paths of followed by the cycles of . However, whereas a fixed number of sequential moves is required to apply all paths of , the number of cyclical moves required to apply all transfers in is generally less than its number of cycles. Using the double-move from Figure 3, we can integrate sets of disjoint cycles from to achieve a transformation distance bound which generalizes Corollary 7 in [2]. This serves as a starting point for our discussion on an improved upper bound for . Recall that the shared degree of a vertex in is the minimum of its indegree and outdegree.
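To experiment with the bounds in this section, the clustering-difference graph and its shared degrees can be computed directly from two label vectors. The sketch assumes the standard construction: one directed edge from cluster i to cluster j, labelled by item x, whenever x lies in cluster i of the first clustering and in a different cluster j of the second. The function names are our own.

```python
from collections import Counter


def clustering_difference_graph(labels_old, labels_new):
    """One edge (i, j, x) for each item x in cluster i of the first clustering
    and in a different cluster j of the second."""
    return [
        (i, j, x)
        for x, (i, j) in enumerate(zip(labels_old, labels_new))
        if i != j
    ]


def shared_degrees(edges, k):
    """Shared degree of each cluster vertex: min(indegree, outdegree)."""
    outdeg = Counter(u for u, _, _ in edges)
    indeg = Counter(v for _, v, _ in edges)
    return [min(indeg[c], outdeg[c]) for c in range(k)]
```

When every item changes cluster, as in `clustering_difference_graph([0, 0, 1, 1, 2, 2], [1, 2, 0, 2, 0, 1])`, the graph has six edges and every vertex has shared degree 2.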
Lemma 4
Let be -clusterings of the same data set. Then
[TABLE]
where , is the shared degree of in , , and .
Proof
Let be any path-cycle decomposition of . Applying the sequential moves corresponding to the paths of correctly adjusts all cluster sizes.
Next, we can use the method of Corollary 7 from [2] to apply the cycles in . To do so, note that for , the shared degree of in is at most . Hence, we may first apply at most cyclical moves to reduce the maximum shared degree in to at most . Next, using the technique in Corollary 3 from [2], we can solve a maximum flow problem to obtain a set of disjoint cycles in covering all vertices of maximum shared degree. All transfers from this cycle cover can be applied via at most two cyclical moves using the double-move from Figure 3. Repeating until the maximum shared degree of is zero, all transfers from are performed in at most cyclical moves. Therefore, at most cyclical and sequential moves are required to transform into .
This initial upper bound on uses the double-move for integrating disjoint cyclical moves from Figure 3 but does not yet take advantage of any of the double-moves from Section 2 which integrate both cyclical and sequential moves. For instance, when applying the cyclical moves from a disjoint cycle cover of all vertices of maximum shared degree in , we could attempt to integrate a path from . If is disjoint from , if intersects at most one cycle of , or if intersects at most three times, we could use one of the double-moves from Lemma 1, Lemma 2, or Theorem 2.2 to integrate at no extra cost, reducing the number of remaining sequential moves. However, we cannot guarantee that such a path exists in .
Nevertheless, we can achieve an improved bound on the transformation distance by considering each path in as the combination of a (short) sequential move with a cyclical move. To motivate this, suppose consists of a set of disjoint cycles and a path with , where , , and are all covered by . We can apply one of the four double-moves from Theorem 2.1 to perform all transfers corresponding to the edges in while decreasing the excess of and the deficit of . However, the double-move does not apply any edges of between and . Additionally, receives an incorrect item from during the double-move which still needs to be sent to . Hence, a new edge from to is introduced and the resulting consists of the directed cycle .
In this manner, we can represent the sequential move corresponding to any path in covering vertices as the combination of such a cycle and a path with three vertices. As depicted in Figure 12, there are two cases to consider regarding the interior vertex of and the item sent along the artificially introduced edge in . These cases depend on the order in which the corresponding transfers are applied. Let denote the item to be sent from to in , and let denote the item to be sent from to . If is applied first (as in the previous paragraph), let and let send from to the correct destination . This case is depicted in Figure 12(a). On the other hand, if is applied before , let send from to and let as depicted in Figure 12(b). Either case has the same effect on the underlying clusters as the original path .
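The two orderings can be checked with a small simulation. In our own hypothetical notation, the long path visits clusters `v1, ..., vk` (k at least 4), and `items[j]` is the item moved from `vs[j]` to `vs[j+1]`. Following our reading of Figure 12, in ordering (a) the short path (v1, v_{k-1}, vk) is applied first and the artificial edge from v_{k-1} to v2 then carries the first item; in ordering (b) the cycle through v2, ..., v_{k-1} is applied first, its artificial edge carries the last item to v2, and the short path becomes (v1, v2, vk). Within each move every item is transferred at most once, so applying the transfers one at a time, as done here, gives the same result as applying each move all at once.

```python
def split_long_path(vs, items):
    """Hypothetical encoding of the path split (k >= 4 vertices).

    vs = [v1, ..., vk]; items[j] must move from vs[j] to vs[j + 1].
    Returns orderings (a) and (b) as lists of (item, src, dst) transfers.
    """
    k = len(vs)
    assert k >= 4 and len(items) == k - 1
    first, last = items[0], items[-1]
    # Path edges from v2 through v_{k-1}; they become part of the cycle.
    interior = [(items[j], vs[j], vs[j + 1]) for j in range(1, k - 2)]
    # (a) Short path (v1, v_{k-1}, vk) first; the artificial edge then carries `first`.
    order_a = [(first, vs[0], vs[-2]), (last, vs[-2], vs[-1])]
    order_a += interior + [(first, vs[-2], vs[1])]
    # (b) Cycle first; its artificial edge carries `last` to v2,
    #     and the short path becomes (v1, v2, vk).
    order_b = interior + [(last, vs[-2], vs[1])]
    order_b += [(first, vs[0], vs[1]), (last, vs[1], vs[-1])]
    return order_a, order_b


def apply_transfers(assignment, transfers):
    """Apply (item, src, dst) transfers in order, checking each source."""
    assignment = dict(assignment)
    for item, src, dst in transfers:
        assert assignment[item] == src, f"{item!r} is not in cluster {src!r}"
        assignment[item] = dst
    return assignment
```

Both orderings should leave every item in the cluster that the original path sends it to, which is the sense in which either case has the same effect as the original path.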
Therefore, we decompose each path from with more than three vertices in this manner, adding the resulting cycle to and replacing with in . All paths in then have at most three vertices and Corollary 1 implies that we can completely integrate any of these paths with a disjoint cycle cover from in only two moves. Note also that each vertex in a cycle is an interior vertex of the original path . Hence, even after introducing these additional cycles to , the shared degree of each vertex in remains at most the shared degree of that vertex in the original clustering-difference graph. This allows us to improve upon the distance bound given in Lemma 4.
The challenge in this approach lies in the fact that the interior vertex of (either or ) and the label of (either or ) depend on the order in which the corresponding transfers are applied. If a cycle cover does not include , then integrating with is straightforward – simply choose the correct interior vertex for depending on whether or not has been applied yet. Once is applied, then if remains in , change its label from to .
However, if is contained in , then we must make adjustments to the double-moves from Theorem 2.1 in order to take into account the different possible cases for and . Nevertheless, this approach allows us to integrate sequential moves from with disjoint cyclical moves from at no extra cost, resulting in the following greatly improved distance bound. The bound depends only on the larger of the second-largest shared degree and the overall change in cluster sizes rather than on the sum of these values as in Lemma 4.
Theorem 3.1
Let be -clusterings of the same data set. Then
[TABLE]
where , is the shared degree of in , , and .
Proof
Let be any path-cycle decomposition of . For each path of the paths in , if , decompose into a cycle and a short path as depicted in the cases of Figure 12. Specifically, let where the label of edge is the item to be sent from to in , as depicted by the blue edges in Figure 12(a). Replace with in . In addition, introduce the cycle to , where the label of the artificial edge in is the item to be sent from to in , as depicted by the green edges in Figure 12(b). Note that each vertex in is an interior vertex of the original path ; hence, in the resulting cycle graph , the shared degree of each vertex remains at most .
As in the proof of Lemma 4, first apply at most cyclical moves to reduce the maximum shared degree in to at most . Whenever an artificial edge is applied in such a move, change the interior vertex of the corresponding path in so that as in Figure 12(b).
Now, again as in the proof of Lemma 4, we can reduce the maximum shared degree in by finding a disjoint cycle cover for the vertices of maximum shared degree and applying a double-move. However, in each such double-move, we will also integrate a path from .
Let be such a set of disjoint cycles in , which can be found using the technique from [2]. Choose any path from . Since each path in has at most three vertices, if the selected path is an original path from , integrating the path with in a double-move is straightforward via Corollary 1. Again, if an artificial edge is applied through this double-move and the corresponding path remains in , switch the interior vertex of from to as in Figure 12(b).
Hence, assume the selected path from is an introduced path of the form with corresponding artificial edge . If is not contained in the cycle cover , integrating with is again straightforward via Corollary 1: the interior vertex of is known and we can simply apply one of the double-moves from Theorem 2.1. After the double-move is applied, make any necessary adjustments to the remaining paths in as in the previous paragraph. Additionally, if remains in , change its label from to as in Figure 12(a).
However, if the edge corresponding to is contained in , then we must make modifications to the double-moves of Theorem 2.1 to account for the two different cases for and . Note that if this situation arises, has not yet been applied so has the initial form . We modify each of the four cases regarding the intersection points of with from Theorem 2.1 to perform the necessary transfers. Since is included in , both and are necessarily covered by , but and need not be covered. Several of the case modifications depend on whether or not these two vertices are actually covered by the cycles.
Case 1: All three vertices of are covered by , where and belong to different cycles of and (and hence, also and ) belongs to the same cycle as . See the examples in Figure 13 – the artificial edge is given by the green edge from to and the other edges of are given in black. Note that we cannot simply apply the double-move from Case 1 in Theorem 2.1 as depicted for this scenario in Figure 13(a) (compare to Figure 8). In the first cyclical move, the edge would be applied, sending the item from to . Hence, would then be unable to send to in the second sequential move.
We can address this by making a slight modification to this first cyclical move: instead of sending from to and then applying , simply send directly from to . Then remains at , and in the second sequential move, can send to as seen in Figure 13(b).
Note that in this modified double-move, the artificial edge from to is never actually applied. However, its intended purpose is accomplished: item is correctly received by from , and is correctly sent from to . Therefore, after the double-move is applied, we can remove from and along with the other edges of from , as desired.
Case 2: The first and last vertices of which are covered by belong to different cycles, and the second-to-last vertex covered by belongs to a different cycle than the last vertex. There are three double-moves based on the double-move from Case 2 of Theorem 2.1 which can be used depending on whether or not , , or both and are covered by . Depictions of these moves are given in Figure 14.
- a)
Vertices and are both covered by . In this situation, for Case 2 to apply, and must belong to different cycles of and must not belong to the same cycle as . See Figure 14(a). Then when performing the double-move from Case 2 of Theorem 2.1 as depicted in Figure 9, the edge is applied in the first sequential move before any of the edges from , sending from to . Hence, if we switch the interior vertex of from to , we can apply this original double-move without any further modifications, as depicted in Figure 14(a).
- b)
Vertex is not covered by . Then for Case 2 to apply, and must belong to different cycles of . See Figure 14(b). As in Case 1, we cannot apply the original double-move since then would be unable to send to in the second move. We can address this in the same way as in the modified double-move from Case 1: send directly from to in the first sequential move and then send directly from to in the second cyclical move, as depicted in Figure 14(b). Although is never actually applied, all desired transfers are accomplished as in Case 1.
- c)
Vertex is not covered by . Then for Case 2 to apply, and must belong to different cycles of . See Figure 14(c). We make a similar modification to that of the previous case: correctly send directly from to in the first sequential move and then correctly send directly from to in the second cyclical move, as depicted in Figure 14(c).
Case 3: The first and last vertices of which are covered by belong to the same cycle in , while the second-to-last vertex belongs to a different cycle. The only scenario in which this case applies is when and belong to the same cycle of and belongs to a different cycle. We make a modification similar to the third double-move from the previous case. In a first cyclical move send directly from to , and in a second sequential move send directly from to . See Figure 15.
Case 4: All vertices of which are covered by belong to the same cycle. There are two double-moves, depicted in Figure 16, which can be used depending on whether or not is covered by .
- a)
Vertex is covered by . As in the first double-move for Case 2, in the original double-move for Case 4 of Theorem 2.1 the edge is applied before any of the edges from . Hence, if we switch the interior vertex of from to , we can apply the double-move without any further modifications, as depicted in Figure 16(a). Note that may or may not be covered by the cycle containing and .
- b)
Vertex is not covered by . We make a modification similar to that of the second double-move for Case 2. In a first sequential move, directly send from to , and in a second sequential move, directly send from to . See Figure 16(b). Note again that may or may not be covered by the cycle containing .
In each case, we are able to integrate with and apply all necessary transfers in only two moves. Therefore, at most double-moves are needed to reduce the shared degree of to zero, and through each of these double-moves, we remove one of the paths from . Afterwards, we may simply apply the remaining paths in , if any, individually. The total number of moves used to transform into is thus at most
[TABLE]
4 Circuit Diameter of Partition Polytopes
A fundamental open question in linear programming is whether or not there exists a polynomial pivot rule for the simplex method. The existence of such a pivot rule would require that the polynomial Hirsch conjecture [19] holds; i.e., that the combinatorial diameter of a polyhedron can be polynomially bounded. A recent effort to better understand the combinatorial diameter of polyhedra has been the study of the related circuit diameter [6, 8, 9, 18]. Whereas the original Hirsch conjecture is false in general [20, 25], the related Circuit Diameter Conjecture [6] remains open.
Recall that the circuits of the bounded-size partition polytope correspond to cyclical and sequential moves of items among clusters. Therefore, as long as no cluster size constraints are violated during a clustering transformation, any resulting bound on the transformation distance between clusterings also bounds the circuit distance between vertices in . As a 0/1-polytope, the combinatorial diameter (and hence, also the circuit diameter) of satisfies the Hirsch conjecture [23] – specifically, the combinatorial diameter is at most the number of items . In this section, we use the results from Section 3 to derive much better upper bounds on the circuit diameter.
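To make the correspondence between moves and edge directions concrete, the sketch below encodes a clustering in 0/1 assignment variables, one per (cluster, item) pair, and builds the direction vector of a set of transfers. The encoding is our illustrative assumption: the precise definition of the bounded-size partition polytope, including its size bounds, is not restated in this section, and the code does not verify that a direction is support-minimal (that is, a circuit). It only shows how a cyclical move leaves every cluster size unchanged while a sequential move changes two of them by one.

```python
def move_direction(transfers, k, n):
    """Direction of a set of (item, src, dst) transfers in the k x n
    assignment variables y[i][x] (1 if item x is in cluster i)."""
    g = [[0] * n for _ in range(k)]
    for x, i, j in transfers:
        g[i][x] -= 1  # item x leaves cluster i
        g[j][x] += 1  # ... and enters cluster j
    return g


def cluster_size_change(direction):
    """Change in each cluster size caused by moving along `direction`."""
    return [sum(row) for row in direction]
```

For a cyclical move along clusters 0, 1, 2 and back to 0 (one item per edge) the size changes are `[0, 0, 0]`, while the sequential move from cluster 0 through 1 to 2 gives `[-1, 0, 1]`.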
For the fixed-size partition polytope, Proposition 1 can be used to show that the combinatorial diameter is at most , where are the two largest fixed cluster sizes [2]. We begin by generalizing this bound to the circuit diameter of the bounded-size partition polytope by also taking into account the largest possible change in cluster sizes. Although we do not yet utilize any double-moves which integrate sequential and cyclical moves (see the upcoming Theorem 4.1), the bound of the following lemma is already better than the naive bound achieved by simply counting the sequential and cyclical moves separately – we can relate the shared degree of a vertex in a to the change in size of the corresponding cluster.
Lemma 5
For a bounded-size partition polytope , assume the corresponding clusters are indexed so that and let denote the two indices minimizing . Then the circuit diameter of is at most
[TABLE]
Proof
Let be -clusterings corresponding to vertices of . We can transform into by separately applying sequential moves followed by cyclical double-moves in the manner of Lemma 4. All intermediate clusterings in this transformation satisfy the cluster size bounds of , so the process indeed corresponds to a circuit walk from to in .
Let denote the shared degree of vertex in , and let . Lemma 4 then implies that the circuit distance from to in is at most
[TABLE]
where maximize over all . Trivially, for , it holds that and . Hence, we obtain the following upper bound on the circuit diameter of as a natural implication of Lemma 4:
[TABLE]
Note however that this bound can be immediately improved. For , we must have since is equal to the maximum of the indegree and outdegree of . Rearranging this inequality yields . Substituting into (1), we obtain the following upper bound on the circuit distance from to :
[TABLE]
Note that . Similarly, it holds that . Thus, we obtain the stated bound.
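The key step in this proof is that the larger of the indegree and outdegree of a vertex equals its shared degree plus the absolute change in the corresponding cluster size. It can be checked on small examples. The sketch assumes each clustering is given as a list of cluster indices, one per item, and tests three facts per cluster: indegree minus outdegree equals the size change; the larger degree equals the shared degree plus the absolute size change; and this quantity never exceeds the larger of the two cluster sizes (hence also not the cluster's upper bound when both clusterings respect it).

```python
from collections import Counter


def check_degree_facts(labels_old, labels_new, k):
    """Check the degree facts used in the proof of Lemma 5 for every cluster."""
    moved = [(i, j) for i, j in zip(labels_old, labels_new) if i != j]
    outdeg = Counter(i for i, _ in moved)
    indeg = Counter(j for _, j in moved)
    old_size, new_size = Counter(labels_old), Counter(labels_new)
    for c in range(k):
        shared = min(indeg[c], outdeg[c])
        delta = new_size[c] - old_size[c]
        # Items entering minus items leaving equals the change in cluster size.
        if indeg[c] - outdeg[c] != delta:
            return False
        # Hence the larger degree is the shared degree plus |delta| ...
        if max(indeg[c], outdeg[c]) != shared + abs(delta):
            return False
        # ... and it never exceeds the larger of the two cluster sizes.
        if shared + abs(delta) > max(old_size[c], new_size[c]):
            return False
    return True
```

The first fact holds because items that stay in a cluster contribute to neither degree, so the outdegree counts the items that leave the cluster and the indegree counts those that arrive.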
As in Theorem 3.1, we can significantly improve upon this diameter bound by using the double-moves from Theorem 2.1 to integrate sequential moves with sets of disjoint cyclical moves. Note that we must take care when applying these double-moves to bounded-size clusterings – certain moves require the existence of a vertex whose cluster size can be temporarily increased as demonstrated in Figure 7. Nevertheless, as long as there is at least some slack in the constraints for all but at most one cluster, we can ensure the existence of such a vertex through a simple pre-processing of the clusters. Hence, we obtain the following improved diameter bound as an implication of the transformation distance bound from Theorem 3.1, which depends on the maximum of the second-largest cluster size and the largest possible change in cluster sizes. We note that the cluster size slackness assumptions made in this theorem are quite natural in any application involving bounded-size clusterings.
Theorem 4.1
For a bounded-size partition polytope , assume the corresponding clusters are indexed so that and let denote the index minimizing . If and if for , the circuit diameter of is at most
[TABLE]
Proof
Let be -clusterings corresponding to vertices of . We can transform into in the manner of Theorem 3.1. However, in order for all intermediate clusterings to satisfy the bounds of , we must make sure that when applying any version of the double-move from Case 4 of Theorem 2.1, there exists a suitable choice for whose corresponding cluster size is strictly less than its upper bound and can be temporarily increased.
To ensure that this is always the case, we pre-process and in the following manner. If there exists more than one cluster (or in the case of ) such that , then choose such an index with , which is possible since at most one index satisfies . Transfer any item from to a different cluster which satisfies . Such an index must exist, else we would have
[TABLE]
Repeat this process at most times until the sizes of all but at most one cluster are strictly less than their upper bounds. In the resulting -clustering, it is then always possible to choose in Case 4 of Theorem 2.1 such that the corresponding cluster size can be temporarily increased when performing the double-move: any cycle in a clustering-difference graph covers at least two vertices, and at least one of these vertices must have a corresponding cluster size less than its upper bound. Additionally, the choice of the vertex is not affected by the modifications in Case 4 of Theorem 3.1.
Pre-processing both and in this manner requires at most single item transfers. Once these transfers have been applied, we may perform a circuit walk between the resulting -clusterings by applying the moves of Theorem 3.1 to their clustering-difference graph. Such a circuit walk has length at most
[TABLE]
where denote the two indices maximizing the shared degree .
As in the proof of Lemma 5, since , this bound is at most
[TABLE]
Taking into account the at most circuit steps needed to adjust the cluster sizes, we obtain the stated improved bound.
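The pre-processing step in this proof only manipulates cluster sizes, so it can be sketched on size vectors alone. Two choices here are our own assumptions, because the exact conditions are lost in this version of the text: the sending cluster is any full cluster that is still above its lower bound, and the receiving cluster must have at least two units of slack, so that it does not become full itself and each transfer removes one cluster from the set of full clusters. If no such pair exists, the sketch raises an error instead of looping.

```python
def preprocess_sizes(sizes, lower, upper):
    """Move single items until at most one cluster sits at its upper bound.

    Returns the new sizes and the list of (src, dst) single-item transfers.
    """
    sizes = list(sizes)
    transfers = []

    def full():
        return [i for i, s in enumerate(sizes) if s == upper[i]]

    while len(full()) > 1:
        # Sender: a full cluster that may still shrink.
        src = next((i for i in full() if sizes[i] > lower[i]), None)
        # Receiver (our assumption): keeps slack after receiving, so it does not
        # become full and each transfer shrinks the set of full clusters.
        dst = next((j for j, s in enumerate(sizes) if s <= upper[j] - 2), None)
        if src is None or dst is None:
            raise ValueError("cluster size slack assumptions are violated")
        sizes[src] -= 1
        sizes[dst] += 1
        transfers.append((src, dst))
    return sizes, transfers
```

For sizes `[3, 3, 3, 1]` with upper bounds `[3, 3, 3, 5]` and lower bounds zero, two transfers into the last cluster suffice, giving sizes `[2, 2, 3, 3]`.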
5 Conclusions and Future Directions
In this work, we provide methods based on linear programming and network theory for transforming -clusterings using sequences of cyclical and sequential moves of items among clusters. This leads to upper bounds on the transformation distance between two general -clusterings as well as the circuit diameter of the bounded-size partition polytope. There are several natural directions for future research in this area.
We prove in Theorem 4.1 an upper bound on the circuit diameter of the bounded-size partition polytope using the transformation distance bound from Theorem 3.1 and modified double-moves from Theorem 2.1 which integrate sequential moves of items with cyclical moves. A natural follow-up question is whether we can also bound the combinatorial diameter of the polytope in this manner. The edges of have a more technical characterization than its circuits – only certain cyclical and sequential moves actually correspond to edges between vertices [11]. However, through a careful ordering of cyclical and sequential moves and double-moves, we believe new bounds on the combinatorial distance between vertices in the polytope could be achievable.
Additionally, in Theorem 3.1, we use an arbitrary path-cycle decomposition of the clustering-difference graph to bound the transformation distance between the clusterings. It is possible to instead construct a decomposition exhibiting potentially useful properties. For instance, solving a minimum-cost circulation problem over yields a decomposition in which has a maximum number of edges. Modifying this circulation problem can yield a decomposition in which the maximum shared degree in is minimized. Through further analysis, these extremal choices for the path-cycle decomposition might lead to better upper bounds on the transformation distance.
Finally, we note that the transformation distance is formally a metric. Hence, if we are able to compute , we can interpret it as a measure of the distance between given -clusterings of the same data set. There is significant interest in comparing clusterings in the literature [14, 22]. However, most measures typically do not take into account the potential labels of the clusters and are instead based on pairwise relationships among the items. Our new metric takes a fundamentally different approach to measuring the difference between clusterings, motivating a comparative study.
- [1] M. Balinski and A. Russakoff. On the assignment polytope. SIAM Review, 16(4):516–525, 1974.
- [2] S. Borgwardt. On the diameter of partition polytopes and vertex-disjoint cycle cover. Mathematical Programming, Ser. A, 141(1):1–20, 2013.
- [3] S. Borgwardt, A. Brieden, and P. Gritzmann. Mathematics in agriculture and forestry: Geometric clustering for land consolidation. IFORMS news, Dec. issue, 2013.
- [4] S. Borgwardt, A. Brieden, and P. Gritzmann. Geometric clustering for the consolidation of farmland and woodland. The Mathematical Intelligencer, 36(2):37–44, 2014.
- [5] S. Borgwardt, A. Brieden, and P. Gritzmann. Geometrisches Clustering: Mathematik für die Flurverbesserung (Geometric clustering: Mathematics for land improvement). Mitteilungen der DMV, 23:82–90, 2015.
- [6] S. Borgwardt, E. Finhold, and R. Hemmecke. On the circuit diameter of dual transportation polyhedra. SIAM Journal on Discrete Mathematics, 29(1):113–121, 2016.
- [7] S. Borgwardt and F. Happach. Good Clusterings Have Large Volume. Operations Research, 67(1):215–231, 2019.
- [8] S. Borgwardt, J. A. De Loera, and E. Finhold. Edges vs circuits: a hierarchy of diameters in polyhedra. Advances in Geometry, 16(4):511–530, 2016.
