Constructing Clustering Transformations

Steffen Borgwardt; Charles Viss

arXiv:1904.05406·math.OC·April 6, 2020


TL;DR

This paper introduces methods for transforming one clustering into another using linear programming and network theory, providing a new metric for clustering differences and bounds on the related partition polytopes.

Contribution

It presents a novel approach for clustering transformation based on a clustering-difference graph and elementary moves, along with bounds on the circuit diameter of partition polytopes.

Findings

01 Developed a clustering-difference graph model for transformations.

02 Provided methods for decomposing transformations into elementary moves.

03 Established bounds on the circuit diameter of partition polytopes.


Equations

$\sum_{i=1}^{k}y_{ij}$

$\sum_{j=1}^{n}y_{ij}$

$y_{ij}$

$d(\mathcal{C},\mathcal{C}^{\prime})\leq\eta_{i_{1}}+\eta_{i_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i},$

$d(\mathcal{C},\mathcal{C}^{\prime})\leq\eta_{i_{1}}+\max\{\eta_{i_{2}},\frac{1}{2}\sum_{i=1}^{k}\delta_{i}\},$

$(\eta_{i_{1}}-\eta_{i_{2}})+2\eta_{i_{2}}+\max\{\frac{1}{2}\sum_{i=1}^{k}\delta_{i}-\eta_{i_{2}},0\}=\eta_{i_{1}}+\max\{\eta_{i_{2}},\frac{1}{2}\sum_{i=1}^{k}\delta_{i}\}.$

$\kappa_{1}^{+}+\kappa_{2}^{+}+\frac{1}{2}\sum_{i\neq i_{1},i_{2}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-}).$

$\eta_{j_{1}}+\eta_{j_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i},$

$\kappa_{j_{1}}^{+}+\kappa_{j_{2}}^{+}+\frac{1}{2}\sum_{i=1}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-}).$

$\eta_{j_{1}}+\eta_{j_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$

$\leq\sum_{i=j_{1},j_{2}}(\kappa_{i}^{+}-\frac{1}{2}\delta_{i})+\frac{1}{2}\sum_{i\neq j_{1},j_{2}}^{k}\delta_{i}.$

$\kappa_{1}^{+}+\max\{\kappa_{2}^{+},\frac{1}{2}\sum_{i\neq i_{1}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-})\}+2(k-2).$

$\sum_{i=1}^{k}|C_{i}|\geq 2+\sum_{i=1}^{k}(\kappa_{i}^{+}-1)>2+(n+k-2)-k=n.$

$\eta_{j_{1}}+\max\{\eta_{j_{2}},\frac{1}{2}\sum_{i=1}^{k}\delta_{i}\},$

$\max\{\eta_{j_{1}}+\eta_{j_{2}},(\eta_{j_{1}}+\frac{1}{2}\delta_{j_{1}})+\frac{1}{2}\sum_{i\neq j_{1}}^{k}\delta_{i}\}$

$\leq\max\{\kappa_{j_{1}}^{+}+\kappa_{j_{2}}^{+},\kappa_{j_{1}}^{+}+\frac{1}{2}\sum_{i\neq j_{1}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-})\}$

$\leq\kappa_{1}^{+}+\max\{\kappa_{2}^{+},\frac{1}{2}\sum_{i\neq i_{1}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-})\}.$


Full text

Constructing Clustering Transformations

Steffen Borgwardt, University of Colorado Denver

Charles Viss, University of Colorado Denver

Abstract

Clustering is one of the fundamental tasks in data analytics and machine learning. In many situations, different clusterings of the same data set become relevant. For example, different algorithms for the same clustering task may return dramatically different solutions. We are interested in applications in which one clustering has to be transformed into another; e.g., when a gradual transition from an old solution to a new one is required.

In this paper, we devise methods for constructing such a transition based on linear programming and network theory. We use a so-called clustering-difference graph to model the desired transformation and provide methods for decomposing the graph into a sequence of elementary moves that accomplishes the transformation. These moves are equivalent to the edge directions, or circuits, of the underlying partition polytopes. Therefore, in addition to a conceptually new metric for measuring the distance between clusterings, we provide new bounds on the circuit diameter of these partition polytopes.

Keywords: partitioning, clustering, polyhedra, circuits, diameter, linear programming

MSC: 52B05, 90C05, 90C08, 90C27

1 Introduction and Preliminaries

Clusterings of large data sets play an important role in data analytics, machine learning, and informed decision-making in general. In many applications, there exists a desired clustering corresponding to an optimal solution to an optimization problem. However, directly implementing such a solution can be challenging – instead, a gradual sequence of transitions which transforms an initial, sub-optimal solution into the improved clustering is desired.

Consider the example in land consolidation from [3, 4, 5]. In a Bavarian agricultural region, 471 lots are cultivated by 7 different farmers. The initial distribution of the ownership of the lots, depicted in Figure 1(a), is quite problematic – the scattered, small lots result in large transportation overhead and prohibit the use of heavy machinery. To address this, the authors worked with the Bavarian State to facilitate a voluntary land exchange among the farmers in which the boundaries of the lots would remain the same while the cultivation rights for the lots would be redistributed.

The combinatorial redistribution of lots can be modeled as a clustering problem: a set of data points (the lots) must be partitioned into clusters (the farmers) under the restriction that each farmer keeps his original total value of land. This restriction makes it provably hard to determine an optimal redistribution of the lots, but it is possible to compute an approximation of the global optimum based on linear programming over projections of transportation polytopes [4]. A computed solution in which the lots form large, connected sections of land is depicted in Figure 1(b).

However, such a radical redistribution of lots among the farmers (more than 70% of the lots change ownership from Figure 1(a) to 1(b)) cannot realistically take place all at once. Crop rotations, required machinery, and other processes for farming stability must be respected. Therefore, the farmers requested a “best” way to gradually implement the proposed changes over the course of several years. In this paper, we propose methods for constructing such a transformation between clusterings based on linear programming and network theory.

The need for the construction of an efficient gradual transition to a new, given clustering also arises in many other applications. For example, an insurance company may want to gradually transition their customers to a new clustering of premium classes. In other situations, there are snapshots of the same data set at different times and the goal is to devise a model that explains how the data gradually changed over time.

In general, we consider partitions of a data set $X:=\{x_{1},...,x_{n}\}$ into $k$ labeled clusters where each item $x_{i}$ is assigned to exactly one of the clusters. We call such a partition $\mathcal{C}:=(C_{1},...,C_{k})$ a $k$ -clustering of $X$ , or simply a clustering of $X$ when $k$ is clear from context. As is the case in most clustering applications, we assume that the number of items is significantly greater than the number of clusters; i.e., $n\gg k$ .

We additionally consider situations in which upper and lower bounds are given for the sizes of the clusters. Such bounds may arise directly from an application itself or may be introduced to guide clustering algorithms to return sufficiently balanced solutions. Specifically, let $\kappa^{+},\kappa^{-}\in{\mathbb{Z}}_{+}^{k}$ with $\kappa^{+}\geq\kappa^{-}$ be given. A bounded-size $k$ -clustering of $X$ with respect to $\kappa^{+}$ and $\kappa^{-}$ then satisfies $\kappa_{i}^{-}\leq|C_{i}|\leq\kappa_{i}^{+}$ for $i=1,...,k$ . This concept generalizes the fixed-size $k$ -clusterings from [2] in which each cluster contains a fixed number of items (i.e., a bounded-size $k$ -clustering with $\kappa^{+}=\kappa^{-}$ ). The classical transportation problem and the assignment problem, along with their related polytopes, are well-studied topics in optimization corresponding to fixed-size clusterings [1, 13, 21].

For a data set $X$ and cluster size bounds $\kappa^{+},\kappa^{-}\in{\mathbb{Z}}_{+}^{k}$ , the bounded-size partition polytope ${BPP(\kappa^{+},\kappa^{-})}$ models the set of all bounded-size $k$ -clusterings of $X$ with respect to $\kappa^{+}$ and $\kappa^{-}$ [11]. Specifically, for $i=1,...,k$ and $j=1,...,n$ , let $y_{ij}$ indicate whether or not cluster $C_{i}$ receives item $x_{j}$ in a $k$ -clustering $\mathcal{C}=(C_{1},...,C_{k})$ of $X$ . Then $BPP(\kappa^{+},\kappa^{-})$ is given by the following system of constraints:

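$\sum_{i=1}^{k}y_{ij}=1$ for $j=1,...,n$

$\sum_{j=1}^{n}y_{ij}\leq\kappa_{i}^{+}$ for $i=1,...,k$

$\sum_{j=1}^{n}y_{ij}\geq\kappa_{i}^{-}$ for $i=1,...,k$

$y_{ij}\geq 0$ for $i=1,...,k$ and $j=1,...,n$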

Since these constraints form a totally unimodular matrix, the right-hand side is integral, and each $y_{ij}$ is implicitly bounded between 0 and 1, $BPP(\kappa^{+},\kappa^{-})$ is in fact a 0/1-polytope whose vertices correspond to the feasible $k$ -clusterings of $X$ . For $\kappa^{+}=\kappa^{-}$ , this polytope generalizes the fixed-size partition polytope from [2]. It is also an instance of the bounded-shape partition polytope from [7] when $X$ is the standard basis of $\mathbb{R}^{n}$ .
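To make the correspondence between clusterings and 0/1 points concrete, the following is a minimal Python sketch (not taken from the paper). It builds the indicator values $y_{ij}$ for a clustering and checks the conditions described above: every item lies in exactly one cluster, and every cluster size lies between $\kappa_{i}^{-}$ and $\kappa_{i}^{+}$. The function names, the 0-based cluster indices, and the example data are illustrative assumptions.

```python
def assignment_matrix(labels, k):
    """Return y with y[i][j] = 1 if item j is in cluster i, else 0.

    labels[j] is the (0-based) cluster index of item j; this encoding is an
    assumption made for the sketch.
    """
    y = [[0] * len(labels) for _ in range(k)]
    for j, i in enumerate(labels):
        y[i][j] = 1
    return y


def is_bounded_size_clustering(y, kappa_minus, kappa_plus):
    """Check the assignment and cluster-size conditions for a 0/1 matrix y."""
    k, n = len(y), len(y[0])
    nonnegative = all(y[i][j] >= 0 for i in range(k) for j in range(n))
    one_cluster_per_item = all(sum(y[i][j] for i in range(k)) == 1 for j in range(n))
    sizes_within_bounds = all(
        kappa_minus[i] <= sum(y[i]) <= kappa_plus[i] for i in range(k)
    )
    return nonnegative and one_cluster_per_item and sizes_within_bounds


# Six items in three clusters of sizes 2, 1, 3 (hypothetical data).
labels = [0, 0, 1, 2, 2, 2]
y = assignment_matrix(labels, 3)
print(is_bounded_size_clustering(y, [1, 1, 1], [3, 3, 3]))  # True
print(is_bounded_size_clustering(y, [1, 1, 1], [2, 2, 2]))  # False: third cluster has 3 items
```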

In the fixed-size partition polytope, the edges also have a combinatorial interpretation: two vertices share an edge if and only if the corresponding clusterings differ by a single cyclical move of items among the clusters, formally defined in Section 2. This fact can be used to prove new bounds on the combinatorial diameter of the polytope – the maximum length of a shortest edge walk between any pair of vertices – and provide practical methods for constructing transformations between fixed-size clusterings [2]. In this paper, we generalize these methods to the bounded-size partition polytope $BPP$ . Although the edges of this polytope have a more technical characterization, its circuits correspond to a set of natural cyclical and sequential moves of items among the clusters [11]. However, constructing transformations using these two types of moves is significantly more challenging than using only cyclical moves due to their different effects on the sizes of the underlying clusters.

Circuits, introduced as the elementary vectors of a subspace by Rockafellar [24], play a fundamental role in the theory of linear programming. For a general polyhedron $P=\{\mathbf{x}\in\mathbb{R}^{n}\colon A\mathbf{x}=\mathbf{b},B\mathbf{x}\leq\mathbf{d}\}$ , the set of circuits of $P$ consists of all $\mathbf{g}\in\ker(A)\setminus\{\mathbf{0}\}$ normalized to coprime integer components for which $B\mathbf{g}$ is support-minimal over $\{B\mathbf{x}\colon\mathbf{x}\in\ker(A)\setminus\{\mathbf{0}\}\}$ . Geometrically, circuits correspond to all potential edge directions of $P$ as the right-hand side vectors $\mathbf{b}$ and $\mathbf{d}$ vary. This implies that the set of circuits serves as a universal test set for any linear program over $P$ [15]. Hence, circuits are used as the step directions in several augmentation algorithms for solving linear programs [10, 12, 16, 17].

Additionally, for polyhedra such as $BPP$ defined by totally unimodular matrices, circuits have combinatorial interpretations in terms of the underlying problem. The support-minimality of circuits implies that, in a sense, steps taken in circuit directions are as simple as possible while maintaining feasibility. In combination with highly-structured problems from combinatorial optimization, these steps become particularly intuitive and are guaranteed to only visit integral points [11].

Therefore, in this paper, we choose the circuits of $BPP$ as the elementary moves for transforming $k$ -clusterings. In building such transformations using these circuits, we construct circuit walks between the corresponding vertices in the polytope. As a generalization of edge walks, circuit walks are of interest due to their relationship to the combinatorial diameter of polyhedra [6, 8, 9]. The circuit distance between vertices refers to the minimum number of steps needed to join the vertices by a circuit walk and hence provides a lower bound on the combinatorial distance between the vertices. The related circuit diameter of a polyhedron – the maximum circuit distance between any pair of its vertices – then serves as a lower bound on its combinatorial diameter. In particular, circuit diameters provide insight into the polynomial Hirsch Conjecture, one of the fundamental open questions in linear programming. See [19] for a survey of this field of study.

To build transformations between clusterings, we use one of the main tools from [2] for analyzing fixed-size clusterings. Given two $k$ -clusterings $\mathcal{C},\mathcal{C}^{\prime}$ of the same data set, the clustering-difference graph $CDG(\mathcal{C},\mathcal{C}^{\prime})$ is a directed graph that models the transfers of items required for transforming $\mathcal{C}$ into $\mathcal{C}^{\prime}$ . In the context of fixed-size clusterings, such a graph decomposes into directed cycles. Sets of cyclical moves can then be integrated together in order to bound the combinatorial distance between vertices in the fixed-size partition polytope [2].

We generalize this graph-theoretic approach for constructing transformations between clusterings to the context of general and bounded-size $k$ -clusterings. In these situations, both cycles and paths in a clustering-difference graph correspond to circuits of the related polytopes. Integrating these different types of moves becomes much more technically challenging than integrating only cyclical moves since sequential moves alter the sizes of the underlying clusters. In Section 2, we show how this is possible for certain combinations of cyclical and sequential moves, providing various double-moves which reduce the number of steps needed for transforming clusterings (Theorems 2.1 and 2.2). Next, we prove in Section 3 how these double-moves lead to an upper bound (Theorem 3.1) on the so-called transformation distance between $k$ -clusterings: a relaxation of the circuit distance. In Section 4 we then prove the implications of this bound on the circuit diameter of the bounded-size partition polytope (Theorem 4.1). We end with a brief discussion on future directions of research in Section 5.

2 Moves and Double-Moves for Transforming Clusterings

Let $\mathcal{C},\mathcal{C}^{\prime}$ be two $k$ -clusterings of the same data set. We recall from [2] the definition of the clustering-difference graph ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ from $\mathcal{C}$ to $\mathcal{C}^{\prime}$ , a graph-theoretic model for the difference between the two clusterings.

Definition 1 (Clustering-difference Graph)

For two $k$ -clusterings $\mathcal{C}=(C_{1},...,C_{k})$ and ${\mathcal{C}}^{\prime}=(C^{\prime}_{1},...,C^{\prime}_{k})$ of a data set $X=\{x_{1},...,x_{n}\}$ , the clustering-difference graph ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ from $\mathcal{C}$ to $\mathcal{C}^{\prime}$ is a directed arc-labeled multigraph with vertex set $V=\{c_{1},...,c_{k}\}$ and edge set $A$ , where an edge $(c_{i},c_{j})$ with label $x_{\ell}$ belongs to $A$ if and only if $x_{\ell}\in C_{i}$ and $x_{\ell}\in C^{\prime}_{j}$ for $i\neq j$ .

Thus, the edges of ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ describe all single-item transfers needed to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ . The number of edges is equal to the number of items whose cluster assignment differs from ${\mathcal{C}}$ to ${\mathcal{C}}^{\prime}$ – if an edge $(c_{i},c_{j})$ with label $x_{\ell}$ belongs to $A$ , item $x_{\ell}$ must be moved from cluster $i$ to cluster $j$ as a part of the transformation. Note that this allows for parallel edges in ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , but all edges have different labels. We refer to the transfers of a $CDG$ consisting of a single directed cycle as a cyclical move of items among the clusters. Similarly, the set of transfers from a directed path is referred to as a sequential move of items. See Figure 2 for examples of these two types of moves with corresponding clustering-difference graphs.
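Definition 1 translates almost directly into code. The sketch below is an illustration rather than the authors' implementation: it represents a clustering as a dictionary from item labels to 0-based cluster indices (an assumed encoding) and returns one labeled arc per item whose cluster changes. The name `clustering_difference_graph` is hypothetical.

```python
def clustering_difference_graph(old, new):
    """Return the labeled arcs (i, j, item) of the CDG from `old` to `new`.

    old and new map each item to its cluster index. An arc (i, j, item) means
    the item must move from cluster i to cluster j, with i != j.
    """
    if old.keys() != new.keys():
        raise ValueError("both clusterings must partition the same data set")
    return [(old[x], new[x], x) for x in sorted(old) if old[x] != new[x]]


# Hypothetical example: x2 stays put, the other three items rotate.
old = {"x1": 0, "x2": 0, "x3": 1, "x4": 2}
new = {"x1": 1, "x2": 0, "x3": 2, "x4": 0}
print(clustering_difference_graph(old, new))
# [(0, 1, 'x1'), (1, 2, 'x3'), (2, 0, 'x4')]
```

In this example the arcs form a single directed cycle on the cluster vertices 0, 1, 2, so the transformation is one cyclical move.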

The clustering-difference graph plays an important role in the analysis of bounded-size and fixed-size partition polytopes. For the fixed-size partition polytope, a $CDG$ decomposes into directed cycles since all vertices in the graph must have equal indegree and outdegree. Two vertices in the polytope then share an edge if and only if the corresponding $CDG$ consists of a single directed cycle. This characterization has been used to construct edge walks between vertices of the polytope using sequences of cyclical moves, resulting in upper bounds on the combinatorial diameter [2].
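The degree argument can be checked mechanically: for each cluster vertex, in-degree minus out-degree in the $CDG$ equals the change in that cluster's size, so fixed cluster sizes force balanced degrees. The following Python sketch illustrates this under the assumption that arcs are given as `(source, target, item)` triples. The helper name `degree_imbalance` is hypothetical.

```python
from collections import Counter


def degree_imbalance(arcs):
    """Map each cluster to in-degree minus out-degree, omitting balanced ones.

    For a CDG from C to C', this value equals |C'_i| - |C_i| for cluster i.
    """
    balance = Counter()
    for src, dst, _item in arcs:
        balance[src] -= 1  # the item leaves src
        balance[dst] += 1  # the item arrives at dst
    return {cluster: b for cluster, b in balance.items() if b != 0}


cycle = [(0, 1, "a"), (1, 2, "b"), (2, 0, "c")]
path = [(0, 1, "a"), (1, 2, "b")]
print(degree_imbalance(cycle))  # {}
print(degree_imbalance(path))   # {0: -1, 2: 1}
```

A cyclical move leaves every cluster size unchanged, while a sequential move shrinks the cluster at the start of the path by one and grows the cluster at its end by one.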

Although the edges of the bounded-size partition polytope have a more complicated characterization, its circuits analogously correspond to clustering-difference graphs consisting of either a single directed cycle or a single directed path [11]. Hence, devising sequences of cyclical and sequential moves for transforming ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ can be interpreted as constructing a circuit walk between the corresponding vertices in the polytope. As we will show in Sections 3 and 4, this allows us to bound the circuit distance between vertices and the related circuit diameter of the bounded-size partition polytope.

Therefore, we are interested in constructing a transformation from ${\mathcal{C}}$ to ${\mathcal{C}}^{\prime}$ using as few of these cyclical and sequential moves as possible. We call the minimum number of required moves the transformation distance $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ from ${\mathcal{C}}$ to ${\mathcal{C}}^{\prime}$ .

Definition 2 (Transformation Distance)

For $k$ -clusterings ${\mathcal{C}},{\mathcal{C}}^{\prime}$ of the same data set, the transformation distance $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ is the minimum number of cyclical and sequential moves needed to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ .

Hence, $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ is a relaxation of the circuit distance between the corresponding vertices of the bounded-size partition polytope – as long as no cluster size constraints are violated during a sequence of moves used to achieve $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ , the two distances are equal.

A naive approach for bounding $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ is to decompose ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ into paths and cycles and then simply apply the corresponding moves individually to perform the clustering transformation. However, even when such a decomposition is optimal, $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ can be significantly less than the number of parts in the decomposition. For example, Figure 3(a) depicts a case in which ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ consists of four vertex-disjoint cycles, which trivially implies $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq 4$ . Nevertheless, due to the fact that any permutation can be expressed as the product of two cyclic permutations [1], only two cyclical moves are needed to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ , implying $d({\mathcal{C}},{\mathcal{C}}^{\prime})=2$ . For simplicity, we will use the term disjoint in place of vertex-disjoint throughout the remainder of the paper. Whenever two components of a $CDG$ are not vertex-disjoint, we will say that they intersect.
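The naive approach can be sketched as follows. This is a simple greedy walk in Python, written for illustration only: it splits the arcs of a $CDG$ into simple directed cycles and simple directed paths, but it makes no claim to find a decomposition with the fewest parts. The arc format `(source, target, item)` and the function name `decompose` are assumptions of the sketch.

```python
from collections import defaultdict


def decompose(arcs):
    """Split labeled arcs into simple directed cycles and simple directed paths."""
    out = defaultdict(list)
    indeg = defaultdict(int)
    for arc in arcs:
        out[arc[0]].append(arc)
        indeg[arc[1]] += 1
    cycles, paths = [], []
    # Start walks at vertices with surplus out-degree first, since paths must begin there.
    starts = sorted(out, key=lambda v: indeg[v] - len(out[v]))
    for start in starts:
        while out[start]:
            trail = []             # arcs of the current walk
            on_trail = {start: 0}  # vertex -> number of trail arcs before reaching it
            v = start
            while out[v]:
                arc = out[v].pop()
                trail.append(arc)
                v = arc[1]
                if v in on_trail:
                    # The walk closed up: peel the cycle off and continue from v.
                    pos = on_trail[v]
                    cycles.append(trail[pos:])
                    for _, dst, _ in trail[pos:-1]:
                        del on_trail[dst]
                    del trail[pos:]
                else:
                    on_trail[v] = len(trail)
            if trail:
                paths.append(trail)  # what remains of a stuck walk is a simple path
    return cycles, paths


# A path 0 -> 1 -> 3 that shares vertex 1 with the cycle 1 -> 2 -> 1.
arcs = [(0, 1, "a"), (1, 2, "b"), (2, 1, "c"), (1, 3, "d")]
cycles, paths = decompose(arcs)
print(len(cycles), len(paths))  # 1 1
```

Applying each part as its own move gives an upper bound on $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ equal to the number of parts, which is the bound the double-moves below improve on.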

Proposition 1 ([1], Lemma 7 in [2])

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings for which ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ decomposes into disjoint cycles. Then $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq 2$ .

In [2], Proposition 1 is used to integrate disjoint cyclical moves into what we will call a double-move: a sequence of two moves which results in the desired changes to the underlying clustering-difference graph. See Figure 3 for a visualization of this double-move. In a first cyclical move of items, depicted in Figure 3(b), the transfers corresponding to all but one edge from each cycle are correctly applied – the items corresponding to the remaining edges are temporarily sent to incorrect destinations across the cycles. However, a second cyclical move, depicted in Figure 3(c), can then be used to send each of these misplaced items to its correct destination, completing the clustering transformation.
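The double-move for disjoint cycles can be simulated on a small instance. The Python sketch below is one possible realization of the two cyclical moves described above, not code from [2]: the first arc of each cycle plays the role of the edge whose item is temporarily misplaced, and the second move returns each misplaced item to its correct cluster. Clusterings are dictionaries from items to cluster indices, and all function names are hypothetical.

```python
def apply_move(clustering, move):
    """Apply all transfers (src, dst, item) of one move simultaneously."""
    updated = dict(clustering)
    for src, dst, item in move:
        if clustering[item] != src:
            raise ValueError(f"{item} is not in cluster {src}")
        updated[item] = dst
    return updated


def is_single_cycle(move):
    """True if the arcs chain into one closed walk with no repeated vertex."""
    n = len(move)
    chained = all(move[i][1] == move[(i + 1) % n][0] for i in range(n))
    return chained and len({src for src, _, _ in move}) == n


def double_move(cycles):
    """Merge s >= 2 vertex-disjoint cycles into two cyclical moves.

    Each cycle is a list of arcs (src, dst, item) in cyclic order; its first
    arc is the edge (u_i, v_i) whose item is misplaced by the first move.
    """
    s = len(cycles)
    first = []
    for i, cyc in enumerate(cycles):
        u_i, _v_i, item_i = cyc[0]
        first.extend(cyc[1:])                # follow the cycle from v_i around to u_i
        v_next = cycles[(i + 1) % s][0][1]
        first.append((u_i, v_next, item_i))  # send the item of (u_i, v_i) to the next cycle
    v = [cyc[0][1] for cyc in cycles]
    items = [cyc[0][2] for cyc in cycles]
    # v_{i+1} currently holds the item meant for v_i; pass it back along v_1, v_s, ..., v_2, v_1.
    second = [(v[(i + 1) % s], v[i], items[i]) for i in range(s - 1, -1, -1)]
    return first, second


old = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
new = {"a": 1, "b": 2, "c": 0, "d": 4, "e": 3}
cycles = [[(0, 1, "a"), (1, 2, "b"), (2, 0, "c")], [(3, 4, "d"), (4, 3, "e")]]
first, second = double_move(cycles)
print(apply_move(apply_move(old, first), second) == new)  # True
```

Both returned moves pass `is_single_cycle`, so the two disjoint cycles of this example are handled with two cyclical moves, in line with Proposition 1.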

In the following lemmas and the upcoming Theorems 2.1 and 2.2, we show how similar double-moves can be used to integrate both cyclical and sequential moves when transforming general $k$ -clusterings. This becomes more technically challenging due to the different natures of the two types of moves – cyclical moves do not alter the sizes of any underlying cluster while sequential moves alter the sizes of exactly two clusters by one. First, generalizing Proposition 1, we show how to integrate sets of disjoint cycles and paths from a $CDG$ . Note that the bound on $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ in the following lemma depends only on the number of paths in the decomposition and not on the number of cycles.

Lemma 1

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings for which $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ decomposes into disjoint cycles and paths. Then $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq\max\{2,\ t\}$ , where $t$ is the number of paths in the decomposition.

Proof

Let $t$ denote the number of directed paths and $s$ the number of directed cycles in the decomposition of ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . When $t=0$ , the result follows from Proposition 1, so assume $t\geq 1$ . We may also assume $s\geq 1$ , else the $t$ sequential moves could simply be applied individually.

First suppose $t=2$ . Let $P_{1},P_{2}$ denote the paths of ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ and let $Y_{1},...,Y_{s}$ denote the cycles. For $i=1,...,s$ , select any edge $e_{i}:=(u_{i},v_{i})$ from cycle $Y_{i}$ . In addition, let $p_{1}$ denote the tail of $P_{1}$ , $p_{2}$ the tail of $P_{2}$ , and $w$ the neighbor of $p_{1}$ on $P_{1}$ . See Figure 4(a). We will apply a double-move consisting of two sequential moves to transform clustering ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ .

For the first sequential move, first introduce an edge from $p_{1}$ to $v_{1}$ , sending to $v_{1}$ the item from $p_{1}$ intended for $w$ . After traveling along this edge, follow $Y_{1}-e_{1}$ from $v_{1}$ to $u_{1}$ . Next, if $s>1$ , introduce an edge from $u_{1}$ to $v_{2}$ , whose item is associated with $e_{1}$ , and then travel along $Y_{2}-e_{2}$ to $u_{2}$ . Repeat for $i=2,...,s-1$ until $u_{s}$ is reached. Complete the move by introducing an edge from $u_{s}$ to $p_{2}$ , whose item is associated with $e_{s}$ , and then traveling along path $P_{2}$ . See the blue edges in Figure 4(b) for a visualization of this move.

Note that this first sequential move applies all transfers given by the cycles $Y_{1},...,Y_{s}$ in $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ with the exception of those corresponding to the edges $e_{i}$ – the item from $u_{i}$ intended for $v_{i}$ is temporarily sent to the wrong cluster. The move also applies all transfers given by $P_{2}$ but does not correctly apply any transfer from $P_{1}$ . However, all misplaced items can be corrected and all remaining transfers can be applied via a single, second sequential move as seen in Figure 4(c). First, send from $p_{2}$ to $v_{s}$ the item $p_{2}$ received from $u_{s}$ . Next, for $i=s,...,2$ , send from $v_{i}$ to $v_{i-1}$ the item $v_{i}$ received from $u_{i-1}$ . Once $v_{1}$ is reached, send from $v_{1}$ to $w$ the item $v_{1}$ received from $p_{1}$ . Finish the move by following the remaining edges of $P_{1}$ . After this second sequential move is applied, the transformation from ${\mathcal{C}}$ to ${\mathcal{C}}^{\prime}$ is complete – it follows that $d({\mathcal{C}},{\mathcal{C}}^{\prime})=2$ . See Figure 4(d) for a visualization of these moves integrated together into a double-move.

Next suppose $t>2$ . We can apply the double-move from the previous case to remove all cycles and any two paths from ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . The remaining $t-2$ paths can then be applied via individual sequential moves. Hence, $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq t$ .

Lastly, suppose $t=1$ . Again let $P_{1}$ denote the single path of $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ , let $p_{1}$ denote the tail of $P_{1}$ , let $w$ denote the neighbor of $p_{1}$ on $P_{1}$ , and choose an edge $e_{i}:=(u_{i},v_{i})$ from $Y_{i}$ for $i=1,...,s$ . We will apply a sequential move followed by a cyclical move to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ . See Figure 5 for a visualization of the double-move.

For the first move, as in the previous case, first introduce an edge from $p_{1}$ to $v_{1}$ and then travel along $Y_{1}-e_{1}$ from $v_{1}$ to $u_{1}$ . If $s>1$ , then for $i=1,...,s-1$ , introduce an edge from $u_{i}$ to $v_{i+1}$ and follow $Y_{i+1}-e_{i+1}$ to $u_{i+1}$ until $u_{s}$ is reached. Complete the sequential move by introducing an edge from $u_{s}$ to $w$ and then following the remaining edges of $P_{1}$ .

The second cyclical move corrects all items sent to incorrect destinations by the first move. Namely, first follow the edge from $w$ to $v_{s}$ , sending to $v_{s}$ the item $w$ received from $u_{s}$ . Next, for $i=s,...,2$ , follow the edges from $v_{i}$ to $v_{i-1}$ until $v_{1}$ is reached. Complete the move by following the edge from $v_{1}$ to $w$ , sending $w$ the item $v_{1}$ received from $p_{1}$ . This completes the transformation from ${\mathcal{C}}$ to ${\mathcal{C}}^{\prime}$ , so $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq 2$ .

Therefore, in each case, $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq\max\{2,\ t\}$ . $\square$
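The $t=2$ case of the proof can be replayed on a small instance. The following Python sketch is an illustration under assumed data structures (arcs as `(source, target, item)` triples, clusterings as dictionaries), not the authors' code. It builds the two sequential moves from two disjoint paths and a list of disjoint cycles, taking the first arc of each cycle as $e_{i}=(u_{i},v_{i})$. The function names are hypothetical.

```python
def apply_move(clustering, move):
    """Apply all transfers (src, dst, item) of one move simultaneously."""
    updated = dict(clustering)
    for src, dst, item in move:
        if clustering[item] != src:
            raise ValueError(f"{item} is not in cluster {src}")
        updated[item] = dst
    return updated


def is_single_path(move):
    """True if the arcs chain into one directed path with no repeated vertex."""
    vertices = [move[0][0]] + [dst for _, dst, _ in move]
    chained = all(move[i][1] == move[i + 1][0] for i in range(len(move) - 1))
    return chained and len(set(vertices)) == len(vertices)


def two_path_double_move(p1, p2, cycles):
    """Two sequential moves for disjoint paths p1, p2 and disjoint cycles (s >= 1)."""
    (tail1, w, item1), rest1 = p1[0], p1[1:]
    # First move: p_1 -> v_1, around each cycle minus e_i, then along P_2.
    first = [(tail1, cycles[0][0][1], item1)]
    for i, cyc in enumerate(cycles):
        u_i, _v_i, item_i = cyc[0]
        first.extend(cyc[1:])
        target = cycles[i + 1][0][1] if i + 1 < len(cycles) else p2[0][0]
        first.append((u_i, target, item_i))  # item of e_i is misplaced here
    first.extend(p2)
    # Second move: p_2 -> v_s -> ... -> v_1 -> w, then the rest of P_1.
    second, holder = [], p2[0][0]
    for cyc in reversed(cycles):
        _u_i, v_i, item_i = cyc[0]
        second.append((holder, v_i, item_i))
        holder = v_i
    second.append((holder, w, item1))
    second.extend(rest1)
    return first, second


old = {"a": 0, "b": 1, "c": 3, "d": 5, "e": 6}
new = {"a": 1, "b": 2, "c": 4, "d": 6, "e": 5}
p1 = [(0, 1, "a"), (1, 2, "b")]
p2 = [(3, 4, "c")]
cycles = [[(5, 6, "d"), (6, 5, "e")]]
first, second = two_path_double_move(p1, p2, cycles)
print(apply_move(apply_move(old, first), second) == new)  # True
```

In this example the first move follows the path 0, 6, 5, 3, 4 and the second follows 3, 6, 1, 2, so both are sequential moves on directed paths.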

Even when paths and cycles are not completely disjoint, the corresponding moves may still be integrated together into a double-move. In the following lemma, we show that if a path intersects at most one cycle out of a collection of disjoint cycles, only two moves are required to apply all corresponding transfers.

Lemma 2

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings for which $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ consists of $s\geq 1$ disjoint cycles and a single path $P$ , which intersects at most one of the cycles. Then $d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq 2$ .

Proof

Let $Y_{1},...,Y_{s}$ denote the cycles of $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ . If $P$ does not intersect any of the cycles, we can apply Lemma 1, so assume $P$ intersects $Y_{1}$ . We may also assume $s\geq 2$ , else the sequential and cyclical moves corresponding to $P$ and $Y_{1}$ could simply be applied individually. Let $w_{1},...,w_{t}$ denote (in order) the vertices of $P$ , and select an edge $e_{i}:=(u_{i},v_{i})$ from each cycle $Y_{i}$ . Assume $e_{1}$ is chosen such that $v_{1}$ is the first vertex $w_{j}$ of $P$ that also belongs to $Y_{1}$ . (Note that it is possible to have $j=1$ or $j=t$ , but both of those cases are still covered by the following construction.) We will apply two sequential moves to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ . See Figure 6 for a visualization of $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ along with the double-move.

For the first sequential move, start at $w_{1}$ and follow $P$ to $w_{j}=v_{1}$ . Then follow the path formed by joining $Y_{1}-e_{1},...,Y_{s}-e_{s}$ with the introduced edges $(u_{1},v_{2}),...,(u_{s-1},v_{s})$ , terminating the move at $u_{s}$ . Hence, the move reduces the size of the cluster corresponding to $w_{1}$ , as desired, but it also increases the size of the cluster corresponding to $u_{s}$ .

To correct this, apply a second sequential move starting at $u_{s}$ and terminating at $w_{t}$ . First follow the edges $(u_{s},v_{s}),(v_{s},v_{s-1}),...,(v_{2},v_{1})$ to correct items misplaced across cycles. Next, since $v_{1}=w_{j}$ , follow $P$ along the vertices $w_{j},...,w_{t}$ to complete the move. Since $P$ only intersects $Y_{1}$ , no vertices are repeated in this second sequential move and it indeed corresponds to a single directed path in the $CDG$ . All transfers corresponding to the original edges of $CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ have then been correctly applied. $\square$
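The construction in this proof can also be traced in code. The Python sketch below is illustrative only and assumes that the path meets the first cycle in the list and no other, and that there are at least two cycles. It rotates $Y_{1}$ so that its first arc enters the first path vertex lying on it, then assembles the two sequential moves. Arcs are `(source, target, item)` triples and all names are hypothetical.

```python
from collections import Counter


def apply_move(clustering, move):
    """Apply all transfers (src, dst, item) of one move simultaneously."""
    updated = dict(clustering)
    for src, dst, item in move:
        if clustering[item] != src:
            raise ValueError(f"{item} is not in cluster {src}")
        updated[item] = dst
    return updated


def intersecting_double_move(path, cycles):
    """Two sequential moves for a path that meets only cycles[0] (s >= 2)."""
    y1 = cycles[0]
    y1_vertices = {src for src, _, _ in y1}
    path_vertices = [path[0][0]] + [dst for _, dst, _ in path]
    # j indexes the first path vertex on Y_1; that vertex plays the role of v_1.
    j = next(i for i, vtx in enumerate(path_vertices) if vtx in y1_vertices)
    r = next(i for i, arc in enumerate(y1) if arc[1] == path_vertices[j])
    cycles = [y1[r:] + y1[:r]] + list(cycles[1:])  # first arc of Y_1 is now e_1
    # First move: along P to v_1, then through every cycle minus e_i, ending at u_s.
    first = list(path[:j])
    for i, cyc in enumerate(cycles):
        first.extend(cyc[1:])
        if i + 1 < len(cycles):
            first.append((cyc[0][0], cycles[i + 1][0][1], cyc[0][2]))
    # Second move: u_s -> v_s -> ... -> v_1, then the rest of P.
    second = [cycles[-1][0]]
    for i in range(len(cycles) - 2, -1, -1):
        second.append((cycles[i + 1][0][1], cycles[i][0][1], cycles[i][0][2]))
    second.extend(path[j:])
    return first, second


old = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4, "f": 5, "g": 1}
new = {"a": 1, "b": 2, "c": 0, "d": 4, "e": 3, "f": 1, "g": 6}
path = [(5, 1, "f"), (1, 6, "g")]
cycles = [[(0, 1, "a"), (1, 2, "b"), (2, 0, "c")], [(3, 4, "d"), (4, 3, "e")]]
first, second = intersecting_double_move(path, cycles)
middle = apply_move(old, first)
print(apply_move(middle, second) == new)                      # True
print(Counter(old.values())[3], Counter(middle.values())[3])  # 1 2
```

The second printed line shows the temporary growth of the cluster corresponding to $u_{s}$ (cluster 3 in this example) after the first move, which the second move undoes.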

Note that in the double-move used for Lemma 2, the first sequential move temporarily increases the size of the cluster corresponding to $u_{s}$ . For general $k$ -clusterings this is not an issue, but for bounded-size $k$ -clusterings this could potentially violate the upper bound on the size of the cluster. Unfortunately, this increase in cluster size is unavoidable for certain configurations of ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . Consider the example in Figure 7. If only two moves are to be used to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ , $c_{1}$ must correctly send an item to $c_{2}$ in each move. Hence, $c_{1}$ may not temporarily receive an item from $c_{3}$ or $c_{4}$ in the first move, and the cluster size corresponding to either $c_{3}$ or $c_{4}$ must be temporarily increased. Therefore, although $d({\mathcal{C}},{\mathcal{C}}^{\prime})=2$ , the circuit distance between the corresponding vertices in a related bounded-size partition polytope may be 3 if the cluster sizes corresponding to $c_{3}$ and $c_{4}$ are already at their upper bounds. We further address this issue in Section 4.

When a path intersects multiple cycles from a set of disjoint cycles, integrating the corresponding moves as in the double-move from Lemma 2 becomes more challenging since we can no longer guarantee that the second sequential move corresponds to a single directed path in the underlying $CDG$ . However, we can ensure that transfers corresponding to at least the first and last edges of the path are applied in conjunction with the cycles. We show in the upcoming theorem that given any path $P=w_{1}...w_{t}$ and a set $\mathcal{Y}$ of disjoint cycles, a double-move can be used to correctly apply all transfers from $\mathcal{Y}$ while decreasing the cluster size corresponding to $w_{1}$ and increasing the cluster size corresponding to $w_{t}$ . Furthermore, such a double-move will then allow us to completely integrate $P$ with $\mathcal{Y}$ as long as $P$ does not intersect the cycles of $\mathcal{Y}$ more than three times.

Recall that in a clustering-difference graph ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , the outdegree of a vertex is equal to the number of items which must be moved from the corresponding cluster to perform the clustering transformation. Note that in order for the outdegree to be reduced, a correct item must be sent from the cluster to a new destination; however, this destination cluster need not actually be the other endpoint of the corresponding edge in ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . On the other hand, the indegree of a vertex gives the number of items which must be moved to the corresponding cluster. For the indegree to be reduced, a correct item must be received by the vertex, but it does not matter which cluster actually sends the item. We call the minimum of the indegree and outdegree of a vertex the shared degree of that vertex in the $CDG$ . Hence, applying a set of disjoint cyclical moves reduces the shared degree of all covered vertices by one. When integrating a path with these cyclical moves, this reduction in shared degree should still occur in order to make all desired improvements to the underlying $CDG$ . In the following theorem, we provide four different double-moves to accomplish this task. The type of double-move to use depends on the intersection points of the path with the cycles.
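For concreteness, the degree quantities above can be computed directly from two clusterings. This is a minimal sketch, assuming each clustering is given as a dictionary mapping items to cluster indices $0,...,k-1$ and that the $CDG$ has one edge per item whose cluster changes; the names `build_cdg` and `shared_degrees` are hypothetical.

```python
from collections import Counter


def build_cdg(old, new):
    """Return CDG edges (source cluster, target cluster, item).

    old, new: dicts mapping every item to its cluster index in the
    initial and the target clustering (assumed representation).
    """
    return [(old[x], new[x], x) for x in old if old[x] != new[x]]


def shared_degrees(edges, k):
    """Shared degree min(indegree, outdegree) for clusters 0, ..., k-1."""
    outdeg = Counter(u for u, _, _ in edges)
    indeg = Counter(v for _, v, _ in edges)
    return [min(indeg[i], outdeg[i]) for i in range(k)]


old = {"a": 0, "b": 0, "c": 1, "d": 2}
new = {"a": 1, "b": 2, "c": 0, "d": 2}
edges = build_cdg(old, new)
print(shared_degrees(edges, 3))  # prints [1, 1, 0]
```

Cluster 0 sends two items but receives only one, so its shared degree is one; cluster 2 only receives, so its shared degree is zero.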

Theorem 2.1

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings with clustering-difference graph $D:=CDG({\mathcal{C}},{\mathcal{C}}^{\prime})$ , let $\mathcal{Y}=Y_{1},...,Y_{s}$ be a set of disjoint cycles in $D$ , and let $P=w_{1}...w_{t}$ be a path in $D$ that is edge-disjoint from $\mathcal{Y}$ . There exists a double-move which accomplishes all of the following:

1. Correctly applies all transfers from $\mathcal{Y}$ .

2. Reduces the cluster size corresponding to $w_{1}$ through sending a correct item.

3. Increases the cluster size corresponding to $w_{t}$ through receiving a correct item.

4. Decreases the shared degree of each vertex covered by $\mathcal{Y}$ by at least one.

Proof

If $s=1$ or if $P$ is disjoint from $\mathcal{Y}$ , we can apply Lemma 2, so assume $s\geq 2$ . Let $w_{i_{1}}$ denote the first vertex of $P$ covered by $\mathcal{Y}$ , and let $w_{i_{2}},w_{i_{3}}$ denote, respectively, the second-to-last and last vertex of $P$ covered by $\mathcal{Y}$ . Note that we have $w_{i_{1}}=w_{i_{2}}$ when $P$ intersects $\mathcal{Y}$ only twice. Similarly, if $P$ intersects $\mathcal{Y}$ only once, we let $w_{i_{1}}=w_{i_{2}}=w_{i_{3}}$ . We treat four exhaustive cases regarding the distribution of $w_{i_{1}}$ , $w_{i_{2}}$ , and $w_{i_{3}}$ across the cycles of $\mathcal{Y}$ .

Case 1: $w_{i_{1}}$ and $w_{i_{3}}$ belong to different cycles of $\mathcal{Y}$ and $w_{i_{2}}$ belongs to the same cycle as $w_{i_{3}}$ . We apply a cyclical move followed by a sequential move to perform all necessary transfers in $D$ . See Figure 8 for a visualization of the double-move.

Without loss of generality, assume $w_{i_{1}}\in Y_{1}$ and $w_{i_{2}},w_{i_{3}}\in Y_{s}$ . Choose an edge $e_{i}:=(u_{i},v_{i})$ from each cycle $Y_{i}$ . However, choose $e_{1}$ and $e_{s}$ so that $u_{1}=w_{i_{1}}$ and $v_{s}=w_{i_{2}}$ . For the cyclical move, first introduce an edge from $w_{i_{1}}$ to $w_{i_{2}}=v_{s}$ whose item corresponds to the item sent from $w_{i_{1}}$ in $P$ . (Note that we might have $w_{i_{1}}=w_{i_{2}-1}$ , in which case the edge already exists in ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ .) Next, follow $Y_{s}$ from $v_{s}$ to $u_{s}$ . Introduce an edge from $u_{s}$ to $v_{s-1}$ whose item corresponds to that of $e_{s}$ , and then travel along $Y_{s-1}-e_{s-1}$ until $u_{s-1}$ is reached. Repeat for the remaining cycles by introducing edges $(u_{i},v_{i-1})$ and traveling along $Y_{i-1}-e_{i-1}$ until $u_{1}=w_{i_{1}}$ is reached to complete the cyclical move.

Next, start a sequential move at $w_{1}$ , following $P$ to $w_{i_{1}}=u_{1}$ . Use the edge $(u_{1},v_{1})$ . Then follow the edges $(v_{i},v_{i+1})$ for $i=1,...,s-1$ until $v_{s}=w_{i_{2}}$ is reached to correct items misplaced across the cycles. Complete the sequential move by following $P$ from $w_{i_{2}}$ to $w_{i_{3}}$ and then to $w_{t}$ .

We now prove that the desired changes have been made to the underlying $CDG$ . Clearly the cluster size corresponding to $w_{1}$ is decreased and the cluster size corresponding to $w_{t}$ is increased via the second sequential move. Furthermore, each item sent by $w_{1}$ or received by $w_{t}$ is correct. As in the double-move from Figure 3, all edges from $\mathcal{Y}$ are applied through the combination of the two moves. Thus, it suffices to show that the shared degree of each vertex covered by $\mathcal{Y}$ has been reduced by at least one. The only interesting cases are $w_{i_{1}}$ , $w_{i_{2}}$ , and $w_{i_{3}}$ . In the first cyclical move, $w_{i_{1}}$ both receives and sends a correct item, reducing its shared degree by one. In the following sequential move, either $w_{i_{1}}$ only sends a correct item (if $w_{i_{1}}=w_{1}$ ) or $w_{i_{1}}$ both sends and receives a correct item, so the net reduction in shared degree of $w_{i_{1}}$ is at least one. Similarly, $w_{i_{3}}$ sends and receives a correct item in the first move and then either receives a correct item or both receives and sends a correct item in the second move.

Finally, consider $w_{i_{2}}$ . The vertex receives a possibly incorrect item from $w_{i_{1}}$ but also sends a correct item to its neighbor on $Y_{s}$ via the first cyclical move, leaving its shared degree, at worst, unchanged. In the second sequential move, $w_{i_{2}}$ receives a correct item originating from $u_{s}$ and then sends a correct item to the following vertex on $P$ . Thus, the shared degree of $w_{i_{2}}$ is also reduced by at least one, as desired.

Case 2: $w_{i_{1}}$ and $w_{i_{3}}$ belong to different cycles of $\mathcal{Y}$ and $w_{i_{2}}$ does not belong to the same cycle as $w_{i_{3}}$ . ( $w_{i_{2}}$ may or may not belong to the same cycle as $w_{i_{1}}$ , or we may have $w_{i_{1}}=w_{i_{2}}$ .) We apply a sequential move followed by a cyclical move to perform the necessary transfers. See Figure 9 for a visualization of the double-move.

Assume $w_{i_{1}}\in Y_{1}$ and $w_{i_{3}}\in Y_{s}$ . Choose an edge $e_{i}:=(u_{i},v_{i})$ from each $Y_{i}$ such that $v_{1}=w_{i_{1}}$ , $u_{s}=w_{i_{3}}$ , and $v_{i}\neq w_{i_{2}}$ for $i=2,...,s-1$ . First, travel along $P$ from $w_{1}$ to $w_{i_{1}}$ . Then for $i=1,...,s-1$ , travel along $Y_{i}-e_{i}$ and follow an introduced edge $(u_{i},v_{i+1})$ . Finish the sequential move by following $Y_{s}-e_{s}$ from $v_{s}$ to $u_{s}=w_{i_{3}}$ and then following $P$ from $w_{i_{3}}$ to $w_{t}$ .

Start constructing the following cyclical move at $w_{i_{3}}=u_{s}$ using the edge $(u_{s},v_{s})$ . Then for $i=s,...,2$ , follow the edges $(v_{i},v_{i-1})$ until $v_{1}=w_{i_{1}}$ is reached to correct items misplaced across the cycles. Now introduce (if needed) an edge from $w_{i_{1}}$ to $w_{i_{2}}$ whose item corresponds to the item sent from $w_{i_{1}}$ in $P$ . Follow this edge to $w_{i_{2}}$ , and then complete the cyclical move by following $P$ from $w_{i_{2}}$ to $w_{i_{3}}$ . Note that this is indeed a single cyclical move since $w_{i_{2}}\neq v_{i}$ for $i=2,...,s$ .

The first sequential move alters the cluster sizes corresponding to $w_{1}$ and $w_{t}$ through correct transfers as desired, and again the edges of $\mathcal{Y}$ are all applied through the combination of the two moves. It suffices to show that the double-move reduces the shared degree of all vertices covered by $\mathcal{Y}$ . However, this again follows from the argument used in the previous case: although $w_{i_{2}}$ may receive an incorrect item from $w_{i_{1}}$ , it sends away two correct items and its shared degree is reduced by at least one.

Case 3: $w_{i_{1}}$ and $w_{i_{3}}$ belong to the same cycle of $\mathcal{Y}$ but $w_{i_{2}}$ belongs to a different cycle. Assume $w_{i_{1}},w_{i_{3}}\in Y_{1}$ and $w_{i_{2}}\in Y_{s}$ . We apply a cyclical move followed by a sequential move to perform the necessary transfers. See Figure 10 for a visualization of the double-move.

Choose an edge $e_{i}:=(u_{i},v_{i})$ from each cycle $Y_{i}$ such that $v_{1}=w_{i_{3}}$ and $u_{s}=w_{i_{2}}$ . For the cyclical move, first travel along $Y_{1}-e_{1}$ from $v_{1}=w_{i_{3}}$ to $u_{1}$ . Then for $i=1,...,s-1$ , introduce and follow an edge $(u_{i},v_{i+1})$ and follow $Y_{i+1}-e_{i+1}$ to $u_{i+1}$ . Once $u_{s}=w_{i_{2}}$ is reached, follow $P$ from $w_{i_{2}}$ to $w_{i_{3}}$ to complete the cyclical move.

Start the sequential move by following $P$ from $w_{1}$ to $w_{i_{1}}$ . Next, introduce (if needed) an edge from $w_{i_{1}}$ to $w_{i_{2}}=u_{s}$ whose item corresponds to the item sent from $w_{i_{1}}$ in $P$ . Follow this edge and then follow the edge $(u_{s},v_{s})$ . Next, for $i=s,...,2$ , follow the edge $(v_{i},v_{i-1})$ to correct items misplaced among the cycles. Once $v_{1}=w_{i_{3}}$ is reached, finish the sequential move by following $P$ from $w_{i_{3}}$ to $w_{t}$ . All clusters are then changed as desired using the arguments from Case 1.

Case 4: $w_{i_{1}}$ , $w_{i_{2}}$ , and $w_{i_{3}}$ belong to the same cycle in $\mathcal{Y}$ (allowing for either $w_{i_{1}}=w_{i_{2}}$ or $w_{i_{1}}=w_{i_{2}}=w_{i_{3}}$ ). We apply two sequential moves similar to those of Lemma 2 to perform all necessary transfers. See Figure 11 for a visualization of the double-move.

Assume $w_{i_{1}},w_{i_{2}},w_{i_{3}}\in Y_{1}$ . Choose an edge $e_{i}:=(u_{i},v_{i})$ from each cycle $Y_{i}$ , choosing $e_{1}$ so that $v_{1}=w_{i_{1}}$ . For the first sequential move, first follow $P$ from $w_{1}$ to $w_{i_{1}}=v_{1}$ and then travel along $Y_{1}-e_{1}$ to $u_{1}$ . Then for $i=1,...,s-1$ , introduce the edge $(u_{i},v_{i+1})$ and travel along $Y_{i+1}-e_{i+1}$ to $u_{i+1}$ . Terminate once $u_{s}$ is reached.

For the second sequential move, begin at $u_{s}$ via edge $e_{s}$ . Then for $i=s,...,2$ , travel along the edge $(v_{i},v_{i-1})$ to correct items misplaced among the cycles. Once $v_{1}=w_{i_{1}}$ is reached, introduce (if needed) an edge from $w_{i_{1}}$ to $w_{i_{2}}$ whose item corresponds to the item sent from $w_{i_{1}}$ in $P$ . Follow this edge and then finish the move by following $P$ from $w_{i_{2}}$ to $w_{i_{3}}$ and then to $w_{t}$ . All clusters are then changed as desired by again using the arguments from Case 1. $\square$
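The case distinction in the proof of Theorem 2.1 depends only on which cycles contain $w_{i_{1}}$ , $w_{i_{2}}$ , and $w_{i_{3}}$ . The following Python sketch shows one way to determine the applicable case. It assumes that $P$ is given as a list of distinct vertices and each cycle as a list of its vertices; the function name `classify_intersection` is hypothetical, and the situations $s=1$ or $P$ disjoint from $\mathcal{Y}$ , which the proof delegates to the earlier lemmas, are not treated separately beyond returning `None` for a disjoint path.

```python
def classify_intersection(path, cycles):
    """Return the case number (1-4) from the proof of Theorem 2.1.

    path   : vertices w_1, ..., w_t of P in order (assumed distinct)
    cycles : list of vertex lists, one per cycle of Y (assumed disjoint)
    Returns None if P does not meet any cycle.
    """
    cycle_of = {v: idx for idx, cyc in enumerate(cycles) for v in cyc}
    covered = [v for v in path if v in cycle_of]
    if not covered:
        return None

    first, last = covered[0], covered[-1]
    # With exactly two covered vertices the second-to-last one coincides
    # with the first; with a single covered vertex all three coincide.
    second_last = covered[-2] if len(covered) >= 2 else last

    a, b, c = cycle_of[first], cycle_of[second_last], cycle_of[last]
    if a != c:
        return 1 if b == c else 2
    return 3 if b != c else 4


cycles = [[1, 2], [3, 4], [5, 6]]
print(classify_intersection([0, 1, 3, 4, 7], cycles))  # prints 1
print(classify_intersection([0, 1, 3, 5], cycles))     # prints 2
```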

In the proof of Theorem 2.1, we observe an important implication: if a path intersects a set of disjoint cycles at most three times, then all transfers corresponding to the cycles and the path can be correctly applied using one of the double-moves from the theorem.

Theorem 2.2

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings where ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ consists of a set $\mathcal{Y}$ of disjoint cycles and a path $P$ . If $P$ intersects $\mathcal{Y}$ at most three times, then $d({\mathcal{C}},{\mathcal{C}}^{\prime})=2$ .

Proof

Let $P=w_{1}...w_{t}$ . As in the proof of Theorem 2.1, consider $w_{i_{1}}$ , $w_{i_{2}}$ , and $w_{i_{3}}$ : the first, second-to-last, and last vertices of $P$ which are covered by $\mathcal{Y}$ . If these are the only vertices of $P$ covered by $\mathcal{Y}$ , then all edges of $P$ can be applied in any of the four double-moves from Theorem 2.1. To see this, note that since no vertices on $P$ between $w_{i_{1}}$ and $w_{i_{2}}$ are covered by $\mathcal{Y}$ , we can follow all corresponding edges of $P$ when sending an item from $w_{i_{1}}$ to $w_{i_{2}}$ . As in the proof of the theorem, the same holds for all edges between $w_{i_{2}}$ and $w_{i_{3}}$ . Hence, all transfers corresponding to the edges from both $\mathcal{Y}$ and $P$ are correctly applied through the appropriate double-move. $\square$

Of course, since the number of times a path $P$ intersects a set $\mathcal{Y}$ of cycles is at most the number of vertices in $P$ , this implies that a path with at most three vertices can always be integrated with $\mathcal{Y}$ using one of the double-moves from Theorem 2.1.

Corollary 1

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings where ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ consists of a set $\mathcal{Y}$ of disjoint cycles and a path $P$ with at most three vertices. Then $d({\mathcal{C}},{\mathcal{C}}^{\prime})=2$ .

3 Bounds on the Transformation Distance

In this section, we use the double-moves from Section 2 to prove upper bounds on the transformation distance between clusterings based on certain properties of the related $CDG$ .

Given any two $k$ -clusterings ${\mathcal{C}},{\mathcal{C}}^{\prime}$ with clustering-difference graph $D:={CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , our goal is to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ using as few cyclical and sequential moves as possible. Recall the fundamental difference between these two types of moves: cyclical moves transfer items among the clusters while preserving the original cluster sizes; on the other hand, sequential moves transfer items while increasing the size of one cluster and decreasing the size of another. This motivates a decomposition of $D$ into two parts corresponding to these different types of moves.

Definition 3 (Path-Cycle Decomposition)

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings with clustering-difference graph $D:={CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . For $i=1,...,k$ , let $\delta_{i}$ denote $\left|(|C_{i}|-|C^{\prime}_{i}|)\right|$ , the change in the size of cluster $C_{i}$ between ${\mathcal{C}}$ and ${\mathcal{C}}^{\prime}$ . A path-cycle decomposition $(D_{P},D_{Y})$ of $D$ is a decomposition of $D$ into two parts: a set $D_{P}$ containing $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ directed paths and a graph $D_{Y}$ which decomposes into directed cycles.

For any path-cycle decomposition $(D_{P},D_{Y})$ of $D$ , the paths of $D_{P}$ adjust the cluster sizes of ${\mathcal{C}}$ to those of ${\mathcal{C}}^{\prime}$ and the edges of $D_{Y}$ apply any remaining transfers. Such a decomposition can be found easily: greedily construct directed paths in $D$ which begin at excess vertices, those vertices $c_{i}$ satisfying $d^{+}(c_{i})>d^{-}(c_{i})$ , and terminate at deficit vertices, those satisfying $d^{-}(c_{i})>d^{+}(c_{i})$ , and add the paths to $D_{P}$ . Once there do not exist any excess or deficit vertices in $D$ , the remaining edges in the graph form $D_{Y}$ . Alternatively, greedily remove directed cycles from $D$ to build $D_{Y}$ and the leftover edges will decompose into $D_{P}$ . Note that we can store $D_{P}$ either as a set of directed paths or as a graph which decomposes into paths. Nevertheless, the fixed number of paths from excess to deficit vertices in $D_{P}$ gives a lower bound on the transformation distance between the clusterings.
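The first greedy procedure can be sketched in Python as follows. This is an illustrative variant, not a definitive implementation: it assumes edges are stored as triples `(source, target, item)`, and whenever the walk from an excess vertex closes a cycle, that cycle is moved directly into $D_{Y}$ so that each recorded path visits every vertex at most once. The name `path_cycle_decomposition` is hypothetical.

```python
from collections import defaultdict


def path_cycle_decomposition(edges):
    """Split CDG edges into (paths, cycle_edges).

    paths       : list of edge lists, each a directed path from an
                  excess vertex to a deficit vertex
    cycle_edges : the remaining edges, which decompose into cycles
    """
    out = defaultdict(list)   # unused outgoing edges per vertex
    bal = defaultdict(int)    # remaining outdegree minus indegree
    for e in edges:
        out[e[0]].append(e)
        bal[e[0]] += 1
        bal[e[1]] -= 1

    paths, cycle_edges = [], []
    for s in [v for v in list(bal) if bal[v] > 0]:
        while bal[s] > 0:
            path, pos, cur = [], {s: 0}, s
            # Walk until a vertex with remaining deficit is reached.
            while bal[cur] >= 0:
                e = out[cur].pop()
                path.append(e)
                cur = e[1]
                if cur in pos:
                    # The walk closed a cycle: move it to D_Y.
                    cut = pos[cur]
                    for _, v, _ in path[cut:]:
                        if v != cur:
                            del pos[v]
                    cycle_edges.extend(path[cut:])
                    del path[cut:]
                else:
                    pos[cur] = len(path)
            bal[s] -= 1
            bal[cur] += 1
            paths.append(path)

    # All vertices are balanced now; leftover edges belong to D_Y.
    cycle_edges.extend(e for lst in out.values() for e in lst)
    return paths, cycle_edges
```

The number of returned paths equals the total excess, that is, $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ , regardless of the order in which edges are explored.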

Lemma 3

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings of the same data set. Then $d({\mathcal{C}},{\mathcal{C}}^{\prime})\geq\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ , where $\delta_{i}=|(|C_{i}|-|C^{\prime}_{i}|)|$ .

Proof

By definition, $\delta_{i}$ is the change in cluster size needed to transform $C_{i}$ into $C^{\prime}_{i}$ . Note that the sum $\sum_{i=1}^{k}\delta_{i}$ is therefore even. Cyclical moves do not change the size of any clusters while sequential moves change the size of exactly two clusters by one. Hence, at least $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ sequential moves are required in order to change the cluster sizes of ${\mathcal{C}}$ to those of ${\mathcal{C}}^{\prime}$ . $\square$
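As a small numerical illustration, the lower bound of Lemma 3 can be evaluated from the cluster sizes alone. The sketch below assumes both clusterings partition the same data set, so that the two size vectors have equal sums; the function name `size_lower_bound` is hypothetical.

```python
def size_lower_bound(sizes_old, sizes_new):
    """Lower bound (1/2) * sum_i delta_i on the transformation distance.

    sizes_old[i] and sizes_new[i] are |C_i| and |C'_i|.
    """
    if len(sizes_old) != len(sizes_new) or sum(sizes_old) != sum(sizes_new):
        raise ValueError("clusterings must have k clusters over the same data set")
    total_change = sum(abs(a - b) for a, b in zip(sizes_old, sizes_new))
    # total_change is even because the signed size changes sum to zero.
    return total_change // 2


print(size_lower_bound([5, 3, 2], [3, 4, 3]))  # prints 2
```

Here $\delta=(2,1,1)$ , so at least two sequential moves are needed.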

Given a path-cycle decomposition $(D_{P},D_{Y})$ of $D$ , a straightforward approach for transforming ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ is to separately apply the paths of $D_{P}$ followed by the cycles of $D_{Y}$ . However, whereas a fixed number of sequential moves is required to apply all paths of $D_{P}$ , the number of cyclical moves required to apply all transfers in $D_{Y}$ is generally less than its number of cycles. Using the double-move from Figure 3, we can integrate sets of disjoint cycles from $D_{Y}$ to achieve a transformation distance bound which generalizes Corollary 7 in [2]. This serves as a starting point for our discussion on an improved upper bound for $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ . Recall that the shared degree $\eta_{i}$ of a vertex $c_{i}$ in $D$ is the minimum of its indegree and outdegree.

Lemma 4

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings of the same data set. Then

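$$d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq\eta_{i_{1}}+\eta_{i_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i},$$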

where $\delta_{i}=|(|C_{i}|-|C^{\prime}_{i}|)|$ , $\eta_{i}$ is the shared degree of $c_{i}$ in ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , $i_{1}=\arg\max\eta_{i}$ , and $i_{2}=\arg\max_{i\neq i_{1}}\eta_{i}$ .

Proof

Let $(D_{P},D_{Y})$ be any path-cycle decomposition of $D:={CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . Applying the $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ sequential moves corresponding to the paths of $D_{P}$ correctly adjusts all cluster sizes.

Next, we can use the method of Corollary 7 from [2] to apply the cycles in $D_{Y}$ . To do so, note that for $i=1,...,k$ , the shared degree of $c_{i}$ in $D_{Y}$ is at most $\eta_{i}$ . Hence, we may first apply at most $\eta_{i_{1}}-\eta_{i_{2}}$ cyclical moves to reduce the maximum shared degree in $D_{Y}$ to at most $\eta_{i_{2}}$ . Next, using the technique in Corollary 3 from [2], we can solve a maximum flow problem to obtain a set of disjoint cycles in $D_{Y}$ covering all vertices of maximum shared degree. All transfers from this cycle cover can be applied via at most two cyclical moves using the double-move from Figure 3. Repeating until the maximum shared degree of $D_{Y}$ is zero, all transfers from $D_{Y}$ are performed in at most $(\eta_{i_{1}}-\eta_{i_{2}})+2\eta_{i_{2}}=\eta_{i_{1}}+\eta_{i_{2}}$ cyclical moves. Therefore, at most $\eta_{i_{1}}+\eta_{i_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ cyclical and sequential moves are required to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ . $\square$
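The bound of Lemma 4 is easy to evaluate for a given clustering-difference graph. The sketch below is illustrative and assumes edges are stored as triples `(source, target, item)` over clusters $0,...,k-1$ ; it uses the fact that $\delta_{i}$ equals the absolute difference between the outdegree and indegree of $c_{i}$ , since $|C_{i}|-|C^{\prime}_{i}|$ is the number of items leaving $C_{i}$ minus the number arriving. The name `lemma4_upper_bound` is hypothetical.

```python
from collections import Counter


def lemma4_upper_bound(edges, k):
    """Evaluate eta_{i1} + eta_{i2} + (1/2) * sum_i delta_i."""
    outdeg = Counter(u for u, _, _ in edges)
    indeg = Counter(v for _, v, _ in edges)

    # Shared degrees in decreasing order: eta[0] = eta_{i1}, eta[1] = eta_{i2}.
    eta = sorted((min(indeg[i], outdeg[i]) for i in range(k)), reverse=True)
    eta_1 = eta[0] if k >= 1 else 0
    eta_2 = eta[1] if k >= 2 else 0

    # The differences outdeg - indeg sum to zero, so the total is even.
    half_delta = sum(abs(outdeg[i] - indeg[i]) for i in range(k)) // 2
    return eta_1 + eta_2 + half_delta


edges = [(0, 1, "a"), (0, 2, "b"), (1, 0, "c")]
print(lemma4_upper_bound(edges, 3))  # prints 3
```

In this example the shared degrees are $(1,1,0)$ and $\frac{1}{2}\sum_{i}\delta_{i}=1$ , giving the value $1+1+1=3$ .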

This initial upper bound on $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ uses the double-move for integrating disjoint cyclical moves from Figure 3 but does not yet take advantage of any of the double-moves from Section 2 which integrate both cyclical and sequential moves. For instance, when applying the cyclical moves from a disjoint cycle cover $\mathcal{Y}$ of all vertices of maximum shared degree in $D_{Y}$ , we could attempt to integrate a path $P$ from $D_{P}$ . If $P$ is disjoint from $\mathcal{Y}$ , if $P$ intersects at most one cycle of $\mathcal{Y}$ , or if $P$ intersects $\mathcal{Y}$ at most three times, we could use one of the double-moves from Lemma 1, Lemma 2, or Theorem 2.2 to integrate $P$ at no extra cost, reducing the number of remaining sequential moves. However, we cannot guarantee that such a path $P$ exists in $D_{P}$ .

Nevertheless, we can achieve an improved bound on the transformation distance by considering each path in $D_{P}$ as the combination of a (short) sequential move with a cyclical move. To motivate this, suppose ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ consists of a set $\mathcal{Y}$ of disjoint cycles and a path $P=w_{1}...w_{t}$ with $t\geq 4$ , where $w_{1}$ , $w_{t-1}$ , and $w_{t}$ are all covered by $\mathcal{Y}$ . We can apply one of the four double-moves from Theorem 2.1 to perform all transfers corresponding to the edges in $\mathcal{Y}$ while decreasing the excess of $w_{1}$ and the deficit of $w_{t}$ . However, the double-move does not apply any edges of $P$ between $w_{2}$ and $w_{t-1}$ . Additionally, $w_{t-1}$ receives an incorrect item from $w_{1}$ during the double-move which still needs to be sent to $w_{2}$ . Hence, a new edge from $w_{t-1}$ to $w_{2}$ is introduced and the resulting $CDG$ consists of the directed cycle $w_{2}...w_{t-1}w_{2}$ .

In this manner, we can represent the sequential move corresponding to any path $P=w_{1}...w_{t}$ in $D_{P}$ covering $t\geq 4$ vertices as the combination of such a cycle $P_{Y}=w_{2}...w_{t-1}w_{2}$ and a path $P^{\prime}$ with three vertices. As depicted in Figure 12, there are two cases to consider regarding the interior vertex of $P^{\prime}$ and the item sent along the artificially introduced edge $e_{P}:=(w_{t-1},w_{2})$ in $P_{Y}$ . These cases depend on the order in which the corresponding transfers are applied. Let $x_{2}$ denote the item to be sent from $w_{1}$ to $w_{2}$ in $P$ , and let $x_{t}$ denote the item to be sent from $w_{t-1}$ to $w_{t}$ . If $P^{\prime}$ is applied first (as in the previous paragraph), let $P^{\prime}:=w_{1}w_{t-1}w_{t}$ and let $e_{P}$ send $x_{2}$ from $w_{t-1}$ to the correct destination $w_{2}$ . This case is depicted in Figure 12(a). On the other hand, if $e_{P}$ is applied before $P^{\prime}$ , let $e_{P}$ send $x_{t}$ from $w_{t-1}$ to $w_{2}$ and let $P^{\prime}:=w_{1}w_{2}w_{t}$ as depicted in Figure 12(b). Either case has the same effect on the underlying clusters as the original path $P$ .

Therefore, we decompose each path $P$ from $D_{P}$ with more than three vertices in this manner, adding the resulting cycle $P_{Y}$ to $D_{Y}$ and replacing $P$ with $P^{\prime}$ in $D_{P}$ . All paths in $D_{P}$ then have at most three vertices and Corollary 1 implies that we can completely integrate any of these paths with a disjoint cycle cover from $D_{Y}$ in only two moves. Note also that each vertex in a cycle $P_{Y}$ is an interior vertex of the original path $P$ . Hence, even after introducing these additional cycles to $D_{Y}$ , the shared degree of each vertex in $D_{Y}$ remains at most the shared degree of that vertex in the original clustering-difference graph. This allows us to improve upon the distance bound given in Lemma 4.
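The decomposition of a long path into $P^{\prime}$ and $P_{Y}$ can be made concrete with a short Python sketch. It is a reading of the two cases of Figure 12 under stated assumptions: a path is given by its vertex list together with the list of items on its edges, edges are triples `(source, target, item)`, and the second edge of $P^{\prime}$ always carries $x_{t}$ . The names `split_long_path` and `apply_edges` are hypothetical.

```python
def split_long_path(vertices, items, variant="a"):
    """Split P = w_1 ... w_t (t >= 4) into a short path P' and a cycle P_Y.

    items[j] is the item sent along the edge (vertices[j], vertices[j + 1]).
    variant "a": P' is applied first, P' = w_1 w_{t-1} w_t, e_P carries x_2.
    variant "b": e_P is applied first, P' = w_1 w_2 w_t, e_P carries x_t.
    """
    t = len(vertices)
    if t < 4 or len(items) != t - 1:
        raise ValueError("need t >= 4 vertices and t - 1 items")
    w1, w2, w_pen, wt = vertices[0], vertices[1], vertices[-2], vertices[-1]
    x2, xt = items[0], items[-1]

    # Original edges of P between w_2 and w_{t-1}.
    interior = [(vertices[j], vertices[j + 1], items[j]) for j in range(1, t - 2)]
    if variant == "a":
        short = [(w1, w_pen, x2), (w_pen, wt, xt)]
        cycle = interior + [(w_pen, w2, x2)]
    else:
        short = [(w1, w2, x2), (w2, wt, xt)]
        cycle = interior + [(w_pen, w2, xt)]
    return short, cycle


def apply_edges(location, edges):
    """Apply transfers one after another to a dict item -> cluster."""
    updated = dict(location)
    for u, v, x in edges:
        assert updated[x] == u, "item must be in the source cluster"
        updated[x] = v
    return updated


vertices, items = [0, 1, 2, 3, 4], ["p", "q", "r", "s"]
start = {"p": 0, "q": 1, "r": 2, "s": 3}
short_a, cycle_a = split_long_path(vertices, items, "a")
after_a = apply_edges(apply_edges(start, short_a), cycle_a)
short_b, cycle_b = split_long_path(vertices, items, "b")
after_b = apply_edges(apply_edges(start, cycle_b), short_b)
```

Both orders leave every item where the original path would have placed it, which is the equivalence asserted for the two cases.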

The challenge in this approach lies in the fact that the interior vertex of $P^{\prime}$ (either $w_{t-1}$ or $w_{2}$ ) and the label of $e_{P}$ (either $x_{2}$ or $x_{t}$ ) depend on the order in which the corresponding transfers are applied. If a cycle cover $\mathcal{Y}$ does not include $e_{P}$ , then integrating $P^{\prime}$ with $\mathcal{Y}$ is straightforward – simply choose the correct interior vertex for $P^{\prime}$ depending on whether or not $e_{P}$ has been applied yet. Once $P^{\prime}$ is applied, then if $e_{P}$ remains in $D_{Y}$ , change its label from $x_{t}$ to $x_{2}$ .

However, if $e_{P}$ is contained in $\mathcal{Y}$ , then we must make adjustments to the double-moves from Theorem 2.1 in order to take into account the different possible cases for $e_{P}$ and $P^{\prime}$ . Nevertheless, this approach allows us to integrate sequential moves from $D_{P}$ with disjoint cyclical moves from $D_{Y}$ at no extra cost, resulting in the following greatly improved distance bound. The bound depends only on the larger of the second-largest shared degree and the overall change in cluster sizes rather than on the sum of these values as in Lemma 4.

Theorem 3.1

Let ${\mathcal{C}},{\mathcal{C}}^{\prime}$ be $k$ -clusterings of the same data set. Then

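$$d({\mathcal{C}},{\mathcal{C}}^{\prime})\leq\eta_{i_{1}}+\max\left\{\eta_{i_{2}},\frac{1}{2}\sum_{i=1}^{k}\delta_{i}\right\},$$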

where $\delta_{i}=|(|C_{i}|-|C^{\prime}_{i}|)|$ , $\eta_{i}$ is the shared degree of $c_{i}$ in ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , $i_{1}=\arg\max\eta_{i}$ , and $i_{2}=\arg\max_{i\neq i_{1}}\eta_{i}$ .

Proof

Let $(D_{P},D_{Y})$ be any path-cycle decomposition of $D:={CDG(\mathcal{C},\mathcal{C}^{\prime})}$ . For each path $P=w_{1}...w_{t}$ of the $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ paths in $D_{P}$ , if $t\geq 4$ , decompose $P$ into a cycle $P_{Y}$ and a short path $P^{\prime}$ as depicted in the cases of Figure 12. Specifically, let $P^{\prime}:=w_{1}w_{t-1}w_{t}$ where the label of edge $(w_{1},w_{t-1})$ is the item $x_{2}$ to be sent from $w_{1}$ to $w_{2}$ in $P$ , as depicted by the blue edges in Figure 12(a). Replace $P$ with $P^{\prime}$ in $D_{P}$ . In addition, introduce the cycle $P_{Y}=w_{2}...w_{t-1}w_{2}$ to $D_{Y}$ , where the label of the artificial edge $e_{P}:=(w_{t-1},w_{2})$ in $P_{Y}$ is the item $x_{t}$ to be sent from $w_{t-1}$ to $w_{t}$ in $P$ , as depicted by the green edges in Figure 12(b). Note that each vertex in $P_{Y}$ is an interior vertex of the original path $P$ ; hence, in the resulting cycle graph $D_{Y}$ , the shared degree of each vertex $c_{i}$ remains at most $\eta_{i}$ .

As in the proof of Lemma 4, first apply at most $\eta_{i_{1}}-\eta_{i_{2}}$ cyclical moves to reduce the maximum shared degree in $D_{Y}$ to at most $\eta_{i_{2}}$ . Whenever an artificial edge $e_{P}$ is applied in such a move, change the interior vertex of the corresponding path $P^{\prime}$ in $D_{P}$ so that $P^{\prime}=w_{1}w_{2}w_{t}$ as in Figure 12(b).

Now, again as in the proof of Lemma 4, we can reduce the maximum shared degree in $D_{Y}$ by finding a disjoint cycle cover for the vertices of maximum shared degree and applying a double-move. However, in each such double-move, we will also integrate a path from $D_{P}$ .

Let $\mathcal{Y}$ be such a set of disjoint cycles in $D_{Y}$ , which can be found using the technique from [2]. Choose any path from $D_{P}$ . Since each path in $D_{P}$ has at most three vertices, if the selected path is an original path from ${CDG(\mathcal{C},\mathcal{C}^{\prime})}$ , integrating the path with $\mathcal{Y}$ in a double-move is straightforward via Corollary 1. Again, if an artificial edge $e_{P}$ is applied through this double-move and the corresponding path $P^{\prime}$ remains in $D_{P}$ , switch the interior vertex of $P^{\prime}$ from $w_{t-1}$ to $w_{2}$ as in Figure 12(b).

Hence, assume the selected path from $D_{P}$ is an introduced path of the form $P^{\prime}$ with corresponding artificial edge $e_{P}$ . If $e_{P}$ is not contained in the cycle cover $\mathcal{Y}$ , integrating $P^{\prime}$ with $\mathcal{Y}$ is again straightforward via Corollary 1: the interior vertex of $P^{\prime}$ is known and we can simply apply one of the double-moves from Theorem 2.1. After the double-move is applied, make any necessary adjustments to the remaining paths in $D_{P}$ as in the previous paragraph. Additionally, if $e_{P}$ remains in $D_{Y}$ , change its label from $x_{t}$ to $x_{2}$ as in Figure 12(a).

However, if the edge $e_{P}$ corresponding to $P^{\prime}$ is contained in $\mathcal{Y}$ , then we must make modifications to the double-moves of Theorem 2.1 to account for the two different cases for $e_{P}$ and $P^{\prime}$ . Note that if this situation arises, $e_{P}$ has not yet been applied so $P^{\prime}$ has the initial form $P^{\prime}=w_{1}w_{t-1}w_{t}$ . We modify each of the four cases regarding the intersection points of $P^{\prime}$ with $\mathcal{Y}$ from Theorem 2.1 to perform the necessary transfers. Since $e_{P}$ is included in $\mathcal{Y}$ , both $w_{t-1}$ and $w_{2}$ are necessarily covered by $\mathcal{Y}$ , but $w_{1}$ and $w_{t}$ need not be covered. Several of the case modifications depend on whether or not these two vertices are actually covered by the cycles.

Case 1: All three vertices of $P^{\prime}=w_{1}w_{t-1}w_{t}$ are covered by $\mathcal{Y}$ , where $w_{1}$ and $w_{t}$ belong to different cycles of $\mathcal{Y}$ and $w_{t-1}$ (and hence, also $e_{P}$ and $w_{2}$ ) belongs to the same cycle as $w_{t}$ . See the examples in Figure 13 – the artificial edge $e_{P}$ is given by the green edge from $w_{t-1}$ to $w_{2}$ and the other edges of $\mathcal{Y}$ are given in black. Note that we cannot simply apply the double-move from Case 1 in Theorem 2.1 as depicted for this scenario in Figure 13(a) (compare to Figure 8). In the first cyclical move, the edge $e_{P}$ would be applied, sending the item $x_{t}$ from $w_{t-1}$ to $w_{2}$ . Hence, $w_{t-1}$ would then be unable to send $x_{t}$ to $w_{t}$ in the second sequential move.

We can address this by making a slight modification to this first cyclical move: instead of sending $x_{2}$ from $w_{1}$ to $w_{t-1}$ and then applying $e_{P}$ , simply send $x_{2}$ directly from $w_{1}$ to $w_{2}$ . Then $x_{t}$ remains at $w_{t-1}$ , and in the second sequential move, $w_{t-1}$ can send $x_{t}$ to $w_{t}$ as seen in Figure 13(b).

Note that in this modified double-move, the artificial edge $e_{P}$ from $w_{t-1}$ to $w_{2}$ is never actually applied. However, its intended purpose is accomplished: item $x_{2}$ is correctly received by $w_{2}$ from $w_{1}$ , and $x_{t}$ is correctly sent from $w_{t-1}$ to $w_{t}$ . Therefore, after the double-move is applied, we can remove $P^{\prime}$ from $D_{P}$ and $e_{P}$ along with the other edges of $\mathcal{Y}$ from $D_{Y}$ , as desired.

Case 2: The first and last vertices of $P^{\prime}=w_{1}w_{t-1}w_{t}$ which are covered by $\mathcal{Y}$ belong to different cycles, and the second-to-last vertex covered by $\mathcal{Y}$ belongs to a different cycle than the last vertex. There are three double-moves based on the double-move from Case 2 of Theorem 2.1 which can be used depending on whether or not $w_{1}$ , $w_{t}$ , or both $w_{1}$ and $w_{t}$ are covered by $\mathcal{Y}$ . Depictions of these moves are given in Figure 14.

a)

Vertices $w_{1}$ and $w_{t}$ are both covered by $\mathcal{Y}$ . In this situation, for Case 2 to apply, $w_{1}$ and $w_{t}$ must belong to different cycles of $\mathcal{Y}$ and $w_{t-1}$ must not belong to the same cycle as $w_{t}$ . See Figure 14(a). Then when performing the double-move from Case 2 of Theorem 2.1 as depicted in Figure 9, the edge $e_{P}$ is applied in the first sequential move before any of the edges from $P^{\prime}$ , sending $x_{t}$ from $w_{t-1}$ to $w_{2}$ . Hence, if we switch the interior vertex of $P^{\prime}$ from $w_{t-1}$ to $w_{2}$ , we can apply this original double-move without any further modifications, as depicted in Figure 14(a).

b)

Vertex $w_{1}$ is not covered by $\mathcal{Y}$ . Then for Case 2 to apply, $w_{t-1}$ and $w_{t}$ must belong to different cycles of $\mathcal{Y}$ . See Figure 14(b). As in Case 1, we cannot apply the original double-move since then $w_{t-1}$ would be unable to send $x_{t}$ to $w_{t}$ in the second move. We can address this in the same way as in the modified double-move from Case 1: send $x_{2}$ directly from $w_{1}$ to $w_{2}$ in the first sequential move and then send $x_{t}$ directly from $w_{t-1}$ to $w_{t}$ in the second cyclical move, as depicted in Figure 14(b). Although $e_{P}$ is never actually applied, all desired transfers are accomplished as in Case 1.

c)

Vertex $w_{t}$ is not covered by $\mathcal{Y}$ . Then for Case 2 to apply, $w_{1}$ and $w_{t-1}$ must belong to different cycles of $\mathcal{Y}$ . See Figure 14(c). We make a similar modification to that of the previous case: correctly send $x_{t}$ directly from $w_{t-1}$ to $w_{t}$ in the first sequential move and then correctly send $x_{2}$ directly from $w_{1}$ to $w_{2}$ in the second cyclical move, as depicted in Figure 14(c).

Case 3: The first and last vertices of $P^{\prime}=w_{1}w_{t-1}w_{t}$ which are covered by $\mathcal{Y}$ belong to the same cycle in $\mathcal{Y}$ , while the second-to-last vertex belongs to a different cycle. The only scenario in which this case applies is when $w_{1}$ and $w_{t}$ belong to the same cycle of $\mathcal{Y}$ and $w_{t-1}$ belongs to a different cycle. We make a modification similar to the third double-move from the previous case. In a first cyclical move send $x_{t}$ directly from $w_{t-1}$ to $w_{t}$ , and in a second sequential move send $x_{2}$ directly from $w_{1}$ to $w_{2}$ . See Figure 15.

Case 4: All vertices of $P^{\prime}=w_{1}w_{t-1}w_{t}$ which are covered by $\mathcal{Y}$ belong to the same cycle. There are two double-moves, depicted in Figure 16, which can be used depending on whether or not $w_{1}$ is covered by $\mathcal{Y}$ .

a)

Vertex $w_{1}$ is covered by $\mathcal{Y}$ . As in the first double-move for Case 2, in the original double-move for Case 4 of Theorem 2.1 the edge $e_{P}$ is applied before any of the edges from $P^{\prime}$ . Hence, if we switch the interior vertex of $P^{\prime}$ from $w_{t-1}$ to $w_{2}$ , we can apply the double-move without any further modifications, as depicted in Figure 16(a). Note that $w_{t}$ may or may not be covered by the cycle containing $w_{1}$ and $e_{P}$ .

b)

Vertex $w_{1}$ is not covered by $\mathcal{Y}$ . We make a modification similar to that of the second double-move for Case 2. In a first sequential move, directly send $x_{2}$ from $w_{1}$ to $w_{2}$ , and in a second sequential move, directly send $x_{t}$ from $w_{t-1}$ to $w_{t}$ . See Figure 16(b). Note again that $w_{t}$ may or may not be covered by the cycle containing $e_{P}$ .

In each case, we are able to integrate $P^{\prime}$ with $\mathcal{Y}$ and apply all necessary transfers in only two moves. Therefore, at most $\eta_{i_{2}}$ double-moves are needed to reduce the shared degree of $D_{Y}$ to zero, and through each of these double-moves, we remove one of the $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ paths from $D_{P}$ . Afterwards, we may simply apply the remaining paths in $D_{P}$ , if any, individually. The total number of moves used to transform ${\mathcal{C}}$ into ${\mathcal{C}}^{\prime}$ is thus at most

[TABLE]

$\square$

4 Circuit Diameter of Partition Polytopes

A fundamental open question in linear programming is whether or not there exists a polynomial pivot rule for the simplex method. The existence of such a pivot rule would require that the polynomial Hirsch conjecture [19] holds; i.e., that the combinatorial diameter of a polyhedron can be polynomially bounded. A recent effort to better understand the combinatorial diameter of polyhedra has been the study of the related circuit diameter [6, 8, 9, 18]. Whereas the original Hirsch conjecture is false in general [20, 25], the related Circuit Diameter Conjecture [6] remains open.

Recall that the circuits of the bounded-size partition polytope $BPP$ correspond to cyclical and sequential moves of items among clusters. Therefore, as long as no cluster size constraints are violated during a clustering transformation, any resulting bounds on the transformation distance between clusterings have implications for the circuit distance between vertices in $BPP$ . As a 0/1-polytope, the combinatorial diameter (and hence, also the circuit diameter) of $BPP$ satisfies the Hirsch conjecture [23] – specifically, the combinatorial diameter is at most the number of items $n$ . In this section, we will use the results from Section 3 to achieve much better upper bounds on the circuit diameter.
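The correspondence between moves and size constraints can be made concrete with a minimal Python sketch. It treats a move as a set of simultaneous single-item transfers and checks whether the resulting clustering respects the bounds $\kappa^{-}_{i}\leq|C_{i}|\leq\kappa^{+}_{i}$ . The data representation (a list `labels` mapping each item index to a cluster index) and the names `apply_move` and `sizes_within_bounds` are assumptions made for illustration; they do not appear in the paper.

```python
from collections import Counter


def apply_move(labels, transfers):
    """Apply simultaneous single-item transfers.

    labels[x] is the cluster index of item x.
    transfers is a list of (item, source, target) triples; in a cyclical move
    the transfers follow a cycle of clusters, in a sequential move a path.
    """
    new_labels = list(labels)
    for item, source, target in transfers:
        if labels[item] != source:
            raise ValueError(f"item {item} is not in cluster {source}")
        new_labels[item] = target
    return new_labels


def sizes_within_bounds(labels, upper, lower):
    """Check lower[i] <= |C_i| <= upper[i] for every cluster i."""
    counts = Counter(labels)
    return all(lower[i] <= counts[i] <= upper[i] for i in range(len(upper)))


labels = [0, 0, 1, 2]  # C_0 = {0, 1}, C_1 = {2}, C_2 = {3}

# Cyclical move along c_0 -> c_1 -> c_2 -> c_0: all cluster sizes are unchanged.
cyclical = apply_move(labels, [(0, 0, 1), (2, 1, 2), (3, 2, 0)])

# Sequential move along c_0 -> c_1 -> c_2: only the endpoint sizes change.
sequential = apply_move(labels, [(0, 0, 1), (2, 1, 2)])
```

In this toy instance the cyclical move is feasible for any bounds that the starting clustering satisfies, whereas the sequential move is feasible only if $c_{0}$ may shrink by one item and $c_{2}$ may grow by one.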

For the fixed-size partition polytope, Proposition 1 can be used to show that the combinatorial diameter is at most $\kappa_{1}+\kappa_{2}$ , where $\kappa_{1},\kappa_{2}$ are the two largest fixed cluster sizes [2]. We begin by generalizing this bound to the circuit diameter of the bounded-size partition polytope by also taking into account the largest possible change in cluster sizes. Although we do not yet utilize any double-moves which integrate sequential and cyclical moves (see the upcoming Theorem 4.1), the bound of the following lemma is already better than the naive bound achieved by simply counting the sequential and cyclical moves separately – we can relate the shared degree of a vertex in a $CDG$ to the change in size of the corresponding cluster.

Lemma 5

For a bounded-size partition polytope $BPP(\kappa^{+},\kappa^{-})$ , assume the corresponding clusters are indexed so that $\kappa_{1}^{+}\geq\cdots\geq\kappa_{k}^{+}$ and let $i_{1},i_{2}$ denote the two indices minimizing $\kappa_{i}^{+}-\kappa_{i}^{-}$ . Then the circuit diameter of $BPP(\kappa^{+},\kappa^{-})$ is at most

$$\kappa_{1}^{+}+\kappa_{2}^{+}+\frac{1}{2}\sum_{i\neq i_{1},i_{2}}(\kappa_{i}^{+}-\kappa_{i}^{-}).$$

Proof

Let $\mathcal{C},\mathcal{C}^{\prime}$ be $k$ -clusterings corresponding to vertices $y,y^{\prime}$ of ${BPP(\kappa^{+},\kappa^{-})}$ . We can transform $\mathcal{C}$ into $\mathcal{C}^{\prime}$ by separately applying sequential moves followed by cyclical double-moves in the manner of Lemma 4. All intermediate clusterings in this transformation satisfy the cluster size bounds of ${BPP(\kappa^{+},\kappa^{-})}$ , so the process indeed corresponds to a circuit walk from $y$ to $y^{\prime}$ in ${BPP(\kappa^{+},\kappa^{-})}$ .

Let $\eta_{i}$ denote the shared degree of vertex $c_{i}$ in $CDG(\mathcal{C},\mathcal{C}^{\prime})$ , and let $\delta_{i}:=\bigl||C_{i}|-|C^{\prime}_{i}|\bigr|$ . Lemma 4 then implies that the circuit distance from $y$ to $y^{\prime}$ in ${BPP(\kappa^{+},\kappa^{-})}$ is at most

$$\eta_{j_{1}}+\eta_{j_{2}}+\frac{1}{2}\sum_{i=1}^{k}\delta_{i},\tag{1}$$

where $j_{1},j_{2}$ maximize $\eta_{i}$ over all $i=1,...,k$ . Trivially, for $i=1,...,k$ , it holds that $\eta_{i}\leq\kappa_{i}^{+}$ and $\delta_{i}\leq\kappa_{i}^{+}-\kappa_{i}^{-}$ . Hence, we obtain the following upper bound on the circuit diameter of ${BPP(\kappa^{+},\kappa^{-})}$ as a natural implication of Lemma 4:

$$\kappa_{1}^{+}+\kappa_{2}^{+}+\frac{1}{2}\sum_{i=1}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-}).$$

Note however that this bound can be immediately improved. For $i=1,...,k$ , we must have $\eta_{i}+\delta_{i}\leq\kappa^{+}_{i}$ since $\eta_{i}+\delta_{i}$ is equal to the maximum of the indegree and outdegree of $c_{i}$ . Rearranging this inequality yields $\eta_{i}+\frac{1}{2}\delta_{i}\leq\kappa_{i}^{+}-\frac{1}{2}\delta_{i}$ . Substituting into (1), we obtain the following upper bound on the circuit distance from $y$ to $y^{\prime}$ :

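$$\sum_{i=j_{1},j_{2}}\left(\kappa^{+}_{i}-\frac{1}{2}\delta_{i}\right)+\frac{1}{2}\sum_{i\neq j_{1},j_{2}}\delta_{i}.$$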

Note that $\sum_{i=j_{1},j_{2}}\left(\kappa^{+}_{i}-\frac{1}{2}\delta_{i}\right)\leq\kappa^{+}_{j_{1}}+\kappa^{+}_{j_{2}}\leq\kappa^{+}_{1}+\kappa^{+}_{2}$ . Similarly, it holds that $\frac{1}{2}\sum_{i\neq j_{1},j_{2}}^{k}\delta_{i}\leq\frac{1}{2}\sum_{i\neq j_{1},j_{2}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-})\leq\frac{1}{2}\sum_{i\neq i_{1},i_{2}}^{k}(\kappa_{i}^{+}-\kappa_{i}^{-})$ . Thus, we obtain the stated bound. $\square$
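The quantities used in this proof can be computed directly from two labelings. The following Python sketch rests on several assumptions that are inferred from the text rather than stated in it: the clustering-difference graph has one edge $c_{i}\to c_{j}$ for each item that moves from $C_{i}$ to $C^{\prime}_{j}$ ; the shared degree $\eta_{i}$ is the minimum of the indegree and outdegree of $c_{i}$ (so that $\eta_{i}+\delta_{i}$ is their maximum, as used above); and the bounds have the forms $\eta_{j_{1}}+\eta_{j_{2}}+\frac{1}{2}\sum_{i}\delta_{i}$ and $\kappa^{+}_{1}+\kappa^{+}_{2}+\frac{1}{2}\sum_{i\neq i_{1},i_{2}}(\kappa^{+}_{i}-\kappa^{-}_{i})$ , as suggested by the final chain of inequalities. All function names are hypothetical.

```python
from collections import Counter


def cdg_degrees(labels_a, labels_b, k):
    """Return (eta, delta) for each of the k clusters.

    labels_a[x] and labels_b[x] are the cluster indices of item x in the
    two clusterings. Assumed: eta_i = min(indegree, outdegree) of c_i.
    """
    out_deg, in_deg = Counter(), Counter()
    for a, b in zip(labels_a, labels_b):
        if a != b:  # item moves from cluster a to cluster b
            out_deg[a] += 1
            in_deg[b] += 1
    eta = [min(in_deg[i], out_deg[i]) for i in range(k)]
    # |C_i| - |C'_i| equals outdegree minus indegree, so delta_i is its absolute value.
    delta = [abs(out_deg[i] - in_deg[i]) for i in range(k)]
    return eta, delta


def distance_bound(eta, delta):
    """Two largest shared degrees plus half the total size change."""
    return sum(sorted(eta, reverse=True)[:2]) + sum(delta) / 2


def lemma5_style_bound(upper, lower):
    """Two largest upper bounds plus half the slack, omitting the two smallest slacks."""
    slack = sorted(u - l for u, l in zip(upper, lower))
    return sum(sorted(upper, reverse=True)[:2]) + sum(slack[2:]) / 2


labels_a = [0, 0, 0, 1, 1, 2]
labels_b = [1, 0, 2, 0, 1, 1]
eta, delta = cdg_degrees(labels_a, labels_b, k=3)
```

For this small instance with upper bounds $(3,3,2)$ and lower bounds $(1,1,1)$ , the instance-specific value is well below the polytope-wide value, which is expected since the latter must hold for every pair of vertices.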

As in Theorem 3.1, we can significantly improve upon this diameter bound by using the double-moves from Theorem 2.1 to integrate sequential moves with sets of disjoint cyclical moves. Note that we must take care when applying these double-moves to bounded-size clusterings – certain moves require the existence of a vertex whose cluster size can be temporarily increased as demonstrated in Figure 7. Nevertheless, as long as there is at least some slack in the constraints for all but at most one cluster, we can ensure the existence of such a vertex through a simple pre-processing of the clusters. Hence, we obtain the following improved diameter bound as an implication of the transformation distance bound from Theorem 3.1, which depends on the maximum of the second-largest cluster size and the largest possible change in cluster sizes. We note that the cluster size slackness assumptions made in this theorem are quite natural in any application involving bounded-size clusterings.

Theorem 4.1

For a bounded-size partition polytope $BPP(\kappa^{+},\kappa^{-})$ , assume the corresponding clusters are indexed so that $\kappa_{1}^{+}\geq\cdots\geq\kappa_{k}^{+}$ and let $i_{1}$ denote the index minimizing $\kappa_{i}^{+}-\kappa_{i}^{-}$ . If $\sum_{i=1}^{k}\kappa_{i}^{+}>n+k-2$ and if $\kappa_{i}^{+}>\kappa_{i}^{-}$ for $i\neq i_{1}$ , the circuit diameter of ${BPP(\kappa^{+},\kappa^{-})}$ is at most

[TABLE]

Proof

Let $\mathcal{C},\mathcal{C}^{\prime}$ be $k$ -clusterings corresponding to vertices $y,y^{\prime}$ of ${BPP(\kappa^{+},\kappa^{-})}$ . We can transform $\mathcal{C}$ into $\mathcal{C}^{\prime}$ in the manner of Theorem 3.1. However, in order for all intermediate clusterings to satisfy the bounds of ${BPP(\kappa^{+},\kappa^{-})}$ , we must make sure that when applying any version of the double-move from Case 4 of Theorem 2.1, there exists a suitable choice for $u_{s}$ whose corresponding cluster size is strictly less than its upper bound and can be temporarily increased.

To ensure that this is always the case, we pre-process $\mathcal{C}$ and $\mathcal{C}^{\prime}$ in the following manner. If there exists more than one cluster $C_{i}$ (or $C^{\prime}_{i}$ in the case of $\mathcal{C}^{\prime}$ ) such that $|C_{i}|=\kappa^{+}_{i}$ , then choose such an index $j$ with $|C_{j}|=\kappa^{+}_{j}>\kappa^{-}_{j}$ , which is possible since at most one index $i$ satisfies $\kappa^{+}_{i}=\kappa^{-}_{i}$ . Transfer any item from $C_{j}$ to a different cluster $C_{\ell}$ which satisfies $|C_{\ell}|<\kappa^{+}_{\ell}-1$ . Such an index $\ell$ must exist, else we would have

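$$\sum_{i=1}^{k}\kappa_{i}^{+}\leq\sum_{i=1}^{k}|C_{i}|+(k-2)=n+k-2,$$

which contradicts the assumption $\sum_{i=1}^{k}\kappa_{i}^{+}>n+k-2$ .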

Repeat this process at most $k-2$ times until the sizes of all clusters but at most one are strictly less than their upper bounds. In the resulting $k$ -clustering, it is then always possible to choose $u_{s}$ in Case 4 of Theorem 2.1 such that the corresponding cluster size can be temporarily increased when performing the double-move: any cycle in a clustering-difference graph covers at least two vertices, and at least one of these vertices must have a corresponding cluster size less than its upper bound. Additionally, the choice of the vertex $u_{s}$ is not affected by the modifications in Case 4 of Theorem 3.1.
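The pre-processing step can be sketched in a few lines of Python. The sketch tracks only cluster sizes, since any item of the chosen cluster may be transferred. The function name `preprocess_sizes` and the choice of the first eligible indices are assumptions for illustration; the argument above permits any eligible choice.

```python
def preprocess_sizes(sizes, upper, lower):
    """Move single items until at most one cluster is at its upper bound.

    Returns the new sizes and the list of (source, target) transfers.
    Assumes the hypotheses of Theorem 4.1 hold for the given bounds.
    """
    sizes = list(sizes)
    k = len(sizes)
    transfers = []
    while True:
        saturated = [i for i in range(k) if sizes[i] == upper[i]]
        if len(saturated) <= 1:
            return sizes, transfers
        # Pick a saturated cluster that is allowed to shrink by one item.
        j = next((i for i in saturated if upper[i] > lower[i]), None)
        if j is None:
            raise ValueError("more than one cluster has equal lower and upper bound")
        # The receiver must stay strictly below its upper bound after the transfer.
        targets = [l for l in range(k) if l != j and sizes[l] < upper[l] - 1]
        if not targets:
            raise ValueError("no receiving cluster; the slack assumption is violated")
        sizes[j] -= 1
        sizes[targets[0]] += 1
        transfers.append((j, targets[0]))


new_sizes, moves = preprocess_sizes([2, 2, 2, 0], upper=[2, 2, 2, 4], lower=[1, 1, 1, 0])
```

In this example $n=6$ , $k=4$ and $\sum_{i}\kappa^{+}_{i}=10>n+k-2=8$ , and exactly $k-2=2$ transfers are used. Each transfer reduces the number of saturated clusters by one, because the source drops below its upper bound and the receiver does not reach its own.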

Pre-processing both $\mathcal{C}$ and ${\mathcal{C}}^{\prime}$ in this manner requires at most $2(k-2)$ single item transfers. Once these transfers have been applied, we may perform a circuit walk between the resulting $k$ -clusterings by applying the moves of Theorem 3.1 to their clustering-difference graph. Such a circuit walk has length at most

[TABLE]

where $j_{1},j_{2}$ denote the two indices maximizing the shared degree $\eta_{i}$ .

As in the proof of Lemma 5, since $\eta_{j_{1}}+\frac{1}{2}\delta_{j_{1}}\leq\kappa_{j_{1}}^{+}-\frac{1}{2}\delta_{j_{1}}$ , this bound is at most

[TABLE]

Taking into account the at most $2(k-2)$ circuit steps needed to adjust the cluster sizes, we obtain the stated improved bound. $\square$

5 Conclusions and Future Directions

In this work, we provide methods based on linear programming and network theory for transforming $k$ -clusterings using sequences of cyclical and sequential moves of items among clusters. This leads to upper bounds on the transformation distance between two general $k$ -clusterings as well as the circuit diameter of the bounded-size partition polytope. There are several natural directions for future research in this area.

We prove in Theorem 4.1 an upper bound on the circuit diameter of the bounded-size partition polytope using the transformation distance bound from Theorem 3.1 and modified double-moves from Theorem 2.1 which integrate sequential moves of items with cyclical moves. A subsequent research question is whether or not we can also bound the combinatorial diameter of the polytope in such a manner. The edges of $BPP$ have a more technical characterization than its circuits – only certain cyclical and sequential moves actually correspond to edges between vertices [11]. However, through a careful ordering of cyclical and sequential moves and double-moves, we believe new bounds on the combinatorial distance between vertices in the polytope could be achievable.

Additionally, in Theorem 3.1, we use an arbitrary path-cycle decomposition $(D_{P},D_{Y})$ of the clustering-difference graph $D:={CDG(\mathcal{C},\mathcal{C}^{\prime})}$ to bound the transformation distance between the clusterings. It is possible to instead construct a decomposition exhibiting potentially useful properties. For instance, solving a minimum-cost circulation problem over $D$ yields a decomposition in which $D_{Y}$ has a maximum number of edges. Modifying this circulation problem can yield a decomposition in which the maximum shared degree in $D_{P}$ is minimized. Through further analysis, these extremal choices for the path-cycle decomposition might lead to better upper bounds on the transformation distance.
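As an illustration of what an arbitrary path-cycle decomposition might look like in practice, the following Python sketch splits a directed multigraph greedily into edge-disjoint simple paths and simple cycles. It is not the minimum-cost circulation approach mentioned above and makes no claim of extremality. It assumes that the clustering-difference graph has one edge $c_{i}\to c_{j}$ per item moving from $C_{i}$ to $C^{\prime}_{j}$ ; the function name and the list-of-pairs edge representation are hypothetical.

```python
from collections import defaultdict


def path_cycle_decomposition(edges):
    """Greedily decompose a directed multigraph into simple paths and cycles.

    edges is a list of (tail, head) pairs without self-loops. Paths and cycles
    are returned as vertex lists; a cycle does not repeat its first vertex.
    """
    out_edges = defaultdict(list)
    excess = defaultdict(int)  # outdegree minus indegree
    for u, v in edges:
        out_edges[u].append(v)
        excess[u] += 1
        excess[v] -= 1

    paths, cycles = [], []

    def walk(start):
        # Follow unused edges; whenever the walk closes on itself, split off a cycle.
        stack, on_stack = [start], {start}
        while out_edges[stack[-1]]:
            v = out_edges[stack[-1]].pop()
            if v in on_stack:
                idx = stack.index(v)
                cycles.append(stack[idx:])
                on_stack.difference_update(stack[idx + 1:])
                del stack[idx + 1:]
            else:
                stack.append(v)
                on_stack.add(v)
        return stack

    # One path per unit of positive excess; each ends at a vertex with negative excess.
    for s in list(excess):
        for _ in range(max(excess[s], 0)):
            paths.append(walk(s))
    # The remaining graph is balanced, so every further walk splits into cycles only.
    for s in list(out_edges):
        while out_edges[s]:
            walk(s)
    return paths, cycles


labels_a = [0, 0, 0, 1, 1, 2]
labels_b = [1, 0, 2, 0, 1, 1]
cdg_edges = [(a, b) for a, b in zip(labels_a, labels_b) if a != b]
paths, cycles = path_cycle_decomposition(cdg_edges)
```

The number of paths produced equals the total positive excess, which under the assumptions above is $\frac{1}{2}\sum_{i=1}^{k}\delta_{i}$ . Which edges end up in cycles depends on the traversal order, and this is exactly the freedom that an optimization-based decomposition would exploit.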

Finally, we note that the transformation distance $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ is formally a metric. Hence, if we are able to compute $d({\mathcal{C}},{\mathcal{C}}^{\prime})$ , we can interpret it as a measure of the distance between given $k$ -clusterings of the same data set. There is significant interest in comparing clusterings in the literature [14, 22]. However, most existing measures do not take into account the potential labels of the clusters and are instead based on pairwise relationships among the items. Our new metric takes a fundamentally different approach to measuring the difference between clusterings, motivating a comparative study.

Bibliography

[1] M. Balinski and A. Russakoff. On the assignment polytope. SIAM Review, 16(4):516–525, 1974.
[2] S. Borgwardt. On the diameter of partition polytopes and vertex-disjoint cycle cover. Mathematical Programming, Ser. A, 141(1):1–20, 2013.
[3] S. Borgwardt, A. Brieden, and P. Gritzmann. Mathematics in agriculture and forestry: Geometric clustering for land consolidation. IFORMS news, Dec. issue, 2013.
[4] S. Borgwardt, A. Brieden, and P. Gritzmann. Geometric clustering for the consolidation of farmland and woodland. The Mathematical Intelligencer, 36(2):37–44, 2014.
[5] S. Borgwardt, A. Brieden, and P. Gritzmann. Geometrisches Clustering: Mathematik für die Flurverbesserung (Geometric clustering: Mathematics for land improvement). Mitteilungen der DMV, 23:82–90, 2015.
[6] S. Borgwardt, E. Finhold, and R. Hemmecke. On the circuit diameter of dual transportation polyhedra. SIAM Journal on Discrete Mathematics, 29(1):113–121, 2016.
[7] S. Borgwardt and F. Happach. Good Clusterings Have Large Volume. Operations Research, 67(1):215–231, 2019.
[8] S. Borgwardt, J. A. De Loera, and E. Finhold. Edges vs circuits: a hierarchy of diameters in polyhedra. Advances in Geometry, 16(4):511–530, 2016.