Cluster Deletion on Interval Graphs and Split Related Graphs
Athanasios L. Konstantinidis, Charis Papadopoulos

TL;DR
This paper proves that the Cluster Deletion problem can be solved in polynomial time on interval graphs, resolving a long-standing open problem, and explores its complexity on related graph classes.
Contribution
It provides the first polynomial-time algorithm for Cluster Deletion on interval graphs and analyzes its complexity on subclasses of split graphs.
Findings
Polynomial-time algorithm for Cluster Deletion on interval graphs.
NP-completeness of Cluster Deletion on certain split graph generalizations.
Two polynomial-time algorithms for subclasses of these generalizations.
Figure 1
Figure 2
Figure 3
Athanasios L. Konstantinidis Department of Mathematics, University of Ioannina, Greece. This research has been financially supported by the State Scholarships Foundation (IKY).
Charis Papadopoulos Department of Mathematics, University of Ioannina, Greece.
Abstract
In the Cluster Deletion problem the goal is to remove the minimum number of edges of a given graph, such that every connected component of the resulting graph constitutes a clique. It is known that the decision version of Cluster Deletion is NP-complete on (-free) chordal graphs, whereas Cluster Deletion is solved in polynomial time on split graphs. However, the existence of a polynomial-time algorithm for Cluster Deletion on interval graphs, a proper subclass of chordal graphs, remained a well-known open problem. Our main contribution is that we settle this problem in the affirmative, by providing a polynomial-time algorithm for Cluster Deletion on interval graphs. Moreover, despite the simple formulation of the algorithm on split graphs, we show that Cluster Deletion remains NP-complete on a natural and slight generalization of split graphs that constitutes a proper subclass of -free chordal graphs. To complement our results, we provide two polynomial-time algorithms for Cluster Deletion on subclasses of such generalizations of split graphs.
1 Introduction
In graph-theoretic terms, clustering is the task of partitioning the vertices of a graph into subsets, called clusters, in such a way that there are many edges within each cluster and relatively few edges between clusters. In many applications, the clusters are restricted to induce cliques, as the represented data of each edge corresponds to a similarity value between two objects [17, 18]. Under the term cluster graph, which refers to a disjoint union of cliques, one may find a variety of applications that have been extensively studied [1, 6, 22]. Here we consider the Cluster Deletion problem, which asks for a minimum number of edge deletions from an input graph so that the resulting graph is a disjoint union of cliques. In the decision version of the problem, we are also given an integer $k$ and we want to decide whether at most $k$ edge deletions are enough to produce a cluster graph.
Although Cluster Deletion is NP-hard on general graphs [23], settling its complexity on restricted graph classes has attracted the attention of several researchers. Regarding the maximum degree of a graph, Komusiewicz and Uhlmann [21] have shown an interesting complexity dichotomy: Cluster Deletion remains NP-hard on -free graphs with maximum degree four, whereas it can be solved in polynomial time on graphs having maximum degree at most three. Quite recently, Golovach et al. have shown that it remains NP-hard on planar graphs [13]. For graph classes characterized by forbidden induced subgraphs, Gao et al. [11] showed that Cluster Deletion is NP-hard on -free graphs and on -free graphs. Regarding $H$-free graphs, Grüttemeier et al. [15] showed a complexity dichotomy for every graph $H$ on at most four vertices. In particular, for any graph $H$ on four vertices such that , Cluster Deletion is NP-hard on $H$-free graphs, whereas it can be solved in polynomial time on - or paw-free graphs [15]. Interestingly, Cluster Deletion remains NP-hard on -free chordal graphs [3].
On the positive side, Cluster Deletion has been shown to be solvable in polynomial time on cographs [11], proper interval graphs [3], split graphs [3], and -reducible graphs [2]. More precisely, iteratively picking maximum cliques defines a clustering of the graph which gives an optimal solution on cographs (i.e., $P_4$-free graphs), as shown by Gao et al. [11]. In fact, the greedy approach of selecting a maximum clique provides a -approximation algorithm, though not necessarily in polynomial time [8]. As the problem is already NP-hard on chordal graphs [3], it is natural to consider subclasses of chordal graphs such as interval graphs and split graphs. Although split graphs admit a simple polynomial-time algorithm, for interval graphs the complexity had been determined only on the proper subclass of proper interval graphs, through a polynomial-time algorithm [3]. Settling the complexity of Cluster Deletion on interval graphs was left open [3, 2, 11].
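The greedy strategy mentioned above (repeatedly take a maximum clique of the remaining graph as the next cluster) can be sketched as follows. This is a toy illustration under stated assumptions: the graph is a dict mapping each vertex to its set of neighbours, the helper names are hypothetical, and the maximum clique is found by exhaustive search, so the running time is exponential. The optimality guarantee of [11] concerns cographs only; outside that class this sketch makes no optimality claim.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """Return True if every pair of `vertices` is adjacent in `adj`."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def maximum_clique(adj, candidates):
    """Brute-force maximum clique among `candidates` (exponential; tiny graphs only)."""
    candidates = sorted(candidates)
    for size in range(len(candidates), 0, -1):
        for subset in combinations(candidates, size):
            if is_clique(adj, subset):
                return set(subset)
    return set()


def greedy_max_clique_clustering(adj):
    """Repeatedly remove a maximum clique of the remaining graph and make it a cluster."""
    remaining = set(adj)
    clusters = []
    while remaining:
        clique = maximum_clique(adj, remaining)
        clusters.append(clique)
        remaining -= clique
    return clusters
```

On the paw (a triangle $\{0,1,2\}$ with a pendant vertex $3$ attached to $2$), which is a cograph, the sketch returns the clusters $\{0,1,2\}$ and $\{3\}$.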
For proper interval graphs, Bonomo et al. [3] characterized an optimal solution through the consecutiveness of each cluster with respect to the natural ordering of the vertices. Based on this fact, a dynamic programming approach led to a polynomial-time algorithm. It is not difficult to see that such consecutiveness does not hold on interval graphs, as potential clusters may need to be broken in the corresponding vertex ordering. Here we characterize an optimal solution of interval graphs whenever a cluster has to be broken. In particular, we take advantage of the consecutive arrangement of their maximal cliques and describe subproblems of maximal cliques containing the last vertex. One of our key observations is that the candidate clusters containing the last vertex can be enumerated in polynomial time given two vertex orderings of the graph. We further show that each such candidate cluster separates the graph in a recursive way with respect to optimal subsolutions, which enables us to define a dynamic programming table that keeps track of partial solutions. Thus, our algorithm for interval graphs considers a particular consecutiveness of a solution and applies a dynamic programming approach defined by two vertex orderings.
Furthermore, we complement the previously known NP-hardness of Cluster Deletion on -free chordal graphs by providing a proper subclass of such graphs on which the problem remains NP-hard. This result is inspired and motivated by the very simple characterization of an optimal solution on split graphs: either a maximal clique constitutes the only non-edgeless cluster, or there are exactly two non-edgeless clusters whenever there is a vertex of the independent set that is adjacent to all the vertices of the clique except one [3]. Since true twins belong to the same cluster in an optimal solution, it is natural to consider true twins in the independent set, as they are expected not to influence the characterization of a solution. Surprisingly, we show that Cluster Deletion remains NP-complete even on such a slight generalization of split graphs. We then study two different classes of such generalizations of split graphs that can be viewed as the analogues of split graphs with disjoint clique-neighborhoods and with nested clique-neighborhoods. For both classes we provide polynomial-time algorithms for Cluster Deletion.
2 Preliminaries
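All graphs considered here are simple and undirected. A graph is denoted by $G=(V,E)$ with vertex set $V$ and edge set $E$. We use the convention that $n=|V|$ and $m=|E|$. The neighborhood of a vertex $v$ of $G$ is $N(v)=\{x \mid vx\in E\}$ and the closed neighborhood of $v$ is $N[v]=N(v)\cup\{v\}$. For $S\subseteq V$, $N(S)=\bigcup_{v\in S}N(v)\setminus S$ and $N[S]=N(S)\cup S$. A graph $H$ is a subgraph of $G$ if $V(H)\subseteq V(G)$ and $E(H)\subseteq E(G)$. For $X\subseteq V(G)$, the subgraph of $G$ induced by $X$, $G[X]$, has vertex set $X$, and for each vertex pair $u,v$ from $X$, $uv$ is an edge of $G[X]$ if and only if $u\neq v$ and $uv$ is an edge of $G$. For $R\subseteq E(G)$, $G\setminus R$ denotes the graph $(V(G),E(G)\setminus R)$, which is a subgraph of $G$, and for $S\subseteq V(G)$, $G-S$ denotes the graph $G[V(G)\setminus S]$, which is an induced subgraph of $G$. For two sets of vertices $A$ and $B$, we write $E(A,B)$ to denote the edges that have one endpoint in $A$ and one endpoint in $B$. Two adjacent vertices $u$ and $v$ are called true twins if $N[u]=N[v]$, whereas two non-adjacent vertices $u$ and $v$ are called false twins if $N(u)=N(v)$.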
A clique of $G$ is a set of pairwise adjacent vertices of $G$, and a maximal clique of $G$ is a clique of $G$ that is not properly contained in any clique of $G$. An independent set of $G$ is a set of pairwise non-adjacent vertices of $G$. For $k\geq 2$, the chordless path on $k$ vertices is denoted by $P_k$ and, for $k\geq 3$, the chordless cycle on $k$ vertices is denoted by $C_k$. For an induced path, the vertices of degree one are called endvertices. A vertex $v$ is universal in $G$ if $N[v]=V(G)$ and is isolated if $N(v)=\emptyset$. A graph is connected if there is a path between any pair of vertices. A connected component of $G$ is a maximal connected subgraph of $G$. For a set of finite graphs $\mathcal{H}$, we say that a graph $G$ is $\mathcal{H}$-free if $G$ does not contain an induced subgraph isomorphic to any of the graphs of $\mathcal{H}$.
The problem of Cluster Deletion is formally defined as follows: given a graph $G=(V,E)$, the goal is to compute a minimum set of edges $F\subseteq E$ such that every connected component of $G\setminus F$ is a clique. A cluster graph is a $P_3$-free graph, or equivalently, a graph in which every connected component is a clique. Thus, the task of Cluster Deletion is to turn the input graph into a cluster graph by deleting the minimum number of edges. Let $\mathcal{S}=\{C_1,\ldots,C_k\}$ be a solution of Cluster Deletion such that each $G[C_i]$ is a clique. In such terms, the problem can be viewed as a vertex partition problem into $C_1,\ldots,C_k$. Each $C_i$ is simply called a cluster. Edgeless clusters, i.e., clusters containing exactly one vertex, are called trivial clusters. The edges of $G$ are partitioned into internal and external edges: an internal edge has both its endpoints in the same cluster $C_i$, whereas an external edge has its endpoints in different clusters $C_i$ and $C_j$, for $i\neq j$. Then, the goal of Cluster Deletion is to minimize the number of external edges, which is equivalent to maximizing the number of internal edges. We write $\mathcal{S}_G$ to denote an optimal solution for Cluster Deletion of the graph $G$, that is, a cluster subgraph of $G$ having the maximum number of edges. Given a solution $\mathcal{S}$, the number of internal edges, that is, edges with both endpoints in the same cluster, is denoted by $|\mathcal{S}|$.
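Viewing a solution as a vertex partition makes the bookkeeping concrete. The sketch below (hypothetical helper name `internal_external`; the graph is assumed to be a dict mapping each vertex to its set of neighbours) checks that the clusters partition the vertex set and induce cliques, then counts internal and external edges. Since the two counts always sum to $|E|$, maximizing the first is the same as minimizing the second.

```python
from itertools import combinations


def internal_external(adj, clusters):
    """Given a partition of V(G) into cliques, return (#internal, #external) edges."""
    cluster_of = {}
    for idx, cluster in enumerate(clusters):
        for v in cluster:
            if v in cluster_of:
                raise ValueError(f"vertex {v} appears in two clusters")
            cluster_of[v] = idx
    if set(cluster_of) != set(adj):
        raise ValueError("clusters must cover every vertex exactly once")
    # every cluster must induce a clique
    for cluster in clusters:
        for u, v in combinations(cluster, 2):
            if v not in adj[u]:
                raise ValueError(f"cluster {sorted(cluster)} is not a clique")
    # undirected edges as 2-element frozensets
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    internal = sum(1 for e in edges if len({cluster_of[x] for x in e}) == 1)
    return internal, len(edges) - internal
```

For the paw (triangle $\{0,1,2\}$ plus vertex $3$ adjacent to $2$), the partition $\{0,1,2\},\{3\}$ gives 3 internal and 1 external edge, while $\{2,3\},\{0,1\}$ gives 2 and 2.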
For a clique $C$, we say that a vertex $v$ is $C$-compatible if $C\cup\{v\}$ is a clique. We start with a few preliminary observations regarding twin vertices. Notice that for true twins $x$ and $y$, if $x$ belongs to a cluster $C$, then $y$ is $C$-compatible.
Lemma 2.1 ([3]).
Let $x$ and $y$ be true twins in $G$. Then, in any optimal solution $x$ and $y$ belong to the same cluster.
The above lemma shows that we can contract true twins and look for a solution on a vertex-weighted graph that does not contain true twins. Even though false twins cannot be grouped into the same cluster as they are non-adjacent, we can actually disregard one of the false twins whenever their neighborhood forms a clique.
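Lemma 2.1 justifies collapsing each class of true twins into a single weighted vertex. A minimal sketch of that preprocessing, assuming the graph is given as a dict of neighbour sets: true twins are exactly the vertices with equal closed neighbourhoods $N[v]$, so grouping vertices by $N[v]$ finds the classes. Choosing the smallest vertex as representative and storing the class size as its weight are illustrative choices of this sketch (hypothetical helper names), not details taken from the paper.

```python
def true_twin_classes(adj):
    """Group vertices by closed neighbourhood N[v]; each group is a class of true twins."""
    classes = {}
    for v, nbrs in adj.items():
        key = frozenset(nbrs) | {v}  # N[v] as a hashable key
        classes.setdefault(key, []).append(v)
    return list(classes.values())


def contract_true_twins(adj):
    """Return (weights, quotient_adj): one representative per twin class, weighted by class size."""
    rep, weights = {}, {}
    for cls in true_twin_classes(adj):
        r = min(cls)
        weights[r] = len(cls)
        for v in cls:
            rep[v] = r
    quotient = {r: set() for r in weights}
    for u, nbrs in adj.items():
        for v in nbrs:
            if rep[u] != rep[v]:
                quotient[rep[u]].add(rep[v])
    return weights, quotient
```

In the paw, vertices $0$ and $1$ share $N[0]=N[1]=\{0,1,2\}$, so they collapse into one vertex of weight 2 adjacent to $2$.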
Lemma 2.2.
Let $x$ and $y$ be false twins in $G$ such that $N(x)$ is a clique. Then, there is an optimal solution such that $\{x\}$ constitutes a trivial cluster.
Proof.
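Let $C_x$ and $C_y$ be the clusters of $x$ and $y$, respectively, in an optimal solution such that $x\in C_x$ and $y\in C_y$. We construct another solution by replacing both clusters by $\{x\}$ and $(C_x\cup C_y)\setminus\{x\}$, respectively. To see that this is indeed a solution, first observe that $y$ is adjacent to all the vertices of $C_x\setminus\{x\}$ because $N(x)=N(y)$, and $(C_x\cup C_y)\setminus\{x\}\subseteq N[y]$ forms a clique because $N(y)$ is a clique by the assumption. Moreover, since $|C_x|\geq 1$ and $|C_y|\geq 1$, we know that $\binom{|C_x|+|C_y|-1}{2}\geq\binom{|C_x|}{2}+\binom{|C_y|}{2}$, implying that the number of internal edges in the constructed solution is at least the number of internal edges of the optimal solution. ∎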
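The counting step can be verified directly. Assuming, as in the proof, that the two disjoint clusters $C_x$ and $C_y$ with $a=|C_x|\geq 1$ and $b=|C_y|\geq 1$ are replaced by the trivial cluster $\{x\}$ and the cluster $(C_x\cup C_y)\setminus\{x\}$ of size $a+b-1$, while all other clusters stay unchanged, the change in the number of internal edges is:

```latex
\binom{a+b-1}{2} - \binom{a}{2} - \binom{b}{2}
  = \frac{(a+b-1)(a+b-2) - a(a-1) - b(b-1)}{2}
  = (a-1)(b-1) \;\geq\; 0 .
```

Equality holds exactly when one of the two original clusters was already trivial.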
Moreover, we prove the following generalization of Lemma 2.1.
Lemma 2.3.
Let $C_1$ and $C_2$ be two clusters of an optimal solution and let $x\in C_1$ and $y\in C_2$. If $x$ is $C_2$-compatible then $y$ is not $C_1$-compatible.
Proof.
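Let $\mathcal{S}$ be an optimal solution such that $C_1,C_2\in\mathcal{S}$. Assume for contradiction that $y$ is $C_1$-compatible. We show that $\mathcal{S}$ is not optimal. Since $x$ is $C_2$-compatible, we can move $x$ to $C_2$ and obtain a solution $\mathcal{S}_1$ that contains the clusters $C_1\setminus\{x\}$ and $C_2\cup\{x\}$. Similarly, we construct a solution $\mathcal{S}_2$ from $\mathcal{S}$ by moving $y$ to $C_1$, so that $C_1\cup\{y\},C_2\setminus\{y\}\in\mathcal{S}_2$. Notice that $\mathcal{S}_2$ forms a clustering, since $y$ is $C_1$-compatible. We distinguish between the following cases, according to the values $|C_1|$ and $|C_2|$.
- If $|C_1|\leq|C_2|$ then $|\mathcal{S}_1|>|\mathcal{S}|$, because $|C_2|>|C_1|-1$.
- If $|C_1|>|C_2|$ then $|\mathcal{S}_2|>|\mathcal{S}|$, because $|C_1|>|C_2|-1$.
In both cases we reach a contradiction to the optimality of $\mathcal{S}$. Therefore, $y$ is not $C_1$-compatible. ∎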
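The case analysis can be compressed into one line of arithmetic. Write $|\mathcal{S}|$ for the number of internal edges and assume, as in the proof, that $\mathcal{S}_1$ moves $x$ from $C_1$ into $C_2$ and $\mathcal{S}_2$ moves $y$ from $C_2$ into $C_1$. Moving $x$ loses its $|C_1|-1$ edges inside $C_1$ and gains $|C_2|$ edges to $C_2$ (by compatibility), and symmetrically for $y$:

```latex
|\mathcal{S}_1| - |\mathcal{S}| = |C_2| - (|C_1| - 1), \qquad
|\mathcal{S}_2| - |\mathcal{S}| = |C_1| - (|C_2| - 1), \\
\bigl(|\mathcal{S}_1| - |\mathcal{S}|\bigr) + \bigl(|\mathcal{S}_2| - |\mathcal{S}|\bigr) = 2 .
```

Both differences are integers summing to 2, so at least one of them is at least 1, that is, one of the two moves strictly increases the number of internal edges.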
Corollary 2.4.
Let $C$ be a cluster of an optimal solution and let $x\in C$. If there is a vertex $y$ that is $C$-compatible and $N[y]\subseteq N[x]$, then $y$ belongs to $C$.
Proof.
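Assume for contradiction that $y$ belongs to a cluster $C'$ different from $C$. Then, observe that $x$ is $C'$-compatible. Indeed, for any vertex $z$ of $C'\setminus\{y\}$, we know $z\in N(x)$, since $z$ is adjacent to $y$ and $N[y]\subseteq N[x]$; moreover, $y\in N(x)$ because $y$ is $C$-compatible. Thus, by Lemma 2.3 we reach a contradiction, so that $y\in C$. ∎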
3 Polynomial-time algorithm on interval graphs
Here we present a polynomial-time algorithm for the Cluster Deletion problem on interval graphs. A graph is an interval graph if there is a bijection between its vertices and a family of closed intervals of the real line such that two vertices are adjacent if and only if the two corresponding intervals intersect. Such a bijection is called an interval representation of the graph, denoted by $\mathcal{I}$. We identify the intervals of the given representation with the vertices of the graph, interchanging these notions appropriately. Whether a given graph is an interval graph can be decided in linear time and, if so, an interval representation can be generated in linear time [10]. Notice that every induced subgraph of an interval graph is an interval graph.
Let $G$ be an interval graph. Instead of working with the interval representation of $G$, we consider its sequence of maximal cliques. It is known that a graph $G$ with $t$ maximal cliques is an interval graph if and only if there is an ordering $C_1,\ldots,C_t$ of the maximal cliques of $G$ such that for each vertex $v$ of $G$, the maximal cliques containing $v$ appear consecutively in the ordering (see e.g., [4]). A path $\mathcal{C}=C_1,\ldots,C_t$ following such an ordering is called a clique path of $G$. Notice that a clique path is not necessarily unique for an interval graph. Also note that an interval graph with $n$ vertices contains at most $n$ maximal cliques. By definition, for every vertex $v$ of $G$, the maximal cliques containing $v$ form a connected subpath in $\mathcal{C}$.
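Given a vertex $v$, we denote by $\mathcal{C}(v)=\{C_{\ell(v)},\ldots,C_{r(v)}\}$ the maximal cliques containing $v$ with respect to $\mathcal{C}$, where $C_{\ell(v)}$ and $C_{r(v)}$ are the first (leftmost) and last (rightmost) maximal cliques containing $v$. Notice that $1\leq\ell(v)\leq r(v)\leq t$ holds. Moreover, for every edge of $G$ there is a maximal clique of $\mathcal{C}$ that contains both endpoints of the edge. Thus, two vertices $u$ and $v$ are adjacent if and only if $\ell(v)\leq\ell(u)\leq r(v)$ or $\ell(u)\leq\ell(v)\leq r(u)$.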
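One standard way to obtain a clique path, together with the indices $\ell(v)$ and $r(v)$ of the first and last maximal clique containing each vertex, is a left-to-right sweep over the interval endpoints: whenever an opening endpoint is immediately followed by a closing endpoint, the currently active intervals form a maximal clique. The sketch below assumes closed intervals given as a dict `vertex -> (left, right)` and uses hypothetical names; it sorts the endpoints, so it runs in $O(n\log n)$ plus output size rather than the linear time cited in [10].

```python
def clique_path(intervals):
    """intervals: dict vertex -> (left, right), closed, left <= right.
    Returns (cliques, l, r): maximal cliques in left-to-right order and the
    1-based indices of the first/last maximal clique containing each vertex."""
    events = []
    for v, (a, b) in intervals.items():
        events.append((a, 0, v))  # 0 = opening; sorts before closings at equal coordinate
        events.append((b, 1, v))  # 1 = closing
    events.sort(key=lambda e: (e[0], e[1]))
    active, cliques = set(), []
    last_was_open = False
    for _, kind, v in events:
        if kind == 0:
            active.add(v)
            last_was_open = True
        else:
            if last_was_open:  # opening followed directly by closing: record a maximal clique
                cliques.append(frozenset(active))
            active.discard(v)
            last_was_open = False
    l, r = {}, {}
    for idx, clique in enumerate(cliques, start=1):
        for v in clique:
            l.setdefault(v, idx)
            r[v] = idx
    return cliques, l, r


def adjacent(u, v, l, r):
    """Distinct u, v are adjacent iff their clique-index ranges [l, r] overlap."""
    return l[v] <= l[u] <= r[v] or l[u] <= l[v] <= r[u]
```

For the intervals $a=[0,2]$, $b=[1,4]$, $c=[3,5]$, $d=[6,7]$ the sweep yields the clique path $\{a,b\},\{b,c\},\{d\}$, so $\ell(c)=r(c)=2$ and $b$ spans indices 1 to 2.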
For a set of vertices $S$, we write $\ell_{\min}(S)$ and $\ell_{\max}(S)$ to denote the minimum and maximum value, respectively, among all $\ell(v)$ with $v\in S$. Similarly, $r_{\min}(S)$ and $r_{\max}(S)$ correspond to the minimum and maximum value, respectively, with respect to $r(v)$.
With respect to the Cluster Deletion problem, observe that for any cluster of a solution, we know that where , as forms a clique. A vertex is called guarded by two vertices and if
[TABLE]
For a clique $C$, observe that $v$ is $C$-compatible if and only if there exists a maximal clique $C_i$ such that $C\cup\{v\}\subseteq C_i$ with $\ell_{\max}(C\cup\{v\})\leq i\leq r_{\min}(C\cup\{v\})$.
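Because the maximal cliques containing a vertex are consecutive, a set of vertices is a clique exactly when all their clique-index ranges $[\ell(v), r(v)]$ share a common index, i.e. the largest left index does not exceed the smallest right index. A sketch of the resulting compatibility test, assuming hypothetical dicts `l` and `r` that map each vertex to the indices of its first and last maximal clique in a fixed clique path:

```python
def witness_clique_index(vertices, l, r):
    """Return an index i with all `vertices` in C_i, or None if they do not form a clique."""
    lo = max(l[v] for v in vertices)
    hi = min(r[v] for v in vertices)
    return lo if lo <= hi else None


def is_compatible(v, clique, l, r):
    """v is C-compatible iff C together with v lies inside a single maximal clique C_i."""
    return witness_clique_index(set(clique) | {v}, l, r) is not None
```

With the clique path $\{a,b\},\{b,c\},\{d\}$ (so $\ell=\{a{:}1,b{:}1,c{:}2,d{:}3\}$ and $r=\{a{:}1,b{:}2,c{:}2,d{:}3\}$), $c$ is $\{b\}$-compatible with witness index 2, whereas $a$ is not $\{b,c\}$-compatible.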
Lemma 3.1.
Let be three vertices of such that is guarded by and . If and belong to the same cluster of an optimal solution and is -compatible then .
Proof.
To ease the presentation, for three non-negative numbers we write if holds. Without loss of generality, assume that . Assume for contradiction that belongs to another cluster . We apply Lemma 2.3 to either and or and . To do so, we need to show that is -compatible or is -compatible, as is already -compatible. Since is a cluster that contains , there is a maximal clique such that with .
We show that or . If then , because . As is guarded by and , we know that . Now observe that if then , implying that and are non-adjacent, reaching a contradiction to the fact that . Thus, which shows that . This means that or .
Hence, or belong to the maximal clique for which . Therefore, at least one of or is -compatible and by Lemma 2.3 we conclude that . ∎
Let be an ordering of the vertices such that . For every with , we define the following set of vertices:
[TABLE]
That is, contains all vertices that are guarded by and . We write to denote the value of and we simply write and instead of and . Notice that for a neighbor of with , we have either or . This means that all neighbors of that are totally included (i.e., all vertices such that ) belong to for any with . To distinguish such neighbors of , we define the following sets:
- contains the neighbors of such that (neighbors of in that partially overlap ).
- contains the neighbors of such that (neighbors of that are totally included within ).
In the forthcoming arguments, we restrict ourselves to the graph induced by . It is clear that the first maximal clique that contains a vertex of is , whereas the last maximal clique is .
We now explain the necessary sets that our dynamic programming algorithm uses in order to compute an optimal solution of . For two vertices with , we define the following:
- is the value of an optimal solution for Cluster Deletion of the graph .
To ease the notation, when we say a cluster of we mean a cluster of an optimal solution of . Notice that is the desired value for the whole graph , since .
Our task is to construct the values for by taking into account all possible clusters that contain . To do so, we show that (i) the number of clusters containing in is polynomial and (ii) each such candidate cluster containing separates the graph in a recursive way with respect to optimal subsolutions.
Observe that if then if and only if , whereas if and only if ; in the latter case, it is not difficult to see that , according to the definition of . Thus, whenever holds, we have . The candidates of a cluster of containing lie among and . With the next two lemmas we show that we can restrict ourselves to a polynomial number of such candidates. To avoid repetition, in the forthcoming statements we let be two vertices with .
Lemma 3.2.
Let be a cluster of containing . If there is a vertex such that then there is a maximal clique with such that and .
Proof.
Since , we know that there is a maximal clique for which with . We show that all other vertices of are guarded by and . Notice that for every vertex we already know that and . Thus, for every vertex we have and . This means that all vertices of are guarded by and . Moreover, since , we know that all vertices of are -compatible. Therefore, we apply Lemma 3.1 to every vertex of , showing that . Furthermore, there is no vertex of that belongs to , because . ∎
By Lemma 3.2, we know that we have to pick the entire set for constructing candidates to form a cluster that contains and some vertices of . As there are at most choices for , we get a polynomial number of such candidate sets. We next show that we can construct a polynomial number of candidate sets that contain and vertices of . To do so, we consider the vertices of in increasing order with respect to their first maximal clique. More precisely, let be an increasing ordering of the vertices of such that . The right part of Figure 1 illustrates the corresponding case.
Lemma 3.3.
Let be a cluster of containing and let . If then every vertex of that is -compatible belongs to .
Proof.
Let be a vertex of . We show that is guarded by and . By the definition of , we know that . Moreover, observe that holds by the fact that and . Thus, we apply Lemma 3.1 to , because and is -compatible, showing that as desired. ∎
For , let . Observe that each may be an empty set. On the part , all vertices are grouped into the sets . Similar to , let . Then, all vertices of are -compatible and all vertices of are -compatible. Figure 1 depicts the corresponding sets.
Lemma 3.4.
Let be a cluster of containing . Then, there is such that .
Proof.
Assume for contradiction that no set is contained in . Let and let . Notice that because of the assumption as there are no other neighbors of in . Then, holds, because . We show that . Observe that . If then clearly . Assume that and let be a non-empty subset of that forms a cluster in . Then, all vertices of are -compatible and all vertices of are -compatible, because . Thus, we reach a contradiction by Lemma 2.3 to the optimality of . This means that there is a vertex that is contained in together with . Therefore, by Lemma 3.2, there is a set that is included in . ∎
All vertices of a cluster containing belong to . Thus, can be partitioned into and . Also notice that for some . Combined with the previous lemmas, this allows us to enumerate all such subsets of in polynomial time. In particular, we first build all candidates for , which are exactly the sets by Lemma 3.2 and Lemma 3.4. Then, for each such candidate , we apply Lemma 3.3 to construct all subsets containing the last vertices of . Thus, there are at most number of candidate sets from the vertices of that belong to the same cluster as .
3.1 Splitting into partial solutions
We further partition the vertices of . Given a pivot group , we consider the vertices that lie on the right part of . More formally, for , we define the set
[TABLE]
The reason for breaking the vertices of the part into sets is the following.
Lemma 3.5.
Let be a cluster of such that , for . Then, for any two vertices and , there is no cluster of that contains both of them.
Proof.
First observe that . We consider two cases for , depending on whether or not. Assume that . If , then by Lemma 3.2, which implies that . If then .
Now assume that . If , then does not belong to , so that . If , then we show that does not belong to a cluster with any vertex of . Assume for contradiction that belongs to a cluster such that . This means that with and . Then is -compatible and is -compatible, as both and belong to . Therefore, by Lemma 2.3 we reach a contradiction to and belonging to different clusters. ∎
For a non-empty set , we write to denote the following solutions:
- , where is the vertex of having the smallest and is the vertex of having the largest .
Having this notation, observe that , for any with . However, it is important to notice that does not necessarily represent the optimal solution of , since the vertices of may not be consecutive with respect to , so that is only a subset of in the corresponding solution for . With the next result we show that, under the following assumptions, the chosen sets satisfy .
Observation 3.6.
Let be two vertices with and let , for any maximal clique of with .
(i) If then , where and .
(ii) If then , where and .
Proof.
We prove the case for . As each contains vertices of , we have . Observe that either or . In both cases we show that . Assume that there is a vertex with . Then as , and by the consecutiveness of the clique path. This shows that because . Thus, . We show that . If there is a vertex in with then leading to a contradiction that . Hence we have and . Moreover, observe that by the definition of , we already know that . Now it remains to notice that for every vertex with and we have . This follows from the fact that and . Therefore we get . Completely symmetric arguments show the case for . ∎
Given the clique path , a clique-index is an integer . Let be two clique-indices such that and . We denote by the minimum value of among all vertices of having . Clearly, holds. A pair of clique-indices is called admissible pair for a vertex , if both and hold. Given an admissible pair , we define the following set of vertices:
- .
Observe that all vertices of induce a clique in , because . We say that a vertex crosses the pair if and . It is not difficult to see that for a vertex that crosses , we have . We prove the following properties of .
Lemma 3.7.
Let be two vertices with and let be an admissible pair for . Moreover, let be the vertices of having the smallest and largest , respectively. If the vertices of form a cluster in then the following statements hold:
1. .
2. If holds for a vertex , then crosses .
3. No vertex of belongs to the same cluster as any vertex of .
4. No vertex that crosses belongs to the same cluster as any vertex having .
Proof.
First we show that . Assume that there is a vertex . Then and is distinct from because, by definition, . Also notice that implies and . By the second inequality, we get . Suppose that . As we already know that , we conclude that leading to a contradiction that . Thus we have and , showing that . This means that , so that .
For the second statement, observe that if then . Since , we conclude that by the first statement. Thus holds, implying that crosses .
With respect to the third statement, observe that no vertex of belongs to the clique . This means that all vertices of belong to both sets and . Thus Lemma 3.5 and the first statement show that no two vertices and belong to the same cluster.
For the fourth statement, let be a vertex that crosses . By the first statement we know that . If then and the third statement show that and do not belong to the same cluster. Suppose that . If then contradicting the fact that . Putting these together, we have . Now assume for contradiction that and belong to the same cluster . By the fact that , observe that . We consider the graph induced by . We show that there is a vertex of that is -compatible and there is a vertex of that is -compatible. Notice that is -compatible, because crosses so that . To see that there is a vertex of that is -compatible, choose to be the vertex of having the smallest . This means that . Then is adjacent to every vertex of because and . Thus, is -compatible. Therefore, Lemma 2.3 shows the desired contradiction, implying that and do not belong to the same cluster. ∎
Notice that the number of admissible pairs for is polynomial because there are at most choices for each clique-index. Moreover, if then . A pair of clique-indices with is called a bounding pair for if either holds, or crosses . Given a bounding pair for , we write to denote the set of bounding pairs for such that
- , whenever holds, and
- , otherwise.
Observe that if holds, then describes all bounding pairs for with no restriction, regardless of . On the other hand, if and hold, then is not a bounding pair for . In fact, we will show that the latter case does not arise in our partial subsolutions. For any admissible pair and any bounding pair for , observe that and . Intuitively, an admissible pair corresponds to the cluster containing , whereas a bounding pair forbids selecting certain vertices, as they have already formed a cluster that does not contain .
Our task is to construct subsolutions over all admissible pairs for with the property that the vertices of form a cluster. To do so, we consider a vertex with and a cluster containing . Let be an admissible pair for such that . The previous results suggest considering solutions in which the vertices of form a cluster in an optimal solution. It is clear that if then . Moreover, if , then no vertex of belongs to . Thus, we need to construct solutions for , whenever is a bounding pair for and the vertices of form a cluster. This idea is formally captured by the following restricted solutions.
Let be a bounding pair for . We call the following solution the -restricted solution:
- is the value of an optimal solution for Cluster Deletion of the graph such that the vertices of form a cluster.
Hereafter, we assume that with corresponds to an empty set. Figure 2 illustrates a partition of the vertices with respect to . Notice that an optimal solution without any restriction is described in terms of by , since no vertex of belongs to . Therefore, corresponds to the optimal solution of the whole graph . As base cases, observe that if contains at most one vertex then for all bounding pairs , since there are no internal edges. For a set , we write to denote the number . With the following result, we describe a recursive formulation for the optimal solution , which is the central tool of our dynamic programming algorithm.
Lemma 3.8.
Let be a bounding pair for . Then,
[TABLE]
where and .
Proof.
We first argue that corresponds to the correct cluster containing . Observe that , because is a bounding pair for , so that whenever holds. By Lemmas 3.3 and 3.2, there are and , where and , such that . We show that such a set is obtained from a correct choice among the described . Assume first that . Then , because for every vertex of we know that , so that . This means that for every bounding pair , as described in the given formula. Now assume that . Since crosses , Lemma 3.7 (4) shows that is not contained in a cluster with a vertex having . Thus, for any vertex we know that where . This means that there is a set that contains exactly the vertices of such that . Therefore, holds, as desired.
Next, we consider the sets and . We show that and correctly store the optimal values of each part. To do so, we show first that the vertex sets of each part correspond to the correct sets and, then, each pair and is indeed a bounding pair for the last vertex of and , respectively. We start with some preliminary observations. Notice that , because , which means that every vertex does not belong to . Since contains only vertices of and , no vertex of is considered in the described formula, as required in . By the properties of and , we have the following:
- Let . Then, either or crosses the pair . Moreover, if a vertex crosses then .
- Let . Then, either or crosses the pair . Moreover, if a vertex crosses but does not cross then .
Let be the set of vertices of that cross and let be the set of vertices of that cross . The previous properties imply that we can partition into the vertices of and the vertices of that belong to . Similarly, is partitioned into the vertices of and the vertices of that belong to . See Figure 2 for an illustration of the corresponding sets. Thus, we have the following partitions for and :
- , where .
- , where .
Let be the vertices of with and . We now show that corresponds to the optimal solution of the graph such that the vertices of form a cluster. Assume for contradiction that there is a vertex of that does not belong to . First notice that if and only if is an empty set. In such a case, by Observation 3.6, we have , contradicting the existence of such a vertex . Suppose that . Then or , because is the first maximal clique of all vertices of . If then and . This means that for every , we have , reaching a contradiction. If then and is empty, reaching again a contradiction. Suppose now that . It is clear that . If then , so that . Assume that . Now observe that if , then is a vertex of . Thus, . If then because . This means that . If then , leading to a contradiction that , and if then , leading to a contradiction that . Thus, we know that and . This, however, implies that , reaching a contradiction to the fact that . Therefore, we have shown that an optimal solution of the vertices of corresponds to an optimal solution of the vertices of .
Furthermore, we argue that is a bounding pair for in . Assume that . If then , because . As , we have . Then, if , we get , which implies that , showing that is a bounding pair for . Assume next that . Then, , implying that . Thus, for any value of we know that is a bounding pair for . Therefore, corresponds to the optimal solution of the graph .
Next we consider the vertices of , in order to show that corresponds to an optimal solution of the graph . Let be the vertices of with and . Assume for contradiction that there is a vertex of that does not belong to . Every vertex of belongs to , so that . This means that , since , and , since . Then we obtain , showing that . Thus we reach a contradiction, because . Hence, the vertices described in correspond to the vertices of , as desired.
With respect to , it remains to show that is a bounding pair for . If then , which means that is a bounding pair for . Next suppose that . If then , contradicting the fact that . Thus, we know that . If further , then , contradicting . Hence, we conclude that crosses , showing that is indeed a bounding pair for .
To complete the proof, observe that, by Lemma 3.7 (3), no vertex of belongs to the same cluster as a vertex of . Thus, the optimal solutions described by and do not overlap in . Therefore, the claimed formula holds. ∎
Now we are ready to obtain our main result, namely a polynomial-time algorithm for Cluster Deletion on interval graphs.
**Theorem 3.9.**
*Cluster Deletion is polynomial-time solvable on interval graphs.*
Proof.
We describe a dynamic programming algorithm that computes based on Lemma 3.8. In a preprocessing step, we first compute two orderings of the vertices according to their first and last maximal cliques. Then we visit all vertices in ascending order with respect to and, for each such vertex, we consider the vertices with in descending order with respect to . In this way, we construct the sets . We use a table to store the values of each . At the end, we output the maximum value of that corresponds to , as already explained. Regarding the running time, observe that the number of table entries is at most , as each table index is bounded by . Moreover, computing a single table entry requires time, since we take the maximum of at most table entries. Therefore, the overall running time of the algorithm is . ∎
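The preprocessing step can be illustrated with a short sketch. Assuming the interval graph is given by a model of closed intervals (an input format the text does not fix), a left-to-right sweep over the endpoints yields the maximal cliques in clique-path order and, from them, the first and last maximal clique of every vertex. The function name `clique_path` is hypothetical, and the dynamic program of Lemma 3.8 itself is not reproduced here.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[float, float]  # closed interval [left, right]


def clique_path(intervals: Dict[str, Interval]):
    """Return (cliques, first, last) for a closed-interval model.

    cliques: maximal cliques in left-to-right order (a clique path);
    first[v] / last[v]: index of the first / last clique containing v.
    """
    events = []
    for v, (left, right) in intervals.items():
        events.append((left, 0, v))   # kind 0 = left endpoint
        events.append((right, 1, v))  # kind 1 = right endpoint
    # At equal coordinates, left endpoints come first so that touching
    # closed intervals are treated as intersecting.
    events.sort(key=lambda e: (e[0], e[1]))

    active: set = set()
    cliques: List[FrozenSet[str]] = []
    last_was_left = False
    for _, kind, v in events:
        if kind == 0:
            active.add(v)
            last_was_left = True
        else:
            # The active set right before the first right endpoint that
            # follows a left endpoint is a maximal clique.
            if last_was_left:
                cliques.append(frozenset(active))
            active.discard(v)
            last_was_left = False

    first: Dict[str, int] = {}
    last: Dict[str, int] = {}
    for i, clique in enumerate(cliques):
        for v in clique:
            first.setdefault(v, i)
            last[v] = i
    return cliques, first, last
```

Sorting the vertices by `first` and by `last` (for example, `sorted(first, key=first.get)`) gives the two orderings used by the algorithm; this sketch does not impose any particular tie-breaking.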
4 Cluster Deletion on a generalization of split graphs (split-twin graphs)
A graph is a split graph if can be partitioned into a clique and an independent set , where is called a split partition of . Split graphs are characterized as the (2K_2, C_4, C_5)-free graphs [9]. They form a subclass of the larger and widely known class of chordal graphs, which are the graphs that contain no induced cycle of length four or more. In general, a split graph can have more than one split partition, and such a partition can be computed in linear time [16].
Hereafter, for a split graph , we denote by a split partition of in which is a maximal clique. It is known that Cluster Deletion is polynomial-time solvable on split graphs [3]. In fact, the algorithm given in [3] is remarkably simple, owing to the following elegant characterization of an optimal solution: if there is a vertex such that and has a neighbor in then the non-trivial clusters of an optimal solution are and ; otherwise, the only non-trivial cluster of an optimal solution is [3]. Here we study whether such a simple characterization extends to more general classes of split graphs. Due to Lemma 2.1, it is natural to consider true twins in the independent set, as they are grouped together in an optimal solution and are expected not to influence the characterization of the solution. Surprisingly, we show that Cluster Deletion remains NP-complete even on such a slight generalization of split graphs. Before presenting our NP-completeness proof, let us first show that such graphs form a proper subclass of -free chordal graphs. We start by giving the formal definition of such graphs.
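A minimal sketch of this characterization, assuming it reads as follows: if some v ∈ I satisfies N(v) = K \ {w} for a clique vertex w that has a neighbor u ∈ I, then the non-trivial clusters are N[v] and {w, u}; otherwise K is the only non-trivial cluster. This reading is an assumption; it agrees with the fact that, when K is a maximal clique, every cluster contains at most one vertex of I and no clustering keeps more than C(|K|, 2) + 1 edges. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def split_cluster_deletion(K: Set[str], I: Set[str],
                           adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Non-trivial clusters of an optimal solution on a split graph (K, I),
    assuming K is a maximal clique and the characterization stated above."""
    for v in I:
        missing = K - adj[v]            # clique vertices not adjacent to v
        if len(missing) == 1:           # N(v) = K minus a single vertex w
            (w,) = missing
            partners = adj[w] & I       # neighbours of w in I
            if partners:
                u = min(partners)       # any neighbour works; pick one deterministically
                return [frozenset(adj[v] | {v}), frozenset({w, u})]
    return [frozenset(K)] if len(K) > 1 else []


def kept_edges(clusters: List[FrozenSet[str]]) -> int:
    """Edges kept inside the given clusters (each cluster is a clique)."""
    return sum(len(c) * (len(c) - 1) // 2 for c in clusters)
```

For instance, with K = {a, b, c}, x adjacent to a and b, and y adjacent to c, the sketch returns the clusters {a, b, x} and {c, y}, keeping four edges instead of the three inside K.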
**Definition 4.1.**
A graph is called a split-twin graph if its vertex set can be partitioned into and such that is a clique and the vertices of each connected component of form true twins in .
It is clear that in a split-twin graph the following holds: (i) each connected component of is a clique and forms a true-twin set in , and (ii) contracting the connected components of results in a split graph, denoted by . Figure 3 illustrates the induced subgraphs that are forbidden in a split-twin graph.
**Proposition 4.2.**
A graph is split-twin if and only if it does not contain any of the graphs as induced subgraphs.
Proof.
Let be the list of such subgraphs, i.e., . We show that split-twin graphs are exactly the -free graphs. It is clear that no graph of contains true twins. Moreover, besides and , each of the remaining graphs of contains an induced , which implies that none of the graphs of is a split-twin graph. Thus, if a graph contains one of the graphs of as an induced subgraph, then is not a split-twin graph.
We show that any -free graph is split-twin. If is a split graph then, by definition, is split-twin. Assume that is not a split graph. Since does not contain or and split graphs are exactly the -free graphs, there is an induced in . Let and be the two edges of an induced . We show that the endpoints of at least one of the two edges are true twins. Assume for contradiction that neither nor are true twins in . Let be a neighbor of that is non-adjacent to , and let be a neighbor of that is non-adjacent to . We show that the vertices of induce one of the subgraphs of , contradicting the fact that no pair of vertices form true twins. If and then there is an induced or depending on whether and are adjacent or not. Thus, or . Observe that if is adjacent to at least one of or then is adjacent to both and ; otherwise, induce a . By symmetric arguments we know that either is adjacent to both or to none. Without loss of generality, assume that .
- Suppose that and are non-adjacent. If then there is a induced by . Moreover, by the previous argument, we know that if then , which implies a in induced by . Thus if we obtain a induced subgraph of .
- Suppose that and are adjacent. If , then all six vertices induce an graph. Otherwise, we know that , showing that all six vertices induce a graph , where and are the degree four vertices.
Thus in all cases we obtain an induced subgraph of , contradicting the fact that is an -free graph. This means that for any we know that at least one of the two edges has true twins as endpoints in . Iteratively picking such true twins and contracting them into a single vertex results in a graph that does not contain . Therefore is a split graph, implying that is a split-twin graph. ∎
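One way to test membership, sketched under our own assumptions: keep a single representative of each true-twin class (vertices with equal closed neighborhoods) and check whether the resulting graph is split. This relies on the observation that collapsing true twins maps a split-twin partition to a split partition and, conversely, expanding each representative back into its class turns a split partition into a split-twin partition. For the split test we use the Hammer–Simeone degree-sequence criterion rather than any specific algorithm from the text; all function names are hypothetical.

```python
from typing import Any, Dict, Set

Graph = Dict[Any, Set[Any]]  # adjacency sets, no self-loops


def is_split(adj: Graph) -> bool:
    """Hammer-Simeone test: with degrees d_1 >= ... >= d_n and
    m = max{i : d_i >= i - 1}, G is split iff
    d_1 + ... + d_m == m(m - 1) + d_{m+1} + ... + d_n."""
    d = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
    if not d:
        return True
    m = max(i for i in range(1, len(d) + 1) if d[i - 1] >= i - 1)
    return sum(d[:m]) == m * (m - 1) + sum(d[m:])


def collapse_true_twins(adj: Graph) -> Graph:
    """Keep one vertex per true-twin class (equal closed neighbourhoods)."""
    rep: Dict[frozenset, Any] = {}
    for v, nbrs in adj.items():
        rep.setdefault(frozenset(nbrs | {v}), v)
    keep = set(rep.values())
    return {v: adj[v] & keep for v in keep}


def is_split_twin(adj: Graph) -> bool:
    """Split-twin test via the observation described above (a sketch)."""
    return is_split(collapse_true_twins(adj))
```

For example, an induced 2K_2 is not split, but each of its edges is a pair of true twins, so it is split-twin; C_4 and P_5 have no true twins and are rejected.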
Thus, by Proposition 4.2, split-twin graphs form a proper subclass of -free chordal graphs, i.e., of -free graphs. Now let us show that the decision version of Cluster Deletion is NP-complete on split-twin graphs. For the reduction we use the NP-hard Edge Weighted Cluster Deletion problem. In the Edge Weighted Cluster Deletion problem, each edge of the input graph is associated with a weight and the objective is to construct a cluster graph having the maximum total (cumulative) weight of edges. It is known that Edge Weighted Cluster Deletion remains NP-hard on split graphs even when (i) all edges inside the clique have weight one, (ii) all edges incident to a vertex have the same weight , and (iii) [3]. We abbreviate the latter problem by EWCD and denote by an instance of the problem where is a split partition of the vertices of and is the total weight of the edges in a cluster solution for .
**Theorem 4.3.**
The decision version of Cluster Deletion is NP-hard on split-twin graphs.
Proof.
We prove the NP-hardness of Cluster Deletion on split-twin graphs by giving a polynomial-time reduction from the restricted version EWCD of Edge Weighted Cluster Deletion on split graphs, which is known to be NP-hard [3]. Let be an instance of EWCD, where is a split graph. From , we build a split-twin graph by keeping the same clique and, for every vertex , applying the following:
- We replace by true twin vertices (i.e., by a -clique) such that for any vertex we have . That is, their neighbors outside are exactly . Moreover, the set of vertices forms .
By the above construction, it is not difficult to see that is a split-twin graph, since the graph induced by is a disjoint union of cliques and two adjacent vertices of are true twins in . Also observe that the construction takes polynomial time because is at most . We claim that there is an edge weighted cluster solution for with total weight at least if and only if there is a cluster solution for having at least edges.
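The claimed equivalence can be checked by brute force on tiny instances. The sketch assumes that each v ∈ I is replaced by exactly w(v) true twins and that the two optima differ by the sum of C(w(v), 2) over v ∈ I, which is what the accounting of clusters of types (b) and (c) in the proof suggests; both are assumptions, as are the function names. The exhaustive search over set partitions is only meant for a handful of vertices.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, Iterator, List, Optional, Set

Weight = Callable[[object, object], Optional[int]]  # None means "no edge"


def set_partitions(items: List) -> Iterator[List[List]]:
    """Enumerate all set partitions of `items` (exponential; tiny inputs only)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for blocks in set_partitions(rest):
        yield [[first]] + blocks                      # `first` on its own
        for i in range(len(blocks)):                  # or joined to block i
            yield blocks[:i] + [[first] + blocks[i]] + blocks[i + 1:]


def partition_value(blocks: List[List], weight: Weight) -> Optional[int]:
    """Total weight of a partition, or None if some block is not a clique."""
    total = 0
    for block in blocks:
        for u, v in combinations(block, 2):
            w = weight(u, v)
            if w is None:
                return None
            total += w
    return total


def best_clustering(vertices: List, weight: Weight) -> int:
    values = (partition_value(p, weight) for p in set_partitions(vertices))
    return max(v for v in values if v is not None)  # all-singletons gives 0


def reduction_gap(K: Set[str], nbrs: Dict[str, Set[str]], w: Dict[str, int]):
    """Hypothetical check of the construction. K: clique; nbrs[v]: N(v) for
    v in I; w[v]: weight of every edge at v. Returns (weighted optimum,
    optimum on the split-twin graph, sum of C(w[v], 2))."""

    def weighted(x, y):
        if x in K and y in K:
            return 1                        # clique edges have weight one
        if x in K:
            x, y = y, x                     # make x the vertex of I
        return w[x] if y in K and y in nbrs[x] else None

    def unweighted(x, y):
        if x in K and y in K:
            return 1
        if isinstance(x, str):
            x, y = y, x                     # make x a twin (v, i)
        if isinstance(y, tuple):
            return 1 if x[0] == y[0] else None   # twins of one v form a clique
        return 1 if y in nbrs[x[0]] else None    # twins keep N(v) inside K

    twins = [(v, i) for v in nbrs for i in range(w[v])]
    opt_weighted = best_clustering(list(K) + list(nbrs), weighted)
    opt_twin = best_clustering(list(K) + twins, unweighted)
    return opt_weighted, opt_twin, sum(comb(w[v], 2) for v in nbrs)
```

On small examples the second value equals the first plus the third, as the proof predicts under these assumptions.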
Assume that there is a cluster solution for with total weight at least . From , we construct a solution for having the desired number of edges. There are three types of clusters in :
- (a) A cluster formed only by vertices of the clique , i.e., , where . We keep such clusters in . We denote by the total weight of clusters of type (a). Notice that since the edges having both endpoints in all have weight one, corresponds to the number of edges in .
- (b) A cluster formed only by one vertex , i.e., . In we replace such a cluster by the corresponding clique having exactly edges. Clearly, the total weight of such clusters does not contribute to the value of .
- (c) A cluster formed by the vertices , where and . As the edges between the vertices of have weight one, the total weight of such a cluster is . Let be the total weight of clusters of type (c). In we replace by the vertices of and obtain a cluster having edges.
Now observe that in we have total weight, which implies . Thus, in we have at least edges, giving the desired bound.
For the opposite direction, assume that there is a cluster solution for having at least edges. All vertices of are true twins and, thus, by Lemma 2.1 we know that they belong to the same cluster in . Thus, any cluster of has one of the following forms: (i) , where , (ii) , (iii) , where . This means that all internal edges having both endpoints in contribute to the value of by . Moreover, observe that for any internal edge of of the form with and , we know that there are exactly internal edges incident to and the vertices of . Thus such internal edges of correspond to exactly one internal edge of having weight where (because ) and is the vertex of associated with . Hence, all internal edges outside each in correspond to either a weighted internal edge in or to the same unweighted edge of the clique in . Therefore, there is an edge weighted solution having weight at least . ∎
4.1 Polynomial-time algorithms on subclasses of split-twin graphs
Due to the hardness result given in Theorem 4.3, it is natural to consider subclasses of split-twin graphs related to the analogous subclasses of split graphs. We consider two such subclasses. The first one corresponds to the split-twin graphs in which the vertices of have no common neighbor in the clique, unless they are true or false twins. The second one corresponds to threshold graphs (i.e., split graphs in which the vertices of the independent set have nested neighborhoods) and gives the split-twin graphs in which the vertices of have nested neighborhoods. We formally define such graphs and give polynomial-time algorithms for Cluster Deletion on both graph classes. For a vertex we write to denote the set .
**Definition 4.4.**
A split-twin graph with partition on its vertices is called a 1-split-twin graph if for any two vertices , either or .
It is not difficult to see that in a 1-split-twin graph, any two vertices of having a common neighbor in have exactly the same neighborhood in .
**Theorem 4.5.**
*Cluster Deletion is polynomial-time solvable on 1-split-twin graphs.*
Proof.
Let be a 1-split-twin graph with partition . First observe that if is disconnected then contains isolated cliques, i.e., true twins having no neighbor in . Thus we can restrict ourselves to a connected graph , since by Lemma 2.1 each isolated clique is contained in exactly one cluster of an optimal solution. We now show that all vertices of that have a common neighbor in are true twins. Let and be two vertices of such that . All vertices of are adjacent to both and . Assume that there is a vertex that is adjacent to and non-adjacent to . If then by the definition of split-twin graphs and are true twins which contradicts the assumption of and . Otherwise, and are non-adjacent and since we reach a contradiction to the definition of 1-split-twin graphs. Thus, all vertices of that have a common neighbor in are true twins.
We partition the vertices of into true twin classes , such that each contains true twins of . From the previous discussion, we know that any vertex of is adjacent to all the vertices of exactly one class ; otherwise, there are vertices of different classes in that have a common neighbor. For a class , we partition the vertices of into true twin classes such that .
We claim that in an optimal solution , the vertices of each class with constitute a cluster. To see this, observe first that the vertices of , , are true twins, and by Lemma 2.1 they all belong to the same cluster of . Also, by Lemma 2.1 we know that all the vertices of belong to the same cluster of . Moreover, all vertices between different classes , are non-adjacent and are -compatible. Since every vertex of is non-adjacent to all the vertices of , we know that any cluster of that contains is of the form either or . Assume that there is a cluster that contains with . Then, we substitute the vertices of by the vertices of and obtain a solution of at least the same size, because implies . Thus, all vertices of each class with constitute a cluster in an optimal solution .
This means that we can safely remove the vertices of with , by constructing a cluster that contains only . Hence, we construct a graph from , in which there are only matched pairs of classes such that (i) all sets are non-empty except possibly the set , (ii) , (iii) , (iv) is a clique, and (v) is a clique. Our task is to solve Cluster Deletion on , since for the rest of the vertices we have already determined their cluster. By Lemma 2.1, observe that if the vertices of belong to the same cluster then the vertices of each and constitute two separate clusters. Thus, for each set of vertices we know that either one of or constitutes a cluster in . This boils down to computing a set of matched pairs from the classes having the maximum value
[TABLE]
Let and be two pairs of classes such that . We show that if then . Assume for contradiction that and . Observe that , because is -compatible. Similarly, we know that . This, however, shows that , contradicting the fact that . Thus implies .
This means that we can consider the pairs of classes in decreasing order according to their number of vertices . With a simple dynamic programming algorithm, starting from the largest ordered pair, we know that either belongs to or not. In the former case, we add to the optimal value of ; in the latter case, we know that no pair belongs to , giving a total value of . By choosing the maximum of the two values, we construct a table of size needed for the dynamic programming. Computing the twin classes and the partition takes linear time in the size of , and sorting the pairs of classes can be done in time, since is bounded by . Thus, the total running time is , as the dynamic programming for computing requires time. Therefore, all steps can be carried out in linear time for a 1-split-twin graph . ∎
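The optimization over matched pairs can also be sketched with a different, knapsack-style dynamic program indexed by how many clique vertices of unselected pairs stay in the cluster holding the rest of the clique. This is a swapped-in technique, not the authors' sort-based procedure: it runs in O(n · p) time for p pairs rather than linear time. It assumes the class sizes have already been computed, using our own notation: A_i is a true-twin class of the independent set, B_i its (pairwise disjoint) neighborhood in the clique, and b_0 the number of clique vertices with no neighbor in the independent set. Following the structure established in the proof (via Lemma 2.1), each A_i either forms the cluster A_i ∪ B_i or stays on its own, and all remaining clique vertices form one cluster. The function name is hypothetical.

```python
from math import comb
from typing import List, Tuple


def max_kept_edges_1_split_twin(b0: int, pairs: List[Tuple[int, int]]) -> int:
    """Hypothetical helper: maximum number of kept edges on a connected
    1-split-twin graph described only by class sizes.

    b0    -- clique vertices with no neighbour in the independent set
    pairs -- (|A_i|, |B_i|) for each true-twin class A_i of the independent
             set and its clique neighbourhood B_i
    """
    NEG = float("-inf")
    total_b = sum(b for _, b in pairs)
    # dp[r]: best extra gain from selected pairs when exactly r clique vertices
    # of the unselected pairs remain in the big clique cluster.
    dp = [NEG] * (total_b + 1)
    dp[0] = 0
    for a, b in pairs:
        gain = comb(a + b, 2) - comb(a, 2)  # edges of A_i ∪ B_i beyond those inside A_i
        new = [NEG] * (total_b + 1)
        for r, val in enumerate(dp):
            if val == NEG:
                continue
            new[r] = max(new[r], val + gain)     # select: A_i ∪ B_i is a cluster
            new[r + b] = max(new[r + b], val)    # skip: B_i stays with the clique
        dp = new
    inside_a = sum(comb(a, 2) for a, _ in pairs)  # every A_i is a clique kept whole
    best = max(val + comb(b0 + r, 2) for r, val in enumerate(dp) if val != NEG)
    return int(inside_a + best)
```

For example, two classes of three twins, each attached to its own single clique vertex (b0 = 0), give two 4-cliques and 12 kept edges, whereas keeping the clique together would keep only 7.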
**Definition 4.6.**
A split-twin graph with partition on its vertices is called a threshold-twin graph if the vertices of can be ordered such that for any with , we have .
**Theorem 4.7.**
*Cluster Deletion is polynomial-time solvable on threshold-twin graphs.*
Proof.
Let be a threshold-twin graph with partition . We show that there is no induced path on four vertices, P_4, in . Assume for contradiction that there is an induced P_4 in . Since is a clique and is a disjoint union of cliques, at least one of , say , belongs to . If then because , which gives a contradiction as and are not true twins. Otherwise, we have , so that because and are not true twins . The latter again results in a contradiction because and . Thus, is a P_4-free graph. Therefore, by the polynomial-time algorithm for Cluster Deletion on P_4-free graphs (cographs) [11], we obtain a solution for Cluster Deletion on . ∎
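The P_4-freeness claim can be sanity-checked by brute force. The sketch builds threshold-twin graphs in which each component of the independent side is a clique of true twins adjacent to a prefix of a fixed ordering of the clique, which is one way (assumed here) to realize nested neighborhoods, and then searches all 4-vertex subsets for an induced P_4. It does not implement the cograph algorithm of [11]; the generator and its parameters are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def threshold_twin(k: int, comps: List[Tuple[int, int]]) -> Graph:
    """Hypothetical generator. The clique is c0..c{k-1}; for each (size, t) in
    `comps`, the independent side gets a clique of `size` true twins adjacent
    to the prefix c0..c{t-1}, so clique neighbourhoods are nested."""
    adj: Graph = {}

    def add_edge(u: str, v: str) -> None:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    clique = [f"c{i}" for i in range(k)]
    for v in clique:
        adj.setdefault(v, set())
    for u, v in combinations(clique, 2):
        add_edge(u, v)
    for j, (size, t) in enumerate(comps):
        twins = [f"x{j}_{i}" for i in range(size)]
        for x in twins:
            adj.setdefault(x, set())
            for c in clique[:t]:
                add_edge(x, c)
        for u, v in combinations(twins, 2):
            add_edge(u, v)
    return adj


def has_induced_p4(adj: Graph) -> bool:
    """Brute force over 4-vertex subsets: is there a path a-b-c-d with
    no chords ac, bd, ad?"""
    for quad in combinations(adj, 4):
        for a, b, c, d in permutations(quad):
            path = b in adj[a] and c in adj[b] and d in adj[c]
            chordless = c not in adj[a] and d not in adj[b] and d not in adj[a]
            if path and chordless:
                return True
    return False
```

For contrast, a split-twin graph whose independent vertices have incomparable neighborhoods, such as x adjacent only to c0 and y adjacent only to c1, already contains the induced P_4 x, c0, c1, y.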
5 Concluding remarks
It is notable that our algorithm for interval graphs heavily relies on the linear structure obtained from their clique paths. This observation leads us to a few open questions along two main directions. On the one hand, it seems tempting to adapt our algorithm to other vertex partitioning problems on interval graphs within a more general framework, as has already been done for particular graph properties [5, 12, 19, 20, 24]. On the other hand, it is reasonable to ask whether our approach works for Cluster Deletion on graphs admitting a similar linear structure, such as permutation graphs, or graphs having a bounded linear-related parameter. Towards the latter direction, observe that Cluster Deletion, as a vertex partitioning problem, seems to be expressible in monadic second-order logic of the second type (MSO_2), with quantification over vertex sets and edge sets. Therefore, Cluster Deletion can be solved in linear time on graphs of bounded treewidth by using Courcelle's machinery [7].
Although for other structural parameters it seems rather difficult to obtain a similar result, it is still interesting to settle the complexity of Cluster Deletion on distance-hereditary graphs, which have constant clique-width [14]. In particular, we would like to settle the case in which degree-one vertices are appended to a given cograph (P_4-free graph). This case is closely related to 1-split-twin graphs, as they can be seen as a degree-one extension of a clique.
References
- [1] N. Bansal, A. Blum, and S. Chawla. Correlation clustering. Machine Learning, 56:89–113, 2004.
- [2] F. Bonomo, G. Durán, A. Napoli, and M. Valencia-Pabon. A one-to-one correspondence between potential solutions of the cluster deletion problem and the minimum sum coloring problem, and its application to P_4-sparse graphs. Inf. Proc. Lett., 115:600–603, 2015.
- [3] F. Bonomo, G. Durán, and M. Valencia-Pabon. Complexity of the cluster deletion problem on subclasses of chordal graphs. Theor. Comp. Science, 600:59–69, 2015.
- [4] A. Brandstädt, V. B. Le, and J. Spinrad. Graph Classes: A Survey. Society for Industrial and Applied Mathematics, 1999.
- [5] B. Bui-Xuan, J. A. Telle, and M. Vatshelle. Fast dynamic programming for locally checkable vertex subset and vertex partitioning problems. Theor. Comput. Sci., 511:66–76, 2013.
- [6] M. Charikar, V. Guruswami, and A. Wirth. Clustering with qualitative information. In Proceedings of FOCS 2003, pages 524–533, 2003.
- [7] B. Courcelle. The monadic second-order logic of graphs I: Recognizable sets of finite graphs. Information and Computation, 85:12–75, 1990.
- [8] A. Dessmark, J. Jansson, A. Lingas, E.-M. Lundell, and M. Persson. On the approximability of maximum and minimum edge clique partition problems. Int. J. Found. Comput. Sci., 18:217–226, 2007.
