Convex optimization for the densest subgraph and densest submatrix problems

Polina Bombina; Brendan Ames

arXiv:1904.03272·math.OC·April 9, 2019

TL;DR

This paper introduces a convex relaxation approach using nuclear norm minimization to efficiently identify dense subgraphs in large graphs, providing theoretical guarantees and empirical validation for its effectiveness.

Contribution

It proposes a novel convex relaxation method for the densest subgraph problem, with theoretical recovery guarantees and a practical first-order algorithm.

Findings

01

Exact recovery of dense subgraphs under certain probabilistic models

02

The proposed method outperforms traditional approaches in simulated and real-world networks

03

Empirical results confirm the theoretical recovery thresholds

Tables

Table 1: Densest subgraphs extracted with ADMM.

Graph	Number of vertices	Number of edges	Size of dense subgraph	Running time (sec)
JAZZ	198	2742	100	0.605735
EMAIL	1133	5451	289	20.139186

Equations

[P_{\Omega}(\boldsymbol{M})]_{ij}=\left\{\begin{array}{rl}M_{ij},&\mbox{if }(i,j)\in\Omega\\ 0,&\mbox{otherwise.}\end{array}\right.

d(G(\hat{V}))=\frac{1}{k}\left(\binom{k}{2}-\frac{1}{2}\|P_{\Omega}(\hat{\boldsymbol{X}})\|_{0}\right),

\begin{array}{ll}\min&\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})\geq k^{2},\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\operatorname{rank}(\boldsymbol{X})=1\\ &\boldsymbol{X}=\boldsymbol{X}^{T},\;\boldsymbol{Y}=\boldsymbol{Y}^{T},\;\boldsymbol{X}\in\{0,1\}^{V\times V},\;\boldsymbol{Y}\in\{0,1\}^{V\times V}.\end{array}

\begin{array}{ll}\min&\|\boldsymbol{X}\|_{*}+\gamma\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=k^{2},\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\boldsymbol{0}\leq\boldsymbol{X}\leq\boldsymbol{e}\boldsymbol{e}^{T},\;\boldsymbol{0}\leq\boldsymbol{Y},\end{array}

q-p\geq c_{1}\max\left\{\max\{\sigma_{q}^{2},\sigma_{p}^{2}\}\frac{\log N}{k},\;\frac{\log N}{k}\sigma_{p}^{2}N,\;\frac{(\log N)^{3/2}}{k}\right\}

\gamma\in\left(\frac{c_{2}}{(q-p)k},\frac{c_{3}}{(q-p)k}\right),

\begin{array}{cl}\displaystyle\min_{\boldsymbol{X},\boldsymbol{Y}\in\{0,1\}^{M\times N}}&\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=mn,\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\operatorname{rank}\boldsymbol{X}=1,\end{array}

\begin{array}{cl}\displaystyle\min&\|\boldsymbol{X}\|_{*}+\gamma\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=mn,\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\boldsymbol{0}\leq\boldsymbol{X}\leq\boldsymbol{e}\boldsymbol{e}^{T},\;\boldsymbol{0}\leq\boldsymbol{Y},\end{array}

q-p\geq c_{1}\max\left\{\max\{\sigma_{q}^{2},\sigma_{p}^{2}\}\frac{\log N_{\max}}{n_{\min}},\;\frac{\log N_{\max}}{n_{\min}}\sigma_{p}^{2}N_{\max},\;\frac{(\log N_{\max})^{3/2}}{n_{\min}}\right\}

\frac{\bar{\boldsymbol{u}}\bar{\boldsymbol{v}}^{T}}{\sqrt{mn}}+\boldsymbol{W}-\lambda\boldsymbol{e}\boldsymbol{e}^{T}+\gamma\boldsymbol{e}\boldsymbol{e}^{T}-\boldsymbol{\Xi}+\boldsymbol{\Lambda}

\operatorname{Tr}(\boldsymbol{\Lambda}^{T}(\bar{\boldsymbol{X}}-\boldsymbol{e}\boldsymbol{e}^{T}))

\operatorname{Tr}(\boldsymbol{\Xi}^{T}\bar{\boldsymbol{Y}})

\boldsymbol{W}^{T}\bar{\boldsymbol{u}}=\boldsymbol{0},\;\boldsymbol{W}\bar{\boldsymbol{v}}=\boldsymbol{0},\;\|\boldsymbol{W}\|

W_{ij}=\lambda-\frac{1}{\sqrt{mn}}-\Lambda_{ij}=:\tilde{\lambda}-\Lambda_{ij},

W_{ij}=-\lambda\left(\frac{\nu_{j}}{m-\nu_{j}}\right),

\Xi_{ij}=\gamma-\frac{\lambda m}{m-\nu_{j}}.

W_{ij}=-\frac{\lambda\mu_{i}}{n-\mu_{i}},\qquad\Xi_{ij}=\gamma-\frac{\lambda n}{n-\mu_{i}},

\left(\begin{array}{cc}n\boldsymbol{I}&\boldsymbol{e}\boldsymbol{e}^{T}\\ \boldsymbol{e}\boldsymbol{e}^{T}&m\boldsymbol{I}\end{array}\right)\left(\begin{array}{c}\boldsymbol{y}\\ \boldsymbol{z}\end{array}\right)=\left(\begin{array}{c}-\gamma\boldsymbol{\bar{\mu}}+n\tilde{\lambda}\boldsymbol{e}\\ -\gamma\boldsymbol{\bar{\nu}}+m\tilde{\lambda}\boldsymbol{e}\end{array}\right),

\left(\begin{array}{cc}n\boldsymbol{I}+\boldsymbol{e}\boldsymbol{e}^{T}&\boldsymbol{0}\\ \boldsymbol{0}&m\boldsymbol{I}+\boldsymbol{e}\boldsymbol{e}^{T}\end{array}\right)\left(\begin{array}{c}\boldsymbol{y}\\ \boldsymbol{z}\end{array}\right)=\left(\begin{array}{c}-\gamma\boldsymbol{\bar{\mu}}+n\tilde{\lambda}\boldsymbol{e}\\ -\gamma\boldsymbol{\bar{\nu}}+m\tilde{\lambda}\boldsymbol{e}\end{array}\right)

\boldsymbol{y}=\frac{1}{n}\left(\tilde{\lambda}\frac{n^{2}}{m+n}-\gamma\bar{\boldsymbol{\mu}}+\gamma\frac{\bar{\boldsymbol{\mu}}^{T}\boldsymbol{e}}{m+n}\boldsymbol{e}\right),\qquad\boldsymbol{z}=\frac{1}{m}\left(\tilde{\lambda}\frac{m^{2}}{m+n}-\gamma\bar{\boldsymbol{\nu}}+\gamma\frac{\bar{\boldsymbol{\nu}}^{T}\boldsymbol{e}}{m+n}\boldsymbol{e}\right).

E[\boldsymbol{y}]=\frac{n}{m+n}\left(\tilde{\lambda}-\gamma(1-q)\right)\boldsymbol{e},\qquad E[\boldsymbol{z}]=\frac{m}{m+n}\left(\tilde{\lambda}-\gamma(1-q)\right)\boldsymbol{e}.

\Pr\left(|s-\rho k|>6\max\{\rho(1-\rho)k\log t,\;\log t\}\right)\leq 2t^{-6}.

|\bar{\mu}_{i}-(1-q)n|

|\bar{\nu}_{j}-(1-q)m|

|\bar{\boldsymbol{\nu}}^{T}\boldsymbol{e}-(1-q)mn|=|\bar{\boldsymbol{\mu}}^{T}\boldsymbol{e}-(1-q)mn|\leq 6\max\{\sigma_{q}^{2}mn\log N,\;\log N\}

|y_{i}-E[y_{i}]|\leq 6\gamma\left(1+\frac{1}{m}\right)\max\left\{\sigma_{q}^{2}\frac{\log N}{n},\;\frac{\log N}{n}\right\}

|z_{i}-E[z_{i}]|\leq 6\gamma\left(1+\frac{1}{n}\right)\max\left\{\sigma_{q}^{2}\frac{\log N}{m},\;\frac{\log N}{m}\right\}

\Lambda_{ij}\geq\gamma\tau-12\gamma\left(1+\frac{1}{m}\right)\max\left\{\sigma_{q}^{2}\frac{\log N}{m},\;\frac{\log N}{m}\right\}

\tau=12\left(1+\frac{1}{m}\right)\max\left\{\sigma_{q}^{2}\frac{\log N}{m},\;\frac{\log N}{m}\right\}

\Xi_{ij}=\gamma-\frac{\lambda m}{m-\nu_{j}}=\frac{1}{m-\nu_{j}}\left(\gamma(mq-\nu_{j})-\gamma m\tau-\frac{m}{\sqrt{mn}}\right)

\Xi_{ij}\geq\frac{m}{m-\nu_{j}}\left(\gamma\left(q-p-6\max\left\{\sigma_{p}^{2}\frac{\log N}{m},\;\frac{\log N}{m}\right\}-\tau\right)-\frac{1}{\sqrt{mn}}\right)

Code & Models

Repositories

pbombina/admmdsm
none

Full text

Convex optimization for the densest subgraph and densest submatrix problems

Polina Bombina, Department of Mathematics, University of Alabama, Tuscaloosa, Alabama

Brendan Ames, University of Alabama, Tuscaloosa, Alabama

Abstract

We consider the densest $k$-subgraph problem, which seeks to identify the $k$-node subgraph of a given input graph with maximum number of edges. This problem is well known to be NP-hard, by reduction from the maximum clique problem. To partially address this intractability, we propose a new convex relaxation for the densest $k$-subgraph problem, based on a nuclear norm relaxation of a low-rank plus sparse decomposition of the adjacency matrices of $k$-node subgraphs. We establish that the densest $k$-subgraph can be recovered with high probability from the optimal solution of this convex relaxation if the input graph is randomly sampled from a distribution of random graphs constructed to contain an especially dense $k$-node subgraph with high probability. Specifically, the relaxation is exact when the edges of the input graph are added independently at random, with edges within a particular $k$-node subgraph added with higher probability than other edges in the graph. We provide a sufficient condition on the size of this subgraph $k$ and the expected density under which the optimal solution of the proposed relaxation recovers this $k$-node subgraph with high probability. Further, we propose a first-order method for solving this relaxation based on the alternating direction method of multipliers, and empirically confirm our predicted recovery thresholds using simulations involving randomly generated graphs, as well as graphs drawn from social and collaborative networks.

1 Introduction

We consider the densest $k$-subgraph problem: given a graph $G=(V,E)$, identify the $k$-node subgraph of $G$ of maximum density, i.e., maximum average degree. Equivalently, the problem reduces to finding the $k$-node subgraph of $G$ with maximum number of edges. It is easy to see that the densest $k$-subgraph problem is NP-hard by reduction from the maximum clique problem, which is well known to be NP-hard [1]. Indeed, if $G$ contains a clique of size $k$, it would induce the densest $k$-subgraph of $G$; any polynomial-time algorithm for densest $k$-subgraph would immediately provide a polynomial-time algorithm for maximum clique. Moreover, it has been shown by [2, 3, 4] that the densest $k$-subgraph problem does not admit polynomial-time approximation schemes in general. Despite this intractability, the identification of dense subgraphs plays a significant role in many practical applications, especially in the analysis of web graphs, and social and biological networks [5, 6, 7, 8, 9, 10].
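The reduction argument above can be made concrete with a small sketch. The brute-force search below is not the method proposed in this paper; it is an exponential-time illustration, on a hypothetical 5-node graph, of why a densest $k$-subgraph oracle decides whether a $k$-clique exists. The function names `densest_k_subgraph` and `has_k_clique` are illustrative choices.

```python
from itertools import combinations


def densest_k_subgraph(nodes, edges, k):
    """Exhaustively search all k-node subsets (exponential time, tiny graphs only)."""
    edge_set = {frozenset(e) for e in edges}
    best_nodes, best_count = None, -1
    for subset in combinations(nodes, k):
        # Count the edges with both endpoints inside the candidate subset.
        count = sum(1 for pair in combinations(subset, 2)
                    if frozenset(pair) in edge_set)
        if count > best_count:
            best_nodes, best_count = subset, count
    return best_nodes, best_count


def has_k_clique(nodes, edges, k):
    """A k-clique exists iff the densest k-subgraph contains all k(k-1)/2 edges."""
    _, count = densest_k_subgraph(nodes, edges, k)
    return count == k * (k - 1) // 2


# Hypothetical graph: a triangle {0, 1, 2} with a path 2-3-4 attached.
nodes = range(5)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
best_nodes, best_count = densest_k_subgraph(nodes, edges, 3)
print(best_nodes, best_count)
print(has_k_clique(nodes, edges, 3), has_k_clique(nodes, edges, 4))
```

The search visits every $k$-subset, which is exactly the cost the convex relaxation is meant to avoid on favorable instances.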

We propose a new convex relaxation for the densest $k$ -subgraph problem to address this intractability. Although we do not, and should not, expect our algorithm to provide a good approximation of the densest $k$ -subgraph for all graphs, we will show that it is functionally equivalent to the densest $k$ -subgraph problem for a large class of program instances. In particular, suppose that the random input graph consists of a $k$ -node subgraph $H$ with edges added with significantly higher probability than those edges outside $H$ . We will show that if $k$ is sufficiently large then $H$ is the densest $k$ -subgraph of $G$ , and it can be recovered from the optimal solution of our convex relaxation.

This result can be thought of as a specialization of recent developments regarding the recovery of clusters in graphs. In graph clustering, one seeks to partition the nodes of a given graph into dense subgraphs. Several recent results [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28], among others, have established sufficient conditions on the generative model under which dense subgraphs can be recovered in a random graph, typically from the solution of some convex relaxation. These results assume that the random graph is generated using some generalization of the stochastic block model (see [29]), which assumes that the edges are added within blocks or clusters with higher frequency than between blocks, and provide sufficient conditions on the number and relative sizes of clusters, and the probabilities of adding edges that guarantee that the underlying block structure can be recovered in polynomial-time. The recent survey article [30] provides an overview of such recovery guarantees.

Relatively few analogous results exist for the densest $k$-subgraph problem. Ames and Vavasis [31, 32] consider convex relaxations for the maximum clique problem. Given an input graph, the maximum clique problem aims to identify the largest clique in the graph, that is, the vertex set of the largest complete subgraph (see [33, 34] for further discussion of the maximum clique problem). Ames and Vavasis [31, 32] establish that the maximum clique can be recovered from the optimal solution of a particular convex relaxation if the input graph consists of a single large complete graph that is obscured by noise in the form of random edge additions and deletions. In particular, both results show that hidden cliques of size at least $\Omega(\sqrt{n})$ can be identified with high probability for $n$-node random graphs constructed so that the probability of adding an edge between nodes in the hidden clique is significantly higher than adding other potential edges to the graph. The notation $f(x)=\Omega(g(x))$ indicates that there is some constant $C>0$ such that $f(x)\geq Cg(x)$ for all sufficiently large $x$. We say that an event occurs with high probability if the probability of the event tends to $1$ polynomially as the size of the graph $N$ tends to $\infty$. The latter result recasts the hidden clique problem as that of finding the densest $k$-subgraph, where $k$ is the size of the hidden clique. Similar theoretical recovery guarantees can be found in [35, 36, 37, 38, 39, 40]. We delay the derivation of our convex relaxation and statement of the general recovery guarantee until Section 2.

We generalize these results for the densest subgraph problem to obtain an analogous recovery guarantee for the densest submatrix problem, which seeks to find the densest submatrix of given size. That is, we seek the submatrix of desired size with maximum number of nonzero entries. Similar results for the submatrix localization problem, where one seeks to find a block of entries with elevated mean in a random matrix, were presented in [41, 42, 43]. We will see that our convex relaxation correctly identifies the densest submatrix (of fixed size) in random matrices provided that entries within this submatrix are significantly more likely to be nonzero than an arbitrary entry of the matrix. We present our generalization of the densest subgraph problem to the densest submatrix problem and the statement of our theoretical recovery guarantees in Section 3. We provide proofs of our main results in Section 4 and conclude with discussion of a first-order method for solving our convex relaxations and empirical results illustrating efficacy of our approach in Section 5.

2 Relaxation of the Densest $k$-Subgraph Problem and Perfect Recovery of a Planted Clique

Our relaxation hinges on the observation, made in [32], that the adjacency matrix of any subgraph of $G$ can be represented as the difference of a rank-one matrix and a binary correction matrix; this observation is closely related to the sparse plus low-rank decomposition of clustered graphs first considered in [25, 44], although with the restriction to submatrices of the adjacency matrix. Let $\hat{V}\subseteq V$ be a subset of nodes of $G=(V,E)$. We denote by $G(\hat{V})$ the subgraph induced by $\hat{V}$; that is, $G(\hat{V})$ is the graph with node set $\hat{V}$ and edge set given by the subset of $E$ with both endpoints in $\hat{V}$. Let $\boldsymbol{v}\in\mathbf{R}^{V}$ be the characteristic vector of $\hat{V}$: $v_{i}=1$ if $i\in\hat{V}$ and $v_{i}=0$ otherwise for all $i\in V$. The matrix $\hat{\boldsymbol{X}}=\boldsymbol{v}\boldsymbol{v}^{T}$ is a rank-one binary matrix with nonzero entries indexed by $\hat{V}\times\hat{V}$. If $G(\hat{V})$ is a complete subgraph, i.e., $ij\in E$ for all $i,j\in\hat{V}$, then $\hat{\boldsymbol{X}}$ is equal to the sum of the adjacency matrix of $G(\hat{V})$ and the binary diagonal matrix $I_{\hat{V}}$ with nonzero entries indexed by $\hat{V}$; we call this sum the perturbed adjacency matrix of $G(\hat{V})$, and denote it by $\boldsymbol{\tilde{A}_{G(\hat{V})}}$.

If $G(\hat{V})$ is not a complete subgraph, then there is some $(i,j)\in V\times V$ , $i\neq j$ such that $ij\notin E$ . Let $\Omega$ denote the set of all such $(i,j)$ . For each $(i,j)\in\Omega$ , we have $\hat{X}_{ij}=1$ , while ${[\boldsymbol{\tilde{A}_{G(\hat{V})}}]}_{ij}=0$ . It follows that $\boldsymbol{\tilde{A}_{G(\hat{V})}}=\hat{\boldsymbol{X}}-P_{\Omega}(\hat{\boldsymbol{X}}),$ where $P_{\Omega}$ is the projection onto the set of matrices having support contained in $\Omega$ , defined by

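$$[P_{\Omega}(\boldsymbol{M})]_{ij}=\begin{cases}M_{ij},&\text{if }(i,j)\in\Omega\\ 0,&\text{otherwise.}\end{cases}$$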

We call $(\hat{\boldsymbol{X}},\hat{\boldsymbol{Y}}):=(\hat{\boldsymbol{X}},P_{\Omega}(\hat{\boldsymbol{X}}))$ the matrix representation of the subgraph $G(\hat{V})$ . The density of $G(\hat{V})$ is given by

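$$d(G(\hat{V}))=\frac{1}{k}\left(\binom{k}{2}-\frac{1}{2}\|P_{\Omega}(\hat{\boldsymbol{X}})\|_{0}\right),$$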

where $\|\boldsymbol{M}\|_{0}$ denotes the number of nonzero entries of $\boldsymbol{M}$, because $\|P_{\Omega}(\hat{\boldsymbol{X}})\|_{0}$ is equal to twice the number of pairs of nonadjacent nodes in $\hat{V}$. Moreover, the entries of the correction matrix $P_{\Omega}(\hat{\boldsymbol{X}})$ are binary, which implies that $\|P_{\Omega}(\hat{\boldsymbol{X}})\|_{0}=\sum_{i,j\in V}{[P_{\Omega}(\hat{\boldsymbol{X}})]}_{ij}$. Therefore, we may pose the densest $k$-subgraph problem as the rank-constrained binary program

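$$\begin{array}{ll}\min&\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})\geq k^{2},\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\operatorname{rank}(\boldsymbol{X})=1\\ &\boldsymbol{X}=\boldsymbol{X}^{T},\;\boldsymbol{Y}=\boldsymbol{Y}^{T},\;\boldsymbol{X}\in\{0,1\}^{V\times V},\;\boldsymbol{Y}\in\{0,1\}^{V\times V}.\end{array}\tag{1}$$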

where $\operatorname{Tr}:\mathbf{R}^{n\times n}\rightarrow\mathbf{R}$ denotes the matrix trace function, and $\boldsymbol{e}$ denotes the all-ones vector in $\mathbf{R}^{V}$. Unfortunately, combinatorial optimization problems involving rank and binary constraints are intractable in general. In particular, the densest $k$-subgraph problem is NP-hard and, hence, we cannot expect to be able to solve (1) efficiently. Relaxing the rank constraint with a nuclear norm penalty term given by $\|\boldsymbol{X}\|_{*}=\sum_{i=1}^{N}\sigma_{i}(\boldsymbol{X})$ as in [45], replacing the binary constraints with box constraints, and ignoring the symmetry constraints yields the convex program

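$$\begin{array}{ll}\min&\|\boldsymbol{X}\|_{*}+\gamma\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=k^{2},\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\boldsymbol{0}\leq\boldsymbol{X}\leq\boldsymbol{e}\boldsymbol{e}^{T},\;\boldsymbol{0}\leq\boldsymbol{Y},\end{array}\tag{2}$$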

where $\gamma>0$ is a regularization parameter chosen to control emphasis between the two objectives.
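As a sanity check on the construction in this section, the following sketch builds the matrix representation $(\hat{\boldsymbol{X}},\hat{\boldsymbol{Y}})$ of a node subset, evaluates the density identity, and evaluates the objective of the convex program at that point. The 5-node graph, the helper names, and the value of `gamma` are assumptions made for illustration. The objective uses the fact that $\boldsymbol{v}\boldsymbol{v}^{T}$ has a single nonzero singular value $\|\boldsymbol{v}\|^{2}=k$, so no SVD is needed for this special case.

```python
def matrix_representation(num_nodes, edges, subset):
    """Return (X, Y) with X = v v^T and Y = P_Omega(X) for the node set `subset`."""
    adjacent = {frozenset(e) for e in edges}
    v = [1 if i in subset else 0 for i in range(num_nodes)]
    X = [[v[i] * v[j] for j in range(num_nodes)] for i in range(num_nodes)]
    # Omega holds the ordered pairs (i, j), i != j, with ij not an edge.
    Y = [[X[i][j] if (i != j and frozenset((i, j)) not in adjacent) else 0
          for j in range(num_nodes)] for i in range(num_nodes)]
    return X, Y


def density(Y, k):
    """d = (C(k, 2) - ||P_Omega(X)||_0 / 2) / k, using the nonzero count of Y."""
    nonzeros = sum(1 for row in Y for entry in row if entry != 0)
    return (k * (k - 1) / 2 - nonzeros / 2) / k


def relaxation_objective(Y, k, gamma):
    """Objective ||X||_* + gamma * Tr(Y e e^T), valid only for X = v v^T
    with a 0/1 vector v having k ones (then ||X||_* = k)."""
    return k + gamma * sum(map(sum, Y))


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # hypothetical 5-node graph
subset, k = {1, 2, 3}, 3
X, Y = matrix_representation(5, edges, subset)
total_X = sum(map(sum, X))          # Tr(X e e^T), equals k^2 when feasible
print(total_X, density(Y, k), relaxation_objective(Y, k, gamma=0.5))
```

Here the subset $\{1,2,3\}$ spans two of its three possible edges, so $\hat{\boldsymbol{Y}}$ has two nonzero entries (one missing pair, counted in both orders).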

As mentioned earlier, we do not expect the solution of (2) to give a good approximation of the densest $k$-subgraph for an arbitrary graph. We instead restrict our focus to those graphs which we can expect to contain a single especially dense $k$-subgraph with high probability.

Definition 2.1

We construct the edge set of an $N$-node random graph $G=(V,E)$ as follows. Let $V^{*}\subseteq V$ be a $k$-subset of nodes; for each $(i,j)\in V^{*}\times V^{*}$, we add $ij$ to $E$ independently with probability $q$. For each $(i,j)\in(V\times V)-(V^{*}\times V^{*})$, we add $ij$ to $E$ independently with probability $p<q$. We say such a graph $G$ is sampled from the planted dense $k$-subgraph model.

This model, considered in [32], is a generalization of the planted clique model considered in [31], where $q$ is chosen to be $q=1$. On the other hand, the planted dense $k$-subgraph model is a special case of the generalized stochastic block model [44], corresponding to a graph with exactly one cluster of size $k$ and $N-k$ outlier nodes. Note that any graph $G$ sampled from the planted dense $k$-subgraph model contains a $k$-subgraph, $G(V^{*})$, with higher density than the rest of the graph in expectation. Our goal is to derive conditions on the size $k$ of this subgraph and the edge densities $p$ and $q$ that ensure recovery of the planted subgraph $G(V^{*})$ from the optimal solution of (2). The following theorem provides such a sufficient condition.
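A minimal sampler for the planted dense $k$-subgraph model of Definition 2.1 might look as follows. It assumes an undirected simple graph, so it flips one coin per unordered pair and adds no self-loops, and it places $V^{*}$ on nodes $0,\dots,k-1$; these choices, the function name, and the parameter values in the usage lines are illustrative and not taken from the paper's experiments.

```python
import random


def sample_planted_dense_subgraph(N, k, p, q, seed=0):
    """Sample an edge set from the planted dense k-subgraph model.

    Nodes 0..k-1 play the role of V* (an arbitrary labelling chosen here).
    """
    rng = random.Random(seed)
    planted = set(range(k))
    edges = set()
    for i in range(N):
        for j in range(i + 1, N):          # one coin flip per unordered pair
            prob = q if (i in planted and j in planted) else p
            if rng.random() < prob:
                edges.add((i, j))
    return planted, edges


# Usage with hypothetical parameters: compare empirical edge frequencies.
planted, edges = sample_planted_dense_subgraph(N=200, k=40, p=0.1, q=0.8)
inside = sum(1 for (i, j) in edges if i in planted and j in planted)
pairs_inside = 40 * 39 // 2
pairs_total = 200 * 199 // 2
print(inside / pairs_inside, (len(edges) - inside) / (pairs_total - pairs_inside))
```

Setting `q=1` and `p=0` reduces the sampler to a noiseless planted clique.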

Theorem 2.1

Suppose that the $N$ -node graph $G=(V,E)$ is sampled from the planted dense $k$ -subgraph model with edge probabilities $q$ and $p$ respectively. Let $(\boldsymbol{X}^{*},\boldsymbol{Y}^{*})$ denote the matrix representation of the planted dense $k$ -subgraph $G(V^{*})$ . Then constants $c_{1},c_{2},c_{3}>0$ exist such that if

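$$q-p\geq c_{1}\max\left\{\max\{\sigma_{q}^{2},\sigma_{p}^{2}\}\frac{\log N}{k},\;\frac{\log N}{k}\sigma_{p}^{2}N,\;\frac{(\log N)^{3/2}}{k}\right\}\tag{3}$$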

then $(\boldsymbol{X}^{*},\boldsymbol{Y}^{*})$ is the unique optimal solution of (2) for penalty parameter

$$\gamma\in\left(\frac{c_{2}}{(q-p)k},\frac{c_{3}}{(q-p)k}\right),\tag{4}$$

and $G(V^{*})$ is the unique densest $k$ -subgraph of $G$ with high probability; here $\sigma_{q}^{2}$ and $\sigma_{p}^{2}$ are equal to the edge creation variances $q(1-q)$ and $p(1-p)$ inside and outside of the planted dense $k$ -subgraph, respectively.

Here, and in the rest of the paper, an event holding with high probability (w.h.p.), means that the event occurs with probability tending polynomially to one as $N\rightarrow\infty$ ; that is, there are scalars $\hat{c}_{1},\hat{c}_{2}>0$ such that the event occurs with probability at least $1-\hat{c}_{1}N^{-\hat{c}_{2}}$ . Note that (3) is only satisfiable when $k=\Omega((\log N)^{3/2})$ . To illustrate the contribution of Theorem 2.1 we consider a few choices of $p,q$ , and $k$ .

First, suppose that $p$ and $q$ are fixed so that the edge densities in $G$ are fixed as we vary $N$ . In this case, Theorem 2.1 states that we may recover $G(V^{*})$ , with high probability, provided that $k=\Omega\left((\log N)^{3/2}\right)$ . This bound is identical to that found many times in the planted clique literature [31, 46, 47, 48, 35], up to constants and the logarithmic term. It is widely believed that finding planted cliques of size $o(\sqrt{N})$ is intractable; indeed, several heuristic approaches have recently been proven to fail to recover planted cliques of size $o(\sqrt{N})$ in polynomial-time [3, 2, 4] and this intractability has been exploited in cryptographic applications [49].

On the other hand, our bound shows that planted cliques of size much smaller than $\sqrt{N}$ can be recovered in the presence of sparse noise. This should not be seen as a proof that we can recover planted cliques of size $o(\sqrt{N})$ in general, but rather as evidence of the intimate relationship between the size of hidden cliques recoverable and the noise obscuring them. If very little noise in the form of diversionary edges is hiding the signal, here the planted clique, we should expect the signal to be significantly easier to recover. This is reflected in the fact that we can recover significantly smaller cliques than $o(\sqrt{N})$ in this setting. For example, let $q$ be a fixed constant and let $p$ vary with $N$ such that $p\leq\log N/N$. The probability of adding an edge outside of $G(V^{*})$ tends to zero as $k,N\rightarrow\infty$. Further, the left-hand side of (3) tends to $q$ as $N\rightarrow\infty$, and the dominant term in the right-hand side is $(\log N)^{3/2}/k$. This implies that we can have exact recovery of the hidden clique w.h.p. provided $k=\Omega\left((\log N)^{3/2}\right)$. This lower bound on the size of recoverable $k$-subgraph matches that for identifying clusters in sparse graphs provided in the graph clustering literature, albeit for the case where the graph contains the single cluster indexed by $V^{*}$ surrounded by many outlier nodes (see [30] and the references within). Moreover, this lower bound improves significantly upon that given by [32], where it is shown that $k=\Omega(N^{1/3})$ is sufficient for exact recovery w.h.p. in the presence of sparse noise.

3 The Densest Submatrix Problem

The densest $k$-subgraph problem is a specialization of the far more general densest submatrix problem. Let ${[M]}=\{1,2,\dots,M\}$ for each positive integer $M$. Given a matrix $\boldsymbol{A}\in\mathbf{R}^{M\times N}$, the densest $m\times n$-submatrix problem seeks subsets $\bar{U}\subseteq{[M]}$ and $\bar{V}\subseteq{[N]}$ of cardinality $|\bar{U}|=m$ and $|\bar{V}|=n$, respectively, such that the submatrix $\boldsymbol{A}{[\bar{U},\bar{V}]}$ with rows indexed by $\bar{U}$ and columns indexed by $\bar{V}$ contains the maximum number of nonzero entries. It should be clear that this specializes immediately to the densest $k$-subgraph problem when the input matrix is the perturbed adjacency matrix $\boldsymbol{A}=\boldsymbol{A}_{G}+\boldsymbol{I}$ of the input graph and $m=n=k$. However, the densest $m\times n$-submatrix problem allows far more flexible problem settings.

For example, the densest submatrix problem also specializes immediately to the maximum edge/density biclique problem. Let $G=(U,V,E)$ be a bipartite graph. Given integers $m,n$, the decision version of the maximum edge biclique problem determines if $G$ contains an $m\times n$ biclique, i.e., whether there are vertex sets $\bar{U}\subseteq U$, $\bar{V}\subseteq V$ of cardinality $|\bar{U}|=m$ and $|\bar{V}|=n$ such that each vertex in $\bar{U}$ is adjacent to every vertex in $\bar{V}$. This problem immediately specializes to the densest $m\times n$-submatrix problem with $\boldsymbol{A}$ equal to the $(U,V)$-block of the adjacency matrix of $G$. Similar specializations exist for finding the densest subgraph in directed graphs, hypergraphs, and so on.
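The problem statement and the biclique specialization can be illustrated by exhaustive search on a tiny instance. This is only a definitional sketch with exponential cost, not the relaxation studied here; the matrix `A` and the function name are hypothetical.

```python
from itertools import combinations


def densest_submatrix(A, m, n):
    """Exhaustive search over row subsets of size m and column subsets of size n.

    Returns (rows, cols, nnz) for the first block found with the most nonzeros.
    """
    M, N = len(A), len(A[0])
    best = (None, None, -1)
    for rows in combinations(range(M), m):
        for cols in combinations(range(N), n):
            nnz = sum(1 for i in rows for j in cols if A[i][j] != 0)
            if nnz > best[2]:
                best = (rows, cols, nnz)
    return best


# Hypothetical (U, V)-block of a bipartite adjacency matrix: 3 rows, 4 columns.
A = [
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
]
rows, cols, nnz = densest_submatrix(A, m=2, n=3)
contains_biclique = (nnz == 2 * 3)   # an m x n biclique is an all-nonzero block
print(rows, cols, nnz, contains_biclique)
```

The decision version of the biclique problem is answered by comparing the best nonzero count with $mn$.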

Let $\Omega$ denote the index set of zero entries of a given matrix $\boldsymbol{A}\in\mathbf{R}^{M\times N}$. Without loss of generality, we may assume that the entries of $\boldsymbol{A}$ are binary. If not, then we may replace $\boldsymbol{A}$ with the binary matrix having the same sparsity pattern without changing the index set of the densest $m\times n$-submatrix. We would like to obtain a rank-one matrix $\boldsymbol{X}$ with $mn$ nonzero entries with minimum number of disagreements with $\boldsymbol{A}$ on $\Omega$:

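$$\begin{array}{cl}\displaystyle\min_{\boldsymbol{X},\boldsymbol{Y}\in\{0,1\}^{M\times N}}&\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=mn,\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\operatorname{rank}\boldsymbol{X}=1,\end{array}\tag{5}$$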

where $\boldsymbol{Y}$ is used to count the number of disagreements between $\boldsymbol{A}$ and $\boldsymbol{X}$ . Relaxing binary and rank constraints as before, we obtain the convex relaxation

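$$\begin{array}{cl}\displaystyle\min&\|\boldsymbol{X}\|_{*}+\gamma\operatorname{Tr}(\boldsymbol{Y}\boldsymbol{e}\boldsymbol{e}^{T})\\ \operatorname{s.t.}&\operatorname{Tr}(\boldsymbol{X}\boldsymbol{e}\boldsymbol{e}^{T})=mn,\;P_{\Omega}(\boldsymbol{X}-\boldsymbol{Y})=\boldsymbol{0},\;\boldsymbol{0}\leq\boldsymbol{X}\leq\boldsymbol{e}\boldsymbol{e}^{T},\;\boldsymbol{0}\leq\boldsymbol{Y},\end{array}\tag{6}$$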

where $\gamma>0$ is a regularization parameter chosen to tune between the two objectives. As before, we should expect to recover the solution of (5) from that of (6) when $\boldsymbol{A}$ contains a single large dense $m\times n$ block. The following definition proposes a class of random matrices with this property.

Definition 3.1

We construct an $M\times N$ random binary matrix $\boldsymbol{A}$ as follows. Let $U^{*}\subseteq{[M]}$ and $V^{*}\subseteq{[N]}$ be $m$ and $n$-index sets. For each $i\in U^{*}$ and $j\in V^{*}$, we let $a_{ij}=1$ with probability $q$ and $a_{ij}=0$ otherwise. For each remaining $(i,j)$ we set $a_{ij}=1$ with probability $p<q$ and take $a_{ij}=0$ otherwise. We say such a matrix $\boldsymbol{A}$ is sampled from the planted dense $m\times n$-submatrix model.
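Definition 3.1 translates directly into a short sampler. The sketch below places $U^{*}$ and $V^{*}$ on the leading rows and columns, which is an arbitrary labelling chosen for illustration, and the helper `block_density` and the parameter values are likewise hypothetical.

```python
import random


def sample_planted_dense_submatrix(M, N, m, n, p, q, seed=0):
    """Sample a binary M x N matrix from the planted dense m x n-submatrix model.

    Rows 0..m-1 and columns 0..n-1 play the roles of U* and V*.
    """
    rng = random.Random(seed)
    U_star, V_star = set(range(m)), set(range(n))
    A = [[1 if rng.random() < (q if (i in U_star and j in V_star) else p) else 0
          for j in range(N)] for i in range(M)]
    return A, U_star, V_star


def block_density(A, rows, cols):
    """Fraction of nonzero entries in the block A[rows, cols]."""
    rows, cols = list(rows), list(cols)
    return sum(A[i][j] for i in rows for j in cols) / (len(rows) * len(cols))


A, U_star, V_star = sample_planted_dense_submatrix(60, 80, 15, 20, p=0.1, q=0.9)
print(block_density(A, U_star, V_star))          # close to q for large blocks
print(block_density(A, range(15, 60), range(80)))  # rows outside U*: close to p
```

Unlike the graph sampler, every entry is drawn independently, so the resulting matrix is not symmetric in general.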

The following theorem provides a sufficient condition for exact recovery of a planted dense $m\times n$ -submatrix generalizing the analogous result for recovery of a planted dense $k$ -subgraph given by Theorem 2.1.

Theorem 3.1

Suppose that the matrix $\boldsymbol{A}\in\mathbf{R}^{M\times N}$ is sampled from the planted dense $m\times n$-submatrix model with edge probabilities $q$ and $p$, respectively, with rows and columns of the planted dense submatrix indexed by $U^{*}$ and $V^{*}$, respectively. Let $(\boldsymbol{X}^{*},\boldsymbol{Y}^{*})$ denote the matrix representation of $\boldsymbol{A}(U^{*},V^{*})$. Let $N_{\max}:=\max\{M,N\}$ and $n_{\min}:=\min\{m,n\}$. Then there are constants $c_{1},c_{2},c_{3}>0$ such that if

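$$q-p\geq c_{1}\max\left\{\max\{\sigma_{q}^{2},\sigma_{p}^{2}\}\frac{\log N_{\max}}{n_{\min}},\;\frac{\log N_{\max}}{n_{\min}}\sigma_{p}^{2}N_{\max},\;\frac{(\log N_{\max})^{3/2}}{n_{\min}}\right\}\tag{7}$$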

then $(\boldsymbol{X}^{*},\boldsymbol{Y}^{*})$ is the unique optimal solution of (6) for penalty parameter $\gamma=t/((q-p)n_{\min})$ for all $c_{2}\leq t\leq c_{3}$ , and $\boldsymbol{A}(U^{*},V^{*})$ is the unique densest $m\times n$ -submatrix of $\boldsymbol{A}$ with high probability.

In the case when $M=N$ and $m=n$ , the inequality (7) specializes to (3), although the constants $c_{1},c_{2},c_{3}$ should differ due to the lack of an assumption of symmetry of $\boldsymbol{X}^{*}$ and $\boldsymbol{Y}^{*}$ in Theorem 3.1.

4 Derivation of the Recovery Guarantees

This section will consist of a proof of Theorem 3.1. The proof of Theorem 2.1 is identical except for minor modifications due to the symmetry of $\boldsymbol{A}$ . We begin with the following theorem, which provides the required optimality conditions for (6).

Theorem 4.1

Let $\bar{U}\subseteq\{1,\dots,M\}$ be an $m$-subset of $[M]$ and let $\bar{V}\subseteq\{1,\dots,N\}$ be an $n$-subset of $[N]$, and let $\bar{\boldsymbol{u}},\bar{\boldsymbol{v}}$ be their characteristic vectors. Then the solutions $\bar{\boldsymbol{X}}=\bar{\boldsymbol{u}}\bar{\boldsymbol{v}}^{T}$ and $\bar{\boldsymbol{Y}}=P_{\Omega}(\bar{\boldsymbol{X}})$ are optimal for (6) if and only if there are dual multipliers $\lambda\geq 0$, $\boldsymbol{\Lambda}\in\mathbf{R}^{M\times N}_{+}$, $\boldsymbol{\Xi}\in\mathbf{R}^{M\times N}_{+}$, and $\boldsymbol{W}\in\mathbf{R}^{M\times N}$ satisfying

[TABLE]

The proof of Theorem 4.1 is nearly identical to that of [32, Theorem 4.1] and is omitted. Suppose that $\boldsymbol{A}$ is sampled from the planted dense $(m,n)$-submatrix model with edge probabilities $q>p$ . Our goal is to establish the conditions on $m,n,q,p$ given by Theorem 3.1 that guarantee exact recovery (w.h.p.) of the matrix representation $(\bar{\boldsymbol{X}},\bar{\boldsymbol{Y}})$ of the planted submatrix with rows and columns given by $\bar{U}$ and $\bar{V}$ , respectively. Our approach follows that of [32, Section 4]. We first explicitly construct dual multipliers $\boldsymbol{W}$ and $\boldsymbol{\Xi}$ using the dual feasibility condition (8a) and the complementary slackness conditions (8b) and (8c). We then use the characterization of the subdifferential of the nuclear norm (8d) to construct the remaining dual variables $\lambda,\boldsymbol{\Lambda}$ . We conclude the proof by using concentration inequalities to establish feasibility of the proposed dual variables under the hypothesis of Theorem 3.1.

We choose $\boldsymbol{W}$ and $\boldsymbol{\Xi}$ according to the dual feasibility condition (8a) so that the orthogonality conditions $\boldsymbol{W}\bar{\boldsymbol{v}}=\boldsymbol{0}$ and $\boldsymbol{W}^{T}\bar{\boldsymbol{u}}=\boldsymbol{0}$ are satisfied. We consider the following cases.

Case 1. If $(i,j)\in\bar{U}\times\bar{V}-\Omega$ , then (8a) implies that

[TABLE]

if we take $\Xi_{ij}=\gamma$ and define $\tilde{\lambda}:=\lambda-1/\sqrt{mn}.$

Case 2. If $i\in\bar{U}$ , $j\in\bar{V}$ , and $(i,j)\in\Omega$ , then we have $\bar{X}_{ij}=\bar{Y}_{ij}=1/\sqrt{mn}$ , so $\Xi_{ij}=0$ by (8c). It follows that $W_{ij}=\tilde{\lambda}-\gamma-\Lambda_{ij}$ in this case.
Case 3. If $(i,j)\notin\bar{U}\times\bar{V}$ such that $(i,j)\notin\Omega$ then we take $W_{ij}=\lambda$ and $\Xi_{ij}=\gamma$ .
Case 4. If $i\notin\bar{U}$ , $j\notin\bar{V}$ such that $(i,j)\in\Omega$ , we take $W_{ij}=-\lambda p/(1-p)$ and $\Xi_{ij}=\gamma-\lambda/(1-p)$ .
Case 5. If $i\in\bar{U}$ and $j\notin\bar{V}$ such that $ij\in\Omega$ , we take

[TABLE]

where $\nu_{j}$ denotes the number of nonzero entries in the $j$ th column of $\boldsymbol{A}$ indexed by rows in $\bar{U}$ , so that ${[\boldsymbol{W}^{T}\bar{\boldsymbol{u}}]}_{j}=0$ . By our choice of $W_{ij}$ , we have

[TABLE]

Case 6. If $i\notin\bar{U}$ , $j\in\bar{V}$ , and $(i,j)\in\Omega$ then we take

[TABLE]

where $\mu_{i}$ denotes the number of nonzero entries in the $i$ th row of $\boldsymbol{A}$ indexed by columns in $\bar{V}$ .

By our choice of $\boldsymbol{W}$ and $\boldsymbol{\Xi}$ , we have ${[\boldsymbol{W}\bar{\boldsymbol{v}}]}_{i}=0$ for all $i\notin\bar{U}$ and ${[\boldsymbol{W}^{T}\bar{\boldsymbol{u}}]}_{i}=0$ for all $i\notin\bar{V}$ . We choose the remaining dual variables $\lambda$ and $\boldsymbol{\Lambda}$ so that ${[\boldsymbol{W}\bar{\boldsymbol{v}}]}_{i}=0$ for all $i\in\bar{U}$ and ${[\boldsymbol{W}^{T}\bar{\boldsymbol{u}}]}_{i}=0$ for all $i\in\bar{V}$ .
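The off-block part of this construction can be checked numerically. The following Python sketch is an illustration only: it assumes that $\Omega$ indexes the zero entries of $\boldsymbol{A}$, and it uses the values $-\lambda\nu_{j}/(m-\nu_{j})$ and $-\lambda\mu_{i}/(n-\mu_{i})$ for Cases 5 and 6, which are inferred here from the requirement that the corresponding column and row sums of $\boldsymbol{W}$ vanish. The function name `off_block_W`, the value of $\lambda$, and the problem sizes are hypothetical.

```python
import numpy as np

def off_block_W(A, U, V, lam, p):
    """Fill W outside the planted block U x V following Cases 3 through 6.

    Assumptions (not taken verbatim from the text): Omega is the set of
    zero entries of A, and the Case 5/6 values are the ones that make the
    relevant column/row sums of W equal to zero.
    """
    M, N = A.shape
    in_U = np.zeros(M, dtype=bool)
    in_U[U] = True
    in_V = np.zeros(N, dtype=bool)
    in_V[V] = True
    m, n = int(in_U.sum()), int(in_V.sum())
    omega = (A == 0)
    rows_U, cols_V = in_U[:, None], in_V[None, :]

    W = np.zeros((M, N))
    W[~(rows_U & cols_V) & ~omega] = lam                    # Case 3
    W[~rows_U & ~cols_V & omega] = -lam * p / (1.0 - p)     # Case 4
    nu = A[in_U, :].sum(axis=0)   # nonzeros in each column, rows in U
    mu = A[:, in_V].sum(axis=1)   # nonzeros in each row, columns in V
    case5 = rows_U & ~cols_V & omega
    case6 = ~rows_U & cols_V & omega
    val5 = np.broadcast_to(-lam * nu / np.maximum(m - nu, 1), (M, N))
    val6 = np.broadcast_to((-lam * mu / np.maximum(n - mu, 1))[:, None], (M, N))
    W[case5] = val5[case5]
    W[case6] = val6[case6]
    return W

rng = np.random.default_rng(0)
M, N, m, n, p, q = 60, 80, 20, 30, 0.25, 0.85
U, V = np.arange(m), np.arange(n)
A = (rng.random((M, N)) < p).astype(int)
A[np.ix_(U, V)] = (rng.random((m, n)) < q).astype(int)

W = off_block_W(A, U, V, lam=0.1, p=p)
u = np.zeros(M)
u[U] = 1.0
v = np.zeros(N)
v[V] = 1.0
col_resid = (W.T @ u)[n:]   # entries of W^T u for j not in V
row_resid = (W @ v)[m:]     # entries of W v for i not in U
```

Both residual vectors vanish up to rounding error in this sketch; the entries of $\boldsymbol{W}$ inside the block $\bar{U}\times\bar{V}$ are left at zero because they depend on $\boldsymbol{\Lambda}$, which is chosen next.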

The orthogonality conditions $\boldsymbol{W}^{T}\bar{\boldsymbol{u}}=\boldsymbol{0}$ and $\boldsymbol{W}\bar{\boldsymbol{v}}=\boldsymbol{0}$ define a linear system with $m+n$ equations for the $mn$ unknown entries of $\boldsymbol{\Lambda}$ when all other dual variables are fixed. To obtain a particular solution of this underdetermined linear system, we make the additional assumption that $\boldsymbol{\Lambda}(\bar{U},\bar{V})$ has rank at most 2, taking the form $\boldsymbol{\Lambda}(\bar{U},\bar{V})=\boldsymbol{y}\boldsymbol{e}^{T}+\boldsymbol{e}\boldsymbol{z}^{T}$ for some $\boldsymbol{y}\in\mathbf{R}^{m}$ and $\boldsymbol{z}\in\mathbf{R}^{n}$ . Under this assumption, the conditions ${[\boldsymbol{W}\bar{\boldsymbol{v}}]}_{i}=0$ , $i\in\bar{U}$ , and ${[\boldsymbol{W}^{T}\bar{\boldsymbol{u}}]}_{j}=0$ , $j\in\bar{V}$ , yield the linear system

[TABLE]

where the vectors ${\boldsymbol{\bar{\mu}}}$ and ${\boldsymbol{\bar{\nu}}}$ are defined by $\bar{\mu}_{i}=n-\mu_{i}$ for all $i\in\bar{U}$ and $\bar{\nu}_{j}=m-\nu_{j}$ for all $j\in\bar{V}$ . It is easy to see that this system is singular with null space spanned by $(\boldsymbol{e};-\boldsymbol{e})$ . However, it is also easy to see that the unique solution of

[TABLE]

is a solution of (9); see [32, Section 4.2] for further details. Applying the Sherman-Morrison-Woodbury Formula (see [50, Equation (2.1.4)]), we have

[TABLE]
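The structure of this system can be illustrated numerically. Under the ansatz $\boldsymbol{\Lambda}(\bar{U},\bar{V})=\boldsymbol{y}\boldsymbol{e}^{T}+\boldsymbol{e}\boldsymbol{z}^{T}$, the row sums of the block are $n\boldsymbol{y}+(\boldsymbol{e}^{T}\boldsymbol{z})\boldsymbol{e}$ and the column sums are $(\boldsymbol{e}^{T}\boldsymbol{y})\boldsymbol{e}+m\boldsymbol{z}$, which gives a coefficient matrix with null vector $(\boldsymbol{e};-\boldsymbol{e})$. The Python sketch below assumes generic target row sums `r` and column sums `c` (not the right-hand side used in the proof) and assumes a rank-one correction $\theta(\boldsymbol{e};-\boldsymbol{e})(\boldsymbol{e};-\boldsymbol{e})^{T}$ as the regularization; the function name and the parameter `theta` are hypothetical.

```python
import numpy as np

def solve_rank_two_multiplier(r, c, theta=1.0):
    """Find y, z such that y e^T + e z^T has row sums r and column sums c.

    The matrix K = [[n I, e e^T], [e e^T, m I]] is singular with null
    vector (e; -e).  Adding theta * (e; -e)(e; -e)^T (an assumed form of
    regularization) gives a nonsingular system whose solution also solves
    K x = (r; c) whenever sum(r) == sum(c).
    """
    m, n = len(r), len(c)
    K = np.block([[n * np.eye(m), np.ones((m, n))],
                  [np.ones((n, m)), m * np.eye(n)]])
    null = np.concatenate([np.ones(m), -np.ones(n)])
    rhs = np.concatenate([r, c])
    x = np.linalg.solve(K + theta * np.outer(null, null), rhs)
    return x[:m], x[m:], K, null

rng = np.random.default_rng(5)
m, n = 4, 6
T = rng.random((m, n))                 # any matrix with consistent sums
r, c = T.sum(axis=1), T.sum(axis=0)
y, z, K, null = solve_rank_two_multiplier(r, c)
Lam = y[:, None] + z[None, :]          # y e^T + e z^T
```

The right-hand side is orthogonal to the null vector because the row sums and column sums have the same total, which is why the regularized solution also satisfies the original singular system.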

The entries of ${\boldsymbol{\bar{\mu}}}$ and ${\boldsymbol{\bar{\nu}}}$ are binomial random variables corresponding to $n$ and $m$ independent Bernoulli trials with probability of success $1-q$ , respectively. Therefore, we have

[TABLE]

Choosing $\lambda=\frac{1}{\sqrt{mn}}+\gamma(1-q)+\gamma\tau$ for some $\tau>0$ to be chosen later ensures that the entries of $\boldsymbol{\Lambda}$ are strictly positive in expectation.

We next describe how to choose $\tau$ so that the entries of $\boldsymbol{y}$ and $\boldsymbol{z}$ are positive with high probability. To do so, we will make repeated use of the following specialization of the classical Bernstein inequality to bound the sum of independent Bernoulli random variables (see, for example, [51, Section 2.8]).

Lemma 4.1

Let $x_{1},\dots,x_{k}$ be a sequence of $k$ independent $\{0,1\}$ Bernoulli random variables, each with probability of success $\rho$ . Let $s=\sum_{i=1}^{k}x_{i}$ be the binomially distributed random variable denoting the number of successes. Then

[TABLE]
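The scale of this concentration can be seen in a small simulation. The Python sketch below uses the deviation radius $6\max\{\sqrt{\rho(1-\rho)k\log N},\log N\}$ that appears in the later applications of the lemma with $t=N$; the specific values of $\rho$, $k$, and $N$ and the helper name `bernstein_radius` are assumptions chosen for illustration.

```python
import numpy as np

def bernstein_radius(rho, k, N):
    """Deviation radius 6 * max{sqrt(rho (1 - rho) k log N), log N}."""
    return 6.0 * max(np.sqrt(rho * (1.0 - rho) * k * np.log(N)), np.log(N))

rng = np.random.default_rng(1)
rho, k, N = 0.3, 200, 1000
s = rng.binomial(k, rho, size=N)   # N independent binomial(k, rho) counts
radius = bernstein_radius(rho, k, N)
# Number of samples whose deviation from the mean exceeds the radius.
violations = int(np.sum(np.abs(s - rho * k) > radius))
```

In this sketch the radius is many standard deviations wide, so no sample falls outside it; this is consistent with the union-bound arguments that follow, which need the bound to hold simultaneously for all rows and columns.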

Applying (13) with $t=N$ to each component of ${\boldsymbol{\bar{\mu}}}$ and ${\boldsymbol{\bar{\nu}}}$ and the union bound shows that

[TABLE]

for all $i\in\bar{U}$ and $j\in\bar{V}$ w.h.p., where $\sigma_{q}^{2}=q(1-q)$ . On the other hand, ${\boldsymbol{\bar{\nu}}}^{T}\boldsymbol{e}={\boldsymbol{\bar{\mu}}}^{T}\boldsymbol{e}$ is equal to the number of zero entries in the $\bar{U}\times\bar{V}$ block of $\boldsymbol{A}$ , since $\bar{\mu}_{i}=n-\mu_{i}$ and $\bar{\nu}_{j}=m-\nu_{j}$ count the zero entries in each row and column of this block. Therefore, ${\boldsymbol{\bar{\nu}}}^{T}\boldsymbol{e}={\boldsymbol{\bar{\mu}}}^{T}\boldsymbol{e}$ is a binomially distributed random variable, with $\mathbb{E}[{\boldsymbol{\bar{\nu}}}^{T}\boldsymbol{e}]=\mathbb{E}[{\boldsymbol{\bar{\mu}}}^{T}\boldsymbol{e}]=mn(1-q)$ . Applying (13) with $t=N$ again establishes that

[TABLE]

w.h.p. It follows immediately that

[TABLE]

for each $i\in\bar{U}$ if (14) and (16) are satisfied. Following an identical argument, we see that

[TABLE]

if (15) and (16) hold. Substituting (17) and (18) into the formula for $\Lambda_{ij}$ shows that

[TABLE]

for all $i\in\bar{U}$ , $j\in\bar{V}$ w.h.p.; here, we use the assumption that $m\leq n$ . Choosing

[TABLE]

ensures that the entries of $\boldsymbol{\Lambda}$ are nonnegative w.h.p.

4.1 Nonnegativity of $\boldsymbol{\Xi}$

We next establish conditions on the regularization parameter $\gamma$ ensuring that the entries of the dual variable ${\boldsymbol{\Xi}}$ are nonnegative. Recall that $\Xi_{ij}$ takes value $0$ or $\gamma$ for all $(i,j)$ except those corresponding to Cases 4 through 6 in the choice of $\boldsymbol{W}$ and ${\boldsymbol{\Xi}}$ .

We begin with Case 5 in the construction of $\boldsymbol{W}$ and ${\boldsymbol{\Xi}}$ . Recall that

[TABLE]

if $i\in\bar{U}$ and $j\notin\bar{V}$ such that $ij\in\Omega$ . Since $\nu_{j}$ is a binomial random variable corresponding to $m$ independent Bernoulli trials with probability of success $p$ , applying Bernstein’s inequality (13) shows that $\nu_{j}\leq pm+6\max\{\sqrt{\sigma_{p}^{2}m\log N},\log N\}$ , and, hence,

[TABLE]

w.h.p., where $\sigma_{p}^{2}:=p(1-p)$ . Under the gap assumption

[TABLE]

and the choice of $\tau$ made above, we see that

[TABLE]

w.h.p. for some constant $\tilde{c}\geq 3$ . An identical bound holds for entries of ${\boldsymbol{\Xi}}$ corresponding to Case 6 by symmetry. Finally, the bound for Case 4 follows by substituting $\nu_{j}=pm$ in (20), which establishes that $\Xi_{ij}\geq 0$ if $\gamma(q-p)\geq 3/\sqrt{mn}$ in this case. Applying the union bound over all entries in ${\boldsymbol{\Xi}}$ establishes that ${\boldsymbol{\Xi}}$ is nonnegative w.h.p. if $q$ and $p$ satisfy the gap assumption (21) and

[TABLE]

4.2 A bound on the matrix $\boldsymbol{W}$

To complete the proof, we derive a sufficient condition involving $m,n,M,N,p$ , and $q$ that ensures that $\boldsymbol{W}$ , as constructed above, satisfies $\|\boldsymbol{W}\|<1$ with high probability. To simplify our notation, we again make the assumption that $m\leq n$ and $M\leq N$ . Our analysis extends, with only superficial changes, to the cases when $m\leq n$ and $M\geq N$ , $m\geq n$ and $M\leq N$ , and $m\geq n$ and $M\geq N$ . We bound $\|\boldsymbol{W}\|$ using the triangle inequality and the decomposition $\boldsymbol{W}=\gamma{\boldsymbol{Q}}+\lambda\boldsymbol{S}$ , where $\gamma Q_{ij}=W_{ij}$ if $i\in\bar{U}$ , $j\in\bar{V}$ and $\gamma Q_{ij}=0$ otherwise. To bound the norms of ${\boldsymbol{Q}}$ and $\boldsymbol{S}$ , we will make repeated use of the following bound on the norm of a random matrix. Specifically, Lemma 4.2 is a special case of the matrix concentration inequality given by [52, Corollary 3.11] on the spectral norm of matrices with i.i.d. mean zero bounded entries.

Lemma 4.2

Let $\boldsymbol{A}=[a_{ij}]\in\mathbf{R}^{m\times n}$ be a random matrix with i.i.d. mean zero entries $a_{ij}$ having variance $\sigma^{2}$ and satisfying $|a_{ij}|\leq B$ . Let $n_{\max}=\max\{m,n\}$ . Then there is a constant $c>0$ such that

[TABLE]

for all $t>0$ .
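For intuition about the size of such spectral norms, the Python sketch below samples a matrix with i.i.d. centered Bernoulli entries and compares its spectral norm with the scale $\sigma(\sqrt{m}+\sqrt{n})$. That reference scale is the familiar heuristic for i.i.d. random matrices and is an assumption of this sketch; it is not the bound stated in the lemma, and the dimensions and probability used are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 300, 400, 0.7
# Centered Bernoulli entries: mean zero, variance q (1 - q), |a_ij| <= 1.
A = (rng.random((m, n)) < q).astype(float) - q
sigma = np.sqrt(q * (1.0 - q))
spectral_norm = np.linalg.norm(A, 2)              # largest singular value
reference = sigma * (np.sqrt(m) + np.sqrt(n))     # heuristic scale (assumed)
ratio = spectral_norm / reference
```

The ratio stays of order one, which is the behavior the proof relies on: the norm of a mean-zero block grows like the square root of its larger dimension rather than linearly.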

The following lemma provides the desired bound on $\|{\boldsymbol{Q}}\|$ .

Lemma 4.3

Suppose that the matrix $\boldsymbol{W}$ is constructed according to Cases 1 through 6 for a matrix $\boldsymbol{A}$ sampled from the planted dense submatrix model with $m\leq n$ and $M\leq N$ . Then there is a constant $C_{Q}>0$ such that

[TABLE]

with high probability.

We delay the proof of Lemma 4.3 until Appendix A. The following lemma provides the required bound on $\|\boldsymbol{S}\|$ . Our analysis follows a similar argument to that of [31, Section 4.2]; we include it here for completeness.

Lemma 4.4

Suppose that the matrix $\boldsymbol{W}$ is constructed according to Cases 1 through 6 for a matrix $\boldsymbol{A}$ sampled from the planted dense submatrix model with $m\leq n$ and $M\leq N$ . Let $\tilde{\sigma}^{2}_{p}:=p/(1-p)$ and $B:=\max\{1,\tilde{\sigma}^{2}_{p}\}$ . Assume that $p$ is bounded away from $1$ so that $B=O(1)$ . Then there exists a constant $C_{S}>0$ such that

[TABLE]

with high probability.

It follows immediately from Lemmas 4.3 and 4.4 that

[TABLE]

w.h.p. On the other hand,

[TABLE]

where we obtain the first inequality by substituting the choice of $\gamma$ given by (22) and the upper bound $\tau\leq 3(q-p)/2\leq\tilde{c}(q-p)/2$ . We obtain the last inequality using the fact that $(1/\tilde{c}+1/2)(q-p)\leq q-p$ . Further, we have

[TABLE]

Substituting back into (24), we see that

[TABLE]

w.h.p. Requiring the gap $q-p$ to be large enough that (21) holds and the right-hand side of (25) is bounded above by $1$ establishes Theorem 3.1. This completes the proof.

5 A First-Order Method Based on the Alternating Direction Method of Multipliers

We conclude with a discussion of an optimization algorithm for the solution of (6) based on the alternating direction method of multipliers (ADMM); see [53] for details regarding the ADMM. We first present a derivation of the method and then empirically validate its performance using randomly generated matrices and real-world collaboration and communication networks.

5.1 The Optimization Algorithm

To apply the ADMM to (2), we first introduce artificial variables ${\boldsymbol{Q}}$ , $\boldsymbol{W}$ , $\boldsymbol{Z}$ to obtain the equivalent convex optimization problem

[TABLE]

where $\Omega_{Q},\Omega_{W},\Omega_{Z}$ denote the constraint sets

[TABLE]

Here, $\mathbbm{1}_{S}:\mathbf{R}^{M\times N}\rightarrow\left\{0,+\infty\right\}$ is the indicator function of the set $S\subseteq\mathbf{R}^{M\times N}$ , such that $\mathbbm{1}_{S}(\boldsymbol{X})=0$ if $\boldsymbol{X}\in S$ , and $+\infty$ otherwise. We solve (26) iteratively using the ADMM. Specifically, we update each primal variable by minimizing the augmented Lagrangian in Gauss-Seidel fashion with respect to each primal variable. Then the dual variables are updated using the updated primal variables. The augmented Lagrangian of (26) is given by

[TABLE]

where $\tau$ is a regularization parameter chosen so that $L_{\tau}$ is strongly convex in each primal variable. Minimization of the augmented Lagrangian with respect to each of the artificial primal variables $\boldsymbol{Q},\boldsymbol{W}$ and $\boldsymbol{Z}$ is equivalent to projection onto each of the sets $\Omega_{Q},\Omega_{W}$ and $\Omega_{Z}$ ; each of these projections has an analytic expression.

We update $\boldsymbol{Y}$ using projection onto the nonnegative cone: $P_{\mathbf{R}^{M\times N}_{+}}(\boldsymbol{M})$ is the matrix with $ij$ th entry $m_{ij}$ if $m_{ij}\geq 0$ and $0$ otherwise. On the other hand, we update $\boldsymbol{X}$ using the proximal function for the nuclear norm $\|\cdot\|_{*}$ , which can be computed by applying the soft thresholding operator defined by $S_{\phi}(\boldsymbol{x})=\textrm{sign}(\boldsymbol{x})\max\left\{|\boldsymbol{x}|-\phi\boldsymbol{e},\boldsymbol{0}\right\}$ to the vector of singular values. Here, $\textrm{sign}(\boldsymbol{x})$ is the vector whose entries are the signs of the corresponding entries of $\boldsymbol{x}$ , $|\boldsymbol{x}|$ denotes the vector whose entries are the magnitudes of the corresponding entries of $\boldsymbol{x}$ , and the maximum denotes the vector of pairwise maximums. We declare the algorithm to have converged when the primal and dual residuals $\|\boldsymbol{X}^{l+1}-\boldsymbol{W}^{l+1}\|_{F},\|\boldsymbol{X}^{l+1}-\boldsymbol{Z}^{l+1}\|_{F},\|\boldsymbol{W}^{l+1}-\boldsymbol{W}^{l}\|_{F},\|\boldsymbol{Z}^{l+1}-\boldsymbol{Z}^{l}\|_{F}$ , and $\|\boldsymbol{Q}^{l+1}-\boldsymbol{Q}^{l}\|_{F}$ are smaller than a desired error tolerance. The steps of the algorithm are summarized in Algorithm 1; a MATLAB implementation of Algorithm 1 is available from http://bpames.people.ua.edu/software.
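The two update operators described above can be sketched in a few lines of Python. This is a minimal illustration, not the MATLAB implementation referenced above: the function names are hypothetical, and the threshold `phi` is left as a free parameter because its dependence on $\tau$ in Algorithm 1 is not reproduced here.

```python
import numpy as np

def project_nonnegative(M):
    """Projection onto the nonnegative cone: negative entries become 0."""
    return np.maximum(M, 0.0)

def soft_threshold(x, phi):
    """S_phi(x) = sign(x) * max(|x| - phi, 0), applied entrywise."""
    return np.sign(x) * np.maximum(np.abs(x) - phi, 0.0)

def prox_nuclear_norm(M, phi):
    """Proximal map of phi * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the columns of U by the thresholded singular values.
    return (U * soft_threshold(s, phi)) @ Vt

# Small example: singular values 3, 1, 0.2 thresholded at phi = 0.5.
D = np.diag([3.0, 1.0, 0.2])
P = prox_nuclear_norm(D, 0.5)
Y = project_nonnegative(np.array([[1.5, -2.0], [0.0, 0.3]]))
```

The singular value decomposition in `prox_nuclear_norm` is the dominant cost of each iteration, which is the source of the cubic per-iteration complexity discussed in the conclusions.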

5.2 Random matrices

We empirically verified the theoretical phase transitions provided by Theorem 3.1 using matrices randomly sampled from the planted dense submatrix model with fixed noise edge probability $p$ , varying the submatrix size $n$ and in-submatrix probability $q$ . We perform two sets of experiments: one where the matrix is sparse outside the planted submatrix and another where the noise obscuring the planted submatrix is relatively dense. For the dense graph simulations, we choose $p=0.25$ and $q\in\{0.25,0.30,\dots,0.95,1\}$ . In the sparse experiments, we choose $p=1/\sqrt{N}$ and $q=tp$ for ten equally spaced $t$ spanning the interval $[2,\sqrt{N}]$ . For each set of simulations, we vary $n\in\{10,20,30,\dots,240,250\}$ and set $m=2n$ . In the sparse experiments, we have $M=N=1000$ and we use $M=N=500$ in the dense experiments. In both the dense and sparse graph simulations, we generate $10$ matrices according to the planted dense submatrix model for each choice of the parameters $q$ and $n$ (with remaining parameters $p$ , $M$ , and $N$ chosen as described above). We call Algorithm 1 to solve the instance of (6) corresponding to each randomly sampled matrix. The regularization parameter $\gamma=6/((q-p)n)$ , augmented Lagrangian parameter $\tau=0.35$ , and stopping tolerance $\epsilon=10^{-4}$ are used in each call of Algorithm 1. We declare the planted submatrix to be recovered if the relative error $\|\boldsymbol{X}^{*}-\boldsymbol{X}^{0}\|_{F}/\|\boldsymbol{X}^{0}\|_{F}$ is less than $10^{-3}$ , where $\boldsymbol{X}^{*}$ is the solution returned by Algorithm 1 and $\boldsymbol{X}^{0}$ is the matrix representation of the planted submatrix. The empirical probability of recovery of the planted submatrix is plotted in Figure 1. The color of each square indicates the rate of recovery in the corresponding simulations, with black corresponding to $0$ and white corresponding to $10$ recoveries out of $10$ trials.
The dashed curves show the phase transition to perfect recovery predicted by Theorem 3.1. The empirical recovery rates observed in these trials closely match those predicted by Theorem 3.1. The discrepancy between the observed phase transition and that more conservatively predicted by Theorem 3.1 is due to the presence of the logarithmic terms in (3); a slight modification of our proof to follow that of [31, Theorem 7] eliminates these terms when $p$ and $q$ are constants and the gap $q-p$ is sufficiently large.
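The experimental setup above can be sketched as follows in Python. The sketch samples one instance of the planted dense submatrix model with the dense-experiment parameters, computes the regularization parameter $\gamma=6/((q-p)n)$, and implements the relative-error recovery test. It does not include the ADMM solver itself; the function names are hypothetical, and taking $\boldsymbol{X}^{0}$ to be the 0/1 indicator matrix of the planted block is an assumption of this sketch.

```python
import numpy as np

def sample_planted_submatrix(M, N, m, n, p, q, rng):
    """Sample a 0/1 matrix with a planted m x n block of density q."""
    A = (rng.random((M, N)) < p).astype(int)          # background noise
    U = rng.choice(M, size=m, replace=False)          # planted rows
    V = rng.choice(N, size=n, replace=False)          # planted columns
    A[np.ix_(U, V)] = (rng.random((m, n)) < q).astype(int)
    X0 = np.zeros((M, N))
    X0[np.ix_(U, V)] = 1.0   # rank-one indicator of the planted block
    return A, X0, U, V

def recovered(X, X0, tol=1e-3):
    """Relative-error recovery criterion ||X - X0||_F / ||X0||_F < tol."""
    rel = np.linalg.norm(X - X0, "fro") / np.linalg.norm(X0, "fro")
    return bool(rel < tol)

rng = np.random.default_rng(3)
M = N = 500
n, p, q = 50, 0.25, 0.85
m = 2 * n
A, X0, U, V = sample_planted_submatrix(M, N, m, n, p, q, rng)
gamma = 6.0 / ((q - p) * n)      # regularization parameter used above
block_density = A[np.ix_(U, V)].mean()
overall_density = A.mean()
```

Repeating this sampling ten times for each pair $(q,n)$ and counting how often `recovered` returns true for the solver output would reproduce the structure of the recovery-rate experiment described above.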

5.2.1 Collaboration and Communication Networks

We also applied our algorithm to identify communities in networks taken from the 10th DIMACS Implementation Challenge, which focused on graph partitioning and clustering [54, 55]. The first graph (JAZZ) represents a collaboration network with $198$ musicians and $2742$ edges, and was compiled by [56]. Here, two musicians are connected if they have performed together. Earlier studies [10] showed that this network contains a cluster of $100$ musicians. We apply Algorithm 1 to the adjacency matrix of this network with regularization parameter $\tau=0.85$ , stopping tolerance $\epsilon=10^{-2}$ , and $m=n=100$ . Our algorithm converges to the dense submatrix representing this community after $50$ iterations. Figure 2 is a visualization of this network using the software package Gephi [57] and the ForceAtlas2 algorithm [58]. The statistics function of Gephi is used to identify three communities within this network, including the community of size $n=100$ identified by Algorithm 1.

We also consider the graph (EMAIL) representing the network of e-mail interchanges between faculty, researchers, technicians, managers, administrators, and graduate students of the University Rovira i Virgili (Tarragona). Two individuals are connected if they exchanged an e-mail. There are $1133$ nodes and $5451$ edges. From [59], we know that the EMAIL graph has a dense subgraph of $289$ vertices, representing a community of $289$ individuals; additionally, we can identify $7$ clusters using the statistics function of Gephi corresponding to academic units within this university, including this community. Applying Algorithm 1 with $m=n=289$ , $\tau=0.35$ , and stopping tolerance $\epsilon=10^{-2}$ finds this subgraph in $15$ iterations. The results of these analyses are summarized in Table 1.

6 Conclusions

We have presented an analysis of new convex relaxations for the densest subgraph and submatrix problems and have established sufficient conditions under which the optimal solution of the original combinatorial problem coincides with that of these relaxations. In particular, these sufficient conditions characterize a signal-to-noise ratio (SNR) for matrices sampled from a particular distribution of random matrices, such that if this ratio is sufficiently large then we can expect to recover the combinatorial solution from the solution of the relaxation. Here, we expect perfect recovery if the strength of the signal, as measured by the gap between the probabilities of existence of nonzero within-group and out-group entries, is sufficiently larger than the noise, as measured by the variability of the presence of nonzero entries. Further, the SNR corresponding to this phase transition to perfect recovery matches the current state of the art identified in the previous literature (up to constant and logarithmic terms); see [43].

This recovery guarantee provides a sufficient condition for perfect recovery of the planted subgraph or submatrix. It would be very interesting to determine if this condition is also necessary. For example, we establish that we have perfect recovery of planted dense $k$ -submatrices and subgraphs if $k\geq\Omega(N)$ when the probabilities $q$ , $p$ are sufficiently large constants. It is unclear if it is possible, either using our relaxation or some other method, to efficiently recover planted submatrices and subgraphs of size $O(n^{1/2-\epsilon})$ for some $\epsilon>0$ .

A secondary open problem focuses on efficient solution of the proposed convex relaxations. We currently solve these problems using a multi-block variant of the ADMM. Each iteration of this algorithm requires $O(N^{3})$ arithmetic operations; the bulk of these operations are used by the calculation of the singular value decomposition used to update $\boldsymbol{X}$ . This per-iteration cost scales unfavorably when $N$ is large. The recent manuscript by Sotirov [60] proposed a coordinate descent heuristic for the densest $k$-subgraph problem, with empirical evidence that this heuristic efficiently solves large-scale instances. However, sufficient conditions for perfect recovery of a planted dense submatrix have not yet been established for this method. Further research is needed to design efficient and scalable algorithms, i.e., with per-iteration cost $O(N)$ , with provable theoretical guarantees of recovery for the solution of the densest subgraph and submatrix problems.

Acknowledgements. B. Ames was supported by University of Alabama Research Grants RG14678 and RG14838.

Appendix A Proof of Lemma 4.3

The proof is virtually identical to that of [32, Lemma 4.5]. We decompose ${\boldsymbol{Q}}$ as ${\boldsymbol{Q}}=\lambda\boldsymbol{e}\boldsymbol{e}^{T}-\boldsymbol{H}+\boldsymbol{y}\boldsymbol{e}^{T}+\boldsymbol{e}\boldsymbol{z}^{T}$ , where $\boldsymbol{H}$ is the matrix defined by $H_{ij}=1$ if $ij\in\Omega$ and $H_{ij}=0$ otherwise. We can further decompose ${\boldsymbol{Q}}$ as ${\boldsymbol{Q}}={\boldsymbol{Q}}_{1}+{\boldsymbol{Q}}_{2}+{\boldsymbol{Q}}_{3}+{\boldsymbol{Q}}_{4},$ where ${\boldsymbol{Q}}_{1},{\boldsymbol{Q}}_{2},{\boldsymbol{Q}}_{3},{\boldsymbol{Q}}_{4}$ are constructed as below.

We first bound ${\boldsymbol{Q}}_{1}:=(1-q)\boldsymbol{e}\boldsymbol{e}^{T}-\boldsymbol{H}$ . Note that ${\boldsymbol{Q}}_{1}$ has i.i.d. mean-zero entries, with variance $\sigma^{2}=\sigma_{q}^{2}$ and values either $1-q$ with probability $q$ or $-q$ with probability $1-q$ . Applying (23) with $B=1$ and $t=N$ ,

[TABLE]

w.h.p. Next, we let ${\boldsymbol{Q}}_{2}:=\frac{1}{n}({\boldsymbol{\bar{\mu}}}\boldsymbol{e}^{T}-(1-q)n\boldsymbol{e}\boldsymbol{e}^{T})$ . Note that

[TABLE]

Applying (14) shows that $\bar{\mu}_{i}-(1-q)n\leq 6\max\left\{\sqrt{\sigma_{q}^{2}n\log N},\log N\right\}$ for all $i\in\bar{U}$ w.h.p. It follows that

[TABLE]

w.h.p. Next, let ${\boldsymbol{Q}}_{3}:=\frac{1}{m}\boldsymbol{e}{\boldsymbol{\bar{\nu}}}^{T}-(1-q)\boldsymbol{e}\boldsymbol{e}^{T}$ . An identical argument shows that

[TABLE]

w.h.p. Finally, we let

[TABLE]

It is easy to confirm that $\gamma({\boldsymbol{Q}}_{1}+{\boldsymbol{Q}}_{2}+{\boldsymbol{Q}}_{3}+{\boldsymbol{Q}}_{4})=\boldsymbol{W}(\bar{U},\bar{V})$ . Applying (16) shows that

[TABLE]

w.h.p. Combining (27), (28), (29), and (30) establishes that

[TABLE]

w.h.p., as required.

Appendix B Proof of Lemma 4.4

To obtain the desired bound on $\boldsymbol{S}$ , we first approximate $\boldsymbol{S}$ with a random matrix with mean zero entries. In particular, we let $\tilde{\boldsymbol{S}}_{1}$ be the random matrix constructed as follows. For all $(i,j)\notin\bar{U}\times\bar{V}$ such that $ij\notin\Omega$ , or $(i,j)\in([M]-\bar{U})\times([N]-\bar{V})$ such that $(i,j)\in\Omega$ , we let ${[\tilde{\boldsymbol{S}}_{1}]}_{ij}=S_{ij}$ . All remaining entries of $\tilde{\boldsymbol{S}}_{1}$ are sampled independently from the generalized Bernoulli distribution $\mathcal{B}$ , where a random variable $x$ sampled from $\mathcal{B}$ satisfies

[TABLE]

Note that $\tilde{\boldsymbol{S}}_{1}$ is a random matrix with i.i.d. mean zero entries sampled independently from $\mathcal{B}$ by our choice of $\boldsymbol{W}$ . Applying Lemma 4.2 shows that

[TABLE]

w.h.p. The remainder of the proof establishes that $\boldsymbol{S}$ is well-approximated by $\tilde{\boldsymbol{S}}_{1}$ , i.e., we complete the proof by bounding the norm of the error $\boldsymbol{S}-\tilde{\boldsymbol{S}}_{1}$ . We begin with the error in the $\bar{U}\times\bar{V}$ block. Let $\tilde{\boldsymbol{S}}_{2}=-\tilde{\boldsymbol{S}}_{1}(\bar{U},\bar{V})$ . Applying Lemma 4.2 with $t=N$ again shows that

[TABLE]

w.h.p. We define $\tilde{\boldsymbol{S}}_{3}$ by

[TABLE]

To bound the norm of $\tilde{\boldsymbol{S}}_{3}$ , we will use the following lemma, which provides a bound on the spectral norm of random matrices of this form.

Lemma B.1

Let $\boldsymbol{A}$ be an $n\times N$ matrix whose entries are chosen according to $\mathcal{B}$ with $n\leq N$ . Let $\tilde{\boldsymbol{A}}$ be the random matrix defined by

[TABLE]

where $n_{j}$ is the number of $1$ ’s in the $j$ th column of $\boldsymbol{A}$ . Then there are constants $c_{1},c_{2}>0$ such that

[TABLE]

The proof of Lemma B.1 follows a similar argument to that of [31, Theorem 4] and is included as Appendix C. It is easy to see that the nonzero block of $\tilde{\boldsymbol{S}}_{3}$ has the form $\boldsymbol{A}-\tilde{\boldsymbol{A}}$ as in the hypothesis of Lemma B.1. It follows that

[TABLE]

w.h.p. Similarly, we define the final correction matrix by

[TABLE]

Applying Lemma B.1 to the transpose of $\tilde{\boldsymbol{S}}_{4}$ we have

[TABLE]

w.h.p. Combining (31), (32), (33), and (34) and applying the union bound completes the proof.

Appendix C Proof of Lemma B.1

The result relies on an application of the Matrix Bernstein Inequality; see [61, Theorem 6.1.1] and [62, Theorem 1.6] for further details. We first state the necessary bound on the spectral norm of the sum of finitely many independent, bounded random matrices.

Theorem C.1 (Matrix Bernstein Inequality)

Let $\{\boldsymbol{S}_{k}\}$ be a finite sequence of independent $d_{1}\times d_{2}$ random matrices such that $\mathbb{E}[\boldsymbol{S}_{k}]=\boldsymbol{0}$ and $\|\boldsymbol{S}_{k}\|\leq L$ for all $k$ almost surely. Let $\boldsymbol{Z}:=\sum_{k}\boldsymbol{S}_{k}$ and let $v(\boldsymbol{Z})$ denote the matrix variance defined by

[TABLE]

Then

[TABLE]

for all $t>0$ .

The remainder of the proof consists of a specialization of this inequality to the special case $\boldsymbol{Z}=\boldsymbol{A}-\tilde{\boldsymbol{A}}$ . Indeed, let

[TABLE]

where $\boldsymbol{d}_{j}:=[\boldsymbol{A}-\tilde{\boldsymbol{A}}]_{j}$ denotes the $j$ th column of $\boldsymbol{A}-\tilde{\boldsymbol{A}}$ and $\boldsymbol{e}_{j}$ denotes the $j$ th standard basis vector. It is clear that $\boldsymbol{Z}=\boldsymbol{A}-\tilde{\boldsymbol{A}}=\sum_{j=1}^{N}\boldsymbol{S}_{j}$ . It remains to estimate an upper bound $L$ on the spectral norms of the matrices $\{\boldsymbol{S}_{j}\}$ and an upper bound on the variance $v(\boldsymbol{Z})$ . Once we have estimated these quantities, we will substitute them into (36) to complete the proof.
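The column decomposition used here is easy to verify numerically. The Python sketch below uses a generic Gaussian matrix as a stand-in for $\boldsymbol{A}-\tilde{\boldsymbol{A}}$ (an assumption made only for illustration) and checks that the rank-one pieces $\boldsymbol{S}_{j}=\boldsymbol{d}_{j}\boldsymbol{e}_{j}^{T}$ sum to $\boldsymbol{Z}$, that $\|\boldsymbol{S}_{j}\|=\|\boldsymbol{d}_{j}\|$, and that $\sum_{j}\boldsymbol{S}_{j}^{T}\boldsymbol{S}_{j}$ is diagonal with entries $\|\boldsymbol{d}_{j}\|^{2}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 5, 8
Z = rng.standard_normal((n, N))        # stand-in for A - A_tilde (assumed)
I = np.eye(N)
# S_j = d_j e_j^T keeps column j of Z and zeroes every other column.
S = [np.outer(Z[:, j], I[j]) for j in range(N)]

sum_S = sum(S)                                     # should equal Z
gram = sum(Sj.T @ Sj for Sj in S)                  # sum_j S_j^T S_j
col_norms_sq = np.sum(Z**2, axis=0)                # ||d_j||^2 for each j
spec_norms = np.array([np.linalg.norm(Sj, 2) for Sj in S])
```

These identities are what reduce the two quantities needed by the Matrix Bernstein Inequality, the uniform bound $L$ and the variance $v(\boldsymbol{Z})$, to statements about the norms of the individual columns $\boldsymbol{d}_{j}$.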

We begin with the following estimate on $L$ , which is an immediate consequence of the standard Bernstein Inequality (13).

Lemma C.1

There is a constant $c_{1}>0$ such that matrices $\{\boldsymbol{S}_{j}\}$ defined by (37) satisfy

[TABLE]

for all $j=1,2,\dots,N$ with probability at least $1-2N^{-5}$ .

**Proof:** Fix $j\in\{1,\dots,N\}$ . Note that $\|\boldsymbol{S}_{j}\|=\|\boldsymbol{d}_{j}\boldsymbol{e}_{j}^{T}\|=\|\boldsymbol{d}_{j}\|\|\boldsymbol{e}_{j}\|=\|\boldsymbol{d}_{j}\|$ . Moreover, the Bernstein Inequality (13) implies that

[TABLE]

with probability at least $1-2N^{-6}$ , where the last inequality follows from the fact that $n-n_{j}=O(n)$ w.h.p. (by (13)) and $\log N=O(n)$ (by the gap assumption). Taking the square root completes the proof.

We next bound the matrix variance $v(\boldsymbol{Z})$ .

Lemma C.2

The matrix $\boldsymbol{Z}=\sum_{j}\boldsymbol{S}_{j}$ defined by (37) satisfies $v(\boldsymbol{Z})\leq c\tilde{\sigma}^{2}_{p}N$ for any constant $c>0$ satisfying $n-n_{j}>(1/c)n$ for all $j$ .

**Proof:** It suffices to construct upper bounds on each of $\|\mathbb{E}[\boldsymbol{Z}\boldsymbol{Z}^{T}]\|$ and $\|\mathbb{E}[\boldsymbol{Z}^{T}\boldsymbol{Z}]\|$ . We begin with the latter. Note that $\boldsymbol{S}_{j}^{T}\boldsymbol{S}_{j}=(\boldsymbol{d}_{j}^{T}\boldsymbol{d}_{j})\boldsymbol{e}_{j}\boldsymbol{e}_{j}^{T}$ . Since the matrices $\boldsymbol{S}_{j}$ are independent with mean zero, this implies that $\mathbb{E}[\boldsymbol{Z}^{T}\boldsymbol{Z}]=\sum_{j}\mathbb{E}[\boldsymbol{S}_{j}^{T}\boldsymbol{S}_{j}]$ is a diagonal matrix with $j$ th diagonal entry equal to $\mathbb{E}[\|\boldsymbol{d}_{j}\|^{2}]$ . It follows that $\|\mathbb{E}(\boldsymbol{Z}^{T}\boldsymbol{Z})\|=\max_{j}\mathbb{E}[\|\boldsymbol{d}_{j}\|^{2}].$ For each $j=1,2,\dots,N$ , we have

[TABLE]

where the inequality follows from the assumption that $n-n_{j}\geq(1/c)n$ and the second to last inequality follows from the fact that $\mathbb{E}[(n_{j}-pn)^{2}]$ is equal to the variance of the binomial variable $n_{j}$ . This implies that

[TABLE]

On the other hand, $\mathbb{E}[\boldsymbol{Z}\boldsymbol{Z}^{T}]=\sum_{j=1}^{N}\mathbb{E}[\boldsymbol{S}_{j}\boldsymbol{S}_{j}^{T}]=\sum_{j=1}^{N}\mathbb{E}[\boldsymbol{d}_{j}\boldsymbol{d}_{j}^{T}]$ . It follows immediately that

[TABLE]

by the triangle inequality and Jensen’s inequality. Applying (38), we see that

[TABLE]

Substituting (39) and (41) into the formula for $v(\boldsymbol{Z})$ , we see that we have $v(\boldsymbol{Z})\leq cN\tilde{\sigma}^{2}_{p}$ .

We are now ready to complete the proof of Lemma B.1. We consider two cases. First, suppose that $\tilde{\sigma}^{2}_{p}N\geq\log N$ and let $t=\tilde{c}\sqrt{\tilde{\sigma}^{2}_{p}N}\log N$ in (36). Recall that applying the Bernstein Inequality (13) to each binomial variable $n_{j}$ implies that there is a constant $c$ such that $n-n_{j}\geq(1/c)n$ for all $j$ with probability at least $1-2N^{-5}$ . This implies that we have $v(\boldsymbol{Z})\leq c\tilde{\sigma}^{2}_{p}N$ with the same probability. Substituting into (36), along with the choice of $L$ from Lemma C.1, we see that

[TABLE]

using the assumption that $\sqrt{\log N}\leq\sqrt{\tilde{\sigma}^{2}_{p}N}$ . Rearranging further, we see that

[TABLE]

if we choose $\tilde{c}$ large enough that $\tilde{c}^{2}/(c+\tilde{c}\sqrt{B})>7$ (which is possible if we impose the assumption that $B=O(1)$ ).

Next, consider the case that $\log N>\tilde{\sigma}^{2}_{p}N$ and let $t=\tilde{c}\log^{3/2}(N)$ . Then the Matrix Bernstein Inequality (36) implies that

[TABLE]

for any $\tilde{c}$ satisfying $\tilde{c}^{2}/(c+\tilde{c}\sqrt{B})>7$ . Combining the two cases, we see that there are constants $c_{1},c_{2}>0$ such that

[TABLE]

with probability at least $1-c_{2}N^{-5}$ . This completes the proof.

Bibliography


[1] Karp, R.: Reducibility among combinatorial problems. Complexity of Computer Computations 40 (4), 85–103 (1972)
[2] Alon, N., Arora, S., Manokaran, R., Moshkovitz, D., Weinstein, O.: Inapproximability of densest $\kappa$-subgraph from average case hardness. Unpublished manuscript 1 (2011)
[3] Feige, U.: Relations between average case complexity and approximation complexity. In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, pp. 534–543. ACM (2002)
[4] Khot, S.: Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM Journal on Computing 36 (4), 1025–1071 (2006)
[5] Henzinger, M.R., Motwani, R., Silverstein, C.: Challenges in web search engines. In: ACM SIGIR Forum, vol. 36, pp. 11–22. ACM (2002)
[6] Gibson, D., Kumar, R., Tomkins, A.: Discovering large dense subgraphs in massive graphs. In: Proceedings of the 31st international conference on Very large data bases, pp. 721–732. VLDB Endowment (2005)
[7] Angel, A., Sarkas, N., Koudas, N., Srivastava, D.: Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proceedings of the VLDB Endowment 5 (6), 574–585 (2012)
[8] Gajewar, A., Das Sarma, A.: Multi-skill collaborative teams based on densest subgraphs. In: Proceedings of the 2012 SIAM International Conference on Data Mining, pp. 165–176. SIAM (2012)
