Sharp Online Hardness for Large Balanced Independent Sets

Abhishek Dhawan; Eren C. Kızıldağ; Neeladri Maitra

arXiv:2508.20785·cs.DS·September 12, 2025


TL;DR

This paper determines the size of the largest $\gamma$-balanced independent set in dense random bipartite graphs and studies how large a set online algorithms can find, giving an online algorithm and a matching lower bound at a threshold that is a factor $1-\gamma$ below the optimum.

Contribution

It characterizes the size of the largest $\gamma$-balanced independent set in dense random bipartite graphs, develops a two-stage online algorithm that reaches a $(1-\gamma)$ fraction of this size up to a $(1-\epsilon)$ factor, and proves a matching lower bound for online algorithms using the OGP framework.

Findings

01

Largest $\gamma$-balanced independent set size is $\frac{\log_{b}n}{\gamma(1-\gamma)}$ whp.

02

Proposed online algorithm achieves size $(1-\epsilon)\frac{\log_{b}n}{\gamma}$ with high probability, a factor $1-\gamma$ below the largest size.

03

No online algorithm can achieve size $(1+\epsilon)\frac{\log_{b}n}{\gamma}$ with nonnegligible probability, establishing tightness.

Abstract

We study the algorithmic problem of finding large $γ$ -balanced independent sets in dense random bipartite graphs; an independent set is $γ$ -balanced if a $γ$ proportion of its vertices lie on one side of the bipartition. In the sparse regime, Perkins and Wang established tight bounds within the low-degree polynomial (LDP) framework, showing a factor- $1/ (1 - γ)$ statistical-computational gap via the Overlap Gap Property (OGP) framework tailored for stable algorithms. However, these techniques do not appear to extend to the dense setting. For the related large independent set problem in dense random graph, the best known algorithm is an online greedy procedure that is inherently unstable, and LDP algorithms are conjectured to fail even in the "easy" regime where greedy succeeds. We show that the largest $γ$ -balanced independent set in dense random bipartite graphs…

Equations

$$\alpha_{\rm STAT} := \frac{\log_{b}n}{\gamma(1-\gamma)}, \quad \text{where } b := b(p) = \frac{1}{1-p}.$$

$$\alpha_{\rm COMP} := (1-\gamma)\,\alpha_{\rm STAT} = \frac{\log_{b}n}{\gamma}.$$

$$\mathbb{P}\bigl[|\mathcal{A}(G)| \geq k\bigr] \geq \delta.$$

$$k = (1-\epsilon)\,\alpha_{\rm COMP}, \quad \text{and} \quad \delta = 1-\exp\bigl(-\Omega(n^{\epsilon/2})\bigr).$$

$$k = (1+\epsilon)\,\alpha_{\rm COMP}, \quad \text{and} \quad \delta = \exp\bigl(-O(\epsilon^{2}\log_{b}^{2}n)\bigr).$$

$$\Biggl|\;\bigcup_{t=1}^{2n}S_{t}\;\cap\;\binom{\mathcal{A}(G_{\mathrm{bip}}(n,p))}{2}\Biggr|\;\leq\;c(\log_{b}n)^{2}.$$

$$k = (1+\epsilon)\,\alpha_{\rm COMP}, \quad \text{and} \quad \delta = 1-\exp\bigl(-n^{\Theta(1)}\bigr).$$

$$c(\gamma,\epsilon) := \frac{1-\gamma}{\gamma}\bigl((1+\epsilon)^{2}-(c^{\prime})^{2}\bigr),$$

$$c^{\prime} < \frac{\gamma\bigl(1-(1+\epsilon)(1-\gamma)\bigr)}{(1-\gamma)^{2}}.$$

$$Z_{\alpha}(\gamma) = Z_{\alpha}^{L}(\gamma) + Z_{\alpha}^{R}(\gamma).$$

$$\mathbb{E}[Z_{\alpha}] = \binom{n}{\alpha\gamma}\binom{n}{\alpha(1-\gamma)}(1-p)^{\gamma(1-\gamma)\alpha^{2}}.$$

$$\begin{aligned}
\mathbb{E}[Z_{\alpha}] &\leq n^{\alpha\gamma}\,n^{\alpha(1-\gamma)}\,(1-p)^{\gamma(1-\gamma)\alpha^{2}} \\
&= \exp\Bigl(\alpha\log n-\gamma(1-\gamma)\alpha^{2}\log\frac{1}{1-p}\Bigr) \\
&\leq \exp\Bigl(\alpha\Bigl(\log n-(1+\epsilon)\gamma(1-\gamma)\frac{\log_{b}n}{\gamma(1-\gamma)}\log\frac{1}{1-p}\Bigr)\Bigr) \\
&= \exp\left(-\epsilon\alpha\log n\right) = \exp\bigl(-\Omega(\log^{2}n)\bigr),
\end{aligned}$$

$$\mathbb{E}[Z_{\alpha}] = \exp\bigl(\Theta(\log^{2}n)\bigr) = \omega(1).$$

$$Z := \sum_{\substack{(L^{\prime},R^{\prime})\,:\,L^{\prime}\subset L,\ R^{\prime}\subset R \\ |L^{\prime}|=\gamma\alpha_{\epsilon},\ |R^{\prime}|=(1-\gamma)\alpha_{\epsilon}}} I_{(L^{\prime},R^{\prime})},$$

$$Z^{2} := \sum_{\substack{(L^{\prime},R^{\prime})\,:\,L^{\prime}\subset L,\ R^{\prime}\subset R \\ |L^{\prime}|=\gamma\alpha_{\epsilon},\ |R^{\prime}|=(1-\gamma)\alpha_{\epsilon}}}\ \sum_{\substack{(L^{\prime\prime},R^{\prime\prime})\,:\,L^{\prime\prime}\subset L,\ R^{\prime\prime}\subset R \\ |L^{\prime\prime}|=\gamma\alpha_{\epsilon},\ |R^{\prime\prime}|=(1-\gamma)\alpha_{\epsilon}}} I_{(L^{\prime},R^{\prime})}\,I_{(L^{\prime\prime},R^{\prime\prime})}.$$

$$i_{1} := |L^{\prime}\cap L^{\prime\prime}| \quad \text{and} \quad i_{2} := |R^{\prime}\cap R^{\prime\prime}|.$$

$$0\leq i_{1}\leq\gamma\alpha_{\epsilon}, \quad \text{and} \quad 0\leq i_{2}\leq(1-\gamma)\alpha_{\epsilon}.$$

$$N(i_{1},i_{2}) := \binom{\gamma\alpha_{\epsilon}}{i_{1}}\binom{n-\gamma\alpha_{\epsilon}}{\gamma\alpha_{\epsilon}-i_{1}}\binom{(1-\gamma)\alpha_{\epsilon}}{i_{2}}\binom{n-(1-\gamma)\alpha_{\epsilon}}{(1-\gamma)\alpha_{\epsilon}-i_{2}}.$$

$$\mathbb{E}\bigl[I_{(L^{\prime},R^{\prime})}\,I_{(L^{\prime\prime},R^{\prime\prime})}\bigr] = (1-p)^{2\gamma(1-\gamma)\alpha_{\epsilon}^{2}}\,(1-p)^{-i_{1}i_{2}},$$

$$\frac{\mathbb{E}[Z^{2}]}{\mathbb{E}[Z]^{2}} = \sum_{i_{1}=0}^{\gamma\alpha_{\epsilon}}\ \sum_{i_{2}=0}^{(1-\gamma)\alpha_{\epsilon}} \frac{\binom{\gamma\alpha_{\epsilon}}{i_{1}}\binom{n-\gamma\alpha_{\epsilon}}{\gamma\alpha_{\epsilon}-i_{1}}}{\binom{n}{\gamma\alpha_{\epsilon}}}\cdot\frac{\binom{(1-\gamma)\alpha_{\epsilon}}{i_{2}}\binom{n-(1-\gamma)\alpha_{\epsilon}}{(1-\gamma)\alpha_{\epsilon}-i_{2}}}{\binom{n}{(1-\gamma)\alpha_{\epsilon}}}\,(1-p)^{-i_{1}i_{2}}.$$

$$\frac{\binom{n-\gamma\alpha_{\epsilon}}{\gamma\alpha_{\epsilon}-i_{1}}}{\binom{n}{\gamma\alpha_{\epsilon}}} \leq \frac{\binom{n-i_{1}}{\gamma\alpha_{\epsilon}-i_{1}}}{\binom{n}{\gamma\alpha_{\epsilon}}} = \frac{(n-i_{1})!}{n!}\cdot\frac{(\gamma\alpha_{\epsilon})!}{(\gamma\alpha_{\epsilon}-i_{1})!} \leq \Bigl(\frac{\gamma\alpha_{\epsilon}}{n}\Bigr)^{i_{1}},$$

$$\frac{\binom{n-(1-\gamma)\alpha_{\epsilon}}{(1-\gamma)\alpha_{\epsilon}-i_{2}}}{\binom{n}{(1-\gamma)\alpha_{\epsilon}}} \leq \frac{\binom{n-i_{2}}{(1-\gamma)\alpha_{\epsilon}-i_{2}}}{\binom{n}{(1-\gamma)\alpha_{\epsilon}}} = \frac{(n-i_{2})!}{n!}\cdot\frac{((1-\gamma)\alpha_{\epsilon})!}{((1-\gamma)\alpha_{\epsilon}-i_{2})!} \leq \Bigl(\frac{(1-\gamma)\alpha_{\epsilon}}{n}\Bigr)^{i_{2}}.$$

$$\frac{\mathbb{E}[Z^{2}]}{\mathbb{E}[Z]^{2}} \leq \sum_{i_{1}=0}^{\gamma\alpha_{\epsilon}}\ \sum_{i_{2}=0}^{(1-\gamma)\alpha_{\epsilon}} \underbrace{\exp\bigl(-(i_{1}+i_{2})(\log n-2\log\alpha_{\epsilon})+i_{1}i_{2}\log b\bigr)}_{=:\,q(i_{1},i_{2})},$$

$$q(i_{1},i_{2}) \leq \exp\bigl(-2(\log n-2\log\alpha_{\epsilon})+\log n\bigr) = \exp\bigl(-\Omega(\log n)\bigr),$$

$$q(i_{1},i_{2}) = \exp\Bigl(i_{1}i_{2}\log b\,\Bigl(1-\Bigl(\frac{1}{i_{1}}+\frac{1}{i_{2}}\Bigr)\bigl(\log_{b}n-2\log_{b}\alpha_{\epsilon}\bigr)\Bigr)\Bigr),$$

$$\frac{1}{i_{1}}+\frac{1}{i_{2}} \geq \frac{1}{\gamma\alpha_{\epsilon}}+\frac{1}{(1-\gamma)\alpha_{\epsilon}} = \frac{1}{(1-\epsilon)\log_{b}n} \geq \frac{1+\epsilon}{\log_{b}n}.$$

$$1-\left(\frac{1}{i_{1}}+\frac{1}{i_{2}}\right)\bigl(\log_{b}n-2\log_{b}\alpha_{\epsilon}\bigr) \leq 1-(1+\epsilon)\left(1-\frac{2\log_{b}\alpha_{\epsilon}}{\log_{b}n}\right) \leq -\frac{\epsilon}{2},$$


Taxonomy

Topics: Complexity and Algorithms in Graphs · Markov Chains and Monte Carlo Methods · Advanced Graph Theory Research

Full text


Abhishek Dhawan. Partially supported by the NSF RTG grant DMS-1937241.

Eren C. Kızıldağ. Department of Statistics, University of Illinois Urbana-Champaign.

Neeladri Maitra.

Abstract

We study the algorithmic problem of finding large $\gamma$-balanced independent sets in dense random bipartite graphs; an independent set is $\gamma$-balanced if a $\gamma$ proportion of its vertices lie on one side of the bipartition. In the sparse regime, Perkins and Wang [PW24] established tight bounds within the low-degree polynomial (LDP) framework, showing a factor-$1/(1-\gamma)$ statistical–computational gap via the Overlap Gap Property (OGP) framework tailored for stable algorithms. However, these techniques do not appear to extend to the dense setting. For the related large independent set problem in the dense Erdős-Rényi random graph $G(n,p)$, the best known algorithm is an online greedy procedure that is inherently unstable, and LDP algorithms are conjectured to fail even in the “easy” regime where greedy succeeds.

For constant $p,\gamma\in(0,1)$ , we show that the largest $\gamma$ -balanced independent set in $G_{\text{bip}}(n,p)$ has size $\alpha_{\rm STAT}:=\frac{\log_{b}n}{\gamma(1-\gamma)}$ with high probability (whp), where $n$ is the size of each bipartition, $p$ is the edge probability, and $b=1/(1-p)$ . We design a two-stage online algorithm—revealing vertices sequentially and making irrevocable decisions based solely on current information—that achieves $(1-\epsilon)\alpha_{\rm COMP}$ whp for any $\epsilon>0$ , where $\alpha_{\rm COMP}:=(1-\gamma)\alpha_{\rm STAT}$ . We complement this with a sharp lower bound, showing that no online algorithm can achieve $(1+\epsilon)\alpha_{\rm COMP}$ with nonnegligible probability.

Our results suggest that the same factor- $1/(1-\gamma)$ gap is also present in the dense setting, supporting its conjectured universality. While the classical greedy procedure on $G(n,p)$ is straightforward, our algorithm is more intricate: it proceeds in two stages, incorporating a stopping time and suitable truncation to ensure that $\gamma$ -balancedness—a global constraint—is met despite operating with limited information. Our lower bound utilizes the OGP framework. Although the traditional scope of the OGP has been stable algorithms, we build on a recent refinement of this framework for online models and extend it to the bipartite setting.

1 Introduction

In this paper, we study the algorithmic problem of finding large balanced independent sets in dense random bipartite graphs. While finding large independent sets—or even approximating them to within an $n^{1-\epsilon}$ factor—is NP-hard in the worst-case [Hås99, Kho01], the situation becomes far more intriguing in the presence of randomness.

For the Erdős-Rényi random graph $G(n,\frac{1}{2})$ , the largest independent set has size approximately $2\log_{2}n$ with high probability (whp) [Mat70, Mat76, GM75, BE76]. Moreover, a simple greedy algorithm operating in an online fashion—where vertices are revealed sequentially, and the decision at step $t$ depends only on partial information available at step $t$ —finds an independent set of size $\log_{2}n$ with high probability [GM75]. In 1976, Karp asked whether it is possible to design an efficient algorithm that, whp, finds an independent set of size $(1+\epsilon)\log_{2}n$ for $\epsilon>0$ [Kar76]. Surprisingly, this question remains open and is widely believed to be computationally intractable. It is worth mentioning that proving the hardness of Karp’s task unconditionally would imply $P\neq NP$ .
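The online greedy procedure described above can be sketched as follows. This is a minimal illustration, not the construction of [GM75]: the function name `greedy_online_independent_set` is hypothetical, and the edges of $G(n,p)$ are sampled lazily at the moment a vertex arrives, which is valid because each vertex pair is inspected at most once.

```python
import random


def greedy_online_independent_set(n, p, seed=0):
    """Online greedy on G(n, p); edges are sampled lazily as vertices arrive."""
    rng = random.Random(seed)
    independent = []
    for v in range(n):
        # Reveal only the edges between the arriving vertex v and the
        # vertices already placed in the independent set.
        has_neighbor = any(rng.random() < p for _ in independent)
        if not has_neighbor:
            independent.append(v)  # irrevocable: v is never removed later
    return independent
```

For $p=\frac{1}{2}$ and $n=2^{14}$, the returned set typically has size close to $\log_{2}n=14$, in line with the guarantee quoted above.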

Karp’s problem stands as a central question in average-case complexity and the algorithmic theory of random graphs. (For instance, Frieze explicitly highlighted it as a major open problem in his 2014 ICM plenary lecture [Fri14].) It is perhaps the earliest instance of a statistical-computational gap, that is, a gap between the existential bound and the best known polynomial-time algorithm. This gap has been extensively studied (e.g., prompting Jerrum to propose the now-famous planted clique model [Jer92]), and a large body of work has since uncovered similar “factor-$2$ gaps” in other random graph models, suggesting a certain universality. For a broad overview of such gaps both in the context of random graphs and beyond, see the surveys [BPW18, Gam21, GMZ22, Gam25].

The sparse case, $G(n,\frac{d}{n})$ with constant $d$, has seen substantial progress. In this case, the largest independent set has size $\sim 2n\frac{\log d}{d}$ [Fri90, FL92, BGT10], while the best known efficient algorithm only achieves $\sim n\frac{\log d}{d}$, both whp [LW07]. (Both guarantees hold in the double limit $n\to\infty$ followed by $d\to\infty$.) Is it hard to find independent sets of size $(1+\epsilon)n\frac{\log d}{d}$? As noted above, resolving this question would imply a conclusion even stronger than $P\neq NP$. Accordingly, contemporary research has instead focused on providing rigorous evidence of hardness, e.g., by establishing unconditional lower bounds for certain classes of algorithms. This algorithmic gap led Gamarnik and Sudan to introduce the Overlap Gap Property (OGP) framework [GS14]. Leveraging this framework, tight hardness results were obtained for powerful algorithmic classes, including low-degree polynomial (LDP) algorithms [Wei22] and local algorithms [RV17]. These arguments were subsequently extended to sparse random bipartite graphs by Perkins and Wang [PW24], a work closely related to ours (see below); see also [DW24] for an extension to hypergraphs.

Dense Random Graphs

The situation is markedly different in the dense regime, $G(n,p)$ with constant $p$. In this setting, while hardness results were recently established for LDP algorithms [HS25], it remains unknown whether the greedy algorithm can itself be implemented as an LDP algorithm. In fact, it was conjectured at a recent AIM workshop [24] that for $G(n,\frac{1}{2})$:

Conjecture 1.1.

No degree- $o(\log^{2}n)$ polynomial can return an independent set of size $0.9\log_{2}n$ .

That is, LDP algorithms likely fail even in the regime where the greedy algorithm succeeds, suggesting they may not be viable at all in the dense setting. Given that LDP algorithms are quite powerful in many high-dimensional problems, this is particularly surprising—especially since the greedy algorithm operates using only partial, sequential information without accessing the full graph. Thus, different techniques are needed for analyzing the greedy algorithm and the online setting.

A recent work [GKW25] has refined the OGP framework and subsequently obtained sharp lower bounds for a broad class of online algorithms, which includes the greedy algorithm as a special case. Our work extends these techniques to dense random bipartite graphs, as we detail below.

1.1 Random Bipartite Graphs

Bipartite graphs arise frequently in modeling real-world scenarios with an inherent two-part structure (e.g., job assignment). From a theoretical standpoint, many classical problems in graph theory and extremal combinatorics—such as Turán- and Ramsey-type questions—have natural bipartite counterparts. In our context, random bipartite graphs are a natural testbed for investigating the robustness of statistical-computational gaps (e.g., the factor- $2$ gap discussed above).

Our focus is on the dense Erdős-Rényi random bipartite graph $G_{\mathrm{bip}}(n,p)$ defined on disjoint vertex sets $(L,R)$ where $|L|=|R|=n$ and $p$ is a constant. Each edge between $L$ and $R$ is included independently with probability $p$ . In general, maximum independent sets in bipartite graphs can be found efficiently via max-flow. Likewise, finding large independent sets in $G_{\mathrm{bip}}(n,p)$ is algorithmically easy; in fact, even approximately counting or sampling such sets in the sparse case $G_{\text{bip}}(n,d/n)$ is tractable—see [DW24] and references therein. However, powerful statistical mechanics heuristics [MP87] suggest that introducing global constraints may lead to a glassy phase and computational hardness.

A natural global constraint, as studied in [PW24], is to consider balanced independent sets $I$ , where $|I\cap L|=|I\cap R|$ . More generally, they allow for a specified proportion of vertices from each side of the bipartition, which is also our focus. Indeed, introducing such a global constraint already leads to computational barriers: finding a largest balanced independent set in a bipartite graph is NP-hard [GJ90, Fei02]. For further references and detailed discussion, see [PW24].

In this paper, we study $\gamma$ -balanced independent sets—those in which a $\gamma$ fraction of vertices lie in one part and a $(1-\gamma)$ fraction in the other, where $\gamma\in(0,\frac{1}{2}]$ without loss of generality.

Definition 1.2.

Given a bipartite graph $G$ with bipartition $(L,R)$ , an independent set $I$ of $G$ is $\gamma$ -balanced if $\bigl{|}|I\cap L|-\gamma|I|\bigr{|}<1$ or $\bigl{|}|I\cap R|-\gamma|I|\bigr{|}<1$ .
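Definition 1.2 translates directly into a short check. The sketch below is illustrative only; the helper names `is_gamma_balanced` and `is_independent` and the representation of the graph as a set of `(left, right)` pairs are assumptions made here, not notation from the paper.

```python
def is_gamma_balanced(left_count, right_count, gamma):
    """Definition 1.2: one side holds a gamma fraction of I, up to rounding."""
    total = left_count + right_count
    return (abs(left_count - gamma * total) < 1
            or abs(right_count - gamma * total) < 1)


def is_independent(left_part, right_part, edges):
    """True if no pair (u, v) with u in left_part, v in right_part is an edge."""
    return not any((u, v) in edges for u in left_part for v in right_part)
```

For example, with $\gamma=\frac{1}{4}$ a set with 2 vertices in $L$ and 6 in $R$ is $\gamma$-balanced, and so is one with 6 in $L$ and 2 in $R$, since the definition allows either side to carry the $\gamma$ fraction.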

For the sparse case, $G_{\mathrm{bip}}(n,d/n)$ , Perkins and Wang [PW24] provide a fairly comprehensive picture. They show that (i) the largest $\gamma$ -balanced independent set has size $\left(1/(\gamma(1-\gamma))\pm o_{d}(1)\right)n\frac{\log d}{d}$ , (ii) there are local/LDP algorithms finding an independent set of size $\left(1/\gamma-o_{d}(1)\right)n\frac{\log d}{d}$ , and (iii) for $d$ large, local/LDP algorithms fail to find an independent set of size $\left(1/\gamma+\epsilon\right)n\frac{\log d}{d}$ . That is, a factor- $1/(1-\gamma)$ statistical-computational gap emerges. In the balanced case $\gamma=1/2$ , this reproduces the familiar factor- $2$ gap.

Can we extend these results to dense bipartite graphs? To begin with, their algorithmic result crucially exploits sparsity—specifically, the fact that sparse graphs are locally ‘tree-like’. This structural property no longer holds in the dense regime. In fact, as discussed earlier and suggested by Conjecture 1.1, such algorithms may not even be viable candidates in the dense setting. Instead, online algorithms emerge as a natural candidate. However, enforcing the $\gamma$ -balancedness constraint in an online setting with randomly ordered vertex arrivals poses new difficulties (see below). Furthermore, the core of their hardness result relies on the stability of local/LDP algorithms. Online algorithms, however, may be unstable; see [GKW25].

Dense Bipartite Graphs

In this paper, we characterize the landscape for dense random bipartite graphs, including (i) the statistical threshold $\alpha_{\rm STAT}$ for the largest $\gamma$ -balanced independent set, (ii) an online algorithm achieving the threshold $\alpha_{\rm COMP}=(1-\gamma)\alpha_{\rm STAT}$ , and (iii) a sharp algorithmic lower bound, showing that no online algorithm can surpass $\alpha_{\rm COMP}$ . This suggests that $\alpha_{\rm COMP}$ is the computational threshold for this problem. In particular, our results establish a similar factor- $1/(1-\gamma)$ statistical-computational gap, thus reinforcing its apparent universality.

Before describing our main results, we highlight key challenges. First and foremost, any approach that incurs even mild logarithmic factors would fall short of addressing a constant-factor computational gap. In the sparse setting, the OGP framework has proven to be a powerful tool for addressing constant-factor gaps. However, this framework applies primarily to stable algorithms, including local/LDP algorithms. In light of Conjecture 1.1, such algorithms are unlikely to be viable in the dense case. Instead, online algorithms offer a more natural starting point, further motivated by the fact that the best known algorithm for $G(n,p)$ itself is an online algorithm. Despite this, only a handful of applications of the OGP framework to online algorithms exist; see Section 2.1. Even for classical greedy on $G(n,p)$, lower bounds via OGP require delicate refinements and have been established only very recently [GKW25]. In our setting, the technical challenges are compounded by two additional factors: (i) the greedy algorithm operates with only partial information, while the $\gamma$-balancedness constraint is global; and (ii) the random vertex arrival order, a core feature of online algorithms, can lead to situations revealing few or no cross-edges in the bipartite graph, further complicating the analysis (see below).

In the classical online setting (e.g., greedy on $G(n,p)$ ), vertices arrive sequentially and in random order. At each step $t$ , the algorithm decides whether to include the incoming vertex $v_{t}$ by inspecting its connections to $I_{t-1}$ , the independent set constructed thus far. Crucially, the decision for $v_{t}$ must be made immediately—it cannot be deferred.

The introduction of the $\gamma$ -balancedness constraint makes the situation more delicate. In the setting of [PW24], the algorithm is not online; once an independent set is constructed, it can be rebalanced—e.g., by discarding vertices—to satisfy the balancedness constraint. In our setting, however, such post-hoc pruning is not allowed, as it breaks the requirement that decisions are irrevocable. Consequently, the algorithm must be cognizant of the balancedness constraint throughout its execution.

This challenge is further compounded by the randomness in vertex arrival order. Crucially, if the majority of the vertices come from the same side of the bipartition in, e.g., the first $n$ steps, the algorithm receives limited information about the status of cross-edges. In contrast, in $G(n,p)$, each step reveals the status of some new edges, regardless of the vertex order.

The random arrival order, along with the $\gamma$-balancedness constraint, also necessitates truncation, as we elaborate next. Consider the case where the first $n$ (or $n-o(n)$) vertices all come from the same side of the bipartition. A greedy algorithm might add all of them, clearly violating the balancedness constraint. More subtly, even adding as few as $(1+\epsilon)\log_{b}n$ of these vertices can prevent any subsequent vertex from the opposite side from being included, indicating that a careful truncation (without violating onlineness) is essential. Hence, while the analysis for classical greedy on $G(n,p)$ is rather easy, this is not the case in our setting. Moreover, any meaningful algorithmic lower bound should be oblivious to the arrival order, including adversarial ones such as the scenarios described above.

To address these challenges, we construct suitable auxiliary stochastic processes to track the evolution of the algorithm’s output, along with judiciously chosen stopping times that enforce the balancedness constraint without violating the online requirement.

1.2 Summary of Main Results

Recall that $G_{\mathrm{bip}}(n,p)$ is the random bipartite graph on vertex set $(L,R)$ with $|L|=|R|=n$ , where each edge in $L\times R$ is present independently with probability $p=\Theta(1)$ . Fix $\gamma\in(0,\frac{1}{2}]$ and set

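$$\alpha_{\rm STAT}:=\frac{\log_{b}n}{\gamma(1-\gamma)},\quad\text{where } b:=b(p)=\frac{1}{1-p}. \tag{1}$$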

For the remainder of this paper, we ignore all floor/ceiling operators for simplicity with the understanding that this does not affect the overall arguments.

Theorem 1.3 (Informal, see Theorem 3.1).

The largest $\gamma$ -balanced independent set in $G_{\mathrm{bip}}(n,p)$ has size $\bigl{(}1\pm o(1)\bigr{)}\alpha_{\rm STAT}$ whp.

Theorem 1.3 identifies $\alpha_{\rm STAT}$ in (1) as the statistical threshold. Equipped with this, a natural algorithmic question arises: can we efficiently find large $\gamma$ -balanced independent sets? We set

$$\alpha_{\rm COMP}:=(1-\gamma)\,\alpha_{\rm STAT}=\frac{\log_{b}n}{\gamma}. \tag{2}$$

Our next result shows that $\alpha_{\rm COMP}$ is attainable in polynomial time.

Theorem 1.4 (Informal, see Theorem 3.4).

There is an online algorithm—oblivious to the vertex arrival order—which, for any $\epsilon>0$ , finds a $\gamma$ -balanced independent set of size $(1-\epsilon)\alpha_{\mathrm{COMP}}$ whp.

See Definition 3.2 for a description of online algorithms. The decision at time $t$ takes polynomial time, so the overall runtime of our algorithm is polynomial in $n$ . Notably, our algorithm is completely oblivious to the vertex arrival order. As mentioned earlier, while the analysis of the classical greedy algorithm on $G(n,p)$ is relatively straightforward, the situation here is more delicate. The presence of the $\gamma$ -balancedness constraint, along with the random arrival order, introduces additional challenges. We address these by (i) analyzing suitable stochastic processes that track the algorithm’s evolution, and (ii) employing a careful truncation to ensure the $\gamma$ -balancedness.

Observe that there is a factor- $1/(1-\gamma)$ gap between the statistical threshold and the algorithmic value, reminiscent of the computational gap in the sparse setting. Our final result addresses this gap and establishes a sharp computational lower bound.
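To make the two thresholds concrete, the following sketch evaluates $\alpha_{\rm STAT}=\frac{\log_{b}n}{\gamma(1-\gamma)}$ and $\alpha_{\rm COMP}=(1-\gamma)\alpha_{\rm STAT}$ numerically. The function name `thresholds` is hypothetical, and the values are the leading-order quantities only, with the $o(1)$ corrections ignored.

```python
import math


def thresholds(n, p, gamma):
    """Return (alpha_STAT, alpha_COMP) for G_bip(n, p) with b = 1 / (1 - p)."""
    b = 1.0 / (1.0 - p)
    log_b_n = math.log(n) / math.log(b)
    alpha_stat = log_b_n / (gamma * (1.0 - gamma))
    alpha_comp = (1.0 - gamma) * alpha_stat  # equals log_b(n) / gamma
    return alpha_stat, alpha_comp
```

For $n=1024$, $p=\frac{1}{2}$ and $\gamma=\frac{1}{2}$ this gives $\alpha_{\rm STAT}\approx 40$ and $\alpha_{\rm COMP}\approx 20$, so the ratio is the factor $1/(1-\gamma)=2$.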

Theorem 1.5 (Informal, see Theorem 3.5).

For any $\epsilon>0$ , no online algorithm finds a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ with probability at least $\exp(-O(\log^{2}n))$ .

We note that there is no restriction on the runtime of the algorithms ruled out. Furthermore, Theorem 1.5 establishes strong hardness—it rules out online algorithms that succeed even with vanishing probability. Importantly, the probability guarantee is essentially optimal: the probability that a randomly chosen $\gamma$ -balanced set of size $(1+\epsilon)\alpha_{\rm COMP}$ is an independent set is itself $\exp(-\Theta(\log^{2}n))$ .
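The $\exp(-\Theta(\log^{2}n))$ benchmark in the last sentence can be checked directly. The computation below is a sketch for a fixed vertex set with $\gamma k$ vertices in $L$ and $(1-\gamma)k$ in $R$, where $k=(1+\epsilon)\alpha_{\rm COMP}$ and $p,\gamma,\epsilon$ are treated as constants.

```latex
% A fixed set with \gamma k vertices in L and (1-\gamma)k in R spans
% \gamma(1-\gamma)k^2 potential cross-edges, each absent with probability 1-p.
\begin{align*}
\mathbb{P}[\text{the set is independent}]
  &= (1-p)^{\gamma(1-\gamma)k^{2}}
   = b^{-\gamma(1-\gamma)k^{2}}, \qquad k=(1+\epsilon)\frac{\log_{b}n}{\gamma},\\
\gamma(1-\gamma)k^{2}
  &= \frac{(1-\gamma)(1+\epsilon)^{2}}{\gamma}\,\log_{b}^{2}n,\\
b^{-\gamma(1-\gamma)k^{2}}
  &= \exp\Bigl(-\frac{(1-\gamma)(1+\epsilon)^{2}}{\gamma\,\log b}\,\log^{2}n\Bigr)
   = \exp\bigl(-\Theta(\log^{2}n)\bigr).
\end{align*}
```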

Taken together, Theorems 1.4 and 1.5 provide a tight characterization of the performance of online algorithms, providing rigorous evidence toward the following conjecture (we note that a version of this conjecture in the sparse regime appears in [PW24]).

Conjecture 1.6.

For any $\epsilon>0$ and $p=\Theta(1)$ , no polynomial-time algorithm finds a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ in $G_{\mathrm{bip}}(n,p)$ whp.

Future Queries

The setting of [GKW25] permits querying a limited set of future edges—edges incident to vertices not yet seen—at each step. That is, the decision at time $t$ is based not only on the edges revealed so far, but also on a restricted set of such future edges. In this augmented model, [GKW25] prove both lower bounds and algorithmic guarantees: specifically, they show that algorithms with modest access to future information can in fact exceed the $(1+\epsilon)\log_{b}n$ threshold, albeit using quasi-polynomial time.

This naturally raises the following question: can algorithms with limited future information outperform the $(1+\epsilon)\alpha_{\rm COMP}$ threshold for $G_{\mathrm{bip}}(n,p)$ ? We show that the answer is yes:

Theorem 1.7 (Informal, see Theorem 3.8).

For any $\epsilon>0$ , there exists an online algorithm which makes limited future queries and finds a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ with probability at least $1-\exp(-n^{\Theta(1)})$ .

Our algorithm is online, and runs in super-polynomial time. It has a mild dependence on the vertex arrival order—indeed, no online algorithm that is fully oblivious to arrival order can surpass $(1+\epsilon)\alpha_{\rm COMP}$ . See Section 3.3 for further discussion.

1.3 Proof Overview

In this section, we will provide an overview of our proof techniques. The proof of the statistical threshold follows a standard application of the first and the second moment similar to that of $G(n,p)$ tailored to the bipartite setting. Much of our effort is in the proof of the computational threshold.

Achievability result

The greedy algorithm for ordinary independent sets is inherently online; however, as mentioned earlier, the global nature of the $\gamma$-balancedness constraint introduces technical challenges. We overcome these challenges by designing a two-stage online algorithm with truncation steps, i.e., we stop adding vertices to the independent set $I$ from a partition $\eta$ once we have reached the desired number of vertices in $I\cap\eta$. More formally:

1. In stage one, we greedily add vertices to the independent set $I$ until $I$ contains $(1-\epsilon)\gamma\alpha_{\rm COMP}$ vertices from one partition.

2. At this point, without loss of generality, let us assume $|I\cap L|=(1-\epsilon)\gamma\alpha_{\rm COMP}$. During stage two, we only add vertices in $R$ to the independent set, stopping once $|I\cap R|=(1-\epsilon)(1-\gamma)\alpha_{\rm COMP}$.

Note that there are two “truncation” points in our algorithm (one in each stage). Furthermore, since $\gamma\leq 1/2$ we may assume $|I\cap R|<(1-\epsilon)(1-\gamma)\alpha_{\rm COMP}$ at the beginning of stage two.
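The two stages and their truncation points can be sketched in code. This is a simplified illustration under stated assumptions, not the algorithm analyzed in Section 5: the names `targets` and `two_stage_online` are hypothetical, the arrival order is taken to be uniformly random, targets are rounded down, edges are sampled lazily on arrival, and only the per-side counts of $I$ are tracked.

```python
import math
import random


def targets(n, p, gamma, eps):
    """Floored per-side targets (1-eps)*gamma*a and (1-eps)*(1-gamma)*a, a = alpha_COMP."""
    alpha_comp = math.log(n) / math.log(1.0 / (1.0 - p)) / gamma
    small = int((1 - eps) * gamma * alpha_comp)
    large = int((1 - eps) * (1 - gamma) * alpha_comp)
    return small, large


def two_stage_online(n, p, gamma, eps, seed=0):
    """Return (|I & L|, |I & R|) after a two-stage truncated greedy run."""
    rng = random.Random(seed)
    small, large = targets(n, p, gamma, eps)
    if small < 1:
        raise ValueError("n is too small for these parameters")
    arrivals = ["L"] * n + ["R"] * n
    rng.shuffle(arrivals)  # assumption: uniformly random arrival order

    count = {"L": 0, "R": 0}
    capped = None  # side frozen at `small` vertices once stage one ends
    for side in arrivals:
        other = "R" if side == "L" else "L"
        if side == capped:
            continue  # stage two: the truncated side accepts nothing more
        if capped is not None and count[side] >= large:
            break  # second truncation: both targets are met
        # Reveal edges from the arriving vertex to chosen vertices on the other side.
        blocked = any(rng.random() < p for _ in range(count[other]))
        if not blocked:
            count[side] += 1  # irrevocable acceptance
            if capped is None and count[side] == small:
                capped = side  # first truncation: stage one ends here
    return count["L"], count["R"]
```

If the arrivals run out before stage two completes, the sketch simply returns a smaller set; the whp guarantee discussed in the text concerns how rarely this happens for large $n$.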

In order to prove our achievability result, we must show that enough vertices from $R$ are added to $I$ during stage two whp. Under the assumption that $|I\cap L|=(1-\epsilon)\gamma\alpha_{\rm COMP}$, the probability that an arbitrary vertex in $R$ considered during stage two is in fact added to $I$ is $n^{-1+\epsilon}$. Therefore, it is enough to show that $\gg n^{1-\epsilon}$ vertices in $R$ remain to be exposed during stage two. The key part of our analysis, therefore, is the following result: stage one concludes in at most $n^{1-\epsilon/2}$ steps whp irrespective of the vertex arrival order (Lemma 5.1).
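The acceptance probability $n^{-1+\epsilon}$ quoted above follows from a one-line computation, sketched here using $\gamma\alpha_{\rm COMP}=\log_{b}n$ and $1-p=b^{-1}$.

```latex
% A stage-two vertex v in R is added iff it has no neighbor in I \cap L.
\begin{align*}
\mathbb{P}[v \text{ is added}]
  &= (1-p)^{|I\cap L|}
   = (1-p)^{(1-\epsilon)\gamma\alpha_{\rm COMP}}
   = b^{-(1-\epsilon)\log_{b}n}
   = n^{-(1-\epsilon)}.
\end{align*}
% Hence if N vertices of R are exposed in stage two, the expected number
% accepted is N n^{-1+\epsilon}, which exceeds the O(\log n) vertices still
% needed as soon as N \gg n^{1-\epsilon}.
```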

Impossibility result

The proof of the upper bound of our computational threshold falls within the OGP framework (see Section 2.1 for a brief history of the technique). As mentioned earlier, OGP-based arguments have predominantly served as a barrier to stable algorithms. As online algorithms may not be stable, one needs to refine the approach to adapt it to this setting. Our work is one of the first to do so.

Given an algorithm $\mathcal{A}$ , we aim to bound the probability, denoted by $\delta$ , that $\mathcal{A}$ finds a $\gamma$ -balanced independent set of size at least $(1+\epsilon)\alpha_{\rm COMP}$ . At the heart of our proof lies a sequence of correlated random graphs $(G_{i}^{(T)})_{i\in[m],T\in[2n]}$ for $m=\Theta(\epsilon^{-2})$ . We define a successful event $\mathcal{S}$ determined by running the algorithm $\mathcal{A}$ on each of the graphs $G_{i}^{(T)}$ . Roughly speaking, $\mathcal{S}$ denotes the event that for a specific timestep $\tau\coloneqq\tau\left(\mathcal{A},\,(G_{i}^{(T)})_{i\in[m],T\in[2n]}\right)$ , the independent sets $\mathcal{A}(G_{1}^{(\tau)}),\ldots,\mathcal{A}(G_{m}^{(\tau)})$ all have size at least $(1+\epsilon)\alpha_{\rm COMP}$ . We then show the following:

1. $\mathbb{P}[\mathcal{S}]\geq\delta^{m}$, and

2. $\mathbb{P}[\mathcal{S}]=\exp(-\Omega(\log_{b}^{2}n))$.

Combining the above completes the proof.
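The combination step can be spelled out as follows. This is a sketch in which $c>0$ stands for the constant hidden in the $\Omega(\cdot)$ above; treating $c$ as independent of $\epsilon$ is an assumption made here for illustration.

```latex
\begin{align*}
\delta^{m} \;\leq\; \mathbb{P}[\mathcal{S}] \;\leq\; \exp\bigl(-c\log_{b}^{2}n\bigr)
\quad\Longrightarrow\quad
\delta \;\leq\; \exp\Bigl(-\frac{c}{m}\log_{b}^{2}n\Bigr)
       \;=\; \exp\bigl(-\Omega(\epsilon^{2}\log_{b}^{2}n)\bigr),
\end{align*}
% where the last step uses m = \Theta(\epsilon^{-2}).
```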

We note that the sequence of correlated graphs $(G_{i}^{(T)})_{i\in[m],T\in[2n]}$ is defined with respect to the algorithm $\mathcal{A}$. In particular, the vertex arrival order determines the correlations within the sequence. This is in stark contrast to other applications of the OGP framework and is the key refinement to adapt the OGP technique to the (potentially unstable) online setting. For instance, in [Wei22, PW24, DW24], the authors define sequences of correlated random (hyper)graphs independent of the algorithm $\mathcal{A}$ in question. They then apply the algorithm to each graph to construct a forbidden substructure whp, namely a sequence of independent sets which appears with low probability, thereby arriving at a contradiction.

1.4 Open Problems

We conclude this introduction with a description of potential future directions of inquiry.

One-Stage Algorithm

Our algorithm relies on a two-stage structure to enforce the $\gamma$ -balancedness constraint. A natural question is whether a one-stage algorithm—akin to greedy on $G(n,p)$ —can achieve $\alpha_{\rm COMP}$ . This seems difficult due to the $\gamma$ -balancedness constraint, though one might try introducing a bias at each step: based on the vertex’s side of the bipartition, current balance, and connectivity, decide via a coin flip. We leave this for future work.

Hypergraphs

A recent work [DW24] extends the results of [GS14, RV17, Wei22] to sparse random $r$ -uniform hypergraphs, obtaining algorithmic guarantees and sharp computational lower bounds for LDP algorithms, and demonstrating the emergence of an analogous statistical–computational gap. It also investigates the universality of this gap via a multipartite hypergraph version of the largest balanced independent set problem (introduced in [Dha25]) in sparse $r$ -uniform $r$ -partite hypergraphs, recovering and generalizing the results of [PW24]. Extending our results and the results of [GKW25] to dense random hypergraphs are both interesting directions for future work.

Optimization Problems with Global Constraints

The balanced independent set problem in bipartite graphs is an example of how a problem in $P$ can be made $NP$ -hard by imposing global constraints. It is worth investigating whether such gaps persist across similar problems. For instance, the largest induced matching problem—finding the largest matching $M\subseteq E(G)$ such that $E(G[V(M)])=M$ —is $NP$ -hard [Cam89]. The statistical threshold was determined in [Coo+21], while the computational threshold is unknown. Another problem to explore would be the $m$ -partite graph analogue of the largest balanced independent set problem; we note that a coloring variant of this problem was suggested in [Cha23] for deterministic graphs.

2 Statistical-Computational Gaps and OGP: Prior Work

A wide range of problems across probability theory, computer science, high-dimensional statistics, and machine learning involve randomness and exhibit a common phenomenon: a statistical-computational gap. That is, there is often a discrepancy between what is information-theoretically possible and what is achievable by efficient algorithms. For example, in random optimization problems such as the one we study, the optimal value can often be identified through non-constructive means. However, known polynomial-time algorithms yield strictly suboptimal solutions, and no efficient method is known for finding a global optimum without brute-force search. The models with such an apparent gap include random constraint satisfaction problems (CSP) [MMZ05, AR06, AC08, GS17a, BH21, Yun24], spin glass models [Che+19, HS21, HS23, GJ21, GJW20, GJK23, Kız25, Sel25], number balancing and discrepancy minimization [MS25, GK23, Gam+23], Ising perceptron [Gam+22, LSZ24], as well as various computational problems over random graphs [GS14, GS17, GJW20, Wei22, PW24, DDL23, DW24] and more.

Standard complexity theory is tailored primarily to worst-case hardness and offers limited insight into average-case models (see [Ajt96, BBB21, GK21, VV25] for a few notable exceptions). Nevertheless, these gaps are a very active area of investigation; researchers have developed various frameworks for providing rigorous evidence of hardness. For an overview of these methods, we refer the reader to the excellent surveys [WX18, BPW18, Gam21, GMZ22, Gam25, Wei25].

2.1 Computational Gaps in Random Optimization Problems

For random optimization problems, arguably the most powerful framework for establishing algorithmic hardness is the Overlap Gap Property (OGP) introduced by Gamarnik and Sudan [GS14] (and formally named in [GL18]). Building on insight from statistical physics—particularly the intriguing connection between the onset of algorithmic hardness and geometric phase transitions in random CSPs [MMZ05, AC08, AR06]—the OGP framework has proven instrumental in establishing rigorous algorithmic barriers by leveraging intricate geometry of the optimization landscape. For surveys on OGP, see [Gam21, Gam25].

We briefly describe this framework in its original context: finding large independent sets in sparse random graphs. Gamarnik and Sudan [GS14] established that independent sets of size $(1+1/\sqrt{2})n\frac{\log d}{d}$ exhibit an ‘overlap-gap’: any two such sets have either a large or a small intersection, with no overlaps of intermediate size. This structural property enabled them to rule out local algorithms at this threshold, thereby refuting a conjecture of Hatami-Lovász-Szegedy [HLS14] which posited that local algorithms can find maximum independent sets in $d$ -regular random graphs.

Subsequent work by Rahman-Virág [RV17] extended this hardness result down to the sharp threshold $n\frac{\log d}{d}$ , below which polynomial-time algorithms are known [LW07]. Unlike the pairwise OGP considered in earlier work, their approach relied on analyzing overlaps among multiple independent sets (a notion termed multi-OGP) to obtain tight lower bounds. Recent works have introduced refined variants of the multi-OGP: for instance, asymmetric versions of OGP yielded tight lower bounds against LDP algorithms [Wei22, BH21], and the branching OGP [HS21] has emerged as a very powerful tool in the study of spin glasses. The OGP framework has since become the ‘bread-and-butter’ for proving sharp computational lower bounds in numerous random optimization problems. For certain models, such as the Ising perceptron and discrepancy minimization, OGP-based hardness results are complemented by more traditional notions of average-case hardness (e.g., worst-case hardness of approximating the shortest vector in lattices) [VV25]. (Footnote 3: Interestingly, there exist models exhibiting the OGP that remain solvable in polynomial time (e.g., by linear programming) [LS24], beyond the classical counterexample of random XOR-SAT, which is solvable by Gaussian elimination.) The literature on OGP is now quite extensive; we refer the reader to the references above.

OGP for Online Algorithms

The OGP framework is primarily tailored for stable algorithms, that is, those whose output is insensitive to small variations in the input. (Footnote 4: Informally, an algorithm $\mathcal{A}$ is stable if for any inputs $G,G^{\prime}$ with small $\|G-G^{\prime}\|$ , the outputs $\mathcal{A}(G),\mathcal{A}(G^{\prime})$ are close.) Many prominent algorithms for average-case models fall into this category, including local algorithms (e.g., factors of iid) [GS14, RV17], LDP algorithms, approximate message passing [GJ21], Boolean circuits with low depth [GJW21], as well as gradient descent and Langevin dynamics [GJW20]. OGP-based hardness arguments rely critically on this stability, for instance to construct interpolation arguments showing that the algorithm’s trajectory evolves smoothly and thus avoids the intermediate overlap region which is forbidden in the solution space. Online algorithms, however, may be unstable; see [GKW25, Proposition 1.1].

Can OGP-based barriers be extended to online algorithms? This question is especially relevant in the modern era of big data, where the online setting is a natural model of decision-making under uncertainty and has been extensively studied in the optimization and machine learning literature [RST10, RST11, RST11a, RS13, Haz+16]. This question was first addressed in [Gam+23], where sharp lower bounds for online algorithms were obtained for the Ising perceptron. More recently, OGP-based barriers for online algorithms were obtained for the graph alignment problem [DGH25] and the largest submatrix problem [BGG25].

Extending such online barriers to random graphs, the very setting in which the OGP first emerged, turned out to be quite challenging. This is particularly due to the lack of stability, a key feature on which OGP-based arguments crucially build. The first lower bounds for online algorithms in $G(n,p)$ were obtained in [GKW25] through novel technical refinements. Their arguments include (i) the construction of temporal interpolation paths that evolve with the algorithm (in contrast to earlier OGP-based barriers, which are algorithm-independent) as well as (ii) the use of stopping times tracking the size of the output. In the present paper, we extend the techniques of [GKW25] to the bipartite setting where the arguments are further refined to (i) incorporate the $\gamma$ -balancedness constraint, and (ii) handle the random arrival order, which may reveal very few cross-edges and thus provide limited information.

3 Main Results

We begin by determining the size of the largest $\gamma$ -balanced independent set in $G_{\mathrm{bip}}(n,p)$ for constant $p$ . Recall from (1) that $\alpha_{\rm STAT}=\frac{\log_{b}n}{\gamma(1-\gamma)}$ , where $b=1/(1-p)$ .

Theorem 3.1.

Let $Z_{\alpha}(\gamma)$ denote the number of $\gamma$ -balanced independent sets of size $\alpha$ . For any fixed $\epsilon\in(0,1)$ and $\alpha_{\rm STAT}$ as in (1), the following hold:

(S1) For $\alpha\geq(1+\epsilon)\alpha_{\rm STAT}$ , $\mathbb{P}[Z_{\alpha}(\gamma)\geq 1]=\exp(-\Theta(\log^{2}n))$ .

(S2) For $\alpha\leq(1-\epsilon)\alpha_{\rm STAT}$ , $\mathbb{P}[Z_{\alpha}(\gamma)\geq 1]\geq 1-\exp(-\Theta(\log n))$ .

Thus, the largest $\gamma$ -balanced independent set is approximately of size $\alpha_{\rm STAT}$ , which we refer to as the statistical threshold. Theorem 3.1 follows from a standard application of the first and second moment methods; see Section 4 for the proof.

Given this benchmark, a natural algorithmic question arises: can we find such independent sets efficiently? Motivated by the fact that the best known algorithm for the maximum independent set problem in $G(n,p)$ is an online greedy algorithm, we naturally investigate the performance of online algorithms.

3.1 Algorithmic Setting

The class of online algorithms we consider is formalized as follows.

Definition 3.2.

Let $G\sim G_{\mathrm{bip}}(n,p)$ have vertex set $L\cup R$ , where $|L|=|R|=n$ and $L\cap R=\varnothing$ . A randomized algorithm $\mathcal{A}$ with internal randomness determined by seed $\omega$ runs for $2n$ rounds and keeps track of sets $L_{t}\subseteq L$ and $R_{t}\subseteq R$ (initially $L_{0}=R_{0}=\varnothing$ ). At each round $t\in[2n]$ :

1. Based on $\omega$ and all information revealed so far, $\mathcal{A}$ randomly selects a vertex $v_{t}\in(L\cup R)\setminus(L_{t}\cup R_{t})$ and reveals the status of all edges $(v_{t},v)$ , where $v\in L_{t}\cup R_{t}$ .

2. Based on $\omega$ and all information revealed so far, $\mathcal{A}$ then decides if $\mathcal{A}_{t}(G)=\mathcal{A}_{t-1}(G)\cup\{v_{t}\}$ and updates the sets: (i) $L_{t+1}=L_{t}\cup\{v_{t}\}$ if $v_{t}\in L$ or (ii) $R_{t+1}=R_{t}\cup\{v_{t}\}$ if $v_{t}\in R$ .

Per Definition 3.2, the vertex arrival order is random, determined jointly by the algorithm’s internal randomness $\omega$ and the randomness of $G$ (its edges). If $v_{t}\in L$ (resp. $R$ ), then all edges from $v_{t}$ to vertices in $L$ (resp. $R$ ) are absent, so information arises only from edges to the opposite side of bipartition inspected so far. The algorithm may select multiple vertices from the same side in succession, potentially revealing no new information—for example, inspecting only $L$ in the first $n$ rounds and yielding $R_{n}=\varnothing$ .

Our results hold for the most general setting: (i) the algorithmic bound (Theorem 3.4) is independent of the arrival order, and (ii) the hardness result (Theorem 3.5) applies to all online arrival scenarios allowed by Definition 3.2.

Our focus is on online algorithms that return large independent sets with specified probability, formalized as follows.

Definition 3.3.

For parameters $k>0$ and $\delta\in[0,1]$ , an online algorithm $\mathcal{A}$ operating according to Definition 3.2 is said to $(k,\delta)$ -optimize the $\gamma$ -balanced independent set problem in $G_{\mathsf{bip}}(n,p)$ if the following is satisfied when $G\sim G_{\mathsf{bip}}(n,p)$ :

[TABLE]

3.2 Algorithmic Results

Equipped with Definitions 3.2 and 3.3, we now present our algorithmic results. Recall $\alpha_{\rm COMP}=(1-\gamma)\alpha_{\rm STAT}=\log_{b}n/\gamma$ from (2). Our first result shows $(1-\epsilon)\alpha_{\rm COMP}$ is achievable for any $\epsilon>0$ .

Theorem 3.4.

For any $\epsilon>0$ and $p,\gamma\in(0,1)$ , there is an online algorithm $\mathcal{A}$ that $(k,\delta)$ -optimizes the $\gamma$ -balanced independent set problem in $G_{\mathsf{bip}}(n,p)$ , where

[TABLE]

See Section 5 for the proof. As noted earlier, the algorithm achieving $(1-\epsilon)\alpha_{\rm COMP}$ is online, implemented in two stages. In contrast, the classical greedy algorithm on $G(n,p)$ yields a straightforward online algorithm. The two-stage structure ensures that the global $\gamma$ -balancedness constraint is satisfied by the final output, even though the algorithm itself operates using only local information.

We next complement Theorem 3.4 with a sharp lower bound.

Theorem 3.5.

For any $\epsilon>0$ , there exists no online algorithm that $(k,\delta)$ -optimizes the balanced independent set problem in $G_{\mathsf{bip}}(n,p)$ , where

[TABLE]

We prove Theorem 3.5 through a refined version of the OGP framework adapted to the online setting, which leverages geometric properties of tuples of large $\gamma$ -balanced independent sets. See Section 6 for the details.

Taken together, Theorems 3.4 and 3.5 indicate the presence of factor- $1/(1-\gamma)$ statistical-computational gap with respect to online algorithms, with $\alpha_{\rm COMP}$ serving as the computational threshold for this model. As noted earlier, the same factor gap also appears in the context of sparse random bipartite graphs and LDP algorithms [PW24], further supporting the universality of this gap.
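As a quick numerical illustration of the two thresholds and the gap between them, the following sketch evaluates $\alpha_{\rm STAT}=\log_{b}n/(\gamma(1-\gamma))$ and $\alpha_{\rm COMP}=\log_{b}n/\gamma$ for concrete parameters. The function name `thresholds` is hypothetical and introduced only for this example; the formulas are the ones recalled from (1) and (2).

```python
import math


def thresholds(n, p, gamma):
    """Return (alpha_stat, alpha_comp) for G_bip(n, p) and balance gamma.

    Uses b = 1/(1-p), alpha_stat = log_b(n) / (gamma * (1 - gamma)),
    and alpha_comp = (1 - gamma) * alpha_stat = log_b(n) / gamma.
    """
    log_b_n = math.log(n) / math.log(1.0 / (1.0 - p))
    alpha_stat = log_b_n / (gamma * (1.0 - gamma))
    alpha_comp = log_b_n / gamma
    return alpha_stat, alpha_comp


if __name__ == "__main__":
    stat, comp = thresholds(10**6, 0.5, 0.25)
    # The ratio alpha_stat / alpha_comp is the gap factor 1 / (1 - gamma).
    print(stat, comp, stat / comp)
```

For any choice of $n$ and $p$, the ratio of the two returned values equals $1/(1-\gamma)$, which is the gap factor discussed above.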

Remark 3.6.

Observe that while there exists an algorithm succeeding modulo an exponentially small probability below $\alpha_{\rm COMP}$ , even those with success probability $o(1)$ break down above $\alpha_{\rm COMP}$ . This is known as strong hardness, see [HS25]. We highlight that the probability guarantee in Theorem 3.5 is essentially the best possible: the probability that a randomly selected, $\gamma$ -balanced set of size $(1+\epsilon)\alpha_{\rm COMP}$ is an independent set is at most $\exp(-c\log_{b}^{2}n)$ for some constant $c>0$ .

3.3 Surpassing $\alpha_{\rm COMP}$ with Limited Future Queries

Our algorithmic lower bound shows that online algorithms, operating exclusively based on the information available up to round $t$ , cannot surpass $\alpha_{\rm COMP}$ . This provides strong evidence for the conjecture that $\alpha_{\rm COMP}$ is the true computational threshold for this model.

At the same time, prior work [GKW25] made an intriguing observation. For the largest independent set problem in $G(n,\tfrac{1}{2})$ , they showed that granting the algorithm access to a limited amount of additional information allows it to exceed the computational threshold. (Footnote 5: The resulting algorithm, albeit online, requires super-polynomial time.) This naturally raises the following question: for the balanced independent set problem in $G_{\mathrm{bip}}(n,p)$ , can $\alpha_{\rm COMP}$ be surpassed if the algorithm is permitted a limited number of future queries at each round?

In this section, we show that the answer is yes: online algorithms augmented with a small number of future queries can indeed surpass $\alpha_{\rm COMP}$ . To make this precise, we extend the standard definition of an online algorithm (Definition 3.2) to allow future queries.

Definition 3.7.

For $c>0$ , let $\mathcal{C}_{c}$ be the class of online algorithms operating according to Definition 3.2, with the following extension. At each round $t\in[2n]$ , the algorithm may, in addition to observing the edges $(v_{t},v)$ for $v\in L_{t}\cup R_{t}$ , query a (possibly random) set $S_{t}$ of vertex pairs $\{i,j\}$ , which may include vertices not yet revealed, and reveal the status of all edges in $S_{t}$ . The decision is then based on the combined information, where the number of future queries satisfies

[TABLE]

Note that $S_{t}$ may include pairs involving vertices from $\{v_{t+1},\dots,v_{2n}\}$ . For this reason, we refer to the edges in $S_{t}$ as future edges. The total amount of such future information is limited to $O(\log^{2}n)$ .

Our final main result is as follows.

Theorem 3.8.

For any $\epsilon>0$ and $p,\gamma\in(0,1)$ , there exists $c\coloneqq c(\gamma,\epsilon)$ and an online algorithm $\mathcal{A}\in\mathcal{C}_{c}$ that $(k,\delta)$ -optimizes the $\gamma$ -balanced independent set problem in $G_{\mathrm{bip}}(n,p)$ , where

[TABLE]

Our algorithm proceeds in three phases, described informally as follows. The first phase runs for $T=o(n)$ rounds and greedily constructs a $\gamma$ -balanced independent set $I_{T}=R_{T}\cup L_{T}$ with $R_{T}\subset R$ and $L_{T}\subset L$ , of size $c^{\prime}\log_{b}n/\gamma$ for a suitable constant $c^{\prime}\coloneqq c^{\prime}(\gamma,\epsilon)$ . The second phase is an exploration phase, where—using future queries—it identifies sets $W_{L}\subset L$ and $W_{R}\subset R$ not yet inspected, such that there are no edges (i) between $R_{T}$ and $W_{L}$ , and (ii) between $L_{T}$ and $W_{R}$ . In the third phase, it performs a brute-force search to identify a $\gamma$ -balanced independent set of size $(1+\epsilon-c^{\prime})\log_{b}n/\gamma$ inside $G_{\mathrm{bip}}(W_{L}\cup W_{R},p)$ , and augments this to $I_{T}$ .

Our analysis shows that the procedure articulated above can be implemented in an online fashion; see Section 7 for the details.

Importantly, our algorithm is not fully oblivious to the vertex arrival order. In fact, no online algorithm can produce a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ (whp) while remaining completely oblivious to the arrival order. Suppose, for contradiction, that such an algorithm outputs a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ . If the first $n$ arrivals all lie on the same side of the bipartition, then by the online constraint (decisions cannot be deferred), the algorithm must select at least $(1+\epsilon)\log_{b}n$ vertices from that side. This, however, precludes adding vertices from the opposite side, since the expected number of vertices there with no edge to the chosen $(1+\epsilon)\log_{b}n$ vertices is at most $n^{-\epsilon}$ . Thus, some dependence on arrival order is unavoidable. That said, our algorithm’s dependence on arrival order is fairly mild: the only step where order matters is in the first greedy phase (Algorithm 7), designed specifically to avoid such pathological situations.
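The $n^{-\epsilon}$ bound used in this argument follows from a one-line computation, sketched here for convenience. The symbol $k$, denoting the number of vertices already chosen from the first side, is introduced only for this illustration.

```latex
\mathbb{E}\bigl[\#\{v \text{ on the opposite side with no edge to the } k \text{ chosen vertices}\}\bigr]
= n(1-p)^{k}
\;\le\; n\,b^{-(1+\epsilon)\log_{b}n}
= n\cdot n^{-(1+\epsilon)}
= n^{-\epsilon},
\qquad k\ge(1+\epsilon)\log_{b}n .
```

By Markov's inequality, the probability that any such vertex exists is then also at most $n^{-\epsilon}$.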

Number of Future Queries

Our analysis in Section 7 shows that it suffices to take

[TABLE]

where $c^{\prime}$ is an arbitrary constant satisfying

[TABLE]

In the balanced case ( $\gamma=\tfrac{1}{2}$ ), the condition on $c^{\prime}$ boils down to $c^{\prime}<1-\epsilon$ . In this case, one can choose $c(\frac{1}{2},\epsilon)$ such that $c(\frac{1}{2},\epsilon)\to 0$ as $\epsilon\to 0$ , e.g., by taking $c(\frac{1}{2},\epsilon)=6\epsilon$ . By contrast, for $\gamma<\tfrac{1}{2}$ , our analysis suggests that it is not possible to choose $c(\gamma,\epsilon)$ such that $c(\gamma,\epsilon)\to 0$ as $\epsilon\to 0$ . At first glance, this may seem surprising. However, note the following: as $\gamma\to\frac{\epsilon}{1+\epsilon}$ , the target size $(1+\epsilon)\alpha_{\rm COMP}=(1+\epsilon)(1-\gamma)\alpha_{\rm STAT}$ approaches $\alpha_{\rm STAT}$ . In particular, as the statistical-computational gap is “smaller” for smaller values of $\gamma$ , one expects surpassing the gap to be somewhat “harder”. Indeed, our analysis reflects this: interpolating between $\gamma=1/2$ and $\gamma=\frac{\epsilon}{1+\epsilon}$ , we shift from $c(\gamma,\epsilon)=\Theta(\epsilon)$ to $c(\gamma,\epsilon)=\Theta(1/\epsilon)$ . (Footnote 6: We remark that the distinction between $\gamma=1/2$ and $\gamma<1/2$ is prevalent in the analogous setting of $\gamma$ -balanced colorings. In fact, while the problem has been heavily studied for $\gamma=1/2$ [Cha23, FK10, Dha25, DW25], no results are known for $\gamma$ -balanced colorings for $\gamma<1/2$ ; see the discussion in [DW25, Section 1.3] on the challenges involved.)

4 Statistical Threshold: Proof of Theorem 3.1

In this section, we will prove Theorem 3.1. Let the vertex set of $G$ be $L\sqcup R$ with $|L|=|R|=n$ . Recall the definition of the random variable $Z_{\alpha}(\gamma)$ . We define two new variables $Z_{\alpha}^{\eta}(\gamma)$ for $\eta\in\{L,R\}$ denoting the number of $\gamma$ -balanced independent sets $I$ in $G_{\mathrm{bip}}(n,p)$ such that $\gamma$ proportion of $I$ lies in the partition determined by $\eta$ . Note that

[TABLE]

Furthermore, it is easy to see by symmetry that $Z_{\alpha}^{L}(\gamma)\stackrel{{\scriptstyle d}}{{=}}Z_{\alpha}^{R}(\gamma)$ and so it is enough to consider $Z_{\alpha}\coloneqq Z_{\alpha}^{L}(\gamma)$ .

We first prove (S1) by a simple first moment argument. Note the following for any $\alpha$ :

[TABLE]

For $\alpha\geq(1+\epsilon)\alpha_{\rm STAT}$ , we have

[TABLE]

as desired.
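The first moment argument can be checked numerically. The sketch below assumes the usual expression for the expected number of independent pairs $(L^{\prime},R^{\prime})$ with $|L^{\prime}|=a$ and $|R^{\prime}|=c$, namely $\binom{n}{a}\binom{n}{c}(1-p)^{ac}$, and evaluates its logarithm. The function names and the integer sizes `a`, `c` (standing in for $\gamma\alpha$ and $(1-\gamma)\alpha$) are choices made for this illustration.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_first_moment(n, p, a, c):
    """log of C(n, a) * C(n, c) * (1 - p)^(a * c).

    This is the log-expected number of pairs (L', R') with |L'| = a and
    |R'| = c that span no edge, since each of the a * c cross pairs is
    absent independently with probability 1 - p.
    """
    return log_binom(n, a) + log_binom(n, c) + a * c * math.log(1.0 - p)


if __name__ == "__main__":
    n, p = 10**6, 0.5
    # With gamma = 1/2, alpha_stat = 4 * log2(n), roughly 79.7.
    print(log_first_moment(n, p, 20, 20))  # alpha = 40, below the threshold
    print(log_first_moment(n, p, 60, 60))  # alpha = 120, above the threshold
```

Below the threshold the log-expectation is positive (many candidate sets on average), while above it the value is strongly negative, in line with (S1).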

The remainder of this section is dedicated to the proof of (S2). Suppose $\alpha\leq(1-\epsilon)\alpha_{\rm STAT}$ . It is easy to verify from (4), using the inequality $\binom{n}{k}\geq(n/k)^{k}$ , that

[TABLE]

Moreover, as $Z_{\alpha}\geq Z_{\alpha^{\prime}}$ for $\alpha\leq\alpha^{\prime}$ it suffices to prove the claim when $\alpha=\alpha_{\epsilon}\coloneqq(1-\epsilon)\alpha_{\rm STAT}$ . We do so by showing $\mathbb{E}[Z_{\alpha_{\epsilon}}^{2}]/\mathbb{E}[Z_{\alpha_{\epsilon}}]^{2}=1+o(1)$ . The result then follows by the Paley-Zygmund inequality.

For notational convenience, set $Z\coloneqq Z_{\alpha_{\epsilon}}$ . Thus

[TABLE]

where $I_{(L^{\prime},R^{\prime})}$ is the indicator of the event that $(L^{\prime},R^{\prime})$ is a $\gamma$ -balanced independent set. Next,

[TABLE]

In what follows, we use the following parameterization:

[TABLE]

Clearly,

[TABLE]

Fix $(L^{\prime},R^{\prime})$ . The number of tuples $((L^{\prime},R^{\prime}),(L^{\prime\prime},R^{\prime\prime}))$ subject to (6) is

[TABLE]

For any such tuple, the quantity $\mathbb{E}[I_{(L^{\prime},R^{\prime})}I_{(L^{\prime\prime},R^{\prime\prime})}]$ depends solely on $(i_{1},i_{2})$ :

[TABLE]

where the term $(1-p)^{-i_{1}i_{2}}$ accounts for the double counted edges. Combining (5), (8), and (9), we obtain

[TABLE]

Combining this with (4), we arrive at

[TABLE]

Note the following as a result of the bounds in (7):

[TABLE]

and

[TABLE]

Applying these bounds to (10), we have

[TABLE]

where we use the fact that $\gamma=\Theta(1)$ and $b=1/(1-p)$ . To bound $q(i_{1},i_{2})$ , we consider cases.

Case 1: $i_{1}=i_{2}=0$ . Clearly, $q(i_{1},i_{2})=1$ in this case.

Case 2: $i_{1}=0$ and $i_{2}\geq 1$ , or $i_{2}=0$ and $i_{1}\geq 1$ . As $\log\alpha_{\epsilon}=\Theta(\log\log n)$ , it follows that $q(i_{1},i_{2})=\exp\left(-\Omega(\log n)\right)$ in this case.

Case 3: $i_{1},i_{2}\geq 1$ and $i_{1}i_{2}\leq\log_{b}n$ . In this case, we have

[TABLE]

once again.

Case 4: $i_{1},i_{2}\geq 1$ and $i_{1}i_{2}\geq\log_{b}n$ . Observe that we can further modify $q(i_{1},i_{2})$ to get

[TABLE]

where we note that $1/i_{j}$ is well-defined since $i_{j}\geq 1$ for $j\in\{1,2\}$ . Using the bounds on $i_{1}$ and $i_{2}$ from (7), we control the harmonic mean of $i_{1}$ and $i_{2}$ as follows:

[TABLE]

Consequently,

[TABLE]

where we use the fact that $\log\alpha_{\epsilon}=\Theta(\log\log n)$ and $n$ is sufficiently large in terms of $\epsilon$ . With this, (12) is again upper bounded by $\exp\left(-\Omega(\log n)\right)$ .

Combining all of the above cases, (11) becomes:

[TABLE]

as $\gamma=\Theta(1)$ and $\log\alpha_{\epsilon}=\Theta(\log\log n)$ . Using the Paley-Zygmund inequality [AS16],

[TABLE]

completing the proof of (S2).

5 Achievability Result: Proof of Theorem 3.4

Recall the statistical threshold $\alpha_{\rm STAT}$ from Theorem 3.1 and (1). In this section, we give an online algorithm that finds an independent set of size at least $(1-\epsilon)\alpha_{\rm COMP}$ for arbitrary $\epsilon>0$ with high probability, where $\alpha_{\rm COMP}=(1-\gamma)\alpha_{\rm STAT}$ is as defined in (2). This gives a lower bound on the computational threshold.

Our algorithm will proceed in two stages:

1. In Stage One, we greedily find an independent set $I$ satisfying $\max\left\{|I\cap L|,\,|I\cap R|\right\}=(1-\epsilon)\log_{b}n$ .

2. At this point, by relabeling the partitions if necessary, we may assume $|I\cap L|=(1-\epsilon)\log_{b}n$ . During Stage Two, we only add vertices in $R$ to the independent set, stopping once $|I\cap R|=\frac{(1-\gamma)}{\gamma}(1-\epsilon)\log_{b}n$ .

Before we formally describe the algorithms for each stage, we make the following definition: for a given timestep $t$ and vertex $v$ , denote by $N_{t}(v)$ the set of all its neighbors in $\{v_{1},\ldots,v_{t-1}\}$ . Let us now formally describe our greedy algorithm, which constitutes Stage $1$ of our procedure.

Algorithm 1 Stage One

Initialize $I_{0},L_{0},R_{0}=\varnothing$ and $t=1$ .
while $\max\left\{|I_{t-1}\cap L|,\,|I_{t-1}\cap R|\right\}<(1-\epsilon)\log_{b}n$ do
    Sample the random vertex $v_{t}$ .
    if $N_{t}(v_{t})\cap I_{t-1}=\varnothing$ then
        Set $I_{t}=I_{t-1}\cup\{v_{t}\}$ .
    else
        Set $I_{t}=I_{t-1}$ .
    end if
    if $v_{t}\in L$ then
        Set $L_{t}=L_{t-1}\cup\{v_{t}\}$ , $R_{t}=R_{t-1}$ .
    else
        Set $L_{t}=L_{t-1}$ , $R_{t}=R_{t-1}\cup\{v_{t}\}$ .
    end if
    Update $t=t+1$ .
end while

Let $T_{f}$ be the random variable denoting the number of iterations of the while loop of Algorithm 1, i.e.,

[TABLE]

The key result for Stage One is the following lemma, which shows that $T_{f}$ is sublinear whp irrespective of the vertex arrival order.

Lemma 5.1.

$\mathbb{P}\left[T_{f}>n^{1-\epsilon/2}\right]=\exp\left(-\Omega\left(n^{\epsilon/2}\right)\right)$ .

We defer the proof of Lemma 5.1 to §5.1. Let us now formally describe Stage Two of our online algorithm. Note that at time $T_{f}$ , the contribution from one of the partitions to the constructed independent set $I_{T_{f}}$ is $(1-\epsilon)\log_{b}n$ . By relabeling the partitions if necessary, we may assume this partition is $L$ . For the second stage, we employ the following algorithm.

Algorithm 2 Stage Two

for $t=T_{f}+1,\ldots,2n$ do
    Sample the random vertex $v_{t}$ .
    if $v_{t}\in L$ then
        Set $L_{t}=L_{t-1}\cup\{v_{t}\}$ , $R_{t}=R_{t-1}$ .
    else
        Set $L_{t}=L_{t-1}$ , $R_{t}=R_{t-1}\cup\{v_{t}\}$ .
        if $|I_{t-1}\cap R_{t}|<\frac{(1-\gamma)}{\gamma}\,(1-\epsilon)\log_{b}n$ and $N_{t}(v_{t})\cap I_{t-1}=\varnothing$ then
            Set $I_{t}=I_{t-1}\cup\{v_{t}\}$ .
        else
            Set $I_{t}=I_{t-1}$ .
        end if
    end if
end for

Namely, if $v_{t}\in L$ , we do not add it to the independent set. If $v_{t}\in R$ , we add it to the independent set only if (i) $|I_{t-1}\cap R_{t}|$ is less than $(1-\gamma)(1-\epsilon)\alpha_{\rm COMP}$ , and (ii) $v_{t}$ has no neighbors in $I_{t-1}$ . This truncation ensures the $\gamma$ -balancedness constraint.
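The two stages described above can be simulated with a short script. This is a minimal sketch under several assumptions that are not part of the paper: the arrival order is a uniformly random permutation of the $2n$ vertices, edges are sampled lazily the first time a pair is inspected (valid since edges are independent), the size targets are rounded to integers, $\gamma\leq\tfrac{1}{2}$, and $n$ is large enough that the Stage One target is at least one. All function and variable names are hypothetical.

```python
import math
import random


def two_stage_balanced_is(n, p, gamma, eps, seed=0):
    """Simulate the two-stage online procedure on a lazily sampled G_bip(n, p).

    Returns (chosen, revealed, s, r, heavy), where chosen maps each side to
    the selected vertices, revealed records every inspected pair, s and r are
    the integer size targets, and heavy is the side that reached s first.
    """
    rng = random.Random(seed)
    log_b_n = math.log(n) / math.log(1.0 / (1.0 - p))
    s = math.floor((1 - eps) * log_b_n)      # Stage One target (rounded down)
    r = round((1 - gamma) / gamma * s)       # Stage Two target for other side

    # Assumption: vertices arrive in uniformly random order.
    order = [("L", i) for i in range(n)] + [("R", i) for i in range(n)]
    rng.shuffle(order)

    chosen = {"L": [], "R": []}
    revealed = {}  # (L-vertex, R-vertex) -> True if the edge is present

    def has_edge(u, v):
        key = (u, v) if u[0] == "L" else (v, u)
        if key not in revealed:
            revealed[key] = rng.random() < p  # sample the edge on first query
        return revealed[key]

    heavy = None  # side whose count hit s; None while Stage One is running
    for v in order:
        side = v[0]
        other = "R" if side == "L" else "L"
        if heavy is None:
            # Stage One: plain greedy on both sides.
            if not any(has_edge(v, u) for u in chosen[other]):
                chosen[side].append(v)
                if len(chosen[side]) == s:
                    heavy = side
        elif side != heavy and len(chosen[side]) < r:
            # Stage Two: only the lighter side may grow, up to r vertices.
            if not any(has_edge(v, u) for u in chosen[other]):
                chosen[side].append(v)
    return chosen, revealed, s, r, heavy


if __name__ == "__main__":
    chosen, revealed, s, r, heavy = two_stage_balanced_is(5000, 0.5, 0.5, 0.5)
    print(heavy, len(chosen["L"]), len(chosen["R"]), s, r)
```

Every cross pair inside the returned set was inspected when the later of its two vertices arrived, so the output is an independent set by construction; whether the lighter side actually reaches its target `r` is the content of Lemma 5.2.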

The key result for Stage Two is the following lemma.

Lemma 5.2.

$\mathbb{P}\left[|I_{2n}\cap R|<\frac{(1-\gamma)}{\gamma}(1-\epsilon)\log_{b}n\,\bigl{\lvert}\,T_{f}\leq n^{1-\epsilon/2}\right]=\exp\left(-\Omega\left(n^{\epsilon/2}\right)\right)$ .

We defer the proof of Lemma 5.2 to §5.2. Before we prove these key lemmas, let us complete the proof of our achievability result.

Proof of Theorem 3.4.

Consider running our two-stage algorithm with input $G_{\mathrm{bip}}(n,p)$ . Let $I$ be the output and let $T_{f}$ be the number of iterations of Algorithm 1. Using the inequality $\mathbb{P}[A]\leq\mathbb{P}[B]+\mathbb{P}[A|B^{c}]$ , we have

[TABLE]

Note that conditionally on $\{T_{f}\leq n^{1-\epsilon/2}\}$ , the event $|I|<(1-\epsilon)\alpha_{\rm COMP}$ implies the event $|I\cap R|<\frac{1-\gamma}{\gamma}(1-\epsilon)\log_{b}n$ . Here we recall that, by relabeling the partitions if necessary, we may assume that $I\cap L$ has size $(1-\epsilon)\log_{b}n$ at time $T_{f}$ . Therefore, combining Lemmas 5.1 and 5.2 completes the proof. ∎

5.1 Stage One: Proof of Lemma 5.1

Recall Algorithm 1. To prove Lemma 5.1, we will analyze an alternate process, which does not necessarily produce an independent set, but is easier to analyze and can be coupled with our procedure.

Note that by definition, Algorithm 1 terminates at time $t=T_{f}$ . From this time point, we continue the algorithm as follows. Let $\bar{I}_{T_{f}}\coloneqq I_{T_{f}}$ . For each time step $t\geq T_{f}+1$ , we consider i.i.d. random variables $Y_{t}\sim\mathrm{Ber}(n^{-1+\epsilon})$ . We update

[TABLE]

As remarked before, $\bar{I}_{t}$ need not be an independent set for any $t\geq T_{f}+1$ .

Observe that, by definition of $T_{f}$ , for any $t\leq T_{f}$ , the probability with which we add $v_{t}$ to $I_{t}$ is

[TABLE]

where we recall that $b=1/(1-p)$ . Consider an i.i.d. sequence of random variables $Z_{t}\sim\mathrm{Ber}(n^{-1+\epsilon})$ for $t\geq 1$ . The observation in (13) leads us to consider the following simple process:

•

Initialize $\tilde{I}_{0}=\varnothing$ .

•

At each time step $t\geq 1$ , update as

[TABLE]

Consider the processes $(\bar{S}_{t})_{t\geq 1}$ and $(\tilde{S}_{t})_{t\geq 1}$ , where for any $t\geq 1$ , we respectively have $\bar{S}_{t}=|\bar{I}_{t}|$ and $\tilde{S}_{t}=|\tilde{I}_{t}|$ . As a consequence of (13), we note that

[TABLE]

where $\mathcal{F}_{t}$ (resp. $\mathcal{F}^{*}_{t}$ ) is the sigma algebra generated by all the information up to (and including) timestep $t$ for the process $(\bar{S}_{t})_{t\geq 1}$ (resp. $(\tilde{S}_{t})_{t\geq 1}$ ).

Furthermore, as all edges are included independently in $G_{\mathrm{bip}}(n,p)$ , we can couple the processes $(\bar{S}_{t})_{t\geq 1}$ and $(\tilde{S}_{t})_{t\geq 1}$ on the same probability space using (14) as follows:

•

Let $\mathcal{Y}_{0}=0$ . For each $t\geq 1$ , given $\mathcal{F}_{t}$ , consider independent random variables $\mathcal{X}_{t}\sim\mbox{\sf Ber}\left(\dfrac{n^{-1+\epsilon}}{1-q_{t}}\right)$ .

•

For any $t\geq 1$ , given $\mathcal{Y}_{t}$ , define $\mathcal{Y}_{t+1}$ as follows.

[TABLE]

Observe that for any $t\geq 1$ , by construction $\mathcal{Y}_{t}\leq\bar{S}_{t}$ , and furthermore, the variables $\mathcal{X}_{t}$ make sure that marginally, $\mathcal{Y}_{t}\stackrel{{\scriptstyle d}}{{=}}\tilde{S}_{t}$ .

Next, we note that if we define the analogous stopping time

[TABLE]

for the set process $(\bar{I}_{t})_{t\geq 1}$ , then in fact $\bar{T}_{f}=T_{f}$ , since for any $t\leq T_{f}$ the constructions of the processes $(I_{t})_{t\geq 1}$ and $(\bar{I}_{t})_{t\geq 1}$ coincide. In particular, to conclude the lemma, it is enough to show that

[TABLE]

Since the event $\{|\bar{I}_{t}|\geq 2(1-\epsilon)\log_{b}n\}$ implies that either $|\bar{I}_{t}\cap L|\geq(1-\epsilon)\log_{b}n$ or $|\bar{I}_{t}\cap R|\geq(1-\epsilon)\log_{b}n$ , we have

[TABLE]

where we introduce

[TABLE]

Since $\mathcal{Y}_{t}\leq\bar{S}_{t}$ and $\mathcal{Y}_{t}\stackrel{{\scriptstyle d}}{{=}}\tilde{S}_{t}$ , the RHS in (15) is at most

[TABLE]

where analogously

[TABLE]

Recall the process defining $\tilde{S}_{t}=|\tilde{I}_{t}|$ . Observe that irrespective of whether $v_{t}\in L$ or $v_{t}\in R$ , we always have

[TABLE]

where we recall $(Z_{t})_{t\geq 1}$ is an i.i.d. sequence with distribution $\mbox{\sf Ber}(n^{-1+\epsilon})$ . As a consequence, we can bound

[TABLE]

Note that $Z_{1}+\dots+Z_{n^{1-\epsilon/2}}\sim\mbox{\sf Bin}(n^{1-\epsilon/2},n^{-1+\epsilon})$ has mean $n^{\epsilon/2}$ . Furthermore, we have

[TABLE]

for $n$ sufficiently large. A standard Chernoff bound (see, e.g., [AS16]) yields

[TABLE]

completing the proof.
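For reference, one standard form of the Chernoff bound that suffices for this step is the multiplicative lower-tail estimate. The exact variant invoked from [AS16] may differ, and the choice $\eta=1/2$ below is ours, made only to illustrate the order of the resulting bound.

```latex
% Multiplicative Chernoff bound, lower tail:
% for X ~ Bin(N, q) with mean \mu = Nq and any 0 < \eta < 1,
\[
  \mathbb{P}\bigl[X \le (1-\eta)\mu\bigr] \le \exp\!\left(-\frac{\eta^{2}\mu}{2}\right).
\]
% Applied with N = n^{1-\epsilon/2}, q = n^{-1+\epsilon}, \mu = n^{\epsilon/2}, \eta = 1/2:
\[
  \mathbb{P}\Bigl[Z_{1}+\dots+Z_{n^{1-\epsilon/2}} \le \tfrac{1}{2}\,n^{\epsilon/2}\Bigr]
  \le \exp\!\left(-\tfrac{1}{8}\,n^{\epsilon/2}\right).
\]
% Any threshold of order \log_b n lies below n^{\epsilon/2}/2 once n is large,
% so the failure probability is \exp(-\Omega(n^{\epsilon/2})).
```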

5.2 Stage Two: Proof of Lemma 5.2

Let us now analyze Algorithm 5 conditioning on the event $T_{f}\leq n^{1-\epsilon/2}$ . For brevity, we avoid stating this conditioning at every step of the proof. Once again, we will consider an alternate process $\bar{I}$ . Note that during a single iteration of the for loop of Algorithm 5, we do nothing if $v_{t}\in L$ or $|I_{t-1}\cap R|=\frac{(1-\gamma)}{\gamma}(1-\epsilon)\log_{b}n$ . Suppose the latter condition first occurs at timestep $T_{b}$ . From this time point, we continue the algorithm as follows. Let $\bar{I}_{T_{b}-1}\coloneqq I_{T_{b}-1}$ . At each time step $t\geq T_{b}$ , we do the following: if $v_{t}\in L$ , we do nothing; if $v_{t}\in R$ , we add $v_{t}$ to $\bar{I}_{t-1}$ if $N_{t}(v_{t})\cap\bar{I}_{t-1}=\varnothing$ .

Note that as we do nothing for vertices in $L$ and since $|I_{T_{f}}\cap L|=(1-\epsilon)\log_{b}n$ , we have the following for each vertex $v\in R_{f}$ , where $R_{f}\coloneqq R\setminus R_{T_{f}}$ :

[TABLE]

In particular,

[TABLE]

where we use the assumption that $T_{f}\leq n^{1-\epsilon/2}$ . By a simple Chernoff bound (see, e.g., [AS16]), we have

[TABLE]

By the definition of $\bar{I}_{t}$ , we have

[TABLE]

From here, we may conclude that

[TABLE]

where we use the fact that $n^{\epsilon/2}\gg\frac{(1-\gamma)}{\gamma}(1-\epsilon)\log_{b}n$ as $n\to\infty$ .

6 Impossibility Result: Proof of Theorem 3.5

In this section we prove that the class of online algorithms falling under Definition 3.2 fails to produce an independent set of size at least $(1+\epsilon)\alpha_{\rm COMP}$ , for any $\epsilon>0$ , with suitably high probability. This shows that the factor- $1/(1-\gamma)$ statistical-computational gap cannot be bridged by online algorithms. To achieve this result, we study correlated random graph families. Our proof goes by contradiction. Roughly speaking, under the assumption that an online algorithm $\mathcal{A}$ indeed finds an independent set of size at least $(1+\epsilon)\alpha_{\rm COMP}$ , we analyze $\mathcal{A}$ over multiple correlated random graph families to exhibit the existence of a particular structure. The contradiction is then derived by arguing that the probability of such a structure existing must satisfy certain inequalities, which we prove fail to hold.

Note that the online algorithm we consider in Section 5 is randomized through the arrival of the vertices $(v_{t})_{t\geq 1}$ . We first argue that in Theorem 3.5, it is enough to consider algorithms that are deterministic instead. To do so, let us formally denote an algorithm $\mathcal{A}$ as

[TABLE]

where $\Omega$ is some abstract probability space capturing the randomness coming from the algorithm $\mathcal{A}$ , $\{0,1\}^{n^{2}}$ is the vector indicating which edges are present in the random graph, and the output vector indicates which vertices are present in the constructed independent set by the algorithm $\mathcal{A}$ . With a slight abuse of notation, for a bipartite graph $G$ on vertex bipartitions $L$ and $R$ , we let $G$ denote both the graph itself as well as the vector $E\in\{0,1\}^{n^{2}}$ indicating which edges are present.

Lemma 6.1 (Reduction to deterministic algorithms).

There exists $\omega^{*}\in\Omega$ such that the deterministic algorithm $\mathcal{A}:\{0,1\}^{n^{2}}\to\{0,1\}^{2n}$ defined as $\mathcal{A}(\cdot)=\mathcal{A}(\cdot,\omega^{*})$ satisfies

[TABLE]

Proof.

Since

[TABLE]

and since by definition

[TABLE]

we note that the claimed inequality must hold for some $\omega^{*}\in\Omega$ . ∎
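The averaging step can be spelled out as follows. Here $f$ is a placeholder we introduce only for this illustration, standing for the success probability of $\mathcal{A}(\cdot,\omega)$ over the random graph, and we assume the internal randomness $\omega$ is independent of $G$.

```latex
\[
  f(\omega) \coloneqq \mathbb{P}_{G}\bigl[\mathcal{A}(G,\omega)\ \text{succeeds}\bigr],
  \qquad
  \mathbb{P}_{G,\omega}\bigl[\mathcal{A}(G,\omega)\ \text{succeeds}\bigr]
  = \mathbb{E}_{\omega}\bigl[f(\omega)\bigr].
\]
% If f(\omega) < \mathbb{E}_{\omega}[f] held for every \omega, taking expectations
% would give \mathbb{E}_{\omega}[f] < \mathbb{E}_{\omega}[f], a contradiction. Hence
\[
  \exists\,\omega^{*}\in\Omega:\quad f(\omega^{*}) \ge \mathbb{E}_{\omega}\bigl[f(\omega)\bigr].
\]
```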

Recalling Definition 3.3, we quickly note that as a consequence of Lemma 6.1, to prove Theorem 3.5, it is enough to show that no deterministic online algorithm $(k,\delta)$ -optimizes the balanced independent set problem in $G_{\mathrm{bip}}(n,p)$ for the specified values of $k$ and $\delta$ . Therefore, for the rest of this section, we only consider deterministic online algorithms.

6.1 Correlated Random Graph Families

Consider a random graph $G\sim G_{\mathrm{bip}}(n,p)$ . We will use $G$ as a base graph to define our correlated graph families.

Consider any online algorithm $\mathcal{A}$, and run it on the base graph $G$. Let $E_{\mathcal{A}}(T)$ be the set of all edges of $G$ queried by the algorithm $\mathcal{A}$ up to and including timestep $T$ for $1\leq T\leq 2n$, and let $V_{\mathcal{A}}(T)$ be the set of vertices exposed by $\mathcal{A}$ in the first $T$ steps of the algorithm.

For any $m\geq 1$ , we define a sequence of random graphs

[TABLE]

as follows:

•

The graph $G_{1}^{(T)}$ is a copy of the base graph $G$.

•

For any $2\leq i\leq m$ and any $e\in E_{\mathcal{A}}(T)$ , the status of the edge $e$ in $G^{(T)}_{i}$ is exactly the same as it is in $G$ , i.e., it is present in all the graphs $\{G^{(T)}_{i}\,:\,2\leq i\leq m\}$ if and only if it is present in $G$ , and absent in all the graphs $\{G^{(T)}_{i}\,:\,2\leq i\leq m\}$ otherwise.

•

For any $2\leq i\leq m$ and for any edge $e\not\subseteq V_{\mathcal{A}}(T)$ , the status of the edge $e$ is independently decided for each graph $G^{(T)}_{i}$ . In other words, for any $e\not\subseteq V_{\mathcal{A}}(T)$ ,

[TABLE]

where the collection $\{\chi^{(T,i)}_{e}\,:\,e\not\subseteq V_{\mathcal{A}}(T),\,1\leq T\leq 2n,\,2\leq i\leq m\}$ is a collection of i.i.d. $\mbox{\sf Ber}(p)$ random variables.

We immediately observe the following:

Remark 6.2.

•

If we consider the entire graph array $(G^{(T)}_{i})_{1\leq T\leq 2n,1\leq i\leq m}$ , then each entry of both the first column $(G^{(T)}_{1})_{1\leq T\leq 2n}$ and the last row $(G^{(2n)}_{i})_{1\leq i\leq m}$ is the same as $G$ .

•

For any $1\leq T\leq 2n$ and $1\leq i\leq m$ , marginally the distribution of $G^{(T)}_{i}$ is the same as $G$ .

•

Since $\mathcal{A}$ is deterministic, by construction, the behavior of the first $T$ steps of the algorithm $\mathcal{A}$ is exactly the same on all the graphs $G^{(T)}_{1},\ldots,G^{(T)}_{m}$ .
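The construction of the family $G^{(T)}_{1},\ldots,G^{(T)}_{m}$ can be sketched in code. This is a minimal illustration, not the paper's formal definition: it assumes the shared edges are exactly the pairs with both endpoints among the exposed vertices, and the names `sample_bipartite`, `correlated_family`, `exposed_left`, and `exposed_right` are hypothetical.

```python
import random

def sample_bipartite(n, p, rng):
    """Edge indicators of G_bip(n, p): keys (u, v), u in L = range(n), v in R = range(n)."""
    return {(u, v): rng.random() < p for u in range(n) for v in range(n)}

def correlated_family(base, exposed_left, exposed_right, m, p, rng):
    """Return [G_1, ..., G_m] given the base graph and the vertices exposed by step T.

    Assumption of this sketch: an edge is shared across all graphs iff both of its
    endpoints are exposed; every other pair is resampled independently for i >= 2.
    """
    family = [dict(base)]  # G_1 is a copy of the base graph
    for _ in range(m - 1):
        g = {}
        for (u, v), present in base.items():
            if u in exposed_left and v in exposed_right:
                g[(u, v)] = present           # revealed edge: keep the base status
            else:
                g[(u, v)] = rng.random() < p  # unrevealed edge: fresh Ber(p) coin
        family.append(g)
    return family
```

Because every unrevealed pair receives a fresh $\mbox{\sf Ber}(p)$ coin, each graph in the family is marginally distributed as the base graph, while a deterministic algorithm that has only seen the shared edges cannot distinguish the copies during its first $T$ steps.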

6.2 Forbidden tuples of independent sets

Consider an online algorithm $\mathcal{A}$ . We wish to bound the probability that $\mathcal{A}$ produces an independent set of size at least $(1+\epsilon)\alpha_{\rm COMP}$ . Fix $\mu=\epsilon^{2}/2$ and define the stopping time

[TABLE]

We analyze the algorithm $\mathcal{A}$ on the graphs $G^{(\tau)}_{1},\ldots,G^{(\tau)}_{m}$. Define $\zeta=L$ and $\zeta^{\prime}=R$ if $|I_{\tau}\cap L|>|I_{\tau}\cap R|$, and $\zeta=R$ and $\zeta^{\prime}=L$ otherwise. Note that $\zeta\in\{L,R\}$ is random and depends on the edges revealed by the algorithm in the first $\tau$ steps. We define the successful event

[TABLE]

where $\mathcal{E}_{i,T}\coloneqq\left\{|\mathcal{A}(G_{i}^{(T)})|\geq(1+\epsilon)\alpha_{\rm COMP}\right\}$. The key results en route to the proof of Theorem 3.5 provide lower and upper bounds on $\mathbb{P}[\mathcal{S}]$. We state these results here and prove them in Sections 6.2.1 and 6.2.2, respectively.

Proposition 6.3.

Let $\mathcal{E}$ denote the event that $|\mathcal{A}(G)|\geq(1+\epsilon)\alpha_{\rm COMP}$ . Then $\mathbb{P}[\mathcal{S}]\geq\mathbb{P}[\mathcal{E}]^{m}$ .

Proposition 6.4.

Let $m=C\epsilon^{-2}$ for some sufficiently large constant $C>0$ . Then,

[TABLE]

Let us now combine the above propositions to prove our impossibility result.

Proof of Theorem 3.5.

Suppose there exists an online algorithm $\mathcal{A}$ that $((1+\epsilon)\alpha_{\rm COMP},\,\delta)$ -optimizes the $\gamma$ -balanced independent set problem in $G_{\mathrm{bip}}(n,p)$ . Fix $m=C\epsilon^{-2}$ for $C$ sufficiently large. By Propositions 6.3 and 6.4, we must have

[TABLE]

as desired. ∎
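The way the two propositions combine can be written out as a short chain. This is a sketch under an assumption: we take the upper bound of Proposition 6.4 to be $\exp(-\Omega(\log_{b}^{2}n))$, which is the bound reached at the end of its proof, and we let the implicit constants depend on $C$.

```latex
\[
  \mathbb{P}[\mathcal{E}]^{m}
  \;\le\; \mathbb{P}[\mathcal{S}]
  \;\le\; \exp\!\left(-\Omega(\log_{b}^{2} n)\right)
  \quad\Longrightarrow\quad
  \mathbb{P}[\mathcal{E}]
  \;\le\; \exp\!\left(-\Omega\!\left(\tfrac{1}{m}\log_{b}^{2} n\right)\right)
  \;=\; \exp\!\left(-\Omega(\epsilon^{2}\log_{b}^{2} n)\right),
\]
% using m = C\epsilon^{-2}, which is a constant independent of n.
```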

6.2.1 Lower Bound: Proof of Proposition 6.3

In this section, we will prove Proposition 6.3 by a careful application of Jensen’s inequality and conditional independence. Recall the definition of $E_{\mathcal{A}}(T)$ . We have

[TABLE]

where we use the fact that $\{\tau=T\}$ is determined by $E_{\mathcal{A}}(T)$ as $\mathcal{A}$ is deterministic. Note that by the observations made in Remark 6.2, the algorithm $\mathcal{A}$ is identical for the first $T$ steps on the graphs $G_{1}^{(T)},\ldots,G_{m}^{(T)}$ . With this in hand, we have

[TABLE]

where the last step follows since the random variables $\chi^{(T,i)}_{e}$ are i.i.d. Plugging this in above, we have

[TABLE]

Observing that $\sum_{T}{\mathbb{I}\left\{\tau=T\right\}}=1$ and ${\mathbb{I}\left\{\tau=i\right\}}{\mathbb{I}\left\{\tau=j\right\}}=0$ for $i\neq j$ , we can further simplify the above to

[TABLE]

where we use Jensen’s inequality in the final step. Once again, since $\sum_{T}{\mathbb{I}\left\{\tau=T\right\}}=1$ , we have

[TABLE]

Plugging this into (18) completes the proof.
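The Jensen step used above is the inequality $\mathbb{E}[X^{m}]\geq(\mathbb{E}[X])^{m}$ for a random variable $X\in[0,1]$ and $m\geq 1$, where $X$ plays the role of the conditional success probability given the shared edges. The following sketch checks it exactly on a finite distribution; the function name `jensen_gap` and the example values are ours, chosen only for illustration.

```python
from fractions import Fraction

def jensen_gap(values, weights, m):
    """Return (E[X^m], (E[X])^m) for a finite distribution on [0, 1].

    values  : support points of X
    weights : non-negative weights (normalised internally)
    m       : integer exponent, m >= 1
    """
    total = Fraction(sum(weights))
    mean = sum(Fraction(w) * Fraction(x) for x, w in zip(values, weights)) / total
    mean_pow = sum(Fraction(w) * Fraction(x) ** m for x, w in zip(values, weights)) / total
    return mean_pow, mean ** m
```

Exact rational arithmetic avoids floating-point noise, so the comparison of the two sides is unambiguous; equality holds when $X$ is constant.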

6.2.2 Upper Bound: Proof of Proposition 6.4

In this section, we will prove Proposition 6.4. Recall the definitions of the successful event $\mathcal{S}$ from (17), the array $(G_{i}^{(T)})_{1\leq T\leq 2n,1\leq i\leq m}$ from Section 6.1, the stopping time $\tau$ from (16), and the random parts $\zeta,\zeta^{\prime}$ of the vertex bipartition. Note that on the event $\mathcal{S}$, for any $1\leq i\leq m$, $|I_{i}|\geq(1+\epsilon)\alpha_{\rm COMP}$, where $I_{i}=\mathcal{A}(G^{(\tau)}_{i})$. Recall also that for any $1\leq T\leq 2n$, $E_{\mathcal{A}}(T)$ is the set of edges queried by the algorithm $\mathcal{A}$ in the first $T$ steps, and $V_{\mathcal{A}}(T)$ denotes the set of vertices exposed in the first $T$ steps.

Observation 6.5.

We note that on the successful event $\mathcal{S}$ , the following statements hold:

•

By the construction of $(G^{(T)}_{i})_{1\leq T\leq 2n,1\leq i\leq m}$ , the sets $I_{i}\cap V_{\mathcal{A}}(\tau)$ are identical for $1\leq i\leq m$ .

•

$I_{i}\cap V_{\mathcal{A}}(\tau)\cap\zeta$ has size $(1-\mu)\log_{b}n$ for all $1\leq i\leq m$.

•

$I_{i}\cap V_{\mathcal{A}}(\tau)\cap\zeta^{\prime}$ has the same size for all $1\leq i\leq m$.

It is convenient at this point to introduce the following set:

[TABLE]

Next, let us define certain sets which will be convenient for us to decompose the successful event $\mathcal{S}$ . Recall $\alpha_{\rm COMP}=\frac{\log_{b}n}{\gamma}$ and $\mu=\epsilon^{2}/2$ . Additionally, for any set $X$ , we denote its power set by $\mathcal{P}(X)$ .

Definition 6.6 (Forbidden $m$ -tuples).

For any $\epsilon>0$, $m\geq 1$, $\mathbf{a}=(a_{1},\ldots,a_{m})\in A$, $\eta\in\{L,R\}$, and $1\leq T\leq 2n$, we define the set $\chi^{(\eta)}_{m,T}(\mathbf{a})\subset\mathcal{P}(L\cup R)^{m}$ to be the set of all $m$-tuples $(I_{1},\ldots,I_{m})$, where each $I_{i}\subseteq L\cup R$ has the following properties:

•

For each $1\leq i\leq m$ , $I_{i}$ is a $\gamma$ -balanced independent set in $G_{i}^{(T)}$ .

•

$|I_{i}|=a_{i}$ for each $1\leq i\leq m$.

•

The sets $I_{i}\cap V_{\mathcal{A}}(T)$ are identical for $1\leq i\leq m$ .

•

$|I_{i}\cap V_{\mathcal{A}}(T)\cap\eta|=(1-\mu)\log_{b}n$ and $|(I_{i}\cap V_{\mathcal{A}}(T))\setminus\eta|<(1-\mu)\log_{b}n$ for all $1\leq i\leq m$.

Let us further define $X^{(\eta)}_{m,T}(\mathbf{a})$ to be the size of the set $\chi^{(\eta)}_{m,T}(\mathbf{a})$.

Recalling Observation 6.5, and keeping Definition 6.6 in mind, we conclude that

[TABLE]

Thus to conclude Proposition 6.4, using (19) and a union bound, it is enough to obtain an upper bound on the sum

[TABLE]

To this end, we introduce a new random variable. For a fixed $\beta\in\mathbb{N}$ , $\mathbf{a}\in A$ and $\eta\in\{L,R\}$ , let $\chi^{(\eta)}_{m,T}(\mathbf{a},\beta)$ be the set of $m$ -tuples $(I_{1},\ldots,I_{m})$ , where for each $1\leq i\leq m$ , $I_{i}$ is a $\gamma$ -balanced independent set in $G^{(T)}_{i}$ with the following properties:

•

$|I_{i}|=a_{i}$ for each $1\leq i\leq m$ .

•

The sets $I_{i}\cap V_{\mathcal{A}}(T)$ are identical for $1\leq i\leq m$ .

•

$|I_{i}\cap V_{\mathcal{A}}(T)\cap\eta|=(1-\mu)\log_{b}n$ and $|(I_{i}\cap V_{\mathcal{A}}(T))\setminus\eta|=\beta$ for all $1\leq i\leq m$ .

We further upper bound (20) by

[TABLE]

where $X^{(\eta)}_{m,T}(\mathbf{a},\beta)=|\chi^{(\eta)}_{m,T}(\mathbf{a},\beta)|$ . The following key lemma provides a bound on $\mathbb{P}[X^{(\eta)}_{m,T}(\mathbf{a},\beta)\geq 1]$ .

Lemma 6.7.

For any fixed $\mathbf{a}\in A$ , $0\leq\beta\leq(1-\mu)\log_{b}n$ , $\eta\in\{L,R\}$ , and $1\leq T\leq 2n$ , we have

[TABLE]

Before we prove the above lemma, we note that it implies our desired result. Indeed, we have

[TABLE]

It follows that (20) is at most $\exp\left(-\Omega(\log^{2}_{b}n)\right)$ . Recalling (19), this completes the proof of Proposition 6.4.

Proof of Lemma 6.7.

Consider an arbitrary collection of $\gamma$-balanced sets $(I_{1},\ldots,I_{m})\in\chi^{(\eta)}_{m,T}(\mathbf{a},\beta)$. Recall that by the definition of $\chi^{(\eta)}_{m,T}(\mathbf{a},\beta)$, the set $I_{i}\cap V_{\mathcal{A}}(T)$ is the same for every $i$. With this in mind, we define the following sets:

[TABLE]

and for each $I\in\mathcal{I}$ and $1\leq i\leq m$

[TABLE]

With these definitions in hand and recalling Remark 6.2 and Observation 6.5, we have

[TABLE]

where

[TABLE]

Consider a fixed $I_{i}\in\mathcal{I}_{i}(I)$. Let $\theta_{i}=\gamma\,a_{i}$ and $\theta_{i}^{\prime}=(1-\gamma)a_{i}$. We have two cases to consider, depending on which side $\zeta\in\{L,R\}$ contains a $\gamma$ proportion of the balanced independent set $I\cup I_{i}$.

Case 1:

$\zeta=\eta$ . Then, we have

[TABLE]

Case 2:

$\zeta\neq\eta$ . Then, we have

[TABLE]

Combining both cases, by a simple counting argument we have that $\sum_{I_{i}\in\mathcal{I}_{i}(I)}p(I,I_{i})$ is at most

[TABLE]

Plugging the above into (21), we obtain

[TABLE]

Recall that $b=1/(1-p)$ . The above can be simplified to

[TABLE]

Note that since $\gamma\leq 1/2$ , we have

[TABLE]

With this in hand, we may simplify further to obtain

[TABLE]

where we plug in $\mu=\epsilon^{2}/2$ . Recall that $m=C\epsilon^{-2}$ . In particular, for $C$ sufficiently large, the above is $\exp\left(-\Omega(\log_{b}^{2}n)\right)$ , as desired. ∎

7 Future Queries: Proof of Theorem 3.8

Fix an arbitrary constant

[TABLE]

where $\epsilon<\frac{\gamma}{1-\gamma}$ . Indeed, otherwise there exists no $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ per Theorem 3.1. Note that as $(1+\epsilon)(1-\gamma)\geq 1-\gamma$ and $\gamma\leq\frac{1}{2}$ , we have

[TABLE]

In what follows, we prove the existence of such an online algorithm with

[TABLE]

As noted, our algorithm proceeds in three phases.

Phase I: Greedy Rounds

In phase I, we find a $\gamma$ -balanced independent set of size $c^{\prime}\log_{b}n/\gamma$ for $c^{\prime}$ arising in (22). Proceeding analogously to the proof of Theorem 3.4, we first run Algorithm 5 until time $T_{1}$ where

[TABLE]

Plugging $1-c^{\prime}$ in place of $\epsilon$ in Lemma 5.1, we deduce

[TABLE]

The bound (23) ensures that (24) is not vacuous. Note that at time $T_{1}$, one side of the independent set $I_{T_{1}}$, that is, $I_{T_{1}}\cap L$ or $I_{T_{1}}\cap R$, has size $c^{\prime}\log_{b}n$. By relabeling the sides if necessary, we again assume this side is $L$. For the second stage, we employ the following modification of Algorithm 5. Set $T_{2}=n^{c^{\prime}+\delta}$, where $0<\delta<1-c^{\prime}$ is arbitrary.

Algorithm 3 Phase I: Part II

for $t=T_{1}+1,\ldots,T_{1}+T_{2}$ do
    Sample a random vertex $v_{t}\in R\setminus R_{t-1}$.
    $L_{t}=L_{t-1}$, $R_{t}=R_{t-1}\cup\{v_{t}\}$.
    if $|I_{t-1}\cap R|<c^{\prime}\log_{b}n$ and $N_{t}(v_{t})\cap I_{t-1}=\varnothing$ then
        Set $I_{t}=I_{t-1}\cup\{v_{t}\}$.
    else
        $I_{t}=I_{t-1}$.
    end if
end for
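The loop of Algorithm 3 can be sketched in Python as follows. This is an illustration rather than the authors' implementation: the oracle `no_edge`, the tagging of vertices as `('L', i)` or `('R', j)`, and the parameter names `cap` and `rounds` are hypothetical conventions of this sketch, with `cap` standing for the bound on $|I_{t-1}\cap R|$ and `rounds` for $T_{2}$.

```python
import random

def phase_one_part_two(no_edge, independent_set, right_unexposed, cap, rounds, rng):
    """Sketch of the loop in Algorithm 3 (Phase I, Part II).

    no_edge(v, I)   : hypothetical oracle, True iff v has no neighbour in I
    independent_set : current set I, elements tagged ('L', i) or ('R', j)
    right_unexposed : unexposed right vertices (R minus R_{t-1})
    cap             : upper bound on the number of right vertices in I
    rounds          : number of rounds T_2
    """
    I = set(independent_set)
    pool = list(right_unexposed)
    rng.shuffle(pool)  # uniformly random arrival order without repetition
    right_count = sum(1 for side, _ in I if side == 'R')
    for v in pool[:rounds]:  # each v_t is exposed exactly once
        if right_count < cap and no_edge(v, I):
            I.add(v)          # v_t joins the independent set
            right_count += 1
        # otherwise I_t = I_{t-1}
    return I
```

Since the graph is bipartite, right vertices never conflict with each other, so the check against `I` only ever depends on the left vertices already in the set.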

Lemma 7.1.

$\displaystyle\mathbb{P}\left[\left|I_{T_{1}+T_{2}}\cap R\right|=c^{\prime}\log_{b}n\right]=1-\exp\left(-\Omega(n^{\delta})\right)$ .

Proof.

For $\{v_{T_{1}+1},\dots,v_{T_{1}+T_{2}}\}\subset R$ , let $N$ denote the number of $i$ such that $T_{1}+1\leq i\leq T_{1}+T_{2}$ for which there are no edges between $v_{i}$ and $I_{T_{1}}$ . We have

[TABLE]

as $I_{i}$ are i.i.d. Bernoulli random variables with parameter $n^{-c^{\prime}}$. Clearly, it suffices to show that

[TABLE]

as we only add vertices as long as $|I_{t-1}\cap R|<c^{\prime}\log_{b}n$ . As $\mathbb{E}[N]=n^{\delta}$ , we have

[TABLE]

by a standard Chernoff bound (see, e.g., [AS16]), completing the proof. ∎
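To get a feel for the tail estimate in this proof, one can compare the exact lower tail of a binomial with a multiplicative Chernoff bound. The sketch below is illustrative only: the function names and the specific Chernoff form $\exp(-\eta^{2}\mu/2)$ are our choices, and the variant used in [AS16] may differ.

```python
import math

def binom_lower_tail(N, q, k):
    """Exact P[Bin(N, q) < k], summing terms computed in log-space."""
    total = 0.0
    for j in range(k):
        log_term = (math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
                    + j * math.log(q) + (N - j) * math.log1p(-q))
        total += math.exp(log_term)
    return total

def chernoff_lower_tail(mu, k):
    """Bound exp(-eta^2 mu / 2) on P[X <= k], where (1 - eta) mu = k and 0 <= k < mu."""
    if not 0 <= k < mu:
        raise ValueError("need 0 <= k < mu")
    eta = 1.0 - k / mu
    return math.exp(-eta * eta * mu / 2.0)
```

In the setting of the lemma one would take `N` to be $T_{2}$, `q` to be $n^{-c^{\prime}}$, `mu` to be $n^{\delta}$, and `k` to be the target number of right vertices, which grows only logarithmically in $n$.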

Combining (24) and Lemma 7.1 through a union bound, we have the following with probability $1-\exp\left(-\Omega(n^{\min\{\delta,(1-c^{\prime})/2\}})\right)$ for $T=T_{1}+T_{2}$ : (i) $T=o(n)$ , (ii) $I_{T}=I^{(L)}_{T}\cup I^{(R)}_{T}$ is a $\gamma$ -balanced independent set of size $c^{\prime}\log_{b}n/\gamma$ where

[TABLE]

Namely, a $\gamma$ -balanced independent set of size $c^{\prime}\log_{b}n/\gamma$ is found in $o(n)$ time with high probability. This concludes Phase I of our algorithm. Note that for $t\leq T=T_{1}+T_{2}$ , $S_{t}=\varnothing$ —i.e., no future queries have been made.

In what follows, we condition on the outcome of the greedy rounds.

Phase II: Exploration Phase

We condition on the outcome of Phase I and fix a $\theta\in(0,1)$ with

[TABLE]

Observe that $\theta$ is well defined if

[TABLE]

which is equivalent to (22). At round $T+1$ , we do the following:

•

Sample a random vertex $v_{T+1}\in[2n]\setminus(L_{T}\cup R_{T})$ . If $v_{T+1}\in R$ , update $R_{T+1}=R_{T}\cup\{v_{T+1}\}$ and $L_{T+1}=L_{T}$ ; otherwise update $R_{T+1}=R_{T}$ and $L_{T+1}=L_{T}\cup\{v_{T+1}\}$ .

•

Query $S_{T+1}$ involving vertices $\{i,j\}$ with $i\in I^{(R)}_{T}$ and $j\in L\setminus L_{T}$ and reveal the edge status of all pairs $(i,j)$ .

•

Using the edges in the step above, identify a $\bar{W}_{L}\subset L\setminus L_{T}$ of size $|\bar{W}_{L}|=n^{\theta}$ for which $E[\bar{W}_{L},I_{T}^{(R)}]=\varnothing$ . Lemma 7.2 below justifies the existence of $\bar{W}_{L}$ whp.

At the end of round $T+1$ , we set $I_{T+1}\coloneqq I_{T}$ . Similarly, at round $T+2$ , we do the following:

•

Sample a random vertex $v_{T+2}\in[2n]\setminus(L_{T+1}\cup R_{T+1})$ . If $v_{T+2}\in R$ , update $R_{T+2}=R_{T+1}\cup\{v_{T+2}\}$ and $L_{T+2}=L_{T+1}$ ; otherwise update $R_{T+2}=R_{T+1}$ and $L_{T+2}=L_{T+1}\cup\{v_{T+2}\}$ .

•

Query $S_{T+2}$ involving vertices $\{i,j\}$ with $i\in I^{(L)}_{T}$ and $j\in R\setminus R_{T+1}$ and reveal the edge status of all pairs $(i,j)$ .

•

Using the edges in the step above, identify a $\bar{W}_{R}\subset R\setminus R_{T+1}$ of size $|\bar{W}_{R}|=n^{\theta}$ for which $E[I_{T}^{(L)},\bar{W}_{R}]=\varnothing$ . Lemma 7.2 justifies the existence of $\bar{W}_{R}$ whp.

We then set $I_{T+2}\coloneqq I_{T+1}$ .

Lemma 7.2.

For $\theta$ satisfying (25), such sets $\bar{W}_{L}$ and $\bar{W}_{R}$ indeed exist with probability at least $1-\exp\left(-n^{\Theta(1)}\right)$ .

Proof.

Let

[TABLE]

Note that for $v\in L\setminus L_{T}$

[TABLE]

Since we deterministically have $|L_{T}|,|R_{T}|\leq T$ , and $T=o(n)$ by conditioning, $N_{L}$ stochastically dominates ${\rm Bin}(n-T,n^{-c^{\prime}(1-\gamma)/\gamma})$ , which has mean $n^{1-c^{\prime}(1-\gamma)/\gamma}(1+o(1))$ (see, e.g., [Roc24, Chapter 4]). Proceeding identically to the proof of Lemma 7.1,

[TABLE]

as $\theta$ satisfies (25). Similarly, define

[TABLE]

Clearly, for $v\in R\setminus R_{T+1}$

[TABLE]

As $|R\setminus R_{T+1}|=|R|-|R_{T+1}|\geq n-T-1$, the random variable $N_{R}$ stochastically dominates ${\rm Bin}(n-T-1,n^{-c^{\prime}})$, which has mean $n^{1-c^{\prime}}(1+o(1))$. Proceeding identically as above, we obtain that

[TABLE]

Noting that such $\bar{W}_{L}$ and $\bar{W}_{R}$ with sizes $n^{\theta}$ exist if and only if $N_{L},N_{R}\geq n^{\theta}$ , we conclude the proof by combining (26) and (27) through a union bound. ∎
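The selection of $\bar{W}_{L}$ and $\bar{W}_{R}$ amounts to filtering unexposed vertices by the queried edges. A minimal sketch follows, with hypothetical names (`free_vertices`, `pick_exploration_set`, `expected_free`, and the edge oracle `has_edge`) that are not taken from the paper.

```python
def free_vertices(candidates, anchor, has_edge):
    """Vertices in `candidates` with no edge to any vertex of `anchor`."""
    return [v for v in candidates if not any(has_edge(v, u) for u in anchor)]

def pick_exploration_set(candidates, anchor, has_edge, size):
    """Return `size` free vertices, or None if fewer than `size` exist."""
    free = free_vertices(candidates, anchor, has_edge)
    return free[:size] if len(free) >= size else None

def expected_free(num_candidates, p, anchor_size):
    """Mean number of free vertices: each candidate is free w.p. (1 - p)^anchor_size."""
    return num_candidates * (1.0 - p) ** anchor_size
```

With $b=1/(1-p)$, an anchor of size $s\log_{b}n$ leaves each candidate free with probability $(1-p)^{s\log_{b}n}=n^{-s}$, which is where the binomial parameters $n^{-c^{\prime}(1-\gamma)/\gamma}$ and $n^{-c^{\prime}}$ in the proof come from.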

Phase III: Brute-Force Step

At round $t=T+3$ , we:

•

Sample a vertex $v_{T+3}\in[2n]\setminus(L_{T+2}\cup R_{T+2})$ . If $v_{T+3}\in R$ , update $R_{T+3}=R_{T+2}\cup\{v_{T+3}\}$ and $L_{T+3}=L_{T+2}$ ; otherwise update $R_{T+3}=R_{T+2}$ and $L_{T+3}=L_{T+2}\cup\{v_{T+3}\}$ .

•

(Pre-Processing) Set $W_{L}\coloneqq\bar{W}_{L}\setminus\{v_{T+1},v_{T+2},v_{T+3}\}$ and $W_{R}\coloneqq\bar{W}_{R}\setminus\{v_{T+1},v_{T+2},v_{T+3}\}$. Note that $\bigl||W_{L}|-|W_{R}|\bigr|\leq 3$. Removing at most three vertices from the larger set if necessary, we may assume that $|W_{L}|=|W_{R}|=n^{\theta}+O(1)$.

•

(Brute-Force Search) Query $S_{T+3}$ involving vertices $\{i,j\}$ where $i\in W_{L}$ and $j\in W_{R}$ and reveal the edge status of all pairs $(i,j)$. Using this information, identify a $\gamma$-balanced independent set $\mathcal{J}=J_{L}\cup J_{R}$ in $G[W_{L}\cup W_{R}]$ (which is distributed as $G_{\mathrm{bip}}(n^{\theta}+O(1),p)$) of size $(1+\epsilon-c^{\prime})\log_{b}n/\gamma$, where

[TABLE]

Lemma 7.3 justifies the existence of $\mathcal{J}$ whp. We note that this step has quasi-polynomial running time $n^{O(\log n)}$ as mentioned in Section 3.3.

Set $I_{T+3}\coloneqq I_{T}$ .
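The brute-force search in Phase III can be sketched as an exhaustive enumeration over subsets of one side. This is one straightforward way to realize the step, not necessarily the authors' procedure; the function name `brute_force_balanced`, the edge oracle `has_edge`, and the parameters `size_left` and `size_right` (standing for $|J_{L}|$ and $|J_{R}|$) are hypothetical.

```python
from itertools import combinations

def brute_force_balanced(left, right, has_edge, size_left, size_right):
    """Exhaustively search for an independent set with `size_left` vertices in
    `left` and `size_right` vertices in `right`.

    has_edge(u, v) : True iff left vertex u and right vertex v are adjacent
    Returns (J_L, J_R) as sets, or None if no such independent set exists.
    """
    right = list(right)
    for J_L in combinations(left, size_left):
        # Right vertices that have no neighbour in the candidate left part.
        free = [v for v in right if not any(has_edge(u, v) for u in J_L)]
        if len(free) >= size_right:
            return set(J_L), set(free[:size_right])
    return None
```

The enumeration visits at most $\binom{|W_{L}|}{|J_{L}|}$ subsets, which is $n^{O(\log n)}$ when $|J_{L}|=O(\log n)$, in line with the quasi-polynomial running time noted above.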

Lemma 7.3.

The bipartite random graph $G[W_{L}\cup W_{R}]\sim G_{\mathrm{bip}}(n^{\theta}+O(1),p)$ with vertex set $W_{L}\cup W_{R}$ contains a $\gamma$ -balanced independent set of size $(1+\epsilon-c^{\prime})\log_{b}n/\gamma$ .

Proof.

Applying Theorem 3.1 to $G_{\mathrm{bip}}(n^{\theta}+O(1),p)$ , it suffices to verify that

[TABLE]

which holds due to (25). ∎

Final Phases

For rounds $T+4\leq t\leq 2n$ :

•

We randomly sample a vertex $v_{t}\in[2n]\setminus(L_{t-1}\cup R_{t-1})$ . If $v_{t}\in R$ , update $R_{t}=R_{t-1}\cup\{v_{t}\}$ and $L_{t}=L_{t-1}$ . Otherwise, set $R_{t}=R_{t-1}$ and $L_{t}=L_{t-1}\cup\{v_{t}\}$ .

•

If $v_{t}\in\mathcal{J}$ , set $I_{t+1}=I_{t}\cup\{v_{t}\}$ . Else, set $I_{t+1}=I_{t}$ .

The algorithm above is online and constructs a $\gamma$ -balanced independent set of size $(1+\epsilon)\alpha_{\rm COMP}$ . We finally verify the constraint (3) on the number of queries arising in Definition 3.7.

Number of Future Queries

Note that $S_{t}=\varnothing$ except when $t\in\{T+1,T+2,T+3\}$. We have:

[TABLE]

This completes the proof of Theorem 3.8.


Bibliography

[AC08] Dimitris Achlioptas and Amin Coja-Oghlan “Algorithmic barriers from phase transitions” In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, 2008, pp. 793–802 IEEE
[AR06] Dimitris Achlioptas and Federico Ricci-Tersenghi “On the solution-space geometry of random constraint satisfaction problems” In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, 2006, pp. 130–139
[Ajt96] Miklós Ajtai “Generating hard instances of lattice problems” In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, 1996, pp. 99–108
[AS16] Noga Alon and Joel H Spencer “The probabilistic method” John Wiley & Sons, 2016
[BPW18] Afonso S Bandeira, Amelia Perry and Alexander S Wein “Notes on computational-to-statistical gaps: predictions using statistical physics” In arXiv preprint arXiv:1803.11132, 2018
[BGT10] Mohsen Bayati, David Gamarnik and Prasad Tetali “Combinatorial approach to the interpolation method and scaling limits in sparse random graphs” In Proceedings of the forty-second ACM symposium on Theory of computing, 2010, pp. 105–114
[BGG25] Shankar Bhamidi, David Gamarnik and Shuyang Gong “Finding a dense submatrix of a random matrix. Sharp bounds for online algorithms” In arXiv preprint arXiv:2507.19259, 2025
[BBB21] Enric Boix-Adserà, Matthew Brennan and Guy Bresler “The Average-Case Complexity of Counting Cliques in Erdős–Rényi Hypergraphs” In SIAM Journal on Computing SIAM, 2021, pp. FOCS19-39
