COMBA: Cross Batch Aggregation for Learning Large Graphs with Context Gating State Space Models

Jiajun Shen; Yufei Jin; Yi He; Xingquan Zhu

arXiv:2602.17893·cs.LG·February 23, 2026

Open Access · 4 Reviews

TL;DR

COMBA introduces a novel method combining graph context gating and cross batch aggregation to enable efficient learning on large graphs using state space models, outperforming existing approaches.

Contribution

The paper presents COMBA, a new approach that adapts state space models for large graph learning through innovative context gating and cross batch aggregation techniques.

Findings

1. Significant performance improvements over baseline methods.
2. Theoretical proof of lower error with cross-batch aggregation.
3. Scalability to large graphs demonstrated on benchmark datasets.

Abstract

State space models (SSMs) have recently emerged for modeling long-range dependencies in sequence data at much lower computational cost than modern alternatives such as transformers. Extending SSMs to graph-structured data, especially large graphs, is a significant challenge because SSMs are sequence models and the sheer volume of large graphs makes it very expensive to convert graphs into sequences for effective learning. In this paper, we propose COMBA to tackle large graph learning using state space models, with two key innovations: graph context gating and cross batch aggregation. Graph context refers to the different hops of neighborhood for each node, and graph context gating allows COMBA to use such context to learn the best control of neighbor aggregation. For each graph context, COMBA samples nodes as batches and trains a graph neural network (GNN), with information being aggregated…
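As a rough illustration of the "graph context" idea in the abstract (different hops of neighborhood for each node), the sketch below groups the nodes around a source by exact hop distance using breadth-first search. It is not COMBA's implementation; the function name `hop_shells`, the adjacency-list format, and the toy graph are assumptions made for this example.

```python
from collections import deque


def hop_shells(adj, source, max_hop):
    """Return {k: set of nodes at shortest-path distance exactly k from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the largest requested hop
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Group discovered nodes by hop distance, skipping the source itself.
    shells = {k: set() for k in range(1, max_hop + 1)}
    for node, d in dist.items():
        if d >= 1:
            shells[d].add(node)
    return shells


# Toy undirected graph as an adjacency list (hypothetical example data).
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
contexts = hop_shells(adj, source=0, max_hop=3)
print(contexts)  # {1: {1, 2}, 2: {3}, 3: {4}}
```

Each key of `contexts` plays the role of one hop-level context for node 0; how COMBA gates or aggregates over such contexts is described in the paper itself and is not reproduced here.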

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 0 · Confidence 4

Strengths

- Scaling state-space models to large-scale graphs is an important question to study.
- The proposed method shows competitive performance relative to previous graph state space models.

Weaknesses

- The major weakness of this paper is its presentation. The notation is highly ambiguous, and many key implementation details are missing. I strongly encourage the authors to substantially revise the notation and exposition, as the current version is extremely difficult to follow. For example:
  - In equation (1), $A$ is the adjacency matrix and $\text{Gen}(G, A)=A^k$ is used to represent the binary matrices that indicate whether two nodes are connected by a path of exactly length $k$. It is…
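For readers following the notation discussed above, a small sketch may help. It relies on a standard fact: entry $(i, j)$ of $A^k$ counts walks of length $k$ from $i$ to $j$, so a binary indicator needs an explicit thresholding step, and walks may revisit nodes. The helper names (`matmul`, `matrix_power`, `binarize`) and the triangle graph are assumptions for illustration, not taken from the paper or the review.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    n, m, p = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(m)) for j in range(p)]
            for i in range(n)]


def matrix_power(A, k):
    """Compute A^k by repeated multiplication, starting from the identity."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result


def binarize(M):
    """Threshold a count matrix into a 0/1 indicator matrix."""
    return [[1 if x > 0 else 0 for x in row] for row in M]


# Triangle graph on nodes 0, 1, 2 (hypothetical example data).
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

A2 = matrix_power(A, 2)
print(A2)            # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(binarize(A2))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

In this toy case `A2` is not binary, and its nonzero diagonal comes from walks such as 0 → 1 → 0, which are not simple paths. Whether the paper intends counts, thresholded indicators, or shortest-path shells is exactly the kind of detail the notation would need to state.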

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- The paper addresses the important problem of learning on large graphs, which is still an open challenge in graph representation learning.
- The paper is clear and well written, although several typos are present in the text.
- Figures and pseudocode help the reader understand the pipeline and the overall architecture.

Weaknesses

Technical aspects lack clarity or are insufficiently detailed:
- The cost of computing multi-hop adjacencies $A_{b_n}^k$ can be expensive on large graphs. Authors should clarify whether $A_{b_n}^k$ is precomputed at preprocessing time and, if possible, provide runtimes.
- Line 213: it is unclear whether the GNN (i.e., its weights $W_{b_n}$) is shared across batches and across hops within a layer.
- Line 221: the window considers values from $k-w$ to $k+w$, thus its size should be $2w+1$. Authors…
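The window-size remark can be checked with a few lines of arithmetic. The sketch below enumerates the hop indices from $k-w$ to $k+w$ inclusive; the function name `hop_window` and the optional clipping to a valid hop range `1..max_hop` are assumptions for this example, not details confirmed by the paper.

```python
def hop_window(k, w, max_hop=None):
    """List the hop indices k-w, ..., k+w (inclusive on both ends)."""
    hops = list(range(k - w, k + w + 1))
    if max_hop is not None:
        # Assumption: valid hops run from 1 to max_hop, so the window
        # shrinks near the boundaries.
        hops = [h for h in hops if 1 <= h <= max_hop]
    return hops


window = hop_window(k=4, w=2)
print(window)       # [2, 3, 4, 5, 6]
print(len(window))  # 5, i.e. 2*w + 1
```

With clipping enabled, for instance `hop_window(1, 2, max_hop=5)`, the window contains fewer than $2w+1$ entries, which is one reason the boundary handling is worth stating explicitly.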

Reviewer 03 · Rating 2 · Confidence 3

Strengths

1. **Novel Motivation and Problem Formulation:** The paper does an excellent job of identifying a key challenge in applying SSMs to graphs: the fundamental mismatch between the 1D sequential dependency of SSMs and the multi-hop neighborhood dependency of graphs.
2. **Interesting Core Mechanisms:** Both of the paper's core contributions are well-motivated.
   * "Cross Batch Aggregation" is a practical approach to scaling GNNs, and the theoretical justification (Theorem 1) for its error reduct…

Weaknesses

1. **Lacking Efficiency Baselines:** The paper's main motivation is to "tackle large graph learning" and scale SSMs. However, the complexity analysis (Section 5.3, Figure 4) is limited *only* to the proposed COMBA model. There is **zero** comparison against any baseline in terms of wall-clock training time, inference time, or memory usage.
   * The authors state that baselines like Nagphormer and Graph Mamba-1/2 ran "OOM" (Out of Memory) on Ogbn-product, which supports their memory-efficiency…

Reviewer 04 · Rating 2 · Confidence 5

Strengths

1. Scaling SSMs to large graphs is a relevant and timely research direction, given the success of SSMs in sequence modeling.
2. Comprehensive evaluation on six datasets of varying sizes (from ~10K to ~2.4M nodes) with consistent improvements over baselines.
3. The batch-based approach with cross-batch information sharing is intuitive and addresses real scalability concerns.

Weaknesses

1. The motivation for introducing SSMs in graph learning is not clearly articulated. The original Mamba paper proposed SSMs as a means to achieve linear complexity while maintaining performance in sequence modeling. However, this work directly applies SSMs to graphs without explaining the expected benefits compared to Graph Transformers. I suggest the authors explicitly clarify why SSMs are adopted in the graph domain and what advantages (e.g., efficiency, stability, scalability) they bring, ide…

Taxonomy

Topics: Advanced Graph Neural Networks · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)