FedGT: Federated Node Classification with Scalable Graph Transformer
Zaixi Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu

TL;DR
FedGT introduces a scalable federated graph transformer that effectively captures local and global information, addresses subgraph heterogeneity, and enhances privacy in distributed graph learning.
Contribution
The paper proposes FedGT, a novel federated graph transformer with hybrid attention and dynamic global nodes, improving scalability, heterogeneity handling, and privacy in subgraph federated learning.
Findings
FedGT achieves superior performance on 6 datasets.
The hybrid attention scheme reduces the graph transformer's computational complexity to linear in the number of nodes.
Global node alignment improves personalization.
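The linear-complexity claim can be illustrated with a minimal sketch, assuming that each node attends only to a bounded set of sampled local neighbors plus a small fixed set of global nodes. The function name `hybrid_attention`, the use of raw features as queries, keys, and values (no learned projections), and the way neighbors are supplied are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hybrid_attention(features, neighbors, global_nodes):
    """Attend from each node to its sampled neighbors plus all global nodes.

    features:     list of n feature vectors (dimension d). Assumption: used
                  directly as queries, keys, and values.
    neighbors:    list of n lists of neighbor indices (at most k each).
    global_nodes: list of g vectors summarising the local subgraph.

    Each node scores at most k + g keys, so the total cost is
    O(n * (k + g)): linear in n when k and g are fixed, versus O(n^2)
    for full pairwise attention.
    """
    d = len(features[0])
    scale = math.sqrt(d)
    outputs = []
    for i, query in enumerate(features):
        keys = [features[j] for j in neighbors[i]] + list(global_nodes)
        weights = softmax([dot(query, key) / scale for key in keys])
        outputs.append(
            [sum(w * key[c] for w, key in zip(weights, keys)) for c in range(d)]
        )
    return outputs


# Usage: three nodes on a path graph, one global node.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nbrs = [[1], [0, 2], [1]]
out = hybrid_attention(feats, nbrs, global_nodes=[[0.5, 0.5]])
```

Because every output is a softmax-weighted average of the attended vectors, the global nodes give each node a path to information outside its sampled neighborhood without the quadratic cost of attending to all nodes.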
Abstract
Graphs are widely used to model relational data. As graphs are getting larger and larger in real-world scenarios, there is a trend to store and compute subgraphs in multiple local systems. For example, recently proposed subgraph federated learning methods train Graph Neural Networks (GNNs) distributively on local subgraphs and aggregate GNN parameters with a central server. However, existing methods have the following limitations: (1) The links between local subgraphs are missing in subgraph federated learning. This could severely damage the performance of GNNs that follow message-passing paradigms to update node/edge features. (2) Most existing methods overlook the subgraph heterogeneity issue, brought by subgraphs being from different parts of the whole graph. To address the aforementioned challenges, we propose a scalable Federated Graph Transformer…
Peer Reviews
Decision: Submitted to ICLR 2024
(1) Leverages graph transformer architecture within subgraph FL for the first time in the federated graph learning literature. (2) The algorithm is compatible with local DP. (3) Experimentally shows that Transformers are useful for subgraph federated learning. (4) Theoretical analysis of global attention being able to capture and approximate information in the whole subgraph is provided.
(1) How the Graph Transformer deals with the missing links is unclear. (2) The assumption that nodes are equally distributed to the global nodes seems unrealistic due to graph partitioning. (3) The theorem is not rigorous, as it is a known fact that more nodes yield less error [1]. (4) LDP does not guarantee privacy for sensitive node features, edges, or neighborhoods on distributed graphs [2,3]. Using LDP does not reflect an actual privacy guarantee for this case. [1] Kim, Hyunjik, George Papamakari
1. The paper is easy to read, and generally well-written. 2. The idea of using the Graph Transformer to address the issue of missing links across clients is well-motivated.
1. How to aggregate global nodes is not clearly illustrated. On page 6, the authors state, “the global nodes are first aligned with optimal transport and then averaged similar to Equation 8”. However, it is unclear which optimal transport method is applied and how the similarity between global nodes from different clients is calculated. The authors should clarify whether the normalized similarity α_ij used for model parameters is also employed for global nodes or if a different similarity calcul
1. The paper is well-written and organized. The details of the models are described clearly and are convincing. 2. The limitations of applying GNNs for subgraph federated learning are clearly illustrated in Figure 1 and in Figure 4 in the appendix. The motivation for leveraging graph transformers is easy to understand. 3. The authors proposed a series of effective modules to tackle the challenges, including scalable graph transformers, personalized aggregation, and global nodes. The contribution is signif
1. The authors are encouraged to discuss the case studies clearly in the main paper. 2. Leveraging local differential privacy mechanisms to protect privacy in FL is not new. 3. Please provide more explanation of the assumptions in Theorem 1.
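One review above quotes the paper as saying that global nodes are "first aligned with optimal transport and then averaged", and notes that the exact method is not specified. The following is only a hypothetical sketch of what such a step could look like. It assumes every client holds the same number of global nodes with uniform mass, in which case optimal transport reduces to a one-to-one assignment. The brute-force permutation search, the squared Euclidean cost, and the `weights` argument (a stand-in for whatever similarity weights the server uses) are all assumptions introduced for illustration.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align(reference, other):
    """Reorder `other` so that other[i] is matched to reference[i].

    Searches all permutations for the matching with minimal total squared
    distance. This is factorial in the number of global nodes, so it is
    only viable for a handful of nodes; a real system would use a proper
    assignment or optimal transport solver.
    """
    g = len(reference)
    best = min(
        permutations(range(g)),
        key=lambda p: sum(sq_dist(reference[i], other[p[i]]) for i in range(g)),
    )
    return [other[j] for j in best]


def aggregate_global_nodes(client_nodes, weights):
    """Align every client's global nodes to the first client's, then average.

    client_nodes: list (one entry per client) of g vectors of dimension d.
    weights:      hypothetical per-client aggregation weights.
    """
    reference = client_nodes[0]
    aligned = [reference] + [align(reference, nodes) for nodes in client_nodes[1:]]
    total = sum(weights)
    g, d = len(reference), len(reference[0])
    return [
        [sum(w * nodes[i][c] for w, nodes in zip(weights, aligned)) / total
         for c in range(d)]
        for i in range(g)
    ]


# Usage: client B stores the same two clusters as client A, in swapped order.
client_a = [[0.0, 0.0], [10.0, 10.0]]
client_b = [[10.0, 12.0], [0.0, 2.0]]
merged = aggregate_global_nodes([client_a, client_b], weights=[1.0, 1.0])
```

Averaging without the alignment step would blend unrelated global nodes (here, index 0 of client A with index 0 of client B), which is why some matching has to come before the average.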
Taxonomy
Topics: Advanced Graph Neural Networks · Privacy-Preserving Technologies in Data
Methods: Attention Is All You Need · Sparse Evolutionary Training · Laplacian EigenMap · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention · Laplacian Positional Encodings · Byte Pair Encoding · Residual Connection
