S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

Yuhan Wang, Haopeng Zhang, Yibo Ding, Jiaqi Yu, Xinyu Zhao, Yuhang Liu, Ziwei Zhang, Xiao Wang, and Ruijie Wang

arXiv:2605.18579·cs.LG·May 21, 2026

TL;DR

S2Aligner is a pre-training framework for sparse text-attributed graphs (TAGs) that decouples semantic alignment from structural modeling, with the aim of improving transferability and robustness across domains.

Contribution

The paper introduces a sparsity-aware, structure-enhanced LLM-as-Aligner pre-training framework that handles sparse graph-text data and reduces sparsity-induced transfer bias.

Findings

01. Outperforms existing methods across various graph domains and sparsity levels.

02. Effectively reduces cross-domain generalization gaps.

03. Enhances transferability of graph foundation models.

Abstract

Pre-training on text-attributed graphs (TAGs) is central to building transferable graph foundation models, where LLM-as-Aligner methods align graph and text representations through the semantic knowledge of large language models. However, these methods usually assume that node texts provide sufficient and reliable supervision, an assumption often violated in real-world sparse TAGs. When textual anchors are missing, noisy, or uneven across domains, graph structures must be aligned with weak semantic evidence, leading to unreliable structure-semantics correspondence and sparsity-induced transfer bias. This paper presents S2Aligner, a sparsity-aware and structure-enhanced LLM-as-Aligner framework for graph-text pre-training on sparse TAGs. The key idea is to decouple semantic alignment from structural modeling, allowing topology-aware signals to enhance alignment without contaminating the…
