Modality-free Graph In-context Alignment
Wei Zhuo, Siqiang Luo

TL;DR
This paper introduces MF-GIA, a modality-free, promptable graph encoder framework that enables few-shot, cross-domain graph reasoning without parameter updates, improving generalization to unseen domains.
Contribution
MF-GIA is a novel framework that aligns heterogeneous graph features without modality assumptions, using gradient fingerprints and prompt-aware attention for few-shot learning.
Findings
Achieves superior few-shot performance across diverse graph domains.
Demonstrates strong generalization to unseen domains.
Operates without modality-specific encoders or parameter updates.
Abstract
In-context learning (ICL) converts static encoders into task-conditioned reasoners, enabling adaptation to new data from just a few examples without updating pretrained parameters. This capability is essential for graph foundation models (GFMs) to approach LLM-level generality. Yet current GFMs struggle with cross-domain alignment, typically relying on modality-specific encoders that fail when graphs are pre-vectorized or raw data is inaccessible. In this paper, we introduce Modality-Free Graph In-context Alignment (MF-GIA), a framework that makes a pretrained graph encoder promptable for few-shot prediction across heterogeneous domains without modality assumptions. MF-GIA captures domain characteristics through gradient fingerprints, which parameterize lightweight transformations that align pre-encoded features and indexed labels into unified semantic spaces. During pretraining, a dual…
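The abstract describes gradient fingerprints only at a high level. As a rough illustration, and assuming (as one of the reviews below suggests) that a fingerprint comes from a single small gradient step taken from a shared initialization, a toy version might look like the following sketch. The linear softmax probe, the all-zero shared initialization, and the names `gradient_fingerprint`, `lr`, `domain_a`, and `domain_b` are hypothetical choices for illustration, not the paper's implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def gradient_fingerprint(features, labels, num_classes, lr=0.1):
    """Flattened parameter displacement after one gradient step.

    Assumption: the shared initialization is an all-zero linear softmax
    probe, and the loss is mean cross-entropy over the domain's samples.
    """
    dim = len(features[0])
    weights = [[0.0] * num_classes for _ in range(dim)]  # shared init
    grad = [[0.0] * num_classes for _ in range(dim)]
    for x, y in zip(features, labels):
        logits = [sum(x[i] * weights[i][c] for i in range(dim))
                  for c in range(num_classes)]
        probs = softmax(logits)
        for c in range(num_classes):
            # Derivative of cross-entropy with respect to logit c.
            err = probs[c] - (1.0 if c == y else 0.0)
            for i in range(dim):
                grad[i][c] += x[i] * err / len(features)
    # One gradient-descent step moves the parameters by -lr * grad.
    return [-lr * g for row in grad for g in row]


# Two toy "domains" that share features but use opposite label indexing.
domain_a = gradient_fingerprint([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
domain_b = gradient_fingerprint([[1.0, 0.0], [0.0, 1.0]], [1, 0], num_classes=2)
```

In this toy setting the two fingerprints point in opposite directions, which is the kind of signal a domain embedding could plausibly be built from.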
Peer Reviews
Decision: ICLR 2026 Oral
- It doesn’t assume raw text or a specific encoder: as long as features are vectors, the domain-conditioned aligner can normalize them, which makes it more practical for real graph platforms.
- Adaptation is done by in-context attention plus generated aligners, so deployment on new graphs is lightweight and gradient-free.
- Using a domain embedding to steer FiLM-style transforms lets the model cope with cross-graph distribution shift (different feature spaces, different label vocab)
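The "FiLM-style transforms" mentioned in this review can be pictured as a per-feature scale and shift generated from a domain embedding. The sketch below is a minimal illustration under that reading. The function `film_align`, the matrices `w_gamma` and `w_beta`, and the choice `gamma = 1 + W_gamma e` (so that a zero embedding gives the identity map) are assumptions for illustration and are not taken from the paper.

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def film_align(features, domain_embedding, w_gamma, w_beta):
    """Apply a feature-wise affine transform conditioned on a domain embedding.

    gamma = 1 + W_gamma @ e  (per-feature scale, identity when e is zero)
    beta  = W_beta @ e       (per-feature shift)
    """
    gamma = [1.0 + g for g in matvec(w_gamma, domain_embedding)]
    beta = matvec(w_beta, domain_embedding)
    return [[g * x + b for x, g, b in zip(row, gamma, beta)]
            for row in features]


# Hypothetical 2-dimensional features and domain embedding.
aligned = film_align(
    features=[[1.0, 2.0]],
    domain_embedding=[1.0, 0.0],
    w_gamma=[[1.0, 0.0], [0.0, 1.0]],
    w_beta=[[0.5, 0.0], [0.0, 0.5]],
)
```

Because the transform is generated from the embedding rather than trained per graph, applying it to a new domain needs no gradient updates, which matches the lightweight deployment the reviewer describes.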
- Forcing all graphs to a fixed width via SVD could destroy domain-specific structure and cross-domain comparability before alignment.
- A single small-step gradient from a shared init can be noisy/sensitive to loss scaling; the theory assumes smoothness/Lipschitzness and gives an upper bound but not tight guarantees for practical separability.
- Pretraining uses only four node-classification datasets; broader modalities/tasks would better justify foundation-level generality.
- The episodic
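The first concern above is easier to weigh with a concrete picture of what "forcing all graphs to a fixed width via SVD" could involve. The sketch below is one plausible reading, assuming NumPy is available: project onto the top right singular vectors and zero-pad when a graph has fewer feature dimensions than the target width. The function name `project_to_fixed_width` and the zero-padding rule are assumptions; the paper's actual preprocessing is not described in this excerpt.

```python
import numpy as np


def project_to_fixed_width(features, width):
    """Map an (n, d) feature matrix to (n, width) using a truncated SVD."""
    x = np.asarray(features, dtype=float)
    # Thin SVD: x = U @ diag(S) @ Vt, with Vt of shape (min(n, d), d).
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    k = min(width, vt.shape[0])
    # Coordinates of each row along the top-k right singular vectors.
    reduced = x @ vt[:k].T
    if k < width:
        # Fewer usable directions than the target width: pad with zeros.
        reduced = np.pad(reduced, ((0, 0), (0, width - k)))
    return reduced


rng = np.random.default_rng(0)
wide = project_to_fixed_width(rng.normal(size=(10, 7)), width=4)    # 7 -> 4
narrow = project_to_fixed_width(rng.normal(size=(5, 3)), width=4)   # 3 -> 4
```

Each graph gets its own singular basis here, so column `j` of one graph's output need not mean the same thing as column `j` of another's, which is the comparability issue the reviewer raises.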
S1: The use of gradient fingerprints to derive domain embeddings and guide feature/label alignment is novel and well-motivated.
S2: The model meets all three desired criteria (modality-free, post-training free, and cross-domain alignment), which most prior works fall short of.
S3: MF-GIA significantly outperforms competitive baselines on both node and edge classification tasks in few-shot settings, including unseen domains and tasks.
S4: The architecture, especially the use of Dual Prompt-Aware
To be honest, I did not find very serious weaknesses in this submission. Below are just some suggestions that may lead to a more solid work.
- Suggestion 1: The robustness of gradient fingerprints across noisy or low-quality domains is not deeply examined.
- Suggestion 2: While results suggest episodic training is beneficial, more direct comparison with alternative meta-learning strategies would strengthen the claim.
- Suggestion 3: While the framework is motivated by practical constraints (e.g
1. The proposed method achieves good performance on the few-shot node classification task across 5 datasets in m-way k-shot settings.
2. This paper provides the theoretical analysis for the proposed feature-alignment.
3. The presentation of this paper is good and the paper is easy to follow.
1. In Theorem 3.1, the authors fail to explicitly define how Wasserstein distance $W_2(\cdot,\cdot)$ is used to quantify the distance between two domains. This omission makes it unclear what assumptions are made about the underlying distributions, and whether the bound holds under general conditions. Furthermore, in Definition 3, the authors do not provide a formal definition of the graph distance metric $d_g(G_i, G_j)$, leaving readers uncertain about what graph properties are used for similari
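For reference, the standard 2-Wasserstein distance between two probability measures $\mu$ and $\nu$ with finite second moments is given below, where $\Pi(\mu, \nu)$ is the set of couplings with marginals $\mu$ and $\nu$. Whether Theorem 3.1 uses exactly this form, and which distributions play the roles of $\mu$ and $\nu$, is not stated in this excerpt.

$$
W_2(\mu, \nu) = \left( \inf_{\gamma \in \Pi(\mu, \nu)} \int \lVert x - y \rVert_2^2 \, \mathrm{d}\gamma(x, y) \right)^{1/2}
$$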
Taxonomy
Topics: Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
