TL;DR
TRN-R1-Zero is a reinforcement learning framework for zero-shot reasoning on text-rich networks (TRNs) that integrates textual semantics with relational structure without task-specific supervision.
Contribution
The paper presents a novel RL-based post-training method for LLMs that improves relational reasoning on TRNs without supervised fine-tuning or chain-of-thought data.
Findings
Outperforms prior methods on multiple TRN benchmarks.
Achieves zero-shot inference on edge- and graph-level tasks.
Demonstrates robustness and generalization across domains.
Abstract
Zero-shot reasoning on text-rich networks (TRNs) remains a challenging frontier, as models must integrate textual semantics with relational structure without task-specific supervision. While graph neural networks rely on fixed label spaces and supervised objectives, recent large language model (LLM)-based approaches often overlook graph context or depend on distillation from larger models, limiting generalisation. We propose TRN-R1-Zero, a post-training framework for TRN reasoning trained solely via reinforcement learning. TRN-R1-Zero directly optimises base LLMs using a Neighbour-aware Group Relative Policy Optimisation objective that dynamically adjusts rewards based on a novel margin gain metric for the informativeness of neighbouring signals, effectively guiding the model toward relational reasoning. Unlike prior methods, TRN-R1-Zero requires no supervised fine-tuning or…
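The abstract describes the objective only at a high level, so the following is a rough sketch rather than the authors' implementation. It shows the standard group-relative advantage used in GRPO-style training (each sampled response's reward is normalised against the other responses for the same prompt), followed by one possible way a neighbour-informativeness signal could modulate it. The names `margin_gain` and `neighbour_weight`, the difference-based definition of the gain, and the `scale` parameter are all hypothetical placeholders; the paper's actual margin gain metric and reward adjustment may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalise rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def margin_gain(margin_with_neighbours, margin_without_neighbours):
    """Hypothetical stand-in for the paper's margin gain metric.

    Assumption: the gain is how much the model's decision margin improves
    when neighbour text is included in the prompt.
    """
    return margin_with_neighbours - margin_without_neighbours


def neighbour_weight(gain, scale=0.5):
    """Hypothetical weight: emphasise prompts whose neighbours are informative."""
    return 1.0 + scale * max(gain, 0.0)


def neighbour_aware_advantages(rewards, gain, scale=0.5):
    """Scale the group-relative advantages by the neighbour weight.

    The weight is applied after normalisation; scaling raw rewards by a
    per-group constant would cancel out in the normalisation step.
    """
    weight = neighbour_weight(gain, scale)
    return [weight * a for a in group_relative_advantages(rewards)]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # e.g. correctness of four sampled answers
    gain = margin_gain(0.9, 0.1)    # neighbours helped in this toy example
    print(neighbour_aware_advantages(rewards, gain))
```

In this toy setup, a prompt whose neighbours add no margin (gain of zero or below) keeps the plain group-relative advantages, while a prompt with informative neighbours contributes a proportionally larger update.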
