FloydNet: A Learning Paradigm for Global Relational Reasoning
Jingcheng Yu, Mingliang Zeng, Qiwei Ye

TL;DR
FloydNet introduces a global, dynamic-programming-inspired architecture for relational reasoning that surpasses message-passing GNNs at capturing long-range dependencies and achieves state-of-the-art results on algorithmic and combinatorial tasks.
Contribution
The paper presents FloydNet, a novel architecture that employs a global relational tensor and learned DP operators, offering a more powerful reasoning paradigm than traditional message-passing GNNs.
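FloydNet is named after the Floyd-Warshall algorithm, which refines an all-pairs distance table once per pivot node k using the hand-designed (min, +) rule D[i][j] = min(D[i][j], D[i][k] + D[k][j]). The sketch below is the classical algorithm only, included to make concrete what a "DP operator over a global relational tensor" refers to. It is not the authors' learned operator, and the example graph is illustrative.

```python
import math

def floyd_warshall(weights):
    """Classical all-pairs shortest paths on a dense N x N weight matrix.

    weights[i][j] is the edge weight from i to j (math.inf if there is no edge).
    The global state D is refined once per pivot k with the fixed (min, +) operator.
    """
    n = len(weights)
    D = [row[:] for row in weights]          # copy so the input is not mutated
    for i in range(n):
        D[i][i] = min(D[i][i], 0)            # a node reaches itself at cost 0
    for k in range(n):                       # pivot node
        for i in range(n):
            for j in range(n):
                via_k = D[i][k] + D[k][j]    # candidate path i -> k -> j
                if via_k < D[i][j]:
                    D[i][j] = via_k
    return D


INF = math.inf
W = [[0, 3, INF, 7],
     [8, 0, 2, INF],
     [5, INF, 0, 1],
     [2, INF, INF, 0]]
print(floyd_warshall(W))  # [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```

The triple loop over (k, i, j) is also where the O(N^3) cost discussed in the reviews comes from: every pair consults every pivot.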
Findings
Achieves near-perfect scores on the CLRS-30 benchmark.
Finds exact TSP solutions significantly more often than heuristics.
Matches 3-WL expressive power and aligns with the k-FWL hierarchy.
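3-WL is known to be equivalent in distinguishing power to 2-FWL (the "folklore" Weisfeiler-Leman test on ordered pairs), whose update aggregates over a pivot node k in the same shape as a Floyd-Warshall step. The sketch below is a plain 2-FWL color refinement, assuming unlabeled, undirected graphs given as 0/1 adjacency matrices; `fwl2_refine` is a hypothetical helper name, not code from the paper. Colors are assigned from a shared palette so histograms can be compared across graphs.

```python
from collections import Counter

def initial_colors(adj):
    """Atomic type of each ordered pair (i, j): (is diagonal, is edge)."""
    n = len(adj)
    return {(i, j): (i == j, adj[i][j]) for i in range(n) for j in range(n)}

def fwl2_refine(graphs, rounds):
    """2-FWL refinement run jointly on several graphs; returns color histograms."""
    colorings = [initial_colors(adj) for adj in graphs]
    for _ in range(rounds):
        signatures = []
        for adj, col in zip(graphs, colorings):
            n = len(adj)
            sig = {}
            for i in range(n):
                for j in range(n):
                    # multiset over pivots k of (color(i, k), color(k, j))
                    pivots = sorted((col[i, k], col[k, j]) for k in range(n))
                    sig[i, j] = (col[i, j], tuple(pivots))
            signatures.append(sig)
        # shared palette: identical signatures get identical colors in every graph
        palette = {}
        for s in sorted({v for sig in signatures for v in sig.values()}):
            palette[s] = len(palette)
        colorings = [{pair: palette[v] for pair, v in sig.items()} for sig in signatures]
    return [Counter(col.values()) for col in colorings]


def cycle(n):
    return [[1 if (j - i) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]

c6 = cycle(6)                                   # one 6-cycle
two_triangles = [[1 if i != j and i // 3 == j // 3 else 0 for j in range(6)]
                 for i in range(6)]             # two disjoint triangles
h = fwl2_refine([c6, two_triangles], rounds=2)
print(h[0] == h[1])  # False
```

Both graphs are 2-regular on six nodes, so 1-WL (the bound for standard message passing) cannot separate them, while the pivot aggregation above detects the triangles after one round.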
Abstract
Developing models capable of complex, multi-step reasoning is a central goal in artificial intelligence. While representing problems as graphs is a powerful approach, Graph Neural Networks (GNNs) are fundamentally constrained by their message-passing mechanism, which imposes a local bottleneck that limits global, holistic reasoning. We argue that dynamic programming (DP), which solves problems by iteratively refining a global state, offers a more powerful and suitable learning paradigm. We introduce FloydNet, a new architecture that embodies this principle. In contrast to local message passing, FloydNet maintains a global, all-pairs relationship tensor and learns a generalized DP operator to progressively refine it. This enables the model to develop a task-specific relational calculus, providing a principled framework for capturing long-range dependencies. Theoretically, we prove that…
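To make "learns a generalized DP operator to progressively refine" an all-pairs tensor concrete, here is a minimal NumPy sketch under loudly stated assumptions: the state is a tensor R of shape (N, N, d); each pivot k proposes a message built from R[i, k] and R[k, j] (here an elementwise product, an arbitrary choice); and a softmax over k stands in for the hard `min`. The function `pivot_attention_step` and the weights `Wq`, `Wk`, `Wv` are hypothetical and are not the authors' PivotAttention, whose exact form is not given in this summary.

```python
import numpy as np

def pivot_attention_step(R, Wq, Wk, Wv):
    """One hypothetical refinement step of an all-pairs tensor R of shape (N, N, d).

    Every node k serves as a pivot for every pair (i, j): the representations of
    (i, k) and (k, j) are combined into a candidate message, and a softmax over k
    replaces the hard `min` of Floyd-Warshall.
    """
    d = R.shape[-1]
    # m[i, j, k] = R[i, k] * R[k, j]; this (N, N, N, d) tensor is the cubic cost
    m = R[:, None, :, :] * R.transpose(1, 0, 2)[None, :, :, :]
    q = R @ Wq                      # (N, N, d): query from the pair being updated
    keys = m @ Wk                   # (N, N, N, d): one key per pivot
    vals = m @ Wv                   # (N, N, N, d): one value per pivot
    scores = np.einsum("ijd,ijkd->ijk", q, keys) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over pivots k
    update = np.einsum("ijk,ijkd->ijd", attn, vals)
    return R + update                              # residual refinement of the state


rng = np.random.default_rng(0)
N, d = 6, 8
R = rng.normal(size=(N, N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
for _ in range(3):                  # a few rounds, like successive DP sweeps
    R = pivot_attention_step(R, Wq, Wk, Wv)
print(R.shape)                      # (6, 6, 8)
```

Because nothing in the step depends on node indices (no positional encoding), relabeling the nodes simply relabels the output: the step is permutation-equivariant on pairs, which is the property one of the reviews highlights.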
Peer Reviews
Decision · Submitted to ICLR 2026
- The idea of applying the Floyd-Warshall dynamic programming algorithm and its alignment with graph algorithms is intuitive and appealing.
- The experimental analysis is well-conducted.
- The proposed PivotAttention is likely a better application of attention mechanisms to graph structures than conventional graph transformers.
1. The proposed approach is subsumed by the k-GNN approach, albeit with a dynamic programming formulation based on the Floyd-Warshall algorithm combined with an attention mechanism. It therefore has the same expressivity as k-WL, as noted in the paper. Given this, the additional insights provided by this modification are not entirely clear. While the attention mechanism is useful, how does it affect the higher-order aggregations used in this approach? Is there any intuitive understanding of
- The paper is generally well written and easy to follow.
- You propose a novel solution that appears to provide clear advantages in the evaluated settings.
Caveat: This paper lies well beyond my personal expertise. Other reviewers’ points should be clearly prioritized.
**Major Weaknesses:**
- **W1** Scalability: your method’s cubic cost is a clear disadvantage for larger, real-world, and especially sparse graphs. This should be discussed more explicitly. As far as I can tell, the graphs you evaluate on are mostly complete and rather small. It would help to show how your architecture scales to much larger, sparser graphs and to discuss both tra
FloydNet occupies a distinct middle ground between GNNs, Graph Transformers, and higher-order variants. Although the model relies on softmax attention rather than a message-passing mechanism, it preserves permutation equivariance and uses no positional encoding. The motivation for the model is compelling, and it is presented clearly and elegantly. As far as I am aware, the 2-WL equivalence is mathematically sound. This claim effectively situates it within the literature of graph transfor
The $O(N^3)$ complexity is a fundamental problem, especially as the construction requires computing a tensor of size $N \times N \times N \times d$. Furthermore, the paper provides relatively little time and memory benchmarking beyond a few noted out-of-memory (OOM) failures on the CLRS-30 benchmark. It is also unclear whether the baseline models are proper comparisons to FloydNet. For instance, the TSP comparisons are made only against a single non-learned baseline, despite the existence of numerous other neural ap
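The memory concern is easy to quantify: materializing an $N \times N \times N \times d$ tensor costs $N^3 d$ values. The sketch below is plain arithmetic, assuming float32 storage (4 bytes per value) and a hypothetical hidden size of $d = 64$; neither number comes from the paper.

```python
def pivot_tensor_bytes(n_nodes, hidden_dim, bytes_per_value=4):
    """Bytes needed to materialize an N x N x N x d tensor (float32 by default)."""
    return n_nodes ** 3 * hidden_dim * bytes_per_value


for n in (50, 100, 500, 1000):
    gib = pivot_tensor_bytes(n, 64) / 2**30
    print(f"N={n:5d}, d=64: {gib:10.2f} GiB")
```

For N = 100 this is 100^3 × 64 × 4 = 256,000,000 bytes (about 0.24 GiB) per layer, and every doubling of N multiplies it by 8, so N = 1000 already needs roughly 238 GiB. Streaming over pivots in chunks with a running softmax, as memory-efficient attention kernels do, would lower peak memory but not the cubic compute.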
Taxonomy
Topics: Advanced Graph Neural Networks · Multimodal Machine Learning Applications · Topic Modeling
