Fixed Aggregation Features Can Rival GNNs
Celia Rubio-Madrigal, Rebekka Burkholz

TL;DR
This paper demonstrates that fixed, non-trainable aggregation features can perform on par with or better than complex GNNs across multiple benchmarks, challenging the necessity of trainable neighborhood aggregations.
Contribution
The paper introduces Fixed Aggregation Features (FAFs), a training-free method that transforms graph learning tasks into tabular problems, so that standard tabular models can be applied to graph data.
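The idea can be illustrated with a minimal sketch. It assumes mean aggregation applied repeatedly over a dense adjacency matrix without self-loops, with each hop's output concatenated to the raw features. The function name `fixed_aggregation_features` and the `num_hops` parameter are hypothetical and not taken from the paper, whose exact construction may differ.

```python
import numpy as np

def fixed_aggregation_features(adj, x, num_hops=2):
    """Concatenate raw node features with repeated mean-aggregated neighbor features.

    Assumption: dense adjacency, no self-loops, mean aggregation only.
    Nothing here is trained; the output is a plain feature table.
    """
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize the adjacency; isolated nodes keep an all-zero row.
    mean_op = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    blocks = [x]
    h = x
    for _ in range(num_hops):
        h = mean_op @ h          # mean over neighbors of the previous hop's features
        blocks.append(h)
    return np.concatenate(blocks, axis=1)

# Path graph 0 - 1 - 2 with one scalar feature per node.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
x = np.array([[1.0], [2.0], [3.0]])
faf = fixed_aggregation_features(adj, x, num_hops=2)
print(faf)
```

Each row of `faf` is an ordinary tabular feature vector for one node, so it could be passed to an MLP or any other tabular classifier.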
Findings
FAFs rival or outperform state-of-the-art GNNs on 12 out of 14 benchmarks.
Mean aggregation often suffices, reducing the need for deep GNNs.
Theoretical connection to Kolmogorov-Arnold representations explains the effectiveness of fixed aggregations.
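For reference, the classical Kolmogorov-Arnold representation theorem states that any continuous function of n variables on the unit cube can be written using only sums and continuous univariate functions. The block below states the classical theorem only; how the paper adapts it to neighborhood aggregation is not shown in this summary.

```latex
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\qquad f \colon [0,1]^n \to \mathbb{R} \text{ continuous},
```

Here each outer function \Phi_q and inner function \phi_{q,p} is a continuous function of a single real variable.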
Abstract
Graph neural networks (GNNs) are widely believed to excel at node representation learning through trainable neighborhood aggregations. We challenge this view by introducing Fixed Aggregation Features (FAFs), a training-free approach that transforms graph learning tasks into tabular problems. This simple shift enables the use of well-established tabular methods, offering strong interpretability and the flexibility to deploy diverse classifiers. Across 14 benchmarks, well-tuned multilayer perceptrons trained on FAFs rival or outperform state-of-the-art GNNs and graph transformers on 12 tasks -- often using only mean aggregation. The only exceptions are the Roman Empire and Minesweeper datasets, which typically require unusually deep GNNs. To explain the theoretical possibility of non-trainable aggregations, we connect our findings to Kolmogorov-Arnold representations and discuss when mean…
Peer Reviews
Decision · Submitted to ICLR 2026
- challenges the established view of needing learned aggregations in GNNs with convincing arguments; in this respect, it can be considered innovative.
- the tabular representation is interpretable with standard tools (e.g. SHAP), which clearly adds value.
- the experiments are adequate in both width (14 datasets) and depth (large set of hyperparameters); the design is fair and reproducible.
- some insights (e.g. GNNs may overfit later aggregations) are definitely thought-provoking and might he
- As presented, the main finding of the paper appears to be confined to (transductive) node classification. A broader characterization (e.g. an extension to graph classification or to inductive node classification) would increase the significance of this work.
- The work (and especially its recommendation to use FAF baselines and reassess benchmarks) is connected with previous work on properly benchmarking GNNs for graph classification. In particular, [1] proposes simple basel
1. Introducing non-learnable multi-hop feature aggregation as a preprocessing step is a very simple, intuitive and practical approach to augmenting the original node features with graph-based information.
2. A discussion of the problems in the optimization procedure of GNNs and of the theoretical properties of FAF, which also explains why a standard MLP on top of simple graph-based aggregations can perform on par with standard GNNs.
3. An extensive empirical study showing that FAF enables to achieve ne
The main weakness of the current version, as I see it, is the choice of graph datasets for experiments. The recently introduced GraphLand benchmark [1] provides both classification and regression tasks from industrial applications, includes both homophilous and heterophilous graph datasets, and contains rich heterogeneous tabular node features. Moreover, this work introduces Neighborhood Feature Aggregation (NFA) that seems to be a specific instance of FAF using mean, max and sum aggregations o
- The main idea is simple but useful, and its empirical performance can sometimes be quite impressive.
- Experiments are conducted on a vast range of datasets, and strong baseline models with adequate hyperparameter search spaces are used.
- The paper raises timely questions regarding the adequacy of standard node classification benchmarks for the evaluation of complex models.
- The interpretability example with the Minesweeper dataset (Section 3.1) is wrong due to a misunderstanding of what the node features are. As described in [1], where the dataset was proposed, the node features use one-hot encoding for the number of neighboring mines, not binary encoding (see page 7 of [1], the Minesweeper paragraph). Due to this mistake, the explanation that the authors provide for how the model uses the features is entirely wrong. I do not consider this a serious issue, as it
Taxonomy
Topics: Advanced Graph Neural Networks · Graph Theory and Algorithms · Explainable Artificial Intelligence (XAI)
