A Systematic Evaluation Protocol of Graph-Derived Signals for Tabular Machine Learning
Mario Heidrich, Jeffrey Heidemann, Rüdiger Buchkremer, Gonzalo Wandosell Fernández de Bobadilla

TL;DR
This paper introduces a comprehensive, reproducible evaluation protocol for assessing the effectiveness and robustness of graph-derived signals in tabular machine learning, demonstrated through a large-scale fraud detection case study.
Contribution
It proposes a unified evaluation framework with statistical rigor for analyzing graph-derived signals, enabling reliable identification of effective signals across diverse scenarios.
Findings
Certain signal categories consistently improve performance
Robustness varies across different graph signals
Insights into signals' behavior under data perturbations
Abstract
While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains unclear which signals provide consistent and robust improvements. This paper presents a taxonomy-driven empirical analysis of graph-derived signals for tabular machine learning. We propose a unified and reproducible evaluation protocol to systematically assess which categories of graph-derived signals yield statistically significant and robust performance improvements. The protocol provides an extensible setup for the controlled integration of diverse graph-derived signals into tabular learning pipelines. To ensure a fair and rigorous comparison, it incorporates automated hyperparameter optimization, multi-seed…
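The abstract is truncated in this listing. It refers to multi-seed evaluation and to "statistically significant" improvements, but the excerpt does not name the significance test the authors use. The following is a minimal sketch, assuming per-seed scores (for example, ROC-AUC) are already available for a baseline tabular model and for the same model augmented with a graph-derived signal, each trained with the same seed. It applies an exact paired sign-flip permutation test to the per-seed differences. The function name `paired_permutation_test` and the example scores are hypothetical and illustrative. They are neither the authors' code nor reported results.

```python
import itertools
import statistics


def paired_permutation_test(baseline, augmented):
    """Exact two-sided paired sign-flip permutation test (hypothetical helper).

    baseline[i] and augmented[i] are scores obtained with the same seed i.
    Returns (mean per-seed improvement, p-value).
    """
    if len(baseline) != len(augmented) or not baseline:
        raise ValueError("need equal-length, non-empty score lists")
    # Per-seed improvement of the graph-augmented model over the baseline.
    diffs = [a - b for a, b in zip(augmented, baseline)]
    observed = abs(statistics.fmean(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis of no effect, the sign of each per-seed
    # difference is exchangeable, so enumerate all 2^n sign assignments.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [s * d for s, d in zip(signs, diffs)]
        stat = abs(statistics.fmean(flipped))
        # Small tolerance so floating-point ties count as "at least as extreme".
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return statistics.fmean(diffs), extreme / total


if __name__ == "__main__":
    # Illustrative per-seed scores only (five seeds), not results from the paper.
    base = [0.80, 0.81, 0.79, 0.80, 0.82]
    aug = [0.83, 0.84, 0.82, 0.83, 0.85]
    mean_gain, p = paired_permutation_test(base, aug)
    print(f"mean gain={mean_gain:.3f}, p={p:.4f}")
```

With five seeds, this exact test cannot produce a two-sided p-value below 2/32 = 0.0625, even when every seed improves. The example above therefore shows why the number of seeds constrains which gains can be declared significant.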
Taxonomy
Topics: Advanced Graph Neural Networks · Imbalanced Data Classification Techniques · Spam and Phishing Detection
