Multi-RF Fusion with Multi-GNN Blending for Molecular Property Prediction
Zacharie Bugaud

TL;DR
This paper introduces an ensemble method that combines Random Forests on molecular fingerprints with deep-ensembled GNNs, reaching the top of the ogbg-molhiv leaderboard without external data or pre-training.
Contribution
It presents a multi-RF fusion approach (a rank-averaged ensemble of 12 Random Forests) blended with seed-averaged GNN predictions, which narrowly outperforms the previous leader, HyperFusion, on the ogbg-molhiv benchmark.
Findings
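Achieved a test ROC-AUC of 0.8476 +/- 0.0002 on ogbg-molhiv over 10 seeds, ranking #1 on the OGB leaderboard.
Setting max_features to 0.20 instead of the default sqrt(d) improved AUC by 0.008 on this scaffold split.
Averaging GNN predictions across 10 seeds before blending removes GNN seed variance from the final score, lowering its standard deviation from 0.0008 to 0.0002.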
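To give a sense of scale for the max_features finding, the sketch below computes how many candidate features a tree considers at each split under both settings. It assumes scikit-learn's documented rule for RandomForestClassifier (the summary does not name the library used), and the helper name features_per_split is hypothetical.

```python
import math

# Width of the concatenated fingerprint vector, as stated in the abstract.
N_FEATURES = 4263


def features_per_split(max_features, n_features):
    """Candidate features per split, assuming scikit-learn's documented rule.

    "sqrt" -> max(1, int(sqrt(n_features)))
    float  -> max(1, int(max_features * n_features))
    """
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))
    raise ValueError("unsupported max_features value in this sketch")


default_k = features_per_split("sqrt", N_FEATURES)
tuned_k = features_per_split(0.20, N_FEATURES)
print(default_k, tuned_k)  # 65 852
```

Under this assumption, the 0.20 setting exposes roughly 13 times more candidate features per split than the default.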
Abstract
Multi-RF Fusion achieves a test ROC-AUC of 0.8476 +/- 0.0002 on ogbg-molhiv (10 seeds), placing #1 on the OGB leaderboard ahead of HyperFusion (0.8475 +/- 0.0003). The core of the method is a rank-averaged ensemble of 12 Random Forest models trained on concatenated molecular fingerprints (FCFP, ECFP, MACCS, atom pairs -- 4,263 dimensions total), blended with deep-ensembled GNN predictions at 12% weight. Two findings drive the result: (1) setting max_features to 0.20 instead of the default sqrt(d) gives a +0.008 AUC gain on this scaffold split, and (2) averaging GNN predictions across 10 seeds before blending with the RF eliminates GNN seed variance entirely, dropping the final standard deviation from 0.0008 to 0.0002. No external data or pre-training is used.
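The blending described above can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function names, the toy scores, and the choice to rank-normalize the seed-averaged GNN predictions before blending are all assumptions. Only the 12% GNN weight, the rank averaging of the RF models, and the seed averaging of the GNNs come from the abstract.

```python
def rank_normalize(scores):
    """Map scores to average ranks scaled to [0, 1]; tied scores share a rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of scores tied with position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    denom = max(n - 1, 1)
    return [r / denom for r in ranks]


def rank_average(model_scores):
    """Average the normalized ranks of several models, molecule by molecule."""
    ranked = [rank_normalize(s) for s in model_scores]
    n = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n)]


def blend(rf_scores, gnn_seed_scores, gnn_weight=0.12):
    """Blend rank-averaged RF scores with seed-averaged GNN scores."""
    rf = rank_average(rf_scores)
    # Average raw GNN predictions across seeds first, so every final
    # prediction shares the same GNN component.
    n = len(gnn_seed_scores[0])
    n_seeds = len(gnn_seed_scores)
    gnn_mean = [sum(s[i] for s in gnn_seed_scores) / n_seeds for i in range(n)]
    # Assumption: put the GNN scores on the same rank scale as the RF scores.
    gnn = rank_normalize(gnn_mean)
    return [(1 - gnn_weight) * a + gnn_weight * b for a, b in zip(rf, gnn)]


# Toy scores for 4 molecules: 2 RF models and 2 GNN seeds (hypothetical values).
rf_scores = [[0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.5, 0.9]]
gnn_seed_scores = [[0.3, 0.2, 0.6, 0.7], [0.1, 0.4, 0.5, 0.9]]
final = blend(rf_scores, gnn_seed_scores)
print([round(x, 2) for x in final])  # [0.0, 0.48, 0.52, 1.0]
```

Because ROC-AUC depends only on the ordering of predictions, working on ranks keeps models with differently calibrated outputs from dominating the average. In the toy example the two RF models disagree on molecules 2 and 3, and the small GNN weight breaks the tie.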
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Machine Learning in Bioinformatics
