Clever Materials: When Models Identify Good Materials for the Wrong Reasons

Kevin Maik Jablonka

arXiv:2602.17730·physics.chem-ph·February 23, 2026

Open Access

TL;DR

This paper shows that machine learning models for materials discovery can score well by exploiting bibliographic confounding (who published a result, where, and when) rather than chemistry, and argues that routine falsification tests are needed.

Contribution

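It demonstrates that models trained on standard chemical descriptors predict publication metadata (author, journal, and publication year) well above chance across five tasks, so strong benchmark performance on these datasets does not rule out non-chemical explanations.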

Findings

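01

Models trained on standard chemical descriptors predict author, journal, and publication year well above chance

02

Predicted metadata ("bibliographic fingerprints"), used as the sole input to a second model, are sometimes competitive with conventional descriptor-based predictors

03

Many current datasets do not rule out non-chemical explanations of success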

Abstract

Machine learning can accelerate materials discovery. Models perform impressively on many benchmarks. However, strong benchmark performance does not imply that a model learned chemistry. I test a concrete alternative hypothesis: that property prediction can be driven by bibliographic confounding. Across five tasks spanning MOFs (thermal and solvent stability), perovskite solar cells (efficiency), batteries (capacity), and TADF emitters (emission wavelength), models trained on standard chemical descriptors predict author, journal, and publication year well above chance. When these predicted metadata ("bibliographic fingerprints") are used as the sole input to a second model, performance is sometimes competitive with conventional descriptor-based predictors. These results show that many datasets do not rule out non-chemical explanations of success. Progress requires routine falsification…
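
The two-stage test described in the abstract can be illustrated with a minimal sketch. This is not the paper's pipeline: the data are synthetic, the "lab" label stands in for author metadata, and the nearest-centroid classifier and group-mean regressor are deliberately simple stand-ins chosen so the example needs only the Python standard library. All function names and parameters here are hypothetical.

```python
import random


def make_synthetic(n_per_lab=60, n_labs=4, n_features=5, seed=0):
    """Toy data (assumption): each hypothetical 'lab' works in its own region
    of descriptor space and reports systematically shifted property values.
    Requires n_labs <= n_features because centers are one-hot style."""
    rng = random.Random(seed)
    rows = []
    for lab in range(n_labs):
        center = [4.0 if j == lab else 0.0 for j in range(n_features)]
        offset = 10.0 * lab  # lab-specific property shift: the confounder
        for _ in range(n_per_lab):
            x = [c + rng.gauss(0, 1) for c in center]
            y = offset + rng.gauss(0, 1)  # property depends on the lab only
            rows.append((x, lab, y))
    rng.shuffle(rows)
    return rows


def fit_centroids(X, labels):
    """Stage 1 model: nearest-centroid classifier, descriptors -> lab."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return {lab: [sum(col) / len(col) for col in zip(*xs)]
            for lab, xs in groups.items()}


def predict_lab(centroids, x):
    """Return the lab whose centroid is closest to descriptor vector x."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def run_check(rows, train_frac=0.7):
    """Fit both stages on the training split and evaluate on the test split."""
    n_train = int(len(rows) * train_frac)
    X_tr, lab_tr, y_tr = zip(*rows[:n_train])
    X_te, lab_te, y_te = zip(*rows[n_train:])

    # Stage 1: predict the metadata label from chemical descriptors.
    centroids = fit_centroids(X_tr, lab_tr)
    fp_tr = [predict_lab(centroids, x) for x in X_tr]
    fp_te = [predict_lab(centroids, x) for x in X_te]
    lab_accuracy = sum(int(p == t) for p, t in zip(fp_te, lab_te)) / len(lab_te)

    # Stage 2: predict the property from the predicted label alone.
    by_fp = {}
    for fp, y in zip(fp_tr, y_tr):
        by_fp.setdefault(fp, []).append(y)
    global_mean = sum(y_tr) / len(y_tr)
    fp_means = {fp: sum(ys) / len(ys) for fp, ys in by_fp.items()}
    y_hat = [fp_means.get(fp, global_mean) for fp in fp_te]

    return {
        "lab_accuracy": lab_accuracy,
        "chance_accuracy": 1.0 / len(centroids),
        "mae_fingerprint": mae(y_te, y_hat),
        "mae_baseline": mae(y_te, [global_mean] * len(y_te)),
    }


if __name__ == "__main__":
    for name, value in run_check(make_synthetic()).items():
        print(f"{name}: {value:.3f}")
```

In this toy setting the property is driven entirely by the lab label, so a predictor that sees only the predicted label should beat the constant baseline. On real data, a similar gap would be a warning sign that a benchmark score may not reflect chemical reasoning; a real check would also compare against a conventional descriptor-based predictor, as the abstract describes.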

Taxonomy

Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · Big Data and Digital Economy