Universally Converging Representations of Matter Across Scientific Foundation Models
Sathya Edamadaka, Soojung Yang, Ju Li, Rafael Gómez-Bombarelli

TL;DR
This paper demonstrates that diverse scientific models learn highly aligned internal representations of matter, revealing a universal structure that improves understanding and transferability across different scientific domains and modalities.
Contribution
The paper systematically demonstrates representational convergence across nearly sixty models spanning multiple scientific modalities, establishing a benchmark for universality in scientific foundation models.
Findings
Models trained on similar data have highly aligned representations.
Performance improvement correlates with convergence in representation space.
Universal representations are limited by training data and inductive biases.
Abstract
Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they learn similar internal representations of matter. Understanding their latent structure is essential for building scientific foundation models that generalize reliably beyond their training domains. Although representational convergence has been observed in language and vision, its counterpart in the sciences has not been systematically explored. Here, we show that representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems. Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in…
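To make "aligned representations" concrete, here is a minimal sketch of one widely used similarity measure, linear centered kernel alignment (CKA). It is an illustration only: the abstract excerpt above does not say which alignment metric the authors use, and the function name `linear_cka` and the random arrays standing in for model embeddings are assumptions made for this example.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two embedding matrices of the same n inputs.

    X has shape (n, d1) and Y has shape (n, d2); row i of each matrix is
    the embedding of the same input (e.g. the same molecule) under two
    different models. The score lies in [0, 1] and is unchanged by
    rotations and isotropic rescaling of either embedding space.
    """
    # Center each feature so the score ignores constant offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


# Hypothetical stand-ins for real model embeddings (assumption: random data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                  # "model A" embeddings
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
Y_rotated = 3.0 * (X @ Q)                       # same geometry, rotated and rescaled
Y_unrelated = rng.normal(size=(200, 8))         # independent "model B" embeddings

print("rotated copy:", linear_cka(X, Y_rotated))
print("unrelated:   ", linear_cka(X, Y_unrelated))
```

A rotated and rescaled copy of an embedding scores essentially 1, while independent random embeddings score far lower. This is the sense in which two models with different architectures and embedding dimensions can still be called aligned.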
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Graph Neural Networks · Computational Drug Discovery Methods
