MultiPUFFIN: A Multimodal Domain-Constrained Foundation Model for Molecular Property Prediction of Small Molecules
Idelfonso B. R. Nogueira, Carine M. Rebelloa, Mumin Enis Leblebici, Erick Giovani Sperandio Nascimento

TL;DR
MultiPUFFIN is a multimodal foundation model that predicts multiple thermophysical properties of small molecules with thermodynamic consistency, using domain-informed biases and a multi-source dataset, outperforming larger models with less data.
Contribution
It introduces a novel multimodal, domain-constrained model for molecular property prediction that ensures thermodynamic consistency and handles missing data effectively.
Findings
Achieves mean R^2 of 0.716 on challenging test set.
Outperforms ChemBERTa-2 despite using 2000x less data.
Effectively predicts temperature-dependent properties.
Abstract
Predicting physicochemical properties across chemical space is vital for chemical engineering, drug discovery, and materials science. Current molecular foundation models lack thermodynamic consistency, while domain-informed approaches are limited to single properties and small datasets. We introduce MultiPUFFIN, a domain-constrained multimodal foundation model addressing both limitations simultaneously. MultiPUFFIN features: (i) an encoder fusing SMILES, graphs, and 3D geometries via gated cross-modal attention, alongside experimental condition and descriptor encoders; (ii) prediction heads embedding established correlations (e.g., Wagner, Andrade, van't Hoff, and Shomate equations) as inductive biases to ensure thermodynamic consistency; and (iii) a two-stage multi-task training strategy.Extending prior frameworks, MultiPUFFIN predicts nine thermophysical properties simultaneously. It…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Materials Science · Computational Drug Discovery Methods · Crystallography and molecular interactions
