MolSight: Molecular Property Prediction with Images
Aaditya Baranwal, Akshaj Gupta, Shruti Vyas, and Yogesh S Rawat

TL;DR
MolSight demonstrates that simple 2D molecular images processed by vision models can effectively predict molecular properties, outperforming complex methods on multiple benchmarks with lower computational cost.
Contribution
This work is the first large-scale systematic study of vision-based molecular property prediction using 2D images and introduces a chemistry-informed curriculum for improved performance.
Findings
Achieves top results on 5 out of 10 benchmarks.
Single 2D images suffice for competitive property prediction.
Outperforms multi-modal methods while using 80x fewer FLOPs.
Abstract
Every molecule ever synthesised can be drawn as a 2D skeletal diagram, yet modern property prediction has largely passed over this universally available representation in favour of molecular graphs, 3D conformers, or billion-parameter language models, each imposing its own computational and data-engineering overhead. We present MolSight, the first systematic large-scale study of vision-based Molecular Property Prediction (MPP). Using 10 vision architectures, 7 pre-training strategies, and molecule images, we evaluate performance across 10 downstream tasks spanning physical-property regression, drug-discovery classification, and quantum-chemistry prediction. To account for the wide variation in structural complexity across pre-training molecules, we further propose a chemistry-informed curriculum: five structural complexity descriptors partition the corpus…
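The curriculum idea in the abstract (ordering pre-training molecules from structurally simple to complex) can be illustrated with a minimal sketch. This excerpt does not list the paper's five descriptors, so the three functions below are hypothetical stand-ins computed crudely from SMILES strings, and the equal-sized binning is an assumption rather than the authors' partitioning scheme.

```python
from typing import List, Sequence

# Hypothetical stand-in descriptors. These are NOT the paper's five
# descriptors, which this excerpt does not name.

def atom_proxy(smiles: str) -> float:
    """Count alphabetic characters as a crude proxy for atom count."""
    return float(sum(ch.isalpha() for ch in smiles))

def ring_proxy(smiles: str) -> float:
    """Count ring-closure digits; each ring contributes two digits."""
    return sum(ch.isdigit() for ch in smiles) / 2.0

def branch_proxy(smiles: str) -> float:
    """Count opening parentheses as a proxy for branching."""
    return float(smiles.count("("))

DESCRIPTORS = [atom_proxy, ring_proxy, branch_proxy]

def complexity_score(smiles: str) -> float:
    """Sum the descriptor values (an assumed, unweighted combination)."""
    return sum(d(smiles) for d in DESCRIPTORS)

def curriculum_stages(corpus: Sequence[str], n_stages: int) -> List[List[str]]:
    """Sort molecules by score and split them into easy-to-hard bins."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ranked = sorted(corpus, key=complexity_score)
    stages: List[List[str]] = [[] for _ in range(n_stages)]
    for i, smi in enumerate(ranked):
        # Map rank i to a stage index in [0, n_stages).
        stages[i * n_stages // len(ranked)].append(smi)
    return stages

if __name__ == "__main__":
    corpus = ["c1ccccc1", "C", "CC(C)Cc1ccc(cc1)C(C)C(=O)O", "CCO"]
    for k, stage in enumerate(curriculum_stages(corpus, 2)):
        print(f"stage {k}: {stage}")
```

In a real pipeline the descriptors would more plausibly come from a cheminformatics toolkit, and each stage would be rendered to 2D images before being fed to the vision model; both steps are omitted here to keep the sketch dependency-free.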
