MolSight: Molecular Property Prediction with Images

Aaditya Baranwal, Akshaj Gupta, Shruti Vyas, and Yogesh S Rawat

arXiv:2605.10157 · cs.CV · May 12, 2026

TL;DR

MolSight demonstrates that vision models applied to simple 2D molecular images can predict molecular properties effectively, outperforming more complex methods on multiple benchmarks at lower computational cost.

Contribution

This work is the first large-scale systematic study of vision-based molecular property prediction from 2D images. It also introduces a chemistry-informed pre-training curriculum that partitions the pre-training corpus by structural complexity to improve performance.

Findings

01. Achieves top results on 5 of the 10 downstream benchmarks.

02. A single 2D image per molecule suffices for competitive property prediction.

03. Outperforms multi-modal methods while using 80x fewer FLOPs.

Abstract

Every molecule ever synthesised can be drawn as a 2D skeletal diagram, yet in modern property prediction this universally available representation has received less attention than molecular graphs, 3D conformers, or billion-parameter language models, each of which imposes its own computational and data-engineering overhead. We present MolSight, the first systematic large-scale study of vision-based Molecular Property Prediction (MPP). Using 10 vision architectures, 7 pre-training strategies, and 2M molecule images, we evaluate performance across 10 downstream tasks spanning physical-property regression, drug-discovery classification, and quantum-chemistry prediction. To account for the wide variation in structural complexity across pre-training molecules, we further propose a chemistry-informed curriculum: five structural complexity descriptors partition the corpus…
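
The abstract is cut off before it explains how the five structural complexity descriptors partition the corpus, so the following is only a hypothetical sketch of how a complexity-based curriculum split could work. Everything specific in it is an assumption for illustration rather than MolSight's published procedure: the descriptor names, the min-max normalisation, the equal-weight average into a single score, the equal-size stages, and the toy descriptor values for made-up molecule ids.

```python
"""Hypothetical complexity-based curriculum split (illustrative assumptions only).

Assumed, not taken from the paper: five precomputed per-molecule descriptors,
min-max normalisation, an equal-weight average into one complexity score, and
equal-size stages ordered from simplest to most complex.
"""
from typing import Dict, List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Scale a descriptor column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def complexity_scores(descriptors: Dict[str, Sequence[float]]) -> List[float]:
    """Average the normalised descriptor columns into one score per molecule."""
    columns = [min_max_normalise(col) for col in descriptors.values()]
    n_mols = len(columns[0])
    return [sum(col[i] for col in columns) / len(columns) for i in range(n_mols)]


def curriculum_stages(ids: Sequence[str], scores: Sequence[float],
                      n_stages: int) -> List[List[str]]:
    """Rank molecules by score (ties broken by id) and cut into contiguous stages."""
    order = sorted(range(len(ids)), key=lambda i: (scores[i], ids[i]))
    stages: List[List[str]] = []
    for s in range(n_stages):
        start = s * len(order) // n_stages
        end = (s + 1) * len(order) // n_stages
        stages.append([ids[i] for i in order[start:end]])
    return stages


# Toy data: hypothetical ids and descriptor names with made-up values.
ids = ["mol_a", "mol_b", "mol_c", "mol_d"]
descriptors = {
    "heavy_atoms": [1, 3, 6, 15],
    "rings": [0, 0, 1, 1],
    "stereocentres": [0, 0, 0, 1],
    "rotatable_bonds": [0, 0, 0, 4],
    "heteroatoms": [0, 1, 0, 2],
}

scores = complexity_scores(descriptors)
stages = curriculum_stages(ids, scores, n_stages=2)
print(stages)  # [['mol_a', 'mol_b'], ['mol_c', 'mol_d']]
```

Pre-training would then visit the stages in order, from the simplest molecules to the most complex. Equal weighting and quantile-style cuts are just the simplest choices that give a deterministic ordering; a real curriculum might weight descriptors differently or bin each descriptor separately.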
