QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for Quantum-Chemistry Property Screening and Language Model Benchmarking
Yinqi Zeng, Renjie Li

TL;DR
QuantumChem-200K is a comprehensive large-scale dataset of over 200,000 organic molecules with detailed quantum-chemical properties, enabling AI-driven screening and discovery of photoinitiators and related materials.
Contribution
The paper introduces QuantumChem-200K, the first extensive dataset with diverse quantum-chemical properties for organic molecules, and demonstrates fine-tuning of a language model for property prediction.
Findings
The fine-tuned language model outperforms baselines in property prediction.
QuantumChem-200K enables high-throughput screening of photoinitiators.
Benchmarking shows improved accuracy for TPA and ISC predictions.
Abstract
The discovery of next-generation photoinitiators for two-photon polymerization (TPP) is hindered by the absence of large, open datasets containing the quantum-chemical and photophysical properties required to model photodissociation and excited-state behavior. Existing molecular datasets typically provide only basic physicochemical descriptors and therefore cannot support data-driven screening or AI-assisted design of photoinitiators. To address this gap, we introduce QuantumChem-200K, a large-scale dataset of over 200,000 organic molecules annotated with eleven quantum-chemical properties, including two-photon absorption (TPA) cross sections, TPA spectral ranges, singlet-triplet intersystem crossing (ISC) energies, toxicity and synthetic accessibility scores, hydrophilicity, solubility, boiling point, molecular weight, and aromaticity. These values are computed using a hybrid workflow…
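As a rough illustration of the data-driven screening the abstract describes, the sketch below filters molecule records by a few of the listed properties and ranks the survivors by TPA cross section. It is a minimal sketch, not the authors' workflow: the field names (`mol_id`, `tpa_cross_section_gm`, `toxicity`, `sa_score`), the thresholds, and the toy records are all hypothetical placeholders, since the dataset's actual schema and value ranges are not given here.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    # All field names are hypothetical; the real dataset schema may differ.
    mol_id: str
    tpa_cross_section_gm: float  # TPA cross section, assumed to be in GM units
    toxicity: float              # assumed score where lower means less toxic
    sa_score: float              # assumed score where lower means easier to synthesize


def screen(molecules, min_tpa=100.0, max_toxicity=0.5, max_sa=4.0):
    """Keep molecules passing all three cutoffs, ranked by TPA cross section.

    The default cutoffs are illustrative values, not recommendations.
    """
    hits = [
        m for m in molecules
        if m.tpa_cross_section_gm >= min_tpa
        and m.toxicity <= max_toxicity
        and m.sa_score <= max_sa
    ]
    # Largest TPA cross section first.
    return sorted(hits, key=lambda m: m.tpa_cross_section_gm, reverse=True)


# Toy records with made-up values, used only to exercise the filter.
toy = [
    Molecule("MOL-A", 5.0, 0.2, 1.0),    # fails the TPA cutoff
    Molecule("MOL-B", 150.0, 0.3, 1.5),  # passes
    Molecule("MOL-C", 320.0, 0.8, 1.2),  # fails the toxicity cutoff
    Molecule("MOL-D", 210.0, 0.1, 3.9),  # passes
]

hits = screen(toy)
hit_ids = [m.mol_id for m in hits]
```

On real data the same pattern would apply to whichever columns the released files expose; the cutoffs would need to be chosen from the actual property distributions.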
Taxonomy
Topics: Photopolymerization techniques and applications · Nonlinear Optical Materials Studies · Machine Learning in Materials Science
