The PubChemQC Project: a large chemical database from the first principle calculations
Maho Nakata

TL;DR
The PubChemQC Project has built a large, publicly accessible database of over 1.53 million molecules, with optimized geometries and excited states obtained from ab initio calculations, to facilitate molecular search and discovery.
Contribution
This work introduces a comprehensive, ab initio calculated molecular database derived from PubChem data, without relying on experimental data, enabling advanced molecular analysis.
Findings
Over 1.53 million molecular entries included
Data covers optimized geometries and excited states
Database is publicly available for research use
Abstract
In this research, we have been constructing a large database of molecules by ab initio calculations. Currently, we have over 1.53 million entries with geometries optimized at the B3LYP/6-31G* level and ten excited states from TDDFT calculations with the 6-31+G* basis set. As input, we use only the InChI (International Chemical Identifier) representation of each molecule, as defined by the International Union of Pure and Applied Chemistry (IUPAC); no experimental data are referenced. The results are publicly available at http://pubchemqc.riken.jp/. The molecular data are taken from the PubChem Project (http://pubchem.ncbi.nlm.nih.gov/), one of the largest free (public domain) chemical databases in the world, listing approximately 63 million molecules. Our final goal is to use these data to develop a molecular search engine or molecular expert system to find molecules which have desired…
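The workflow the abstract describes (start from an InChI string only, optimize the geometry, then compute excited states) can be sketched as a small data model. This is a minimal illustration, not the project's actual code: the names `Stage`, `PIPELINE`, `formula_from_inchi`, and `describe` are hypothetical, the formula parser handles only single-component InChI strings, and the TDDFT functional is left unspecified because the abstract does not state it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One calculation step (hypothetical structure for illustration)."""
    task: str
    method: str
    basis: str
    n_states: int = 0  # number of excited states requested, if any


# The two stages named in the abstract; the TDDFT functional is not given there.
PIPELINE = (
    Stage("geometry optimization", "B3LYP", "6-31G*"),
    Stage("excited states", "TDDFT", "6-31+G*", n_states=10),
)


def formula_from_inchi(inchi: str) -> dict[str, int]:
    """Return element counts from the formula layer of a single-component InChI."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    layers = inchi[len("InChI="):].split("/")
    if len(layers) < 2:
        raise ValueError("InChI has no formula layer")
    formula = layers[1]
    if "." in formula:
        # Multi-component formulas (e.g. salts) are outside this sketch.
        raise ValueError("multi-component InChI not supported")
    counts: dict[str, int] = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def describe(inchi: str) -> list[str]:
    """List the planned calculations for one molecule as readable strings."""
    atoms = sum(formula_from_inchi(inchi).values())
    lines = []
    for stage in PIPELINE:
        line = f"{stage.task}: {stage.method}/{stage.basis} on {atoms} atoms"
        if stage.n_states:
            line += f", {stage.n_states} states"
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Ethanol, identified by its standard InChI only.
    for line in describe("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"):
        print(line)
```

The point of the sketch is that the InChI string is the only input; generating an initial 3D structure and running the quantum chemistry program are separate steps not shown here.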
Taxonomy
Topics: Computational Drug Discovery Methods · Various Chemistry Research Topics · Molecular spectroscopy and chirality
