The BOS-TMC Dataset: DFT Properties of 159k Experimentally Characterized Transition Metal Complexes Spanning Multiple Charge and Spin States
Aaron G. Garrison, Jacob W. Toney, Tatiana Nikolaeva, Roland G. St. Michel, Christopher J. Stein, and Heather J. Kulik

TL;DR
The BOS-TMC dataset provides DFT properties for 159k experimentally characterized mononuclear transition metal complexes in multiple charge and spin states, supporting machine learning and benchmarking.
Contribution
This work introduces a large, diverse dataset of transition metal complexes in multiple spin states, curated with an iterative charge-assignment procedure, and reports over 2.9 million properties.
Findings
The dataset is larger and more diverse than prior TMC datasets.
Properties are computed with PBE0/def2-TZVP and include multiple electronic and atomic features.
A sensitivity analysis shows that the properties vary with the choice of exchange-correlation functional.
Abstract
We present the Boston Open-Shell Transition Metal Complex (BOS-TMC) dataset, a set of density functional theory (DFT) properties for 159k experimentally characterized mononuclear transition metal complexes (TMCs) in multiple spin states with a range of formal charges derived from the Cambridge Structural Database (CSD). To curate this set, we carried out an iterative procedure to confidently assign overall TMC charge. From this information, we then obtained properties in up to three spin states, i.e., low-, intermediate-, and high-spin for 3d metals and low- and intermediate-spin for 4d and 5d metals, depending on compatibility with the metal electron configuration, for a total of 343.8k TMC/spin combinations. At odds with prior sets, we preserved experimental heavy-atom coordinates in these structures during optimization. We report all properties using PBE0/def2-TZVP single-point…
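The spin-state rule in the abstract (up to three spin states for 3d metals, up to two for 4d and 5d metals, subject to compatibility with the metal electron configuration) can be illustrated with a minimal sketch. This is not the authors' code. The function name `candidate_multiplicities`, its inputs, and the convention it uses are assumptions made for illustration: low-, intermediate-, and high-spin are taken to mean the three lowest spin multiplicities of matching parity (1/3/5 for an even d-electron count, 2/4/6 for an odd count), and a state is kept only if the d shell can hold that many unpaired electrons. The paper's actual criteria may differ.

```python
def candidate_multiplicities(d_electrons: int, metal_row: str) -> dict[str, int]:
    """Return candidate spin multiplicities (2S + 1) keyed by spin-state label.

    Illustrative assumption, not taken from the paper: states are the lowest
    multiplicities of the right parity, filtered by the number of unpaired
    electrons a d^n configuration can support.
    """
    if not 0 <= d_electrons <= 10:
        raise ValueError("d_electrons must be between 0 and 10")
    if metal_row not in {"3d", "4d", "5d"}:
        raise ValueError("metal_row must be '3d', '4d', or '5d'")

    # A d^n shell holds at most n unpaired electrons up to d5, then 10 - n.
    max_unpaired = min(d_electrons, 10 - d_electrons)

    # Lowest multiplicity: singlet for even counts, doublet for odd counts.
    lowest = 1 + d_electrons % 2

    # Per the abstract, high-spin is only considered for 3d metals.
    if metal_row == "3d":
        labels = ["low", "intermediate", "high"]
    else:
        labels = ["low", "intermediate"]

    states = {}
    for step, label in enumerate(labels):
        multiplicity = lowest + 2 * step
        # multiplicity - 1 equals the number of unpaired electrons required.
        if multiplicity - 1 <= max_unpaired:
            states[label] = multiplicity
    return states


if __name__ == "__main__":
    print(candidate_multiplicities(6, "3d"))  # {'low': 1, 'intermediate': 3, 'high': 5}
    print(candidate_multiplicities(6, "4d"))  # {'low': 1, 'intermediate': 3}
    print(candidate_multiplicities(9, "3d"))  # {'low': 2}
```

Under these assumptions a d6 3d center yields three TMC/spin combinations, while a d9 center yields one. That is consistent with the reported 343.8k combinations for 159k complexes, which averages a little over two spin states per complex.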
