The BOS-Lig Dataset: Accurate Ligand Charges from a Consensus Approach for 66,810 Experimentally Synthesized Ligands
Roland G. St. Michel, Ryan J. Jang, Aaron G. Garrison, Ilia Kevlishvili, and Heather J. Kulik

TL;DR
The BOS-Lig dataset provides accurately assigned ligand charges for over 66,000 ligands from transition metal complexes, enabling better computational screening and ligand design.
Contribution
A novel iterative charge-balancing workflow that reliably assigns ligand charges and links them to functional applications in a large, experimentally grounded dataset.
Findings
Confidently assigned charges to 66,810 ligands from 126,985 complexes.
Linked ligands to application areas like reactivity and redox chemistry.
Developed a workflow that assigns charges in homoleptic complexes first and then propagates them to heteroleptic complexes.
Abstract
Understanding ligand properties is essential for computational high-throughput screening of transition metal complexes. However, ligand properties such as net charge and other information such as their application area are often absent or inconsistently recorded in crystallographic datasets. Here, we construct a ligand dataset from 126,985 mononuclear transition metal complexes curated from the Cambridge Structural Database. Using an iterative charge-balancing workflow that combines complex charges, metal oxidation states, and consensus across crystallographic observations, we confidently assign net charges to 66,810 ligands among 94,581 identified unique ligand structures to curate the Boston Open-Shell Ligand (BOS-Lig) dataset. The workflow assigns ligand charges in homoleptic complexes first and then iteratively propagates these assignments across heteroleptic environments, allowing…
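The charge-balancing idea in the abstract can be made concrete with a small sketch. It relies only on the relation complex charge = metal oxidation state + sum of ligand charges. It is not the authors' BOS-Lig implementation. The record format, the function name `assign_ligand_charges`, the `min_agreement` majority-vote threshold used as a stand-in for "consensus across crystallographic observations", and the toy complexes are all hypothetical. On the first pass only homoleptic complexes have a single unknown ligand type, so they are solved first. Later passes reuse those charges to solve heteroleptic complexes in which exactly one ligand type is still unknown.

```python
from collections import Counter, defaultdict

# Hypothetical record: (complex_charge, metal_oxidation_state, [ligand_id, ...]).
# Charge balance: complex_charge = metal_oxidation_state + sum(ligand charges).
Complex = tuple[int, int, list[str]]


def assign_ligand_charges(complexes: list[Complex],
                          min_agreement: float = 1.0,
                          max_iter: int = 20) -> dict[str, int]:
    """Hypothetical sketch: iterative charge balancing with a consensus vote."""
    assigned: dict[str, int] = {}
    for _ in range(max_iter):
        votes: dict[str, list[int]] = defaultdict(list)
        for complex_charge, ox_state, ligands in complexes:
            unknown = [lig for lig in ligands if lig not in assigned]
            # Skip if nothing is left to solve, or if more than one ligand
            # type is unknown (the balance equation is underdetermined).
            if not unknown or len(set(unknown)) != 1:
                continue
            known = sum(assigned[lig] for lig in ligands if lig in assigned)
            remainder = complex_charge - ox_state - known
            # Only accept integer charges split evenly over identical ligands.
            if remainder % len(unknown) == 0:
                votes[unknown[0]].append(remainder // len(unknown))
        newly_assigned = {}
        for lig, charges in votes.items():
            charge, count = Counter(charges).most_common(1)[0]
            # Consensus: the most common value must reach the agreement threshold.
            if count / len(charges) >= min_agreement:
                newly_assigned[lig] = charge
        if not newly_assigned:
            break  # converged: no further ligand could be assigned
        assigned.update(newly_assigned)
    return assigned


# Toy, hand-written examples (not taken from the CSD).
complexes = [
    (-4, 2, ["CN"] * 6),                     # [Fe(CN)6]4-, Fe(II)
    (-3, 3, ["CN"] * 6),                     # [Fe(CN)6]3-, Fe(III)
    (2, 2, ["H2O"] * 6),                     # [Fe(H2O)6]2+, Fe(II)
    (0, 2, ["Cl", "Cl"] + ["H2O"] * 4),      # FeCl2(H2O)4, Fe(II)
    (-1, 3, ["CN"] * 4 + ["bpy"]),           # [Fe(CN)4(bpy)]-, Fe(III)
]
print(assign_ligand_charges(complexes))
# → {'CN': -1, 'H2O': 0, 'Cl': -1, 'bpy': 0}
```

Here CN and H2O are fixed from the homoleptic entries in the first pass, and Cl and bpy are recovered from the heteroleptic entries in the second. Requiring full agreement by default is a conservative choice: a ligand whose observations disagree (for example, because of a mislabeled oxidation state) is left unassigned rather than guessed.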
