SimXRD-4M: Big Simulated X-ray Diffraction Data Accelerate the Crystal Symmetry Classification
Bin Cao, Yang Liu, Zinan Zheng, Ruifeng Tan, Jia Li, Tong-yi Zhang

TL;DR
This paper introduces SimXRD, the largest open-source simulated X-ray diffraction (XRD) dataset to date, to support machine learning models for crystal symmetry classification. It addresses the scarcity of training data and shows that crystal symmetry follows a long-tailed distribution.
Contribution
The creation of SimXRD, a large-scale, open-source simulated XRD dataset, and a benchmark of 21 sequence learning models trained on it for crystal symmetry classification.
Findings
Crystal symmetry follows a long-tailed distribution.
Existing neural networks struggle with rare crystal classes.
SimXRD enables improved ML model training for crystallography.
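The first finding says that crystal symmetry is long-tailed: a few symmetry classes contain most structures, and many classes are rare. The sketch below shows one simple way to quantify that skew for any list of class labels. It reports the ratio between the largest and smallest class counts and the share of samples held by the most frequent classes. The function name `long_tail_summary`, the 20% head cutoff, and the toy labels are hypothetical illustrations chosen here. They are not SimXRD's data or the authors' analysis.

```python
from collections import Counter


def long_tail_summary(labels, head_fraction=0.2):
    """Summarize how concentrated a class-label distribution is.

    labels: iterable of class labels (for example, space-group numbers 1-230).
    head_fraction: share of the most frequent classes treated as the "head"
        (an illustrative assumption, not a value from the paper).
    Returns (imbalance_ratio, head_share):
        imbalance_ratio = count of most frequent class / count of least frequent class
        head_share = fraction of all samples that fall in the head classes
    """
    # Class counts sorted from most to least frequent.
    counts = sorted(Counter(labels).values(), reverse=True)
    # Always keep at least one class in the head.
    n_head = max(1, int(len(counts) * head_fraction))
    imbalance_ratio = counts[0] / counts[-1]
    head_share = sum(counts[:n_head]) / sum(counts)
    return imbalance_ratio, head_share


# Hypothetical toy labels (space-group numbers), not taken from SimXRD.
toy_labels = [14] * 50 + [2] * 30 + [62] * 10 + [225] * 5 + [1] * 3 + [230] * 2
print(long_tail_summary(toy_labels))  # (25.0, 0.5)
```

In this toy example, the single most frequent class holds half of all samples and is 25 times larger than the rarest class. A classifier trained on such data sees very few examples of the rare classes, which matches the second finding.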
Abstract
Spectroscopic data, particularly diffraction data, contain detailed crystal and microstructure information and are thus crucial for materials discovery. Powder X-ray diffraction (XRD) patterns are highly effective for identifying crystals. Although machine learning (ML) has significantly advanced the analysis of powder XRD patterns, progress is hindered by a lack of training data. To address this, we introduce SimXRD, the largest open-source simulated XRD pattern dataset to date, to accelerate the development of crystallographic informatics. SimXRD comprises 4,065,346 simulated powder XRD patterns, representing 119,569 distinct crystal structures under 33 simulated conditions that mimic real-world variations. We find that crystal symmetry inherently follows a long-tailed distribution, and we evaluate 21 sequence learning models on SimXRD. The results indicate that…
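To show what a "simulated powder XRD pattern" is, here is a deliberately simplified sketch. It converts interplanar spacings to peak positions with first-order Bragg's law (λ = 2d sin θ) and sums Gaussian peaks on a uniform 2θ grid. This is not SimXRD's simulation pipeline; the abstract does not describe that pipeline or its 33 conditions. The following are all assumptions made here for illustration:

- the Cu Kα1 wavelength (1.5406 Å);
- the cubic d-spacing formula used as the example lattice;
- the Gaussian peak shape and fixed width;
- the 2θ range and step;
- the helper names.

```python
import math

CU_KALPHA1 = 1.5406  # X-ray wavelength in angstroms (Cu K-alpha1); assumed source


def cubic_d(a, h, k, l):
    """Interplanar spacing of plane (hkl) in a cubic cell with lattice constant a (angstroms)."""
    return a / math.sqrt(h * h + k * k + l * l)


def two_theta(d, wavelength=CU_KALPHA1):
    """First-order Bragg's law, lambda = 2 d sin(theta); returns 2-theta in degrees.

    Returns None when the reflection cannot be reached at this wavelength.
    """
    s = wavelength / (2.0 * d)
    if s > 1.0:
        return None
    return 2.0 * math.degrees(math.asin(s))


def simulate_pattern(peaks, start=10.0, stop=90.0, step=0.02, fwhm=0.1):
    """Sum Gaussian peaks on a uniform 2-theta grid (degrees).

    peaks: list of (two_theta_deg, intensity) pairs.
    fwhm: full width at half maximum of every peak (a fixed, assumed value).
    Returns (grid, profile) as two lists of equal length.
    """
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> Gaussian sigma
    n = int(round((stop - start) / step)) + 1
    grid = [start + i * step for i in range(n)]
    profile = [
        sum(inten * math.exp(-0.5 * ((x - c) / sigma) ** 2) for c, inten in peaks)
        for x in grid
    ]
    return grid, profile


# Example: the (111) reflection of silicon (cubic, a = 5.431 angstroms).
pos = two_theta(cubic_d(5.431, 1, 1, 1))
print(round(pos, 2))  # 28.44
grid, profile = simulate_pattern([(pos, 100.0)])
```

Real simulators also model structure-factor intensities and instrument and sample effects. Those are omitted here to keep the Bragg-peak idea visible.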
Taxonomy
Topics: X-ray Diffraction in Crystallography · Machine Learning in Materials Science
