# Decoding the unseen: unsupervised anomaly detection in metal–organic frameworks for discovery beyond the norm

**Authors:** Hosein Alimardani, Shayan Abaei, Mehrdad Asgari

PMC · DOI: 10.1039/d5sc06431g · Chemical Science · 2026-02-24

## TL;DR

CHEM-AD is a machine learning tool that finds unusual metal–organic frameworks (MOFs) to expand material design and improve dataset reliability.

## Contribution

CHEM-AD introduces a label-free, CPU-efficient autoencoder-based pipeline for detecting anomalous MOFs using engineered descriptors.
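The core scoring idea is to train an autoencoder on unlabeled descriptors and treat each entry's reconstruction error as its anomaly score. It can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The one-hidden-layer tanh encoder, the linear decoder, full-batch gradient descent, the synthetic data, and the function names `train_autoencoder` and `reconstruction_scores` are all hypothetical choices. The paper's model is a larger symmetric network (∼1.8 × 10⁵ trainable parameters) over 81 descriptors.

```python
import numpy as np


def train_autoencoder(X, hidden=3, epochs=5000, lr=0.002, seed=0):
    """Fit a hypothetical one-hidden-layer autoencoder X -> tanh(h) -> X_hat.

    Plain full-batch gradient descent on the per-row mean of the summed
    squared reconstruction error; an illustrative stand-in for the paper's
    larger symmetric autoencoder.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder: compress to `hidden` units
        X_hat = H @ W2 + b2                 # linear decoder back to d features
        G = 2.0 * (X_hat - X) / n           # gradient of the loss w.r.t. X_hat
        gW2, gb2 = H.T @ G, G.sum(axis=0)
        GH = (G @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        gW1, gb1 = X.T @ GH, GH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


def reconstruction_scores(X, params):
    """Per-row anomaly score: mean squared reconstruction error."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((X - X_hat) ** 2).mean(axis=1)


# Synthetic stand-in for descriptor data: 500 rows near a 3-D subspace of a
# 10-D feature space, with 5 hypothetical off-subspace rows injected first.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))
X += 0.05 * rng.normal(size=X.shape)
X[:5] = rng.normal(0.0, 6.0, size=(5, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each descriptor

params = train_autoencoder(X)
scores = reconstruction_scores(X, params)
top10 = np.argsort(scores)[::-1][:10]       # highest-error rows first
```

No labels are used. The injected rows rank highest only because they cannot be reconstructed from the low-dimensional structure shared by the majority of rows.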

## Key findings

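- CHEM-AD identifies 488 anomalous MOFs (∼1.87%) among 26,025 MOFxDB entries, featuring distinctive topologies, extreme pore metrics, and unusual densities.
- Anomalies are driven primarily by connectivity features (e.g., edge/node counts, degree dispersion) and show multivariate deviation: they occupy peripheral clusters in PCA embeddings and lie at large Mahalanobis distances from normal MOFs.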
- The pipeline categorizes anomalies into plausible candidates, chemically resolvable issues, and structural artifacts.
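The multivariate checks in these findings can be reproduced on any descriptor matrix: anomalies sit in peripheral PCA clusters and lie far from normal MOFs in Mahalanobis distance. A minimal NumPy sketch follows. The function names, the small ridge term added to the covariance, and the toy reference set are assumptions for illustration, not details from the paper.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto their first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def mahalanobis_to_reference(X_query, X_ref, ridge=1e-6):
    """Mahalanobis distance of each query row from the reference distribution.

    X_ref plays the role of the 'normal' set; `ridge` is a hypothetical small
    diagonal term that keeps the covariance invertible when descriptors are
    collinear.
    """
    mu = X_ref.mean(axis=0)
    cov = np.cov(X_ref, rowvar=False) + ridge * np.eye(X_ref.shape[1])
    inv_cov = np.linalg.inv(cov)
    diff = np.atleast_2d(X_query) - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


# Toy reference set: zero mean, twice the spread along the second axis.
X_ref = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
queries = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
d = mahalanobis_to_reference(queries, X_ref)
P = pca_2d(X_ref)
```

The first two queries sit at Euclidean distances 1 and 2 from the mean but receive the same Mahalanobis distance, because the covariance accounts for the larger spread along the second axis. This is why Mahalanobis distance, rather than plain distance, is a natural test for multivariate deviation.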

## Abstract

The discovery of chemically novel or structurally anomalous metal–organic frameworks (MOFs) is essential for expanding reticular design space and enhancing dataset reliability. We present CHEM-AD (Chemically Unusual Metal–organic Frameworks via Autoencoder-based Detection), a label-free, CPU-efficient pipeline that detects anomalous MOFs using 81 engineered descriptors (32 geometric/chemical/topological scalars plus a 49-dimensional metal-composition encoding). A compact symmetric autoencoder (∼1.8 × 10⁵ trainable parameters) learns the latent distribution of typical MOFs and assigns anomaly scores based on reconstruction error. Applied to 26 025 entries from MOFxDB, CHEM-AD identifies 488 outliers (∼1.87%) featuring distinctive topologies, unusual pore metrics (PLD: 2.56–29.48 Å; LCD: 4.89–63.59 Å), and extreme densities (0.057–4.27 g cm−3). These anomalies consistently occupy peripheral clusters in PCA embeddings and exhibit substantial Mahalanobis distances from normal MOFs, indicating multivariate deviation. Feature attribution reveals connectivity (e.g., edge/node counts, degree dispersion) as the primary driver of anomalies, followed by window-limited geometry and linker–metal composition. We categorize results into three groups: (A) topologically unusual yet plausible candidates, (B) anomalies with chemically resolvable issues, and (C) likely structural artifacts. The full pipeline executes in under six minutes on standard CPUs and does not require 3D structure fitting or graph parsing. CHEM-AD generalizes to other porous materials, providing a scalable framework for discovery, database curation, and robust preprocessing in materials informatics.
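The abstract reports 488 flagged entries out of 26 025 (about 1.87%) and attributes the anomalies mainly to connectivity features. The sketch below shows one simple way to turn reconstruction errors into such a flag set and a per-feature ranking. It assumes two things that the abstract does not describe: a quantile threshold set to that reported rate, and attribution by each feature's share of the flagged squared error. The feature names are hypothetical placeholders.

```python
import numpy as np


def flag_and_attribute(X, X_hat, feature_names, fraction=488 / 26025):
    """Flag the top `fraction` of rows by reconstruction error and rank features.

    Assumptions (hypothetical, not from the paper): the anomaly score is the
    per-row mean squared error, the cutoff is a quantile chosen to match the
    reported outlier rate, and a feature's importance is its share of the
    total squared error over flagged rows.
    """
    sq_err = (X - X_hat) ** 2                       # per-entry squared error
    scores = sq_err.mean(axis=1)                    # per-row anomaly score
    cutoff = np.quantile(scores, 1.0 - fraction)
    flagged = scores >= cutoff
    share = sq_err[flagged].sum(axis=0)
    share = share / share.sum()                     # fractions summing to 1
    order = np.argsort(share)[::-1]                 # largest contributor first
    ranking = [(feature_names[i], float(share[i])) for i in order]
    return flagged, ranking


# Synthetic check: small reconstruction noise everywhere, plus 10 rows whose
# (hypothetical) "edge_count" descriptor is badly reconstructed.
names = ["node_count", "edge_count", "degree_std", "pld", "lcd", "density"]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, len(names)))
X_hat = X + 0.1 * rng.normal(size=X.shape)
X_hat[:10, 1] += 5.0
flagged, ranking = flag_and_attribute(X, X_hat, names)
```

With 1000 rows this cutoff flags about 19 of them. The 10 rows with a poorly reconstructed `edge_count` dominate the flagged error, so that feature tops the ranking.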

Discover chemically novel MOFs using CHEM-AD, an unsupervised machine learning pipeline. We leverage autoencoders for structural anomaly detection to expand reticular design and enhance material dataset reliability.

## Full-text entities

- **Chemicals:** Metal-organic Frameworks (MESH:D000073396), metal (MESH:D008670)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12952661/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12952661/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12952661/full.md

---
Source: https://tomesphere.com/paper/PMC12952661