# Anomaly-Detection-Driven Screening of Thermodynamic Stability from Composition Descriptors Alone

**Authors:** Keisuke Makino, Yudai Yamaguchi, Naoto Tanibata, Hayami Takeda, Ryo Kobayashi, Masayuki Karasuyama, Masanobu Nakayama

PMC · DOI: 10.1021/acs.jpclett.5c03772 · The Journal of Physical Chemistry Letters · 2026-02-10

## TL;DR

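This paper trains an autoencoder on stable and nearly stable phases from the Materials Project and uses its reconstruction error as an anomaly score, so that thermodynamic stability can be screened from composition-only descriptors without structural data.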

## Contribution

The novel contribution is an autoencoder-based anomaly detector that uses composition-only descriptors to assess thermodynamic stability.
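
As a rough illustration of how a reconstruction error can serve as an anomaly score, the sketch below fits a linear autoencoder (equivalent to PCA, computed with an SVD) on toy "stable" descriptor vectors and scores unseen vectors by their per-sample RMSE. This is a stand-in, not the paper's network: the architecture, the descriptor set, and the training details are not given in this summary, and the function names, dimensions, and synthetic data here are all assumptions.

```python
import numpy as np

def fit_linear_autoencoder(X_train, n_latent):
    """Fit a linear autoencoder (equivalent to PCA) on training descriptors."""
    mean = X_train.mean(axis=0)
    # The leading right singular vectors span the best rank-k subspace.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_latent]

def reconstruction_rmse(X, mean, components):
    """Per-sample RMSE between each descriptor vector and its reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sqrt(np.mean((centered - reconstructed) ** 2, axis=1))

rng = np.random.default_rng(0)
# Toy "stable" data (assumption): 8-dim descriptors near a 2-dim subspace.
basis = rng.normal(size=(2, 8))
X_stable = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))
# Toy "anomalous" data (assumption): descriptors without that structure.
X_anomalous = rng.normal(size=(100, 8))

mean, components = fit_linear_autoencoder(X_stable, n_latent=2)
rmse_stable = reconstruction_rmse(X_stable, mean, components)
rmse_anomalous = reconstruction_rmse(X_anomalous, mean, components)
print(rmse_stable.mean(), rmse_anomalous.mean())
```

Because the model only ever sees stable-like inputs, vectors that do not follow the learned structure reconstruct poorly and receive a higher score, which is the general idea behind using RMSE as the anomaly score.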

## Key findings

- The reconstruction error (RMSE), used as the anomaly score, increases systematically with energy above hull, i.e., with apparent thermal destabilization.
- For 50,000 dummy oxides with an intentionally perturbed charge balance, the RMSE increases in proportion to the magnitude of the total charge imbalance, so departures from charge neutrality are captured without explicit charge information.
- Element pairs with smaller spdf products tend to have a lower median RMSE, though Ta-containing pairs consistently deviate downward from this trend (lower RMSE).
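
To make the charge-imbalance finding above concrete, here is a minimal sketch of one way to compute the magnitude of the total charge imbalance for a composition. The summary does not state how the paper perturbed the dummy oxides or which oxidation states it assumed, so the `total_charge` helper, the oxidation-state table, and the example formulas are illustrative assumptions.

```python
def total_charge(composition, oxidation_states):
    """Sum of (amount x oxidation state) over all elements in a formula unit."""
    return sum(amount * oxidation_states[element]
               for element, amount in composition.items())

# Assumed oxidation states for illustration only.
oxidation_states = {"Li": 1, "Ti": 4, "O": -2}

balanced = {"Li": 2, "Ti": 1, "O": 3}   # Li2TiO3: 2(+1) + 4 + 3(-2) = 0
perturbed = {"Li": 2, "Ti": 1, "O": 2}  # hypothetical variant with one O removed

imbalance_balanced = abs(total_charge(balanced, oxidation_states))
imbalance_perturbed = abs(total_charge(perturbed, oxidation_states))
print(imbalance_balanced, imbalance_perturbed)
```

Note that the detector itself receives only composition descriptors; a quantity like this would be computed separately and compared against the RMSE.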

## Abstract

Materials informatics tends to rely on existing structural database
searches that constrain exploration by omitting unregistered compositions.
In this study, an autoencoder-based anomaly detector was developed
using composition-only descriptors as input features. The model was
trained on thermodynamically stable phases (defined as those
on the convex hull, with an energy above hull, $\Delta E_{\mathrm{hull}}$, of 0 eV/atom) as well as
nearly stable phases with $\Delta E_{\mathrm{hull}} < 0.01$ eV/atom, sourced from the Materials Project inorganic
database. The reconstruction error (RMSE) was used as the anomaly
score. It was shown that the RMSE increased systematically with apparent
thermal destabilization, that is, increasing energy above hull.
It was also shown that for 50,000 dummy oxides with an intentionally
perturbed charge balance, the RMSE increased in proportion to the
magnitude of the total charge imbalance, indicating that departures
from charge neutrality could be captured even without explicit charge
information. Feature-importance analysis suggested that element pairs
(two-element combinations) were the principal factors governing the
reconstruction RMSE. Accordingly, for 50,000 charge-compensated virtual
oxides, we ordinally encoded the valence-shell type as s = 1, p =
2, d = 3, and f = 4 and defined, for each element pair, a coarse indicator
given by the product of the two codes (hereafter, the spdf product).
For each pair, we evaluated the correspondence between the median
RMSE (taken over all compositions containing that pair) and its spdf
product and obtained an approximately monotonic relationship: that
is, pairs with smaller spdf products tended to have a lower median
RMSE, whereas those with larger spdf products tended to have higher
values. By contrast, pairs containing Ta consistently deviated downward
from this relationship (i.e., exhibited a lower RMSE), suggesting
that not only the spdf product but other descriptor information could
also influence the assessment of synthesizability.
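
The spdf product and the per-pair median RMSE described in the abstract can be sketched as follows. The encoding s = 1, p = 2, d = 3, f = 4 comes from the abstract; everything else is an assumption made for illustration: the small `VALENCE_SHELL` lookup (here taken to be each element's periodic-table block), the choice to enumerate every two-element combination in a composition, and the RMSE values, which are made up.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median

SHELL_CODE = {"s": 1, "p": 2, "d": 3, "f": 4}  # ordinal encoding from the abstract
# Assumed valence-shell type per element (periodic-table block), illustration only.
VALENCE_SHELL = {"Li": "s", "O": "p", "Fe": "d", "Ta": "d", "Ce": "f"}

def spdf_product(a, b):
    """Product of the two elements' valence-shell codes."""
    return SHELL_CODE[VALENCE_SHELL[a]] * SHELL_CODE[VALENCE_SHELL[b]]

def median_rmse_by_pair(records):
    """Median RMSE over all compositions that contain each element pair.

    records: iterable of (set_of_elements, rmse) tuples.
    """
    by_pair = defaultdict(list)
    for elements, rmse in records:
        for pair in combinations(sorted(elements), 2):
            by_pair[pair].append(rmse)
    return {pair: median(values) for pair, values in by_pair.items()}

# Made-up RMSE values, not results from the paper.
records = [
    ({"Li", "Fe", "O"}, 0.10),
    ({"Li", "Ce", "O"}, 0.30),
    ({"Li", "O"}, 0.05),
]
medians = median_rmse_by_pair(records)
for pair, value in sorted(medians.items()):
    print(pair, spdf_product(*pair), value)
```

Plotting each pair's median RMSE against its spdf product is then enough to look for the approximately monotonic trend, and for outliers such as the Ta-containing pairs the abstract mentions.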

## Full-text entities

- **Chemicals:** oxides (MESH:D010087)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12927008/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12927008/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12927008/full.md

---
Source: https://tomesphere.com/paper/PMC12927008