# Enhancing Reverse Design Ability of Functional Materials Based on Data Quality Management: Taking Biomedical Zinc Alloy as an Example

**Authors:** Xujie Gong, Xue Jiang, Shiyu Huang, Yize Wang, Lishen Ding, Yanjing Su, Yu Yan

PMC · DOI: 10.3390/ma18204729 · Materials · 2025-10-15

## TL;DR

This paper introduces a data quality management strategy for the data-driven design of biodegradable zinc alloys for biomedical use. An alloy designed from the optimized dataset reached a tensile strength of 482 MPa.

## Contribution

A novel data quality management strategy using recursive screening to address redundancy, outliers, and inconsistencies in multi-source materials data.
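The summary does not spell out the screening criteria, so the following is only a minimal sketch of what "recursive" screening can mean. Screens are re-applied until a full round removes nothing, and statistics are recomputed on the surviving data each round. The sigma-clipping screen, the `(sample_id, target)` record layout, and the names `sigma_clip_screen` and `recursive_screen` are illustrative assumptions, not the authors' method.

```python
import statistics
from typing import Callable, List, Tuple

# Hypothetical record layout: (sample_id, target), e.g. target = tensile strength in MPa.
Record = Tuple[str, float]
Screen = Callable[[List[Record]], List[Record]]


def sigma_clip_screen(records: List[Record], k: float = 2.0) -> List[Record]:
    """One screening pass (a stand-in criterion): drop records whose target lies
    more than k population standard deviations from the mean of the current records."""
    if len(records) < 3:
        return list(records)
    values = [t for _, t in records]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return list(records)
    return [r for r in records if abs(r[1] - mu) <= k * sigma]


def recursive_screen(records: List[Record], screens: List[Screen],
                     max_rounds: int = 100) -> List[Record]:
    """Re-apply all screens until a full round removes nothing (a fixed point).

    Because statistics are recomputed on the survivors, a sample that was masked
    by a gross outlier in round 1 can still be caught in a later round.
    Assumes every screen only removes records, so an unchanged size means no change.
    """
    current = list(records)
    for _ in range(max_rounds):
        size_before = len(current)
        for screen in screens:
            current = screen(current)
        if len(current) == size_before:
            break
    return current


if __name__ == "__main__":
    base = [198, 199, 200, 201, 202] * 2          # made-up target values
    data = [(f"s{i}", float(v)) for i, v in enumerate(base + [230, 600])]
    once = sigma_clip_screen(data)                  # single pass removes only 600
    final = recursive_screen(data, [sigma_clip_screen])  # second round also removes 230
    print(len(once), len(final))  # 11 10
```

A single pass leaves the value 230 in place because the extreme value 600 inflates the standard deviation. Recomputing after removal exposes it. This is the practical difference between one-shot and recursive screening.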

## Key findings

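- The alloy Zn-1.2Al-0.8Mg-0.45Li-0.3Mn (at%), designed from the dataset optimized for inconsistent target data (ID), reached a tensile strength of 482 MPa, near state-of-the-art performance.
- In inverse design, predictive accuracy in high-performance regions mattered more than data volume or density, underscoring the value of data quality management.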
- Six multicomponent zinc alloys were successfully designed and fabricated using the proposed strategies.
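The second finding contrasts overall accuracy with accuracy in the high-performance region. The summary does not name the authors' metric. Below is a minimal sketch that assumes RMSE and a hypothetical 400 MPa cutoff, with made-up predictions from two hypothetical models. It shows how a model can win on overall error yet lose where inverse design actually searches.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error over paired samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def high_region_rmse(y_true: Sequence[float], y_pred: Sequence[float],
                     threshold: float) -> float:
    """RMSE restricted to samples whose measured target is >= threshold."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t >= threshold]
    if not pairs:
        raise ValueError("no samples at or above the threshold")
    ts, ps = zip(*pairs)
    return rmse(ts, ps)


if __name__ == "__main__":
    # Made-up tensile strengths (MPa): many ordinary alloys, two strong ones.
    y_true = [150, 160, 170, 180, 190, 200, 210, 220, 430, 450]
    # Model A: exact on ordinary alloys, 40 MPa low on the strong ones.
    model_a = [150, 160, 170, 180, 190, 200, 210, 220, 390, 410]
    # Model B: 25 MPa high on ordinary alloys, 5 MPa off on the strong ones.
    model_b = [175, 185, 195, 205, 215, 225, 235, 245, 425, 445]
    for name, pred in (("A", model_a), ("B", model_b)):
        print(name, round(rmse(y_true, pred), 1), high_region_rmse(y_true, pred, 400))
```

Model A has the lower overall RMSE (about 17.9 versus 22.5 MPa), but model B is far more accurate above the cutoff (5 versus 40 MPa). For a search aimed at the strongest alloys, B is the more useful model.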

## Abstract

Biodegradable zinc alloys have shown great potential in the biomedical field but are limited by their poor mechanical properties. Alloying is essential for improving these properties, yet designing multicomponent zinc alloys remains challenging because of complex elemental interactions. Data-driven active learning approaches offer new strategies for zinc alloy design, but data quality issues in multi-source heterogeneous data, such as redundancy, outliers, and inconsistencies, hinder modeling accuracy and interpretability. In this work, we propose a data quality management strategy based on recursive screening that targets three key data problems: redundant data (RD), outlier data (OD), and inconsistent target data (ID). Case studies on hydrogen embrittlement, phase-change refrigeration materials, and matbench_expt_gap datasets showed that RD optimized the data distribution but risked precision loss in high-performance regions; OD enhanced minority alloy features but risked overfitting; and ID preserved high-performance data, boosting extrapolation but risking underfitting. Six multicomponent zinc alloys were designed and fabricated using these strategies. The highest tensile strength, 482 MPa and near state-of-the-art performance, was obtained in the alloy Zn-1.2Al-0.8Mg-0.45Li-0.3Mn (at%), designed via the ID-optimized dataset. The study revealed that in inverse design, predictive accuracy in high-performance regions outweighs data volume or density, underscoring the value of data quality management for multi-source materials development.
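The abstract reports that RD handling risked precision loss in high-performance regions, while ID handling preserved high-performance data. Below is a minimal sketch of one way that contrast can arise when the same alloy is reported by several sources with different strengths. It assumes, without confirmation from the summary, that an RD-style merge averages repeated compositions and an ID-style merge keeps the highest reported value. The record layout, the function names (`canonical`, `merge_mean`, `merge_max`), and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record: (composition, tensile_strength_mpa, source),
# where composition is a sorted tuple of (element, at%) pairs.
Composition = Tuple[Tuple[str, float], ...]
Record = Tuple[Composition, float, str]


def canonical(comp: Dict[str, float], ndigits: int = 2) -> Composition:
    """Round and sort so the same alloy reported by two sources maps to one key."""
    return tuple(sorted((el, round(x, ndigits)) for el, x in comp.items()))


def group_by_composition(records: List[Record]) -> Dict[Composition, List[float]]:
    """Collect every reported target value for each distinct composition."""
    groups: Dict[Composition, List[float]] = defaultdict(list)
    for comp, target, _source in records:
        groups[comp].append(target)
    return dict(groups)


def merge_mean(records: List[Record]) -> Dict[Composition, float]:
    """Assumed RD-style policy: collapse repeats to their mean (smooths, lowers peaks)."""
    return {c: mean(ts) for c, ts in group_by_composition(records).items()}


def merge_max(records: List[Record]) -> Dict[Composition, float]:
    """Assumed ID-style policy: keep the best reported value (preserves peaks)."""
    return {c: max(ts) for c, ts in group_by_composition(records).items()}


if __name__ == "__main__":
    zn_mg = canonical({"Zn": 99.0, "Mg": 1.0})
    zn_mg_mn = canonical({"Zn": 98.5, "Mg": 1.0, "Mn": 0.5})
    records = [
        (zn_mg, 300.0, "source A"),   # made-up values
        (zn_mg, 360.0, "source B"),
        (zn_mg_mn, 250.0, "source C"),
    ]
    print(merge_mean(records)[zn_mg], merge_max(records)[zn_mg])  # 330.0 360.0
```

Averaging gives a cleaner, more evenly weighted dataset but pulls the strongest reported result down. Keeping the maximum retains it, at the cost of trusting a single, possibly optimistic, measurement.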

## Full-text entities

- **Chemicals:** hydrogen (MESH:D006859), Zn (MESH:D015032), Zinc Alloy (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12565697/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12565697/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12565697/full.md

---
Source: https://tomesphere.com/paper/PMC12565697