# Taxonomical modeling and classification in space hardware failure reporting

**Authors:** Daniel Palacios, Terry R. Hill

PMC · DOI: 10.1038/s41598-026-36813-7 · 2026-01-21

## TL;DR

This paper presents an automated taxonomical model, built by combining LDA topic modeling with BERT, for classifying and analyzing space hardware failure reports that are too numerous to inspect manually.

## Contribution

The paper combines Latent Dirichlet Allocation (LDA) with Bidirectional Encoder Representations from Transformers (BERT) to build an automated taxonomical model for space hardware failure reports.

## Key findings

- Combining LDA and BERT yields an automated taxonomy that supports classification of, and knowledge extraction from, failure reports.
- The model helps identify trends and correlations in large sets of unstructured text data.
- The study also discusses the limitations and outcomes of alternative approaches: causal relationship rule mining models, commercial tools, and deep neural networks.

## Abstract

NASA Johnson Space Center has collected more than 54,000 space hardware failure reports. Obtaining engineering process trends or performing root cause analysis by manual inspection is impractical. Fortunately, novel data science tools in Machine Learning and Natural Language Processing (NLP) can be used to perform text mining and knowledge extraction. In NLP, the use of taxonomies (classification trees) is key to structuring text data, extracting knowledge and important concepts from documents, and facilitating the identification of correlations and trends within the data set. Usually, these taxonomies and text structures live in the heads of experts in their specific field. However, when an expert is not available, when taxonomies and ontologies are not found in databases, or when the field of study is too broad, an automated approach can provide structure to the text content of a record set. This paper presents an automated taxonomical model that combines Latent Dirichlet Allocation (LDA) algorithms and Bidirectional Encoder Representations from Transformers (BERT). The limitations and outcomes of causal relationship rule mining models, commercial tools, and deep neural networks are also discussed.
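
To make the LDA half of this idea concrete, the following is a minimal sketch, not the authors' implementation. It fits LDA by collapsed Gibbs sampling in plain Python and then groups reports under their dominant topic to form a one-level taxonomy. The report texts, the function names `fit_lda` and `build_taxonomy`, and all hyperparameters (`n_topics`, `alpha`, `beta`, `n_iter`) are assumptions made for illustration. The BERT stage is omitted because it requires a pretrained model, and the summary above does not say how the paper couples the two models.

```python
import random

def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic, topic_word, vocab), where doc_topic[d][t] and
    topic_word[t][w] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(n_topics)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(assignments)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wid] + beta)
                    / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1

    doc_topic = [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[t][w] + beta) / (nk[t] + n_words * beta)
         for w in range(n_words)]
        for t in range(n_topics)
    ]
    return doc_topic, topic_word, vocab

def build_taxonomy(doc_topic, topic_word, vocab, n_label_words=3):
    """Group document indices under a label made of each topic's top words."""
    taxonomy = {}
    for t, dist in enumerate(topic_word):
        top = sorted(range(len(vocab)), key=lambda w: dist[w], reverse=True)
        label = f"topic_{t}: " + "/".join(vocab[w] for w in top[:n_label_words])
        taxonomy[label] = [
            d for d, theta in enumerate(doc_topic)
            if max(range(len(theta)), key=theta.__getitem__) == t
        ]
    return taxonomy

# Invented toy reports (illustrative only, not taken from the NASA data set).
reports = [
    "valve seal leak pressure drop",
    "seal leak valve pressure loss",
    "battery cell voltage drop short",
    "cell battery voltage short circuit",
    "pressure valve leak seal crack",
    "battery voltage cell circuit fault",
]
docs = [r.split() for r in reports]
doc_topic, topic_word, vocab = fit_lda(docs, n_topics=2)
taxonomy = build_taxonomy(doc_topic, topic_word, vocab)
for label, members in taxonomy.items():
    print(label, members)
```

Each report lands in exactly one category, labeled by that topic's most probable words. In a fuller pipeline, a contextual encoder such as BERT could refine or subdivide these groups, but how to do that is a design choice not specified in this summary.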

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12894956/full.md

---
Source: https://tomesphere.com/paper/PMC12894956