# Enhancing the classification of isolated theropod teeth using machine learning: a comparative study

**Authors:** Carolina S. Marques, Emmanuel Dufourq, Soraia Pereira, Vanda F. Santos, Elisabete Malafaia

PMC · DOI: 10.7717/peerj.19116 · PeerJ · 2025-03-26

## TL;DR

This study compares six machine learning techniques for classifying isolated theropod teeth from morphometric measurements, and evaluates how standardization and oversampling choices help handle class imbalance.

## Contribution

The paper introduces a novel comparative analysis of multi-class classification methods for theropod teeth across varying taxonomic levels and balancing techniques.

## Key findings

- Some classification models are more sensitive to class imbalance than others.
- Different standardization and oversampling techniques affect classification performance.
- Trained models and standardizations are made publicly available for future research.
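
To illustrate why class imbalance matters when comparing classifiers, the sketch below contrasts plain accuracy with macro-averaged recall on an imbalanced label set. This summary does not state which metrics the paper reports, so the choice of metrics here is an assumption. The labels `taxon_A` and `taxon_B` and the 90/10 split are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels (not from the paper): 90 teeth of one class, 10 of another.
y_true = ["taxon_A"] * 90 + ["taxon_B"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["taxon_A"] * 100

acc = accuracy(y_true, y_pred)
bal = macro_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, macro recall={bal:.2f}")
```

A classifier that ignores the minority class still scores high on accuracy here, while macro recall exposes the failure. This is one reason per-class metrics are commonly preferred on imbalanced data.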

## Abstract

Classifying objects, such as taxonomic identification of fossils based on morphometric variables, is a time-consuming process. This task is further complicated by intra-class variability, which makes it ideal for automation via machine learning (ML) techniques. In this study, we compared six different ML techniques based on datasets with morphometric features used to classify isolated theropod teeth at both genus and higher taxonomic levels. Our models are also intended to differentiate teeth from different positions on the tooth row (e.g., lateral, mesial). These datasets present challenges such as over-representation of certain classes and missing measurements. Given the class imbalance, we evaluate the effect of different standardization and oversampling techniques on the classification process for different classification models. The obtained results show that some classification models are more sensitive to class imbalance than others. This study presents a novel comparative analysis of multi-class classification methods for theropod teeth, evaluating their performance across varying taxonomic levels and dataset balancing techniques. The aim of this study is to evaluate which ML methods are more suitable for the classification of isolated theropod teeth, providing recommendations on how to deal with imbalanced datasets using different standardization, oversampling, and classification tools. The trained models and applied standardizations are made publicly available, providing a resource for future studies to classify isolated theropod teeth. This open-access methodology will enable more reliable cross-study comparisons of fossil records.
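
The preprocessing described in the abstract can be sketched as follows, assuming z-score standardization and random oversampling. Both are common choices, but this summary does not name the specific techniques the paper evaluates. The function names, the two-feature measurements, and the class labels are all hypothetical, and missing measurements are not handled.

```python
import random
import statistics
from collections import Counter, defaultdict


def fit_standardizer(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical training data (not from the paper): two measurements per tooth.
train_X = [[10.0, 20.0], [12.0, 24.0], [11.0, 22.0], [13.0, 26.0], [30.0, 45.0]]
train_y = ["taxon_A", "taxon_A", "taxon_A", "taxon_A", "taxon_B"]

means, stds = fit_standardizer(train_X)
scaled_X = standardize(train_X, means, stds)
balanced_X, balanced_y = random_oversample(scaled_X, train_y)
print(Counter(balanced_y))
```

Fitting the standardizer on training rows only, and oversampling only the training split, keeps information from held-out teeth out of the model. The same fitted `means` and `stds` would then be reused to scale any new specimen before prediction.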

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11954464/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11954464/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/PMC11954464/full.md

---
Source: https://tomesphere.com/paper/PMC11954464