# Evaluation of a text-mining application for the rapid analysis of free-text wildlife necropsy reports

**Authors:** Stefan Saverimuttu, Kate McInnes, Kristin Warren, Lian Yeap, Stuart Hunter, Brett Gartrell, An Pas, James Chatterton, Bethany Jackson

PMC · DOI: 10.1371/journal.pone.0337720 · 2025-11-25

## TL;DR

A pilot study tested DEE, a rule-based text-mining tool, for rapidly quantifying clinicopathologic findings in free-text wildlife necropsy reports. It offered time-efficient data retrieval, but its accuracy depended on search-term selection.

## Contribution

The study evaluates a novel text-mining application for wildlife necropsy data, highlighting its performance and limitations in a real-world context.

## Key findings

- DEE achieved mean F1-scores between 0.63 and 0.93 for identifying clinicopathologic findings in necropsy reports.
- Findings with limited terminological variance, like external oiling, showed the highest performance and consistency.
- The study suggests that capturing terminological variance is crucial for improving the tool's broader applicability.
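The F1-scores above combine precision and recall. Assuming manual review serves as the reference standard, precision is the share of reports flagged by a search that manual review also flagged, recall is the share of manually flagged reports the search recovered, and F1 is their harmonic mean. The sketch below uses these standard definitions; the report IDs, the `score` and `mean_f1` helpers, and the per-tester averaging are hypothetical illustrations, not code from DEE.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float

def score(flagged: set[str], reference: set[str]) -> Scores:
    """Score report IDs flagged by a search against IDs flagged by manual review."""
    tp = len(flagged & reference)   # flagged by both search and manual review
    fp = len(flagged - reference)   # flagged by the search only
    fn = len(reference - flagged)   # flagged by manual review but missed by the search
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)

def mean_f1(per_tester: list[Scores]) -> float:
    """Hypothetical helper: average F1 over several testers' searches for one finding."""
    return mean(s.f1 for s in per_tester)

# Hypothetical example: manual review flags R1, R2, R4, R5; a search flags R1, R2, R3, R5.
s = score({"R1", "R2", "R3", "R5"}, {"R1", "R2", "R4", "R5"})
print(s)  # Scores(precision=0.75, recall=0.75, f1=0.75)
```

Here R3 is a false positive and R4 a false negative, so precision and recall are both 3/4.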

## Abstract

The ability to efficiently derive insights from wildlife necropsy data is essential for advancing conservation and One Health objectives, yet close reading remains the mainstay of knowledge retrieval from ubiquitous free-text clinical data. This time-consuming process poses a barrier to the efficient utilisation of such valuable resources. This study evaluates part of a bespoke text-mining application, DEE (Describe, Explore, Examine), designed for extracting insights from free-text necropsy reports housed in Aotearoa New Zealand’s Wildbase Pathology Register. A pilot test involving nine veterinary professionals assessed DEE’s ability to quantify the occurrence of four clinicopathologic findings (external oiling, trauma, diphtheritic stomatitis, and starvation) across two species datasets by comparison with manual review. Performance metrics (recall, precision, and F1-score) were calculated and analysed alongside tester-driven misclassification patterns. Findings reveal that while DEE (and the principles underlying its function) offers time-efficient data retrieval, its performance is influenced by search term selection and the breadth of vocabulary that may describe a clinicopathologic finding. Findings characterised by limited terminological variance, such as external oiling, yielded the highest performance scores and the most consistency across application testers. Mean F1-scores across all tested findings and application testers ranged from 0.63 to 0.93. Results highlight the utility and limitations of term-based text-mining approaches and suggest that enhancements to automatically capture this terminological variance may be necessary for broader implementation. This pilot study points to the potential of relatively simple, rule-based text-mining approaches to derive insights from natural-language wildlife data in support of One Health goals.
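The abstract attributes DEE's performance to search-term selection and the breadth of vocabulary describing a finding. A minimal sketch of term-based flagging makes this concrete. It assumes simple case-insensitive, whole-word matching with Python's standard `re` module, which may differ from how DEE actually matches terms; the `build_pattern` and `flag_reports` helpers, the report excerpts, and the term lists are all hypothetical.

```python
import re

def build_pattern(terms: list[str]) -> re.Pattern:
    """Compile a case-insensitive, whole-word pattern matching any search term."""
    # re.escape treats each term literally; \b prevents matches inside longer words.
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

def flag_reports(reports: dict[str, str], terms: list[str]) -> set[str]:
    """Return the IDs of reports whose text contains at least one search term."""
    pattern = build_pattern(terms)
    return {report_id for report_id, text in reports.items() if pattern.search(text)}

# Hypothetical necropsy excerpts, invented for illustration.
reports = {
    "R1": "Severe external oiling of the plumage.",
    "R2": "Bird was emaciated with serous atrophy of fat.",
    "R3": "Poor body condition; findings consistent with starvation.",
    "R4": "No evidence of oil contamination.",
}

print(sorted(flag_reports(reports, ["starvation"])))  # ['R3']
print(sorted(flag_reports(reports, ["starvation", "emaciated", "serous atrophy"])))  # ['R2', 'R3']
print(sorted(flag_reports(reports, ["oiling", "oil"])))  # ['R1', 'R4']
```

In this toy example, a single term misses the report that describes starvation as emaciation (lower recall), while a bare keyword such as "oil" also matches a negated statement (lower precision), illustrating how term choice can shift both metrics.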

## Full-text entities

- **Diseases:** diphtheritic stomatitis (MESH:D013280), trauma (MESH:D014947)

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12646400/full.md

---
Source: https://tomesphere.com/paper/PMC12646400