Comprehensive representation of health-related phenotypes in one million dogs using topic modelling of electronic health records
Peter-John Mäntylä Noble, Sean Oliver Farrell, Noura Al-Moubayed, Alan David Radford

TL;DR
This paper applies unsupervised topic modelling (BERTopic) to the electronic health records of one million dogs, recovering known disease patterns and suggesting potential new ones.
Contribution
A novel application of BERTopic for extracting health-related phenotypes from veterinary clinical notes at scale.
Findings
BERTopic successfully identified known breed predispositions to diseases like hypoadrenocorticism and diabetes.
The method revealed potential novel patterns in disease phenotypes across a large population of dogs.
The approach enables rapid and scalable interrogation of clinical datasets for diverse health insights.
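To make the idea of a breed predisposition concrete, here is a minimal sketch of one common way to quantify it: an odds ratio with a Wald 95% confidence interval, computed from a 2x2 table of breed against topic membership. This summary does not say which statistic the authors used, so the choice of an odds ratio, the function name `odds_ratio`, and all counts below are assumptions made purely for illustration.

```python
from math import exp, log, sqrt


def odds_ratio(breed_in_topic, breed_not_in_topic, other_in_topic, other_not_in_topic):
    """Return (odds ratio, CI lower, CI upper) for a 2x2 breed-by-topic table.

    Hypothetical helper, not the authors' method. Uses a Wald 95% interval
    on the log odds ratio.
    """
    cells = [breed_in_topic, breed_not_in_topic, other_in_topic, other_not_in_topic]
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero,
    # so that the ratio and its standard error stay finite.
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells

    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio).
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(ratio) - 1.96 * se)
    upper = exp(log(ratio) + 1.96 * se)
    return ratio, lower, upper


# Invented counts, for illustration only: records from one breed versus all
# other dogs, split by whether the record fell in a given topic.
ratio, lower, upper = odds_ratio(30, 970, 100, 98900)
print(f"odds ratio = {ratio:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

An interval that excludes 1 would flag the breed for closer inspection. When many breeds and topics are screened at once, some correction for multiple comparisons would also be needed.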
Abstract
Historically, veterinary studies screening for breed, age and sex predisposition to disease have relied on collating small-scale studies of clinical datasets. The availability of larger datasets through groups such as the Small Animal Veterinary Surveillance Network (SAVSNET) promises access to information on a wide range of clinical presentations at scale. However, methodological limitations surrounding the extraction of specific disease information or screening for disease predispositions result in a substantial reduction in the number of animals studied. These studies often address very focused hypotheses, leveraging only a small fraction of the intrinsic value of the data at any one time. Here, we implemented an unsupervised machine learning methodology, creating a representation of a large volume of clinical notes collected by SAVSNET from veterinary practices across the UK.…
Taxonomy
Topics: Zoonotic diseases and public health · Human-Animal Interaction Studies · Data-Driven Disease Surveillance
