Comparing Manual and Automated Spatial Tracking of Captive Spider Monkeys Using Heatmaps
Silje Marquardsen Lund, Frej Gammelgård, Jonas Nielsen, Laura Liv Nørgaard Larsen, Ninette Christensen, Sisse Puck Hansen, Trine Kristensen, Henriette Høyer Ørneborg Rodkjær, Shanthiya Manoharan Sivagnanasundram, Bianca Østergaard Thomsen, Sussie Pagh, Thea Loumand Faddersbøll

TL;DR
This study shows that automated tracking using computer vision can reliably monitor spider monkeys' enclosure use and activity, matching manual observations while saving time and reducing bias.
Contribution
The study demonstrates the reliability of automated pose estimation (SLEAP) for tracking spider monkeys' enclosure use, offering a scalable alternative to manual observation.
Findings
Manual and automated tracking showed strong agreement (83–99% overlap) in identifying core activity areas.
Both methods produced comparable estimates of time spent being active, with no significant differences.
Automated tracking using SLEAP is reliable and efficient for monitoring animal welfare in zoos.
Abstract
Zookeepers and researchers often monitor welfare by recording how animals use their enclosures, as space use and activity levels provide insights into whether animals are stimulated and engaged. Traditionally, this is performed by manually observing and recording positions, but this can be slow and prone to bias. In this study, we tested whether pose estimation, using a tool called SLEAP, could automatically track two spider monkeys and produce comparable results. We compared manual observations with automated tracking by generating heatmaps of space use and measuring time spent being active. Both methods showed strong agreement, with the monkeys spending most of their time around climbing structures. Automated tracking, therefore, offers a reliable way to save time and improve the consistency of welfare monitoring in zoological institutions. Animal welfare assessments increasingly aim…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5- —Aalborg Zoo Conservation Foundation (AZCF)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrimate Behavior and Ecology · Animal Behavior and Welfare Studies · Animal Nutrition and Physiology
1. Introduction
Black-headed spider monkeys (Ateles fusciceps Gray, 1866), hereafter referred to as spider monkeys, are diurnal, arboreal primates inhabiting the rainforests of Colombia, Ecuador and Panama [1,2]. They exhibit a fission–fusion social structure, forming dynamic subgroups during the day to reduce competition for food [3,4]. They are omnivorous and primarily feed on ripe fruits and supplement with leaves, insects and other foods [2,5]. Spider monkeys are listed in the Convention on International Trade in Endangered Species’ Appendix II [6]. This designation indicates that, while this species is not currently at risk of extinction, it is vulnerable to the impacts of unregulated trade [7], and the International Union for Conservation of Nature categorizes this species as endangered [1,8].
In response to conservation and welfare goals, zoological institutions, hereafter referred to as zoos, increasingly seek to promote natural behaviors and activity patterns and stimulate full enclosure use [9,10]. Assessing enclosure use reveals how animals engage with the space and resources available to them. Uneven space use or limited locomotor activity can indicate husbandry or design issues that constrain welfare, whereas diverse movement patterns and balanced enclosure use are linked to more positive welfare states [10,11]. Quality of Life (QoL) assessments similarly emphasize the importance of monitoring locomotion, exploration and the use of enclosure features as indicators of behavioral health and engagement, since both inactivity and stereotypic pacing may signal compromised welfare [9]. Together, these approaches highlight that activity and enclosure use are not only measures of physical health and stimulation but also vital indicators of whether zoo environments are enabling animals to thrive. However, these assessments often rely on behavioral observations, through video footage or live observations, but such assessments are time-consuming, subjective, and difficult to scale across numerous individuals [11,12,13].
Machine learning (ML) in the form of computer vision technology, particularly pose estimation, can offer promising solutions for automated, standardized behavior monitoring [12,14]. These methods can reduce observer bias, increase temporal resolution, and allow continuous data collection even in multi-animal settings [12,15]. Pose estimation uses trained neural networks to locate defined body parts within each video frame, generating spatial coordinates that can be used to reconstruct postures and movements [14,16]. Tools such as SLEAP, DeepLabCut and others have demonstrated high accuracy in both laboratory and field settings, enabling fine-scale quantification of behavior, activity tracking, and space use [14,16,17,18]. However, automated approaches depend on stable camera placement and sufficient visibility, which can create blind spots if not optimized. These approaches are increasingly recognized as valuable for welfare monitoring, as they provide both spatial and postural data that can inform enrichment and enclosure design [12,15,16].
In this study, we compare manual enclosure-use tracking and automated pose estimation to evaluate spatial activity in a mother–son pair of spider monkeys at Aalborg Zoo (Denmark). By assessing the overlap between methods, we explore the potential of computer vision for improving the efficiency and consistency of welfare monitoring.
2. Materials and Methods
Two spider monkeys housed at Aalborg Zoo were observed for this study: a female born in 1997 and her male offspring born in 2009. Observations focused exclusively on the indoor enclosure due to camera limitations. The indoor enclosure (approx. 97 m^2^) included climbing structures such as a central tree, five wooden shelves, fire hoses, and ropes. Visitor viewing was separated by netting, vegetation, and fencing.
We collected video data on six non-consecutive days between 26 October and 16 November 2024, in three sessions a day around feeding time during the zoo’s off-season. The total daily filming duration ranged from 52 to 90 min (median: 62 min). We positioned a wide-lens camera (Kitvision Escape HD5 and 4 KW (Christchurch, Dorset, BH23 4FL, UK)) to provide the most optimal coverage of the indoor enclosure.
Six observers carried out the manual observations, using the program ZooMonitor (version 4.1) [19]. Each observation session was scored by a subset of observers, and not all observers recorded data for all dates. To ensure consistency in scoring, a concordance test was conducted prior to data collection, following Altmann (1974), with a minimum agreement threshold of 85% [20]. We recorded the animals’ location using interval focal sampling [20] in 12 s intervals on a top-down schematic drawing of the enclosure, resulting in approximately 260–450 possible locations recorded per day per focal. The 12 s interval was chosen to balance temporal resolution and feasibility, based on pilot testing. A heatmap was produced from observations. We denoted coordinates for the focal sampling using a custom 2D Habitat Map in ZooMonitor and then visualized as heatmaps using R (version 4.4.3) [21].
To compare manual scoring with automated approaches, we applied computer vision using pose estimation. Pose estimation involves training a neural network to detect and track specific body parts of the animals in each video frame, thereby generating coordinates that can be used to reconstruct their movements and enclosure use. For this purpose, SLEAP (version 1.4.1a2) [14] was used to create a model trained on 313 frames from the footage collected on 26 October 2024. We trained the model to recognize four body parts to track the individuals. This was performed with the use of a skeleton with five nodes: head, shoulder, hip, tail1, and tail2 (Figure 1). We built and trained the model using SLEAP’s prediction-assisted labeling workflow until the model consistently produced accurate predictions across multiple frames and diverse conditions. The final best model was reached after 40 epochs with a validation loss of 0.000745. We reduced the frame rate of the videos to 1 fps and converted them to grayscale for analysis, resulting in approximately 3120–5400 possible locations recorded per day per focal. We used the resulting pose data to generate heatmaps for comparison with manual scoring. We produced these heatmaps in R [21] using the following packages: tidyverse (version 2.0.0) [22], ggpointdensity (version 0.2.0.9000) [23], and jpeg (version 0.1-11) [24].
Furthermore, we calculated the daily active time for both methods. In ZooMonitor, we considered consecutive 12 s samples as “movement” if x–y changes exceeded 7 pixels. In SLEAP, we reduced each frame pose to the best-scored hip/shoulder point, with movement defined as changes > 30 pixels. For each method, the durations of movement were summed. We compared the methods using a paired t-test.
3. Results
Spatial activity patterns of the two spider monkeys within the indoor enclosure were visualized using heatmaps generated from both manual observations (ZooMonitor) and automated pose estimation (SLEAP) (Figure 2). A total of 12 heatmaps are presented: six pairs corresponding to six observation dates. Each pair displays the manually scored top-down heatmap alongside the automatically generated front-facing heatmap from SLEAP, arranged side by side to facilitate direct comparison.
Overall, the heatmaps from both methods revealed similar spatial use patterns, with high-occupancy zones consistently centered around the enclosure’s central climbing structures and the door with feeding gutters. To facilitate direct comparison, the enclosure was divided into four regions (R1–R4), and the distribution of locations across these regions is summarized in Table A1. Comparison between the methods showed a high degree of agreement for most days, with percentage overlap values ranging from 44.75% to 99.00% and Pearson correlation coefficients: 0.237 ≤ r ≤ 1.000 across observation days (Table 1). Minor discrepancies appeared on some observation dates, most notably on October 27th, where a major discrepancy was found. Further inspection revealed that a major camera shift occurred partway through that day, which affected the alignment of the two methods. To account for this, the data was split into pre- and post-shift segments (Figure A1, Table A2 and Table A3). When analyzed separately, the overlap between methods was 95.02% and 97.51%, with r = 0.99 and r = 1.00, respectively.
The analysis of activity calculated from the tracking results from the two methods, SLEAP and ZooMonitor, generally produced comparable estimates of the individuals’ activity across the observed days (Figure 3). On some days, the ZooMonitor method measured higher activity than SLEAP (27 October, 31 October and 16 November), while the SLEAP method measured higher activity on other days (28 October, 1 November, 9 November). The paired t-test resulted in no significant difference between the two methods (t = 0.062, df = 5, p-value = 0.952).
4. Discussion
This study compared manual enclosure-use tracking with automated pose estimation using SLEAP to evaluate spatial activity in a mother–son pair of spider monkeys. Our results demonstrate a strong overall agreement between methods, supported by high overlap percentages and correlation coefficients, which indicates that computer vision can reliably capture enclosure-use patterns that are traditionally assessed through direct observation. Importantly, both methods identified the same hotspots around the central climbing structures and the door with feeding gutters.
A further dimension of comparison was overall activity levels (Figure 3). Here, both methods produced similar estimates of time spent in movement, with only slight differences between them. Although daily variation occurred, sometimes ZooMonitor estimated higher activity and sometimes SLEAP did, the lack of systematic bias suggests that automated tracking can reliably replicate activity measures. Because both inactivity and abnormal locomotion, such as stereotypic pacing, can signal compromised welfare, these results strengthen the case for computer vision as a valid monitoring tool.
A major advantage of pose estimation was the ability to generate fine-scale, continuous data that minimized observer bias and revealed subtle positional shifts not easily detected with interval sampling. By contrast, manual heatmaps tended to produce more diffuse activity zones, reflecting both the coarser temporal resolution and the subjective estimation of the placement of points on schematic maps. Furthermore, manual scoring remains limited by observer fatigue and reaction time. This highlights the potential for automated methods to provide a more precise and scalable approach to welfare monitoring, especially in settings where large datasets are required.
Despite the overall strong agreement, some discrepancies occurred, highlighting that camera placement and perspective can strongly influence outcomes and must be carefully managed in future applications. Broader coverage, multiple angles, or automated correction methods could help mitigate such issues. Additionally, pose estimation accuracy can be affected by factors such as lighting conditions or temporary occlusions. However, training models on diverse examples and environmental conditions can substantially improve robustness to these challenges.
The findings also raise the question of whether manual tracking remains necessary when reliable automated tools are available. While SLEAP successfully reproduced enclosure-use patterns and overall activity levels, manual observations still hold value for recording nuanced behaviors, social interactions, and context-specific welfare indicators that pose estimation cannot yet capture alone. Future work should therefore integrate pose-based behavior classification and multi-individual tracking, extending beyond space use to assess how animals interact with their environment, such as enrichment. As demonstrated by Hayden et al. (2022) [17], who successfully used pose estimation to quantify specific behaviors in primates. Such extensions would allow welfare monitoring, through computer vision, to move beyond spatial measures toward richer behavioral profiles. While our results demonstrate that automated pose estimation can replicate enclosure-use patterns and activity levels in this case study, further research is needed to assess its generalizability across species, individuals, and settings.
In summary, our results demonstrate that automated pose estimation has the potential to serve as a valid and efficient alternative to manual enclosure-use monitoring. While manual methods may remain important for capturing behavioral nuance, computer vision provides scalable, objective data that can strengthen welfare assessments by enabling continuous, fine-grained monitoring of activity and enclosure use.
5. Conclusions
This study demonstrates that computer vision, using pose estimation with SLEAP, can effectively replicate spatial tracking data obtained through manual observations in ZooMonitor. Both enclosure-use patterns and activity levels were captured consistently, highlighting automated tracking as a reliable and scalable method for monitoring welfare-relevant indicators in spider monkeys housed in Aalborg Zoo. Manual observations remain valuable for capturing behavioral nuance, but integrating AI-based methods can enhance the efficiency, consistency, and resolution of welfare assessments. Future work should expand on behavior classification and test generalizability across individuals, environments, and contexts to fully realize the potential of computer vision in zoo-based welfare monitoring.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Moscoso P. Link A. Defler T. de la Torre S. Cortés-Ortiz L. Méndez-Carvajal P.G. Shanee S. Ateles Fusciceps (Amended Version of 2020 Assessment). The IUCN Red List of Threatened Species 2021: E. T 135446 A 191687087 International Union for Conservation of Nature (IUCN)Gland, Switzerland 202110.2305/IUCN.UK.2021-1.RLTS.T 135446 A 191687087.en · doi ↗
- 2Campbell C.J. Aureli F. Chapman C.A. Ramos-Fernández G. Matthews K. Russo S.E. Suarez S. Vick L. Terrestrial Behavior of Ateles Spp.Int. J. Primatol.2005261039105110.1007/s 10764-005-6457-1 · doi ↗
- 3Aureli F. Schaffner C.M. Boesch C. Bearder S.K. Call J. Chapman C.A. Connor R. Fiore A.D. Dunbar R.I.M. Henzi S.P. Fission—Fusion Dynamics: New Research Frameworks Curr. Anthropol.20084962765410.1086/586708 · doi ↗
- 4Price E.E. Stoinski T.S. Group Size: Determinants in the Wild and Implications for the Captive Housing of Wild Mammals in Zoos Appl. Anim. Behav. Sci.200710325526410.1016/j.applanim.2006.05.021 · doi ↗
- 5Dew J.L. Foraging, Food Choice, and Food Processing by Sympatric Ripe-Fruit Specialists: Lagothrix lagotricha poeppigii and Ateles belzebuth belzebuth Int. J. Primatol.2005261107113510.1007/s 10764-005-6461-5 · doi ↗
- 6UNEP-WCMC Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) Appendices I, II and III Available online: https://cites.org/eng/app/appendices.php(accessed on 3 July 2025)
- 7Ramos-Fernández G. Wallace R.B. Spider Monkey Conservation in the Twenty-First Century: Recognizing Risks and Opportunities Spider Monkeys Campbell C.J. Cambridge University Press Cambridge, UK 2008351376978-0-521-86750-4
- 8Link A. Di Fiore A. Spehar S.N. Wrangham R.W. Muller M.N. Female-Directed Aggression and Social Control in Spider Monkeys Sexual Coercion in Primates and Humans Harvard University Press Cambridge, MA, USA London, UK 20151571830-674-03324-8
