Reliability and precision of thermal imaging measurements to study animal behaviour and welfare

Debottam Bhattacharjee; Marianne A. Mason; Alan G. McElligott

PMC · DOI:10.7717/peerj.20861·February 25, 2026

Reliability and precision of thermal imaging measurements to study animal behaviour and welfare

Debottam Bhattacharjee, Marianne A. Mason, Alan G. McElligott

PDF

Open Access

TL;DR

This study examines how reliable and precise thermal imaging is for measuring goat facial temperatures, finding that mean and maximum temperatures are consistent within sessions but vary across days.

Contribution

The study provides empirical evidence on the reliability and precision of thermal imaging for measuring animal surface temperatures, offering guidelines for its use in animal welfare research.

Findings

01

Mean and maximum surface temperatures showed high repeatability and precision across five measurements in all facial regions.

02

Minimum temperatures were more variable, with lower repeatability and precision.

03

Surface temperatures varied significantly across days, with session effects accounting for up to 85.85% of nasal temperature variation.

Abstract

The use of infrared thermal imaging has become increasingly popular in animal behaviour, health, and welfare research over the last decade. Yet, there is a lack of consensus regarding how this technique should be best applied when measuring peripheral temperatures in animals, including which regions of interest to favour. This fundamental issue necessitates checking the reliability and precision of thermal imaging data when taking repeated measurements, both over short and relatively long time windows. Using goats (Capra hircus) as a model, we investigated two subcategories of reliability, short-term repeatability (measurements taken in the same session) and reproducibility (over multiple sessions), as well as the precision of surface temperatures in two facial regions. We collected data from 20 goats over five measurement sessions, which took place on consecutive days. During each…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species8

Capra hircus Sus scrofa domesticus(domestic pig · subspecies)Sus scrofa(pig · species)Ovis aries(domestic sheep · species)Homo sapiens(human · species)Gallus gallus(bantam · species)Bos taurus(bovine · species)Equus caballus(domestic horse · species)

Chemicals1

Water

Diseases4

inflammation fever dry aggression

Figures2

Click any figure to enlarge with its caption.

Experimental set-up for thermal measurements.(A) Schematic of the stable where experiments took place. The thermal imaging camera was positioned 2.06 m from the front of Pen B, where subjects were held during testing. (B) Photograph demonstrating the set-up of the thermal imaging camera, laptop, and temperature logger. (C) View of experimental set-up indicating relative locations of the subject and thermal imaging camera, along with other equipment.

Thermal image showing regions of interest (ROI) placement: left eye, right eye, and nose tip (colour palette: “Fusion”).The red and blue triangles indicate the position of the pixel with the maximum and minimum temperature value, respectively, for each ROI.

Funding3

—Farm Sanctuary
—Ede & Ravenscroft
—University of Roehampton

Keywords

Animal cognitionEmotionsGoat–Capra hircusInfrared thermographyNon-invasive physiologySurface temperatureThermal cameraUngulates

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsEffects of Environmental Stressors on Livestock · Animal Behavior and Welfare Studies · Infrared Thermography in Medicine

Full text

Introduction

Infrared thermal imaging is an approach increasing in prominence within the fields of animal behaviour, physiology, welfare, and veterinary research. The applications of this technology is based on the principle that all terrestrial objects with an absolute temperature exceeding zero kelvin generate radiant heat in the infrared region of the electromagnetic spectrum (Knízková et al., 2007; Speakman & Ward, 1998). Levels of infrared radiation emitted by objects can be detected via thermal imaging and used to generate a visual representation, allowing users to observe and quantify minute spatial and temporal disruptions in an object’s surface temperature (Knízková et al., 2007). When the object of interest is an endothermic animal, local fluctuations in blood flow, metabolic activity, tissue conductivity, and environmental heat exchange create a dynamic network of graded temperature zones occurring across the periphery of the skin (McCafferty, Gallon & Nord, 2015; Godyń, Herbut & Angrecka, 2019). By concentrating measurements on particular regions of interest (ROIs), generally areas where fur is thinner or absent, such as the eyes or nose, it is possible to investigate differences in skin temperature both between and within individuals, over time (Tattersall, 2016). Thus, these ROIs can help identify internal physiological and emotional processes. For instance, diversion of blood away from peripheral blood vessels caused by stress acts to infuse core muscles and organs required in ‘fight or flight’ responses and redistributes heat, causing a drop in peripheral temperature. This redistribution of body heat during emotional experiences has been demonstrated in positive (e.g., Proctor & Carder, 2015; Proctor & Carder, 2016; Tamioso et al., 2017; Wongsaengchan et al., 2023) as well as negative contexts (e.g., Stewart et al., 2008a; Stewart et al., 2008b; Herborn et al., 2015; Herborn et al., 2018; Uddin, McNeill & Phillips, 2023; Bhattacharjee et al., 2024; Ramirez Montes de Oca et al., 2024; see review by Ghezzi et al., 2024). Further, skin temperature changes caused by local increases in metabolism and blood flow can be indicative of inflammation in underlying tissues or differences in reproductive receptivity (Kominsky, Campbell & Colgan, 2010; LokeshBabu et al., 2018; Façanha et al., 2018; Byrne et al., 2019; Mota-Rojas et al., 2021). While the non-invasive application of thermal imaging is advantageous (Tattersall, 2016), the sensitivity of peripheral temperature and thermal imaging itself to a suite of different factors can make this technology difficult to apply with precision in practice.

Thermal imaging offers a valuable non-invasive window into an animal’s internal physiological and emotional state (McCafferty, 2007). As surface temperature is shaped by autonomic regulation of blood flow and metabolic heat production, it can provide insights into processes such as stress, arousal, and thermoregulation that are otherwise difficult to measure without restraint or use of invasive sensors (cf. McCafferty, Gallon & Nord, 2015; Godyń, Herbut & Angrecka, 2019). Reliable surface temperature measures, therefore, allow researchers to distinguish biologically meaningful variation, e.g., a true change in arousal, from simple measurement error. Consequently, improving the reliability of thermal imaging measures has direct implications for the interpretation of emotional and welfare-related states in animals. Notably, animal surface temperatures are affected by several factors. Endogenous factors, such as individual, breed, sex, age, level of physical activity, and skin and coat characteristics, can lead to variations in surface temperatures (thickness and colour; Bartolomé et al., 2013; Rizzo et al., 2017; Jørgensen, Mejdell & Bøe, 2020; Jansson et al., 2021; Mota-Rojas et al., 2021). Environmental conditions, such as ambient temperature, humidity, and wind speed, also affect peripheral temperatures, either directly or via internal homeostatic mechanisms (Church et al., 2014; Jansson et al., 2021). Further extraneous factors affecting the accuracy of temperature readings include distance, and angle of the subject in relation to the camera, the camera model, the ROI used, its size, and the particular measure chosen, i.e., mean, maximum, or minimum temperature (Church et al., 2014; Howell, Dudek & Soroko, 2020; Ijichi et al., 2020; Playà-Montmany & Tattersall, 2021; Uddin et al., 2020; Kim & Cho, 2021). Particularly when measuring animal emotional responses, there is a distinct lack of consistency across investigations regarding ROIs chosen and specific temperature measures favoured. These have included, mean and minimum nasal temperatures (Proctor & Carder, 2015; Proctor & Carder, 2016; Kano et al., 2016; Brügger, Willems & Burkart, 2021; Bhattacharjee et al., 2024), maximum eye temperatures (Stewart et al., 2008a; Stewart et al., 2008b; Bartolomé et al., 2019), temperatures in specific orbital regions (e.g., lacrimal caruncle of the eye: Dai et al., 2015) and other regions, such as maximum temperatures of a chicken’s (Gallus gallus domesticus) comb and wattle (Herborn et al., 2015), and a pig’s (Sus scrofa domesticus) back (Boileau et al., 2019). Choosing an appropriate ROI thus depends not only on surface accessibility (e.g., fur coverage) but also on physiological factors that facilitate rapid changes in blood flow and may vary across body regions and species. In summary, establishing biologically meaningful ROIs and assessing reliability and precision of thermal imaging measures are therefore central for best practices in animal behaviour and emotion research.

To assess reliability, the degree of similarity between repeated measures taken from the same ROI and the subject must be determined (Bartlett & Frost, 2008; Fernández-Cuevas et al., 2015). In humans, high levels of repeatability or repeated measurements taken under identical conditions, and reproducibility or measurements taken in variable conditions, i.e., the two subcategories of reliability (cf. Bartlett & Frost, 2008), have been found in medical settings (Ammer, 2008; Petrova et al., 2018). By comparison, ambient conditions for animals can be more variable, e.g., on farms, in captivity, or even open environmental conditions (Cilulko et al., 2013), leading to substantial challenges when taking thermal imaging measurements. Also, animals can be less compliant than humans; making obtaining accurate thermal imaging measurements in animals challenging. Furthermore, there are differences in the recording and extraction procedures based on which thermal imaging data measures are processed and analysed. Even after controlling for these technical and sampling factors, variation in reproducibility over time may persist. Such residual variation may not entirely be the product of measurement error but moreover, reflect biologically meaningful within-individual fluctuations in physiological or affective states (Proctor & Carder, 2015; Proctor & Carder, 2016; Ermatinger, Brügger & Burkart, 2019; Uddin, McNeill & Phillips, 2023; Jao et al., 2025). Thus, variation in animal affective states can only be captured after appropriately controlling for the sampling and analytical factors related to the thermal imaging procedure.

Although less commonly employed, video analysis offers advantages over extracting temperature data from a series of still images collected manually using a handheld device (McManus et al., 2022). Video recording substantially increases the number of measurement frames that researchers can collect in a short time frame. This, in turn, enhances the precision of animal surface temperature estimates (Byrne et al., 2017; Scoley, Gordon & Morrison, 2019; McManus et al., 2022), defined in the current investigation as the temperature deviation within which the mean temperature measured from one to five thermal images is expected to fall in relation to the mean of five image replicates 95% of the time, as well as enabling a finer-grained analysis of skin temperature changes over time (e.g., for measuring respiration rate: Stewart et al., 2017; Jorquera-Chavez et al., 2019; and time courses of emotional responses: Stewart et al., 2008a; Stewart et al., 2008b; Herborn et al., 2015). Given that video footage can be captured from fixed positions, i.e., next to objects that animals regularly interact with, such as feeders and automated milking systems (Hoffmann et al., 2013), it minimises the need for human proximity. This latter point is important as the presence of experimenters may affect animal physiological responses, potentially undermining the validity and generality of results obtained (Moe et al., 2017; Cannas et al., 2018). However, assessing reliability and other practical aspects of this data collection method remain limited (Hoffmann et al., 2013; Cuthbertson, Tarr & González, 2019; Jorquera-Chavez et al., 2019).

Attempts to investigate the reliability of thermal imaging measures have been made previously. For example, in cattle (Bos taurus; Byrne et al., 2017; Scoley, Gordon & Morrison, 2019), sheep (Ovis aries; Byrne et al., 2019), and horses (Equus caballus; Howell, Dudek & Soroko, 2020). Yet, these findings indicate that reliability, and even the most informative ROI or temperature metric, can differ across species due to differences in anatomy, coat structure, and thermal physiology. As such, reliability cannot be assumed to generalise between species and needs to be established empirically. Goats (Capra hircus) represent a valuable yet under-studied model species in this context. Although thermal imaging has been applied in goats across specific physiological settings (e.g., Façanha et al., 2018; Bartolomé et al., 2019; Giannetto et al., 2020), comprehensive assessments of measurement reliability are lacking. Given goats’ increasing prominence in emotion and cognition research (Briefer, Oxley & McElligott, 2015a; Briefer, Tettamanti & McElligott, 2015b; Baciadonna et al., 2019; Mason et al., 2024; Mason et al., 2025) and the importance of robust physiological indicators in such studies, identifying reliable ROIs and temperature measures will facilitate the valid use of thermal imaging in welfare and behavioural science.

Here we evaluated the short-term repeatability (within-session) and multi-day reproducibility (across five consecutive days) of mean, maximum, and minimum eye and nasal temperatures in goats extracted from thermal imaging videos. Although lower reproducibility across days may reflect meaningful physiological or affective variation, identifying stable measurement parameters is crucial for interpreting such variation. Thus, our objective was to determine which ROI–measure combinations provides the most reliable indicators of surface temperature in goats, and thus informing future applications in behaviour, emotion, and welfare research, especially for small ungulates.

Materials & Methods

Ethics statement

All animal care and testing protocols were in line with ASAB guidelines for the use of animals in research (ASAB Ethical Committee/ABS Animal Care Committee, 2024). Further approval was granted through an ethical amendment (03.21) made to the project ‘goat perception of human cues’ (Ref. LSC 19/ 280) by the University of Roehampton’s Life Sciences Ethics Committee. All procedures were non-invasive, goats were unrestrained while being tested and tested in visual, acoustic and limited physical contact with other goats to avoid complete social isolation, showing no obvious signs of stress during experimental trials.

Study site & sample population

Experiments were conducted between 5th and 23rd July 2021 at Buttercups Sanctuary for Goats in Kent, UK (51°13′15.7″N 0°33′05.1″E; http://www.buttercups.org.uk/). During this period, the sanctuary was open to the public. It featured a large outdoor paddock to which goats had access during the daytime, before being housed individually or in small groups within a large stable complex at night (mean pen size = 3.5 m^2^). Goats had hay, grass, and water available ad libitum throughout the day, and were supplemented with commercial concentrate contingent on age and condition. We selected subjects based on their ease of handling and, as goats were tested in groups of three, on whether they showed evidence of affiliative relationships with two other goats as indicated by staff at the sanctuary (to minimise instances of aggression). Our final sample size comprised 20 goats (10 intact females and 10 castrated males) of various breeds and ages (see Table S1).

Experimental set-up

Experiments were carried out within a familiar stable that goats could freely access during the day. We kept the windows closed to minimise drafts and avoid direct sunlight falling on the subjects. Both the windows and the two skylights directly above pens A, B, and C were covered (Fig. 1). To improve consistency in imaging conditions between subjects and over repeated measurements, trials mainly took place within pen B, with other members of the subject’s group being placed in adjacent pens A and C. This was not always possible, and for two groups, some group members were placed in pen D to prevent agonistic interactions with the subject. Distractions from surrounding objects also meant two subjects were imaged in pen A so that their attention could be successfully directed towards the camera. Pens featured a hayrack to encourage subjects to spend time at the front of the pen (hay availability shown to elicit minimal emotional arousal in goats at the same study site: Briefer, Tettamanti & McElligott, 2015b). Water was available in pens A, C, and D, but not in B, to reduce the potential effect of the presence of moisture on the muzzle on goat nasal temperatures.

Experimental set-up for thermal measurements.(A) Schematic of the stable where experiments took place. The thermal imaging camera was positioned 2.06 m from the front of Pen B, where subjects were held during testing. (B) Photograph demonstrating the set-up of the thermal imaging camera, laptop, and temperature logger. (C) View of experimental set-up indicating relative locations of the subject and thermal imaging camera, along with other equipment.

The thermal imaging camera, FLIR A655sc (with a 25° lens), was mounted on a tripod and placed directly in front of the subject in pen B, with its height adjusted according to the subject’s head. This mid- to high-end camera has a spectral range of between 7.5–14 µm, a resolution of 640 × 480 pixels, a <0.03 °C thermal sensitivity, and ± 2 °C accuracy. During trials, the camera worked in tandem with the software FLIR ResearchIR Max v. 4.40.9.30 to record thermal imaging videos and save them to an external laptop (HP ProBook 650 G4) connected via a USB and Ethernet cable. Goats tended to spend more time near the front of their pen, and the camera was placed at a distance of 2.06 m away from the front of pen B. Due to the thermal imaging camera’s narrow field of view (25° × 19°, 31° diagonal), an approximately two metre subject-camera distance was chosen to minimise instances where the subject went out of frame, requiring the camera position and angle to be readjusted. At this distance, the mean number of pixels ± SD within each ROI was 134.79 ± 49.61 for the nasal region (range = 35–293), 164.18 ± 52.85 for the left eye (range = 51–358), and 144.27 ± 49.69 for the right eye (range = 42–316), which we deemed appropriate for analysis.

Experimental preparation & testing procedure

Portions of this text were previously published as part of a preprint (Bhattacharjee, Mason & McElligott, 2025; doi: 10.1101/2025.07.31.668027).

We tested goats in seven groups of three individuals each. Each group was tested in one session per day over five consecutive weekdays, except for one group whose final trial was carried out on Monday of the following week. Two to three groups of goats were tested per week over three weeks, with the order in which we tested groups on any given day being randomised, as was the order in which goats were tested within groups.

It is worth noting that at the study site and during the day, stables are usually open for cleaning and routine maintenance. Goats (when not grazing outdoors) therefore, could generally enter and would have likely been familiar with the stable where testing took place. Nevertheless, before commencement of testing, a group of goats were guided one by one into test pens using a small food incentive (dry pasta). Each goat was placed in a separate pen, with the goat to be tested in pen B. Once the last group member had been placed in their designated pen, we waited 10 min before testing to minimise the effects of physical activity, human disturbance, and direct sunlight exposure on temperature readings, as well as giving animals the opportunity to habituate to the test set-up. After 10 min had passed, we filmed the subject’s face continuously for four minutes. Mistakes made during recording meant that although most trials were recorded at a rate of 6.25 FPS, 12 out of 100 trials had a higher frame rate (maximum 24.97 FPS). Throughout a four-minute trial, the first experimenter silently maintained the goat’s attention (without food or tactile reinforcement) while the second experimenter manually focused the thermal imaging camera. If a subject moved out of frame during their trial, we quickly adjusted the camera’s position or height and, if necessary, its angle accordingly. After four minutes had passed, the subject was moved to an adjacent pen, and the next goat to be tested was moved into pen B. To minimise the effect of disturbance and movement artefacts on temperature measures, once the goats had been rearranged, we waited five minutes before we began the next trial. Once all three subjects had been tested, they were released from the stable.

Video processing & image analysis

A single coder extracted temperature data from the thermal imaging videos using ResearchIR. Firstly, as suitable frames were not always available (e.g., the goat was not looking at the camera), we broke each four-minute video into approximately 12 s segments and one still image was extracted for each of the five consecutive segments. Three hundred and seventy one of the 460 (or 80.65%) frames were collected within the first 60 s of the experimental trial, but the measurement period was shifted to later if we could not find suitable frames to analyse. Although 86.96% of the frames were collected within one minute and 30s, the latest frame analysed was extracted from three minutes 58.55s into an experimental trial. It must be noted that a goat’s level of arousal may have varied over the experimental period as they lost interest in the experimenter attempting to direct their attention towards the thermal imaging camera which could have affected resulting surface temperature measurements collected. A frame was considered suitable if all ROIs (left eye, right eye, and nose) were fully visible, the image was in focus, and the subject’s head was oriented at the camera, with images where the snout exceeded a 45° angle from the nasal plane being excluded from analysis. Furthermore, as the pattern of inhalation and exhalation affects temperatures in the nasal region, especially when it was possible to visualise a goat’s breathing cycle, we prioritised extracting frames just before the onset of inhalation (where nasal temperature began to drop).

Once five frames had been extracted per video trial, each image was calibrated according to the ambient temperature and humidity at the time of its collection. We obtained this environmental data using an EasyLog™ USB-2-LC Humidity and Temperature Data Logger which recorded ambient temperature and humidity in the stable at 10s intervals during testing (to the nearest 0.5 °C; mean temperature ± SD = 25.22 °C ± 3.02, range = 21–31.5 °C; mean relative humidity ± SD = 62.50% ± 7.17, range = 41.5–75.5%). The emissivity of the image was set to 0.98, a value generally considered to reflect that of biological tissues (Steketee, 1973), while the distance was specified as 2.06 m and the reflective temperature, 20 °C. Once an image had been appropriately calibrated, we used the elliptical tool to manually position ROIs. To minimise noise in temperature measures, ellipses were drawn tightly around the eye, incorporating only a thin, hairless border around each (see Fig. 2). For the nose, only the tip was included, positioned in-between both nostrils. Once the ROIs had been identified, we extracted the mean, maximum, and minimum temperature from each ROI. If part of the ROI was obscured, it was not included in the analysis.

Thermal image showing regions of interest (ROI) placement: left eye, right eye, and nose tip (colour palette: “Fusion”).The red and blue triangles indicate the position of the pixel with the maximum and minimum temperature value, respectively, for each ROI.

To summarise, goats underwent five trials each, with five frames analysed per trial. We aimed to extract nine temperature measurements per frame, the mean, maximum, and minimum temperatures from each eye and the nose tip. These measurement indices were chosen based on their prior use in thermal imaging research (e.g., mean: Proctor & Carder, 2015; Bhattacharjee et al., 2024; maximum: Stewart et al., 2008a; Stewart et al., 2008b; Bartolomé et al., 2019; minimum: Kano et al., 2016; Ermatinger, Brügger & Burkart, 2019; Brügger, Willems & Burkart, 2021). Of the 21 goats tested, one was excluded for consistently not looking towards the thermal imaging camera during tests. Removing 46 instances where suitable frames could not be found, we included a total of 454 thermal images in the analysis.

Statistical analysis

Repeatability of goat surface temperatures within a single measurement session

We carried out all analyses using R version 4.2.1 (R Core Team, 2022), and these were based on analyses conducted by Byrne et al. (2017). Using the package rptR (Stoffel, Nakagawa & Schielzeth, 2017) we conducted intraclass correlation analyses (ICC) to investigate repeatability of temperature measurements taken for a particular combination of ROI (left eye, right eye and nose) and measure (mean, minimum or maximum) from a single goat, in a short time frame (approximately one minute) and therefore, under relatively consistent conditions. The temperature data used in these analyses were collected during each goat’s final measurement session on the fifth day of testing. We fitted random effects models, specifying individual identity as a random effect (temperature ∼1 + (1— goat ID)). Models were used to partition the variance explained by temperature differences between goats, which were compared against total variance, to calculate repeatability (R):

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*}R= \frac{{\sigma }_{g}^{2}}{{\sigma }_{g}^{2}+{\sigma }_{e}^{2}} (\mathrm{x}100) \end{eqnarray*}\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }{\mathrm{g}}^{2} $\end{document}$ refers to the between-goat variance and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }{\mathrm{e}}^{2} $\end{document}$ , the error variance. R was multiplied by 100 to calculate the percentage variance explained by between-goat differences in temperature measures. 95% confidence intervals were computed around R using 10,000 bootstrapping iterations, and we employed likelihood ratio tests to investigate whether the addition of goat identity as a random effect improved model fit relative to null models excluding it.

We used the following formula to calculate the coefficient of variation (CV), for each combination of ROI and temperature measure:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*}CV= \frac{{\sigma }_{\mathrm{g}}}{\mu } \mathrm{x}100 \end{eqnarray*}\end{document}

where σg was the standard deviation of between-goat differences in temperature, and µ was the overall mean of temperature measurements taken from the particular ROI and descriptive measure in question.

Residuals of all random effects models were visually inspected, and in combination with the DHARMa package, we evaluated how closely they conformed to assumptions of normality (Hartig, 2021). Error structures for temperature measurements taken from both eyes were approximately normally distributed, except for the mean and maximum temperatures for the right eye and the maximum temperature of the left eye. To address these deviations, we investigated the presence of influential observations or outliers using Cook’s Distance (D) in each model (car package: Fox & Weisberg, 2019). To meet assumptions of normality, we excluded two observations exceeding eight times the average Cook’s D, four observations exceeding five times the average Cook’s D, and the most extreme multivariate observation for the mean and maximum temperature of the right eye and maximum temperature in the left, respectively. More extreme deviations from normality were observed in goat nasal temperature measures, specifically with respect to the assumption of homoscedasticity in the model residual distribution. However, as temperature is a continuous variable, transformations were ineffective at improving normality, and because alternative error structures such as Poisson distributions are more sensitive to violations to the assumptions (e.g., overdispersion), we assumed a Gaussian distribution for all analyses (Knief & Forstmeier, 2021). It must be noted, although effects of heterogeneous residual variance tend not to bias variance estimates explained by random effects (from which we calculated repeatability), these estimates may become less precise (Schielzeth et al., 2020). Therefore, repeatability measures of nasal temperatures are likely to be less reliable than those of goat eye temperatures. Although Gaussian models tend to be more robust to violations against distributional assumptions, they are still sensitive to multivariate outliers with high leverage. After plotting residuals of preliminary models, we identified and removed outliers exceeding four times the average Cook’s D (mean nasal temperature: eight observations excluded; maximum temperature: four observations excluded; minimum temperature: seven observations excluded). While the exclusion of these influential outliers helped improve the conformation of error residuals to assumptions of normality, they might have increased repeatability estimates due to the removal of temperature values less in line with other observations.

Precision in temperature estimates

Precision was defined using the following formula:

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*}{P}_{\mathrm{n}}=1.96\mathrm{x}\sqrt{ \frac{{\sigma }_{\mathrm{e}}^{2}}{{n}_{\in (1,5)}} } \end{eqnarray*}\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }_{\mathrm{e}}^{2} $\end{document}$ was the error variance, and n was the number of thermal images (1–5). The right side of this formula calculates the standard error, and altogether it estimates the 95% confidence interval range for which the mean temperatures measured from 1–5 images were expected to fall in relation to the mean of five replicate measurements (Field, 2014; Byrne et al., 2017).

Reproducibility of temperature measures taken over multiple sessions

To investigate how temperature measures in each ROI varied over five days, we again used ICC analyses, this time specifying the measurement session and the individual goat tested as random effects (temperature ∼1 + (1— measurement session) + (1— goat ID), n = 10,000 bootstraps):

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*}R= \frac{{\sigma }_{s}^{2}}{{\sigma }_{g}^{2}+{\sigma }_{s}^{2}+{\sigma }_{e}^{2}} (\mathrm{x}100) \end{eqnarray*}\end{document}

where $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }{\mathrm{s}}^{2} $\end{document}$ refers to the variance in temperature readings between sessions, $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }{\mathrm{g}}^{2} $\end{document}$ to that explained by inter-goat differences and $\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document}$ {\sigma }_{\mathrm{e}}^{2} $\end{document}$ , the error variance. As conditions for each measurement session were not uniform between goats (e.g., session 2 did not take place at the same time or day for each goat), we nested measurement sessions within individuals by giving every measurement session a unique ID (Schielzeth & Nakagawa, 2013). Given this nesting pattern, the effect of measurement session can be interpreted as the temperature variance explained by measurement session alone, combined with that explained by the interaction between goat identity and measurement session, or how readings from individual goats varied over the five days they were imaged (Schielzeth & Nakagawa, 2013). We used likelihood tests to investigate whether adding a measurement session as a random effect significantly improved model fit against reduced models excluding this variable.

To ensure temperatures measured in the eye were approximately normally distributed, we excluded observations exceeding 10 times Cook’s D when analysing the mean (six observations excluded), maximum (two observations excluded), and minimum temperatures of the left eye (nine observations excluded), as well as the mean of the right eye (seven observations excluded). In addition, we excluded the single most extreme observation for minimum temperatures measured in the right eye. When fitting linear random effects models to predict variance in nasal temperatures, deviations from normality were again more extreme, so a more stringent criterion of excluding observations exceeding four times Cook’s D was employed (for mean, maximum, and minimum nasal temperatures, 15, 16, and 15 outliers were excluded, respectively).

Results

Repeatability of goat surface temperatures within a single measurement session

Adding an individual-level random effect to investigate inter-goat differences improved the fit of models. This helped predict the surface temperature measured on each goat’s fifth day of testing for all combinations of descriptive measure and ROIs (all p < 0.0001: Table 1). Overall, eye temperatures were restricted to a narrower range of values than those measured in the nose tip (Table 2), with the latter showing greater variability in temperature around the sample mean for each measure (higher CVs: Table 1). This variability in nasal temperatures can largely be attributed to inter-individual differences between goats, with within-goat differences over five repeated measures (error variance) being comparable to those found in eye temperatures. Consequently, repeatability of nasal temperatures was slightly higher than in either eye, but was nonetheless excellent for mean and maximum temperatures across all ROIs (rule of thumb: <0.5 poor repeatability, 0.5−0.75 moderate repeatability, 0.75−0.9 good repeatability and >0.9 excellent repeatability: Koo & Li, 2016), with between 93.50% (maximum temperature of right eye) and 99.81% (mean nasal temperature) of the total variation in temperature being attributed to the individual goat tested (Table 1).

Table 1: Repeatability of goat surface temperatures within a single measurement session.Between-goat variance, within-goat error variance, coefficient of variation, proportion of variance explained by between-goat differences (repeatability: R), 95% confidence intervals for R, and p-values associated with the addition of between-goat differences versus null models excluding this effect for temperatures measured in each region of interest and descriptive measures on the fifth day of testing.

Table 2: Range of goat facial temperature measures within a single and across five sessions.Mean temperature, standard deviation, and temperature range for each goat facial region of interest and measures taken within a single session (day 5) and across all five sessions.

With respect to descriptive measures, variability was highest in minimum temperatures (larger temperature range and higher CV), which included greater variation both between and within goats, across repeated measures (Tables 1 and 2). Accordingly, minimum temperature values for the left and right eye showed lower repeatability, although it was still moderate to good, with 76.51% and 75.19% of total variation in surface temperatures, respectively, being explainable by the individual goat tested (Table 1). However, for nasal temperatures, although repeatability was slightly lower (and likely negligibly so), it was still excellent, with 99.47% of temperature variance explained by differences in surface temperature between goats. By comparison, differences in repeatability between mean and maximum temperatures were more negligible across all ROIs (confidence intervals feature substantial overlap); although repeatability was slightly higher and within-goats differences, slightly lower for the mean eye temperatures, with this especially being the case for the right eye. The opposite was true for nasal temperatures, with differences across repeated measures being slightly lower for maximum temperatures.

Precision in temperature estimates

The magnitude of standard error was consistently higher (i.e., precision was lower) for the minimum temperature measurements, meaning more replicate thermal images would be required to achieve a similar level of precision as when using mean or maximum temperatures, across all ROIs (Table 3). For example, if we were to extract the minimum temperature of a goat’s left eye from three images, we would be expected to obtain a mean temperature within ± 0.62 °C of the mean of five images 95% of the time. However, if we extracted the mean temperature in the same ROI from a single image, we would be expected to be within ± 0.26 °C of the mean of five images 95% of the time. Differences in precision between mean and maximum temperatures were less, but slightly higher for goat mean eye temperatures and the maximum temperature measured in the nose tip. The ROI and measure for which we observed the highest levels of precision was in the mean temperature of the right eye (0.231−0.103 °C), which also showed the least variability across repeated measurements (lowest error variance; Tables 1 and 3).

Table 3: Precision in goat facial temperature estimates.Precision, i.e., the standard error (°C) within which the mean of one to five replicate thermal images (P1-5) for each goat facial region of interest and measure is expected to fall in relation to the mean of five image replicates 95% of the time.

Reproducibility of temperature measurements taken over multiple sessions

Adding the random effect ‘measurement session’ increased the fit of models predicting variation in goat surface temperatures measured over five days for all combinations of ROIs and measures (all p < 0.0001; Table 4). The effect of measurement session (sessions 1 to 5) was strongest in the nose but still accounted for a substantial proportion of variation in eye temperatures. For instance, between-session effects explained around 80.83% and 85.85% of variance in mean and maximum nasal temperatures, respectively, and between 61.71% and 67.01% of variation in mean and maximum eye temperatures (between-session effects similar for both eyes).

Table 4: Reproducibility of surface temperature measurements taken over multiple sessions.Between-measurement session variance, coefficient of variation, and proportion of temperature variance associated with between-session effects (repeatability: R), 95% confidence intervals for session repeatability, p-values associated with the addition of between-session effects against models excluding it, proportional variance associated with individual-level effects only, and that associated with within-goat error variance for each region of interest and measure.

When considering across ROIs, minimum temperatures were more variable (larger temperature range; Table 2), including between measurement sessions and within a session, across repeated measures taken from the same goat (Table 4). For example, within-goat variation accounted for 17.10% of the total variation in minimum temperature of the right eye, but only 2.51% and 6.08% of that in mean and maximum temperatures in the same eye, respectively. Indeed, the proportion of unexplained variation was slightly larger for the maximum, compared to the mean temperatures across ROIs. However, although slight differences were observed between ROIs, the effect of measurement session explained a similar proportion of the total variation in the goat mean and maximum surface temperatures (confidence intervals featured substantial overlap), with the measurement session explaining less variation in minimum temperatures.

Discussion

The use of non-invasive thermal imaging technology in the fields of animal health and welfare research has increased over the last decade, but there remains a lack of consensus on how it should be best applied, including which regions and measures should be favoured when quantifying animal surface temperatures (reviews: Mota-Rojas et al., 2021; McManus et al., 2022). To address this knowledge gap, we used goats to investigate repeatability and precision of mean, maximum, and minimum eye and nasal temperatures taken from five thermal images collected in quick succession from videos (at 12-second intervals for approximately one minute). We also investigated the reproducibility of temperature estimates across five measurement sessions taking place over consecutive days. From these measurements, we assessed which combinations of ROI and temperature measure are the most reliable for quantifying differences in goat surface temperatures. Our results indicate that although replicate measurements taken from individual goats in the short term showed substantial repeatability and high precision, surface temperatures from individual goats were not readily reproducible across days.

Repeatability of goat surface temperatures within a single measurement session

Repeatability of five thermal images taken in quick succession was excellent for mean and maximum temperature across all ROIs, with between 93.50% (maximum temperature of the right eye) and 99.81% (mean nasal temperature) of variation being attributed to the individual goat. By contrast, minimum temperatures were less repeatable and more variable both between and within goats across repeated measures. Although both maximum and minimum temperatures were derived from single-pixel values, the latter were particularly vulnerable to underestimation due to factors such as debris, moisture, or small errors in ROI positioning that can produce artificially low readings (Metzner et al., 2014; Byrne et al., 2017; Uddin et al., 2020). By contrast, overestimation of surface temperature (which would affect maximum values) requires external energy input, such as direct solar radiation, which was absent in our study as testing was carried out indoors (see Jerem et al., 2018). Consequently, maximum temperatures tend to be more stable under controlled ambient conditions (cf. Jerem et al., 2015; Jerem et al., 2018). In line with similar investigations (cf. Metzner et al., 2014; Byrne et al., 2017; Uddin et al., 2020), we therefore advocate against the use of minimum temperature measures when conducting thermal imaging in goats, and more generally, in other species.

Given that the variation in mean and maximum temperatures is greater between goats than within the same goat over repeated measurements, both measures are appropriate for quantifying orbital region and nasal temperatures over a short period. However, temperature variation across repeated measures was slightly lower, and precision higher for mean over maximum eye temperatures (and especially in the right eye). This could suggest using the mean is more reliable for measuring eye temperatures, and as for nasal temperatures, given the slightly lower within-goats differences and higher precision observed in maximum temperatures, the latter may be more suitable. As mean temperatures are estimated from all pixels within a given ROI, one key consideration affecting repeatability is the consistency with which ROI boundaries are defined across thermal images (e.g., the ROI’s size, shape, and position; Cuthbertson, Tarr & González, 2019). By contrast, measuring maximum eye temperatures can often be achieved by extracting the maximum temperature of thermal images “as a whole” (e.g., Jerem et al., 2015), substantially reducing processing time. This is especially important for video analysis where a large number of measurement frames can be gathered in a short time (Hoffmann et al., 2013; Cuthbertson, Tarr & González, 2019). However, maximum temperatures like the minimum, are calculated from the value of a single pixel, so they are more sensitive to measurement inaccuracies than the mean. According to manufacturer specifications, temperatures measured by the camera used for the current investigation are expected to be within 2 °C of an object’s genuine surface temperature. These issues, to an extent, can be overcome by ensuring the highest quality thermal images, camera calibration to ambient conditions (Kim & Cho, 2021), and/ or through using data processing techniques to smooth raw temperature measures and reduce the influence of outliers (e.g., Cuthbertson, Tarr & González, 2019 using video footage). Although the current research can make recommendations regarding the suitability of mean and maximum temperatures, researchers must carefully weigh the advantages and disadvantages of each when deciding which measure best suits their needs.

When comparing ROIs, we found that the between-goat differences against total variation were greater for nasal temperatures than for either eye. However, given differences in the suitability of methods used to process and analyse nasal and eye temperature data, whether observed differences in repeatability are genuine is difficult to verify. Specifically, the repeatability estimates for temperatures of the nose tip may be less precise than those measured in the eyes (Schielzeth et al., 2020; Knief & Forstmeier, 2021), and the more stringent removal of outliers in the former likely boosted repeatability. What can be concluded, however, is that nasal temperatures were far more variable between goats, with eye temperatures being restricted to a narrower range of values.

Precision in temperature estimates

We found that minimum temperatures measured from one image can differ by as much as 1.07 °C from our gold standard, i.e., the average of five images, limiting its efficacy for making even broad distinctions, such as between stressed versus unstressed, or sick versus healthy goats. To achieve a similar precision in minimum temperatures as was found for mean and maximum temperatures, more images would be required. By contrast, a single image may be sufficient (although not recommended) to measure large temperature changes when using the mean and maximum measures. We were able to attain a maximum precision of ± 0.10 °C from five images using the mean temperature of the right eye. Given this region and measure also showed lower variability across repeated measures, this may highlight the suitability of this combination when measuring goat surface temperatures over a single session. More broadly, mean temperatures were slightly more precise when measuring goat eye temperatures, with maximum values performing slightly better for nasal temperatures. Goats have been reported to show an increase in eye temperature of approximately 1.1 °C following exposure to a stressor (Bartolomé et al., 2019) and differences of 1–2.2 °C in rectal temperature between febrile and non-febrile animals (Van Miert et al., 1984). Considering these effect sizes, the level of precision achieved for mean and maximum temperature measures should be sufficient to detect biologically meaningful changes in thermal responses to stress or disease. By contrast, the lower precision of minimum temperatures would limit their utility for detecting such subtle physiological changes, further supporting our recommendation to avoid using minimum temperature metrics in thermal imaging analyses. Although changes in core temperature may not be perfectly mirrored in peripheral regions, to better detect fever, emotional experiences, and other physiological processes uncertainty in surface temperature estimates must be reduced. Through averaging the temperatures across multiple thermal images collected in quick succession (assuming temperature is approximately stable over time), standard error will decrease proportionally with an increasing number of measurement frames, thereby increasing precision (Byrne et al., 2017). Nonetheless, this improvement eventually plateaus, and beyond a certain number of frames, additional measurements yield negligible gains in precision (cf. Jerem et al., 2015). Obtaining a large number of measurement frames in a short time window is feasible when collecting thermal imaging videos (Hoffmann et al., 2013; Cuthbertson, Tarr & González, 2019), but device memory and RAM capacity, as well as time needed to process video frames, can still limit the number of replicate measures it is practical to take. Nevertheless, the observed precision gained through measuring mean and maximum temperatures repeatedly over multiple thermal images may enable future researchers to effectively measure subtle surface temperature changes associated with less arousing emotional experiences (Proctor & Carder, 2015; Tamioso et al., 2017) and earlier stages of diseases in goats.

Reproducibility of temperature measurements taken over multiple sessions

We investigated the reproducibility of goat surface temperatures measured over five sessions on consecutive days. Across all regions, minimum temperatures showed the greatest variability and therefore the lowest reproducibility. This variation also included components not explained by between-goat differences or identifiable session effects, suggesting that minimum temperatures remain particularly vulnerable to noise arising from unknown sources (e.g., random measurement errors or subtle inconsistencies in ROI placement). Taken together, our findings highlight the limited suitability of minimum temperatures for thermal imaging applications, both in the short and longer term.

Unlike eye temperatures, which varied within a relatively narrow range, goat surface temperatures measured in the nose tip were highly variable. When measurements were taken across five days, a large proportion of the variation in the nasal temperatures could be attributed to between-session effects (74.61–85.85%), indicating that the day or context of measurement influenced surface temperatures. Although part of this pattern could reflect the stricter removal of outliers in nasal temperature data, it is also likely that the region is inherently sensitive to environmental and physiological conditions present at the time of imaging. Previous research has shown that nasal temperatures in livestock and other animals fluctuate with ambient temperature, humidity, subtle changes in posture, and emotional arousal (Church et al., 2014; Proctor & Carder, 2015; Proctor & Carder, 2016; Ijichi et al., 2020). By contrast, given the proximity of the orbital region to the brain and its ample blood supply, eye temperatures have a stronger association with core body temperatures compared to other peripheral regions (George et al., 2014; De Ruediger et al., 2018; Bleul, Hässig & Kluser, 2021; Kim & Cho, 2021), including in goats (preliminary study, eye-rectal temperature: r = 0.956; Marques et al., 2021). Although lower than for nasal temperatures, we found that a substantial proportion of variation in eye temperatures could be attributed to differences in surface temperatures between sessions (between 49.59–67.01%). While some of this variation could be associated with shifts in core temperature (in relation to, e.g., circadian rhythm: Giannetto et al., 2020) and factors relevant to imaging, the potential influence of internal or affective states of the animals cannot be disregarded (cf. Beausoleil, Stafford & Mellor, 2004; Proctor & Carder, 2015; Proctor & Carder, 2016; Ermatinger, Brügger & Burkart, 2019; Lees et al., 2020; Uddin, McNeill & Phillips, 2023; Jao et al., 2025).

The high variability in nose tip temperatures between goats limits the effectiveness of this ROI for detecting meaningful differences. Consequently, larger expected effect sizes or sample sizes will be required to reliably distinguish between treatment groups (Ledolter & Kardon, 2020). However, pre-existing variation in peripheral temperatures inherent to a sample of goats can, to an extent, be controlled for by focusing measures at the individual level. Indeed, for such a purpose, the sensitivity of a particular region to various external and internal parameters can be an asset. For example, temperatures in a chicken’s comb and wattle (which play key roles in thermoregulation), unlike eye temperatures, changed with stressor intensity, enabling finer-grained measurements of emotional responses (Herborn et al., 2015). Similarly, in ewes, more pronounced changes in temperature were observed in the muzzle relative to the eyes, making the former a more practical region for detecting ovulation (De Freitas et al., 2018). As well as across repeated measurements taken from a single subject, temperatures measured in the eye, especially, have been used to compare among groups of animals, in relation to, for example, exogenous factors, like breed and sex (Jansson et al., 2021), and screening febrile from non-febrile animals (e.g., Schaefer et al., 2007; De Diego et al., 2013; Bleul, Hässig & Kluser, 2021). However, given the importance of imaging conditions on the eye, as well as nasal temperatures, it suggests such comparisons should be made with caution.

Our results suggest a high repeatability in surface temperatures measured from a single subject in the short-term (within one session) where conditions were consistent, so intuitively to compare among goats it may be better to test multiple animals under similar conditions (e.g., through testing subjects in quick succession) and preferably over multiple sessions (Uddin et al., 2020). Tighter control over environmental conditions should not only be recommended for between-subjects designs, with one investigation finding that an animal’s baseline temperature influenced the magnitude of subsequent skin temperature changes following exposure to a stressor (Herborn et al., 2015). For practical reasons, we imaged goats at a distance of just over two metres and at 90° to the nasal plane (frontal view). However, as the distance between the camera and the target object increases, intervening gases absorb a greater proportion of the radiant heat emitted by that object, so less is detected (Okada, Takemura & Sato, 2013). As goats were imaged at a distance exceeding the often recommended value for thermal imaging research (≈ 1m), and at a probably less than optimal angle (recommended angle: 90° to the sagittal plane), measured surface temperatures would likely have been lower than actual temperatures, and less accurate (Okada, Takemura & Sato, 2013; Church et al., 2014; Jorquera-Chavez et al., 2019; Ijichi et al., 2020). Moreover, as subjects were free to move, the exact distance and angle between camera and subject varied between goats and across repeated measurements, introducing a less systematic source of variation into temperature estimates. Additional considerations include how best to define eye temperature, in particular. In the eye, the posterior border of the eyelid and lacrimal caruncle are especially known to be richly supplied by a dense array of capillary beds (Kim & Cho, 2021; Mota-Rojas et al., 2021). Indeed, studies analysing temperature patterns across distinct anatomical structures of the orbital region (e.g., medial canthus and lacrimal sac) have shown that specific regions of interest (ROIs) exhibit stronger correlations with rectal temperature than measurements taken over the entire eye (Kim & Cho, 2021; Shu et al., 2022). In line with this, the consistently observed differences in precision between temperatures measured in the right and left eye may also have been influenced by small positional (e.g., individual posture) or illumination differences, rather than by physiological asymmetry (but see Baruzzi et al., 2018). In summary, imaging conditions, as well as a subject’s internal affective state may explain the observed variation in our findings and should be carefully considered in future investigations to enhance reproducibility.

Conclusion

We found that the mean and maximum surface temperatures measured in the eyes and nose tips of goats were highly repeatable, at least in the short term. In addition, these temperature measures showed high levels of precision, with a single image potentially being enough to make broad distinctions, such as sick from healthy, or stressed from unstressed animals; although using more than one image is recommended to enhance precision. However, given the strong influence of measurement sessions, goat surface temperatures were not readily comparable across days, highlighting the importance of ambient imaging conditions, as well as a subject’s internal affective state on temperature estimates. Moreover, the external validity of our findings may be limited by species specificity and the relatively short-term nature of the measurements. Therefore, researchers using thermal imaging in small ruminants should consider focusing measurements at the individual level, and/or further refining the methodology used here (e.g., using a more optimal measurement distance and angle), as well as exerting tighter control over ambient conditions and perhaps using a more precise ROI (e.g., localised to a specific orbital region). Given the non-invasive nature of thermal imaging and the importance of animal body surface temperatures as indicators of animal health and welfare, investigations like ours are becoming increasingly important to identify approaches that effectively exploit this technology to its fullest potential. Therefore, an appropriate use of the thermal imaging technique holds considerable value in the fields of livestock production systems, precision livestock farming, transportation of animals, animal shelters, and behaviour and cognitive research settings.

Supplemental Information

10.7717/peerj.20861/supp-1Supplemental Information 1Description of study subjectsSubject name, sex, breed, age, number of years at the study site, group membership and week tested (1, 2 or 3).

10.7717/peerj.20861/supp-2Supplemental Information 2R-Script providing step-by-step analytical procedure used in the research

10.7717/peerj.20861/supp-3Supplemental Information 3Data generated from the research

10.7717/peerj.20861/supp-4Supplemental Information 4ARRIVE Checklist

Bibliography89

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Ammer K 2008 The glamorgan protocol for recording and evaluation of thermal images of the human body Thermology International 18125144
2ASAB Ethical Committee/ABS Animal Care Committee 2024 Guidelines for the ethical treatment of nonhuman animals in behavioural research and teaching Animal Behaviour 207IXI 10.1016/S 0003-3472(23)00317-2 · doi ↗
3Baciadonna L Briefer EF Favaro L Mc Elligott AG 2019 Goats distinguish between positive and negative emotion-linked vocalisations Frontiers in Zoology 1612510.1186/s 12983-019-0323-z 31320917 PMC 6617626 · doi ↗ · pubmed ↗
4Bartlett JW Frost C 2008 Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables Ultrasound in Obstetrics and Gynaecology 3146647510.1002/uog.525618306169 · doi ↗ · pubmed ↗
5BartoloméE Azcona F Cañete Aranda M Perdomo-González DI Ribes-Pons J Terán EM 2019 Testing eye temperature assessed with infrared thermography to evaluate stress in meat goats raised in a semi-intensive farming system: a pilot study Archives Animal Breeding 6219920410.5194/aab-62-199-201931807630 PMC 6852872 · doi ↗ · pubmed ↗
6BartoloméE Sánchez MJ Molina A Schaefer AL Cervantes I Valera M 2013 Using eye temperature and heart rate for stress assessment in young horses competing in jumping competitions and its possible influence on sport performance Animal 72044205310.1017/S 175173111300162624067493 · doi ↗ · pubmed ↗
7Baruzzi C Nawroth C Mc Elligott GA Baciadonna L 2018 Motor asymmetry in goats during a stepping task Laterality: Asymmetries of Body, Brain and Cognition 23559960910.1080/1357650 X.2017.141799329258375 · doi ↗ · pubmed ↗
8Beausoleil NJ Stafford KJ Mellor DJ 2004 Can we use change in core body temperature to evaluate stress in sheep?Proceedings of the New Zealand Society of Animal Production 647276