Balancing Objectivity and Welfare: Physiological and Behavioural Responses of Guide Dogs During an Independent Certification Protocol

Viola Faerber-Morak; Lisa-Maria Glenk; Karl Weissenbacher; Annika Bremhorst

PMC · DOI:10.3390/ani15131896 · June 26, 2025

TL;DR

This study examines if the Austrian certification process for guide dogs causes stress, finding that it does not significantly affect their welfare.

Contribution

The study evaluates whether Austria’s two-phase certification protocol for guide dogs balances objective assessment with animal well-being.

Findings

01

Cortisol levels did not significantly differ between the two evaluation phases.

02

Dogs turned around more often in Phase 2, possibly seeking reassurance, and tended to show fewer stress-related behaviors over the first 15 min (p = 0.058, not statistically significant).

03

Verbal praise from the unfamiliar tester may have helped reduce stress.

Abstract

Guide dogs support blind and visually impaired individuals by enabling safe, independent mobility. Austria is the first country to legally mandate that each guide dog be certified by an independent authority. This certification includes a two-phase evaluation: in Phase 1, the dog guides its familiar trainer; in Phase 2, it guides an unfamiliar blind tester. While Phase 2 ensures an objective assessment of guiding performance, it may also place considerable stress on the dog, potentially affecting welfare and performance. This study evaluated whether Phase 2 induces elevated stress in dogs and whether the protocol requires refinement by comparing the dogs’ responses in the two phases. The data were collected during a real guide dog evaluation. We measured salivary cortisol levels before the evaluation day and at several time points on the evaluation day (before and after each phase). We…

Funding

Federal Ministry Republic of Austria, Social Affairs, Health, Care and Consumer Protection

Keywords

guide dogs; working dogs; independent evaluation; stress assessment; salivary cortisol; behaviour analysis; task performance; certification protocols; dog welfare; blind handlers

Full text

1. Introduction

Guide dogs play an essential role in assisting blind or visually impaired individuals by enhancing their mobility, independence, and social inclusion [1,2]. Beyond their functional role in navigation, guide dogs offer emotional support and companionship and are often considered integral members of the family [3,4]. To ensure that guide dogs are both behaviourally capable and emotionally suited to their demanding role, certification procedures are widely used. Yet, little scientific attention has been paid to the potential stress such procedures may cause for the dogs themselves. Certification procedures with familiar and unfamiliar handlers confirm the ability of working dogs to perform a skill, regardless of who provides the cues. The certification process is not merely a formality but a safeguard ensuring that only dogs with the appropriate behavioural and emotional resilience are matched with individuals who depend on them daily. Research suggests that the perceived compatibility between guide dogs and visually impaired owners is marked by an overall high level of satisfaction [5], but nonetheless more than a third of guide dogs are returned to training organisations before retirement [6].

Despite their critical contributions, the regulation and certification processes for guide dogs vary significantly across European countries. This lack of consistency in standards—particularly regarding access rights and certification criteria—poses challenges for guide dog handlers, complicates the independence of the human–dog dyad [7], and raises concerns about the absence of unified, independent statutory inspections to evaluate the dogs’ capabilities and welfare.

As the demand for guide dogs increases, so does the need for standardised evaluation protocols. Currently, most inspection and qualification criteria are defined by individual guide dog training centres and associations, which can lead to variability and potential bias. By contrast, independent testing offers a more objective and transparent framework for assessing a dog’s health, task performance, environmental safety, and suitability for its role. Such evaluations increase the likelihood that guide dogs meet the highest quality standards, fostering trust in the certification process and safeguarding both the welfare of the dogs and the needs of their handlers.

Austria stands out as a pioneering country in the certification and welfare of guide dogs. In 1999, it became the first country to enact a national law (Section 39a Federal Disability Act) that formally defines the requirements, legal status, and qualifications for guide dogs. This legislation also established an independent audit centre to evaluate assistance dogs, ensuring that certified guide dogs meet rigorous standards of health, behaviour, and training. Since then, over 100 guide dogs have been certified in Austria, with around 30 examinations conducted annually.

The Austrian certification process consists of two main steps. First, a quality evaluation assesses the dog’s individual performance and training while it works with the familiar trainer who prepared it for its working role and with an unfamiliar blind tester. Given the effort and cost involved in raising and training guide dogs, the unfamiliar blind tester is a necessary part of the current protocol for identifying successful canine candidates. Only dogs that pass this evaluation proceed to the second step: a team assessment conducted after the dog has spent several weeks to months with its future handler, who is blind or visually impaired. This process ensures that both the dog’s capabilities and the human–dog partnership are carefully evaluated, offering a model that balances high performance expectations with a commitment to animal welfare.

The quality evaluation itself is conducted in a naturalistic setting and includes two test phases. In Phase 1, the dog guides its familiar trainer along a public route. In Phase 2, the dog guides an unfamiliar blind tester, and the trainer is required to leave the testing environment entirely to eliminate any potential visual, olfactory, or auditory cues. This separation is intended to ensure that the dog’s behaviour is not influenced by the presence of the familiar person, allowing an independent blind or visually impaired evaluator to assess the dog’s performance objectively under unfamiliar conditions. However, while the unfamiliar handler condition is crucial for evaluating the dog’s ability to perform independently of its trainer, it may place additional emotional demands on the dog and raises questions about potential stress and its impact on performance.

Efforts within the Austrian certification protocol have been made to minimise stress during the examination process, as stress can compromise both guide dog welfare and the validity of performance assessments. This study aimed to evaluate whether the current two-phase Austrian protocol strikes an appropriate balance between objective evaluation and the emotional well-being of the dogs. To address this, we assessed both physiological and behavioural indicators of stress and performance across the two test phases (guiding a familiar trainer and guiding an unfamiliar blind tester). In addition, we considered the behaviours of the human partners (trainer and tester), as these may influence dog behaviour. To gain a nuanced understanding of both immediate and evolving responses, we analysed the first 5 min and the first 15 min of each test phase separately.

Because dogs cannot verbally express internal states like stress, we must rely on measurable proxies. Salivary cortisol is widely used in dog welfare research as a non-invasive proxy for hypothalamic–pituitary–adrenal (HPA) axis activation, as it has been associated with elevated arousal in dogs, for instance during separation from their handler (as described in [8]). However, the reliability of salivary cortisol in pets and working dogs has recently been called into question. A growing body of evidence highlights substantial intra- and inter-individual variability as well as limited correlation with serum levels, particularly under real-world conditions [8,9]. To strengthen the assessment of canine stress, we combined the physiological measurements with behavioural observations. Behavioural indicators such as lip licking, yawning, or body shaking are commonly used markers of stress and emotional arousal in dogs [10,11,12]. In addition, task-related performance behaviours (such as the ability to follow verbal cues by the handler, collaborate, and guide effectively) provide insight into how dogs cope with the cognitive and emotional demands of the evaluation. The behaviour of the human partner may further moderate the dog’s experience, either buffering or amplifying stress.

This multi-modal, context-sensitive approach aims to provide a comprehensive picture of dogs’ emotional and behavioural responses during certification. We hypothesised that dogs would show lower stress levels—both physiologically (as reflected in salivary cortisol) and behaviourally (as reflected in stress-related behaviours)—as well as more stable task-related performance when guiding their familiar trainer compared with an unfamiliar blind tester.

By systematically comparing physiological and behavioural data across the two phases of the Austrian evaluation, this study provides an evidence-based assessment of whether the current protocol appropriately balances welfare considerations with the need for independent, objective certification. The findings are intended to inform refinements of guide dog assessment procedures and contribute to internationally applicable standards that prioritise both functional performance and animal well-being.

2. Materials and Methods

2.1. Ethical Considerations

The procedures of this study were approved by the Ethics and Welfare Committee of the University of Veterinary Medicine Vienna, in accordance with GSP guidelines and national legislation (reference number ETK-11/03/2016). Written informed consent was obtained from all dog owners prior to participation.

2.2. Subjects

Fourteen guide dogs participated in this study during their real quality evaluation phase of the Austrian certification process for official recognition as guide dogs. The sample included six breeds: 10 Retrievers, 2 Collies, 1 Poodle, and 1 Labrador crossbreed. All dogs were neutered (8 females, 6 males) and ranged in age from 18 to 45 months (mean = 26.3 months, median = 24.5 months; also see Table 1). Each dog was owned by their respective guide dog trainer and had successfully passed comprehensive veterinary health screenings prior to the evaluation, including radiographic assessments of hips, shoulders, and elbows.

2.3. Procedures

Data collection was conducted during the formal guide dog evaluation as part of the Austrian certification process, taking place under real-life conditions. Each evaluation involved up to six individuals, including the guide dog trainer, an examination committee (comprising a blind or visually impaired tester, an expert in dog behaviour and training, and an examination chair), a mobility trainer, and the experimenter.

The evaluation protocol started at the Messerli Research Institute, located on the campus of the University of Veterinary Medicine in Vienna, Austria. The evaluation consisted of two test phases:

Phase 1 (Trainer Test): In the first 45 min session, the guide dog trainer, blindfolded to simulate visual impairment, was guided by the dog along a predefined public route, beginning on the campus (including a brief obedience task) and continuing through urban environments.
Break: Following Phase 1, a break of at least 30 min was provided in a calm, park-like setting to allow the dog to rest, drink water, and gradually become acquainted with the blind tester.
Phase 2 (Tester Test): In the second 45 min session, the dog was tasked with guiding the unfamiliar blind tester back to campus along a route that differed from the one used in Phase 1. To ensure the independence of the evaluation, the trainer was instructed to leave the scene entirely (out of sight, hearing, and smell range of the dog) unless the dog refused to proceed with the unfamiliar person. In that case (observed in one dog, Armani), the trainer followed the dog and tester at close distance to provide support without interfering with the test. This phase also included navigating various forms of public transport (e.g., tram, bus, and subway), further reflecting the realistic challenges of everyday life.

2.4. Data Collection and Analysis

2.4.1. Physiological Measure: Salivary Cortisol

Saliva samples were collected for cortisol measurement, and cortisol concentrations were determined at the Institute of Medical Biochemistry at the University of Veterinary Medicine in Vienna using a validated enzyme immunoassay [13] that has been used in several previous studies [14,15,16]. All saliva samples (except two) were taken by the guide dog trainer in less than 4 min to prevent the procedure from causing stress to the dog [17]. The guide dog trainers were trained to collect saliva samples using standardised protocols, including detailed written instructions and demonstration images. To minimise contamination, the trainers wore gloves and used cotton swabs (Salivette®, Sarstedt, Biedermannsdorf, Austria), which were gently rubbed inside the dog’s cheek for approximately 60 s. If insufficient saliva was produced, the trainers used olfactory stimulation (e.g., holding a food reward) to encourage salivation. The samples were stored in iceboxes during the evaluations and later frozen at −20 °C for further analysis.

Sampling Schedule

Saliva samples were collected at multiple predefined sampling time points on the day before the evaluation (pre-evaluation day) and on the day of the evaluation (see Figure 1).

Pre-evaluation day sampling time points (PES1–PES3):

PES1: Morning, immediately after waking, before feeding or exercise.
PES2: Noon, approximately six hours after PES1.
PES3: Evening, approximately six hours after PES2.

Evaluation day sampling time points (ES4–ES9):

ES4: Morning, immediately after waking, before feeding or exercise.
ES5: Start of Phase 1 (Trainer Test).
ES6: End of Phase 1 (Trainer Test).
ES7: Start of Phase 2 (Tester Test).
ES8: End of Phase 2 (Tester Test).
ES9: Approximately 30 min after ES8.

To align with the approach of Glenk and co-authors [18], the sampling time points PES1, PES2, and ES4 were selected to represent baseline cortisol levels in the home environment across different times of day. Due to the potential influence of circadian rhythms on cortisol secretion, the evening sample (PES3) was excluded from the baseline calculation to avoid potential confounding effects. Instead, a baseline value was calculated by averaging the salivary cortisol concentrations from PES1, PES2, and ES4.
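The baseline rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `baseline_cortisol`, the dictionary layout, and the example values are hypothetical and are not taken from the study data.

```python
from statistics import mean

# Sampling points that enter the home baseline according to the text.
# PES3 (evening) is deliberately left out.
BASELINE_POINTS = ("PES1", "PES2", "ES4")

def baseline_cortisol(samples):
    """Average the salivary cortisol values (ng/mL) at the baseline time points."""
    return mean(samples[point] for point in BASELINE_POINTS)

# Hypothetical values for one dog, in ng/mL (not study data).
example_dog = {"PES1": 3.0, "PES2": 2.0, "PES3": 1.2, "ES4": 2.5}
example_baseline = baseline_cortisol(example_dog)
print(example_baseline)
```

The evening value is stored in the example record but never read, which mirrors the exclusion of PES3 from the baseline.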

The sampling schedule was designed to capture the time course of cortisol in relation to the two test phases of the guide dog evaluation (approximately 45 min each). Given that cortisol typically peaks 20–30 min after a stressful event [19], the timing of samples was selected to detect potential physiological responses relevant to the study aims.

Analysis

Descriptive statistics (means and standard deviations) were calculated for six salivary cortisol measures: the computed baseline (average of PES1, PES2, and ES4) and the five individual post-baseline time points (ES5, ES6, ES7, ES8, and ES9). The Kolmogorov–Smirnov test was used to assess the normality of the data, and log transformations were applied where necessary to improve normality. To evaluate differences in cortisol concentrations across the six measurement points, a repeated measures ANOVA was conducted using the log-transformed values. Corrections for potential violations of sphericity (Greenhouse–Geisser and Huynh–Feldt) were applied to ensure the robustness of the analysis. Post hoc pairwise comparisons were conducted with adjustments for multiple testing to identify specific differences between time points. To further investigate potential time-related effects, multivariate tests were conducted to assess the combined effects of time on the dependent variables across all sampling points.
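As a rough illustration of the repeated measures ANOVA step, the following sketch computes the F statistic for a one-way within-subject design by hand. It is not the authors' analysis script: the function name `rm_anova`, the use of base-10 logarithms, and the example values are assumptions made for illustration, and the sphericity corrections and p-values mentioned above are not reproduced here (a p-value would additionally need the F distribution from a statistics library).

```python
import math

def rm_anova(data):
    """One-way repeated measures ANOVA.

    data: one row per subject, one column per measurement point,
          with no missing values.
    Returns (F, df_time, df_error).
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # number of measurement points
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    # Partition the total sum of squares into time, subject, and error parts.
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_time - ss_subjects

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error

# Hypothetical raw cortisol values in ng/mL (rows: dogs, columns: time points).
raw = [
    [2.4, 1.9, 1.6, 2.2],
    [3.1, 2.0, 2.6, 2.9],
    [1.2, 1.4, 0.9, 1.5],
    [2.8, 2.1, 1.7, 2.0],
    [1.9, 1.1, 1.3, 2.3],
]
# Log-transform before the ANOVA; the base (10) is an assumption.
log_values = [[math.log10(value) for value in row] for row in raw]
f_stat, df_time, df_error = rm_anova(log_values)
print(f"F({df_time},{df_error}) = {f_stat:.3f}")
```

With six measurement points the error degrees of freedom are 5(n − 1), so the F(5,60) reported in Section 3.1.2 would be consistent with 13 dogs contributing complete data.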

2.4.2. Behavioural Measures

Sampling

The behaviour of the dogs during both test phases (Trainer and Tester) was recorded using a GoPro Hero 4 camera (GoPro, San Mateo, CA, USA) mounted on the upper body of the person being guided (either the trainer or the blind tester). Video recordings were subsequently analysed using Solomon Coder Beta 17.03.22 (Copyright András Péter).

Video coding began at the moment the human handler—either the trainer or the tester—issued the first guiding cue to the dog, initiating movement. For each phase, the first 15 min of active guiding were coded to capture both immediate and sustained behavioural responses.

Behavioural Variables

We analysed three categories of behaviours: stress-related behaviours, task-related performance behaviours, and handler behaviours. All behaviours were coded in terms of their frequency of occurrence during the observation periods.

Stress-related behaviours: Six stress-related behaviours were selected based on the previous literature [10,11,12,17,18,20,21,22,23]. These behaviours included lip licking, shaking, smacking, scratching, stretching, and yawning (see Table 2). To quantify overall behavioural stress responses, a Standardised Stress Score (SSScore) was created from these variables. Due to its very low frequency, stretching was excluded from the final score.

To compute the SSScore, the frequency of each behaviour was first standardised using z-scores. Standardisation involved subtracting the mean frequency of a behaviour from the individual scores and dividing by the standard deviation. This approach adjusted for variance among behaviours and normalised the contribution of each behaviour, ensuring they were comparable on the same scale. The standardised z-scores were then combined to produce the SSScore, preventing any single behaviour from disproportionately influencing the measure due to differences in scale or frequency distribution. The final SSScore was calculated as a mean index of the five standardised stress behaviours (lip licking, shaking, smacking, scratching, and yawning).
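The SSScore computation described above can be sketched as follows. The text does not state whether the sample or population standard deviation was used, over which set of observations the behaviours were standardised, or how a behaviour without any variation was handled, so the choices below (sample SD, pooling all observations, a zero contribution for constant behaviours) are assumptions. The names `z_scores` and `ss_scores` and all counts are hypothetical.

```python
from statistics import mean, stdev

# The five behaviours retained in the score (stretching was excluded).
STRESS_BEHAVIOURS = ("lip_licking", "shaking", "smacking", "scratching", "yawning")

def z_scores(values):
    """Standardise a list of frequencies: (value - mean) / standard deviation."""
    centre = mean(values)
    spread = stdev(values)  # sample SD (n - 1); an assumption
    if spread == 0:
        # Assumption: a behaviour with no variation contributes 0 to every score.
        return [0.0 for _ in values]
    return [(value - centre) / spread for value in values]

def ss_scores(observations):
    """Return one SSScore per observation (mean of the five z-scores)."""
    standardised = {
        behaviour: z_scores([obs[behaviour] for obs in observations])
        for behaviour in STRESS_BEHAVIOURS
    }
    return [
        mean(standardised[behaviour][i] for behaviour in STRESS_BEHAVIOURS)
        for i in range(len(observations))
    ]

# Hypothetical frequency counts for three observations (not study data).
observations = [
    {"lip_licking": 4, "shaking": 0, "smacking": 1, "scratching": 0, "yawning": 2},
    {"lip_licking": 9, "shaking": 1, "smacking": 3, "scratching": 0, "yawning": 0},
    {"lip_licking": 2, "shaking": 2, "smacking": 2, "scratching": 0, "yawning": 1},
]
scores = ss_scores(observations)
print([round(score, 3) for score in scores])
```

Because every behaviour is centred before averaging, the scores sum to zero across the observations that were standardised together; a positive SSScore therefore means more stress-related behaviour than the average observation, not an absolute level.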

Task-related performance behaviours: These behaviours reflect key aspects of the dog’s task performance and included the variables Refuse signal and Turn around (Table 2). They were chosen for their relevance to the dog’s collaboration and guidance abilities, and because they were reliably identifiable from the video perspective. Behaviour definitions were discussed and agreed upon in advance to ensure consistent coding.

Handler behaviours: Two variables were coded to assess human behaviour during the test: Praise (verbal reinforcement) and Treat (food-based reward; see Table 2). These variables provided context for the dog’s behavioural responses and were considered potential moderating factors in stress or performance.

Statistical Analysis

Reliability analysis: To ensure the reliability of the behavioural coding, intraclass correlation coefficients (ICCs) were calculated for each coded behaviour. This allowed us to assess the consistency of ratings provided by two independent observers (the second observer coded a subsample of videos from four randomly selected subjects) and to confirm the robustness of the behavioural data.
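For readers unfamiliar with ICCs, the sketch below computes one common variant, ICC(2,1) (two-way random effects, absolute agreement, single rater), from a table of counts. The text does not state which ICC form was used, so this choice is an assumption, and the function name `icc_2_1` and the example counts are hypothetical. The sketch also shows why an ICC is undefined when both observers give identical, constant ratings.

```python
import math

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per coded video, one column per observer.
    Returns None when the coefficient is undefined (no variance at all).
    """
    n = len(ratings)       # number of rated videos
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        return None  # uniform ratings: 0/0, so the ICC cannot be computed
    return (ms_rows - ms_error) / denominator

# Hypothetical yawning counts from four videos, coded by two observers.
identical_counts = [[0, 0], [2, 2], [1, 1], [3, 3]]
never_observed = [[0, 0], [0, 0], [0, 0], [0, 0]]
print(icc_2_1(identical_counts), icc_2_1(never_observed))
```

The second example corresponds to a behaviour that neither observer ever recorded: the observers agree completely, yet there is no variance from which a coefficient could be estimated.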

Modelling behavioural data: To gain a nuanced understanding of both the immediate and evolving behavioural responses, we analysed the first 5 and 15 min of each test phase (Trainer and Tester) separately. These two timeframes allowed us to distinguish between the initial reactions to the testing situation and behavioural patterns that emerged or changed over time.

We used linear mixed models to investigate the effects of key predictors on the behavioural responses of the dogs. The dependent variables included the stress-related behaviours (SSScore), the task-related performance behaviours (Refuse signal and Turn around), and the handler behaviours (Praise and Treat). The fixed effects included test phase (Phase 1—Trainer; Phase 2—Tester), dog sex (male or female), and age (in months). To account for repeated measures and inter-individual variability, each dog was included as a random effect in the models. Where necessary, transformations were applied to the behavioural data to meet the assumptions of normality required for linear modelling.
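One way to write down the model described above, assuming that "each dog as a random effect" means a random intercept per dog (the text does not specify the random-effects structure further), is the following. The symbols are introduced here for illustration only.

$$
y_{ij} = \beta_0 + \beta_1\,\mathrm{phase}_{ij} + \beta_2\,\mathrm{sex}_i + \beta_3\,\mathrm{age}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here $y_{ij}$ is the (possibly transformed) behavioural measure of dog $i$ in phase $j$, $\beta_1$ captures the difference between Phase 2 (Tester) and Phase 1 (Trainer), and $u_i$ absorbs stable differences between individual dogs, so that the phase effect is estimated within dogs.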

3. Results

3.1. Physiological Parameter: Salivary Cortisol

3.1.1. Preliminary Analyses

Descriptive Statistics

Mean raw cortisol concentrations ranged from 1.879 ng/mL (SD = 1.712) at ES8 to 2.736 ng/mL (SD = 1.846) for the baseline. To meet assumptions of normality, logarithmic transformations were applied. After transformation, the means of the log-transformed values ranged from 0.0043 (SD = 0.613) at ES7 to 0.2452 (SD = 0.341) at ES9. Data completeness was 100% for all time points except ES6 (one sample missing for dog Maya).

Normality Testing

The Kolmogorov–Smirnov tests indicated that the raw baseline values did not deviate significantly from normality (p = 0.200), whereas significant deviations were found for ES5, ES6, ES7, ES8, and ES9 (p < 0.05). Log transformation improved normality for most variables (p > 0.05), and the log-transformed data were therefore used for further analyses, even though the log-transformed baseline deviated from normality (p = 0.008).

3.1.2. Inferential Analyses

Repeated Measures ANOVA

To assess the changes in cortisol over time, a repeated measures ANOVA was conducted using the six log-transformed cortisol values (baseline, ES5–ES9). The analysis revealed no significant overall effect of time (F(5,60) = 0.811, p = 0.546). This result remained non-significant when applying corrections for sphericity violations (Greenhouse–Geisser: p = 0.494; Huynh–Feldt: p = 0.523).

Pairwise Comparisons Between Time Points

Pairwise comparisons between the six log-transformed cortisol values showed no statistically significant differences (all p > 0.05; Figure 2). The largest mean difference was observed between the baseline and ES7 (−0.272, p = 0.229), while the comparison between ES7 and ES9 yielded the lowest p-value (mean difference = −0.250, p = 0.112).

Cortisol Profile

Although statistical differences were absent, the cortisol profile (Figure 2, see Supplementary Materials) showed a descriptive pattern: levels appeared to decrease during the test with the familiar trainer and rise again during and after the test with the unfamiliar blind tester. These trends suggest individual variability but do not indicate a consistent or significant physiological stress response to the test conditions.

3.2. Behavioural Parameters

3.2.1. Reliability Analysis

Overall, the ICC values indicated excellent inter-rater reliability across most behavioural variables. Perfect agreement (ICC = 1.00) was observed for several behaviours, including shaking, scratching, and yawning. While some ICC values could not be computed due to uniform ratings (resulting in undefined variance), all available ICCs exceeded acceptable thresholds for reliability. Table 3 provides a detailed overview of the ICCs, confidence intervals, F-test values, and significance levels for each behaviour.

3.2.2. Behavioural Data Analysis: Early Response (First 5 min)

Stress-related behaviours: The Standardised Stress Score (SSScore), which quantifies stress-related behaviours, did not differ significantly between the two test phases (Trainer vs. Tester). Similarly, no significant effects were found for sex or age (see Table 4 and Table 6).

Task-related performance behaviours: The frequency of the log-transformed “Turn around” behaviour was significantly higher in Phase 2 (Tester) compared with Phase 1 (Trainer; p = 0.0251; see Supplementary Materials), suggesting that dogs were more likely to look back when guiding an unfamiliar person. A marginally significant effect of sex was observed (p = 0.0561), with male dogs tending to perform this behaviour more frequently than females. Age did not significantly influence this variable. In contrast, the frequency of the log-transformed “Refuse signal” behaviour was not significantly affected by test phase, sex, or age (Table 4 and Table 6).

Handler behaviours: Neither of the two handler-related behaviours—“Praise” and the log-transformed “Treat”—showed significant effects of test phase, sex, or age (Table 4 and Table 6; see Supplementary Materials).

3.2.3. Behavioural Data Analysis: Extended Response (First 15 min)

Stress-related behaviours: The SSScore showed a marginal effect of test phase (p = 0.0576), with dogs displaying slightly lower stress scores in Phase 2 (Tester) compared with Phase 1 (Trainer). No significant effects of sex or age were observed (see Table 5 and Table 6).

Task-related performance behaviours: The frequency of the “Turn around” behaviour was significantly higher in Phase 2 (Tester) than in Phase 1 (Trainer; p = 0.007), indicating that dogs were more likely to look back when guiding an unfamiliar person. A significant effect of sex was also found (p = 0.024), with male dogs performing this behaviour more frequently than females. Age had no significant effect. For “Refuse signal”, no significant influence of test phase, sex, or age was detected, although descriptively, the behaviour occurred more often in Phase 2 (Tester; Table 5 and Table 6; see Supplementary Materials).

Handler behaviours: The behaviour “Praise” occurred significantly more frequently in Phase 2 (Tester) than in Phase 1 (Trainer; p = 0.0115), whereas “Treat” showed a marginally significant decrease in Phase 2 compared with Phase 1 (p = 0.0569). Neither sex nor age significantly influenced these variables (Table 5 and Table 6; see Supplementary Materials).

4. Discussion

In the present study, we found no evidence that guiding an unfamiliar blind tester induced higher stress in dogs compared with guiding their familiar trainer. Cortisol levels remained stable across phases, and behavioural stress indicators (SSScore) did not differ significantly. However, dogs turned around significantly more often when guiding the unfamiliar tester—potentially seeking reassurance. The tester also gave significantly more verbal praise, which may have helped to buffer stress. Overall, the evaluation protocol did not appear to elicit substantial physiological or behavioural stress, supporting its continued use as a welfare-conscious certification approach.

Although the descriptive patterns of the salivary cortisol measures suggested a decrease during the test with the familiar trainer and an increase during the test with the blind tester, these trends did not reach statistical significance. Studies that focused specifically on canine stress have reported that cortisol rises above baseline when dogs are confronted with arousing stimuli [24,25]. Moreover, the interaction style and personality of humans have been shown to affect their dogs’ cortisol secretion during sequences of interaction [26,27]. The fact that cortisol concentrations assessed at the end of the evaluation (following the phase of interaction with the unfamiliar person and reunion with the familiar trainer) were similar to home baseline values does not point to any substantial strain caused by the examination protocol or the change in handler. Previously reported elevations in cortisol levels among guide dogs, as compared to companion dogs, suggest a sustained activation of the HPA axis that is potentially linked to the unique demands of their working role [28].

However, the interpretation of salivary cortisol must be approached with caution. Meta-analyses and recent studies have highlighted substantial intra- and inter-individual variability [8], as well as weak correlations with serum cortisol in real-world settings [9]. The cortisol data gathered in this study also showed high variability. These findings raise questions about the validity of salivary cortisol as a standalone stress marker, particularly in applied working dog contexts. Importantly, this reinforces the need to complement hormonal measures with behavioural data, which may offer greater ecological validity.

The Standardised Stress Score, aggregating multiple behavioural indicators of stress (e.g., lip licking, yawning, and shaking), did not differ significantly between the two test phases during the early (5 min) observation. However, during the extended (15 min) phase, the SSScore tended (p = 0.058) to be lower when dogs guided the unfamiliar blind tester compared to their familiar trainer—a finding that contrasts with our initial hypothesis. One possible explanation for this unexpected pattern could lie in the behaviour of the handler. The blind tester provided significantly more verbal praise than the trainer during the evaluation. Verbal praise has been shown to act as a form of social support and stress buffering in dogs [29,30], which may have moderated the dogs’ arousal levels during this phase. Interestingly, while the trainer tended to offer more food treats, this difference did not reach significance.

Another possible explanation for the patterns found in the present study lies in emotional contagion, the transfer of emotional states between individuals. Emotional contagion between dogs and humans is well documented in the literature [31,32,33]. For humans, examinations are perceived as a potent source of stress and fear, which consequently affects cortisol secretion and cognitive performance [34]. One moderating factor in human–dog emotional contagion is the duration of dog ownership [35], which is ultimately connected to the intensity of the relationship. In fact, recent data indicate that dogs exhibit less empathetic behaviour towards strangers in unfamiliar environments [36], whereas in owner–dog dyads, empathetic traits predispose dogs to be influenced by their owners’ anxiety [37]. Given that the dogs had built an intense relationship with their trainer but not with the blind tester, they may have been more strongly affected by the stress of their familiar handler during the examination, which could explain the observed difference in behaviour. In addition, the blind tester was likely less concerned about the outcome of the certification and, therefore, less emotionally involved.

However, this interpretation remains speculative, as we did not assess the physiological or behavioural indicators of stress in the human handlers. Yet, previous research has shown that handlers’ stress levels can influence working dog behaviour [38,39], and even in pet dogs, long-term cohabitation with a stressed owner has been associated with elevated canine cortisol levels [40]. Future studies should therefore include measures of human stress (e.g., heart rate variability and self-report questionnaires) to fully understand the dyadic dynamics at play during certification.

The frequency of the “Turn around” behaviour, which could be interpreted as seeking visual contact or reassurance, was significantly higher in Phase 2 than in Phase 1 in both the early and extended observation periods. Data by Wanser & Udell [41] suggest that insecurely attached therapy dogs tended to gaze more often at their handlers while working. Our finding suggests that the dogs may have experienced some uncertainty or discomfort when guiding the unfamiliar person, potentially looking back to locate their trainer or reassess the situation. This behaviour was also more frequent in male dogs, adding to the existing body of evidence on sex-based differences in canine social behaviour [42]. Sex differences in canine behaviour have been widely reported, including in aggression, fearfulness, playfulness, gazing, preferred paw use, and sociability with strangers [42]. The review by Scandurra and collaborators [42] suggests that, in general, females score higher on sociability and cooperation with strangers. In line with these findings, the female dogs in our study may have been more willing to cooperate with an unfamiliar person and may therefore have needed less visual reassurance.

Despite these subtle behavioural differences, the dogs generally continued to perform reliably across both phases when guiding their familiar trainer and unfamiliar tester. There were no significant differences in the frequency of “Refuse signal” behaviours—an important task-related performance measure—between phases, suggesting that the dogs were able to execute cues regardless of the handler’s familiarity. This also supports the notion that guide dogs are capable of transferring their training to new contexts and individuals.

Our results also highlight the importance of human behaviour during evaluations. While task performance is often seen as a measure of the dog’s capability alone, our findings indicate that the interaction style of the handler may influence the dog’s emotional state. Increased praise in Phase 2 may have mitigated stress, whereas the tendency towards fewer treats may reflect a reliance on other reinforcement strategies, such as verbal praise, eye contact, or body language. These observations support calls for including handler behaviour as a variable in future certification research.

The trends observed in the current study warrant further investigation to establish their robustness. Such insights could inform the further refinement of international certification standards, ensuring that independent evaluations, like the Austrian protocol, continue to prioritise both welfare and functional performance. Our findings also underscore the importance of evaluating stress responses in a nuanced manner and highlight the need for further research to account for the handler’s influence on the dog’s stress levels. The behavioural trends, such as increased “Turn around” behaviour and differences in human interaction styles, illustrate the complexity of assessing stress in working dogs. Future studies should aim to replicate these findings with larger sample sizes and expand the range of physiological indicators by including additional biomarkers such as heart rate variability [43] or salivary immunological markers [44]. Importantly, human-related variables, such as the stress levels and interaction styles of trainers and testers, should also be systematically measured.

Of note, the present findings rely on the Austrian protocol and, therefore, cannot be generalised to the certification standards in other countries. However, they provide preliminary insights into dog welfare indicators and performance during guide dog certification, which, to date, is a relatively unexplored area of research.

5. Conclusions

To conclude, our data did not support the hypothesis that guide dogs would exhibit higher stress and reduced performance when guiding an unfamiliar tester. Instead, the findings suggest that dogs adapt well to the unfamiliar context and do not experience substantial stress during the evaluation. The continued use of an unfamiliar tester thus appears justifiable as a means of ensuring objectivity, especially as it allows performance to be assessed independently of the dog’s established relationship with the trainer. Based on our findings, the current Austrian protocol appears to strike an appropriate balance between maintaining dog welfare and achieving objective assessments. As such, it may serve as a potential model for other countries or guide dog organisations seeking to develop or refine certification procedures that prioritise both transparency and animal well-being.

Bibliography

1. Audrestch H.M., Whelan C.T., Grice D., Asher L., England G.C.W., Freeman S.L. Recognizing the Value of Assistance Dogs in Society. Disabil. Health J. 2015, 8, 469–474. DOI: 10.1016/j.dhjo.2015.07.001. PMID: 26364936.
2. Whitmarsh L. The Benefits of Guide Dog Ownership. Vis. Impair. Res. 2005, 7, 27–42. DOI: 10.1080/13882350590956439.
3. Lane D.R., McNicholas J., Collis G.M. Dogs for the Disabled: Benefits to Recipients and Welfare of the Dog. Appl. Anim. Behav. Sci. 1998, 59, 49–60. DOI: 10.1016/S0168-1591(98)00120-8.
4. Glenk L.M., Weissenbacher K., Přibylová L., Stetina B.U., Demirel S. Perceptions on Health Benefits of Guide Dog Ownership in an Austrian Population of Blind People with and without a Guide Dog. Animals 2019, 9, 428. DOI: 10.3390/ani9070428. PMID: 31284677. PMCID: PMC6680747.
5. Lloyd J., Budge C., Stafford K. Handlers’ Expectations and Perceived Compatibility Regarding the Partnership with Their First Guide Dogs. Animals 2021, 11, 2765. DOI: 10.3390/ani11102765. PMID: 34679787. PMCID: PMC8532721.
6. Lloyd J., Budge C., La Grow S., Stafford K. An Investigation of the Complexities of Successful and Unsuccessful Guide Dog Matching and Partnerships. Front. Vet. Sci. 2016, 3, 114. DOI: 10.3389/fvets.2016.00114. PMID: 28018910. PMCID: PMC5159482.
7. Bremhorst A., Mongillo P., Howell T., Marinelli L. Spotlight on Assistance Dogs—Legislation, Welfare and Research. Animals 2018, 8, 129. DOI: 10.3390/ani8080129. PMID: 30049995. PMCID: PMC6115927.
8. Cobb M.L., Iskandarani K., Chinchilli V.M., Dreschel N.A. A Systematic Review and Meta-Analysis of Salivary Cortisol Measurement in Domestic Canines. Domest. Anim. Endocrinol. 2016, 57, 31–42. DOI: 10.1016/j.domaniend.2016.04.003. PMID: 27315597.