Ageing modulates the effects of scene complexity on visual search and target selection in virtual environments
Isaiah J. Lachica, Aniruddha Kalkar, James M. Finley

TL;DR
Older adults struggle more with visual search in complex environments, showing slower performance and more distractions compared to younger adults.
Contribution
The study reveals how visual complexity affects visual search differently in older versus younger adults in virtual environments.
Findings
Increased visual complexity led to longer task completion times for all participants.
Older adults showed greater difficulty with gaze transfer and target selection in complex environments.
Working memory capacity was linked to visual search performance in dynamic settings.
Abstract
Processing task-relevant visual information is important for many everyday tasks. Prior work demonstrated that older adults are more susceptible to distraction by salient task-irrelevant stimuli, leading to less efficient visual search. However, these studies often used simple stimuli, and less is known about how ageing influences visual attention in environments more representative of real-world complexity. Here, we test the hypothesis that ageing impacts how the visual complexity of the environment influences visual search. Young and older adults completed a virtual reality-based visual search task in environments with increasing visual complexity. As visual complexity increased, all participants exhibited longer times to complete the task, which resulted from increased time transferring gaze from one correct target to the next and increased delay between when correct targets were…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5|
cognitive assessment |
cognitive domain |
young adults (mean ± s.d.) |
older adults (mean ± s.d.) |
|
|---|---|---|---|---|
|
Montreal Cognitive Assessment |
global cognition |
28 ± 2 |
27 ± 3 |
0.0440* |
|
Corsi Block task |
short-term memory |
6 ± 1 |
5 ± 1 |
0.0014* |
|
Backwards Corsi Block task |
working memory |
5 ± 1 |
5 ± 2 |
0.229 |
|
predictor |
β estimate |
standard error |
|
|---|---|---|---|
|
VR task completion time |
1.254 |
0.324 |
<0.001 |
|
VR task completion time: medium visual complexity level |
-0.080 |
0.171 |
0.641 |
|
VR task completion time: high visual complexity level |
-0.400 |
0.180 |
0.029 |
|
response |
predictor: | |||
|---|---|---|---|---|
|
Montreal Cognitive Assessment |
Corsi Block task |
Backwards Corsi Block task |
age | |
|
VR task completion time |
−1.46 (0.47)* |
0.04 (1.28) |
−2.65 (0.83)* |
0.30 (0.06)* |
|
number of task-relevant re-fixations |
−1.55 (0.95) |
−3.20 (0.61)* |
0.17 (0.04)* | |
- —National Science Foundationhttp://dx.doi.org/10.13039/100000001
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNeural and Behavioral Psychology Studies · Human-Automation Interaction and Safety · Spatial Neglect and Hemispheric Dysfunction
Introduction
Selecting and processing relevant visual information from the environment are vital for planning and executing everyday tasks [1–4]. For example, when walking in environments where precision stepping is required, we use vision to plan routes, step over obstacles or step onto safe footholds [5–7]. However, the ability to properly direct vision to relevant visual stimuli appears to be impaired in older adults. While walking, older adults appear to be more easily distracted by future threats on their path than young adults as they shift their gaze away from the current stepping target to the next before completing the step [8,9]. This gaze behaviour can be problematic as it can result in greater stepping errors [10]. Understanding how we select and process visual information in the environment and how these processes change with ageing is crucial for preventing potentially injurious behaviours.
Natural and human-made environments are filled with visual stimuli that may be relevant or irrelevant for guiding future actions. Which visual stimuli are selected and processed may depend on how overt visual attention is directed. The control of visual attention has been prominently described as a balance between bottom-up and top-down processes. Bottom-up processes emphasize saliency, a characteristic of visual stimuli based on low-level visual features such as contrast and luminance [11], which makes highly salient stimuli ‘pop out’ of a scene and capture attention automatically. In contrast, top-down processes control visual attention based on cognitive processes that prioritize factors such as task demands, prior knowledge or the gist of realistic scenes [12–15]. These processes are weighed flexibly as attention can be controlled primarily by top-down factors when completing goal-directed tasks or by bottom-up salience when faced with unexpected but important stimuli. However, this balance between bottom-up and top-down control appears to depend on cognition, which is known to decline with age [16,17].
Older adults appear more susceptible to distraction by irrelevant visual stimuli than young adults due to age-related changes in their cognition, particularly in their ability to exert top-down inhibition [18–20]. These effects prevent older adults from suppressing the automatic capture of attention by salient task-irrelevant stimuli, negatively affecting their performance in visually demanding tasks [21–24]. Impaired capacity for inhibition can also lead to task-irrelevant information being encoded into working memory at the cost of task-relevant information [22,25]. However, prior studies on age-related changes in visual search used simple visual scenes composed of arbitrarily selected shapes and letters. It is unclear whether these age-related effects on visual attention extend to performing tasks in scenes more closely reflecting the visual complexity in everyday visual search.
The trail making test [26] is a visual search task commonly used to assess cognitive domains such as visual attention, working memory and inhibition [27–29]. It is typically implemented in a pen-and-paper format with numbers and letters as search targets and completion time as the primary outcome measure. Performance in the test appears to decline with age as older adults tend to take more time to complete the task [30–34]. Recent attempts have been made to improve the generalizability of the test by transforming it into a three-dimensional pointing task in virtual reality (VR) [35]. Studies have also supplemented completion time with eye-tracking measures, such as average fixation time and number of fixations, to improve the interpretability of test results and identify differences in experimental [36] and neurological [37,38] conditions. However, the context in which people search for targets in both test versions does not reflect the visual complexity of search targets or the level of distraction present in everyday visual search. Questions remain on how increasing visual complexity influences visual search in young and older adults.
We aimed to determine how ageing-related differences in cognition influence visual search and target selection in increasingly complex virtual environments. Participants completed a custom-designed VR visual search task derived from the paper-based Trail Making Test-B in three levels of increasing visual complexity [39]. To determine the influence of visual attention on task performance, we recorded eye movements from which we calculated the time between fixations on correct targets, average and total fixation times, number of fixations and the saliency of fixated regions. We also determined how the actions involved in target selection using a laser pointer influenced task completion time by measuring the number of selection errors and the delay between fixating and selecting a correct target. Additionally, all participants completed assessments of global cognition, short-term memory and working memory. We hypothesized that as the complexity of the visual scene increased, participants would take longer to complete the task because attention would be attracted by task-irrelevant stimuli, leading to a reduced capacity to encode correct targets in working memory. Specifically, all participants would be more prone to re-fixating search targets and fixating on salient, task-irrelevant distractors, leading to longer times between fixations on correct targets. In addition, we hypothesized that this performance decline would be greater in older adults due to ageing-related impairments in their cognition.
Methods
Participants
2.1.
Fifteen young (9 female, age: 27.7 ± 3.33 years) and 15 older adults (9 female, age: 71.8 ± 4.46 years) with normal or corrected-to-normal vision participated in the study. Young adult participants were recruited from the University of Southern California’s student population, while older adult participants were recruited throughout the greater Los Angeles community through word of mouth or the laboratory’s previous participant database. Data from two older adults were excluded due to hardware/software issues. We conducted a sample size calculation for mixed-effects regression models using GLIMMPSE [40] and using means and standard deviations for VR task completion time (pooled s.d. = 10.45 s), task-relevant fixation time (pooled s.d. = 0.040 s) and saliency of fixated regions (pooled s.d. = 3.25%) from pilot data. A prior study found a large effect (Cohen’s d = 1.8) when comparing task performance between young and older adults using a direct implementation of the Trail Making Test-B [35]. As such, we chose to estimate a medium effect size (Cohen’s d = 0.45) to account for potential small differences between our age groups when comparing their gaze behaviours. Specifically, we aimed to detect a 20% difference in these variables between age groups and visual complexity levels. We found the largest sample size estimate across our three dependent variables and determined that a sample size of 26 (13 for each age group) was needed for a target power of 0.80. All study procedures were reviewed and approved by the University of Southern California’s Institutional Review Board. All participants provided written informed consent before participating and were provided monetary compensation for their time. All aspects of the study conformed to the principles described in the Declaration of Helsinki.
Experimental protocol
2.2.
All participants completed the paper-based Trail Making Test-B [26] to allow for comparison with their performance in the VR-based visual search task. Additionally, the participants completed a battery of tests selected to assess different domains of cognition, including the Montreal Cognitive Assessment [41] for global cognition, the Corsi task [42] for short-term memory and the Backwards Corsi task [43] for working memory. The Montreal Cognitive Assessment was administered on paper while the rest of the cognitive assessments were completed on a computer using Psytoolkit [44,45]. Higher scores in the Montreal Cognitive Assessment, the Corsi Block task and the backwards Corsi Block task indicate better global cognition, short-term memory and working memory capacity.
We also had the participants complete the Flanker task in Psytoolkit, a modified version of the original task by Eriksen and Eriksen designed to assess inhibitory control [46]. This implementation of the Flanker task calculates the Flanker effect as the difference in reaction times between trials with incongruent and congruent visual stimuli, with more positive numbers indicating slower reaction times in the incongruent than in the congruent trials. We found a small Flanker effect in the young adults (M ± s.d. = 5 ± 70.17 ms) and an unusual negative Flanker effect in the older adults (M ± s.d.= −10.77 ± 95.82 ms), with the Flanker data from both groups exhibiting large standard deviations. Using separate one-sample t-tests, we found that neither the young (t = 0.276, p = 0.787) nor the older adults (t = −0.405, p = 0.692) exhibited a significant Flanker effect. As such, we considered our Flanker data to be unreliable and we elected to omit them from any further analysis.
After completing the cognitive assessments, the participants completed three familiarization trials of the visual search task. These trials were completed in virtual environments with only the search targets present in front of a blank background to emphasize the identity of the search targets. In the first familiarization trial, the experimenter guided the participant through a single repetition of the task to provide a clear description of the task goal, the visual and auditory feedback that they will encounter and the search target sequence. The guided familiarization trial was followed by two trials where participants completed the task independently and were only provided feedback by the experimenter at the end. After the familiarization trials, the participants completed 30 trials of the visual search task presented in three sets of 10 trials, each corresponding to one of three visual complexity levels. Only the data from the last five trials at each visual complexity level were processed and analysed to account for potential learning effects.
Virtual reality task
2.3.
All participants completed a VR visual search task (figure 1a), which we based on the principles of the Trail Making Test-B. We particularly incorporated the visual attention, working memory and inhibition [27–29] components of the Trail Making Test-B into our VR task. The VR environment was created in Unity 2019.4.6 and used the HTC SRanipal SDK to incorporate eye tracking into the task. Participants were instructed to search for and select targets alternating between letters of the alphabet and animals whose names start with those letters in ascending order (A, ant; B, butterfly; C, chickens; D, dog; E, eagle and F, frogs). The search targets were positioned randomly for each trial in regions of the virtual environments where they would not be overly distinct from the background, such as across the sky for the letters and at ground level for the objects. The participants viewed the virtual environments using the HTC Vive Pro Eye (HTC, New Taipei, Taiwan) head-mounted display and selected targets using an HTC Vive Controller. The head-mounted display had a 110° field of view with a resolution of 1440 × 1600 pixels per eye (2880 × 1600 pixels combined). A laser pointer extended from the top of the controller in the virtual environment, and participants selected targets by aiming the laser at the centre of the target and pulling the trigger on the controller. All participants were informed that they could freely move their heads in all directions. Visual and auditory feedback were provided to indicate whether the selected target was correct. Specifically, the participants heard a bell sound and saw a ‘Correct!’ text appear in the environment when the correct target was selected. In contrast, the participants heard a buzzer sound and saw a ‘Try Again!’ text appear when they selected an incorrect target or when they did not select the correct target properly (laser not aimed at the centre of the target).
VR visual search task. (a) Single frames from representative gameplay recordings of each visual complexity level. Visual complexity was increased by adding visual distractors in the foreground and background in the form of vehicles, structures, plants and agriculture-related inanimate objects to dissociate them from the search targets, which were the only letters and animals in the environment. (b) Swarm plot comparing feature congestion of each visual complexity level with each data point representing a frame from representative gameplay recordings. The asterisk * indicates a significant difference between high and low visual complexity levels and high and medium visual complexity levels. The plus + indicates a significant difference between medium and low visual complexity levels.
The participants completed the visual search task in virtual environments with three levels of increasing visual complexity (figure 1a). The sequence of complexity levels was pseudo-randomized such that five participants in both young and older adult groups started with the low visual complexity level, another five with the medium visual complexity level and the remaining five with the high visual complexity level. The visual complexity of the environment was manipulated by adding visual distractors in the foreground and background. These visual distractors came strictly in the form of vehicles, structures, plants and agriculture-related inanimate objects to dissociate them from the search targets, which were the only letters and animals in the environment. Visual complexity was measured using feature congestion [47], which quantifies the distribution of low-level visual features of an image, including colour, edge orientation and luminance, as a single scalar measure with higher values indicating a more complex image. To determine if the modifications to the virtual environments produced increasing levels of visual complexity, we applied the feature congestion method to every frame of a representative gameplay recording from each visual complexity level (using Rosenholtz et al.’s MATLAB implementation: https://dspace.mit.edu/handle/1721.1/37593). Feature congestion increased monotonically across visual complexity levels (figure 1b; F_2,2349_ = 22 550, p < 0.001). Post hoc tests indicate that the high visual complexity level had greater feature congestion than the medium (Bonferroni-corrected p < 0.001) and low (Bonferroni-corrected p < 0.001) visual complexity levels. In addition, feature congestion was higher in the medium visual complexity level than in the low (Bonferroni-corrected p < 0.001).
Data collection and processing
2.4.
We collected the timing of each trigger press as participants selected objects. To quantify each participant’s overall performance, we calculated task completion time, which we defined as the difference between the time when the trial started and when the last correct target was selected. This measure of overall task performance is equivalent to the primary outcome measure of the paper-based versions of the Trail Making Test. Task completion time was calculated for every trial and reported as an average of the last five trials at each visual complexity level for each participant.
We collected eye and head movement data at 90 Hz using eye trackers and an inertial measurement unit built into the HTC Vive Pro Eye. The eye trackers were calibrated following a five-point calibration process provided by HTC. Additionally, video recordings of the participants’ first-person point of view were recorded at 60 Hz for each trial using NVIDIA Shadowplay (NVIDIA, Santa Clara, CA, USA). All data processing was performed using a custom script in MATLAB R2023b (Mathworks, Natick, MA, USA). Horizontal and vertical eye-in-head angles were calculated from the unfiltered raw eye movement data and were combined with the horizontal and vertical rotations of the head to compute gaze angles. The gaze angles were then time-synchronized with the video recordings. Since the video recordings and the gaze angles did not have the same sampling rate, the gaze angles were then interpolated at each video recording frame.
As visual processing occurs during fixations [48,49], we first identified fixation periods to analyse how participants directed visual attention during the task. To do so, we first computed horizontal and vertical gaze angular velocities by differentiating the interpolated gaze angles with respect to the time between video frames. We then identified fixations using a combination of a simple velocity threshold and a minimum fixation duration threshold. First, we categorized samples as fixations if their horizontal or vertical gaze angular velocity was less than 100° s^−1^ [50]. We computed gaze vectors for these fixations in the virtual environment with their origins set as the head position in the virtual environment and their direction determined by the gaze angles. We then recorded the names and locations of the objects or regions that the gaze vectors intersected in the virtual environment. We collapsed consecutive fixations to the same object or region together, calculated their durations and removed fixations that were less than 60 ms [51]. We then classified a fixation as task-relevant if it was directed towards one of the search targets and task-irrelevant if it was directed towards the visual distractors or any other region in the environment. We further classified task-relevant fixations as either the first fixation on a search target in a trial or a re-fixation, which we define as fixations on search targets that were previously fixated.
We computed several measures of gaze-related task performance to capture how overt visual attention varied across conditions and between groups. We calculated the average time between fixations on correct targets by recording the time intervals from the end of a fixation on one correct target to the beginning of a fixation on the next correct target and then averaging these times within a trial. The average time between fixations on correct targets includes the saccade times between targets and any fixation time on incorrect targets. We also recorded the number of fixations on task-irrelevant objects and the number of re-fixations on task-relevant objects. Additionally, we calculated the average time spent on first fixations on task-relevant objects, re-fixations on task-relevant objects and fixations on task-irrelevant objects by recording the time intervals from the start of a fixation on an object to the end of a fixation on the same object and averaging these times within a trial by fixation category. We similarly calculated the total time spent on first fixations on task-relevant objects, re-fixations on task-relevant objects and fixations on task-irrelevant objects by recording the time intervals from the start of a fixation on an object to the end of fixation on the same object and finding the sum of these times within a trial by fixation category. These gaze-related task performance measures were calculated for every trial and reported as an average of the last five trials at each visual complexity level for each participant.
We generated saliency maps from each frame of the video recordings of the participants’ first-person point of view during the task using the original Itti et al. [52] algorithm as implemented by Harel[53]. Specifically, each video frame was decomposed into feature maps of its low-level visual features, which included colour, orientation and intensity. These feature maps were combined into a saliency map with corresponding scalar values for each pixel. These scalar saliency values were then converted to percentile ranks [54,55], with 100% indicating the most salient pixel and 0% the least salient pixel in the video frame. We recorded the percentile rank of each fixated region and then calculated saliency by averaging these recorded percentile ranks within a trial. We calculated saliency for every trial and reported it as an average of the last five trials at each visual complexity level for each participant.
We also recorded information on the objects selected at every trigger press, which included the identity of the selected object, the location where the laser pointer was being aimed in the virtual environment at the time of the trigger press, and whether the selection was correct. We then calculated from the number of selection errors, defined as selecting an incorrect target or misaiming (controller aimed outside the bounds of the correct target) when selecting a correct target, and selection delay, which is the time interval between fixating a correct target and successfully selecting it averaged within a trial. The number of selection errors and selection delay were calculated for every trial and reported as an average of the last five trials at each visual complexity level for each participant.
Statistical analysis
2.5.
All statistical analyses were performed in R (R Project for Statistical Computing) with the alpha value set at p < 0.05. Tests for normality and equal variances were performed using the Shapiro–Wilk test and Levene’s test with functions from the stats and car packages, respectively. Welch’s analysis of variance and pairwise Wilcoxon rank sum tests were used to compare the feature congestion of each visual complexity level using the stats package when assumptions of equal variance and normality were violated.
We fit linear mixed-effects models using the lme4 package to test for the effects of age group, visual complexity and their interaction on the following dependent variables: task completion time, average time between fixations on correct targets, average and total first fixation time on task-relevant objects, number of re-fixations on task-relevant objects, average and total re-fixation time on task-relevant objects, number of fixations on task-irrelevant objects, average and total fixation time on task-irrelevant objects, saliency, number of selection errors and selection delay. We used the lmerTest package, which uses Satterthwaite approximations for the degrees of freedom, to test the null hypothesis that our model coefficients were zero. All models included random intercepts for each participant to account for repeated measures. Finally, we performed post hoc Bonferroni-corrected pairwise comparisons using the emmeans package when significant main effects or interactions, particularly differences between age groups within visual complexity levels, were found.
We compared cognitive assessment scores between age groups to determine the effect of age on various domains of cognition using functions from the stats package. Assumptions of normality and equal variance were tested using the Shapiro–Wilk and Levene’s tests, respectively. The Wilcoxon rank sum test was used when the normality assumption was violated; otherwise, a two-sample t‐test was used.
To determine if our VR visual search task was able to mimic the paper-based Trail Making Test-B, we fit a multiple linear regression model using the stats package to identify any relationships between performances in both tasks. Specifically, we used Trail Making Test-B completion time as the response and VR task completion time and its interaction with visual complexity level as the predictors.
We fit two multiple linear regression models to test for associations between performance in the task and various cognitive domains. In our first model, we used VR task completion time in the high visual complexity level as our response and scores on the Montreal Cognitive Assessment (global cognition), Corsi Block task (short-term memory) and Backwards Corsi Block task (working memory) as predictors. In our second model, we used the number of re-fixations on task-relevant objects as our response and scores on the Corsi Block task and the Backwards Corsi Block task as predictors. We used the performance metrics recorded from the high visual complexity level as the range of values observed at this level covered those found at the lower visual complexity levels. We also included age as a predictor in all three models to control for age-related deficits in performance.
Results
Task completion times
3.1.
All participants took longer to complete the task as the visual complexity of the environment increased, and this effect was greatest in older adults (figure 2). We found main effects of age group (F_1,26_ = 49.19, p < 0.001) and visual complexity level (F_2,52_ = 83.36, p < 0.001) on task completion time. We also found an interaction between age group and visual complexity level (F_2,52_ = 9.50, p < 0.001). Compared with the young adults, the older adults spent more time completing the task in the low (young M ± s.d. = 24.3 ± 3.8 s, old M ± s.d. = 33.2 ± 3.2 s, p = 0.002), medium (young M ± s.d. = 25.1 ± 3.7 s, old M ± s.d. = 35.8 ± 5.6 s, p < 0.001) and high complexity levels (young M ± s.d. = 32.5 ± 3.9 s, old M ± s.d. = 50.3 ± 10.9 s, p < 0.001). In addition, the young adults spent more time in the high versus low visual complexity (p < 0.001) and the high versus medium visual complexity (p < 0.001), while the older adults spent more time in the high versus low visual complexity (p < 0.001) and in the medium versus high visual complexity (p < 0.001) to complete the task.
Effect of age group and visual complexity on VR task completion times. The asterisk * indicates a significant difference between age groups.
Time between fixations on correct targets
3.2.
Several performance-related factors could explain the changes in task completion time with complexity level and differences in completion time between groups. For example, participants may have taken longer to complete the task in levels with higher visual complexity because they spent more time searching for correct targets. We calculated the average time between fixations on correct targets as the time interval from the end of a fixation on one correct target to the beginning of a fixation on the next correct target and found main effects of age group (F_1,26_ = 42.13, p < 0.001) and visual complexity (F_2,52_ = 83.69, p < 0.001). We also found an interaction between age group and visual complexity (F_2,52_ = 10.03, p < 0.001, figure 3a). Compared with the young adults, the older adults spent longer times between fixations on correct targets in the medium (young M ± s.d. = 1.38 ± 0.20 s, old M ± s.d. = 1.95 ± 0.30 s, p < 0.001) and high visual complexity levels (young M ± s.d. = 1.92 ± 0.25 s, old M ± s.d. = 2.91 ± 0.68 s, p < 0.001). Additionally, the young adults spent more time in the high versus low complexity level (young low M ± s.d. = 1.41 ± 0.19 s, p < 0.001) and in the high versus medium complexity level (p < 0.001), while the effect of visual complexity level was larger in the older adults as they spent more time in the high versus low complexity level (old low M ± s.d. = 1.80 ± 0.19 s, p < 0.001) and in the high versus medium complexity level (p < 0.001) on average between fixations on correct targets.
Effect of age group and visual complexity level on gaze behaviour. (a) Average time between fixations on correct targets, (b) average task-relevant first fixation time, (c) total task-relevant first fixation time, (d) number of task-relevant re-fixations, (e) average task-relevant re-fixation time, (f) total task-relevant re-fixation time, (g) number of task-irrelevant fixations, (h) average task-irrelevant fixation time and (i) total task-irrelevant fixation time. The asterisk * indicates a significant difference between age groups.
The increase in task completion time in more complex environments may also stem from changes in gaze behaviour, such as an increase in fixations on task-relevant versus task-irrelevant objects. We classified fixations into three categories. We considered the first fixation on task-relevant objects as a measure that potentially reflects when participants encoded information about these objects, such as location, in memory. We then identified re-fixations on task-relevant objects as a potential indicator that participants did not recall the location of a given object. Finally, we measured fixations on task-irrelevant objects as a measure of distractibility. We separately quantified the average duration of each fixation type, total fixation time and the number of each type of fixation in a trial.
First fixations on task-relevant objects
3.3.
We found a main effect of age group on average task-relevant first fixation time (F_1,26_ = 14.49, p < 0.001, figure 3b), but there was no significant effect of visual complexity level (F_2,52_ = 1.62, p = 0.208) or interaction between age group and visual complexity level (F_2,52_ = 1.77, p = 0.180). Compared with the young adults, the older adults spent more time on average on their first fixations on task-relevant objects (young M ± s.d. = 0.391 ± 0.086 s, old M ± s.d. = 0.497 ± 0.095 s). We also found a main effect of age group on total task-relevant first fixation time (F_1,26_ = 14.63, p < 0.001, figure 3c) but not a main effect of visual complexity (F_2,52_ = 2.04, p = 0.140) or an interaction between age group and visual complexity on total (F_2,52_ = 1.71, p = 0.190) task-relevant first fixation times. Compared with the young adults, the older adults spent more time in total on their first fixations on task-relevant objects (young M ± s.d. = 4.66 ± 1.04 s, old M ± s.d. = 5.95 ± 1.15 s).
Re-fixations on task-relevant objects
3.4.
We calculated the number of re-fixations, the average re-fixation time and the total re-fixation time on task-relevant objects. We found a main effect of age group on the number of re-fixations on task-relevant objects (F_1,26_ = 30.26, p < 0.001, figure 3d) as the older adults committed more re-fixations than the young adults (young M ± s.d. = 24.6 ± 6.9, old M ± s.d. = 34.3 ± 9.0). We also found a main effect of visual complexity (F_2,52_ = 73.31, p < 0.001) as the number of re-fixations was greater in the high versus low visual complexity level (low M ± s.d. = 24.7 ± 6.1, high M ± s.d. = 36.9 ± 9.2, p < 0.001) and in the high versus medium visual complexity level (medium M ± s.d. = 25.8 ± 7.0, p < 0.001). However, we did not find an interaction effect between age group and visual complexity (F_2,52_ = 1.87, p = 0.163).
We found a main effect of age group on the average task-relevant re-fixation time (F_1,26_ = 14.84, p < 0.001; figure 3e) as the older adults spent more time on average on their re-fixations on task-relevant objects than the young adults (young M ± s.d. = 0.427 ± 0.083 s, old M ± s.d. = 0.529 ± 0.091 s). However, we did not find a main effect of visual complexity (F_2,52_ = 0.63, p = 0.538) or an interaction effect between age group and visual complexity (F_2,52_ = 0.27, p = 0.761).
We found main effects of age group (F_1,26_ = 36.68, p < 0.001) and visual complexity (F_2,52_ = 42.02, p < 0.001) on the total task-relevant re-fixation time. We also found an interaction effect between age group and visual complexity (F_2,52_ = 5.38, p = 0.008, figure 3f). Compared with the young adults, the older adults spent more time in total on their task-relevant re-fixations in the low (young M ± s.d. = 8.7 ± 2.5 s, old M ± s.d. = 14.6 ± 2.5 s, p = 0.005), medium (young M ± s.d. = 9.1 ± 2.3 s, old M ± s.d. = 15.8 ± 3.9 s, p = 0.001) and high complexity level (young M ± s.d. = 12.7 ± 3.0 s, old M ± s.d. = 23.3 ± 7.8 s, p < 0.001). In addition, young adults spent more time in the high versus low complexity level (p = 0.004) and in the high versus medium complexity level (p = 0.014), while older adults spent more time in the high versus low complexity level (p < 0.001) and in the high versus medium complexity level (p < 0.001) on their total task-relevant re-fixation times.
Fixations on task-irrelevant objects
3.5.
We separately quantified the number of fixations, average fixation time and total fixation time on task-irrelevant objects. We found main effects of age group (F_1,26_ = 28.95, p < 0.001) and visual complexity (F_2,52_ = 185.39, p < 0.001) on the number of task-irrelevant fixations. We also found an interaction effect between age group and visual complexity (F_2,52_ =14.00, p < 0.001, figure 3g). Compared with the young adults, the older adults fixated on task-irrelevant objects more times in the medium (young M ± s.d. = 9.6 ± 2.1, old M ± s.d. = 14.6 ± 3.6, p = 0.024) and high complexity level (young M ± s.d. = 19.0 ± 4.1, old M ± s.d. = 30.2 ± 7.1, p < 0.001). In addition, the young adults fixated on task-irrelevant objects more times in the high versus low complexity level (low M ± s.d. = 9.1 ± 2.0, p < 0.001) and in the high versus medium complexity level (p < 0.001), while the older adults fixated on task-irrelevant objects more times in the high versus low complexity level (low M ± s.d. = 12.1 ± 3.1, p < 0.001) and in the high versus medium complexity level (p < 0.001).
We found a main effect of visual complexity (F_1,26_ = 3.61, p = 0.034) on the average task-irrelevant fixation time as all participants exhibited longer times in the high versus medium visual complexity level (medium M ± s.d. = 0.184 ± 0.020 s, high M ± s.d. = 0.200 ± 0.029 s, p = 0.032). We did not find a main effect of age group (F_1,26_ = 2.87, p = 0.102) or an interaction effect between age group and visual complexity (F_2,52_ = 2.93, p = 0.062, figure 3h).
We found main effects of age group (F_1,26_ = 25.89, p < 0.001) and visual complexity (F_2,52_ = 111.18, p < 0.001) on the total task-irrelevant fixation time. We also found an interaction effect between age group and visual complexity level (F_2,52_ = 16.02, p < 0.001, figure 3i). Compared with the young adults, the older adults spent longer on their total task-irrelevant fixation time in the high visual complexity level (young M ± s.d. = 3.58 ± 0.84 s, old M ± s.d. = 6.61 ± 2.25 s, p < 0.001). In addition, the young adults spent longer times in the high versus low complexity level (low M ± s.d. = 1.73 ± 0.48 s, p < 0.001) and in the high versus medium complexity level (medium M ± s.d. = 1.67 ± 0.42 s, p < 0.001), while the older adults spent longer times in the high versus low complexity level (low M ± s.d. = 2.26 ± 0.72 s, p < 0.001) and in the high versus medium complexity level (medium M ± s.d. = 2.80 ± 0.78 s, p < 0.001) in total on their task-irrelevant fixations.
Saliency of fixated regions
3.6.
Although both young and older adults increased their fixations on task-irrelevant objects as the visual complexity of the environment increased, these fixations were not towards more salient regions or objects (figure 4). We quantified the saliency of the fixated regions on the visual scene using the Itti, Koch and Neibur model of saliency-based visual attention [52] and found main effects of age group (F_1,26_ = 5.28, p = 0.030) and visual complexity (F_2,52_ = 45.19, p < 0.001). We also found an interaction between age group and visual complexity (F_2,52_ = 4.96, p = 0.011). Young adults fixated more salient regions than older adults in the highest visual complexity level (young M ± s.d. = 83.9 ± 2.2%, old M ± s.d. = 81.3 ± 2.0%, p = 0.004). In addition, young adults fixated more salient regions in the low versus the medium (low M ± s.d. = 86.3 ± 1.1%, medium M ± s.d. = 83.0 ± 2.0%, p < 0.001) and high (p = 0.002) complexity levels, while older adults fixated more salient regions in the low versus the medium (low M ± s.d. = 86.0 ± 1.5%, medium M ± s.d. = 82.7 ± 1.4%, p < 0.001) and high (p < 0.001) complexity levels.
Effect of age group and visual complexity on the saliency of fixated regions. The asterisk * indicates a significant difference between age groups.
Number of selection errors
3.7.
Participants may have also taken longer to complete the task with increasing visual complexity due to changes in their target selection strategy. To this end, we quantified target selection errors, defined as selecting an incorrect target or misaiming (laser pointed off-centre) when selecting a correct target (figure 5a). We found a main effect of age group on the number of selection errors (F_1,26_ = 6.02, p = 0.021), with older adults committing more selection errors than younger adults (young M ± s.d. = 1.42 ± 1.29, old M ± s.d. = 0.78 ± 0.53). We also found a main effect of visual complexity (F_2,52_ = 3.25, p = 0.047) on the number of selection errors; however, post hoc tests found no differences between visual complexity levels. We did not find an interaction effect between age group and visual complexity (F_2,52_ = 2.05, p = 0.140) on the number of selection errors.
Effect of age group and visual complexity level on (a) the number of selection errors and (b) the time between fixating and selecting a correct target. The asterisk * indicates a significant difference between age groups.
Selection delay
3.8.
Another way to investigate possible influences of changes in target selection on task completion time is to quantify selection delay, defined as the average time interval between fixating a correct target and successfully selecting it (figure 5b). We found main effects of age group (F_1,26_ = 34.08, p < 0.001) and visual complexity (F_2,52_ = 11.47, p < 0.001) on selection delay. We also found an interaction effect between age group and visual complexity (F_2,52_ = 4.39, p = 0.017) on selection delay. Compared with the young adults, the older adults selected the correct target slower in the medium (young M ± s.d. = 0.84 ± 0.13 s, old M ± s.d. = 1.29 ± 0.39 s, p = 0.001) and high complexity level (young M ± s.d. = 0.97 ± 0.12 s, old M ± s.d. = 1.61 ± 0.49 s, p < 0.001). In addition, the older adults selected the correct target slower in the high versus low complexity level (low M ± s.d. = 1.15 ± 0.20 s, p < 0.001) and in the high versus medium complexity level (p = 0.013).
Cognitive assessment scores
3.9.
Age-related differences were observed in several standard cognitive assessments (table 1). Compared with young adults, older adults had lower global cognition as they exhibited lower Montreal Cognitive Assessment [41] scores (young M ± s.d. = 28 ± 2, old M ± s.d. = 27 ± 3, Wilcoxon rank sum test p = 0.044) and poorer short-term memory based on their poorer Corsi Block task [42] results (young M ± s.d. = 6 ± 1, old M ± s.d. = 5 ± 1, Wilcoxon rank sum test p = 0.001). However, no differences between age groups were found in working memory (young M ± s.d. = 5 ± 1, old M ± s.d. = 5 ± 2, two sample t‐test p = 0.229) as tested using the Backwards Corsi Block task [43].
Trail Making Test-B and virtual reality visual search task
3.10.
As we designed our VR visual search task to mimic several aspects of the paper-based Trail Making Test-B, we were also interested in determining if the outcomes of these assessments were correlated. As such, we fit a multiple linear regression model to determine if performance in our VR visual search task was related to performance in the paper-based Trail Making Test-B (table 2). In our model, we used Trail Making Test-B completion time as the response and VR task completion time and its interaction with visual complexity level as predictors (F_3,80_ = 5.61, p = 0.002, Adj. R^2^ = 0.143). Trail Making Test-B completion time was positively correlated with VR task completion time (β = 1.254, p < 0.001), but this correlation decreased in the high visual complexity level (β = −0.400, p = 0.029).
Virtual reality visual search task performance and cognitive assessments
3.11.
Performance in the visual search task could have been influenced by various cognitive domains such as short-term memory, working memory and inhibitory capacity. As such, we fit a multiple linear regression model to determine if scores on the Montreal Cognitive Assessment, Corsi Block task and Backwards Corsi Block task are associated with VR task completion time in the high visual complexity level (table 3). Additionally, we included age as a predictor to control for potential age-related performance deficits. We specifically chose the high visual complexity level since its range of completion times covered those observed in the lower visual complexity levels. We found significant relationships between VR task performance and cognitive assessment scores (F_4,23_ = 28.5, p < 0.001, Adj. R^2^ = 0.80). Specifically, VR task completion times were lower in participants with higher global cognition based on the Montreal Cognitive Assessment (β = −1.46, p = 0.005) and better working memory as tested using the Backwards Corsi Block task (β = −2.66, p = 0.004). Additionally, VR task completion times increased with age (β = 0.301, p < 0.001).
Table 3.: Results of multiple linear regressions testing for associations between performance metrics in the high visual complexity level and cognitive assessment scores. Age was added as a predictor to control for age-related performance deficits. The asterisk * indicates p < 0.05.Figures
Gaze behaviour and cognitive assessments
3.12.
Gaze behaviour in the task could also act as an internal measure of certain cognitive domains. For instance, the number of re-fixations on task-relevant objects could reflect a participant’s short-term and working memory capacity. To test this, we fit a multiple linear regression model to determine if scores in the Corsi Block and Backwards Corsi Block tasks are related to the number of re-fixations on task-relevant objects in the high visual complexity level (table 3). We included age as a predictor to control for age-related performance deficits and chose to use gaze behaviour measurements from the high visual complexity level since its range of values covered those observed in the lower visual complexity levels. Our model explained the data well (F_3,24_ = 38.98, p < 0.001, Adj. R^2^ = 0.81) and indicated that the number of re-fixations on task-relevant objects was lower in participants with better working memory as indicated by higher Backwards Corsi Block scores (β = −3.20, p < 0.001) and increased with age (β = 0.173, p < 0.001).
Discussion
We used a VR-based visual search task to determine if increasing the visual complexity of a three-dimensional search environment would lead to changes in performance and if this effect is modulated by age. Additionally, we sought to understand which cognitive domains were associated with performance in VR-based visual search task. For both age groups, increasing the visual complexity of the virtual search environments led to longer task completion times. This effect was due, in part, to all participants spending more time between fixations on correct targets as they increased their re-fixations on task-relevant and fixations on task-irrelevant objects. We also found that all participants took more time to select a correct target after fixating them in levels with higher visual complexity, which probably contributed to the longer task completion times. These changes were greater in older adults. However, increased completion time did not appear to be influenced by saliency as both groups increasingly fixated on less salient regions as the visual complexity of the environment increased. Finally, we found that short-term and working memory capacities explained the variability in performance across participants.
As the complexity of the virtual environments increased, all participants spent more time completing the search task because they spent more time searching for correct targets, as demonstrated by the increase in the time interval between fixations on correct targets. These results are consistent with those classically found in simple conjunction search paradigms with simple stimuli composed of arbitrarily selected shapes and colours. When participants searched for stimuli that shared features with distractors, search times increased as the number of distractors in the visual display increased [56]. We also found that the effect of visual complexity on search performance was greater in older adults than in young adults. Similar age-related differences in search performance have been demonstrated using simple conjunction search paradigms [57–59] and paradigms using more complex stimuli. For example, when searching for and responding to a star-shaped stimulus in a vehicle’s digital dashboard, older adults demonstrated longer search times, as the visual complexity of the display increased, than young adults [60]. Similar results were observed when comparing search times for specific face configurations [61]. It appears then that the influence of visual complexity, age and their combination on search performance generalizes from two-dimensional displays to more complex, three-dimensional virtual environments.
The increase in suboptimal gaze behaviour demonstrated by all participants with increasing visual complexity could have also contributed to the changes in visual search performance. One such suboptimal gaze behaviour is the increase in task-irrelevant fixations with increasing visual complexity, which was found to be greater in older adults. Increased fixations on task-irrelevant objects could be due to a decreased capacity for top-down suppression of irrelevant stimuli with increasing visual complexity of search displays [62]. Older adults are also more susceptible to distraction by task-irrelevant information than young adults [21–24,63] and this has been attributed to cognitive impairments, particularly with deficiencies in inhibitory capacity [18,19]. However, we did not include assessments of inhibitory capacity in our cognitive test battery, which prevents us from determining whether the increase in task-irrelevant fixations is associated with differences in this cognitive domain. While all participants, particularly older adults, were more prone to directing their attention towards task-irrelevant distractors in increasingly complex virtual environments, it remains unclear if this effect can be attributed to poor inhibitory capacity.
Another suboptimal gaze behaviour observed from all participants at higher visual complexity levels was an increase in task-relevant re-fixations, with this effect being greater in older than young adults. We also found that scores on the Backwards Corsi Block task, a known test for working memory, were inversely associated with the number of task-relevant re-fixations but did not differ between age groups. It has been previously demonstrated that working memory processes and visual attention are tightly linked. Particularly, increased distraction by task-irrelevant visual stimuli may interfere with encoding task-relevant visual information, ultimately resulting in poor maintenance and retrieval from working memory [25,64]. As such, the increased distraction by task-irrelevant objects experienced at higher visual complexity levels may have led to the greater re-fixations observed across all participants.
While all participants became more prone to distraction by task-irrelevant stimuli as visual complexity increased, this effect was not driven by saliency as the participants increasingly fixated regions that have lower saliency values as visual complexity increased. While prior studies showed that susceptibility to distraction during visual perception tasks is greater in the presence of salient visual stimuli, particularly in older adults [21,23,63], this effect appears to be subdued in more complex scenes as people weigh prior knowledge of or the gist of the scene more than saliency [13–15]. In our task, participants may have formed a gist of each visual complexity level during practice and the initial five trials in a visual complexity level. Specifically, they would have learnt that the search environments are modelled after farms, the animals are located naturally at ground level and the letters are arbitrarily placed in the sky. The emphasis on prior knowledge and gist instead of bottom-up saliency when directing gaze and visual attention in realistic images [65–68] appears to generalize to three-dimensional virtual environments. As such, the ability of visual stimuli to automatically capture visual attention may not be contingent upon the stimuli being highly salient.
Proper allocation of visual attention is important for completing visual search tasks and appears to be associated with cognition. Performance in our task was related to global cognition and working memory capacity, as those who scored higher in the Montreal Cognitive Assessment and the Backwards Corsi Block task demonstrated shorter task completion times. These results echo the importance of working memory capacity on visual attention [69–71], which has been demonstrated in simple selective attention [72] and visual search [61] tasks. Beyond memory, inhibitory capacity is another cognitive domain that influences the allocation of visual attention and the successful completion of visual search tasks. As previously discussed, inhibitory capacity allows for the proper allocation of visual attention by selecting for and processing information relevant to the task while suppressing those that are irrelevant. As our experimental design lacked the inclusion of cognitive assessments specifically targeting inhibitory capacity, it remains unknown how differences in this cognitive domain may influence performance in a visual search task in increasingly complex environments. As such, visual search performance and the relevant gaze behaviour that may influence it appear to be influenced by an individual’s general cognitive capacity, with working memory playing a more prominent role.
The increase in task completion times at higher visual complexity levels can be explained not only by the time required to search for the correct targets, as exhibited by the calculated fixation times but also by changes that influence the speed with which targets are selected, such as by moving slower. We quantified selection delay as the time between fixating and selecting a correct target, which is influenced by a combination of the time it takes to move the laser pointer to the correct target and successfully select it. All participants demonstrated longer selection delays as the visual complexity of the search environments increased, with this effect being greater in older adults. These effects on selection delay could have been due to participants spending more time during information processing to determine whether the target is correct relative to the sequence. However, we did not find an effect of visual complexity on average fixation times on task-relevant objects. Longer selection delays at higher visual complexity levels could also have been caused by a general movement slowing as participants traded off speed for accuracy [73,74]. This effect seems to be supported by the lack of changes in the number of selection errors with increasing visual complexity for all participants. However, the selection errors were greater in older adults than young adults, contradicting previously demonstrated parity between the two groups resulting from older adults using movement slowing as a compensatory mechanism to maintain accuracy [73,74].
Limitations
It is important to consider that our study has limitations that may have influenced our results and their interpretability. We performed our sample size estimation primarily to power the analysis performed on the effect of age group, visual complexity and their interaction on task performance in a VR-based visual search task and the corresponding gaze behaviours. As such, the study may not be sufficiently powered for the analysis performed with the cognitive assessments. Future studies primarily interested in these outcomes should be powered based on measures of variability and desired effect sizes of interest.
While the task design improves on prior visual search tasks on static two-dimensional realistic images, using letters in the search sequence may have diminished the ecological validity of our task. The search targets in our tasks were related (letters of the alphabet and the visual representation of animals whose names started with those letters) such that the participants only needed to keep track of a single set in their working memory. In contrast, the paper-based Trail Making Test-B uses two different, unrelated lists (letters of the alphabet and numbers), requiring working memory to keep track of two sets. The low adjusted R^2^ we found when determining the relationship between performance in our VR task and the paper-based Trail Making Test-B also supports this limitation. Future work may benefit from creating a VR-based visual search task that uses search targets that would be more ecologically valid, such as using farm animals and farming vehicles instead of farm animals and letters in the context of our task. Additionally, future work should incorporate all aspects of the paper-based Trail Making Test-B.
While all of our participants have normal or corrected-to-normal vision, we were not able to assess the visual acuity and contrast sensitivity of our participants, which may influence a person’s perception of saliency since it is quantified using the low-level visual features of a scene and ultimately the results in our study. Future work should include assessments that target various visual domains.
Finally, the cognitive assessments included in our experimental design do not fully capture the cognitive domains involved in the task. While we found an association between performance in the paper-based Trail Making Test-B and our VR visual search task, we are unable to determine if our task involved set switching ability as it is commonly calculated as the difference between Trail Making Test-A and Test-B. Additionally, our experimental design did not include an assessment of inhibitory capacity in the cognitive test battery that our participants completed. As such, we were unable to determine whether differences in task performance and gaze behaviours, particularly with task-irrelevant fixations, were associated with differences in inhibitory capacity. Future work would benefit from including assessments of inhibitory capacity, considering that there are different types of inhibition that require the inclusion of different tests that target the type of inhibition present during visual search.
Conclusion
We found that the time it took to complete the VR visual search task increased with the visual complexity of the environment as participants increased time on re-fixating task-relevant objects and fixating task-irrelevant objects. This negative effect was greater in older adults and is associated with differences in cognitive capacity, particularly in working memory. Increased susceptibility to distraction is potentially problematic as suboptimal allocation of visual attention may lead to difficulties completing everyday tasks and can potentially lead to injuries. For instance, young and older adults who show greater anxiety stemming from fear of falling while walking demonstrate greater susceptibility to distraction by threatening stimuli, which may negatively affect their ability to adapt their gait for safe locomotion [75–77]. These results highlight the importance of designing cognitive assessments that use tasks that closely simulate the visual complexity and dynamicity in the real world. In addition, our study also emphasizes the need for a more comprehensive analysis of task performance beyond completion time to accurately assess visuomotor behaviour, particularly in older adults.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Land MF, Hayhoe M. 2001 In what ways do eye movements contribute to everyday activities? Vision Res. 41, 3559–3565. (10.1016/s 0042-6989(01)00102-x)11718795 · doi ↗ · pubmed ↗
- 2Land MF. 2006 Eye movements and the control of actions in everyday life. Prog. Retin. Eye Res. 25, 296–324. (10.1016/j.preteyeres.2006.01.002)16516530 · doi ↗ · pubmed ↗
- 3Hayhoe M, Ballard D. 2005 Eye movements in natural behavior. Trends Cogn. Neurosci. 9, 188–194. (10.1016/j.tics.2005.02.009)15808501 · doi ↗ · pubmed ↗
- 4Hayhoe MM, Rothkopf CA. 2011 Vision in the natural world. Wiley Interdiscip. Rev. Cogn. Sci. 2, 158–166. (10.1002/wcs.113)26302007 · doi ↗ · pubmed ↗
- 5Patla AE. 1997 Understanding the roles of vision in the control of human locomotion. Gait Posture 5, 54–69. (10.1016/s 0966-6362(96)01109-5) · doi ↗
- 6Marigold DS, Patla AE. 2007 Gaze fixation patterns for negotiating complex ground terrain. Neuroscience 144, 302–313. (10.1016/j.neuroscience.2006.09.006)17055177 · doi ↗ · pubmed ↗
- 7Matthis JS, Yates JL, Hayhoe MM. 2018 Gaze and the control of foot placement when walking in natural terrain. Curr. Biol. 28, 1224–1233. (10.1016/j.cub.2018.03.008)29657116 PMC 5937949 · doi ↗ · pubmed ↗
- 8Chapman GJ, Hollands MA. 2006 Evidence for a link between changes to gaze behaviour and risk of falling in older adults during adaptive locomotion. Gait Posture 24, 288–294. (10.1016/j.gaitpost.2005.10.002)16289922 · doi ↗ · pubmed ↗
