A confound-free method to manipulate pupil size in psychological experiments
Joshua Snell, Luise Wagner, Ana Vilotijević

TL;DR
This paper introduces a reliable method to change pupil size without affecting cognitive performance, using blue and red light.
Contribution
The study validates that blue light-induced pupil constriction does not confound cognitive task performance.
Findings
Blue light reduces pupil size without impacting auditory task performance.
Task difficulty affected both pupil size and performance, but display color only affected pupil size.
The ipRGC method is confirmed as confound-free for psychological experiments.
Abstract
Researchers are increasingly showing interest in the ways in which various cognitive processes are influenced by the size of the pupil. However, this realm of research is complicated by the pupil’s notorious susceptibility to confounders and difficulties in disentangling cause and effect. Recent studies have sought a solution in the exploitation of intrinsically photosensitive retinal ganglion cells (ipRGCs), which trigger pupil constrictions upon detecting blue light. The crux of the method is that stimuli are presented against blue versus red backgrounds of equal perceived luminance, to effectuate systematically smaller pupils in the former compared with the latter condition. Here we provide a further validation of this method, by testing a scenario of potential concern. Via retino-hypothalamic pathways, ipRGC activation modulates alertness and arousal. Blue and red backgrounds may…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3- —http://dx.doi.org/10.13039/100010663H2020 European Research Council
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsColor perception and design · Visual perception and processing mechanisms · Neural and Behavioral Psychology Studies
Introduction
The size of the pupil impacts visual processing in various ways: pupil constrictions (i) enhance foveal acuity, (ii) increase depth of field, and (iii) enhance foveal vision through reducing light scatter; however, they also (iv) reduce peripheral awareness and (v) impair vision in low-light conditions. Pupil dilations necessarily do the precise opposite. But beyond these basic sensory consequences, might the size of the pupil also affect certain cognitive processes, such as attention or memory? Whereas the pupil has long been used as a read-out of various cognitive processes (e.g., arousal: Joshi et al. 2016; Joshi & Gold 2020; Murphy et al. 2014; mental effort: Bumke, 1911; Granholm et al., 1996; Hess & Polt, 1960; Kahneman & Beatty, 1966; visuo-spatial attention: Binda et al., 2014; Binda & Murray, 2015; Bombeke et al., 2016; Mathôt et al., 2013; Naber & Nakayama, 2013; Unsworth & Robison, 2017; Snell et al., 2018; Vilotijević & Mathot, 2023; Vilotijević & Mathot, 2024; Vilotijević & Mathôt, 2025), cognitive neuroscience has only recently begun to consider the pupil’s potential role as an effector in its own right (e.g., Mathôt et al., 2023; Suzuki et al., 2019; Vilotijević & Mathôt, 2024). This, however, begs ways to disentangle cause and effect; and in the case of the pupil this happens to be notoriously difficult. Consider, for example, visuo-spatial attention. When attention is more widespread and/or shifts towards the peripheral visual field, the pupils tend to dilate (e.g., Vilotijević & Mathôt, 2023). With larger pupils, there is more retinal illumination, particularly in the periphery. This enhances peripheral visual processing, which increases the potential of peripheral stimuli to capture attention; and with the resulting attentional shift to the peripheral visual field we have come full circle. Many such “loops” exist between the pupil and other cognitive processes (for a review, see Mathôt, 2018), and scientists have yet to pinpoint the causal relationships.
How do we solve this conundrum? Intriguingly, recent studies suggest that the solution may be as simple as presenting stimuli on red versus blue backgrounds. In addition to the more well-known rods and cones, the retina comprises intrinsically photosensitive retinal ganglion cells (ipRGCs; e.g., Berson, 2003, 2007; McDougal & Gamlin, 2010; Spitschan et al., 2017; Zele et al., 2018). These do not directly project onto the visual cortex, meaning they do not engender conscious visual percepts. Instead, they are mainly involved in non-image-forming functions such as the regulation of circadian rhythms and pupillary control, through their projections onto the suprachiasmatic nucleus (SCN) of the hypothalamus and the olivary pretectal nucleus (OPN) of the midbrain, respectively (e.g., Berson, 2003; Ecker et al., 2010; Fan et al., 2018; McDougal & Gamlin, 2010). ipRGCs contain the photopigment melanopsin, which has peak sensitivity to blue light (~480 nm). Their activation brings about a relatively slow (onsetting at ~5 s) but sustained pupillary constriction. As such, it would seem that the size of the pupil can be very easily and non-invasively manipulated by presenting participants with isoluminant red versus blue displays, for respectively larger versus smaller pupils (e.g., Kinzuka et al., 2022; Mathôt et al., 2023; Vilotijević & Mathôt, 2024; Wardhani et al., 2022).
The ipRGC method has various benefits compared to other methodologies. For instance, manipulation of pupil size through overall display luminance (e.g., presenting black vs. white backgrounds) potentially introduces differences in the visibility of stimuli or different degrees of physical strain on the eyes. These problems are avoided with the ipRGC method because the red and blue displays are (perceptually) isoluminant. Alternatively, manipulation of pupil size may involve pharmacological methods: for example, Campbell and Gregory (1960) dilated the pupil by instilling a homatropine hydrobromide solution in the eye’s conjunctival sac and subsequently controlled pupil size artificially with an aperture in the line of sight. But these instillations may be accompanied by blurred vision (since the eye’s ability to accommodate is impaired) and differential behavior because of discomfort and physical sensations during and after the instillation.
That being said, the ipRGC method has only been used in a handful of studies and warrants further validation (Mathôt et al., 2023; Wardhani et al. 2022). A potential point of concern is the retino-hypothalamic pathway: upon activation, the SCN – through pathways involving the locus coeruleus (LC) (Vandewalle et al., 2007), can enhance alertness and arousal. As such, there is a possibility that the ipRGC method inadvertently impacts behavior, with, for example, better task performance with blue backgrounds relative to red backgrounds. Note, there is at least one reason to believe that the SCN’s role is relatively minor compared to that of the retino-tectal pathway and the OPN: arousal is associated with dilated pupils (e.g., Bradley et al., 2008; Mathôt, 2018), meaning that, as far as the SCN is concerned, the pupil should be larger in blue light – but this is evidently not the case (Kinzuka et al., 2022; Mathôt et al., 2023; Vilotijević & Mathôt, 2024; Wardhani et al., 2022). Yet, even if the OPN has a firmer grip on the pupil than the SCN, the latter may nevertheless separately exert an inadvertent influence on behavior, and this would pose a problem to the ipRGC method. With the present study we hope to rule out this scenario.
The present study
Here we investigated whether ipRGC activation impacts behavior in a response-time task. Participants responded as quickly as possible to auditory cues, while looking at red versus blue displays. Participants were tested in both an easy and a difficult version of the auditory task. We expected that the difficult task would provoke more mental effort and arousal, and that this would affect the pupil concurrently with task performance. Against this reference variable, we were hoping to see that the color of the display would exclusively affect the pupil without impacting task performance. All data and scripts are available via the Open Science Framework at https://osf.io/27fhu/
Methods
Participants
Thirty-eight students from Vrije Universiteit Amsterdam (age M = 21.0, SD = 3.0 years) gave informed consent to participate in this study for course credit or monetary compensation. Participants reported having no visual impairments or mental disabilities. Recruitment and data collection were done in full accordance with the Declaration of Helsinki.
Stimuli and design
Participants performed an auditory detection task while looking at red and blue displays. We used a 2 × 2 experimental design with task difficulty (easy vs. difficult) and display color (red vs. blue) as factors. The experiment consisted of four blocks – two easy (E) and two difficult (D) – and these were offered in a counterbalanced subset of block orders (i.e., we used EDED, DEDE, EDDE, and DEED, but not EEDD or DDEE). The two display colors were used in the two respective halves of each block, again with counterbalanced order.
In the easy blocks, participants responded as quickly as possible with the down arrow key to a total of ninety-two 50-ms 440-Hz beeps played at ~60 dB (46 with a blue display, 46 with a red display). Two beeps were played per 5-s interval, with the timing between beeps randomly varying between 300 and 4,700 ms. This means that if the second beep was played, say, 1,750 ms after the first beep, then the next beep occurred 5,000 ˗ 1,750 = 3,250 ms after that (hence ensuring two beeps per 5 s). In the difficult blocks, there were forty 440-Hz beeps and forty 458-Hz beeps. Here, participants had to respond with the up and down arrow keys to the higher- and lower-pitch beeps, respectively. In the difficult blocks, there were also 12 high-pitch (660-Hz) double-beeps (two 40-ms tones separated by 55 ms of silence). Upon hearing a double-beep, participants had to refrain from pressing any key. The total of 92 beeps was presented in random order.
For our displays, we adopted the luminosities that were established by Mathôt et al. (2023) to have equal subjective intensity, at 6.74 and 9.84 cd/m^2^ for red and blue, respectively. The equal subjective intensity was verified by the fact that the initial pupillary light response, as triggered by rod and cone cells in the first ~2.5 s (which is earlier than the ipRGC-driven sustained pupil constrictions), differed non-significantly between the red and blue displays.1
Procedure
Participants were seated in a dimly-lit room where we provided task instructions orally and on-screen. Participants were instructed to focus at a central fixation dot throughout the experiment. The eye position and pupil size were recorded with an EyeLink 1000 eye tracker (SR Research, Canada). The experiment was implemented in OpenSesame (Mathôt et al., 2012) with the PyGaze package for eye tracking (Dalmaijer et al., 2014). At the start of each block, participants were informed about whether they were about to do the easy or the difficult task. Each block started with a 7-s interval during which we showed the display in red or blue without presenting any beeps. These were so-called inducer intervals (e.g., Mathôt et al., 2023), which served to trigger a difference in the sustained pupil response due to differential ipRGC activation. Then 46 beeps occurred (these could be regarded as 46 trials or data points). After every beep, participants had until the next beep to respond. After 46 beeps, the display changed to the other color, and after another 7-s inducer, the remaining 46 beeps were presented. The total of 368 experimental trials was preceded by ten practice trials with a gray display. The entire experiment took approximately 35 min.
Pupil data processing
Following the workflow for preprocessing pupillary data (Mathôt & Vilotijević, 2023), we first interpolated blinks and downsampled the data by a factor of 10. Next, we baseline-corrected the data by subtracting the mean pupil size during the first 50 ms after the onset of the display from all subsequent pupil-size measurements on a trial-by-trial basis. Trials in which the baseline pupil size deviated by more than ± 2 z-scores from the mean (6.25%) were considered outliers and excluded from further analysis.
Results
Pupil size, response time (RT), and accuracy were analyzed with respectively linear mixed-effect models (LMMs) and generalized LMMs (GLMMs), with task difficulty and display color as factors and participants as random effect. Models successfully converged with inclusion of the random slope for display color. We report b-values, standard errors (SEs) and t-values (RTs) or z-values (accuracy), with values | b | and | z | > 1.96 considered significant. In the case of non-significance, we computed Bayes factors to quantify evidence for the null hypothesis. Before all analyses, we excluded trials with RTs < 100 ms or > 1,499 ms (~8%), so that 12,568 data points remained.
As noted above, every beep was considered a trial. For our analysis of pupil size, we logged the pupil size at the time of responding. Although the analysis is based on one pupil size measurement per trial, we do report continuous pupil size data per block type in Fig. 1.Fig. 1. Baseline-corrected pupil size as a function of task difficulty (easy vs. difficult) and display color (red vs. blue), measured across 46 beeps (i.e., half a block). The shaded areas around each line reflect standard errors. a.u. = arbitrary units
In line with previous applications of the ipRGC method, blue displays engendered systematically smaller pupils than red displays (b = −53.27, SE = 21.97 t = −2.43). The pupil was also impacted by task difficulty, with smaller pupils in the easy blocks than in the difficult blocks (b = −20.99, SE = 4.82, t = −4.35). We also observed an interaction, such that the impact of task difficulty on pupil size was significantly greater with red than with blue displays (b = 37.18, SE = 9.64, t = 3.86; Fig. 2). In fact, analyzing red and blue display trials separately, we found that task difficulty impacted pupil size solely in the red display trials (b = 39.70, SE = 7.29, t = 5.45). No such effect was observed in the blue display trials whatsoever (b = 2.43, SE = 6.31, t = 0.39). We reflect on this interaction in the Discussion.Fig. 2. Overall average pupil size (arbitrary units, not baseline-corrected) as a function of task difficulty (easy vs. difficult) and display color (red vs. blue). Error bars represent standard errors
The effect of task difficulty on pupil size was accompanied by an effect in task performance, with slower responses and more errors in the difficult blocks (RT: b = 263.68, SE = 3.60, t = 73.16; accuracy: b = 2.69, SE = 0.07, z = 41.10). Such relations between behavior and pupil size may be considered common and unsurprising (e.g., Mathôt, 2018); however, in the present study we were also hoping to find that the ipRGC-induced pupillary responses were not accompanied by behavioral differences. These hopes were fulfilled: in the presence of a clear effect of display color on pupil size, the display color did not affect task performance whatsoever (RT: b = 0.52, SE = 5.29, t = 0.01; accuracy: b = 0.03, SE = 0.06, z = 0.50; Fig. 3). There were also no interactions between display color and task difficulty on performance (RT: b = 8.82, SE = 7.22, t = 1.22; accuracy: b = 0.02, SE = 0.01, z = 1.73). For the (absence of) main effects of display color on performance, corresponding Bayes factors were computed in JASP (JASP Team, 2025) using the default Jeffreys–Zellner–Siow prior on the standardized effect size (Cauchy distribution, scale r = 0.707, two-sided; see, e.g., Rouder et al., 2009). We observed very strong evidence for the null hypothesis, with BF_0,1_ = 48.53 for RTs and BF_0,1_ = 41.02 for accuracy.Fig. 3. Average response times (RTs) and accuracies as a function of task difficulty (easy vs. difficult) and display color (red vs. blue). Error bars – which are so small that they are largely obscured by the value markers – represent standard errors
For a last post hoc analysis, we considered the possibility that influences of task difficulty on pupil size were in fact caused by the double beeps – which exclusively occurred in the difficult blocks – with the rationale that the double beeps provide more stimulation (and consequently larger pupils) than the single beeps. Reanalyzing the data after excluding double deep intervals, the impact of task difficulty on pupil size persisted (b = 29.98, SE = 13.86, t = 2.16).
Discussion
How does the size of the pupil affect various cognitive processes? This is an open question, primarily because pupil size is determined by – and likely reciprocally determines – a myriad of factors. An important first step in solving this puzzle is the development of a technique to manipulate the pupil in a clean, confound-free manner. Various recent studies have championed the stimulation of intrinsically photosensitive retinal ganglion cells (ipRGCs) as the solution (e.g., Kinzuka et al., 2022; Mathôt et al., 2023; Vilotijević & Mathôt, 2024; Wardhani et al., 2022). The ipRGC approach is promising because ipRGCs themselves do not project directly onto the visual cortex (meaning they do not contribute directly to visual percepts), and because their exploitation – which involves the presentation of red versus blue displays – is strikingly simple and non-invasive.
The present study constitutes an important validation step. In addition to the retino-tectal pathway that underlies the pupillary response, ipRGCs project to the suprachiasmatic nucleus (SCN) and locus coeruleus (LC), which are involved in regulating alertness and arousal (e.g., Berson, 2003; Fan et al., 2018). Therefore, we needed to rule out that red and blue displays inadvertently impact behavior in addition to the pupil.
The ipRGC method has passed our validation test: in the presence of an impact of task difficulty on both pupil size and behavior, ipRGC stimulation solely affected pupil size without concurrently impacting behavior. We surmise that, within the scope of a typical cognitive psychology experiment, activation of the SCN-LC pathway is too minor to have a meaningful effect on behavior. This stands in sharp contrast to the retino-tectal pathway, the involvement of which is prominent enough to effectuate clear pupil size differences.
Although the present results are encouraging, they also provide a theoretical insight that is worth taking into account when employing the ipRGC method. We found that the modulation of pupil size by task difficulty was only observed in the presence of a red display. When participants viewed the blue display, pupil size remained stably small, regardless of task difficulty. This finding might suggest that the relationship between pupil size and arousal – long exploited in the study of arousal and/or mental effort (e.g., Hess, & Polt, 1964; Kahneman, & Beatty, 1966; Klingner et al., 2011; van der Wel & van Steenbergen, 2018) – may be contextually constrained by sensory inputs that engage low-level photoreceptive pathways. One possibility is that stimulation of ipRGCs by blue light gates arousal-related influences on pupil size. The parasympathetically driven pupillary constriction would as such dominate, leaving little room for arousal-driven pupil dilations. Though speculative at this point, one consequence would be that one cannot simultaneously control pupil size via ipRGCs and track certain cognitive processes (e.g., mental effort, arousal) via pupil size modulations. Such processes are then better tracked via alternative behavioral or physiological measures. We should also emphasize that this does not undermine the ipRGC method in itself: after all, the ipRGC method’s main value lies in the ability to investigate the pupil’s influence on cognition, rather than the other way around.
On a final methodological note, might the story have been different if we had used a different behavioral task? Specifically, is there a possibility that we would observe a direct impact of ipRGCs on behavior if the task were more sensitive to small fluctuations in alertness and/or arousal? We consider this unlikely, as our task already appears to have been quite sensitive, in the sense that relatively small task-induced fluctuations in pupil size coincided with drastic fluctuations in behavior (Fig. 3). By comparison, the ipRGC-induced pupil size effect was approximately twice as large (Fig. 2), but there was clearly no impact on behavior whatsoever. It may furthermore seem reasonable to predict that things would look different if our task had been visual rather than auditory in nature. But this is precisely the point: our auditory task shows that any impact of ipRGCs in a typical cognitive experiment (irrespective of whether that experiment involves a detection or discrimination task) on behavior, if it were to exist at all, is negligible. Therefore, if any effect of ipRGC stimulation on behavior were to emerge in a visual task, researchers can safely draw a causal link to the size of the pupil rather than arousal.
In closing, here we have established that the ipRGC method is a clean and confound-free way to manipulate the size of the pupil. Furthermore, our data lead us to believe that arousal-related dilations are not uniformly expressed, but are instead modulated by background and environmental illumination. We anticipate that the ipRGC method will have a prominent role in the ongoing endeavor to unravel the pupil’s impact on cognition.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Bradley, M., Miccoli, L., Escrig, M., & Lang, P. (2008). The pupil as a measure of emotional arousal and autonomic activation. Psychophysiology,45, 602–607.
- 2Dalmaijer, E. S., Mathôt, S., & Van der Stigchel, S. (2014). Py Gaze: An open-source cross-platform toolbox for minimal-effort programming of eyetracking experiments. Behavior Research Methods, 46(4), 913–921. 10.3758/s 13428-013-0422-2
- 3Ecker, J., Dumitrescu, O., Wong, et al. (2010). Melanopsin-expressing retinal ganglion-cell photoreceptors: cellular diversity and role in pattern vision. Neuron, 67, 49–60.
- 4Granholm, E., Asarnow, R. F., Sarkin, A. J., & Dykes,K. L. (1996). Pupillary responses index cognitive resource limitations. Abstract Psychophysiology, 33(4), 457–461. 10.1111/j.1469-8986.1996.tb 01071.x
- 5Hess, E. H., & Polt, J. M. (1960). Pupil size as related to interest value of visual stimuli. Science (New York, N.Y.), 132(3423), 349–350.
- 6Mathôt, S., Schreij, D., & Theeuwes, J. (2012). Open Sesame: An open-source graphical experiment builder for the social sciences. Behavior Research Methods, 44(2), 314–324. 10.3758/s 13428-011-0168-7
- 7Mathôt, S., & Vilotijević, A. (2023). Methods in cognitive pupillometry: Design preprocessing and statistical analysis. Abstract Behavior Research Methods, 55(6), 3055–3077. 10.3758/s 13428-022-01957-7
- 8Mathôt, S., van der Linden, L., Grainger, J., Vitu, F. (2013). The pupillary light response reveals the focus of covert visual attention. PLOS ONE, 8(10), e 78168.
