Decision-making trades off learned and perceived information

Tal Nahari; Boaz Rozenberg; Yoni Pertzov; Eran Eldar

PMC · DOI: 10.21203/rs.3.rs-7972005/v1 · October 29, 2025


Open Access

TL;DR

This study shows how people balance using learned knowledge and new sensory information when making decisions.

Contribution

The paper introduces a novel framework showing a trade-off between learned and perceived information in decision-making.

Findings

1. Participants who relied more on learned information gathered less perceptual data.
2. The trade-off is explained by the faster availability of learned information, reducing the need for perception.
3. Individuals tend to consistently favor one information source over the other.

Abstract

A fundamental question in cognitive science is how information from internal memory is combined with external sensory input when making decisions. We hypothesized that previously learned and currently perceived information trade off against each other, such that extracting information from one source reduces the gathering and usage of information from the other. To test this hypothesis, we designed a two-armed bandit task where each arm is composed of both learned and perceived elements. We monitored participants’ gathering of perceptual information using eye tracking. Participants’ choices and gaze deployment showed a trade-off between the impact of learned and perceived information. The more a participant utilized internally stored learned information, the less they gathered perceptual information, and vice versa. Modeling participants’ information gathering indicated that the…



Keywords

decision making · eye tracking · value learning · information gathering


Taxonomy

Topics: Neural and Behavioral Psychology Studies · Decision-Making and Behavioral Economics · Mind wandering and attention

Full text

The many decisions we make every day may rely on information from the environment that we perceive using our senses (e.g., deciding whether to take an umbrella by looking outside to see whether it is cloudy), and on previously learned information that we gather from within our minds (e.g., recalling whether it normally rains at this time of year^1^). Prior research has extensively examined how people use each source of information separately, yet it remains unclear how people integrate information from both sources.

Answering this question is difficult not only due to a lack of empirical data but also because research on perceptual and learning-based decisions has largely relied on diverging modeling frameworks^2^. Perceptual decision-making has most prominently given rise to models of gradual evidence accumulation, where the proclivity to take one or another decision evolves over time during the decision process^3–5^. Conversely, learning-based decisions have mostly been studied through the lens of reinforcement learning^6–10^, which explains how preferences change from decision to decision, but not during the decision process. That said, substantial research has shown how learning-based decisions may also form gradually, through the accumulation of internally sampled information^11–15^. Perceptual and learning-based decision-making may therefore share a common algorithmic form, and potentially require overlapping cognitive resources.

One key difference, however, is that visual stimuli are typically sampled progressively via eye movements^16^, whereas the gathering of learned information is limited only by the speed of neurons. If these two sources of information are gathered at comparable speeds, the trade-off between them can be expected to be symmetric, such that gathering more information from one source means gathering less of the other. However, if learned information is gathered faster, as it does not require operating ocular muscles, learning could enable faster decisions and reduce the residual value of investing additional time in gathering perceptual information. Thus, here we ask whether there is a trade-off between the use of learned and perceived information in making decisions, and if there is, whether it is symmetric or directional.

To answer this question, we designed a novel decision-making task that required jointly utilizing learned and perceived information to maximize reward (preregistered at https://osf.io/78vqu). Importantly, participants collected sensory information by shifting their gaze, consistent with how such information is gathered in the wild. This also enabled us to precisely monitor the gathering process via eye tracking. As anticipated, we found that the two sources of information traded off against each other. However, the trade-off was directional: learned information influenced the amount of perceptual information that was gathered to reach a decision. Joint modeling of choices and eye movements revealed the mechanism at play: the more informative the learned information, the less perceptual information participants gathered. Preliminary evidence also indicated that the tendency to rely on either learned or perceived information is a stable individual trait.

Results

Eighty-eight individuals completed the task, of whom 72 provided full eye-tracking data. In each trial, participants chose between two stimuli, each composed of two elements (Figure 2): a colored circle and a surrounding ring of 12 Landolt Cs, each facing either upwards or downwards. The colored circle was associated with a fixed probability of reward that could only be learned through trial and error. By contrast, with regard to the surrounding Landolt Cs, participants were informed that the probability of reward scales linearly with the number of Cs that face upwards. Thus, the two elements differed in what they required. The colored circle could be identified instantaneously using peripheral vision, but estimating its reward probability required consulting learned information. The Landolt Cs required protracted perceptual processing to identify, but their associated reward probabilities were precisely known. Importantly, participants were made aware that the color of the circle and the number of upward-facing Cs bore equal weight in determining the reward probability associated with each stimulus. Thus, both learned and perceived information were equally important for maximizing reward.

Participants jointly used learned and perceived information

We first validated that participants utilized both perceived and learned information when making a decision. To test that, we compared participants’ tendency to choose the stimulus with the higher total reward probability, accounting for both Cs and colors, to their tendency to simply choose the stimulus with the more upward-facing $[eqn]$ , and to their tendency to choose the stimulus with the more rewarding color $[eqn]$ . The results confirmed that participants accounted for both sources of information in making their choices (Figure 1C).

In addition, to determine whether the use of both Cs and color explained participants’ choices better than either the Cs or the color separately, we compared three regression analyses explaining participants’ choices by: (1) the difference between the two available stimuli in the number of upward Cs; (2) the difference between the stimuli in the reward probability associated with the color; and (3) both. A $[eqn]$ test showed that the Cs and color together explained choices better than either one separately ( $[eqn]$ of combined vs. Cs: 1241.2, p<.001; $[eqn]$ of combined vs. color: 4332.4, p<.001). Additionally, both Cs and color predictors were significant $[eqn]$ . Lastly, the same conclusion was also obtained by comparing the regression models using the Bayesian information criterion (BIC; Figure 1D).

Trade-off between learned and perceived information within participants

Having verified that participants used both learned and perceived information to make their choices, we next tested whether a trade-off existed between them within participants, such that when a participant used more learned information, they gathered less perceived information, and vice versa. For this purpose, we leveraged the monitoring of participants' perceptual information gathering via eye tracking. We first confirmed that the Cs participants fixated on, as identified by eye tracking, explained participants' choices better than all the Cs that appeared on the screen (Table S2).

Then, we regressed participants' choices on three predictors:

1. The difference between the two choice options in the number of upward minus downward Cs participants fixated on ( $[eqn]$ ).
2. The difference between the choice options in the reward probability associated with the stimuli colors (color).
3. The trade-off between using the colors and looking at Cs. This is an interaction between the color predictor and the total number of Cs the participant fixated on during the trial, denoted color $[eqn]$ .

We did not include the total number of fixated Cs as a separate predictor outside the interaction, because we did not expect it to have a direct effect on choice. A supplementary analysis confirmed this (see Figure S2).

We found that both Cs and colors influenced choice. Most importantly, the interaction was negative, meaning that the more participants used learned information, the less they gathered perceived information (Figure 2A, Figure S2).

Next, we ruled out that this interaction was due to the greater strength of evidence that a higher number of Cs could offer in favor of one or the other stimulus. According to this alternative interpretation, the trade-off arises because participants use the colors less when the perceptual evidence is stronger. We tested this interpretation by adding another interaction to the previous regression, between the color predictor and the strength of perceptual evidence. We defined this strength as the absolute difference between stimuli in the number of upward- minus downward-facing Cs that participants fixated on (denoted: $[eqn]$ ). In this expanded regression, we found a significant effect only for the interaction with the total number of Cs participants observed. The interaction with the strength of the evidence these Cs offered was not significant ( $[eqn]$ , $[eqn]$ ). Thus, the use of learned information for making choices traded off specifically against the amount of perceptual information gathered.
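The regression predictors above can be made concrete with a short sketch that derives them from one trial's fixation counts. This is a minimal illustration under stated assumptions. The `Trial` record and the function name are hypothetical. Predictors are coded so that positive values favor the right stimulus, which is an assumed convention. The fitting step itself (a logistic regression of choice, possibly with random effects) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    # Hypothetical record: counts of fixated Cs per stimulus (repeat fixations excluded)
    up_left: int
    down_left: int
    up_right: int
    down_right: int
    # Currently learned reward probability of each stimulus's color
    color_value_left: float
    color_value_right: float


def regression_predictors(t: Trial) -> dict:
    """Build the choice-regression predictors for one trial (right minus left coding assumed)."""
    d_cs = (t.up_right - t.down_right) - (t.up_left - t.down_left)
    d_color = t.color_value_right - t.color_value_left
    n_fixated = t.up_left + t.down_left + t.up_right + t.down_right
    return {
        "cs": d_cs,                               # perceived evidence
        "color": d_color,                         # learned evidence
        "color_x_nfix": d_color * n_fixated,      # trade-off interaction
        "color_x_strength": d_color * abs(d_cs),  # alternative account: evidence strength
    }


example = Trial(up_left=2, down_left=1, up_right=3, down_right=0,
                color_value_left=0.5, color_value_right=1.0)
print(regression_predictors(example))
```

Keeping the number of fixated Cs only inside the interaction mirrors the text's choice not to enter it as a main effect.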

Learned information reduces sampling of perceptual information

Next, we asked why a trade-off exists between the use of perceived and learned information. A natural suggestion is that gathering each of the two kinds of information requires time and uses overlapping cognitive resources, so that gathering both together is slower. If this is the case, gathering both kinds of information should take longer than gathering each of them separately. However, we compared choice times on trials where stimuli consisted of Cs alone, color alone, or both. Adding color to the Cs made participants reach decisions faster, not slower (combined vs. Cs only: $[eqn]$ ; Figure 2C). This held even though choosing optimally between combined stimuli required consulting both Cs and color. Participants also had plenty of time left in the trial to do so: they were given 4 seconds to make a choice but did so on average within 1.91 s ± 0.26.

Examination of the choice times also revealed that participants were substantially faster in choosing between colors than between $[eqn]$ . This raised the possibility that the trade-off evident in the combined trials resulted from fast gathering of learned information. Such fast gathering would leave less need to use perceived information to reach a decision threshold. If this is the case, participants should fixate on fewer Cs in combined trials than in Cs-only trials. The data showed that this was indeed the case ( $[eqn]$ ; Figure 2D).

Moreover, both this effect and the shorter choice time in combined trials correlated with the degree to which participants prioritized colors over Cs in making their choices. We inferred this degree from participants' choices ( $[eqn]$ ; Figure 2F; choice time: r = −.39, $[eqn]$ , Figure 2G). We validated it against participants' self-reports concerning their use of Cs and colors (r = 0.55, $[eqn]$ , Figure 2E).

The finding that participants were faster to choose when stimuli also involved color called into question the assumption that gathering information about the color takes substantial time, as does the gathering of perceptual information. To test this explicitly, we added an interaction between the color predictor and choice time to our logistic regression model of choice. This model still controlled for the association of color use with fewer observed Cs. The results did not provide evidence that using the colors took additional time. In fact, the interaction between color use and choice time was negative $[eqn]$ . Possibly, this result reflects that the better the color reward probabilities were learned, the faster they could be used.

A computational model of perceptual and learned information gathering

The results thus far establish a trade-off between the use of learned and perceived information, and suggest that this trade-off results because fast, initial gathering of learned evidence gets people closer to a decision threshold, and thus reduces the amount of perceptual information they need to gather in order to reach a decision. To formally test this mechanism, we leveraged the traceability and discreteness of perceptual information gathering in our task to form a computational model of the task. Thus, the model predicted not only the choices participants made, but also the sequence of fixations that would lead them to each choice (see Methods for full model specification).

At each timepoint, the model first decides whether to gather more perceptual information or to make a choice now. This decision can lead to a trade-off between perceptual and learned information because, among other factors, it is based on the absolute difference in the currently estimated values of the stimuli's colors $[eqn]$ :

[eqn]

Specifically, a tradeoff would result if $[eqn]$ is negative. Other factors influencing this decision are the value difference according to the observed $[eqn]$ , the model’s general tendency to gather or not gather perceptual information ( $[eqn]$ ), and its urgency to make a decision as time goes by from trial onset ( $[eqn]$ ). Here, $[eqn]$ determines how quickly urgency grows with time.
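The gather-versus-choose rule can be sketched as a logistic function of the factors listed above. The equation itself did not survive extraction. The logistic form, the linear urgency term, and the parameter names (`b0`, `b_color`, `b_cs`, `urgency_rate`) are therefore assumptions for illustration, not the authors' specification. The sketch shows only the qualitative point: a negative color weight lowers the probability of gathering as the color difference grows.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def p_gather(abs_d_color: float, abs_d_cs: float, t: float,
             b0: float, b_color: float, b_cs: float, urgency_rate: float) -> float:
    """Probability of fixating another C instead of choosing now.

    Hypothetical parameterization: b0 is the baseline tendency to gather,
    b_color and b_cs weight the absolute value differences between the
    stimuli, and urgency is assumed to grow linearly with time since onset.
    """
    urgency = urgency_rate * t
    return sigmoid(b0 + b_color * abs_d_color + b_cs * abs_d_cs - urgency)


# Illustrative (made-up) parameters; b_color < 0 is what produces the trade-off
params = dict(b0=2.0, b_color=-3.0, b_cs=-1.0, urgency_rate=1.5)
for d_color in (0.0, 0.5, 1.0):
    print(d_color, round(p_gather(d_color, abs_d_cs=0.2, t=0.5, **params), 3))
```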

If the model decides to gather additional perceptual information, it then chooses whether to look at the right or left stimulus’ Cs:

[eqn]

This choice is based on a general bias to look in one or the other direction ( $[eqn]$ ), the values of the right versus left stimuli, and where the model last looked (since it is less effortful to saccade within a stimulus). Here, $[eqn]$ (looked right) is coded as +1 if the model last looked right, and −1 if the model last looked left.
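A minimal sketch of the where-to-look decision follows, assuming the same logistic form. The names `bias`, `w_value`, and `w_stay` are hypothetical stand-ins for the parameters whose symbols were lost in extraction. Only the ±1 coding of the previous gaze side comes from the text.

```python
import math


def p_look_right(v_right: float, v_left: float, looked_right: int,
                 bias: float, w_value: float, w_stay: float) -> float:
    """Probability that the next fixation targets the right stimulus's Cs.

    looked_right is +1 if the previous fixation was on the right stimulus and
    -1 if it was on the left, as in the text. The logistic form and the
    parameter names (bias, w_value, w_stay) are hypothetical.
    """
    if looked_right not in (1, -1):
        raise ValueError("looked_right must be +1 or -1")
    x = bias + w_value * (v_right - v_left) + w_stay * looked_right
    return 1.0 / (1.0 + math.exp(-x))


# Equal values: a positive w_stay favors staying on the stimulus last fixated
print(p_look_right(0.5, 0.5, +1, bias=0.0, w_value=2.0, w_stay=1.0))
print(p_look_right(0.5, 0.5, -1, bias=0.0, w_value=2.0, w_stay=1.0))
```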

When the model decides to stop gathering perceptual information, it chooses either the right or the left stimulus, based on their estimated values:

[eqn]

where $[eqn]$ is an inverse temperature parameter, and $[eqn]$ accounts for both colors and Cs:

[eqn]

where $[eqn]$ weights the relative impact of the colors on choice.
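The final stimulus choice can be sketched as a two-option softmax over combined values. How the color weight enters the combined value is not recoverable from the text. The additive form `v_cs + omega * v_color` below is an assumption, and `omega` and `beta` are hypothetical names for the color weight and the inverse temperature.

```python
import math


def combined_value(v_cs: float, v_color: float, omega: float) -> float:
    # Assumed additive form: omega scales the colors' contribution
    return v_cs + omega * v_color


def p_choose_right(right: tuple, left: tuple, omega: float, beta: float) -> float:
    """Logistic (two-option softmax) choice; right/left are (v_cs, v_color) pairs."""
    dv = combined_value(*right, omega) - combined_value(*left, omega)
    return 1.0 / (1.0 + math.exp(-beta * dv))


print(p_choose_right((0.5, 1.0), (0.5, 0.0), omega=1.0, beta=0.0))  # indifferent
print(p_choose_right((0.5, 1.0), (0.5, 0.0), omega=1.0, beta=5.0))  # favors right
```

With `beta = 0` choices are random. Larger `beta` makes choices more deterministic in favor of the higher combined value.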

Lastly, the value of a stimulus based on its observed Cs is computed as the mean of the expected rewards for 3, 6, or 9 upward-facing Cs. Each expected reward is weighted by the likelihood that there were this number of upward-facing Cs given the observed Cs (as per the binomial probability mass function). The value added by the stimulus having a particular color is learned on a trial-by-trial basis from the observed rewards (see Methods). Retrieving the learned color values could take time, for instance due to a process of sampling from stored memory^14^. To account for this, the effective value used for making choices increasingly approaches the learned value as time within a trial progresses:

[eqn]

with the rate of increase determined by $[eqn]$ .
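The two value computations just described can be sketched as follows, under explicit assumptions. "Scales linearly" is read here as a reward probability of `n_up / 12`, and a uniform prior over 3, 6, and 9 upward Cs is assumed. Each observed C is treated as an independent draw that faces up with probability `n / 12`, which is our reading of the binomial description. The exponential approach of the effective color value, its rate parameter `rate`, and the time unit of `t` are also assumptions, since the equation was lost.

```python
import math

N_CS = 12                 # Landolt Cs per stimulus, as in the task
POSSIBLE_UP = (3, 6, 9)   # possible numbers of upward-facing Cs


def reward_prob(n_up: int) -> float:
    # Assumption: "scales linearly" is read as n_up / 12
    return n_up / N_CS


def cs_value(obs_up: int, obs_down: int) -> float:
    """Expected reward of a stimulus given its fixated Cs.

    Each possible count n is weighted by the binomial probability of the
    observed up/down split when each C faces up with probability n / 12;
    weights are normalized (a uniform prior over 3, 6, 9 is assumed).
    """
    m = obs_up + obs_down
    weights = []
    for n in POSSIBLE_UP:
        q = n / N_CS
        weights.append(math.comb(m, obs_up) * q ** obs_up * (1 - q) ** obs_down)
    total = sum(weights)
    return sum(w / total * reward_prob(n) for w, n in zip(weights, POSSIBLE_UP))


def effective_color_value(learned: float, t: float, rate: float) -> float:
    """Color value available at time t into the trial (assumed exponential approach)."""
    return learned * (1.0 - math.exp(-rate * t))


print(round(cs_value(0, 0), 3), round(cs_value(3, 1), 3))
print(round(effective_color_value(0.8, 0.25, rate=10.0), 3))
```

Before any C is fixated, the Cs value is the average over the three possibilities. Each fixation then shifts the weights toward the counts that best explain what has been seen.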

Learned values impact perceptual information gathering

We fitted the model to participants' actual fixations and stimulus choices. The fits showed that both Cs and color information influenced two decisions: whether to continue gathering information, and where to sample from. Accordingly, the best-fitting model according to the integrated Bayesian Information Criterion^17,18^ (iBIC = 182139.3) incorporated learned color information in both of these decisions. This full model outperformed a model that included learned color information only in the gather-versus-choose decision (iBIC = 182435.46). It also outperformed a model that included color only in the sampling location decision (iBIC = 182207.92; Figure 3B).
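Because lower iBIC indicates a better trade-off between fit and complexity, the comparison reduces to differences from the best model. The snippet below only restates the three reported values and computes those differences. The labels are our shorthand for the model variants.

```python
# iBIC values reported for the three model variants (lower is better)
ibic = {
    "color in gather/choose and sampling (full)": 182139.3,
    "color in gather/choose only": 182435.46,
    "color in sampling only": 182207.92,
}

best = min(ibic, key=ibic.get)
delta = {name: value - ibic[best] for name, value in ibic.items()}
for name in sorted(delta, key=delta.get):
    print(f"{name}: delta iBIC = {delta[name]:.2f}")
```

The full model leads the sampling-only variant by about 68.6 units and the gather-only variant by about 296.2 units.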

Examining the parameter values that best fitted participants' choices and gaze fixations showed that the color values negatively influenced the decision to continue gathering perceptual information. The higher the absolute difference between the color values, the lower the probability of continuing to gather perceptual information. Moreover, the model gathered color values very rapidly, on average reaching 94.6% of the learned color value by the second fixation (see Figure 3C). Thus, mirroring the model-independent results, the modeling indicated that the trade-off resulted from fast retrieval of learned information, which suppressed further gathering of perceptual information.

To confirm that the model captured participants' behavior well, we compared it to the actual data in terms of both stimulus choices and decisions to continue looking.

With regard to stimulus choices, we examined the probability of choosing the right or left stimulus as a function of the reward probabilities associated with the stimuli's color and Cs. This showed an adequate correspondence between participants (Figure 4A) and model (Figure 4E).

With regard to decisions to continue looking, we examined the probability of gathering additional perceptual information following each additional fixation. This measure also showed good correspondence between participants (Figure 4B; $[eqn]$ , t = 29.4, p < .001, mixed regression) and model (Figure 4F; $[eqn]$ , t = 148.8, p < .001, mixed regression).

Finally, we examined measures that stem from the mechanism underlying the trade-off between learned and perceived information. This mechanism predicts a reduced tendency to continue looking at more Cs the higher the absolute difference between the color values. This reduced tendency was evident in the number of Cs observed (Figure 4C; $[eqn]$ , t = 7.28, p < .001, mixed regression). It was also evident in the time it took participants to stop looking and choose a stimulus (Figure 4D; $[eqn]$ , t = 3.66, p < .001, mixed regression). These measures aligned well with the number of Cs observed by the model (Figure 4G; $[eqn]$ , t = 6.5, p < .001, mixed model). They also aligned with the model's probability of continuing to look (Figure 4H; $[eqn]$ , t = 12.01, p < .001, mixed regression).

Individual differences and reliability in the use of learned and perceived information

The results so far established and explained a trade-off between learned and perceived information within participants. We finally asked whether a trade-off also exists between individuals, such that the more a person relies on one source of information, the less they rely on the other. To test this, we examined the correlation between the regression coefficients quantifying the degree to which each participant used the Cs and the colors to make choices. This planned analysis showed a weak, non-significant negative correlation (r = −0.16, p = .14; Figure S1A), suggesting that our measures might not be sensitive enough given the present sample size.

We thus repeated the correlation analysis replacing the regression coefficients with the parameter values for the weights participants gave to continue gathering $[eqn]$ and color ( $[eqn]$ ) as derived from the modelling of participants’ choices and fixations. The modelling more specifically quantifies participants’ baseline tendency to gather Cs and their usage of color information to which they were exposed. We found a negative correlation across participants between the two parameters, such that the more participants used the color information the less they gathered information about the Cs and vice versa (r = −0.23, $[eqn]$ ; Figure S1B). We also note that this result likely underestimates the true magnitude of the trade-off because it may reflect a superposition of two correlations, one negative due to a trade-off and another positive due to individual differences in general performance, for instance due to motivation, which would lead to higher or lower values in both coefficients.
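The argument that r = −0.23 likely underestimates the trade-off can be made concrete with a small simulation on synthetic data, not the study's data. Each simulated participant has a trade-off score that pushes the color weight and the gathering weight in opposite directions. Each also has a general-performance score that pushes both weights in the same direction. The additive structure and the Gaussian distributions are assumptions. Under this model the expected correlation is (σ_g² − σ_t²)/(σ_g² + σ_t²), so any shared general component pulls a purely negative correlation toward zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_participants: int, sd_tradeoff: float, sd_general: float, seed: int = 0) -> float:
    """Correlation between simulated color and gathering weights (synthetic, illustrative)."""
    rng = random.Random(seed)
    color_w, gather_w = [], []
    for _ in range(n_participants):
        tradeoff = rng.gauss(0.0, sd_tradeoff)  # opposite effect on the two weights
        general = rng.gauss(0.0, sd_general)    # same-direction effect (e.g., motivation)
        color_w.append(general + tradeoff)
        gather_w.append(general - tradeoff)
    return pearson(color_w, gather_w)


print(round(simulate(10_000, 1.0, 0.0), 2))  # pure trade-off
print(round(simulate(10_000, 1.0, 0.5), 2))  # trade-off partly masked
print(round(simulate(10_000, 1.0, 1.0), 2))  # trade-off fully masked
```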

In addition, we examined whether participants' tendency to prioritize perceived or learned information in making choices is a stable and valid individual trait. For this purpose, we first examined the correlation of the color-Cs relative coefficient measure (see Figure 2C) across two separate sessions of the experiment. Each session used different colors, and participants completed them on two different days (between 2 and 175 days apart, mean 79 ± 63). A positive correlation (n = 27, r = 0.6, p < .001) indicated moderate test-retest reliability. This suggests some degree of stability in individuals' tendency to use perceived versus learned information in our task (Figure 5C).
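To gauge how precisely a test-retest correlation of r = 0.6 is estimated with n = 27, one can compute the usual t statistic and a Fisher-z confidence interval. These are standard textbook formulas applied to the reported values. They are not an analysis reported by the authors.

```python
import math
from statistics import NormalDist


def correlation_summary(r: float, n: int, level: float = 0.95):
    """t statistic (df = n - 2) and Fisher-z confidence interval for a Pearson r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return t, (math.tanh(z - crit * se), math.tanh(z + crit * se))


t, (lo, hi) = correlation_summary(0.6, 27)
print(f"t(25) = {t:.2f}, 95% CI for r = [{lo:.2f}, {hi:.2f}]")
```

For the reported values this gives t(25) = 3.75 and an interval of roughly 0.28 to 0.80. That width is consistent with describing the reliability as moderate.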

To test the validity of these individual differences, we next examined whether they correlated with participants' own ratings of how much they used the colors and the Cs. This showed a significant positive relationship (Figure 5A), indicating a correspondence between the task measure and participants' self-reports.

We next tested whether the task measure could explain behavior outside the task, by assessing its correspondence with personality self-reports (Big 5^19^). We hypothesized that greater use of perceived, relative to learned, information would be associated with trait extraversion and openness to experience. We found a trend-level correlation with extraversion (r = .21, p = .057) but not with openness to experience (r = .15, p = .33; all correlations appear in Table S1). We therefore repeated the extraversion analysis using the more specific task measures derived from the modeling. These measures were moderately correlated with those derived from a regression analysis of choices (Figure 5B). This analysis showed a significant correlation (r = −.24, p = .04; Figure 5D): more extraverted participants gathered more perceptual information relative to their usage of learned information.

Discussion

Using a novel pre-registered experiment that required participants to jointly use learned and perceived information to form decisions, we showed that there is a trade-off between the use of learned information and the gathering of new, perceptual information, both within- and between-participants.

The data indicated that learned information is fast to gather, reducing the need for the effortful gathering of additional perceptual information. Moreover, learned information directs the gathering of perceptual information towards higher-value options^15,20–26^. It thus leads one to gather perceptual information that is more likely to generate a decision.

This work illustrates how humans dynamically integrate internal and external sources of information to flexibly adapt their exploratory strategies. Crucially, it shows how humans decide not only whether to seek additional information, but also which information to pursue. These findings are particularly relevant for understanding ecological human behavior, where information seeking is not an isolated event but a continuous, multidimensional process that underpins adaptive action in complex environments^23,27–39^.

Prior work has primarily considered similarities between the gathering of externally perceived and internally stored information^27,40–45^. On this view, attention gathers information from our senses, while working memory does so from within the mind^27^. Indeed, comparable patch-like information structures and search behaviors were found in both domains. Thus, principles drawn from research on foraging^43,46^ have been used to explain how we retrieve semantic information from memory^47–49^ and perform creative search^50^. They have also been used to explain how we shift our gaze to areas rich in semantic or visually dense information^26^. Here we extend prior work by showing how the two sources of information compete in influencing decision making.

Importantly, our findings emerged within an experimental design in which perceptual information could only be accessed through eye movements. This requirement reflects a naturalistic feature of decision-making, but the precise time cost of eye movements is likely to vary across contexts, raising the possibility that the strength of the trade-off we observed may itself adapt to environmental demands^51^.

In addition to the trade-off within individuals, participants’ overall tendency to prioritize one source of information over the other emerged as a stable individual trait across two separate sessions of the experiment. This characteristic was reflected in the modeling results and had a marginally significant correlation with trait extraversion. These results suggest a stable latent individual preference for relying on either internally gathered or externally sampled information, with the latter motivating extraverted behavior. Indeed, trait extraversion has been associated with a greater sensitivity to external stimuli and a preference for externally sourced information^52^. Here, we characterize how this personality tendency may manifest in learning and decision-making. Given our marginally significant result, further research employing larger samples is necessary to establish our conclusions regarding trait differences.

Future work could also examine relationships with additional traits that potentially originate from a preference for gathering perceptual information. One such trait is emotional temperament, in particular proneness to depression. This proneness is known to be associated with a tendency to ruminate internally while avoiding activities that can expose one to new external information^53^.

Another promising avenue for further research concerns the information provided by each source. In our experiment, learned and perceived information provided the same amount and kind of information – probability of reward. This feature of our task can be manipulated, for instance, by having one source of information convey reward probability and the other reward magnitude, which could potentially serve to reduce the tradeoff between them.

In conclusion, our findings extend current understanding of how humans combine newly perceived and previously learned information^54^ by establishing a trade-off between these two sources of information in guiding decisions, highlighting that beneath each choice lies a more fundamental decision about which source of information to consult.

Methods

Participants

The sample included 90 university students with normal or corrected-to-normal vision (62 women; 66 with a right dominant eye; mean age 23.5, SD 3.03), all scoring 70 or above on the Ishihara color blindness test. We based our sample size on a power analysis, as preregistered at https://osf.io/78vqu and detailed below. Two participants were removed from the behavioral analyses because their performance in combined color-Cs trials was not above chance level (50%). Performance was measured as the proportion of choices of the stimulus associated with the higher probability of reward. Valid eye-tracking data (above 75% valid samples) were obtained from 72 participants (see exclusion criteria under Eye tracking) and included in the eye-movement analyses. To test the reliability of the task measures, 29 participants were invited back for another session of the experiment, of whom 27 met the performance criterion. All participants signed informed consent before the experiment, which was approved by the ethics review board of the Hebrew University of Jerusalem. They received either course credit or 40 NIS (~$10), in addition to 0.3 NIS (~$0.01) for each reward obtained in the experiment. Owing to dropout and variable return rates, only 51 participants completed the ATQ questionnaire, and 68 completed the Big 5 questionnaire.

Power analysis

We performed a power analysis to determine the required sample size for the across-participants trade-off analyses, as such analyses require larger samples. We expected a medium effect size, corresponding to a correlation of approximately 0.3, and required a power of 0.8 at a significance level of α = 0.05. This necessitated a sample size of at least 84 participants. To test whether the prioritization of internal or external information is a reliable disposition, we expected a large effect size of at least r = 0.5 in a test-retest correlation across participants. For this analysis, a power of 0.8 at α = 0.05 necessitated a sample size of at least 28 participants.
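A common way to approximate these sample sizes is the Fisher-z formula n = ((z₁₋α/₂ + z₁₋β) / atanh(r))² + 3 for a two-sided test of a correlation. The text does not state which method or software produced the reported figures. This approximation is therefore only a sanity check.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate N to detect correlation r with a two-sided test (Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


print(n_for_correlation(0.3), n_for_correlation(0.5))
```

The approximation gives 85 and 30 participants, close to but slightly above the reported 84 and 28. Exact methods can differ from this approximation by a few participants.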

Procedure

Participants were first familiarized with the experimental stimuli and instructed that reward probability scales with the number of upward-facing Cs, which could be either 3, 6, or 9. Additionally, participants were told that the reward probability associated with the colors can be learned by trial and error. To maximize their reward, participants were instructed to give weight to both Cs and colors in choosing between combined stimuli, because the contributions of the two components of the stimuli to predicting reward are additive.

The experiment started with practice trials of each type (Cs only, color only, and combined). Each practice phase comprised at least 10 trials and continued until the participant reached at least 80% accuracy.

Participants then played three blocks of trials. The first and second blocks each introduced three new colors, associated with different reward probabilities (0, 0.5, and 1). The colors associated with each probability were counterbalanced across participants. To ensure that participants learned about the colors and understood the Cs trials properly, 36 combined-stimulus trials were interleaved with 18 Cs-only trials and 18 color-only trials. In the third block, participants faced all six colors they had already learned about. Since no new colors were introduced in this block, it consisted only of 162 combined trials.
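The block structure above can be summarized as a trial list. This is a minimal sketch under stated assumptions: it assumes the 36/18/18 mix applies to each of the first two blocks and that interleaving is a uniform random shuffle (the text does not specify the ordering scheme), and it omits practice trials. The function name `build_main_schedule` is hypothetical.

```python
import random
from collections import Counter


def build_main_schedule(seed=None):
    """Return a list of (block, trial_type) tuples for the three main blocks.

    Assumptions: blocks 1 and 2 each contain 36 combined, 18 Cs-only and
    18 color-only trials in uniformly shuffled order; practice trials are
    not included; block 3 contains only combined trials.
    """
    rng = random.Random(seed)
    schedule = []
    for block in (1, 2):
        trials = ["combined"] * 36 + ["cs_only"] * 18 + ["color_only"] * 18
        rng.shuffle(trials)  # hypothetical interleaving scheme
        schedule.extend((block, t) for t in trials)
    schedule.extend((3, "combined") for _ in range(162))
    return schedule


schedule = build_main_schedule(seed=1)
print(len(schedule), Counter(t for _, t in schedule))
```

Under these assumptions the main phase has 72 + 72 + 162 = 306 trials, 234 of them combined.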

After every 30 trials, participants rated their current valence and arousal on visual analog scales. At the end of the experiment, participants reported the reward probability associated with each color and rated, on a visual analog scale, the degree to which they had relied on the Cs versus the color in making their choices; these ratings were used to compute the values shown in Figure 7A. Finally, participants were asked to complete curiosity and personality questionnaires (Big Five and ATQ^19,55^).

Eye tracking

The experiment began with a standard 9-point calibration and validation procedure provided by the EyeLink 1000 Plus (SR Research Ltd., Mississauga, Ontario, Canada). Eye-tracking measures are based on EyeLink's standard parser configuration: a sample was classified as part of a saccade when the velocity between consecutive samples exceeded 30°/s or the acceleration exceeded 8,000°/s². Samples from the intervals between saccades were classified as fixations.
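A simplified velocity/acceleration threshold parser illustrates the classification rule. This is a minimal sketch, not SR Research's parser, which is more elaborate; it assumes gaze positions already in degrees, an assumed 1000 Hz sampling rate, and plain two-point differences for velocity and acceleration. The function name `parse_fixations` is hypothetical.

```python
import math


def parse_fixations(xs, ys, fs=1000.0, vel_thr=30.0, acc_thr=8000.0):
    """Label saccade samples by velocity/acceleration thresholds.

    xs, ys: gaze position in degrees; fs: assumed sampling rate in Hz.
    Returns (start, end) sample-index pairs of fixations, i.e. the runs of
    samples between saccades.
    """
    dt = 1.0 / fs
    n = len(xs)
    vel = [0.0] * n   # deg/s
    acc = [0.0] * n   # deg/s^2
    for i in range(1, n):
        vel[i] = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt
        acc[i] = abs(vel[i] - vel[i - 1]) / dt
    saccade = [v > vel_thr or a > acc_thr for v, a in zip(vel, acc)]

    fixations, start = [], None
    for i, is_sac in enumerate(saccade):
        if not is_sac and start is None:
            start = i                       # a fixation run begins
        elif is_sac and start is not None:
            fixations.append((start, i - 1))
            start = None                    # run ended by a saccade sample
    if start is not None:
        fixations.append((start, n - 1))
    return fixations


# Synthetic trace: steady gaze, a 5-degree jump over 5 ms, steady gaze again.
xs = [0.0] * 50 + [1.0, 2.0, 3.0, 4.0, 5.0] + [5.0] * 50
ys = [0.0] * len(xs)
print(parse_fixations(xs, ys))
```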

The eye-tracking data were parsed into fixation reports, and an interest area was defined around each C. Repeat fixations (fixations on a C that had already been fixated in the same trial) were discarded from further analysis, on the assumption that a refixation indicates that the participant had not registered the C's orientation the first time; each C therefore counts once per trial. We then calculated, for each participant and at each time point, how many of each stimulus' Cs had already been fixated and how many of these faced upward or downward.
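The repeat-fixation rule and the running counts can be sketched as follows. This is an illustrative sketch, assuming each fixation has already been mapped to a stimulus side and a C interest area; the `Fixation` fields and the function `cumulative_counts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    side: str       # "left" or "right" stimulus (hypothetical field names)
    c_index: int    # which C (interest area) within that stimulus was fixated
    upward: bool    # orientation of the fixated C


def cumulative_counts(fixations):
    """After each new (non-repeat) fixation, return (upward, downward) counts per side."""
    seen = set()
    counts = {"left": [0, 0], "right": [0, 0]}   # [upward, downward]
    timeline = []
    for fix in fixations:
        key = (fix.side, fix.c_index)
        if key in seen:
            continue                              # repeat fixation: discarded
        seen.add(key)
        counts[fix.side][0 if fix.upward else 1] += 1
        timeline.append({side: tuple(c) for side, c in counts.items()})
    return timeline


trial = [
    Fixation("right", 0, True),
    Fixation("right", 0, True),    # refixation of the same C, ignored
    Fixation("left", 1, False),
    Fixation("right", 2, False),
]
for step in cumulative_counts(trial):
    print(step)
```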

Regression models

We fit several mixed-effects regression models using the lme4 package in R. To verify that participants used both the Cs and the color, we compared three regression models of participants' choices, with the following predictors:

the difference between the two available stimuli in the number of upward $[eqn]$ :

[eqn]

the difference between the stimuli in the reward probability associated with the color (color):

[eqn]

both:

[eqn]

To assess the hypothesized trade-off between the use of Cs and color to make choices, we examined the interaction between the use of color and the overall number of Cs participants fixated on:

[eqn]

where $[eqn]$ encodes the number of upward-facing minus downward-facing Cs that the participant fixated on during the trial, for the right minus the left stimulus; $[eqn]$ is the overall number of Cs the participant fixated on; and $[eqn]$ is the difference between the stimuli in the reward history of their colors, that is, the proportion of times that choosing a stimulus with that color was rewarded.

Then, to test whether retrieving the reward history of a color required time, we added the reaction time (RT) as another moderator of the color predictor:

[eqn]

All predictors were normalized within participants.
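The trial-level predictors and their within-participant normalization can be sketched as below. This assumes that "normalized" means z-scoring within each participant and that interaction terms are products of the normalized predictors; the model fitting itself (done in lme4) is not reproduced, and all function names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def delta_cs(up_right, down_right, up_left, down_left):
    """Upward minus downward fixated Cs, right stimulus minus left stimulus."""
    return (up_right - down_right) - (up_left - down_left)


def zscore_within(values, participants):
    """Z-score values separately within each participant (assumed normalization)."""
    groups = defaultdict(list)
    for v, p in zip(values, participants):
        groups[p].append(v)
    moments = {p: (mean(vs), pstdev(vs)) for p, vs in groups.items()}
    z = []
    for v, p in zip(values, participants):
        m, s = moments[p]
        z.append((v - m) / s if s > 0 else 0.0)   # constant predictor -> 0
    return z


# Toy trials: (participant, up_R, down_R, up_L, down_L, color reward-history difference)
trials = [
    ("p1", 3, 1, 1, 2, 0.5), ("p1", 2, 2, 0, 1, -0.2), ("p1", 1, 0, 2, 0, 0.1),
    ("p2", 4, 0, 0, 3, 0.9), ("p2", 1, 1, 1, 1, 0.0), ("p2", 0, 2, 2, 0, -0.4),
]
pid = [t[0] for t in trials]
d_cs = [delta_cs(*t[1:5]) for t in trials]
n_fix = [sum(t[1:5]) for t in trials]          # overall number of fixated Cs
d_color = [t[5] for t in trials]

z_cs, z_nfix, z_color = (zscore_within(x, pid) for x in (d_cs, n_fix, d_color))
color_x_nfix = [c * n for c, n in zip(z_color, z_nfix)]   # interaction column
```

An RT moderator would be handled the same way: z-score RT within participants and multiply it by the normalized color predictor.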

Modeling

The computational models were fit to the data using a custom iterative importance sampling algorithm implemented in Python, which yielded maximum-likelihood estimates of both the hyperparameters and the participant-level parameters. The core of the model is described in the main text; here we specify its remaining components.
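The general idea of iterative importance sampling for hierarchical fitting can be illustrated with a single parameter. This is a minimal EM-style sketch, not the authors' implementation: it assumes one parameter per participant, a Gaussian group-level distribution initialized at mean 0 and SD 1, and a toy Gaussian likelihood standing in for the choice model. The names `toy_log_lik` and `fit_group` are hypothetical.

```python
import math
import random


def toy_log_lik(theta, y):
    """Hypothetical per-participant log-likelihood: one observation y ~ N(theta, 1)."""
    return -0.5 * (y - theta) ** 2


def fit_group(data, n_samples=2000, n_iter=30, seed=0):
    """Iterate: sample from the group prior, reweight per participant, refit the prior."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0                    # start from a standard normal group prior
    for _ in range(n_iter):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        post_mean, post_sq = [], []
        for y in data:
            logw = [toy_log_lik(s, y) for s in samples]
            top = max(logw)
            w = [math.exp(lw - top) for lw in logw]   # subtract max for stability
            total = sum(w)
            post_mean.append(sum(wi * s for wi, s in zip(w, samples)) / total)
            post_sq.append(sum(wi * s * s for wi, s in zip(w, samples)) / total)
        mu = sum(post_mean) / len(data)                 # weighted first moment
        var = sum(post_sq) / len(data) - mu ** 2        # weighted second moment
        sigma = math.sqrt(max(var, 1e-6))
    return mu, sigma, post_mean


data = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
mu, sigma, per_participant = fit_group(data)
print(round(mu, 2), round(sigma, 2))
```

Each iteration reweights a shared set of prior samples by every participant's likelihood and then refits the group mean and SD to the pooled weighted moments; the weighted means double as participant-level estimates.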

Value of color ($V_{color}$)

The value associated with each color was updated by a temporal difference learning algorithm:

[eqn]

where the prediction error, $[eqn]$ , is computed as:

[eqn]

and the learning rate $[eqn]$ adapts to the number of prior observations of each color:

[eqn]

Here $[eqn]$ is a free parameter, and the number of observed outcomes is divided by 2 because each outcome is only half attributable to the color.
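A delta-rule update with a count-based learning rate can be sketched as follows. The exact equations are not reproduced here, so both forms are assumptions: the prediction error is taken to be the simplest `reward - value`, and the learning rate is a hypothetical `1 / (k + n_obs / 2)`, where `k` is a hypothetical free parameter (k ≥ 1 keeps the rate at or below 1) and the halving of `n_obs` mirrors the text.

```python
def learning_rate(n_obs: int, k: float) -> float:
    """Hypothetical count-based learning rate; n_obs is halved as in the text."""
    return 1.0 / (k + n_obs / 2.0)


def update_color_value(value: float, reward: float, n_obs: int, k: float) -> float:
    """One delta-rule step; assumes the prediction error is reward - value."""
    delta = reward - value
    return value + learning_rate(n_obs, k) * delta


v = 0.5
for n_obs, reward in enumerate([1, 0, 1, 1]):
    v = update_color_value(v, reward, n_obs, k=1.0)
    print(n_obs, round(v, 3))
```

Because the rate shrinks as observations accumulate, early outcomes move the color value strongly and later ones progressively less.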

Value of Cs ($V_{Cs}$)

The value of each side's Cs was computed from the likelihood of each possible number of upward-facing Cs given all Cs observed on that side, using a binomial probability mass function. The likelihood for a given number of upward-facing Cs (n), given the observed Cs, is calculated as:

[eqn]

where n/12 is the probability of observing an upward-facing C, and the exponents are the observed counts of upward- and downward-facing Cs. The overall value of each side is then computed as a weighted sum across the possible numbers of upward-facing Cs:

[eqn]
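The likelihood-weighted estimate for one side can be sketched as below. This assumes 12 Cs per stimulus (implied by the n/12 term), a flat prior over n ∈ {3, 6, 9}, and weights equal to the normalized likelihoods; it returns the expected number of upward-facing Cs as a stand-in for the side's value, whereas the paper's exact mapping from n to value is not reproduced. Function names are hypothetical. Because the binomial coefficient is the same for every n, it cancels in the normalization and is omitted.

```python
POSSIBLE_N = (3, 6, 9)   # possible numbers of upward-facing Cs (from the task)
TOTAL_CS = 12            # Cs per stimulus, implied by the n/12 term


def cs_likelihood(n, up, down):
    """Likelihood of the observed upward/downward counts if the stimulus has n upward Cs."""
    p_up = n / TOTAL_CS
    return p_up ** up * (1.0 - p_up) ** down


def expected_up_count(up, down):
    """Weighted sum over possible n, weights = normalized likelihoods (flat prior assumed)."""
    liks = [cs_likelihood(n, up, down) for n in POSSIBLE_N]
    total = sum(liks)
    return sum(n * lik / total for n, lik in zip(POSSIBLE_N, liks))


print(expected_up_count(0, 0))   # nothing observed yet
print(expected_up_count(3, 0))   # three upward Cs seen
```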

Cs-only trials

For Cs-only trials, color predictors were set to zero, and the expected value of a stimulus was computed as:

[eqn]

Color-only trials

For color-only trials, Cs predictors were set to zero, so choice probabilities were modeled as a logistic function of $[eqn]$.
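A logistic choice rule on the difference in stimulus values can be sketched as follows. It assumes that in single-component trials the value of each stimulus reduces to its Cs value (Cs-only) or its color value (color-only), and `beta` is a hypothetical inverse-temperature parameter; the paper's exact parameterization is not reproduced.

```python
import math


def p_choose_right(v_right: float, v_left: float, beta: float) -> float:
    """Logistic choice rule on the value difference (numerically stable form)."""
    x = beta * (v_right - v_left)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)              # x < 0, so this cannot overflow
    return ex / (1.0 + ex)


# Color-only trial: the Cs terms are zero, so only the color values enter.
print(p_choose_right(v_right=1.0, v_left=0.5, beta=3.0))
```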

Hyperpriors

All $[eqn]$ and $[eqn]$ parameters were drawn for each participant from separate group-level normal distributions, each with a mean of 0 and a standard deviation of 1. The prior distributions for the $[eqn]$ and $[eqn]$ parameters were specified as approximately uniform on [0, 1]. To achieve this, normally distributed values ($[eqn]$) were transformed using the cumulative distribution function (CDF) of the normal distribution. This approach constrained parameter values to [0, 1] while allowing the distribution to adapt during fitting. For the parameter $[eqn]$, we specified a lognormal prior with mean 2 and standard deviation 2, chosen to be wide and weakly informative.
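Both prior constructions can be sketched with the standard library. The CDF transform follows the text directly. For the lognormal, the sketch assumes that "mean 2 and standard deviation 2" describe the lognormal variable itself and converts them to the parameters of the underlying normal; if the paper instead meant the underlying normal's parameters, they would be passed to `lognormvariate` unchanged. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def to_unit_interval(z: float) -> float:
    """Map an unconstrained, normally distributed value into [0, 1] via the normal CDF."""
    return STD_NORMAL.cdf(z)


def lognormal_params(mean: float, sd: float):
    """Parameters (mu, sigma) of the underlying normal for a lognormal with this mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


rng = random.Random(0)
# If z ~ N(0, 1), Phi(z) is uniform on [0, 1]; a different group-level mean/sd tilts it.
unit_draws = [to_unit_interval(rng.gauss(0.0, 1.0)) for _ in range(5)]
mu, sigma = lognormal_params(2.0, 2.0)
lognormal_draws = [rng.lognormvariate(mu, sigma) for _ in range(5)]
```

With mean 2 and SD 2, the underlying normal has σ² = ln 2 and μ = ln 2 / 2, which keeps all draws positive while leaving the prior broad.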

Bibliography


1. Gibson E. J. & Pick A. D. An Ecological Approach to Perceptual Learning and Development. (Oxford University Press, USA, 2000).
2. Summerfield C. & Tsetsos K. Building Bridges between Perceptual and Economic Decision-Making: Neural and Computational Mechanisms. Front. Neurosci. 6 (2012).
3. Yon D., Zainzinger V., De Lange F. P., Eimer M. & Press C. Action biases perceptual decisions toward expected outcomes. J. Exp. Psychol. Gen. 150, 1225–1236 (2021). PMID: 33289575; doi: 10.1037/xge0000826; PMCID: PMC8515773.
4. Globig L. K., Witte K., Feng G. & Sharot T. Under Threat, Weaker Evidence Is Required to Reach Undesirable Conclusions. J. Neurosci. 41, 6502–6510 (2021). PMID: 34131038; doi: 10.1523/JNEUROSCI.3194-20.2021; PMCID: PMC8318074.
5. Drugowitsch J., Moreno-Bote R., Churchland A. K., Shadlen M. N. & Pouget A. The Cost of Accumulating Evidence in Perceptual Decision Making. J. Neurosci. 32, 3612–3628 (2012). PMID: 22423085; doi: 10.1523/JNEUROSCI.4010-11.2012; PMCID: PMC3329788.
6. Dayan P. & Niv Y. Reinforcement learning: The Good, The Bad and The Ugly. Curr. Opin. Neurobiol. 18, 185–196 (2008). PMID: 18708140; doi: 10.1016/j.conb.2008.08.003.
7. Nussenbaum K. & Hartley C. A. Reinforcement learning across development: What insights can we draw from a decade of research? Dev. Cogn. Neurosci. 40, 100733 (2019). PMID: 31770715; doi: 10.1016/j.dcn.2019.100733; PMCID: PMC6974916.
8. Sutton R. S. & Barto A. G. Reinforcement Learning: An Introduction. (2018).