Performance of normative and approximate evidence accumulation on the dynamic clicks task
Adrian E. Radillo, Alan Veliz-Cuba, Krešimir Josić, and Zachary P. Kilpatrick

Paper type: Original Article
Author footnotes: (1) co-first authors; (2) co-last authors
Code Availability: Codes developed to produce figures are available at https://github.com/aernesto/NBDT_dynamic_clicks
Correspondence: Zachary P. Kilpatrick, Department of Applied Mathematics, University of Colorado, Boulder, CO 80309. Email: [email protected]
Funding: This work was supported by NSF/NIH CRCNS (R01MH115557) and NSF (DMS-1517629). ZPK was also supported by NSF (DMS-1615737). KJ was also supported by NSF (DBI-1707400). AVC was supported by the Simons Foundation (516088) and OSC (PNS0445-2).
Adrian E. Radillo
Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 19104
Alan Veliz-Cuba
Department of Mathematics, University of Dayton, Dayton, OH 45469
Krešimir Josić
Departments of Mathematics and Biology and Biochemistry, University of Houston, Houston, TX 77204
Department of BioSciences, Rice University, Houston, TX 77251, USA
Zachary P. Kilpatrick
Department of Applied Mathematics, University of Colorado, Boulder, CO 80309
Department of Physiology and Biophysics, University of Colorado School of Medicine, Aurora, CO 80045
Abstract
The aim of a number of psychophysics tasks is to uncover how mammals make decisions in a world that is in flux. Here we examine the characteristics of ideal and near-ideal observers in a task of this type. We ask when and how performance depends on task parameters and design, and, in turn, what observer performance tells us about their decision-making process. In the dynamic clicks task, subjects hear two streams (left and right) of Poisson clicks with different rates. Subjects are rewarded when they correctly identify the side with the higher rate, as this side switches unpredictably. We show that a reduced set of task parameters defines regions in parameter space in which optimal, but not near-optimal, observers maintain constant response accuracy. We also show that for a range of task parameters an approximate normative model must be finely tuned to reach near-optimal performance, illustrating a potential way to distinguish between normative models and their approximations. In addition, we show that using the negative log-likelihood and the 0/1-loss functions to fit these types of models is not equivalent: the 0/1-loss leads to a bias in parameter recovery that increases with sensory noise. These findings suggest ways to tease apart models that are hard to distinguish when tuned exactly, and point to general pitfalls in experimental design, model fitting, and interpretation of the resulting data.
keywords:
decision-making, Poisson clicks, Bayesian inference, dynamic environment, model identifiability
1 Introduction
Decision-making tasks are widely used to probe the neural computations that underlie behavior and cognition [33, 24]. Mathematical models of optimal decision-making (normative models) have been key in helping us understand tasks that require the accumulation of noisy evidence [55, 23, 7]. (We use the phrases ‘optimal model,’ ‘optimal observer,’ ‘normative model,’ and ‘ideal observer’ interchangeably, as they refer to the best possible model for a given set of task and observation constraints.) Such models assume that an observer integrates a sequence of noisy measurements to determine the probability that one of several options is correct [55, 5, 53].
The random dot motion discrimination (RDMD) task is a prominent example, in which the neural substrates of the evidence accumulation process can be identified in cortical recordings [3, 9, 47]. The associated normative models take the form of tractable stochastic differential equations [42, 7], and have been used to explain behavioral data [43, 29]. Neural correlates of subjects’ decision processes display striking similarities with these models [50, 28], although a clear link between the two is still under debate [32, 49].
Poisson clicks tasks [11, 34] have recently become popular in studying the cortical computations underpinning mammalian perceptual decision-making. Neural activity during such tasks also appears to reflect an underlying evidence accumulation process [25]. The corresponding normative models and their approximations are low-dimensional and computationally tractable. This makes the task well-suited to the analysis of data in high-throughput experiments [11]. [38] have extended the clicks task to a dynamic environment to understand how animals adjust their evidence accumulation strategies when older evidence decreases in relevance. [22] carried out a similar study in an extension of the RDMD task. Both studies concluded that subjects are capable of implementing evidence accumulation strategies that adapt to the timescale of the environment.
However, identifying the specific strategy subjects use to solve a decision task can be difficult because different strategies can lead to similar observed outcomes [43]. How to set task parameters to best identify a subject’s evidence accumulation strategy has not been studied systematically, especially in dynamic environments [44]. Here, we focus on the dynamic clicks task and aim to understand what task parameters (or combinations of parameters) determine performance, and under what conditions different strategies can be identified.
In the dynamic clicks task, two streams of auditory clicks are presented simultaneously to a subject, one stream per ear [38, 11]. Each click train is generated by a Markov-modulated Poisson process [16] whose click arrival rate switches between two possible values ($\lambda_{\rm high}$ vs. $\lambda_{\rm low}$). The two streams have distinct rates, which switch at discrete points in time according to a memoryless process with hazard rate $h$. Thus streams played in different ears always have distinct rates ($\lambda_L(t) \neq \lambda_R(t)$). The subject must choose the stream with the highest instantaneous rate when interrogated at a time $T$, which ends the trial. Switches occur at random times that are not signaled to the subjects, who must therefore base their decision on the observed sequences of Poisson clicks alone. The rate at which the environment changes is a latent variable that needs to be learned for optimal performance. In this study, however, our observer models always use a constant rate of change for their environment. (See [41] for an optimal observer that can learn the hazard rate in a dynamic version of the RDMD task. This approach can be extended to the case of the dynamic clicks task as in [40].)
We analyze the normative model of the dynamic clicks task to shed light on how its response accuracy depends on task parameters, as this is a measure commonly used when fitting to behavioral data [38]. As shown in Section 2, the ideal observer accumulates evidence from each click to update their log likelihood ratio (LLR) of the two choices. Each click corresponds to a pulsatile increase or decrease in the LLR. Importantly, evidence must be discounted at a rate that accounts for the timescale of environmental changes.
The main goal of this work is to identify how task parameters shape an ideal observer’s response accuracy and the identifiability of evidence accumulation models. We find effective parameters that can be fixed to keep the accuracy of the ideal observer constant. (We use the term “accuracy” to refer to the percentage of correct choices for a given model and parameter set. This is our primary measure of a model’s performance on the task.) One such parameter is the signal-to-noise ratio (SNR) of the clicks during a single epoch between changes, and the other is $hT$, the trial length $T$ rescaled by the hazard rate $h$. These two parameters fully determine the accuracy of an optimal observer interrogated at the end of the trial (Section 3), as well as response accuracy conditioned on the time since the final change point of a trial (Section 4).
While the normative model determines the optimal strategy, subjects may also use heuristics or approximations that are potentially simpler to implement. The accuracy of approximate models may also be more sensitive to parameter changes, so fitting procedures converge more rapidly. As an example we consider a linear model, which has been previously fit to data from subjects performing dynamic decision tasks [38, 22], and has also been studied as an approximation to normative evidence accumulation [53]. To obtain response accuracy close to that of the normative model, the discounting rate of the linear model needs to be tuned for different click rates and hazard rates (Section 5). In contrast, the discounting rate in the normative (nonlinear) model equals the hazard rate. Moreover, the linear model’s accuracy is more sensitive to changes in its evidence-discounting parameter than the nonlinear model. (The ‘nonlinear’ model here refers to the family of models obtained by tuning the discounting rate away from the value defining the normative model. This detuning results in a model that is not normative.) This effect is most pronounced at intermediate SNR values, suggesting a task parameter range where the two models can be distinguished.
Lastly, we ask how model parameters can be inferred from subject responses. Using maximum likelihood fits of the models to choice data, we show that the fit discounting parameters are closer to the true parameter in the linear model compared to the nonlinear model (Section 6). This is expected, since the response accuracy of the nonlinear model depends weakly on its discounting parameter. We also explore the impact of the loss function on model fitting, and show that in the presence of sensory noise using a 0/1-loss function results in a systematic bias in parameter recovery (Section 7). The 0/1-loss function gives a one unit penalty on trials in which the decision predicted by the model and the data disagree, and no penalty when they agree. Therefore, minimizing this loss function leads to models that best match the trial-to-trial responses in the data rather than the response accuracy.
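The difference between the two loss functions can be made concrete with a small Python sketch. It is an illustration only: the function names, the coding of choices as +1 (right) and -1 (left), and the assumption that the fitted model supplies a per-trial probability of a rightward choice are all choices made here, not taken from the authors' code.

```python
import numpy as np

def zero_one_loss(model_choices, data_choices):
    """One unit of penalty per trial on which model and data disagree."""
    model_choices = np.asarray(model_choices)
    data_choices = np.asarray(data_choices)
    return int(np.sum(model_choices != data_choices))

def negative_log_likelihood(p_right, data_choices, eps=1e-12):
    """Bernoulli negative log-likelihood of the observed choices.

    p_right[i] is the (hypothetical) model probability of choosing right on
    trial i; data_choices[i] is +1 for right and -1 for left.
    """
    p = np.clip(np.asarray(p_right, dtype=float), eps, 1.0 - eps)
    chose_right = np.asarray(data_choices) == 1
    log_lik = np.where(chose_right, np.log(p), np.log1p(-p))
    return float(-np.sum(log_lik))

# Three trials: the model disagrees with the data on the second one only.
print(zero_one_loss([1, -1, 1], [1, 1, 1]))
# A model that is indifferent on every trial pays log(2) per trial.
print(negative_log_likelihood([0.5, 0.5], [1, -1]))
```

The 0/1-loss only sees the single decision a model produces on each trial, whereas the negative log-likelihood scores the full choice probability, which is one way to see why the two need not lead to the same fitted parameters once responses are stochastic.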
Ultimately, our findings point to ways of identifying task parameters for which subjects’ decision accuracy is sensitive to the mode of evidence accumulation they use in fluctuating environments. We also show how using different models and different data fitting methods can lead to divergent results, especially in the presence of sensory noise. We argue that similar issues can arise whenever we try to interpret data from decision-making tasks.
2 Normative model for the dynamic clicks task
In the dynamic clicks task an observer is presented with two Poisson click streams, $s^L(t)$ and $s^R(t)$ ($0 < t \leq T$), and needs to decide which of the two has a higher rate [11]. The rates of the two streams are not constant, but change according to a hidden, continuous-time Markov chain, $x(t)$, with binary state space $\{x^L, x^R\}$. The frequency of the switches is determined by the hazard rate, $h$, so that $P(x(t+dt) \neq x(t)) = h\,dt + o(dt)$. The left and right rates, $\lambda_L(t)$ and $\lambda_R(t)$, can each take on one of two values, $\lambda_{\rm high}$ or $\lambda_{\rm low}$, with $\lambda_{\rm high} > \lambda_{\rm low} > 0$. When $x(t) = x^L$, we have $(\lambda_L(t), \lambda_R(t)) = (\lambda_{\rm high}, \lambda_{\rm low})$, and when $x(t) = x^R$ the opposite is true. Therefore $x(t) = x^k$ means that stream $k \in \{L, R\}$ has the higher rate at time $t$: $\lambda_k(t) = \lambda_{\rm high}$ (Fig. 1A). The observer is prompted to identify the side of the higher rate stream, $x(T)$, at a random time $T$. The interrogation time, $T$, is sampled ahead of time by the experimenter for each trial and is unknown to the subject. We refer the reader to [38] and [11] for more details about the experimental setup.
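The generative process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `simulate_dynamic_clicks`, the coding of the hidden state as +1 (right stream has the higher rate) or -1 (left), and the equiprobable initial state are assumptions made here.

```python
import numpy as np

def simulate_dynamic_clicks(rate_high, rate_low, hazard, duration, rng):
    """Simulate one trial.

    Returns (change_times, left_clicks, right_clicks, final_state), where the
    state is +1 if the right stream has the higher rate and -1 otherwise.
    """
    # Change points of the hidden two-state process: exponential waiting times.
    change_times = []
    t = rng.exponential(1.0 / hazard) if hazard > 0 else np.inf
    while t < duration:
        change_times.append(t)
        t += rng.exponential(1.0 / hazard)

    state = int(rng.choice([-1, 1]))  # assumed equiprobable initial state
    edges = [0.0] + change_times + [duration]
    left, right = [], []
    for k, (start, stop) in enumerate(zip(edges[:-1], edges[1:])):
        if k > 0:
            state = -state  # the state flips at each change point
        rate_right = rate_high if state == 1 else rate_low
        rate_left = rate_low if state == 1 else rate_high
        # Within an epoch each stream is a homogeneous Poisson process:
        # draw the click count, then place the clicks uniformly in the epoch.
        for rate, clicks in ((rate_left, left), (rate_right, right)):
            count = rng.poisson(rate * (stop - start))
            clicks.extend(np.sort(rng.uniform(start, stop, count)))
    return np.array(change_times), np.array(left), np.array(right), state

rng = np.random.default_rng(0)
changes, left, right, final_state = simulate_dynamic_clicks(20.0, 5.0, 1.0, 2.0, rng)
print(len(changes), len(left), len(right), final_state)
```

Setting `hazard=0` recovers the static clicks task, in which the state never changes within a trial.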
This task is closely related to the filtering of a Hidden Markov Model studied in the signal processing literature [13, 39]. For a single, 2-state Markov-modulated Poisson process [16], the filtering problem was solved by [48] – see also [52] for review and extensions. This filtering problem corresponds to a task in which a single, variable rate click stream is presented to the observer who has to report whether at some time the rate is high or low. In the present case, the observer is presented with two coupled Markov-modulated Poisson processes. The normative model reduces to that considered by [48] when we consider a single stream version of the task, so our approach can be considered a generalization.
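Assuming the Poisson rates and the hazard rate are known, a normative model for the inference of the hidden state, $x(t)$, has been derived by [38]. The resulting model can be expressed as an ordinary differential equation (ODE) describing the evolution of the LLR of the two environmental states:
$$ y(t) = \log \frac{P\big(x(t) = x^R \mid \text{clicks observed up to time } t\big)}{P\big(x(t) = x^L \mid \text{clicks observed up to time } t\big)}, \tag{1} $$
where $x(t) = x^R$ (resp. $x^L$) denotes that the right (resp. left) stream has the higher rate at time $t$.
For completeness, we present the derivation in Appendix A, yielding the same ODE as [38]:
$$ \frac{dy}{dt} = \kappa \left[ \sum_{i} \delta\big(t - t_R^i\big) - \sum_{j} \delta\big(t - t_L^j\big) \right] - 2h \sinh(y), \tag{2} $$
where $\kappa = \log(\lambda_{\rm high}/\lambda_{\rm low})$ is the evidence gained from a click, $\delta(t)$ is the Dirac delta function centered at 0, and $t_R^i$ (resp. $t_L^j$) is the $i$-th right click (resp. $j$-th left click).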
Eq. (2) has an intuitive interpretation: A click provides evidence that the higher rate stream is on the side at which the click was heard. Thus, a click heard on the right (left) results in a positive (negative) increment in the LLR (Fig. 1B). Since the environment is volatile, as evidence recedes into the past it becomes less relevant. In Eq. (2) each click is followed by a superlinear decay to zero. Note that the discounting term only depends on the current LLR, $y$, and the hazard rate, $h$, and not on the click rates.
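The jump-and-decay dynamics can be sketched numerically. The code below assumes that the discounting term of Eq. (2) has the form $-2h\sinh(y)$ and that each click contributes $\pm\kappa$ with $\kappa = \log(\lambda_{\rm high}/\lambda_{\rm low})$, as in the model of [38]; this form is an assumption of the sketch, and the function names are hypothetical. Under that assumption the flow between clicks has a closed form, since $u = \tanh(y/2)$ obeys $du/dt = -2hu$.

```python
import numpy as np

def decay_nonlinear(y, h, dt):
    """Exact flow of dy/dt = -2 h sinh(y) over an interval of length dt."""
    return 2.0 * np.arctanh(np.tanh(y / 2.0) * np.exp(-2.0 * h * dt))

def normative_llr(left_clicks, right_clicks, kappa, h, T, y0=0.0):
    """LLR at time T: decay between clicks, jump by +/- kappa at each click."""
    times = np.concatenate([left_clicks, right_clicks])
    signs = np.concatenate([-np.ones(len(left_clicks)),
                            np.ones(len(right_clicks))])
    order = np.argsort(times, kind="stable")
    y, t_prev = y0, 0.0
    for t, s in zip(times[order], signs[order]):
        y = decay_nonlinear(y, h, t - t_prev) + s * kappa
        t_prev = t
    return decay_nonlinear(y, h, T - t_prev)

kappa = np.log(20.0 / 5.0)
y_T = normative_llr([0.30], [0.10, 0.55, 0.80], kappa, h=1.0, T=1.0)
print(y_T)  # the sign of y_T gives the model's choice at interrogation
```

Using the closed-form flow avoids choosing a time step for a numerical ODE solver; with `h=0` the update reduces to perfect integration of the click count difference.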
Performance on the task may increase with the informativeness of each click, $\kappa$. However, $\kappa$ alone does not predict the response accuracy (i.e. the fraction of correct trials) of the normative model [11, 38]. In the next section, we will show that an ideal observer’s response accuracy is determined by the click frequencies and the hazard rate $h$: A sequence of a few very informative clicks may provide as much evidence as many clicks, each carrying little information. But if the environmental hazard rate is high, even informative clicks quickly lose their relevance.
The LLR, $y(t)$, contains all the information an ideal observer has about the present state of the environment, given the observations [23]. If interrogated at time $T$, the sign of $y(T)$ determines the most likely current state (the right stream for $y(T) > 0$ and the left stream for $y(T) < 0$), and therefore the response of an optimal decision maker. In the following, we will show that two effective parameters govern the response accuracy of the optimal observer.
3 The signal-to-noise ratio of dynamic clicks
Four parameters characterize the dynamic clicks task: the hazard rate, $h$, the duration of a trial, i.e. the interrogation time, $T$, the low click rate, $\lambda_{\rm low}$, and the high click rate, $\lambda_{\rm high}$. However, we next show that only two effective parameters typically govern an ideal observer’s performance (Fig. 1C,D): the product of the interrogation time and the hazard rate, $hT$, and the signal-to-noise ratio (SNR) of the dynamic stimulus. The former corresponds to the mean number of switches in a trial, and the latter combines the click rates $\lambda_{\rm high}$ and $\lambda_{\rm low}$ into a Skellam-type SNR (Eq. (4) below), scaled by the hazard rate (Eq. (6)).
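To motivate our definition, consider first the case of a static environment, $h = 0$ Hz, for which the normative model is given by Eq. (2) without the nonlinear term. Since $\kappa$ does not affect the sign of $y$, response accuracy depends entirely on the difference in click counts $N_R(T) - N_L(T)$, where $N_R(t)$ and $N_L(t)$ are the counting processes associated with each click stream. Thus we can define the difference in click counts as the signal, and the SNR as the ratio between the signal mean and standard deviation at time $T$ [51],
$$ S_T = \frac{(\lambda_{\rm high} - \lambda_{\rm low})\, T}{\sqrt{(\lambda_{\rm high} + \lambda_{\rm low})\, T}} = \sqrt{T}\, S, \tag{3} $$
where
$$ S = \frac{\lambda_{\rm high} - \lambda_{\rm low}}{\sqrt{\lambda_{\rm high} + \lambda_{\rm low}}}. \tag{4} $$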
In a dynamic environment, the volatility of the environment, governed by the hazard rate, $h$, also affects response accuracy. The environment can switch states immediately before the interrogation time, $T$, decreasing response accuracy. This suggests that accuracy will not only be determined by the click rates, but also by the length of time the environment remains in the same state prior to interrogation. Using this fact and the definition of SNR in a static environment, we determine the statistics for the difference in the number of clicks between the high- and low-rate streams during the final epoch preceding interrogation (for derivation details see Appendix B). Averaging over the Poisson distributions characterizing the click numbers, and the epoch length distribution, yields a nonlinear expression representing the SNR that involves $S/\sqrt{h}$ and the rescaled trial time $hT$:
[TABLE]
The unitless measure of trial duration, $hT$, characterizes the timescale of the evolution of the LLR, $y$. As accuracy should not depend on the units in which we measure time, this is a natural measure of the evidence accumulation period. (This is related to dimensional analysis, often used when studying physical models [30].) As indicated, the expression in Eq. (5) depends only on $S/\sqrt{h}$ and $hT$. We therefore predict that optimal observer response accuracy is determined by the following two parameter combinations,
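$$ \frac{S}{\sqrt{h}} = \frac{\lambda_{\rm high} - \lambda_{\rm low}}{\sqrt{h\,(\lambda_{\rm high} + \lambda_{\rm low})}} \quad \text{and} \quad hT. \tag{6} $$
Henceforth, we will refer to $S/\sqrt{h}$ as the SNR and $hT$ as rescaled trial time. Note that the term $S$ can also be realized as an SNR of Eq. (2) by performing a diffusion approximation and computing the SNR of the corresponding drift-diffusion signal (see Appendix C).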
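As a worked illustration of the two effective parameters, the sketch below computes the Skellam-type SNR of the click count difference and its hazard-rescaled version, and finds click rate pairs that share the same SNR. It assumes the forms $S = (\lambda_{\rm high} - \lambda_{\rm low})/\sqrt{\lambda_{\rm high} + \lambda_{\rm low}}$ and ${\rm SNR} = S/\sqrt{h}$; the first follows from the mean and variance of a difference of Poisson counts, while the second is the dimensionless rescaling assumed here. Function names are hypothetical.

```python
import math

def skellam_snr(rate_high, rate_low, T=1.0):
    """Mean over standard deviation of the click count difference at time T."""
    mean = (rate_high - rate_low) * T
    std = math.sqrt((rate_high + rate_low) * T)
    return mean / std

def rescaled_snr(rate_high, rate_low, hazard):
    """Unitless SNR: per-unit-time Skellam SNR divided by sqrt(hazard)."""
    return skellam_snr(rate_high, rate_low) / math.sqrt(hazard)

def rate_high_for_snr(rate_low, S):
    """Solve (x - rate_low) / sqrt(x + rate_low) = S for x > rate_low.

    Squaring gives x^2 - (2*rate_low + S^2) x + rate_low^2 - S^2*rate_low = 0;
    the larger root is the one with x > rate_low.
    """
    b = 2.0 * rate_low + S ** 2
    disc = S ** 4 + 8.0 * S ** 2 * rate_low
    return 0.5 * (b + math.sqrt(disc))

print(skellam_snr(20.0, 5.0))        # (20 - 5) / sqrt(25)
print(rescaled_snr(20.0, 5.0, 4.0))  # same rates in a more volatile environment
for low in (1.0, 5.0, 10.0):
    print(low, rate_high_for_snr(low, 3.0))  # rate pairs on one SNR level set
```

The last loop traces one level set of $S$ in the plane of click rates, which is a parabola: raising both rates along it leaves the SNR unchanged.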
Fig. 1C shows examples in which the ideal observer’s response accuracy is constant when SNR and are fixed. Accuracy is computed as the fraction of trials at which the observer’s belief, matches the underlying state, at the interrogation time, , that is the fraction of trials for which . The accuracy as a function of and remains constant if we change and , but keep fixed. As the interrogation time is increased, the accuracy saturates to a value below 1 (Fig. 1C), consistent with previous modeling studies of decision-making in dynamic environments [22, 53, 41, 38]. Evidence discounting limits the magnitude of the LLR, . Hence a sequence of low rate clicks can lead to errors, especially for low SNR values. Moreover, on some trials the state, switches close to the interrogation time . As it may take multiple clicks for to change sign after a change point (See Fig. 1B), this can also lead to an incorrect response.
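Accuracy as defined here can be estimated by Monte Carlo simulation. The sketch below is one hypothetical way to do so, not the authors' implementation. It assumes the normative update $dy/dt = -2h\sinh(y)$ between clicks with jumps of $\pm\log(\lambda_{\rm high}/\lambda_{\rm low})$, codes the state as $\pm 1$, and simulates clicks and change points together as competing exponential events. Trials with no clicks (so $y = 0$) are counted as errors, which is negligible at the rates used.

```python
import numpy as np

def trial_is_correct(rate_high, rate_low, hazard, duration, rng):
    """Simulate one trial and report whether sign(y(T)) matches the state."""
    kappa = np.log(rate_high / rate_low)
    total = rate_high + rate_low + hazard  # rate of the next event of any kind
    state, y, t = int(rng.choice([-1, 1])), 0.0, 0.0
    while True:
        dt = rng.exponential(1.0 / total)
        if t + dt > duration:
            break  # further decay does not change the sign of y
        t += dt
        # Exact flow of dy/dt = -2 h sinh(y) since the previous event.
        y = 2.0 * np.arctanh(np.tanh(y / 2.0) * np.exp(-2.0 * hazard * dt))
        u = rng.uniform(0.0, total)
        if u < rate_high:
            y += state * kappa   # click on the current high-rate side
        elif u < rate_high + rate_low:
            y -= state * kappa   # click on the current low-rate side
        else:
            state = -state       # unsignaled change point
    return np.sign(y) == state

def estimate_accuracy(rate_high, rate_low, hazard, duration, n_trials, rng):
    """Fraction of simulated trials with a correct response."""
    hits = [trial_is_correct(rate_high, rate_low, hazard, duration, rng)
            for _ in range(n_trials)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
print(estimate_accuracy(20.0, 5.0, 1.0, 2.0, 1000, rng))
```

Repeating the estimate for rate pairs and hazard rates that share the same SNR and rescaled trial time is one way to check numerically whether accuracy stays constant.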
In Fig. 1D we show that the maximal accuracy (obtained for $T$ sufficiently large) as a function of $\lambda_{\rm high}$ and $\lambda_{\rm low}$ (colormap) is approximately constant along SNR level sets (black oblique curves). This correspondence is not exact when $\lambda_{\rm high}$ and $\lambda_{\rm low}$ are small (Fig. 1D inset), and we conjecture that this is because higher order statistics of the signal determine response accuracy in this parameter range. As discussed in Appendix C, for large $\lambda_{\rm high}$ and $\lambda_{\rm low}$ we can use a diffusion approximation for the dynamics of Eq. (2). When $\lambda_{\rm high}$ and $\lambda_{\rm low}$ are small, the diffusion approximation does not apply, and response accuracy is characterized by features of the signal beyond its mean and variance. Since the SNR only describes the ratio between the mean and standard deviation of the stimulus, it cannot capture the impact of higher order statistics on accuracy at low click rates. Nonetheless, the SNR predicts response accuracy well.
The consequences of these observations are twofold. First, two parameter combinations determine optimal performance, potentially simplifying experiment design. To ensure coverage of different response accuracy regimes, we can initially vary SNR and $hT$. To increase the accuracy of an ideal observer, it is not sufficient to increase both click rates, for instance, since the SNR stays constant if $\lambda_{\rm high}$ and $\lambda_{\rm low}$ follow the parabolas shown in Fig. 1D. Second, this approach makes testable predictions about the accuracy of an optimal observer: If we change parameters so that SNR and $hT$ are fixed, and a subject’s accuracy is affected, this indicates that the subject may not have learned the hazard rate, or is using a suboptimal discounting model.
4 Post change-point decisions depend on SNR
To understand how an optimal observer adapts to environmental changes, we next ask how their fraction of correct responses depends on the final time, between the last change point preceding a decision and the decision itself (Fig. 2A). Overall accuracy again depends on both SNR and rescaled trial time . In addition, for sufficiently long trials, accuracy as a function of time since the last change point depends only on the rescaled time since the change point, and the SNR.
If the click rates, and are varied, but and are held fixed, the accuracy as a function of remains unchanged (Fig. 2B, for , ). On the other hand, accuracy changes if we fix (SNR) but vary (Fig. 2B, left inset). With fixed accuracy depends only on the rescaled time since the last change point, (Fig. 2B, right inset). Thus, while absolute accuracy depends on the total length of the trial, measured in units of average epoch length, , accuracy relative to the last change point depends only on the elapsed time, measured in the same units.
[22] introduced the notion of an accuracy crossover effect in the dynamic RDMD task: The normative model predicts that after a change point observers update their belief more slowly, but eventually reach higher accuracy at low compared to high hazard rates. Thus plotting the maximal accuracy against time since the last change point for different hazard rates results in curves that cross. Behavioral data indicate that human observers behave according to this prediction [22, 21].
A similar crossover effect also occurs in the dynamic clicks task: Accuracy just after a change point is lower for small hazard rates, (Fig. 2C) and takes longer to reach 50%, but saturates at a higher level compared to more volatile environments. In slow environments, the optimal observer integrates evidence over a longer timescale , leading to more reliable estimates of the state, . But this increased certainty comes at a price, as it requires more time to change the observer’s belief after a change point. Similarly, in environments with stronger evidence (larger , Eq. (4)), accuracy immediately following a change point is lower, since state estimates, and hence the beliefs are more reliable compared to trials with weak evidence (Fig. 2D). However, stronger evidence also causes a rapid increase in accuracy, which then saturates at a higher level than on trials with weaker evidence (lower ). Therefore, both evidence quality, and environmental volatility determine accuracy after a change point.
We conclude that accuracy after a change point is characterized by SNR () and the rescaled time since the change point, . This only holds when trials are sufficiently long, and the belief at trial outset does not affect accuracy. In addition, increasing SNR lowers accuracy immediately after a change point, and increases the recovery of accuracy to a higher saturation point (Fig. 2D). On the other hand decreasing volatility, while fixing (Fig. 2C) leads to lower accuracy immediately after a change point, and higher saturation. However, the rate at which accuracy is recovered after a change point decreases with decreasing .
These are again characteristics of an optimal observer, and deviations from these predictions indicate departures from optimality.
5 A linear approximation of the normative model
Following [38] we next show that an approximation of the normative model given by Eq. (2) can be tuned to give near optimal accuracy, but the accuracy of the approximation tends to be sensitive to the changes in the discounting parameter. This approximate, linear model is given by,
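$$ \frac{dy}{dt} = \kappa \left[ \sum_{i} \delta\big(t - t_R^i\big) - \sum_{j} \delta\big(t - t_L^j\big) \right] - \gamma y, \tag{7} $$
where $\kappa$ is the evidence gained from a click, $t_R^i$ and $t_L^j$ are the right and left click times, and $\gamma$ is the linear discounting rate. In particular, here the nonlinear term in Eq. (2) is replaced by a linear term proportional to the accumulated evidence.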
When tuned appropriately, Eq. (7) closely approximates the dynamics and accuracy of the optimal model (Fig. 3A) [38, 53]. Moreover, it also provides a good fit to the responses of rats on a dynamic clicks task [38]. As the normative and linear models exhibit similar dynamics, it appears that they are difficult to distinguish. However, as we show next, the linear model is more sensitive to changes in its discounting parameter, providing a potential way to distinguish between the models.
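The linear model is simple to simulate because its flow between clicks is an exponential decay. The sketch below assumes the form $dy/dt = \kappa[\ldots] - \gamma y$, with jumps of $\pm\kappa$ at right and left clicks and a discounting rate written here as $\gamma$; the symbol and the function name are choices made for this illustration.

```python
import numpy as np

def linear_llr(left_clicks, right_clicks, kappa, gamma, T, y0=0.0):
    """Belief at time T under linear discounting at rate gamma."""
    times = np.concatenate([left_clicks, right_clicks])
    signs = np.concatenate([-np.ones(len(left_clicks)),
                            np.ones(len(right_clicks))])
    order = np.argsort(times, kind="stable")
    y, t_prev = y0, 0.0
    for t, s in zip(times[order], signs[order]):
        # Exponential leak since the previous click, then a jump of +/- kappa.
        y = y * np.exp(-gamma * (t - t_prev)) + s * kappa
        t_prev = t
    return y * np.exp(-gamma * (T - t_prev))

# One right click at t = 0.5 decays for one time unit before interrogation.
print(linear_llr([], [0.5], kappa=2.0, gamma=1.0, T=1.5))
# With gamma = 0 the model counts clicks: (1 right - 2 left) * kappa.
print(linear_llr([0.1, 0.2], [0.3], kappa=1.0, gamma=0.0, T=1.0))
```

Unlike the normative model, whose discounting is set by the hazard rate, the value of `gamma` here is a free parameter that has to be tuned to the task.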
We assume that $T$ is large enough so that accuracy has saturated (as in Fig. 1C), and compare the maximal accuracy of the nonlinear and linear model. For the linear discounting rate that maximizes accuracy, the linear model obtains accuracy nearly equal to the normative model (Fig. 3A, inset). The optimal linear discounting rate, $\gamma$, increases with SNR (Fig. 3A), whereas the discounting term in the normative, nonlinear model remains constant when the hazard rate, $h$, is fixed. When SNR is large, evidence discounting in the linear model can be stronger (larger $\gamma$), since each evidence increment is more reliable and can be given more weight. When SNR is lower, linear evidence discounting is weaker (smaller $\gamma$), resulting in the averaging of noisy evidence across longer timescales.
What is the impact of using the wrong (suboptimal) evidence discounting rate in the two models? To answer this question we compare the accuracy of two observers, one using the nonlinear model with a wrong hazard rate, , and the other using the linear model with a suboptimal discounting rate . As shown in Fig. 3B accuracy is more sensitive to relative changes in in the linear model, than relative changes in the assumed hazard rate, in the nonlinear model. We quantified the sensitivity of both models to changes in evidence discounting rates by computing the curvature of accuracy functions at the optimal discounting value over a range of SNRs (Fig. 3C).
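One simple way to quantify such sensitivity from a sampled accuracy curve is to fit a parabola around its maximum and read off the second derivative. The source does not state how the curvature was computed, so the method, the function name, and the `half_width` parameter below are assumptions of this sketch.

```python
import numpy as np

def curvature_at_peak(params, accuracies, half_width=2):
    """Fit a quadratic near the maximum of an accuracy curve.

    Returns (curvature, peak_location): the second derivative of the fitted
    parabola and the parameter value at its vertex.
    """
    params = np.asarray(params, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    k = int(np.argmax(accuracies))
    lo = max(0, k - half_width)
    hi = min(len(params), k + half_width + 1)
    a, b, _ = np.polyfit(params[lo:hi], accuracies[lo:hi], 2)
    return 2.0 * a, -b / (2.0 * a)

# Toy accuracy curve with a known peak at 2.0 and second derivative -1.0.
gammas = np.linspace(1.0, 3.0, 21)
toy_accuracy = 0.9 - 0.5 * (gammas - 2.0) ** 2
print(curvature_at_peak(gammas, toy_accuracy))
```

A more negative curvature means accuracy falls off faster as the discounting parameter moves away from its optimum. With Monte Carlo accuracy estimates, a wider fitting window or more trials per point may be needed to keep the fit stable.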
Both models are insensitive to changes in their discounting parameter at low SNR (bottom curve of Fig. 3B). This result is intuitive: when SNR is small, observers perform poorly regardless of their assumptions. On the other hand, when SNR is high observers receive strong evidence from a single click, and the nonlinear model adequately adapts across a broad range of discounting parameter values. The linear model, however, is still sensitive to changes in the discounting parameter, $\gamma$. At high SNR, the belief, $y$, as described by either model is driven to larger values. Whereas the nonlinear model can rapidly discount extreme beliefs as it includes a supralinear leak term, the linear model is not as well adapted, and requires fine tuning. Note, however, that at values of SNR higher than the ones used in Fig. 3B, when, for instance, a single click is sufficient for an accurate decision, both the linear and nonlinear models are insensitive to changes in their discounting parameters. We also note that the insensitivity of the nonlinear model to changes in its discounting rate suggests that this is a more robust model: An observer who does not learn the hazard rate, $h$, exactly can still perform well. A linear model requires finer parameter tuning to achieve maximal accuracy.
The nonlinear model obtains maximal accuracy as long as the assumed hazard rate matches the true hazard rate . On the other hand, the optimal discounting rate of the linear model is also sensitive to changes in the SNR due to changes in the click rates. To quantify this effect, we computed the ratio between the maximal accuracy of the linear model with discounting rate to the maximal accuracy of the nonlinear model with as the SNR was varied, but was kept fixed (Fig. 3D). To compute the maximal accuracy we kept fixed at , the optimal discounting rate for a reference SNR. The maximal-accuracy ratio for the linear model decreases as SNR changed from this reference SNR, as the optimal discounting parameter of the linear model depends on SNR, and the hazard rate . Thus, the linear model can achieve maximal accuracy very close to that of the nonlinear model, but this requires fine tuning.
This points to a general difficulty in distinguishing models subjects could use to make inferences: Simpler approximations may predict performance that is nearly identical to that of a normative model. However, this may require precise tuning of the approximations. If the parameters of the task are changed to differ from those on which the subjects have been trained, i.e. on tasks where subjects are led to assume incorrect parameters, the normative model may behave differently from the approximations. In the case we considered, the models may be distinguishable if an animal is extensively trained on trials with fixed task parameters but subsequently interrogated using occasional trials with different task parameters.
The preceding point is illustrated by the following thought experiment. Assume a subject is extensively trained on a fixed set of task parameters: , Hz, Hz (peak of the red curve in Fig. 3D). We then introduce some trials with different click rates, say with Hz and Hz, chosen so that is constant across the two conditions. We denote by Acc and Acc the accuracies of an observer using the linear and normative models on trials with a given . Since the subject was trained on click rates that correspond to , their discounting strategy will be adapted to these values. Note that the ratio between Acc and Acc when is the red curve in Fig. 3D. Since the ratio between Acc and Acc is near 1, the linear and normative models cannot be distinguished at . However, a subject using the normative model tuned at , will still perform optimally at , if and are held constant. On the other hand, a linear model optimized at , will no longer be optimal at . This distinction is captured by the drop in the accuracy ratio along the red curve in Fig. 3D.
We can quantify the distinction between the two models by their relative difference:
[TABLE]
More generally, for any decision making model, we may define the quantity
[TABLE]
which will equal 0 if the model used is the normative one. If we compute using responses from a real subject, one can generate curves such as those in Fig. 3D. If the curves are not constant (equal to 1), this would suggest the subject is not using an optimal model. Furthermore, a single value of for which provides evidence that the model is not optimal.
In the next two sections, we show how the linear and nonlinear model with added sensory noise differ when fitting the discounting parameters to choice data.
6 Fitting discounting parameters in the presence of sensory noise
The models we have discussed so far translate sensory evidence into decisions deterministically, and do not account for the nervous system’s inherent stochasticity [15]. We next asked whether the inclusion of sensory noise leads to further differences between the two models, particularly when fit to choice data.
[11] showed that in the static version of the clicks task humans and rats make decisions that are best described by a model in which evidence obtained from each click is variable. In the dynamic version of the task, [38] showed that rats’ suboptimal accuracy is well explained by a model that includes similar internal variability. [38] modeled such “sensory” noise either by applying Gaussian perturbations to the evidence pulses, or by attributing, with some mislocalization probability, a click coming from the right or left speaker to the wrong side.
As a minimal model of neural or sensory noise, we too introduced additive Gaussian noise into the evidence pulse of each click, so that the nonlinear model in Eq. (2) takes the form
[TABLE]
where are i.i.d. Gaussian random variables with mean and standard deviation . Similarly, the linear model from Eq. (7) becomes:
[TABLE]
Before fitting these models to choice data, we note that an increase in sensory noise, , decreases the value of the discounting parameters that maximize accuracy in both models [38]: Noisier observations require integration of information over longer timescales (Fig. 4A,B). Thus, adaptivity to change points is sacrificed in order to pool over larger sets of observations . This, in turn, leads to larger biases, particularly after change points. A similar trade-off between adaptivity and bias has been observed in models and human subjects performing a related dynamic decision task [21].
We next fit the discounting parameters in both models using synthetic choice data, treating the other parameters of the models as known. To do so we produced responses using a fixed reference model from both classes, and fit a model from each class to the resulting datasets. Specifically, let (L = linear, NL = nonlinear) denote the reference model used to produce the choice data, and let denote the model that was fit to the resulting data. We independently studied the four possible model pairs . In what follows, refers to the discounting parameter that was fit to data in any given class, so that when and when . Note also that the hazard rate parameter, that was fit to data in the case is distinguished both from the hazard rate, used to generate click stimuli, and the hazard rate, , used to produce the reference choice data of the nonlinear model. Therefore, to remove ambiguity, we denote by the two constant discounting parameter values used to produce the reference choice data with the nonlinear and linear models, respectively. To pick these constants in our simulations, we took the values that would maximize accuracy in the corresponding noise-free systems. That is, and (See Appendix F for more details on the simulations).
During a single fit, we generated stimulus data for i.i.d. trials,
[TABLE]
where is the sequence of right clicks and left clicks on trial , and is the choice datum for this trial. We used Bayesian parameter estimation (See Appendix D.3 for details) to obtain a posterior probability distribution over the discounting parameter, .
To account for the variability in the posteriors that arise due to finite size effects, we performed independent fits per model pair , with different dataset sizes: . To quantify the goodness of these fits, we used the relative mean posterior squared error, averaged across the fits,
[TABLE]
This quantity provides a relative measure of how close the posterior distribution is to . Here denotes the posterior density, $\mathrm{Pr}(\theta \mid \mathfrak{D})$, from fit . Note, the definition of is nuanced. If the reference and fit models are the same, then is set to the ground truth, i.e. the discounting parameter value used to produce the reference choice data (e.g., when ). However, when the fit and reference model classes differ (i.e. when ), then there is no obvious ground truth, and must be defined differently. In this case, we used the correspondence . That is, when fitting the nonlinear model we always set , and when fitting the linear model, we always set . There are other possible ways of defining in this case, such as picking a discounting parameter value for the fit model class that produces the same accuracy as the reference model. Although arbitrary, our definition is sufficient to illustrate – as we show next – that cross-model fits are feasible and that the case is qualitatively different from the case, regardless of the reference model class. However, due to the model mismatch, we expect a bias in the parameter estimate for these situations (i.e. an error that does not converge to zero), unless we define the ground truth self-referentially as the value of the parameter for which the estimate is unbiased.
The maximizer of our Bayesian posterior defines the maximum likelihood estimate (MLE) of our discounting parameter; in our case the MLE equals the maximum a posteriori (MAP) estimate, as we picked a uniform prior over a wide interval (see Appendix D.3 for details). We plot the distribution of these estimates across the 500 independent fits, for each pair in Fig. 4C,D. As the number of trials used in the reference dataset increases from 100 to 500, the spread of the estimates diminishes. However, a bias in the estimate appears whenever . For reference datasets of size 500, 98% of the 500 MAP estimates in the L-NL fits lie strictly above , versus 50.4% for the corresponding L-L fits. Similarly, 86.6% of the estimates in the NL-L fits lie strictly below , versus 44.2% for the corresponding NL-NL fits.
We found that the relative error from Eq. (11) decreases as larger blocks of trials are used to fit the discounting parameter (Fig. 4E). We note the following parallels between the sensitivity to parameter perturbation of each model class (explored in Fig. 3B,C) and the decreasing rate of the relative errors for each model pair. As expected, a model that produces responses that are less sensitive to changes in its discounting parameter requires more trials to be fit to data: The reduction in relative error is the slowest for the and pairs. This is consistent with the insensitivity of the nonlinear model to changes in discounting parameter, making it difficult to identify its parameters. On the other hand, the linear model fits – and – converge more rapidly, likely because the linear model is sensitive to changes in its discounting parameter (See Fig. 3B,C).
In anticipation of our next section, we point out that computing the MLE can be treated as a statistical learning problem in which we minimize a negative log-likelihood loss function over the dataset (See Eq. 7.8 in [17]):
[TABLE]
Here and are the choices generated by the reference and fit models, respectively, on the trial. As before the discounting parameter, , and the level of sensory noise, parametrize the fitted model. Fitted model responses are non-deterministic only because of sensory noise. The likelihood is the probability that the response generated by the fit model on trial matches the response observed in the data (See Appendix F for details on how this likelihood was computed for each model class), which must be obtained from many realizations of subject to click noise of amplitude . The MLE, for is then found by minimizing the expected loss across all trials,
[TABLE]
taking the expectation over all samples in the dataset, but conditioning on the fitted model’s discounting parameter, and noise amplitude, . As the MLE is consistent, we expect the fit parameters will converge to the true parameters [54] (Fig. 4C,D). Framing Bayesian parameter estimation in this way will help us compare to our approach of fitting by minimizing the 0/1-loss function we introduce next.
7 Fitting with the 0/1-loss function
We next asked how the parameters that define the model whose responses best match the choices of a reference observer compare to those that maximize the likelihood of observing these choices. As we noted, minimizing the log-likelihood loss given in Eqs. (12) and (13) gives the parameters most likely to have produced the data, and we expect the corresponding estimates of the discounting parameter to converge to the true value when the fit and reference models match.
To find the parameters that maximize the probability of matching the choices of the model to those observed in a dataset on every trial, we define the 0/1-loss function,
[TABLE]
where is the indicator function, is a data sample indicating the click stimulus and response on a trial , and is the response of the fitted model with discounting parameter , click stimulus , and , are realizations of sensory noise, i.e. a sequence of i.i.d. Gaussian variables that perturb the evidence obtained from each click. We will marginalize over realizations of the (unobserved) sensory noise, and denotes the number of realizations we use in the actual computation. Fitting the discounting parameter then involves minimizing the empirical expectation of the loss function over the data samples and across realizations, of sensory noise,
[TABLE]
For a binary decision model, this involves finding the parameter that minimizes the expected number of mismatches (or probability of a mismatch) between the choices of the model and those observed in the data (minimizing 0/1-loss), or maximizes the expected number of matches (or probability of a match) between the data and fit model (maximizing 0/1-prediction accuracy). In our fits, we used , sampling a single realization of click noise perturbations per click stream. As we sampled from a large number of click streams, this was sufficient to average the loss function.
Both loss functions, and , are commonly used to fit models to data [18, 17]. Minimizing the expectation of is reasonable, as it seems likely that the parameters that define the model that matches the largest number of choices observed in the data should be close to the one the reference observer actually uses (assuming that there is no model mismatch). These parameters will then also best predict future responses. On the other hand, minimizing produces the most likely parameters that produced the observed data.
However, it is well-known that parameters estimated using different cost functions can differ, even when the models used to fit and generate the data agree. To see the difference between using and in Eq. (13) consider a Bernoulli random variable, with parameter . Given a large sequence of observed outcomes, , the parameter that minimizes the expected loss converges to as the MLE is consistent and asymptotically efficient [54]. On the other hand, the parameter that minimizes the expected loss is (See Appendix E): The individual outcomes in a series of independent trials are best matched by a model that always guesses the more likely outcome.
We observed a similar bias when we used the loss function to infer the discounting parameters in our evidence accumulation models [18]: We generated a set of click-train realizations, and two sets of responses, one from each of the linear and nonlinear evidence accumulation models with sensory noise (see Appendix F). Next we used these stimulus realizations as input to an evidence-accumulation model (linear or nonlinear) with a fixed discounting parameter to produce a corresponding set of reference observer responses. We generated a second set of model responses using the same database of click-train realizations, but allowed the discounting parameter to vary. We call the fraction of trials on which the reference observer and model responses agree the 0/1-prediction accuracy (PA) of the model, the complement of the expected 0/1-loss over a test set, . When the model and reference observer agree the PA is 1 in the absence of sensory noise (), as the stimulus determines the choice fully. However, PA decreases as sensory noise increases.
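The 0/1-prediction accuracy described above can be illustrated with a small Monte Carlo sketch. It is not the simulation code used for the figures. It assumes a linear accumulator whose click pulses are drawn i.i.d. from a Gaussian with mean `kappa` and standard deviation `sigma`, a decision given by the sign of the accumulator at the end of the trial (ties resolved to +1), and independent noise realizations for the reference observer and the fit model. All function and parameter names are hypothetical.

```python
import math
import random

def make_trial(lam_high, lam_low, h, T, rng):
    """Right/left click times on [0, T] for a state that switches at rate h > 0."""
    right, left = [], []
    t, state = 0.0, rng.choice([1, -1])
    while t < T:
        t_end = min(t + rng.expovariate(h), T)  # end of the current epoch
        # The side matching the state clicks at lam_high, the other at lam_low.
        for side, rate in ((state, lam_high), (-state, lam_low)):
            s = t + rng.expovariate(rate)
            while s < t_end:
                (right if side == 1 else left).append(s)
                s += rng.expovariate(rate)
        t, state = t_end, -state
    return right, left

def linear_choice(right, left, T, gamma, kappa, sigma, rng):
    """Choice (+1/-1) of a leaky linear accumulator with noisy click pulses."""
    y = 0.0
    for t in right:
        y += rng.gauss(kappa, sigma) * math.exp(-gamma * (T - t))
    for t in left:
        y -= rng.gauss(kappa, sigma) * math.exp(-gamma * (T - t))
    return 1 if y >= 0 else -1  # ties go to +1 (assumption)

def prediction_accuracy(trials, T, gamma_ref, gamma_fit, kappa, sigma, seed=0):
    """Fraction of trials on which fit-model and reference choices agree."""
    rng = random.Random(seed)
    matches = 0
    for right, left in trials:
        d_ref = linear_choice(right, left, T, gamma_ref, kappa, sigma, rng)
        d_fit = linear_choice(right, left, T, gamma_fit, kappa, sigma, rng)
        matches += d_ref == d_fit
    return matches / len(trials)
```

Sweeping `gamma_fit` over a grid for a fixed `gamma_ref` and several values of `sigma` gives PA curves whose maximizer can be compared with `gamma_ref`. With `sigma = 0` and `gamma_fit == gamma_ref` the two response sets coincide, so the function returns 1.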
Somewhat surprisingly, the parameters that minimize expected 0/1-loss are biased, and this bias increases with sensory noise (Fig. 5). In particular, the discounting parameter that best predicts the reference observer responses is lower than the one used to generate these responses (Fig. 5B,D). This is consistent with our observations in Section 6, as integration over longer periods of time decreases response variability (Fig. 4A,B). This tendency is pronounced when larger values of the discounting parameters are used to produce the training data. Larger discounting leads to shorter integration time, and increased variability in the responses. Furthermore, the nonlinear (NL) model exhibits this bias much more strongly than the linear model (L). See Appendix G for a possible metric of the reported bias, and its dependence on sensory noise for each model class (Fig. 6).
Thus sensory noise is the main reason the expected 0/1-loss is minimized at a discounting parameter that does not match the one used to generate the data. Such internal noise introduces variability in the responses: even the same model will not match its own responses given the same stimulus, and a decrease in output variability can increase the PA of a model. In the present case, such a decrease in response variability is achieved by decreasing the discounting parameter, and increasing integration time.
We expect that similar biases occur whenever a 0/1-loss function is used to fit models to choice data. Sensory noise, lapses in attention, and numerous other sources of noise nearly always introduce some variability in the responses of observers. In such cases, models that are less variable than the observer may best match an observed set of responses, and best predict future responses [17]. However, these parameters are not always most likely to have been used by the observer. Using a 0/1-loss function may thus not always reveal the process that the observer used to generate the responses, even if the model the observer uses is close to the one used to fit the data.
8 Discussion
Normative models of decision-making make concrete predictions about the computations and actions of experimental subjects, and can be used to interpret behavioral data [20]. Such models can also be used to identify task parameter ranges in which observers’ responses are most sensitive to their assumptions about the task. In turn, such information can then be used to tease apart candidate model classes the experimental subject might be employing. Here we have focused on properties of a normative, nonlinear model, and its differences with a close, linear approximation. We found that the linear model is more sensitive to changes in the discounting parameter compared to the nonlinear model, and suggest this is why fitting a linear model to choice data requires fewer trials than fitting a nonlinear model.
In dynamic environments, task parameters may have predictable effects on subjects’ overall accuracy and accuracy relative to change points. We have shown that there is a range of intermediate to high SNR in which the linear model is sensitive to changes in its discounting parameter, but the nonlinear model is not. This suggests this range could be probed to distinguish the evidence accumulation strategies subjects are using. These strategies may also be fit by other approximate models, like accumulators with no-flux boundaries or sliding-window integrators [57, 22, 4], which can also be sensitive to changes in their discounting parameters.
Psychophysical tasks used to infer subjects’ decision-making strategies can require extensive training and data collection [27, 26]. Normative and approximately normative decision-making models diverge most in their response accuracy when tasks are of intermediate difficulty. As we have shown, task difficulty may be controlled by combinations of task parameters representing fewer dimensions than the total number of parameters. Identifying these parameter combinations may be possible by computing the signal-to-noise ratio (SNR) of the stimulus produced by a particular parameter set. However, subjects’ responses are also susceptible to noise from sensing and processing evidence, so it is important to extend descriptions of SNR to account for such factors [11].
Normative models of evidence accumulation and decision-making can be complex, and simpler, approximately optimal strategies may perform nearly as well [56, 21]. If such approximate strategies are easier to learn and tune, subjects may prefer them. [38] showed rats’ performance on the dynamic clicks task is well fit by a linear discounting model. However, optimal and well-tuned suboptimal strategies may be difficult to distinguish, and this problem is likely to worsen with increasing task complexity and corresponding model dimensionality. We have described possible model-guided task design approaches that may help tease apart similarly performing models.
The addition of noise in our evidence accumulation models provides an extra parameter that can account for suboptimal performance. What is the best way to distinguish whether internal noise or suboptimal evidence accumulation strategies best account for underperformance? One way to do this, as suggested by our model analysis, is to collect sufficient data over trials in which a task parameter was changed unbeknownst to the observer.
For purposes of model fitting to experimental data, we expect that trial-to-trial variability can be more faithfully tracked in the dynamic clicks task than in dynamic decision tasks based on the RDMD task. This is due to the relative simplicity of the clicks as evidence sources: They are either on the right or left, although click side and evidence strength can be misattributed [38]. In contrast, dot motion can be estimated in many ways, making it difficult to interpret which aspects of the stimulus an animal observed, and used as evidence. Spatiotemporal sampling methods may be too spatially coarse and may require fitting filters to each subject, which could change trial-to-trial [1, 36]. Transforming click times to delta pulses using Eq. (2) is more straightforward. Thus, the dynamic click task paradigm is a promising avenue for probing evidence accumulation to complement dynamic tasks which are extensions of classic RDMD [22].
The use of discrete evidence tasks does come with caveats. The neural computations underlying visual motion discrimination in non-human primates are well studied [8], and have a significant history of being linked to decision tasks [24]. As a result, there is an extensive literature connecting neural systems for processing visual motion and those involved in decision deliberation [50, 47]. However, only recently have the neural underpinnings of the decisions of rats performing auditory discrimination tasks been examined [10]. Furthermore, mathematical issues may arise in precisely characterizing discounting between clicks, when evidence arrives discretely. Many different functions could lead to the same amount of evidence discounting between clicks, leading to ambiguity in the model selection process.
Parameter identification for evidence accumulation models can be sensitive to the method chosen to fit model responses to choice data [45]. [22] used the approach of minimizing the cross-entropy error function, which measures the dissimilarity between binary choices in the model and the data. [38] used a maximum likelihood approach to identify model parameters that most closely matched choice data. This is related to the Bayesian estimation approach we used to fit parameters of the nonlinear and linear models. We obtained similar results by minimizing the expected 0/1-loss, which biases towards less variable models, especially for models with strong sensory noise (Fig. 5). A more careful approach to fitting model parameters should also consider penalizing more complex models, which would also allow us to distinguish between the nonlinear and linear model.
[21] recently studied the strategies humans use when making binary decisions in dynamic environments whose hazard rates changed across trial blocks. In this case the ideal observer must infer both the state, and the rate at which the environment is changing [41]. Interestingly, [21] found that the model that best accounted for response data was not the full Bayes optimal model, but rather a sampling model in which a bank of possible hazard rates replaces the full hazard rate distribution. Such sampling strategies can more easily be implemented in spiking networks [12], and may also arise when considering an information bottleneck, which forces a balance between information required from the past and model predictivity [6]. As in Occam’s razor, the brain may favor simpler models, especially when they perform similarly to more complex models [2].
Analyses of normative models for decision-making are important both for designing experiments that reveal subjects’ decision strategies and for developing heuristic models that may perform near-optimally [53, 38, 21]. Our findings suggest subjects should be tested mainly at intermediate levels of SNR to provide informative response data. We found that such a level of SNR is between 1 and 2 for an optimal observer, and between 3 and 4 for an observer that uses linear discounting. Tasks that are too easy or hard allow subjects to obtain similar performance with a wide variety of strategies. Interestingly, we also found that the models that best predict observer responses are not necessarily those closest to the ones that the observer is using. Moreover, modifications of normative models can also suggest more revealing experiments, like those that include feedback or signaled change points. Ultimately, data from decision-making tasks that require subjects to accumulate evidence adaptively will provide a better picture of how organisms integrate stimuli to make choices in the natural world.
Appendix A Normative evidence accumulation for dynamic clicks
In dynamic environments, the state evolves according to a continuous-time Markov chain with symmetric transition rates given by the hazard rate, . We construct a sampled-time approximation of the continuous-time Markov process , parameterized by , which is valid for small enough [19]. More precisely, we define a discrete-time Markov chain by the transition probabilities: and , for all and initial condition . Note that these probabilities are a truncation to first order in of the transition probabilities that one would otherwise obtain for the embedded discrete-time Markov chain . Then, we set
[TABLE]
for all and all . In the following, our discrete-time evidence accumulation equations are embedded in continuous-time via the correspondence given by Eq. (14). The resulting equations apply to the original state process by virtue of the sampled-time approximation just described.
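As a minimal illustration of the sampled-time approximation, the sketch below simulates a two-state chain in which the state switches on each step with probability `h * dt` and otherwise stays put. The coding of the state as +1/-1 and the function name are assumptions made for this example only.

```python
import random

def sample_telegraph(h, dt, n_steps, x0=1, rng=None):
    """Discrete-time approximation of a symmetric two-state telegraph process.

    Each step switches the state with probability h*dt and keeps it with
    probability 1 - h*dt. This needs h*dt <= 1 to define valid probabilities
    and is only an accurate approximation when h*dt << 1.
    """
    if not 0.0 <= h * dt <= 1.0:
        raise ValueError("need 0 <= h*dt <= 1 for valid transition probabilities")
    rng = rng or random.Random()
    x = x0
    path = [x]
    for _ in range(n_steps):
        if rng.random() < h * dt:  # switch with probability h*dt
            x = -x
        path.append(x)
    return path
```

For example, `sample_telegraph(h=1.0, dt=0.001, n_steps=2000)` returns a path of 2001 states covering two time units, with about two switches on average.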
Just as in Eq. (1), the log-posterior odds ratio in discrete-time is:
[TABLE]
Hence, equations (A.3) and (B.1) from the appendix of [53] hold in our context:
[TABLE]
In addition, we use the approximation for small , since , so that:
[TABLE]
Taking the limit yields the ODE:
[TABLE]
or the equivalent rescaled version
[TABLE]
which both appear in [38].
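Because the click input is a sum of delta pulses, an equation of this type can be integrated exactly by alternating a jump at each click with a deterministic decay between clicks. The sketch below assumes the commonly quoted form of the model, dy/dt = kappa * (right pulses - left pulses) - 2 h sinh(y), which is not reproduced in the text above and should be checked against Eq. (2). The function names are hypothetical.

```python
import math

def decay_between_clicks(y, h, dt):
    """Exact solution of dy/dt = -2*h*sinh(y) after a time dt.

    With u = tanh(y/2) the ODE becomes du/dt = -2*h*u, so u decays
    exponentially. Assumes |y| is moderate (tanh(y/2) not rounded to +-1).
    """
    if h * dt == 0.0:
        return y
    u = math.tanh(y / 2.0) * math.exp(-2.0 * h * dt)
    return 2.0 * math.atanh(u)

def normative_llr(right_clicks, left_clicks, T, h, kappa):
    """y(T) for the assumed model, starting from y(0) = 0."""
    events = sorted([(t, +1) for t in right_clicks if t <= T] +
                    [(t, -1) for t in left_clicks if t <= T])
    y, t_prev = 0.0, 0.0
    for t, side in events:
        y = decay_between_clicks(y, h, t - t_prev)  # nonlinear leak since last event
        y += side * kappa                           # evidence jump at the click
        t_prev = t
    return decay_between_clicks(y, h, T - t_prev)
```

The same event loop with `y * math.exp(-gamma * dt)` in place of `decay_between_clicks` would give a linear-discounting accumulator, which makes the two models easy to compare on identical click trains.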
Appendix B Derivation of dynamic clicks SNR
Our derivation considers the signal in the dynamic clicks task to be the difference in the number of clicks during the final epoch prior to interrogation at time . The distribution of final epoch times of the telegraph process is
[TABLE]
The first term is the distribution of waiting times between switches. We truncate the period at the interrogation time, , and the second term accounts for the probability that no switches occur during the entire trial, and the final and only epoch is of length . For a final epoch of a given length , we can describe both the conditional expectation and variance of the difference in click counts again using the results of [51]:
[TABLE]
Therefore to obtain the unconditional expectation and variance for , we must marginalize using the laws of total expectation and variance with respect to the distribution of epoch times given in Eq. (15). This yields
[TABLE]
for the total expectation. Notice that as , the expected number of clicks is limited from above by . Using the law of total variance we can thus compute
[TABLE]
Plugging Eq. (16) and (17) into the expression for yields
[TABLE]
Recalling our definition from equation (4),
[TABLE]
we can rewrite equation (18) in the more convenient form
[TABLE]
where we have highlighted the fact that the SNR is a function of the rescaled trial time and the Skellam SNR rate scaled by the root of the timescale . Indeed, in the limit as , we find that consistent with Eq. (3). We also find that in the limit of infinitely long trials , Eq. (20) tends to
[TABLE]
so the SNR is solely determined by .
Note also that to keep Eq. (20) constant it is sufficient to keep its constituent arguments constant. This is convenient, since we already must keep constant to fix the statistics of information accumulated prior to the final epoch, so we predict that performance is fixed by the following two parameters
[TABLE]
as reported in Eq. (6).
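The structure of this derivation can be checked numerically. The sketch below assumes the SNR is the mean of the final-epoch click-count difference divided by its standard deviation, that the final epoch length is an exponential waiting time with rate `h` truncated at `T`, and that both click streams are Poisson. It compares the value given by the laws of total expectation and variance with a Monte Carlo estimate. Names are hypothetical, and the closed-form moments are those of a truncated exponential, not copied from Eqs. (16) and (17).

```python
import math
import random

def final_epoch_moments(h, T):
    """Mean and variance of tau = min(Exp(h), T)."""
    m1 = (1.0 - math.exp(-h * T)) / h
    m2 = 2.0 * (1.0 - math.exp(-h * T) * (1.0 + h * T)) / h ** 2
    return m1, m2 - m1 ** 2

def snr_total_variance(lam_high, lam_low, h, T):
    """E[dN] / sqrt(Var[dN]) with dN the click-count difference in the final epoch."""
    mean_tau, var_tau = final_epoch_moments(h, T)
    mean = (lam_high - lam_low) * mean_tau
    # Var[dN] = E[Var(dN | tau)] + Var(E[dN | tau])
    var = (lam_high + lam_low) * mean_tau + (lam_high - lam_low) ** 2 * var_tau
    return mean / math.sqrt(var)

def poisson_count(rate, duration, rng):
    """Number of Poisson events in [0, duration], from exponential waiting times."""
    n, t = 0, rng.expovariate(rate)
    while t <= duration:
        n += 1
        t += rng.expovariate(rate)
    return n

def snr_monte_carlo(lam_high, lam_low, h, T, n_trials=20000, seed=0):
    """Sample-based estimate of the same SNR."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_trials):
        tau = min(rng.expovariate(h), T)
        diffs.append(poisson_count(lam_high, tau, rng) - poisson_count(lam_low, tau, rng))
    mean = sum(diffs) / n_trials
    var = sum((d - mean) ** 2 for d in diffs) / (n_trials - 1)
    return mean / math.sqrt(var)
```

For very short trials the variance of the epoch length is negligible and `snr_total_variance` approaches the Skellam value `(lam_high - lam_low) / sqrt(lam_high + lam_low)` times `sqrt(T)`, in line with the short-trial limit noted above.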
Appendix C Diffusion approximation
Here we demonstrate the diffusion approximation of the normative model for the dynamic clicks task, Eq. (2) in the limit of large Poisson rates and . Diffusion approximations for jump processes have been addressed by [31], and [46] who studied the impact of shot noise and pulsatile synaptic inputs on integrate-and-fire models. Following this work, we note that the difference of the click streams in Eq. (2) can be approximated by a drift-diffusion process with matched mean, and variance, . This results in the following stochastic differential equation (SDE) for the approximation :
[TABLE]
where and is the increment of a Wiener process. Note the resulting nonlinear drift-diffusion model is similar to the normative models presented in [22, 53]. The SNR of the signal in Eq. (21) can be associated with the mean divided by the standard deviation in an average-length epoch. Fixing this SNR leads to the relations in Eq. (6). Importantly, the signal in Eq. (21) is characterized entirely by its mean and variance, so we expect that the performance of the model can be directly associated with the SNR. Note, however, that Eq. (21) will only be valid for . Otherwise, one must consider the effects of higher order moments of the click streams, and a prediction of performance purely based on the SNR will break down (Fig. 1D, Inset), since higher order statistics likely shape response accuracy in these cases.
Appendix D Model identification
We fit parameters of the linear and nonlinear models in two stages. First, we generated synthetic response data from a model (linear or nonlinear) by solving the corresponding ODE or SDE. We then solved a second set of models (linear or nonlinear) for a range of discounting parameters ( for the linear model; for the nonlinear model), and constructed a posterior distribution over the discounting parameter. For noisy models, we expect the posterior to be a smooth function that is peaked around the most likely values of discounting parameter for that trial. We now describe the details of these parameter fitting procedures for each of the cases: linear vs. nonlinear models.
D.1 Linear model with stochastic response
We incorporate noise into the linear Eq. (7) by considering multiplicative noise on the click increments, as described by Eq. (9). For a fixed realization of the click train, we can solve this equation explicitly for at the end of trial :
[TABLE]
where , revealing is simply the sum of i.i.d. normal random variables scaled by exponential decay. Conditioning on the clicks , then is normally distributed with expectation and variance
[TABLE]
so is a Bernoulli random variable. The likelihood function will be a smooth function of , and determined as an integral over the half-line corresponding to the decision:
[TABLE]
where is the cumulative distribution function of a standard normal random variable. We can thus compute the posterior over the discounting parameter as a rescaled product of the likelihoods on each trial.
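A minimal sketch of this per-trial likelihood follows. It assumes click pulses drawn i.i.d. from a Gaussian with mean `kappa` and standard deviation `sigma`, so that the accumulator at the end of the trial is Gaussian with mean and variance given by exponentially weighted sums over the clicks, and it codes the choice as +1 when the accumulator is positive. The symbols and function names are illustrative, not the paper's notation.

```python
import math

def std_normal_cdf(z):
    """Cumulative distribution function of a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_choice_likelihood(choice, right_clicks, left_clicks, T, gamma, kappa, sigma):
    """P(choice | clicks, gamma) for the noisy linear model, choice in {+1, -1}."""
    w_right = [math.exp(-gamma * (T - t)) for t in right_clicks if t <= T]
    w_left = [math.exp(-gamma * (T - t)) for t in left_clicks if t <= T]
    mean = kappa * (sum(w_right) - sum(w_left))
    var = sigma ** 2 * sum(w * w for w in w_right + w_left)
    if var == 0.0:
        # No noise or no clicks: the decision is deterministic; ties split evenly (assumption).
        p_right = 1.0 if mean > 0 else 0.0 if mean < 0 else 0.5
    else:
        p_right = std_normal_cdf(mean / math.sqrt(var))
    return p_right if choice == 1 else 1.0 - p_right
```

Multiplying these values across trials, or summing their logarithms, gives the unnormalized posterior over `gamma` under a flat prior.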
D.2 Nonlinear model with stochastic response
When click heights are noise-perturbed, we cannot explicitly solve the extended nonlinear model. However, we can make progress by applying the idea of mapping between clicks. If we draw trains of clicks, , ahead of time, Eq. (8) defines the nonlinear model with multiplicative noise. We can iteratively define the probability density by sampling over the click amplitude noise distribution at each click according to
[TABLE]
where click noise is drawn from the normal distribution , is the time of the -th click, according to the side of the -th click, and we have used the convolution theorem for independent random variables. For any trains of clicks, , Eq. (23) can be solved iteratively to obtain the distribution . The likelihood will thus be a smooth function of , determined by the integral over the half-line corresponding to the decision ():
[TABLE]
D.3 Bayesian fitting procedure
Our goal is to compute or estimate the posterior distribution $\mathrm{Pr}(\theta \mid \mathfrak{D})$, which by Bayes’ rule is proportional to the product of the likelihood of the data with the prior over the parameter (since all the other task and model parameters are assumed known and fixed, we omit them from the equations):
[TABLE]
Our method focuses on exploiting the likelihood function . We have,
[TABLE]
where the last step comes from the fact that the click trains are independent of the discounting parameter used by the decision-making model (recall that we distinguish between the discounting parameter of the decision maker and the hazard rate used to produce the data). From there, we remark that the choice data are conditionally independent given the click stimulus and the discounting parameter. Thus,
[TABLE]
Therefore we can rewrite Eq. (25) as:
[TABLE]
We use uniform priors for , over a finite interval . In this context, the problem of computing the posterior distribution of reduces to assessing the likelihoods of the decision data on each trial, (), for a range of -values spanning the interval . In practice, we picked when fitting the linear model and when fitting the nonlinear model. Finally, note that for numerical stability reasons, our algorithms actually sum log-likelihood values, as opposed to multiplying probability values. Relegating the -independent prior into a normalization constant , Eq. (26) becomes, in the log-domain:
[TABLE]
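The log-domain computation can be sketched as follows, assuming a uniform prior on an evenly spaced grid of parameter values and a user-supplied per-trial likelihood function. The function name, the argument order of `likelihood`, and the floor used to avoid `log(0)` are choices made for this example.

```python
import math

def grid_log_posterior(thetas, trials, likelihood):
    """Normalized log-posterior density on an evenly spaced grid, uniform prior.

    thetas:     increasing, evenly spaced parameter values (at least two)
    trials:     list of (stimulus, choice) pairs
    likelihood: function (choice, stimulus, theta) -> probability in [0, 1]
    """
    floor = 1e-300  # guards against log(0) for impossible choices
    log_post = [
        sum(math.log(max(likelihood(d, s, th), floor)) for s, d in trials)
        for th in thetas
    ]
    # Normalize with the log-sum-exp trick so the density integrates to 1 on the grid.
    m = max(log_post)
    step = thetas[1] - thetas[0]
    log_z = m + math.log(sum(math.exp(lp - m) for lp in log_post) * step)
    return [lp - log_z for lp in log_post]
```

The grid point with the largest value is the MAP estimate, which coincides with the MLE under the flat prior. Summing logarithms avoids the underflow that a direct product of several hundred probabilities would cause.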
Appendix E Minimizing 0/1-loss in a Bernoulli random variable
Consider a simple stochastic binary decision-making model in which we ignore the specifics of evidence sources, as in [37]. We show that in this case the 0/1-loss function also leads to biased estimates. This result has been pointed out in previous work in which parameter fitting results have been compared between Bernoulli random variables fit with the 0/1-loss function as opposed to maximum likelihood estimators [18, 17].
Consider a Bernoulli random variable with success probability generating the reference choices, and the fit Bernoulli model with success probability . Minimizing the log-likelihood loss function recovers in the limit of a large number of trials : In this limit, given , we have that the expected loss measured by the negative log-likelihood is
[TABLE]
which is minimized at , the mean of (note that Eq. (28) is the cross-entropy between and ). Thus, the parameter from the reference model is recovered, as the Bernoulli random variable satisfies the requirements for the MLE to be consistent [54].
On the other hand, if we fit the parameter by minimizing the expected 0/1-loss function, in the limit of trials, the expected loss is
[TABLE]
which decreases in for , so the minimal expected loss when is achieved with (for it is minimized at ).
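A small numerical check may make the contrast between the two loss functions concrete. The sketch below assumes that the reference and fit responses are independent Bernoulli draws with success probabilities `p` and `q` (names chosen here for illustration), so that the expected 0/1-loss is the probability that the two draws disagree.

```python
import math


def expected_nll(p, q):
    """Expected negative log-likelihood of Bernoulli(q) under data from Bernoulli(p)."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))


def expected_01_loss(p, q):
    """Probability that independent Bernoulli(p) and Bernoulli(q) draws disagree."""
    return p * (1 - q) + (1 - p) * q


def argmin_on_grid(loss, p, grid):
    """Grid value of q that minimizes loss(p, q)."""
    return min(grid, key=lambda q: loss(p, q))


p_ref = 0.7
q_grid = [k / 100 for k in range(1, 100)]  # open grid avoids log(0)
q_nll = argmin_on_grid(expected_nll, p_ref, q_grid)
q_01 = argmin_on_grid(expected_01_loss, p_ref, q_grid)
print(q_nll, q_01)
```

With a reference probability above one half, the negative log-likelihood is minimized at the reference value, whereas the 0/1-loss pushes the fit probability to the upper edge of the grid.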
Of course, the synthetic data and the fit evidence accumulation models we consider are generated from the same click streams on each trial, so a realistic comparison should account for such noise correlations in simplified Bernoulli random variable models, as analyzed in [14]. This analysis is more involved, and we save such a study for future work.
Appendix F Details on Monte Carlo simulations for figures
Fig. 1C was generated using simulations of Eq. 2 from to s with the parameters shown in the figure. The time for saturation was chosen to be s. For each time between 0 and 0.4s the accuracy was computed as the percentage of the simulations for which the choices were correct. Fig 1D was generated using simulations of Eq. 2 for each data point in the plane. The maximal accuracy reported corresponds to the numerically computed accuracy at s.
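The Monte Carlo accuracy estimate described for Fig. 1C can be sketched as below. This is an illustration under stated assumptions, not the authors' code: Eq. 2 is taken to have the form dy/dt = κ[right clicks − left clicks] − 2h sinh(y) with κ = log(λ_high/λ_low), the decay between clicks uses the exact solution of dy/dt = −2h sinh(y), the hazard rate is assumed positive, and all function names and parameter values are hypothetical.

```python
import math
import random


def poisson_times(rng, rate, t0, t1):
    """Event times of a homogeneous Poisson process on [t0, t1)."""
    times, t = [], t0
    while rate > 0:
        t += rng.expovariate(rate)
        if t >= t1:
            break
        times.append(t)
    return times


def simulate_trial(rng, h, T, lam_high, lam_low):
    """Return (clicks, final_state); clicks is a sorted list of (time, side)."""
    state = rng.choice((-1, 1))  # +1: right stream currently has the higher rate
    t, clicks = 0.0, []
    while t < T:
        t_end = min(t + rng.expovariate(h), T)  # exponential dwell time
        lam_r, lam_l = (lam_high, lam_low) if state == 1 else (lam_low, lam_high)
        clicks += [(s, 1) for s in poisson_times(rng, lam_r, t, t_end)]
        clicks += [(s, -1) for s in poisson_times(rng, lam_l, t, t_end)]
        if t_end < T:
            state = -state  # change point
        t = t_end
    clicks.sort()
    return clicks, state


def decay(y, h, dt):
    """Exact solution of dy/dt = -2 h sinh(y) over a click-free interval dt."""
    if dt <= 0:
        return y
    return 2.0 * math.atanh(math.tanh(y / 2.0) * math.exp(-2.0 * h * dt))


def final_llr(clicks, T, h, kappa):
    """Decision variable at time T for a given click train."""
    y, t_prev = 0.0, 0.0
    for t, side in clicks:
        y = decay(y, h, t - t_prev) + side * kappa
        t_prev = t
    return decay(y, h, T - t_prev)


def accuracy(n_trials, h, T, lam_high, lam_low, seed=0):
    """Fraction of simulated trials in which sign(y(T)) matches the final state."""
    rng = random.Random(seed)
    kappa = math.log(lam_high / lam_low)
    n_correct = 0
    for _ in range(n_trials):
        clicks, state = simulate_trial(rng, h, T, lam_high, lam_low)
        y = final_llr(clicks, T, h, kappa)
        choice = rng.choice((-1, 1)) if y == 0 else (1 if y > 0 else -1)
        n_correct += choice == state
    return n_correct / n_trials


acc = accuracy(400, h=1.0, T=1.0, lam_high=30.0, lam_low=10.0)
```

Repeating the call for a range of interrogation times `T` would trace out an accuracy curve of the kind described above; the number of trials here is kept small only so the sketch runs quickly.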
Fig. 2B-D was generated using simulations of Eq. 2 from to with the parameters shown in the figure. The reference change point was chosen to be the last change point in the simulation. For each time between the last change point and one unit of time later, the accuracy is the fraction of correct responses, i.e., simulations in which the sign of the LLR matched the sign of the telegraph process. Since intervals between change points are exponentially distributed, there are many more data points for short times than for long times after change points. Since some simulations did not last a full unit of time after the last change point, the number of simulations is less than or equal to (decreasing as time increases). Simulations that had no change point were omitted when computing the accuracy.
Fig. 3A was generated as follows. For each value of , simulations of Eq. 7 from to were generated over a range of values. For each value of the accuracy was computed at and was selected as the value that maximized accuracy. This resulted in a specific value of for each . Fig. 3B was generated using simulations of Eq. 2 (using instead of ) and Eq. 7 (using instead of ) from to s for a range of values of and . For each value of and , the maximal accuracy was estimated as the value of the accuracy at s. Fig. 3C was generated by estimating the second derivative of the curves shown in Fig. 3B for each value of . Fig. 3D was generated as follows using Eq. 2 and Eq. 7. For each of the four curves, was fixed to the value of corresponding to the reference values for (see Fig. 3A). For each curve, this value of was not changed when new values were used. Then, for each curve, the maximal accuracy for the linear and nonlinear models were computed using simulations for a range of new values. The quotient of the maximal accuracy of the linear model and the maximal accuracy of the nonlinear model is shown in the figure.
Fig. 4C-D presents the results of five hundred independent fitting procedures, performed on two different dataset sizes. The parameters for the reference dataset of trials are: Hz, Hz, and s. For each fitting procedure, the trials (either 100 or 500) were sampled uniformly without replacement from a bank of 10,000 trials. The fitting algorithm is an implementation of the Bayesian approach leading to equation (27) above. When fitting the linear model, the analytical solution from appendix D.1 was used to compute the likelihood of a single trial ( term in Eq. (27)). When fitting the nonlinear model, Monte Carlo sampling was used instead. More specifically, the distribution of the decision variable at decision time for a given clicks stimulus, , was estimated by simulating 800 independent trajectories. Thus, each trajectory had its own independent realization of sensory noise but the realization of the stimulus (timing of the clicks) was frozen. Once the density of was estimated, the likelihood term, in Eq. (27), could be estimated. More details on this method, such as how the number of 800 particles was chosen and how this method was validated on the linear model for which the analytical solution is available, may be found in section 3.5.5 of [40].
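The particle-based likelihood estimate for a single trial might look roughly like the sketch below. Several simplifications are assumptions of this illustration and not taken from the text: sensory noise is modeled as Gaussian jitter on each click's evidence pulse, the nonlinear model is taken to be dy/dt = κ[right clicks − left clicks] − 2h sinh(y), and the likelihood of a choice is estimated directly as the fraction of particles with the matching sign instead of from an estimated density of the decision variable. All names are hypothetical.

```python
import math
import random


def decay(y, h, dt):
    """Exact solution of dy/dt = -2 h sinh(y) over a click-free interval dt."""
    if dt <= 0:
        return y
    return 2.0 * math.atanh(math.tanh(y / 2.0) * math.exp(-2.0 * h * dt))


def noisy_final_llr(clicks, T, h, kappa, sigma, rng):
    """One trajectory: frozen click times, fresh noise on every evidence pulse."""
    y, t_prev = 0.0, 0.0
    for t, side in clicks:
        y = decay(y, h, t - t_prev) + side * rng.gauss(kappa, sigma)
        t_prev = t
    return decay(y, h, T - t_prev)


def choice_log_lik(clicks, choice, T, h, kappa, sigma, n_particles=800, seed=0):
    """Monte Carlo estimate of log P(choice | clicks, h); choice is +1 or -1."""
    rng = random.Random(seed)
    n_right = sum(
        noisy_final_llr(clicks, T, h, kappa, sigma, rng) > 0
        for _ in range(n_particles)
    )
    # Clip away from 0 and 1 so a finite particle count never yields log(0).
    eps = 0.5 / n_particles
    p_right = min(max(n_right / n_particles, eps), 1.0 - eps)
    return math.log(p_right if choice == 1 else 1.0 - p_right)


# Frozen stimulus: three right clicks (+1) at the listed times.
stimulus = [(0.2, 1), (0.5, 1), (0.8, 1)]
ll_right = choice_log_lik(stimulus, 1, T=1.0, h=1.0, kappa=1.0, sigma=0.5)
ll_left = choice_log_lik(stimulus, -1, T=1.0, h=1.0, kappa=1.0, sigma=0.5)
```

Summing `choice_log_lik` over trials for each candidate discounting parameter gives the per-parameter log-likelihood that enters the grid posterior. The clipping constant is a design choice of this sketch to keep the log finite when all particles fall on one side.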
In Fig 4E, up to trial number 500 on the x-axis, the same fits as in panels C-D were used to compute the relative error (y-axis). Because of the high computational cost of our fitting algorithm (Monte Carlo sampling described above), the points for 1000 trials on the x-axis were computed with only 84 independent fits per model pair (as opposed to 500 for the other points of the figure).
All panels in Fig. 5 were produced with a common dataset of trials, generated by presenting the same sets of click streams to the evidence accumulation models. All trials had the same task parameters: trial duration s; hazard rate Hz; Hz and Hz so ; and the initial state of the environment was randomly assigned with a uniform prior. For each panel of Fig. 5, we selected a pair along with a sensory noise amplitude ( for ) to be applied to the evidence pulses from the clicks. For each possible pair of discounting parameters ( for linear models, for nonlinear models), we computed the decisions (Left or Right) and determined whether the models agreed or not. For the linear model, we used values of between 0 and 10, with increments of 0.1. For the nonlinear model, we used values of between 0 and 2.5, with increments of 0.1. For each decision comparison between reference model and fit model, the same click streams were used, but independent noise realizations of click perturbations were applied. The number of agreements was divided by the total number of decision comparisons to produce the color of a single point in the plot.
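The agreement computation for one pair of discounting parameters can be sketched for the linear model as follows. This is an illustrative reading of the procedure, with assumptions labeled: the linear model is taken to be a leaky accumulator whose state decays as exp(−γ·Δt) between clicks, sensory noise is Gaussian jitter on each evidence pulse, and the function names and tiny example dataset are hypothetical.

```python
import math
import random


def linear_decision(clicks, gamma, kappa, sigma, rng):
    """Choice (+1 right, -1 left) of a leaky linear accumulator on one trial."""
    y, t_prev = 0.0, 0.0
    for t, side in clicks:
        y = y * math.exp(-gamma * (t - t_prev)) + side * rng.gauss(kappa, sigma)
        t_prev = t
    # Decay after the last click does not change the sign, so it is omitted.
    if y == 0:
        return rng.choice((-1, 1))
    return 1 if y > 0 else -1


def agreement(trials, gamma_ref, gamma_fit, kappa, sigma, seed=0):
    """Fraction of trials on which reference and fit models choose the same side.

    Both models see the same click streams but independent noise realizations.
    """
    rng = random.Random(seed)
    n_agree = 0
    for clicks in trials:
        ref = linear_decision(clicks, gamma_ref, kappa, sigma, rng)
        fit = linear_decision(clicks, gamma_fit, kappa, sigma, rng)
        n_agree += ref == fit
    return n_agree / len(trials)


def fit_by_01_loss(trials, gamma_ref, kappa, sigma, grid):
    """Grid value maximizing agreement, i.e. minimizing the empirical 0/1-loss."""
    return max(grid, key=lambda g: agreement(trials, gamma_ref, g, kappa, sigma))


# Tiny hypothetical dataset: each trial is a sorted list of (time, side).
example_trials = [[(0.1, 1), (0.4, -1), (0.9, 1)], [(0.2, -1), (0.6, -1)]]
gamma_grid = [k / 10 for k in range(101)]  # 0 to 10 in steps of 0.1
frac = agreement(example_trials, 1.0, 1.0, kappa=1.0, sigma=0.0)
```

Evaluating `agreement` for every pair on the grid gives one value per point in the plot; without noise, identical parameters always agree, while with noise even identical parameters can disagree because the noise realizations are independent.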
Appendix G Bias metric as a function of sensory noise
In this section, we provide additional information about the bias in parameter recovery with the 0/1-loss function described in Section 7. Fig. 6A includes results from simulations for noise in addition to noise and noise also shown in Fig. 5. Bias magnitude and its dependence on sensory noise were determined as follows. Let denote the discounting parameter of the reference model – this is the model used to produce the initial decision data. Let denote the fit value of the discounting parameter, using 0/1-loss minimization. In Fig. 6A, spans the -axis and as a function of is depicted by the golden curve. After smoothing with a Savitzky-Golay filter, we obtain represented by the green curves in the figure. Picking a fixed reference value for (red dotted line), we then plot the bias as a function of sensory noise levels in Fig. 6B, where bias is defined as:
[TABLE]
The fixed values of chosen were the same as in Section 6, . As described in Section 7, the bias in parameter recovery with the 0/1-loss fitting procedure is more pronounced for the nonlinear model than for the linear model, and increases with sensory noise.
Acknowledgements
We thank Gaia Tavoni, Alex Filipowicz, and Alex Piet for helpful feedback on an earlier version of this manuscript. Some computations for this manuscript were done using the [35].
Reviews and Responses
Publication Decision 1 from Neurons, Behavior, Data analysis, and Theory on July 27, 2019
Editorial board’s determination: Revise and resubmit
Comments from the editor. Sorry again for the long delay. The reviewer is more or less happy with the manuscript, there is a list of suggestions, which I would like to ask you to pay careful attention to before the paper can be accepted.
Reviewers’ comments are italicized. Our responses are in plain text. Changes to the manuscript are in blue.
Comments to the author.
Summary. This is a careful and thorough paper that presents several results regarding a normative model of decision-making within dynamic environments. The specific case studied is that of an evidence accumulation task presented in Piet et al., 2018. In this task, the subject hears two streams of Poisson-timed clicks coming from their left and right sides. The click rates on both sides may switch during the trial with some hazard rate . The subject is asked to infer which side had the higher rate of clicks when the trial ends, thus making newer information more relevant than older information. In such a task, there are four parameters: , , , , where is the hazard rate, is the duration of the trial, is the higher Poisson rate of clicks, and is the lower rate of clicks.
The authors first show that, over a broad range as long as high, low are sufficiently large, the maximal accuracy achievable by the normative model almost exactly depends on only two parameters, , and , where . They use these two effective parameters to study the model. The authors first compare the behavior of the normative model to a linear model that approximates the normative model, and find that the linear model performs near-optimally if the discounting parameter is finely tuned. To further compare the models, they generated choice data from both the normative model and the linear model, and fit the choices and clicks separately to either the normative model or the linear model. They found that the parameters recovered are biased when the model used to generate data and the model used to fit the data are different. The linear model required less number of trials when it was fit to data; the discounting parameter converged to the true value used to generate data faster for the linear model than for the normative model. The authors note that different cost functions (MLE and 0/1-loss) lead to different estimates of the parameter values and one should be cautious about what cost function to use.
*The authors suggest that the normative model and the linear model fits on subjects' performance at different and provide a convenient way of identifying the decision-making strategy that the subject is employing in the dynamic clicks task.
*Major comment: The authors claim that ‘if an animal is extensively trained on trials with fixed parameters h and S, but subsequently interrogated using occasional trials with different task parameters,’ one may be able to identify whether the subject is using the normative model or a suboptimal (or linear approximation) model. This is an important point, and it would be greatly strengthened if this claim can be supported directly with simulations and further elucidated with explanations of the specific steps that should be taken in order to identify the strategies.
Specifically, in the case where the clicks are generated with , and the animal performs normatively with , we may fit the normative model and recover reliably. We can compute the maximal accuracy of the normative model given the experimental , and compare it to the accuracy of the animal. However, if we fit the linear model, how should we go from here? How would one, without knowing , and knowing only the clicks/choice data and the experimental parameters and , identify the model that the animal is using?*
Indeed, you raise important points, which we can address by taking a closer look at the accuracy ratios we have plotted in Fig. 3D. We now explain the setup of an experiment which could be run to test the sensitivity of a subject’s evidence accumulation model to determine whether it is normative-like or more sensitive like the linear models we considered. Of course, in this idealized case, we are not considering internal noise, but the principles would extend to noisy models as discussed in Piet et al (2018). In general, the linear model will not adapt well to interspersed trials with different parameters, whereas the normative model will be more robust. This is now explained in the following paragraphs we have added to the end of Section 4:
The preceding point is illustrated by the following thought experiment. Assume a subject is extensively trained on a fixed set of task parameters: , Hz, Hz (peak of the red curve in Fig. 3D). We then introduce some trials with different click rates, say with Hz and Hz, chosen so that is constant across the two conditions. We denote by Acc and Acc the accuracies of an observer using the linear and normative models on trials with a given . Since the subject was trained on click rates that correspond to , their discounting strategy will be adapted to these values. Note that the ratio between Acc and Acc when is the red curve in Fig. 3D. Since the ratio between Acc and Acc is near 1, the linear and normative models cannot be distinguished at . However, a subject using the normative model tuned at , will still perform optimally at , if and are held constant. On the other hand, a linear model optimized at , will no longer be optimal at . This distinction is captured by the drop in the accuracy ratio along the red curve in Fig. 3D.
We can quantify the distinction between the two models by their relative difference:
[TABLE]
More generally, for any decision making model, we may define the quantity
[TABLE]
which will equal 0 if the model used is the normative one. If we compute using responses from a real subject, one can generate curves such as those in Fig. 3D. If the curves are not constant (equal to 1), this would suggest the subject is not using an optimal model. Furthermore, a single value of for which provides evidence that the model is not optimal.
*Minor comments:
Introduction: when the term ‘nonlinear model’ is first used, the authors haven’t yet clarified that here the term is synonymous with ‘normative model.’ Also, some readers might not be familiar with what a 0/1 loss function is, making that part of the Intro a little unclear for them.
It is important to note that the nonlinear model is only normative when the discounting parameter is tuned exactly, and this is why we used this specific phrasing. To clarify this, we have added the word ‘nonlinear’ in parentheses when mentioning the normative model in the prior sentence. We also added the following footnote: The ‘nonlinear’ model here refers to the family of models obtained by tuning the discounting rate away from the value defining the normative model. This detuning results in a model that is not normative.
We describe the 0/1-loss function now in the sentence following its introduction: The 0/1-loss function gives a one unit penalty on trials in which the decision predicted by the model and the data disagree, and no penalty when they agree. Therefore, minimizing this loss function leads to models that best match the trial-to-trial responses in the data rather than the response accuracy.
Page 4, last paragraph: You need the word ‘alone’: the phrase ‘kappa does not predict the response accuracy’ should probably be ‘kappa alone does not predict the response accuracy.’
Thanks for pointing this out. We have modified the text as suggested.
Figure 1, caption: Might be helpful to title panel C as ‘h=1’ and remind readers in the caption that and jointly determine accuracy? And for panel D, perhaps write ‘Maximal accuracy of the ideal observer, at ?
Good suggestion. We have added the title to Fig. 1C and the sentence, “Note that and jointly determine accuracy.” to the caption, and also the qualifier, at , to the caption of Fig. 1D.
*Equation 5: Would be helpful to explicitly say here that F represents the SNR
We have added the phrase “representing the SNR” to the sentence preceding equation 5.
Paragraph immediately after equation 5: as written, it sounds a bit like F depending only on those 2 parameters follows from keeping fixed, although that is not what you mean.
We have changed the confusing sentence to: As indicated, only depends on and .
Paragraph at the end of Section 3: ‘To increase the accuracy of an ideal observer, it is not enough to increase both click rates, for instance.’ I didn’t understand that? If everything else is kept fixed, but both lambdas grow, doesn’t SNR grow and hT stay fixed?
In this situation SNR does not necessarily grow. Indeed, this depends on how the parameters are incremented. For instance, if the lambdas grow along the parabolas shown in Fig. 1D, then SNR stays constant. We have edited the sentence referenced above to explicitly state this. It now reads: To increase the accuracy of an ideal observer, it is not sufficient to increase both click rates, for instance, since the SNR stays constant if and follow the parabolas shown in Fig. 1D.
Figure 3C, label on vertical axis: why is this ‘relative’ accuracy?
The adjective “relative”, was meant to highlight the fact that we computed the curvature of the graphs in Fig. 3B, representing functions of the relative error rather than the actual values of and . We agree that this was confusing, and have removed the word “relative” from the plot and added an explanation to the caption, which now says: Since the functions in panel B do not depend on the actual values of and , but rather the relative distance of these parameters from reference values, what we show in this plot are relative curvatures. We compare relative curvatures as and do not have the same units.
The authors state that for Figure 4E, NL-L and L-NL do not converge to zero, whereas NL-NL and L-L converge to zero eventually. This is not very clear in the figure, and it would help the reader if the number of trials was larger to show this more clearly. This panel shows that for a given number of trials (in the figure, trials), the fits for the linear model were better than the normative model. How sufficient should the number of trials be for the fits to the normative model to be better than the fits to the linear model? In principle, when the normative model is fit to dataset generated by the normative model, it should eventually have lower relative error than the linear model.
We ran an additional 84 fitting procedures per model pair for training sets of 1,000 trials. We did not run more simulations as our Monte Carlo sampling method is costly (these 84 simulations took 24 hours to run on a modern 4-core laptop). We report the resulting relative errors in the new Fig. 4E. This figure now shows that the NL-NL curve does fall below the L-NL curve on average after 1000 trials. We also added the following explanation to appendix F of the revised manuscript: In Fig 4E, up to trial number 500 on the x-axis, the same fits as in panels C-D were used to compute the relative error (y-axis). Because of the high computational cost of our fitting algorithm (Monte Carlo sampling described above), the points for 1000 trials on the x-axis were computed with only 84 independent fits per model pair (as opposed to 500 for the other points of the figure).
Figure 5 needs labels C and D. What was the grid of values used for h and in generating Fig. 5? There are comparisons between 0.1 and 2, but do we see a more deviating trend as a function of noise level? That is, what is the deviation like when noise is 0.5 or 1, for example? It would be great to have a summary plot with a metric of deviation separately for both the linear and normative models. What do we see when the metric is shown as a function of noise? Having this summary figure will help the authors establish the point that ‘the parameters that minimize expected 0/1-loss are biased, and this bias increases with sensory noise,’ and that ‘the NL model exhibits this bias much more strongly than the L model.’
We have added the labels C and D to the appropriate panels in Figure 5. We have also specified the grid values used in the simulations in appendix F, using the following added passage: For the linear model, we used values of between 0 and 10, with increments of 0.1. For the nonlinear model, we used values of between 0 and 2.5, with increments of 0.1.
Regarding the second suggestion of measuring the bias in parameter recovery for intermediate values of noise, we have added the figure that appears below in this response letter, and is now contained in Appendix G to our manuscript. We reference this figure in the following text we have added to Section 7: See Appendix G for a possible metric of the reported bias, and its dependence on sensory noise for each model class (Fig. 6).
Appendix B that derives the SNR has typos. In the sentence right after Eq. (16), the statement is , as T approaches infinity. Should this be ?
Yes, we have checked Appendix B thoroughly, and corrected this typo.
Eq. (17) seems to contain an error that multiplies 2 in front of hTexp(-hT). In the sentence right after Eq. (17), SNR=E[Delta N]/Var[Delta N]. Should this be SNR=E[Delta N]/sqrt(Var[Delta N])? In the equation that follows this sentence, h is not multiplied in the denominator.*
You’re right – we have now corrected these typos, and also spotted a missing factor of 2 in the denominator of the full SNR expression. The equations should all be correct now.
For convenience, one could state what S is also in the Appendix.
We have added Eq. (19) in Appendix B as a reminder of Eq. (4) for the reader.
Appendix F on the details of the simulations could be more detailed. The authors state that when fitting the nonlinear model, Monte Carlo sampling was used. Please explain this fitting method further, to the extent that the reader can replicate it.
We have added the following passage to Appendix F: More specifically, the distribution of the decision variable at decision time for a given clicks stimulus, , was estimated by simulating 800 independent trajectories. Thus, each trajectory had its own independent realization of sensory noise but the realization of the stimulus (timing of the clicks) was frozen. Once the density of was estimated, the likelihood term, in Eq. (27), could be estimated. More details on this method, such as how the number of 800 particles was chosen and how this method was validated on the linear model for which the analytical solution is available, may be found in section 3.5.5 of Radillo (2018).
Although the confidence intervals do not seem to contain the true parameter value, are the mismatches statistically significant in Figure 4CD? That is, are the parameter values significantly greater than the true parameter value?
We would like to note that the whiskers of each box in Fig. 4C,D do not represent confidence intervals. Instead, they represent the interval of values that are not considered outliers. More specifically, if are the first and third quartiles respectively, then the whiskers define the interval: .
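For readers unfamiliar with box-plot whiskers, the rule can be sketched as below. The multiplier of 1.5 times the interquartile range is the common plotting convention and is an assumption here, as is the quartile interpolation method; neither is confirmed by the text above, and the function names are hypothetical.

```python
import statistics


def whisker_interval(values, k=1.5):
    """Interval [Q1 - k*IQR, Q3 + k*IQR]; points outside it count as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values falling outside the whisker interval."""
    lo, hi = whisker_interval(values, k)
    return [v for v in values if v < lo or v > hi]


estimates = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up numbers for illustration
interval = whisker_interval(estimates)
flagged = find_outliers(estimates)
```

Because the interval is built from quartiles of the estimates themselves, it describes the spread of the fits and carries no coverage guarantee, which is why it should not be read as a confidence interval.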
Instead of testing the hypothesis that the MAP estimates are different than the value, we provide a summary statistic for the training datasets of size 500 trials. We have added the following sentences to the end of section 6: For reference datasets of size 500, 98% of the 500 MAP estimates in the L-NL fits lie strictly above , versus 50.4% for the corresponding L-L fits. Similarly, 86.6% of the estimates in the NL-L fits lie strictly below , versus 44.2% for the corresponding NL-NL fits. We believe this is more informative than a p-value.
Publication Decision 2 from Neurons, Behavior, Data analysis, and Theory on August 26, 2019
Editorial board’s determination: Accept
Comments from the editor. Thanks a lot for your patience with the first round of reviews. I have now checked your revision and have no further comments, which means the manuscript will be transferred to a “provisional acceptance” stage.
References
- [1] Adelson, E. H. and Bergen, J. R. (1985) Spatiotemporal energy models for the perception of motion. JOSA A, 2, 284–299.
- [2] Balasubramanian, V. (1997) Statistical inference, Occam’s razor, and statistical mechanics on the space of probability distributions. Neural Comput., 9, 349–368.
- [3] Ball, K. and Sekuler, R. (1982) A specific and enduring improvement in visual motion discrimination. Science, 218, 697–698.
- [4] Barendregt, N. W., Josić, K. and Kilpatrick, Z. P. (2019) Analyzing dynamic decision-making models using Chapman-Kolmogorov equations. arXiv preprint arXiv:1903.10131.
- [5] Beck, J. M., Ma, W. J., Kiani, R., Hanks, T., Churchland, A. K., Roitman, J., Shadlen, M. N., Latham, P. E. and Pouget, A. (2008) Probabilistic population codes for Bayesian decision making. Neuron, 60, 1142–1152.
- [6] Bialek, W., Nemenman, I. and Tishby, N. (2001) Predictability, complexity, and learning. Neural Comput., 13, 2409–2463.
- [7] Bogacz, R., Brown, E., Moehlis, J., Holmes, P. and Cohen, J. D. (2006) The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev., 113, 700–765.
- [8] Born, R. T. and Bradley, D. C. (2005) Structure and function of visual area MT. Annu. Rev. Neurosci., 28, 157–189.
