Environmental dynamics impact whether matching is optimal
Yipei Guo, Ann M Hermundstad

TL;DR
This paper explores how environmental changes affect whether animals' matching behavior is optimal for maximizing rewards.
Contribution
The study analytically determines when matching is optimal based on environmental replenishment dynamics.
Findings
Matching is optimal when all options share the same replenishment dynamics.
Optimal policies can deviate from matching when replenishment dynamics differ across options.
Environmental stochasticity can amplify deviations from matching behavior.
Abstract
Foraging animals often sample options that yield rewards with different probabilities. In such scenarios, many animals exhibit “matching,” whereby they allocate their choices such that the fraction of rewarded samples is equal across options. While matching can be optimal in environments with diminishing returns, this condition alone is not sufficient to determine optimality. Moreover, diminishing returns arise when resources deplete and replenish over time, but their form depends on the temporal structure and statistics of replenishment. Here, we investigate how these environmental properties influence whether matching is optimal. We consider an agent that samples options at fixed rates, and we derive the resulting reward probabilities across different types of environments. This allows us to analytically determine conditions under which the optimal policy exhibits matching. When all…
| Quantity | Definition |
|---|---|
| sampling rate $p_i$ | the rate at which an agent samples an option; the number of sampling attempts at that option per unit time |
| collection probability $P_{c,i}$ | the probability that any given sampling attempt results in a successful collection |
| collection rate $c_i$ | the average number of collections per unit time; the product of the sampling rate and the collection probability, $c_i = p_i P_{c,i}$ |
| marginal gain $g_i$ | specifies how changes in sampling rate impact changes in collection rate, $g_i = dc_i/dp_i$; a positive marginal gain indicates that a higher sampling rate leads to a higher collection rate |
- Howard Hughes Medical Institute, 10.13039/100000011
- Agency for Science, Technology, and Research, 10.13039/501100001348
Introduction
To successfully forage for food and other resources, animals must contend with a wide range of environmental settings that vary in their resource availability. Given limited time and energy, a foraging animal must decide whether and when to explore these different settings in order to maximize its chance of success in collecting resources. In laboratory settings, this form of decision making is often studied by presenting animals with multiple simultaneously available options that deliver reward, a proxy for an available resource in the environment (1), with different frequency, and asking how animals allocate their choices to these different options (2–6). Such experiments have shown that animals tend to exhibit “matching” behavior, an empirical observation that animals allocate their choices in proportion to the number of rewards that they collect from different options (7–16). This phenomenon was initially observed in pigeons (7–13) but was subsequently observed in flies, mice, rats, monkeys, and humans (17–22). In the original formulation of the “matching law,” if an animal makes $n_i$ attempts at sampling an option i, and only $n_{s,i}$ of those attempts are successful and yield reward, then matching is achieved when $n_i / \sum_j n_j = n_{s,i} / \sum_j n_{s,j}$ (7). Equivalently, this implies that the fraction of successful attempts $n_{s,i}/n_i$ is the same across all options (note that here and below, we will always use the subscript i to index options; all other subscripts will be used to name quantities, such as the number of “successful” attempts, $n_s$).
While data from early experiments in pigeons agree well with the matching law, many subsequent experiments revealed deviations from precise matching. In particular, animals often have a tendency to sample better options less frequently than one would expect from precise matching, a phenomenon known as “under-matching” (16, 21, 23). Given the prevalence of these different experimental observations, it is natural to ask why matching is so commonly observed, whether it is a feature of optimal behavior, and under what conditions we should expect strict matching versus deviations from it.
Many previous studies have explored the relationship between matching and optimality in the context of specific experimental setups and task structures. These include the commonly used concurrent variable-ratio schedule, where a reward is available at an option after a random number of attempts at that option, and the concurrent variable-interval schedule, where a reward is available after a random amount of time has passed (2–6). Recent work has considered matching within a more general context, where it has been tied to optimality through diminishing returns (24). However, diminishing returns are not sufficient to guarantee that optimality gives matching (24), and they can manifest in different ways depending on how resources are depleted and replenished in the environment.
In this work, we explore how different environmental dynamics affect whether the optimal policy gives rise to matching. We model an environment that produces and replenishes resources over time. These resources can be collected at different sites (“options”), which can be governed by different replenishment dynamics. We decompose these dynamics into separate contributions that control the structure, statistics, and overall quality of the replenishment process, and we ask how an agent should best allocate its choices to exploit these properties and maximize the number of resources that it can collect. To this end, we consider a simple, memoryless agent that samples each available option at a fixed rate, subject to a maximum total sampling rate. We then derive conditions under which the optimal policy that maximizes resource collection across options will also exhibit matching. We show that while the optimal policy gives rise to matching in many types of environments, this may not be the case when the nature of the replenishment process differs across the options. In such cases, we show how the observed deviations from matching depend on the quality and reliability of the environment. In doing so, we provide concrete predictions about the impact of different environmental manipulations on both optimal and matching behavior.
Results
We consider an environment that contains N distinct options, each of which delivers discrete resources that replenish over time. At any given time, each option can be in one of two states: empty (does not contain a resource) or full (contains a resource). An agent can collect a resource by sampling an option that is in the full state; the option then immediately depletes to the empty state (Fig. 1A). A replenishment process governs the time at which an empty option reverts back to the full state. The success of the agent in collecting resources depends on the agent’s behavioral policy, which determines how the agent samples different options over time, and on the dynamics of the replenishment process, which governs when resources become available at different options. Below, we first characterize these two components and then study their interaction.
Depleting options can be characterized in terms of the structure and statistics of replenishment. A) We consider an environment with N options that deliver discrete resources and that replenish after resources are collected. Each option can be in one of two states: empty or full. An option is always in the full state after a replenishment event. After a sampling event, the option transitions to the empty state. If an option is full when sampled, the sampling event was successful and the resource is collected; if an option is empty when sampled, the sampling event was unsuccessful and nothing is collected. B) We differentiate three distinct features of the replenishment process: (i) the replenishment structure, which refers to when the process resets and specifies the meaning of the replenishment time T (left); (ii) the replenishment statistics, which refers to the form of the distribution P(T;{θ}) (middle; different lines illustrate different possible forms of P(T;{θ})); and (iii) the replenishment rates, which refer to the parameters {θ} of that distribution (right; different lines illustrate different parameter choices for the same form of P(T;{θ})). The first two features control qualitative features of the replenishment process; the third feature quantitatively determines the overall quality of the environment, with higher replenishment rates parameterizing higher quality environments that replenish more quickly. C) Each replenishment structure can yield different numbers of collections, even for the same sampling times (arrows and circles, in red). D) Higher quality environments replenish more quickly (arrows, in blue), and thus yield more collections for the same sampling times (circles, in red). Illustrated for replenishment structure B.
Dynamics of replenishment. We characterize three distinct features of the replenishment process (Fig. 1B): (i) the structure of replenishment, which controls when the process resets by specifying how the replenishment time T is measured relative to the agent’s choices or outcomes; (ii) the statistics of replenishment, which specifies the form of the distribution of replenishment times; and (iii) the rates of replenishment, which specify the parameters of the distribution and control the overall quality of the environment (ie whether resources replenish quickly or slowly).
Since sampling, collection, and replenishment are the only events that can affect an option’s state, they serve as natural anchors from which to measure replenishment time. We therefore consider three distinct types of replenishment structures (Fig. 1C):
(A) The process resets after each resource collection (Fig. 1C, top row). Imagine the scenario where a worker always keeps an eye on an item on a shelf in a grocery store, and only restocks the item whenever it runs out. The replenishment process (which involves the worker going to the storage room, retrieving the item, and bringing it to the shelf) resets whenever the item is depleted from the shelf, and T is measured from the most recent collection event. In this case, any sampling attempts made between the last collection and the next replenishment event would be unsuccessful. This corresponds to the scenario implemented under the interval schedules that are commonly used to study decision-making in animals (25, 26).

(B) The process resets independently of the agent’s actions or the state of the option, in which case T is measured from the most recent replenishment event (Fig. 1C, middle row). For example, suppose we were to leave a small pail out in the open to catch water whenever it rains. Whether we choose to sample the pail (ie check on the pail and use up any available water) does not affect when it is next going to rain.

(C) The process resets after each sampling attempt, regardless of whether a resource was collected (Fig. 1C, bottom row). In natural environments, this scenario could arise in settings where the agent’s actions disrupt the replenishment process. For example, suppose one wishes to collect honey from a beehive. The production of honey requires a large population of bees, but the sampling process scares them away. After every sampling event, it takes some time for the bees to return and start producing honey. In this case, the replenishment process resets whenever sampling occurs, regardless of whether honey is collected, and T is measured from the most recent sampling event.
A key feature of this type of replenishment structure is that a higher sampling rate does not necessarily lead to a higher number of collections, because over-sampling can result in repeated disruptions of the replenishment process. Given the same distribution and the same sampling times, this structure results in more unsuccessful attempts than structures (A) and (B) (Fig. 1C).
Within each of these structures, the replenishment times are fully specified by the distribution $P(T;\{\theta\})$. Higher replenishment rates $\theta$ give rise to higher quality environments in which the agent collects more resources under the same sampling times (Fig. 1D).
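The three replenishment structures can be made concrete with a small event-by-event sketch. The function name `count_collections`, the sampling times, and the fixed replenishment time of 1.6 are hypothetical choices made for illustration, and the sketch assumes that each option starts full at t = 0.

```python
from typing import Callable, Sequence


def count_collections(sample_times: Sequence[float],
                      draw_T: Callable[[], float],
                      structure: str) -> int:
    """Count successful samples under replenishment structure "A", "B" or "C".

    Assumption (not from the source): the option starts full at t = 0.
    """
    full = True
    ready_at = None                  # time at which a pending replenishment completes
    if structure == "B":
        ready_at = draw_T()          # replenishment clock that ignores the agent
    collected = 0
    for t in sample_times:
        if structure == "B":
            while ready_at <= t:     # replenishment events occur regardless of sampling
                full = True
                ready_at += draw_T()
        elif ready_at is not None and t >= ready_at:
            full = True
            ready_at = None
        success = full
        if success:
            collected += 1
        full = False                 # any sampling attempt leaves the option empty
        if structure == "C" or (structure == "A" and success):
            # C: clock resets on every attempt; A: clock resets on collection only
            ready_at = t + draw_T()
    return collected


times = [1.0, 2.0, 3.0, 6.0, 6.5, 9.0]   # hypothetical sampling times
counts = {s: count_collections(times, lambda: 1.6, s) for s in "ABC"}
print(counts)   # {'A': 4, 'B': 5, 'C': 3}
```

For these inputs, structure C yields the fewest collections, because closely spaced attempts keep resetting the replenishment clock.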
Behavioral policy. The success of the agent in collecting resources depends on how its behavioral policy interacts with the replenishment processes described above. To characterize this interaction, we focus on a class of “fixed-sampling-rate” policies that specify the frequency with which the agent samples each option. This class of policies has been studied previously (24) and enables us to analytically derive conditions under which optimal behavior exhibits matching.
Within this class, the agent’s behavioral policy is fully specified by a fixed vector of sampling rates $\vec{p} = (p_1, \ldots, p_N)$. We assume that the agent operates under a constrained energetic budget, such that the overall sampling rate cannot exceed a maximum value: $\sum_i p_i \le 1$ (Fig. 2A). This constrains the long-term average number of sampling attempts per unit time. These rates govern when and which option the agent samples relative to the underlying dynamics of the replenishment processes and thus determine whether or not the agent will collect a resource from any given sampling event. Figure 2B illustrates this for a single option that replenishes after a fixed time interval. If the agent samples infrequently, it is guaranteed to collect a resource. However, as the agent increases its sampling rate, a larger fraction of its samples are unsuccessful because the option has been depleted and has not yet replenished. As a result, the average probability that a given sampling event results in a successful collection, which we refer to as the collection probability $P_{c,i}$, decreases with the sampling rate $p_i$ (Fig. 2C, top). The rate at which resources are collected, $c_i = p_i P_{c,i}$, depends on both $p_i$ and $P_{c,i}$ and can therefore increase or decrease with $p_i$ depending on how $P_{c,i}$ decays with $p_i$ (Fig. 2C, middle). However, the collection rate always initially increases with sampling rate, and the marginal gain $g_i = dc_i/dp_i$ always initially decreases with sampling rate (Fig. 2C, middle and bottom). Both of these properties are characteristic of diminishing returns; more formally, an option that exhibits diminishing returns will satisfy $d^2 c_i / dp_i^2 < 0$. Box 1 summarizes these key quantities and their relationships.
Sampling rates interact with the replenishment process to determine how many resources the agent collects. A) We assume that the agent chooses each of the N options with a fixed sampling rate pi, subject to a constrained energetic budget ∑ipi≤1. B) Illustration of a simple environment in which a single option replenishes at fixed time intervals. Increasing the sampling rate leads to a reduction in the collection probability, an increase in the total number of collections, and a decrease in the marginal gain (ie the increase in number of collections for an increase in sampling rate). C) Features of collection as a function of sampling rate, shown for two different options that differ in their replenishment dynamics but both exhibit diminishing returns. Top: because it takes time for resources to replenish, the collection probability Pc,i at a given option i monotonically decreases with sampling rate pi. Middle: the collection rate ci, which has contributions from both pi and Pc,i (and can therefore increase or decrease with pi), initially increases with pi at the same rate for all options. Bottom: the rate at which ci increases with pi, which we refer to as the marginal gain gi, decreases as pi increases. D) Under the optimal policy that maximizes the net collection rate, the marginal gain is equalized across options. Left: for the two options illustrated in (C), the optimal sampling rates p1*<0.5 and p2*>0.5 yield the same marginal gain across options (ie dc1/dp1|p1*=dc2/dp2|p2*). Altering the sampling rates away from these values, for example by increasing p2* and decreasing p1*, would lead to an increase in the collection rate at option 2 (red) but would be outweighed by a larger decrease in the collection rate at option 1 (blue). Right: As a result of the properties illustrated in the left panel, the optimal policy p⃗* that maximizes the net collection rate must satisfy Eq. 1, such that the marginal gain is equalized across options (ie g1=g2=g*). E) A policy exhibits matching if the collection probabilities are equalized across options (ie Pc,1=Pc,2=Pc). F) Optimality gives matching if collection probabilities are equalized across options under the optimal policy, such that the optimal sampling rates that satisfy gi=g* also satisfy Pc,i=Pc. This is always satisfied when the function g(Pc) is the same across options.
We define the optimal allocation of sampling rates $\vec{p}^*$ to be the one that maximizes the net collection rate across all options, $c(\vec{p}) = \sum_i c_i(p_i)$ (note that for these and other quantities, we will often drop the explicit dependence on $\vec{p}$ for notational simplicity). This optimal policy is a constrained maximum with respect to the sampling rate of every option i; in other words, any feasible change in sampling rates will decrease the net collection rate. Since the initial marginal gain takes on the same value for all possible replenishment dynamics that we consider here, this condition implies that the marginal gain should be equalized across options (SI Section 1) (24):

$$ g_i(p_i^*) = \left.\frac{dc_i}{dp_i}\right|_{p_i^*} = g^* \quad \text{for all } i, \tag{1} $$

where $g^*$ is the lowest non-negative value that can be achieved without exceeding the total sampling rate budget.
This optimality condition can be intuitively understood by considering the two different options shown in Fig. 2C. The first of these options, in black, exhibits a more rapid decay in collection probability with sampling rate. As a result, the marginal gains of both options can be equalized if the agent samples the second option more frequently than the first (Fig. 2D); for a fixed maximal energy budget $\sum_i p_i = 1$, this is achieved when $p_1^* < 0.5$ and $p_2^* > 0.5$. To maintain this energy budget, an increase in one sampling rate (eg $p_2$) must be offset by a decrease in the other ($p_1$). This will necessarily lead to a decrease in the net collection rate, because the small increase in collection rate from increasing $p_2$ will be offset by a larger decrease in collection rate from decreasing $p_1$. As a result, any deviation away from this policy that produces equalized marginal gains will lead to an overall reduction in the net collection rate, implying that this policy is optimal.
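The argument above can be checked numerically with a brute-force search over two-option policies. This is a minimal sketch: the collection-rate form $c(p) = p/(1 + p\tau)$ and the values `TAU_1 = 3` and `TAU_2 = 1` are hypothetical stand-ins for two options with diminishing returns, not quantities taken from the figures.

```python
def collection_rate(p: float, tau: float) -> float:
    """Hypothetical diminishing-returns collection rate c(p) = p / (1 + p * tau)."""
    return p / (1.0 + p * tau)


def marginal_gain(p: float, tau: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dc/dp."""
    return (collection_rate(p + h, tau) - collection_rate(p - h, tau)) / (2.0 * h)


TAU_1, TAU_2 = 3.0, 1.0   # assumed: option 1 saturates faster than option 2
BUDGET = 1.0              # total sampling rate, p1 + p2 = 1


def net_rate(p1: float) -> float:
    """Net collection rate when the whole budget is split between the two options."""
    return collection_rate(p1, TAU_1) + collection_rate(BUDGET - p1, TAU_2)


# Brute-force search over the allocation to option 1.
grid = [k / 10_000 for k in range(1, 10_000)]
p1_opt = max(grid, key=net_rate)
p2_opt = BUDGET - p1_opt

g1 = marginal_gain(p1_opt, TAU_1)
g2 = marginal_gain(p2_opt, TAU_2)
print(p1_opt, p2_opt, g1, g2)
```

Under these assumptions, the search places less than half of the budget on the faster-saturating option, the two marginal gains agree at the optimum, and shifting sampling rate in either direction lowers the net collection rate.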
In comparison, if a policy gives rise to matching, the collection probabilities will be equalized across options (Fig. 2E):

$$ P_{c,i} = P_c \quad \text{for all } i. \tag{2} $$

Thus, optimality gives rise to matching if collection probabilities are matched under the optimal policy (ie if both the optimality and matching conditions are satisfied).
In any given environment, the optimal policy $\vec{p}^*$ (and the corresponding marginal gain $g^*$ and collection probabilities $P_{c,i}$) will depend on the features of replenishment at each option (Fig. 1B). If the relationship $g(P_c)$ between the marginal gain and the collection probability is the same across options, optimality will always give rise to matching, since any policy that equalizes marginal gains (including the optimal policy) will also equalize collection probabilities (Fig. 2F). However, in general, $g_i(P_{c,i})$ may not be the same across all options. In what follows, we derive $g_i(P_{c,i})$ for different types of environments, and we use it to study how the properties of replenishment affect whether optimality gives rise to matching.
Box 1: definition of key quantities
Optimality gives matching when all options share the same qualitative features of replenishment
To determine whether the optimal policy exhibits matching for any given environment, we first derive the expression for the collection probability $P_{c,i}$ given a policy of fixed sampling rates $\vec{p}$. We then use this to compute the collection rate $c_i$ and marginal gain $g_i$ for each option. In principle, each of these quantities can depend on the specific form and rate of replenishment at each option, and there is no guarantee that a given policy will generically equalize both the collection probability and the marginal gain. However, we will show that in some settings, both the collection probability and marginal gain can be written as functions of the policy alone, without additional dependence on the form or rates of replenishment. In such cases, the marginal gain can be expressed as a function of the collection probability (ie $g_i = g(P_{c,i})$), such that optimality always gives matching.
If all options replenish after resource collection, the optimal policy exhibits matching
Because resources take time to replenish, any sampling attempt made between a resource collection and a replenishment event will be unsuccessful (all other attempts are, by definition, successful). When the replenishment process is triggered by the collection itself (replenishment structure A), the average number of unsuccessful attempts $n_u(T)$ is a function of the replenishment time T. The steady-state collection probability when sampling an option i can thus be written as (Fig. 3A):

$$ P_{c,i} = \frac{1}{1 + \langle n_u(T) \rangle_T}, \tag{3} $$

where $\langle \cdot \rangle_T$ indicates an average over the distribution of replenishment times.
The optimal policy exhibits matching in many settings. A) When options replenish after each collection, the collection probability can be written as the average number of collections per samples within a replenishment time T. For a fixed sampling rate p, this depends only on the average replenishment time ⟨T⟩ and not on the functional form of P(T). B) Consider three options that differ in the form and rate of replenishment (upper). The collection probability, collection rate, and marginal gain are identical for the two options that share the same rate of replenishment, even though they are governed by different forms of replenishment statistics P(T) (middle). All three options share the same relationship between the marginal gain and the collection probability, such that matching is always optimal (lower). C) When options replenish independently of the agent, the collection probability can be written as the average probability of sampling at least once between replenishment events, divided by the average number of samples between replenishment events. D) Consider three options that share the same form of replenishment statistics, but differ in their rates (top and middle rows). As long as the options are governed by the same replenishment structure (shown here for structures B and C), they share the same relationship between the marginal gain and the collection probability, and hence, optimality always gives rise to matching. Note that when plotting relationships between marginal gain and collection probability, we accounted for the maximum sampling rate constraint and hence only included regions where pi≤1. E) When options replenish after each sampling attempt, the collection probability can be written as the average probability that a replenishment will fall between two sampling attempts, which depends on the distribution of first return times.
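In general, the average number of unsuccessful attempts $n_u(T)$ within a replenishment time T is dependent on the form of the policy. However, for the fixed sampling rate policy that we consider here, $n_u(T) = p_i T$. Therefore, Eq. 3 can be written as (Fig. 3A):

$$ P_{c,i} = \frac{1}{1 + p_i \langle T_i \rangle}. \tag{4} $$

With this simplification, it is straightforward to show that when options replenish after resource collection, the optimal policy always exhibits matching. To see this, we can use the expression for $P_{c,i}$ to write the collection rate $c_i = p_i P_{c,i} = p_i / (1 + p_i \langle T_i \rangle)$, which can be used to calculate the marginal gain:

$$ g_i = \frac{dc_i}{dp_i} = \frac{1}{(1 + p_i \langle T_i \rangle)^2} = P_{c,i}^2. \tag{5} $$

Because $g_i$ depends only on $P_{c,i}$, the optimal policy (that satisfies $g_i = g^*$) will also give rise to matching ($P_{c,i} = \sqrt{g^*}$). This is true regardless of the values of $\langle T_i \rangle$ and the form of $P(T)$, and even if the form of $P(T)$ differs across options (Fig. 3B).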
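The claim that, under structure A, the collection probability depends only on the mean replenishment time can be checked with a short Monte Carlo sketch. It assumes Poisson sampling at rate `p` and compares a fixed replenishment time with an exponentially distributed one of the same mean against the prediction $1/(1 + p\langle T\rangle)$; the function name, parameter values, and seed are illustrative choices.

```python
import random


def estimate_collection_probability(p, draw_T, n_attempts=200_000, seed=0):
    """Estimate P_c for an option that replenishes a time T after each collection.

    Sampling attempts follow a Poisson process with rate p (assumed policy).
    """
    rng = random.Random(seed)
    t = 0.0
    ready_at = 0.0            # the option starts full
    successes = 0
    for _ in range(n_attempts):
        t += rng.expovariate(p)           # waiting time until the next attempt
        if t >= ready_at:                 # option has replenished: collect
            successes += 1
            ready_at = t + draw_T(rng)    # the clock restarts at the collection
    return successes / n_attempts


p, mean_T = 2.0, 1.0   # hypothetical sampling rate and mean replenishment time
fixed = estimate_collection_probability(p, lambda rng: mean_T)
expo = estimate_collection_probability(p, lambda rng: rng.expovariate(1.0 / mean_T))
predicted = 1.0 / (1.0 + p * mean_T)
print(fixed, expo, predicted)
```

Both estimates should land close to the predicted value of 1/3, even though the two replenishment distributions have very different shapes.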
If all options are governed by the same replenishment process, the optimal policy exhibits matching
When the replenishment process is triggered by events other than collections, it is not straightforward to calculate the distribution of times between a collection and the next replenishment. Instead, we use other properties of the replenishment process to directly compute the collection probability.
Options replenish independently of the agent. When the replenishment process resets independently of the agent’s choices (replenishment structure B), only the first of the sampling attempts that fall between two consecutive replenishment events will be successful; all other attempts will be unsuccessful. The average probability of collecting a resource within an interval T is thus the average probability that there is at least one sampling attempt in that interval, divided by the average number of sampling attempts in that same interval (Fig. 3C):

$$ P_{c,i} = \frac{1 - \langle P(n = 0 \mid T) \rangle_T}{\langle n(T) \rangle_T}, \tag{6} $$

where $n(T)$ is the number of sampling attempts that fall within an interval T.

This derivation does not rely on any properties of the behavioral policy, and thus holds for any arbitrary policy type. For the fixed sampling rate policy we consider here, the average number of attempts that fall between two consecutive replenishment events is $\langle n(T) \rangle_T = p_i \langle T \rangle$, and the average probability that there are no sampling attempts within that same interval is $\langle e^{-p_i T} \rangle_T$. Moreover, if all options have the same form of replenishment statistics and differ only in their replenishment rates, we can write $P(T; \theta_i) = \theta_i f(\theta_i T)$, where $f$ is a form shared by all options and $\theta_i$ is a parameter that controls the replenishment rate of each option. After defining the re-scaled variables $\tilde{T} = \theta_i T$ and $\tilde{p}_i = p_i / \theta_i$, the average collection probability can be written as:

$$ P_{c,i} = \frac{1 - \langle e^{-\tilde{p}_i \tilde{T}} \rangle_{\tilde{T}}}{\tilde{p}_i \langle \tilde{T} \rangle}, \tag{7} $$

and the corresponding marginal gain is given by:

$$ g_i = \frac{\langle \tilde{T} e^{-\tilde{p}_i \tilde{T}} \rangle_{\tilde{T}}}{\langle \tilde{T} \rangle}. \tag{8} $$

Because both functions depend solely on $\tilde{p}_i$, they are both equalized when $\tilde{p}_i = \tilde{p}$ for all i, and thus the optimal policy again exhibits matching (Fig. 3D).
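The rescaling argument for structure B can be illustrated with a discrete replenishment distribution, for which the averages are exact sums. The sketch assumes Poisson sampling, so that the probability of no attempt within an interval T is $e^{-pT}$. The two-point shape `SHAPE`, the rates `thetas`, and the unit budget are hypothetical choices.

```python
import math

# Hypothetical shared shape of the replenishment statistics:
# (rescaled replenishment time, probability). Option i uses T = T_tilde / theta_i.
SHAPE = [(0.5, 0.5), (1.5, 0.5)]


def mean_T(theta: float) -> float:
    return sum(w * t / theta for t, w in SHAPE)


def collection_probability(p: float, theta: float) -> float:
    """P_c = (1 - <exp(-p T)>) / (p <T>) for Poisson sampling at rate p."""
    p_none = sum(w * math.exp(-p * t / theta) for t, w in SHAPE)
    return (1.0 - p_none) / (p * mean_T(theta))


def marginal_gain(p: float, theta: float) -> float:
    """g = d(p * P_c)/dp = <T exp(-p T)> / <T>."""
    return sum(w * (t / theta) * math.exp(-p * t / theta) for t, w in SHAPE) / mean_T(theta)


thetas = [0.5, 1.0, 2.5]                  # assumed replenishment rates
p_tilde = 1.0 / sum(thetas)               # same rescaled rate for every option, budget of 1
policy = [p_tilde * th for th in thetas]  # p_i = p_tilde * theta_i

gains = [marginal_gain(p, th) for p, th in zip(policy, thetas)]
probs = [collection_probability(p, th) for p, th in zip(policy, thetas)]
print(policy, gains, probs)
```

Because the marginal gain here is positive and decreasing in the sampling rate, equal gains with the full budget spent identify the optimal policy, and the collection probabilities come out equal as well: sampling each option in proportion to its replenishment rate is both optimal and matching in this sketch.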
Options replenish after each attempt. When the replenishment process resets after every sampling attempt (replenishment structure C), the probability of collecting a resource on any given attempt can be written as the probability that replenishment has occurred since the previous attempt. We refer to the time between consecutive attempts as the “first return time” w, which follows a distribution $P(w)$. We define $\Pr(r = 1 \mid w)$ to be the probability that a replenishment r has occurred within a given duration w. We can then compute the average collection probability at an option i by averaging over the distribution of first return times (Fig. 3E):

$$ P_{c,i} = \int_0^\infty dw \, P(w) \, \Pr(r = 1 \mid w). \tag{9} $$

The distribution of first return times is solely a property of the policy. For example, if an agent can accurately track time and always returns to an option after a fixed duration, $P(w)$ is a delta function; if an agent tries to return to an option after a fixed duration but has a noisy estimate of time, $P(w)$ might be normally distributed. Here, since we have assumed that sampling attempts follow a Poisson process, the distribution of first return times is given by $P(w) = p_i e^{-p_i w}$.

In contrast, the replenishment probability $\Pr(r = 1 \mid w)$ depends solely on the statistics of replenishment through $\Pr(r = 1 \mid w) = \int_0^w dT \, P(T)$. As a result, the collection probability will generically depend on the form of $P(T)$. However, if all options again have the same form of replenishment statistics and only differ in their replenishment rates, we can write $P(T; \theta_i) = \theta_i f(\theta_i T)$. Again defining re-scaled variables $\tilde{w} = \theta_i w$ and $\tilde{p}_i = p_i / \theta_i$, the collection probability can be written as:

$$ P_{c,i} = \int_0^\infty d\tilde{w} \, \tilde{p}_i e^{-\tilde{p}_i \tilde{w}} \, \Pr(r = 1 \mid \tilde{w}), \tag{10} $$

and the corresponding marginal gain as:

$$ g_i = \int_0^\infty d\tilde{w} \, \tilde{p}_i (2 - \tilde{p}_i \tilde{w}) e^{-\tilde{p}_i \tilde{w}} \, \Pr(r = 1 \mid \tilde{w}). \tag{11} $$

As before, because both $P_{c,i}$ and $g_i$ depend only on $\tilde{p}_i$, they are both equalized when $\tilde{p}_i = \tilde{p}$ for all i (Fig. 3D).
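For structure C, the integrals over first return times can be evaluated numerically. This sketch assumes Poisson sampling with rescaled rate `p`, so that first return times are exponentially distributed, and a fixed rescaled replenishment time of 1, so that the replenishment probability is a step function. Under those assumptions the integrals reduce to $P_c = e^{-p}$ and $g = (1 - p)e^{-p}$, which the code compares against. The helper names and the midpoint integration rule are illustrative choices.

```python
import math


def replenished(w: float) -> float:
    """Pr(r = 1 | w) for a fixed rescaled replenishment time of 1 (assumed)."""
    return 1.0 if w >= 1.0 else 0.0


def midpoint_integral(f, upper: float = 100.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, upper]."""
    dw = upper / n
    return dw * sum(f((k + 0.5) * dw) for k in range(n))


def collection_probability(p: float) -> float:
    """Average of Pr(r = 1 | w) over exponential first return times with rate p."""
    return midpoint_integral(lambda w: p * math.exp(-p * w) * replenished(w))


def marginal_gain(p: float) -> float:
    """d(p * P_c)/dp, written as a single integral over first return times."""
    return midpoint_integral(
        lambda w: p * (2.0 - p * w) * math.exp(-p * w) * replenished(w)
    )


for p in (0.5, 2.0):   # hypothetical rescaled sampling rates
    print(p, collection_probability(p), math.exp(-p),
          marginal_gain(p), (1.0 - p) * math.exp(-p))
```

At the higher sampling rate the marginal gain is negative: sampling more often lowers the collection rate, which is the over-sampling effect described for structure C.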
In sum, when all options replenish independently of the agent’s actions, or when they all replenish after the agent samples an option (regardless of outcome), the optimal policy exhibits matching if all options share the same form of replenishment statistics, even if they differ in their replenishment rates.
When optimality deviates from matching, relative collection probabilities depend only on the qualitative features of replenishment
In the previous section, we saw that the relationship between the marginal gain (which governs optimality) and the collection probability (which governs matching) differs depending on the replenishment structure. As a result, if individual options are governed by different replenishment structures, the optimal policy does not generally give rise to matching, even if the options share the same replenishment statistics (Fig. 4A; note the exception when replenishment follows a memoryless Poisson process). Even within a given replenishment structure, this relationship can depend on the form of replenishment (Eqs. 7 and 8 for structure B, and Eqs. 10 and 11 for structure C). As a result, if individual options are governed by the same replenishment structures but different forms of replenishment, the optimal policy does not generally give rise to matching (Fig. 4B; note the exception when all options are governed by replenishment structure A).
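A two-option sketch shows how mixing replenishment structures can break matching. It assumes Poisson sampling, a fixed replenishment time `T = 3` for both options (a hypothetical value), and a unit sampling budget. Option 1 follows structure A, for which $P_c = 1/(1 + pT)$ and $g = P_c^2$. Option 2 follows structure B, for which a fixed T gives $P_c = (1 - e^{-pT})/(pT)$ and $g = e^{-pT}$. The optimal split is found by bisection on the difference in marginal gains.

```python
import math

T = 3.0        # assumed fixed replenishment time, shared by both options
BUDGET = 1.0   # total sampling rate


def gain_A(p: float) -> float:
    """Marginal gain under structure A (replenishment after collection)."""
    return 1.0 / (1.0 + p * T) ** 2


def prob_A(p: float) -> float:
    """Collection probability under structure A."""
    return 1.0 / (1.0 + p * T)


def gain_B(p: float) -> float:
    """Marginal gain under structure B (agent-independent replenishment), fixed T."""
    return math.exp(-p * T)


def prob_B(p: float) -> float:
    """Collection probability under structure B, fixed T."""
    return (1.0 - math.exp(-p * T)) / (p * T)


# gain_A(p1) - gain_B(BUDGET - p1) decreases monotonically in p1, so bisect on p1.
lo, hi = 0.0, BUDGET
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if gain_A(mid) > gain_B(BUDGET - mid):
        lo = mid      # option 1 still has the larger marginal gain: sample it more
    else:
        hi = mid
p1 = 0.5 * (lo + hi)
p2 = BUDGET - p1

print(p1, p2, gain_A(p1), gain_B(p2), prob_A(p1), prob_B(p2))
```

At the resulting policy the marginal gains agree but the collection probabilities do not, so the optimal policy in this sketch is not a matching policy even though both options share the same replenishment statistics.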
When the optimal policy deviates from matching, the rank-ordering of collection probabilities depends only on the qualitative features of replenishment. Optimality can deviate from matching when options differ in their replenishment structure but share the same statistics (A), or when they share the same replenishment structure but differ in their statistics (B–D). A) To illustrate the effect of replenishment structure, we consider three options that are each governed by different replenishment structures but share the same replenishment statistics P(T~), where T~=T/⟨T⟩ (left column). In general, the optimal policy does not necessarily give rise to matching, and the ordering of Pc depends on the replenishment structure (middle panel). The righthand panels show examples for other replenishment statistics; note that in the special case where the replenishment statistics follow a Poisson process (upper right), the optimal policy will always exhibit matching (with Pc,i=Pc). B) To illustrate the effect of replenishment statistics, we consider five options that are governed by the same replenishment structure but that differ in the form of their replenishment statistics P(T) and corresponding replenishment probabilities Pr(r=1|w) (left column). Here again, the optimal policy does not always give rise to matching, and the ordering of Pc depends on the statistics of replenishment (middle panel). Since the optimal solution satisfies the condition that gi is matched across options, the ordering of the collection probability under the optimal policy depends on the shape of gi(Pc,i), which in turn depends on both the structure and statistics of replenishment. When replenishment is governed by structure C (middle panel), the ordering of collection probabilities depends on the shape of Pr(r=1|w) (with Pc,1<Pc,2<Pc,3<Pc,4<Pc,5). When replenishment is governed by structure B (lower right), the ordering depends on the shape of P(T~) (with Pc,2<Pc,1<Pc,3<Pc,4<Pc,5). 
Note that in the special case where replenishment is governed by structure A (upper right), the optimal policy always exhibits matching (with Pc,i=Pc). C) To illustrate the effect of replenishment rates, we consider two options that share the same replenishment structure (structure C) but differ in the form and rates of their replenishment statistics. As in (B), the optimal policy does not generically give rise to matching, and the ordering of collection probabilities depends on the shape of Pr(r=1|w) (with Pc,1<Pc,2). D) More generally, the optimal policy depends on the replenishment rates θ1 and θ2 (left panel; white denotes equal sampling rates for the two options). Under this policy, increasing the replenishment rate of either option leads to an increase in the collection probabilities of both options. However, the collection probability from option 1 is always lower than from option 2. In other words, the rank-ordering of collection probabilities is always preserved (middle panel; white denotes equal collection probabilities for the two options). The optimal policy therefore gives rise to under-matching when the collection rate from option 2 is higher (since in this case, the agent perceives option 2 to be more rewarding but yet does not sample it enough to lower its collection probability to match that of option 1). When the collection rate from option 1 is higher, the optimal policy gives rise to over-matching (right panel; white denotes equal collection rates for the two options). E) Individual contributions of the ratios shown in (D).
Furthermore, when optimality does not exhibit matching, the rank-ordering of collection probabilities depends only on these two qualitative features—the replenishment structure and the form of the replenishment statistics—and not on the values of the replenishment rates. To illustrate this point, we consider two options that are governed by the same replenishment structure but different replenishment statistics (Fig. 4C). Depending on their specific replenishment rates, the optimal policy involves sampling one or the other option more frequently (Fig. 4D and E, left panels). Under this policy, increasing the replenishment rate of either option will lead to an increase in the average collection probabilities of both options. Nevertheless, the same option always has the higher collection probability, regardless of which option is of higher quality (Fig. 4D and E, middle panels). As a result, the optimal policy can give rise to under- or over-matching depending on which option provides a higher collection rate (Fig. 4D and E, right panels), but the ordering of collection probabilities at optimality remains the same and can be predicted based on its relationship with the marginal gain (Fig. 4C).
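The invariance of the rank-ordering can be illustrated numerically. The following is a minimal sketch under assumptions that are not taken from the text: sampling is treated as a memoryless process with rate $p_i$ and a unit total budget ($p_1 + p_2 = 1$); both options follow structure C; option 1 has exponentially distributed replenishment times, giving $P_{c,1} = \theta_1/(\theta_1 + p_1)$; and option 2 has a fixed replenishment time $1/\theta_2$, giving $P_{c,2} = e^{-p_2/\theta_2}$. The function names and the grid search are illustrative choices.

```python
import math

def pc_exp(p, theta):
    """Collection probability for exponential replenishment times (assumed form)."""
    return theta / (theta + p)

def pc_fixed(p, theta):
    """Collection probability for a fixed replenishment time 1/theta (assumed form)."""
    return math.exp(-p / theta)

def net_rate(p1, theta1, theta2):
    """Total collection rate when the unit sampling budget is split as (p1, 1 - p1)."""
    p2 = 1.0 - p1
    return p1 * pc_exp(p1, theta1) + p2 * pc_fixed(p2, theta2)

def optimal_p1(theta1, theta2, n=20001):
    """Grid search for the budget split that maximizes the net collection rate."""
    grid = (i / (n - 1) for i in range(n))
    return max(grid, key=lambda p1: net_rate(p1, theta1, theta2))

def optimal_pcs(theta1, theta2):
    """Collection probabilities of both options under the optimal split."""
    p1 = optimal_p1(theta1, theta2)
    return pc_exp(p1, theta1), pc_fixed(1.0 - p1, theta2)

# Vary which option replenishes faster and compare the collection probabilities.
for theta1, theta2 in [(0.05, 0.02), (0.05, 0.05), (0.05, 0.2)]:
    pc1, pc2 = optimal_pcs(theta1, theta2)
    print(f"theta1={theta1}, theta2={theta2}: Pc1={pc1:.3f}, Pc2={pc2:.3f}")
```

Under these assumed forms, equal marginal gains require $P_{c,1}^2 = P_{c,2}(1 + \ln P_{c,2})$, and since $1 + \ln x \le x$, option 1 ends up with the lower collection probability for any pair of rates, even though the optimal sampling rates themselves change.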
Low-quality environments produce larger deviations between optimality and matching
So far, we have explored the conditions under which the optimal policy does or does not exhibit matching. Here, we ask how much the optimal policy deviates from the matching policy that best maximizes the overall collection rate. This question is especially relevant because matching has been thought to be a feature of optimal or near-optimal behavior (4, 27), and simple neural network models for decision making have been proposed to account for the matching phenomenon (28).
As we have seen in the previous sections, any differences between optimality and matching fundamentally arise from differences in the relationship between the marginal gain and the collection probability across options. To better understand these differences, it is useful to consider general features of the marginal gain $g_i(P_{c,i})$ that hold for all replenishment structures and statistics.
In the limit that the sampling rate goes to zero, the collection probability always goes to one (ie in the absence of any sampling, an option is guaranteed to be in the full state, and thus any sampling event is guaranteed to be successful). In the limit that the sampling rate becomes infinite, both the collection probability and its derivative always go to zero (ie under infinitely rapid sampling, an option is guaranteed to be in the empty state, and thus any sampling event is guaranteed to be unsuccessful). We can use these two sets of limits to bound the marginal gain:

$$\lim_{P_{c,i} \to 1} g_i = 1, \qquad \lim_{P_{c,i} \to 0} g_i = 0. \tag{13}$$

This can be seen by noting that $g_i$ can be written as $g_i = P_{c,i} + p_i \, \partial P_{c,i} / \partial p_i$, and taking the corresponding limits $p_i \to 0$ and $p_i \to \infty$.
The fact that the marginal gain always goes to one as the sampling rate goes to zero, regardless of the structure or statistics of replenishment, can be intuitively understood in terms of how the sampling rate impacts the collection rate. An increase in sampling rate can impact the collection rate in two competing ways: (i) more frequent sampling leads to more opportunities for collecting a resource, thereby promoting an increase in the collection rate, and (ii) more frequent sampling reduces the probability that any given sampling event will lead to a collection, thereby promoting a decrease in the collection rate. In the limit that the sampling rate is low, the first factor dominates the change in the collection rate. In this limit, the average duration between sampling events is long, the resource is likely to be available when sampled, and thus any increase in sampling rate is likely to increase the collection rate. Thus, the marginal gain approaches one.
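The two competing effects can be made explicit with a short worked example. This is a sketch under two assumptions that are not restated in this section: that the marginal gain is the derivative of the collection rate with respect to the sampling rate, and that the collection probability takes the illustrative form $P_{c,i} = \theta_i/(\theta_i + p_i)$ (memoryless sampling of an option with replenishment rate $\theta_i$).

```latex
\begin{align}
\langle c_i \rangle &= p_i \, P_{c,i}, \\
g_i \equiv \frac{\partial \langle c_i \rangle}{\partial p_i}
  &= \underbrace{P_{c,i}}_{\text{(i) more opportunities}}
   + \underbrace{p_i \, \frac{\partial P_{c,i}}{\partial p_i}}_{\text{(ii) lower success probability}}, \\
\text{with } P_{c,i} = \frac{\theta_i}{\theta_i + p_i}: \quad
g_i &= \frac{\theta_i}{\theta_i + p_i} - \frac{p_i \, \theta_i}{(\theta_i + p_i)^2}
     = \frac{\theta_i^2}{(\theta_i + p_i)^2} = P_{c,i}^2 .
\end{align}
```

In this example $g_i \to 1$ as $p_i \to 0$ and $g_i \to 0$ as $p_i \to \infty$. Term (ii) is negative and vanishes at low sampling rates, which is why term (i) dominates there.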
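Although the boundary values of $g_i(P_{c,i})$ are the same for all environments, the derivatives of $g_i$ depend on the details of the replenishment process. However, in the limit that the sampling rate goes to zero, the derivative of $g_i$ with respect to the collection probability depends on neither the structure nor the statistics of the replenishment process:

$$\lim_{P_{c,i} \to 1} \frac{d g_i}{d P_{c,i}} = 2. \tag{14}$$

This can be seen by noting that $d g_i / d P_{c,i} = (\partial g_i / \partial p_i) / (\partial P_{c,i} / \partial p_i)$, and taking the corresponding limit $p_i \to 0$.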
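The claim that the slope of the marginal gain at low sampling rates does not depend on the replenishment process can be probed numerically. The sketch below uses three collection-probability forms that are assumptions made for illustration (memoryless sampling, with $x = p/\theta$ the sampling rate in units of the replenishment rate); the function names are hypothetical. For each form it estimates $dg/dP_c$ by finite differences at a small sampling rate.

```python
import math

def pc_exponential(x):
    """Assumed form: exponentially distributed replenishment times."""
    return 1.0 / (1.0 + x)

def pc_fixed(x):
    """Assumed form: fixed replenishment time, timer restarted by each sample."""
    return math.exp(-x)

def pc_uniform(x):
    """Assumed form: uniformly distributed replenishment times."""
    return -math.expm1(-x) / x  # (1 - exp(-x)) / x, computed stably for small x

def marginal_gain(pc, x, h=1e-6):
    """Central-difference derivative of the collection rate x * pc(x)."""
    return ((x + h) * pc(x + h) - (x - h) * pc(x - h)) / (2.0 * h)

def slope_near_full(pc, x=1e-3, dx=1e-3):
    """Finite-difference estimate of dg/dPc at a low sampling rate."""
    dg = marginal_gain(pc, x + dx) - marginal_gain(pc, x)
    dpc = pc(x + dx) - pc(x)
    return dg / dpc

for name, pc in [("exponential", pc_exponential),
                 ("fixed", pc_fixed),
                 ("uniform", pc_uniform)]:
    print(name, round(slope_near_full(pc), 4))
```

All three assumed forms give a slope close to 2, which is what a first-order expansion $P_c \approx 1 - a p$ predicts: the collection rate is then $p - a p^2$, so $g \approx 1 - 2 a p$ regardless of the value of $a$.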
Together, these constraints on the shape of $g_i(P_{c,i})$ (Eqs. 13 and 14) imply that optimal and matching policies become indistinguishable as the optimal collection probability goes to one, because the policy that equalizes the marginal gain will also equalize the collection probability (Fig. 5A). This limit is achieved when the environment is of sufficiently high quality and options replenish frequently. Indeed, even in settings that give rise to deviations between optimality and matching (Fig. 5B), these deviations tend to be small in high-quality environments with high replenishment rates (Fig. 5C, upper right corners of heatmaps). In contrast, when the marginal gain at optimality is low, the collection probabilities across options can differ substantially, deviating markedly from matching (Fig. 5B). This can arise when at least one option is of low quality, such that the collection probability from that option is low even under optimal sampling (Fig. 5C, left panel; Fig. 5D). In such cases, the optimal and matching policies can also differ substantially (Fig. 5C, middle panel; Fig. 5E). When both options are of low quality, the overall collection rate is low under the optimal policy, but is much lower under the best matching policy (Fig. 5C, right panel; Fig. 5F).
Fig. 5. Low-quality environments can lead to strong deviations between optimality and matching. A) The marginal gain is guaranteed to go to zero as the collection probability goes to zero, regardless of the replenishment process. This situation can arise in low-quality environments where options replenish infrequently; in this regime, optimality can deviate significantly from matching (lower left). At the other extreme, as the collection probability goes to one, both the marginal gain and its slope have fixed limits regardless of the replenishment process. This situation can arise in high-quality environments where options replenish frequently; in this regime, optimality and matching are similar (lower right). B) We consider an environment with two options that share the same replenishment structure (structure C) but differ in their replenishment statistics (inset). Due to the differences in replenishment statistics, the optimal policy (horizontal black dashed line) does not exhibit matching. The “best matching policy” that maximizes the net collection rate is marked by the vertical gray dashed line (main panel). C) In low-quality environments where replenishment rates are low, the optimal and best matching policies can differ substantially from one another (lower left regions of heatmaps). In these regimes, the optimal policy has a low marginal gain (left), the sampling rates under the best matching policy deviate from optimal sampling rates (middle), and the net collection rate under the best matching policy is much lower than under the optimal policy (right). D–F) We consider a specific example environment where $\theta_1 = 0.0225$ and $\theta_2 = 0.045$ (indicated by the white square markers in (C)). Circles and stars indicate the best matching and optimal policies, respectively. D) Under the best matching policy, the collection probability is the same for both options (gray dashed horizontal line; see inset). Under the optimal policy, the collection probability differs substantially between the two options (stars). Inset shows expanded version of dashed box in main panel. E) In this environment, the collection rate at option 2 changes nonmonotonically with the sampling rate (dotted line). As a result, the optimal policy involves maximizing the collection rate at option 2 (open star), and allocating the remaining sampling rate budget to option 1 (filled star). Compared to the optimal policy, the best matching policy over-samples option 2 and under-samples option 1. F) For any given value of the sampling rate $p_1$, we compare the net collection rate $\langle c \rangle$ for an optimizing agent (ie an agent that adopts the value of $p_2$ that maximizes the collection rate; black curve) with that of the matching agent (ie an agent that adopts the value of $p_2$ such that the collection probabilities are equalized across options; gray curve). Note that both the optimizing and matching agents are subject to the same constraint on the total sampling rate; ie $p_2 \le 1 - p_1$. This constraint places a maximum limit on the value of $p_1$ for the matching agent; above this value, matching cannot be achieved without $p_2$ exceeding $1 - p_1$. The values of $p_1$ for the optimal and best matching policies are indicated by the gray and black dashed vertical lines, respectively.
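The comparison between the optimal and best matching policies can be sketched for a toy version of this low-quality environment. Only the rates $\theta_1 = 0.0225$ and $\theta_2 = 0.045$ and the budget constraint $p_2 \le 1 - p_1$ come from the caption; the collection-probability forms are assumptions chosen for illustration (memoryless sampling under structure C, exponential replenishment times for option 1 and a fixed replenishment time for option 2), so the numbers need not reproduce the figure. Function names are hypothetical.

```python
import math

THETA1, THETA2 = 0.0225, 0.045  # replenishment rates quoted for the example environment

def pc1(p):
    """Option 1 collection probability, assumed exponential replenishment times."""
    return THETA1 / (THETA1 + p)

def pc2(p):
    """Option 2 collection probability, assumed fixed replenishment time 1/THETA2."""
    return math.exp(-p / THETA2)

def net_rate(p1, p2):
    """Net collection rate: sampling rate times collection probability, summed."""
    return p1 * pc1(p1) + p2 * pc2(p2)

def optimal_policy(n=20001):
    """Grid search over p2 with the full budget spent (p1 = 1 - p2).

    Spending the full budget is optimal here because the assumed option-1
    collection rate increases monotonically with p1.
    """
    best_p2 = max((i / (n - 1) for i in range(n)),
                  key=lambda p2: net_rate(1.0 - p2, p2))
    return 1.0 - best_p2, best_p2

def best_matching_policy(n=20001):
    """Grid search over p1, with p2 chosen so that pc2(p2) equals pc1(p1)."""
    best, best_rate = None, -1.0
    for i in range(n):
        p1 = i / (n - 1)
        p2 = THETA2 * math.log(1.0 + p1 / THETA1)  # equalizes collection probabilities
        if p1 + p2 > 1.0:
            break  # budget exceeded; p1 + p2 only grows with p1
        rate = net_rate(p1, p2)
        if rate > best_rate:
            best, best_rate = (p1, p2), rate
    return best

opt = optimal_policy()
match = best_matching_policy()
print("optimal :", opt, net_rate(*opt))
print("matching:", match, net_rate(*match))
```

In this toy setting the optimal policy places option 2 near the peak of its nonmonotonic collection rate and spends the rest of the budget on option 1, while the best matching policy samples option 2 more and option 1 less, and collects at a lower net rate.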
Environmental fluctuations affect whether the optimal policy exhibits matching
So far, we have assumed that replenishment processes are stationary. However, in natural settings, the environment can fluctuate and can impact the replenishment process. We thus asked whether random fluctuations in the replenishment rates affect the relationship between optimality and matching.
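We consider the scenario where all options share the same stationary replenishment structure and statistics, but where the replenishment rates can change at regular intervals. Whenever a change occurs, the replenishment rate $\theta_i$ of option $i$ is drawn from a distribution with mean $\bar{\theta}_i$ and a given variance. In this case, the conditions for optimality and matching are given by (see SI Section 1.1):

$$\text{optimality:} \quad \langle g_i \rangle = \langle g_j \rangle \;\; \forall\, i, j; \qquad \text{matching:} \quad \langle P_{c,i} \rangle = \langle P_{c,j} \rangle \;\; \forall\, i, j,$$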
where the angular brackets denote an average over the distribution of replenishment rates $\theta_i$, and where the marginal gain $g_i$ and collection probability $P_{c,i}$ are defined as before (Eqs. 1 and 2).
In previous sections, we showed that the marginal gain and collection probability can both be expressed as functions of a single variable that combines the sampling rate and the replenishment rate, regardless of the replenishment structure (Eqs. 7–8, 10–11). This implies that if the distribution of replenishment rates is controlled by two parameters that jointly specify the mean and variance of the replenishment rates, the average marginal gain and average collection probability will depend only on this variable evaluated at the mean replenishment rate and on the relative degree of fluctuation (see SI Section 2.1). Thus, when options are governed by the same replenishment structure and statistics but differ in their mean replenishment rates $\bar{\theta}_i$, the optimal policy gives matching if all options fluctuate to the same degree (ie if the relative degree of fluctuation is matched across options). If options fluctuate by differing degrees, the optimal policy need not exhibit matching.
In such cases, the relative ordering of the degree of fluctuation often determines the relative ordering of collection probabilities. For example, when options replenish after each collection (structure A) or when replenishment is governed by a Poisson process, the relationship between the marginal gain and collection probability is given by (from Eq. 5):

$$\langle g_i \rangle = \langle P_{c,i} \rangle^2 + \sigma_{P_{c,i}}^2,$$

where $\sigma_{P_{c,i}}^2$ is the variance in the collection probability due to fluctuations in the corresponding replenishment rate $\theta_i$. Under these conditions, options with higher fluctuations, which in turn lead to higher variance in collection probabilities, will yield lower average collection probabilities under the optimal policy.
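The link between variance and average collection probability follows from a one-line moment identity. The sketch below assumes the relationship $g_i = P_{c,i}^2$, which holds for the illustrative memoryless form $P_{c,i} = \theta_i/(\theta_i + p_i)$; whether this is exactly the form of Eq. 5 is an assumption. Here $\sigma_{P_{c,i}}^2$ denotes the variance of the collection probability over the distribution of replenishment rates.

```latex
\begin{align}
\langle g_i \rangle = \langle P_{c,i}^2 \rangle
  &= \langle P_{c,i} \rangle^2 + \sigma_{P_{c,i}}^2, \\
\langle g_i \rangle = \langle g_j \rangle
  \;\;\Longrightarrow\;\;
  \langle P_{c,i} \rangle^2 - \langle P_{c,j} \rangle^2
  &= \sigma_{P_{c,j}}^2 - \sigma_{P_{c,i}}^2 .
\end{align}
```

If option $j$ has the larger variance, the right-hand side is positive, so option $j$ has the smaller average collection probability once the average marginal gains are equalized.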
This finding holds more generally for other replenishment structures and statistics (see SI Section 2.2). To illustrate this, we consider two options that both replenish after each attempt, with replenishment times drawn from a uniform distribution (equivalently, we can consider options that replenish at regular intervals independently of the agent, with fixed replenishment times). We assume that the first option does not fluctuate (ie it replenishes at a constant rate $\bar{\theta}_1$), but the second option fluctuates uniformly over a range of width $\Delta_2$ centered about $\bar{\theta}_2$ (such that $\tilde{\Delta}_2 = \Delta_2/\bar{\theta}_2 = \sqrt{12}\,\mathrm{CV}_{\theta_2}$; Fig. 6A). As these fluctuations increase, the optimal sampling rate of the fluctuating option can increase or decrease depending on the average replenishment rate (Fig. 6B, left). However, the average collection probability is always lower from the fluctuating option compared to the reliable option (Fig. 6B, middle). As a result, larger fluctuations can lead to under-matching if the fluctuating option is perceived to be less rewarding (ie yields a lower collection rate), or over-matching if the fluctuating option is perceived to be more rewarding (ie yields a higher collection rate) (Fig. 6B, right).
Fig. 6. Environmental fluctuations impact whether the optimal policy exhibits matching. A) We consider an environment with two options: the first option replenishes at a constant rate $\bar{\theta}_1$, while the second option replenishes at a rate that is drawn from a uniform distribution centered about $\bar{\theta}_2$ with width $\Delta_2$, such that $\tilde{\Delta}_2 = \Delta_2/\bar{\theta}_2 = \sqrt{12}\,\mathrm{CV}_{\theta_2}$. Both options replenish after each sample (structure C), with replenishment times drawn from a uniform distribution parameterized by $\bar{\theta}$ (this equivalently describes two options that replenish independently of the agent (structure B) with fixed replenishment times specified by $\bar{\theta}$, as schematized). For (B–D), we set $\bar{\theta}_1 = 0.3$. B) The optimal sampling rate of the fluctuating option can decrease or increase with the degree of fluctuation, depending on the average replenishment rate (left). However, the relative probability of collecting resources from the second option always decreases as the fluctuations increase (middle). As a result, the optimal policy can either give rise to over- or under-matching, depending on whether the fluctuating option is perceived to be more or less rewarding (based on the relative collection rates) than the stationary option (right). C) For a given marginal gain, the collection probability always decreases as fluctuations increase. D) In low-quality environments where the marginal gain is low, fluctuations have a large impact on the relative collection probabilities (left panel). By contrast, in high-quality environments where the marginal gain is high, the relative collection probabilities are only minimally affected by fluctuations (right panel).
To understand this result, we solve for the relationship between the average marginal gain and the average collection probability as a function of the degree of fluctuations (Fig. 6C). For any fixed average marginal gain, larger fluctuations lead to a lower average collection probability. However, lower marginal gains—which arise in lower quality environments (Fig. 5A)—lead to more pronounced differences in collection probabilities between the two options (Fig. 6D, compare left and right panels). Together, this suggests that fluctuations drive deviations from matching, and these deviations are more pronounced in low-quality environments with low replenishment rates.
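The relationship between the average marginal gain and the average collection probability can be sketched numerically. The code below rests on assumptions that are not given in the text: memoryless sampling, uniformly distributed replenishment times giving $P_c = (1 - e^{-x})/x$ and $g = e^{-x}$ with $x = p/\theta$, and a replenishment rate drawn uniformly with relative width `rel_width` around `theta_bar`. The function names and the value `theta_bar = 0.3` used in the loop are illustrative.

```python
import math

def averages(p, theta_bar, rel_width, n=400):
    """Return (<g>, <Pc>) averaged over uniformly distributed replenishment rates."""
    g_sum = pc_sum = 0.0
    for k in range(n):
        # Midpoint rule over theta in theta_bar * [1 - rel_width/2, 1 + rel_width/2].
        theta = theta_bar * (1.0 + rel_width * ((k + 0.5) / n - 0.5))
        x = p / theta
        g_sum += math.exp(-x)           # assumed marginal gain
        pc_sum += -math.expm1(-x) / x   # assumed collection probability (1 - e^-x) / x
    return g_sum / n, pc_sum / n

def pc_at_gain(target_gain, theta_bar, rel_width):
    """Bisect on the sampling rate so that <g> equals target_gain; return <Pc>."""
    lo, hi = 1e-9, 100.0 * theta_bar
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        g, _pc = averages(mid, theta_bar, rel_width)
        if g > target_gain:
            lo = mid  # average gain still too high: sample faster
        else:
            hi = mid
    return averages(0.5 * (lo + hi), theta_bar, rel_width)[1]

# Average collection probability at a fixed average marginal gain, for wider fluctuations.
for rel_width in (0.0, 0.5, 1.0):
    print(rel_width, pc_at_gain(0.5, 0.3, rel_width))
```

In this sketch the average collection probability at a fixed average marginal gain falls as `rel_width` grows, and the drop is larger when the target marginal gain is low than when it is high.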
Discussion
Since the first observation of the matching phenomenon in 1961, it has received widespread attention, and the degree of matching is now commonly reported in experiments where animals choose between multiple depleting options. However, despite the prevalence of these matching studies, the relationship between matching and optimality is not well understood. This is in part because matching is an emergent phenomenon that depends on how an animal’s choices interact with environmental dynamics to govern the availability of resources. As a result, it is often unclear how different experimental designs “should” impact behavior, and whether one should be surprised by observations of matching. In this work, we address these questions by considering a large range of environments that differ in the structure and statistics of replenishment, and we ask whether matching is observed under the optimal policy for these different environments. Across these settings, we identify a range of conditions under which matching is optimal, and we isolate key environmental properties that govern deviations from matching.
Many experiments have been carried out with memoryless replenishment processes, where empty options are replenished at a fixed rate or probability per time, and where replenishment times are drawn from an exponential (or geometric, for tasks with discrete trials) distribution (2–5). In such scenarios, the optimal fixed-sampling-rate policy is known to exhibit matching (2, 3, 24). Nevertheless, our analysis shows that the optimal policy gives rise to matching across a range of other replenishment processes that vary in their structure, statistics, and rates, so long as the qualitative nature of the process is the same across options. This regime—in which all options share the same structure and statistics—is typical of experiments involving more than two options, and indeed, approximate matching is often observed in these settings (12, 13). In cases where deviations from matching have been observed (19, 29), our results suggest that these deviations may reflect the effect of other task features, such as delays in reward delivery (19) or variations in the spatial separation between options (29). If options are instead governed by different types of replenishment processes, the optimal policy can deviate significantly from the matching policy, especially in low-quality environments where average replenishment rates are low. In such cases, the rank-ordering of collection probabilities depends only on the qualitative nature of the replenishment process, and not on the replenishment rates of different options. These findings provide testable predictions about the signatures of optimal behavior, and can serve as a guide for designing experiments that would yield large differences between optimality and matching.
Our characterization of optimality across different environments aligns with broader goals of understanding behavior in more naturalistic settings. Real environments not only consist of different food and water sources governed by different replenishment processes, but the rate at which these resources replenish can also be subject to weather and other environmental conditions. To better understand behavior in such scenarios, we explored how fluctuations in replenishment rates affect the relationship between optimality and matching. We found that the relative degree of fluctuation predicts relative differences in optimal collection probabilities—in particular, options with a higher degree of fluctuation always have a lower collection probability under the optimal policy, regardless of whether fluctuations preferentially impact low- versus high-quality options.
We interpreted these fluctuations as reflecting external variability in the environment itself, but they can equivalently be interpreted as reflecting internal uncertainty in the agent’s belief about the environment. More specifically, the optimal policy for a scenario in which replenishment rates are drawn from some distribution is the same as the optimal policy for a scenario where this distribution represents the agent’s belief about what the replenishment rates could be. This latter situation can arise during learning in a novel environment, when animals must discover the quality of different options through the outcomes of their actions. During early stages of learning, animals must make decisions based on the outcomes of sparse samples. In such settings, animals are likely to have a higher probability of collecting resources from higher-quality options, and are likely to bias their sampling toward those options (29). This biased sampling would in turn lead to lower uncertainty in the estimated replenishment rates of high-quality options. Our results imply that carrying out the optimal policy under such uncertainties would continue to yield higher average collection probabilities for higher-quality options. This corresponds to under-matching, which has been observed in many previous experiments (16, 21, 23). Furthermore, as animals update their beliefs and converge toward correct estimates of the true replenishment rates, relative differences in uncertainty between high- and low-quality options, and thus the degree of under-matching, would be expected to decrease. Indeed, this has been observed during early learning of a novel environment with multiple options (29), although in that setting, behavior was best described by a policy with memory of recent choice. Previous work has shown that under-matching can arise from learning over multiple time scales, which can be optimal in environments where reward rates regularly switch between two options (6).
Our work provides a complementary explanation for the ubiquity of under-matching: under-matching is a feature of optimality even in non-changing environments when the animal is more uncertain about the quality of the less rewarding options. Additional environmental dynamics—such as regular switches in reward rates—would further impact the agent’s belief about the underlying replenishment rates.
While we considered a broad range of environments that differ in the structure, statistics, and reliability of replenishment, this space is not exhaustive. In our model, whenever an option is replenished (ie in the full state), sampling at that option always yields one unit of resource. However, in natural settings—such as when a predator hunts for prey—success is not guaranteed, even when the resource is present. This scenario has been studied in dynamic foraging tasks in which rewards are delivered probabilistically (20, 30); within our framework, this scenario could be treated by modulating the probability of collecting a reward. Moreover, different options may yield different amounts of reward even when resources are collected. In such cases, the optimal sampling rates might no longer exhibit matching, as was shown to be the case for a variable-interval task (ie when replenishment follows a Poisson process; (31)). We further assumed that resources remain available until collected by the agent. In real environments, however, resources may deplete passively due to external factors such as natural decay or consumption by other competitors. Incorporating such depletion dynamics would enable our framework to capture a wider class of foraging and operant conditioning paradigms, including variable ratio schedules, which arise in the limit that the passive depletion rate approaches one (31). Finally, we assumed that options deplete and replenish independently. However, in many real-world scenarios, reward availability may be correlated across options. For example, Vertechi et al. (30) studied a setting in which only one of two foraging sites can deliver reward at any given time—a situation akin to a predator tracking prey that can only occupy one region at a time. 
Our framework can be extended to include such coupling (eg through correlations in replenishment rates across options), as well as other environmental factors such as passive depletion and variability in reward probability and magnitude, which together could be used to study relationships between optimality and matching across a wider range of environments.
Previous theoretical studies that relate matching and optimality have often focused on policies in which choice probabilities can change over time (31–34). These include learning-based policies, in which choice probabilities evolve through continuous updating of value estimates based on reward feedback from past actions (31, 32), and state-dependent policies, in which choice probabilities depend directly on stored variables that reflect internal (33) or external (34) states. For example, when replenishment rates can take different sets of values, the agent can tether its policy to an internal belief about a latent environmental state that governs the rates across all options (34). In contrast, we have assumed that agents adopt a fixed-sampling-rate policy, even when replenishment rates fluctuate or change periodically. Such a policy is computationally compact, does not rely on memory of past rewards and choices, and has been described as having zero policy complexity, as it does not require distinguishing between different environmental states (34). This simplicity offers a useful baseline for comparison with more complex policies and, importantly, enables precise derivations of optimality and matching across a wide range of environments. For example, while previous work has shown how policy compression gives under-matching in dynamic foraging tasks in which the set of replenishment rates is symmetric across both options (34), our analysis shows that a fixed sampling rate policy can also give over-matching when the distributions of replenishment rates differ between options. Nevertheless, real animals can indeed adopt more complex behavioral strategies and may be subject to other behavioral constraints. 
For example, many animals have been observed to repeat their most recent choices, a phenomenon sometimes known as “stickiness” or “perseverance” (35–39), and logistic regression models fitted to behavioral data have often detected choice- and reward-history-dependent effects on animals’ next choices (29, 36–38). Other experiments have shown that animals can track time (40, 41), which may allow them to exploit temporal regularities in the replenishment process. Our approach can potentially be adapted to explore the role of policy structure on optimality and matching, which remains an interesting question for future work.
Supplementary Material
pgaf392_Supplementary_Data
- 1. Stephens DW. 2008. Decision ecology: foraging and the ecology of animal decision making. Cogn Affect Behav Neurosci. 8(4):475–484. doi:10.3758/CABN.8.4.475. PMID: 19033242.
- 2. Shimp CP. 1969. Optimal behavior in free-operant experiments. Psychol Rev. 76(2):97.
- 3. Staddon J, Motheral S. 1978. On matching and maximizing in operant choice experiments. Psychol Rev. 85(5):436.
- 4. Baum WM. 1981. Optimization and the matching law as accounts of instrumental behavior. J Exp Anal Behav. 36(3):387–403. doi:10.1901/jeab.1981.36-387. PMID: 16812255. PMCID: PMC1333108.
- 5. Staddon JE, Hinson JM, Kram R. 1981. Optimal choice. J Exp Anal Behav. 35(3):397–412. doi:10.1901/jeab.1981.35-397. PMID: 16812224. PMCID: PMC1333092.
- 6. Iigaya K, et al. 2019. Deviation from the matching law reflects an optimal strategy involving learning over multiple timescales. Nat Commun. 10(1):1466. doi:10.1038/s41467-019-09388-3. PMID: 30931937. PMCID: PMC6443814.
- 7. Herrnstein RJ. 1961. Relative and absolute strength of response as a function of frequency of reinforcement. J Exp Anal Behav. 4(3):267. doi:10.1901/jeab.1961.4-267. PMID: 13713775. PMCID: PMC1404074.
- 8. Catania AC. 1963. Concurrent performances: a baseline for the study of reinforcement magnitude. J Exp Anal Behav. 6(2):299–300. doi:10.1901/jeab.1963.6-299. PMID: 14019311. PMCID: PMC1404290.
