Random Utility and Limited Consideration

Victor H. Aguiar; Maria Jose Boccardi; Nail Kashaev; Jeongbin Kim

arXiv:1812.09619 · econ.GN · July 5, 2022


TL;DR

This paper develops a unified framework for modeling decision-making under limited consideration. Using novel experimental data, it rejects the random utility model (RUM) as a description of population behavior, while the logit attention model cannot be rejected.

Contribution

It introduces a theoretical and statistical framework that unifies models of random consideration and extends them to include preference heterogeneity.

Findings

01. RUM cannot explain the observed population behavior.

02. The logit attention model with heterogeneous preferences cannot be rejected at the 5 percent significance level.

03. The experimental data exhibit variation in both choice sets and attention frames.

Abstract

The random utility model (RUM, McFadden and Richter, 1990) has been the standard tool to describe the behavior of a population of decision makers. RUM assumes that decision makers behave as if they maximize a rational preference over a choice set. This assumption may fail when consideration of all alternatives is costly. We provide a theoretical and statistical framework that unifies well-known models of random (limited) consideration and generalizes them to allow for preference heterogeneity. We apply this methodology in a novel stochastic choice dataset that we collected in a large-scale online experiment. Our dataset is unique since it exhibits both choice set and (attention) frame variation. We run a statistical survival race between competing models of random consideration and RUM. We find that RUM cannot explain the population behavior. In contrast, we cannot reject the hypothesis…

Tables (5)

Table 1: Lotteries measured in tokens, expected values, and variances. Preference ranks are computed under $u(x)=\frac{x^{1-\sigma}}{1-\sigma}$ for each listed value of $\sigma$.

Lottery	Definition	Expectation	Variance	Rank, $\sigma=-2$	Rank, $\sigma=0$	Rank, $\sigma=0.25$	Rank, $\sigma=0.30$	Rank, $\sigma=0.50$	Rank, $\sigma=0.75$
(1)	$\frac{1}{2} 50 + \frac{1}{2} 0$	25.000	625.00	1	1	2	5	5	6
(2)	$\frac{1}{2} 30 + \frac{1}{2} 10$	20.000	100.00	5	5	5	2	1	1
(3)	$\frac{1}{4} 50 + \frac{1}{4} 30 + \frac{1}{4} 10 + \frac{1}{4} 0$	22.500	368.75	3	3	4	4	3	4
(4)	$\frac{1}{4} 50 + \frac{1}{5} 48 + \frac{3}{20} 14 + \frac{2}{5} 0$	24.125	511.73	2	2	1	3	4	5
(5)	$\frac{1}{5} 48 + \frac{1}{4} 30 + \frac{3}{20} 14 + \frac{1}{4} 10 + \frac{3}{20} 0$	21.625	251.11	4	4	3	1	2	3
(o)	12 with probability 1	12.000	0.00	6	6	6	6	6	2

Table 2: Testing Results under Preference Stability

Model	$T_{n}$	p-value
$\mathrm{RUM}$	3231.59	<0.001
$\mathrm{LA}$	24959.06	0.524
$\mathrm{EBA}$	24840.23	0.001

Notes: Number of bootstrap replications = 1000.

Table 3: Example 5. Stochastic choice rule and random consideration set probability. $p$ is consistent with $\mathrm{LA}\text{-}\mathrm{B}$ but cannot be generated by RAM.

	$\{a, b, c\}$	$\{a, b\}$	$\{a, c\}$	$\{b, c\}$	$\{a\}$	$\{b\}$	$\{c\}$	$\emptyset$
$a$	0.305	0.339	0.157		0.208
$b$	0.250	0.339		0.227		0.208
$c$	0.255		0.300	0.341			0.345
$o$	0.190	0.322	0.543	0.432	0.792	0.792	0.655	1
$η (D)$	0.20	0.30	0.01	0.10	0.05	0.05	0.10	0.19

Table 4: Average number of observations per alternative/choice set

Choice set	$N$	$N/\lvert A\rvert$	Choice set	$N$	$N/\lvert A\rvert$
o12345	171	28.50	o12	131	43.67
o2345	155	31.00	o13	118	39.33
o1345	154	30.80	o14	125	41.67
o1245	149	29.80	o15	116	38.67
o1235	156	31.20	o23	112	37.33
o1234	143	28.60	o24	123	41.00
o345	131	32.75	o25	120	40.00
o245	118	29.50	o34	121	40.33
o235	125	31.25	o35	122	40.67
o234	116	29.00	o45	119	39.67
o145	112	28.00	o1	155	77.50
o135	123	30.75	o2	154	77.00
o134	120	30.00	o3	149	74.50
o125	121	30.25	o4	156	78.00
o124	122	30.50	o5	143	71.50
o123	119	29.75

Table 5: The table displays the proportion of rejections at the 10 percent and 5 percent significance levels for $\mathrm{LA}\text{-}\mathrm{B}$. Sample size = 4000. Number of MC replications = 500. Number of bootstrap replications = 500.

	Significance level
Process	10%	5%
$λ = 0.25$	1	1
$λ = 0.50$	0.464	0.73

Equations (120)

$$p(a,A)=\sum_{\succ\in R(X)}\pi(\succ)\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)$$

$$m_{A}(D)=\psi\Big(\eta(D),\sum_{C\in g(D,A)}\eta(C)\Big)$$

$$m_{A}(D)=\frac{\eta(D)}{\sum_{C\subseteq A}\eta(C)}>0$$

$$\eta(A)=\prod_{a\in X\setminus A}(1-\gamma(a))\prod_{b\in A}\gamma(b)$$

$$m_{A}(D)=\sum_{C:\,C\cap A=D}\eta(C)$$

$$\eta(A)=\mathds{1}\left(A=X\right).$$

$$\mathcal{M}^{\mathrm{L}}=\Big\{m\;:\;m_{A}(D)=\psi^{\mathrm{L}}\Big(\eta(D),\sum_{C\in g^{\mathrm{L}}(D,A)}\eta(C)\Big)\text{ for some }\eta\in\Delta(2^{X})\Big\}.$$

$$m_{A}=\operatorname*{arg\,max}_{m\in\Delta(2^{A})}\sum_{D\subseteq A}\left[m(D)\alpha(D)-K(m(D))\right].$$

$$m_{A}(D)=\frac{\exp(\theta\alpha(D))}{\sum_{C\subseteq A}\exp(\theta\alpha(C))}.$$

$$p_{o}=\psi_{\emptyset}(\eta).$$

$$m_{A}(\emptyset)=\varphi\Big(\eta(\emptyset),\sum_{C\subseteq A}\eta(C)\Big),$$

$$\eta^{\mathrm{L}}(D)=\sum_{B\subseteq D}(-1)^{\lvert D\setminus B\rvert}\varphi_{2}^{-1,\mathrm{L}}\left(\eta^{\mathrm{L}}(\emptyset),p(o,D)\right).$$

$$m_{A}^{\mathrm{L}}(D)=\psi^{\mathrm{L}}\Big(\eta^{\mathrm{L}}(D),\sum_{C\in g^{\mathrm{L}}(D,A)}\eta^{\mathrm{L}}(C)\Big).$$

$$p_{\pi}^{\mathrm{L}}(a,A)=\frac{p(a,A)-\sum_{C\subset A}m_{A}^{\mathrm{L}}(C)\,p_{\pi}^{\mathrm{L}}(a,C)}{m_{A}^{\mathrm{L}}(A)}.$$

$$p(a,A)=\sum_{D\subseteq A}m_{A}(D)\,p_{\pi}(a,D),$$

$$p_{f}(a,A)=\sum_{\succ\in R(X)}\pi_{f}(\succ)\sum_{D\subseteq A}m_{f,A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right).$$

$$p_{f}(a,A)=\sum_{\succ\in R(X\cup\{o\})}\pi_{o}(\succ)\,\mathds{1}\left(a\succ b,\ \forall b\in A\right),$$

$$B_{1,k,l}=\mathds{1}\left(a\in A\right)\mathds{1}\left(a\succ_{l}c,\ \forall c\in A\right),$$

$$G_{1}=\left[\begin{array}{cc}B_{1}&0_{d_{p}\times d_{m}}\\ 0_{d_{m}\times\left\lvert X\right\rvert!}&I_{d_{m}}\end{array}\right],$$

$$G=\left[\begin{array}{cc}B&0_{d_{f}\cdot d_{p}\times d_{f}\cdot d_{m}}\\ 0_{d_{f}\cdot d_{m}\times\left\lvert X\right\rvert!}&I_{d_{f}\cdot d_{m}}\end{array}\right].$$

$$p(o,A)=m_{A}(\emptyset)=\varphi\Big(\sum_{C\subseteq A}\eta(C),\eta(\emptyset)\Big).$$

$$\hat{P}_{f}=\left(\hat{p}_{f}(a,A)\right)_{A\in\mathcal{A},\,a\in A\cup\{o\}}.$$

$$T_{n}=n\min_{v\in\mathds{R}_{+}^{d}}\left\lVert\hat{g}^{\mathrm{L}}-Gv\right\rVert^{2},$$

$$n\min_{[v-\tau_{n}\iota/d]\in\mathds{R}_{+}^{d}}\left\lVert\hat{g}^{\mathrm{L}}-Gv\right\rVert^{2};$$

$$T_{n,l}^{*}=n\min_{[v-\tau_{n}\iota/d]\in\mathds{R}_{+}^{d}}\left\lVert\hat{g}_{l}^{\mathrm{L},*}-\hat{g}^{\mathrm{L}}+\hat{\eta}_{\tau_{n}}-Gv\right\rVert^{2},\quad l=1,\dots,L;$$

$$p(a,A)=\sum_{D\subseteq A}m_{A}(D)\sum_{\succ\in R(X)}\tilde{\pi}(\succ)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)$$

$$\sum_{D\subseteq A}m_{A}(D)\sum_{\succ\in R}\tilde{\pi}(\succ)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)=\frac{1}{\lvert R(X)\rvert}\sum_{\succ\in R(X)}\Big[\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)\Big].$$

$$\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)=p(a,A)\,\mathds{1}\left(a\succ a\right)=p(a,A)$$

$$\frac{1}{\lvert R(X)\rvert}\sum_{\succ\in R(X)}\Big[\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)\Big]=p(a,A)$$

$$\frac{1}{\lvert R(X)\rvert}\sum_{\succ\in R(X)}\Big[\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)\Big]=\frac{1}{\lvert R(X)\rvert}\sum_{\succ\in R(X)}p(a,A)=p(a,A).$$

Taxonomy

Topics: Economic and Environmental Valuation · Decision-Making and Behavioral Economics · Environmental Education and Sustainability

Full text

Random Utility and Limited Consideration

[Footnote: This paper was previously circulated as “Does Random Consideration Explain Behavior when Choice is Hard? Evidence from a Large-scale Experiment.” We are grateful to an editor and 4 anonymous referees for insightful comments and suggestions. We would like to thank Roy Allen, Jose Apesteguia, Miguel Ballester, Levon Barseghyan, Juan Dubra, Mikhail Freer, Yoram Halevy, Yuichi Kitamura, Paola Manzini, Marco Mariotti, John Rehbeck, and Jörg Stoye for useful comments and suggestions. We also thank the participants of the Barcelona GSE Summer Forum (Stochastic Choice), BRIC 2019, IWABE 2019, ASSA 2020 (Econometrics of Decision and Demand) for useful feedback. Mingshi Kang provided excellent research assistance. Aguiar and Kashaev gratefully acknowledge financial support from the Western Social Science Faculty grant and Social Sciences and Humanities Research Council.]

[Footnote: This study is approved by Caltech’s IRB No. 18-0812.]

Victor H. Aguiar, Maria Jose Boccardi, Nail Kashaev, Jeongbin Kim

Affiliations, in author order: Department of Economics, University of Western Ontario, [email protected]; [email protected] (the experiment was run before Boccardi joined Amazon); Department of Economics, University of Western Ontario, [email protected]; Department of Marketing, National University of Singapore, [email protected].

(This version: June 2022/ First version: November 2018)

Abstract

The random utility model (RUM, McFadden and Richter, 1990) has been the standard tool to describe the behavior of a population of decision makers. RUM assumes that decision makers behave as if they maximize a rational preference over a choice set. This assumption may fail when consideration of all alternatives is costly. We provide a theoretical and statistical framework that unifies well-known models of random (limited) consideration and generalizes them to allow for preference heterogeneity. We apply this methodology in a novel stochastic choice dataset that we collected in a large-scale online experiment. Our dataset is unique since it exhibits both choice set and (attention) frame variation. We run a statistical survival race between competing models of random consideration and RUM. We find that RUM cannot explain the population behavior. In contrast, we cannot reject the hypothesis that decision makers behave according to the logit attention model (Brady and Rehbeck, 2016).

JEL classification numbers: C90, C12, D81, D12.

Keywords: random utility, experimental discrete choice, random consideration sets, frames.

1. Introduction

A fundamental question in social science is how to describe the behavior of a population of decision makers (DMs). The random utility model (RUM, McFadden and Richter, 1990) is the standard tool to describe behavior. [Footnote 1: $\mathrm{RUM}$ was first proposed by Block and Marschak (1960) and Falmagne (1978) in an environment similar to ours.] $\mathrm{RUM}$ assumes that DMs behave as if they maximize their preferences over their choice set. However, $\mathrm{RUM}$ may fail at describing behavior if DMs do not consider all available alternatives. For instance, DMs may behave as if they use a two-stage procedure: first simplifying choice by using a consideration set, and only then choosing the best alternative among those considered. Thus, DMs may choose dominated alternatives. [Footnote 2: For evidence of choice of dominated alternatives see De Los Santos et al. (2012), Honka (2014), Heiss et al. (2016), Ho et al. (2017), Honka et al. (2017), Hortaçsu et al. (2017) and Barseghyan et al. (2019a).] A large literature, pioneered by Masatlioglu et al. (2012) and Manzini and Mariotti (2014), has proposed theories of consideration-mediated choice. These theories accommodate departures from $\mathrm{RUM}$ caused by inattention, feasibility, categorization, and search. [Footnote 3: See, for instance, Aguiar et al. (2016), Brady and Rehbeck (2016), Aguiar (2017), Lleras et al. (2017), Caplin et al. (2019), Horan (2019), Kovach and Ülkü (2020), and Cattaneo et al. (2020).] In contrast to $\mathrm{RUM}$, [Footnote 4: For evidence for $\mathrm{RUM}$ see Kitamura and Stoye (2018) and McCausland et al. (2018).] little is known about the empirical validity of these models. Our work aims to fill this important gap in the literature.

Methodologically, we provide a unifying theoretical framework that generalizes well-known theories of random consideration. We unify these theories with a new concept called attention-index. The attention-index is a net measure of how enticing a collection of alternatives is, or how costly it is to pay attention to it. We show how to test these theories statistically, and how to recover the preference distribution and consideration rules. Our framework extends many theories of consideration-mediated choice to allow for preference heterogeneity. This allows us to take these theories of individual behavior to the population level, thus permitting the use of cross-sectional datasets to test them. Following McFadden and Richter (1990) and Kitamura and Stoye (2018), we take seriously the fact that all theories of stochastic choice have as their primitive the unobserved distribution over choices that can only be estimated in finite samples by sample frequencies of choice. Thus, to test these models in finite samples, we need to account for sampling variability.

Empirically, we design a large experiment [Footnote 5: Other experiments that we are aware of that have collected stochastic choice data focusing on choice set variation are Apesteguia et al. (2018) ($87$ individuals) and McCausland et al. (2018) ($141$ participants). In contrast to our work, both focus mainly on binary choice sets and goodness-of-fit measures (including the computation of Bayes factors).] with two independent sources of exogenous variation: (i) full variation in choice sets (menus), and (ii) variation in frames. We conducted this experiment online on Amazon Mechanical Turk (MTurk), collecting 12297 independent choice observations from 2135 individuals. A frame consists of observable information that is irrelevant in the rational assessment of the alternatives (Salant and Rubinstein, 2008). [Footnote 6: We interpret the notion of rational assessment to be an assessment compatible with consequentialism and $\mathrm{RUM}$.] Full variation in choice sets means that all possible choice sets are observed by the researcher. It allows us to test consideration-mediated choice theories in a large cross-section of heterogeneous individuals. Frame variation means that we vary the complexity of the description of alternatives without affecting the relevant payoffs. This is equivalent to varying the cost of consideration. In this sense, we induce an attention frame. This variation in frames allows us to differentiate between $\mathrm{RUM}$ and models of limited consideration since consideration could change with frames but preferences must remain stable.

In general, without frame variation, many models of consideration are empirically indistinguishable from $\mathrm{RUM}$. For instance, if a dataset, without frame variation, is consistent with the classical consideration model of Manzini and Mariotti (2014), then it is also consistent with $\mathrm{RUM}$. However, this model and $\mathrm{RUM}$ will typically recover a distinct distribution of preferences. Varying frames, we can test whether the distribution of preferences remains the same across frames. This will imply that the model in Manzini and Mariotti (2014) and $\mathrm{RUM}$ may have different empirical implications in our experimental setup. The same reasoning applies to any model of limited consideration.

In our experimental design, we introduce three attention-cost/complexity frames for every choice set. Each DM faces all three frames. These treatments require the DM to solve a simple cognitive task to understand the alternatives. The consideration cost is progressively reduced across frames while we keep the choice set fixed. The fact that the choice set remains the same is not explicitly stated in the experiment instructions. Within this design we can understand how changing consideration costs across frames may affect choices while we keep (the distribution of) preferences fixed. Under our incentives protocol (pay-at-random across tasks), and under consequentialism, the distribution of preferences must remain constant regardless of the frame. By exploiting this feature of our design, we show that, in our sample, $\mathrm{RUM}$ fails to describe the population behavior, while the logit attention model of Brady and Rehbeck (2016) describes it well.

Our paper generalizes the methodology of Cattaneo et al. (2020) by allowing heterogeneity in preferences when the choice sets include a default alternative. We show that in our setting, if one assumes independence of preferences and consideration at the population level, testing a consideration-mediated choice theory with heterogeneous preferences and a dominated default is equivalent to testing whether a hypothetical full-consideration distribution over choices is consistent with $\mathrm{RUM}$. We therefore use this result to test these models using the framework of Kitamura and Stoye (2018).

To exploit all possible implications of the limited-consideration models of interest, we need full variation in choice sets. A limited consideration model may describe behavior well for a nonexhaustive dataset, but it may fail to do so for an extended one. That is, one may have false positives when observing choices from a nonexhaustive set of menus, as discussed at length in De Clippel and Rozen (2018) and Cattaneo et al. (2020). Nonetheless, full exogenous variation in choice sets is an important data feature that is usually not satisfied in field data. Our testing procedure exploits rich experimental variation to increase the statistical power of our tests.

In addition, our experiment introduces a dominated default alternative that works as an opportunity cost of paying attention. For any choice set and frame, the default is always present and shown first. Moreover, it is pre-selected as the default choice: if the subject decides to skip the task, she is informed that the default alternative will be chosen for her. This design allows us to use the default alternative as the opportunity cost of incurring the cost of consideration and understanding the other alternatives in the choice set. The set of alternatives in our experiment consists of lotteries. Hence, we use a degenerate lottery as the default due to its simplicity. In this sense, we believe the default alternative in our design has effectively zero cost of consideration. This dominated default is key to disentangling the distribution of preferences from random consideration. [Footnote 7: We formulate a sensitivity analysis when the default is not dominated in Appendix B.]

We use these theoretical and experimental innovations to test two well-known models of random consideration: (i) the logit attention model of Brady and Rehbeck (2016) ($\mathrm{LA}$), and (ii) the version of the elimination-by-aspects model of Tversky (1972) characterized by Aguiar (2017) ($\mathrm{EBA}$). [Footnote 8: We refer to $\mathrm{EBA}$ as the version of the original model in Tversky (1972) by Cattaneo et al. (2020) (Example $6$). The model in Aguiar (2017) coincides with $\mathrm{EBA}$ in the special case where there is a dominated default with a restriction that any category that does not contain the default has zero mass. The $\mathrm{EBA}$ model characterization for the case without a default remains an open question.] There is increasing interest in incorporating limited consideration in discrete choice. [Footnote 9: See, for instance, Goeree (2008), Barseghyan et al. (2019a, b), Dardanoni et al. (2020), and Abaluck and Adams (2021).] In particular, the influential and tractable model of Manzini and Mariotti (2014) ($\mathrm{MM}$) has become an important tool for the analysis of limited attention in empirical work (e.g., Dardanoni et al., 2020, Abaluck and Adams, 2021, and Kashaev and Lazzati, 2021). However, $\mathrm{MM}$ is highly stylized and assumes that consideration is driven by an item-dependent parameter (i.e., independence in consideration). We investigate from an experimental perspective whether this strong assumption is effective in explaining choice when consideration is hard. To do so we consider two extensions of $\mathrm{MM}$ that allow for substitution and complementarity in consideration, the $\mathrm{LA}$ and $\mathrm{EBA}$ models. These two generalizations have the property that their intersection is exactly the $\mathrm{MM}$ model (Suleymanov, 2018). This implies that $\mathrm{MM}$ explains the population behavior if and only if both the $\mathrm{LA}$ and $\mathrm{EBA}$ models explain it.

We test these models and the benchmark $\mathrm{RUM}$ conditioning on the frame. Crucially, we require the underlying preference relation to be stable across frames while allowing the consideration rules to vary with the frame. Our main findings are: (i) We reject the hypothesis that $\mathrm{RUM}$ provides a good description of population behavior. (ii) In contrast, the $\mathrm{LA}$ model with heterogeneous preferences cannot be rejected at the $5$ percent significance level. (iii) However, we reject the hypothesis that $\mathrm{EBA}$, and hence $\mathrm{MM}$, describe the population behavior.

Our work contributes to the recent experimental literature on stochastic choice, limited consideration, and departures from $\mathrm{RUM}$. Even though limited attention is by now well documented in many environments, it is less clear which structural models of limited attention the practitioner should use. We see our contribution as providing answers to this second issue. [Footnote 10: There is a vast literature documenting departures from fully rational behavior, but it is not focused on limited consideration. See Rieskamp et al. (2006) for a survey.] We hope that our findings about which models of limited consideration are successful empirically will inform future empirical work in the field. For instance, our findings have already been used to motivate the choice of the parametric specification of limited consideration in the recent work of Abaluck and Adams (2021). [Footnote 11: Abaluck and Adams (2021) structurally estimate a model of discrete choice with limited consideration under the $\mathrm{LA}$ and $\mathrm{MM}$ models.]

The paper proceeds as follows. Section 2 presents our model. Section 3 details frame variation and our testing procedure. Section 4 presents our experiment. Section 5 presents the testing results. Finally, Section 6 concludes. All proofs and additional results are in the appendix.

2. Environment – Model

We consider a finite choice set $X$ and we denote the outside alternative or default as $o\notin X$. We let the set of all possible choice sets be $\mathcal{A}=2^{X}\setminus\{\emptyset\}$, where $2^{X}$ denotes the set of all subsets of $X$. A probabilistic choice rule is a mapping $p:X\cup\{o\}\times\mathcal{A}\mapsto[0,1]$. The probabilistic choice rules for a given choice set add up to 1, $\sum_{a\in A}p(a,A)+p(o,A)=1$. Moreover, $p(a,A)=0$ if $a\notin A$. We fix $p(o,\emptyset)=1$. A complete stochastic choice rule is a vector $P=\left(p(a,A)\right)_{A\in\mathcal{A},a\in A\cup\{o\}}$. For identification purposes, we treat $P$ as a known object. In practice, we do not observe $P$, but can consistently estimate it by the collection of sample frequencies $\hat{P}$ (see Section 3.2 for further details).
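As a minimal sketch of the sample-frequency estimator $\hat{P}$ mentioned above, the snippet below tabulates choice frequencies menu by menu. The function name `estimate_choice_rule`, the label `"o"` for the default, and the toy observations are illustrative assumptions, not the paper's code or data.

```python
from collections import Counter, defaultdict

def estimate_choice_rule(observations):
    """Return {menu: {alternative: frequency}} from (menu, choice) pairs.

    `menu` is a frozenset of alternatives from X; `choice` is an element of
    the menu or the default, labelled "o" here (an assumed label).
    """
    counts = defaultdict(Counter)
    for menu, choice in observations:
        if choice != "o" and choice not in menu:
            raise ValueError(f"{choice!r} is not available in {set(menu)}")
        counts[menu][choice] += 1
    p_hat = {}
    for menu, counter in counts.items():
        n = sum(counter.values())
        # Alternatives that were never chosen get frequency zero,
        # so the frequencies for each menu sum to one.
        p_hat[menu] = {a: counter[a] / n for a in sorted(menu) + ["o"]}
    return p_hat

# Hypothetical toy observations, not the experimental dataset.
toy = [
    (frozenset({"a", "b"}), "a"),
    (frozenset({"a", "b"}), "a"),
    (frozenset({"a", "b"}), "b"),
    (frozenset({"a", "b"}), "o"),
    (frozenset({"a"}), "o"),
    (frozenset({"a"}), "a"),
]
p_hat = estimate_choice_rule(toy)
```

Each menu's frequencies sum to one by construction, mirroring the adding-up condition on $p$.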

2.1. Random Behavioral Model

We consider an environment where DMs, faced with a choice set $A\in\mathcal{A}$, first pick $D\subseteq A$ (consideration set) and then choose the alternative in $D$ that maximizes their preferences. With probability $\pi(\succ)$, DMs are endowed with preferences $\succ\in X\times X$ drawn from the set of all linear orders (strict preferences) on $X$, $R(X)$. [Footnote 12: Linear orders are complete, reflexive, transitive, and antisymmetric orders.] Note that since $o\not\in X$, following Manzini and Mariotti (2014), we implicitly assume that the default is picked if and only if nothing else is considered. A typical interpretation of this situation is the sleeping agent behavior (see, for instance, Abaluck and Adams, 2021). When the agent is sleeping (i.e., she considers the empty set) she chooses the default alternative. Otherwise, the agent wakes up and considers some nonempty set and maximizes her preferences in her consideration set. In the second case, the default is assumed to be dominated by the rest of the alternatives. We show how to relax this assumption in Appendix B.

The distribution $\pi\in\Delta(R(X))$ fully captures preference heterogeneity. [Footnote 13: For any set $C$, $\Delta(C)$ denotes the set of all probability distributions (simplex) on $C$. $\mathds{1}\left(\,B\,\right)$ denotes the indicator of the statement $B$ and is equal to $1$ if $B$ is true and is equal to zero otherwise.] The distribution over random consideration sets given the menu $A$ is fully characterized by $m_{A}:2^{A}\to[0,1]$, $\sum_{D\subseteq A}m_{A}(D)=1$. In other words, $m_{A}$ is an element of the simplex $\Delta\left(2^{A}\right)$. Let $m$ denote the complete collection of those distributions for all possible menus. That is, $m=\left(m_{A}(D)\right)_{A\in\mathcal{A},D\in 2^{A}}$. We assume that the random consideration sets and random preferences are independent.

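Definition 1 (Random Behavioral Model, $\mathrm{B}$-rule).

A complete stochastic choice rule $P$ is a $\mathrm{B}$-rule if there exists a pair $(m,\pi)$ such that

$$p(a,A)=\sum_{\succ\in R(X)}\pi(\succ)\sum_{D\subseteq A}m_{A}(D)\,\mathds{1}\left(a\succ b,\ \forall b\in D\right)$$

for all $a\in X$ and $A\in\mathcal{A}$.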
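The two-stage structure in Definition 1 can be sketched numerically. The snippet below is an illustration under stated assumptions: a preference order is encoded as a tuple from best to worst, an alternative counts as chosen from $D$ only when it belongs to $D$ and ranks at least as high as every element of $D$ (our reading of the indicator), and the numbers for `m` and `pi` are hypothetical.

```python
def is_chosen(a, D, order):
    """True when a is in D and ranks at least as high as every element of D.

    `order` lists alternatives from best to worst (a linear order on X).
    """
    return a in D and all(order.index(a) <= order.index(b) for b in D)

def b_rule(a, A, m, pi):
    """Choice probability of a in menu A implied by the pair (m, pi).

    m[A][D] is the probability of considering D when the menu is A;
    pi[order] is the probability of the preference order.
    """
    return sum(
        weight * mass
        for order, weight in pi.items()
        for D, mass in m[A].items()
        if is_chosen(a, D, order)
    )

# Hypothetical two-alternative example (illustrative numbers only).
A = frozenset({"a", "b"})
m = {A: {frozenset(): 0.2, frozenset({"a"}): 0.3,
         frozenset({"b"}): 0.1, A: 0.4}}
pi = {("a", "b"): 0.75, ("b", "a"): 0.25}

p_a = b_rule("a", A, m, pi)   # 0.3 + 0.4 * 0.75
p_b = b_rule("b", A, m, pi)   # 0.1 + 0.4 * 0.25
p_o = m[A][frozenset()]       # default is chosen only when nothing is considered
```

In this example the three probabilities add up to one, as required of a probabilistic choice rule.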

This choice rule is illustrated in Figure 1. Definition 1 implicitly assumes that the random consideration set rule and the heterogeneous preferences are independent. Independence is a good starting assumption in the sterile environment of our experiment, as we want to achieve a decomposition of any observed probabilistic choice rule into its consideration (captured by $m$) and preference (captured by $\pi$) components. Independence has been assumed successfully in the structural work of Abaluck and Adams (2021). Also, we are interested in modeling decision-making in two stages where DMs simplify a hard choice task using fast-and-frugal heuristics (consideration) that are independent of preferences, and then choose rationally from the simplified choice set. If the researcher observes additional information (e.g., age, gender, education, and income levels of individuals), then the random consideration rule and random preferences need to be independent only conditionally on those observables. [Footnote 14: See Kashaev and Aguiar (2021) for a study of the correlation between preferences and consideration.]

Independence holds trivially for the case of homogeneous preferences, such as all models covered by the Random Attention Model (RAM) of Cattaneo et al. (2020). Moreover, as the following lemma demonstrates, the $\mathrm{B}$-rule does not have empirical content even under the independence assumption.

Lemma 1.

Every complete stochastic choice rule $P$ is a $\mathrm{B}$-rule.

Without additional restrictions on $\pi$ and $m$ the model is not falsifiable. That is, any $P$ can be generated by some independent $\pi$ and $m$. We will impose constraints on $m$ that will allow us to test an important class of random consideration sets models without restricting heterogeneity in preferences.
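One way to see why the unrestricted model has no empirical content is to put all consideration mass on singletons and the empty set. The sketch below illustrates that idea and may differ in detail from the appendix proof. The helper names are hypothetical; the probabilities are the $\{a,b,c\}$ column of Table 3.

```python
from itertools import permutations

def singleton_rule(p_A):
    """Consideration rule with mass p(a, A) on {a} and p(o, A) on the empty set."""
    m_A = {frozenset(): p_A["o"]}
    for a, prob in p_A.items():
        if a != "o":
            m_A[frozenset({a})] = prob
    return m_A

def b_rule(a, m_A, pi):
    """Choice probability of a implied by (m_A, pi): a is considered and top-ranked."""
    total = 0.0
    for order, weight in pi.items():
        for D, mass in m_A.items():
            if a in D and all(order.index(a) <= order.index(b) for b in D):
                total += weight * mass
    return total

# Choice probabilities on the menu {a, b, c}, taken from Table 3.
p_A = {"a": 0.305, "b": 0.250, "c": 0.255, "o": 0.190}

# Any preference distribution works; a uniform one is used for illustration.
orders = list(permutations(["a", "b", "c"]))
pi = {order: 1 / len(orders) for order in orders}

m_A = singleton_rule(p_A)
implied = {a: b_rule(a, m_A, pi) for a in ["a", "b", "c"]}
```

Because every considered set has at most one element, preferences never matter and `implied` reproduces `p_A` for any choice of `pi`.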

2.2. Attention-Index Consideration Set Rule

We restrict $m$ by considering a family of consideration set rules that are governed by an attention-index. The attention-index $\eta\in\Delta(2^{X})$ is a distribution over the power set. The value $\eta(D)$ captures the unconditional attention that DMs pay to the set $D\in 2^{X}$. The attention-index of a set is a net measure of its attractiveness with respect to how hard it is to consider it. The attention-index measures how enticing a consideration set is, and how complex it is to understand. Therefore, $\eta(C)>\eta(D)$ means that $C$, in net terms, attracts more attention than $D$.

Definition 2 (Attention-index representation).

A consideration set rule $m$ admits an attention-index representation if there exists an attention-index $\eta$, a link function $\psi$, and an index correspondence $g$ such that $g(D,A)\subseteq 2^{X}\setminus D$ and

$$m_{A}(D)=\psi\Big(\eta(D),\sum_{C\in g(D,A)}\eta(C)\Big)$$

for all $A\in\mathcal{A}$ and $D\subseteq A$.

In what follows, we assume that the link function $\psi$ and the correspondence $g$ are known. A given link function captures the particular way in which the attention-index shapes consideration given a choice set or menu. In other words, the link function transforms unconditional attention into conditional (on the choice set) attention. The index set $g(D,A)$ captures the collection of sets that are used in the attention aggregator $\sum_{C\in g(D,A)}\eta(C)$. However, we do not assume that $\eta$ is known and do not impose any restrictions on it, so the setting is still semiparametric.

Next, we define several important models of limited consideration that admit the attention-index representation. They only differ in how they use the attention-index to form the consideration set at a given menu.

Definition 3.

The logit attention ($\mathrm{LA}$, Brady and Rehbeck, 2016), the choice set independent ($\mathrm{MM}$, Manzini and Mariotti, 2014), the random consideration ($\mathrm{EBA}$, Tversky, 1972, Aguiar, 2017) [Footnote 15: The model in Aguiar (2017) is a special case of the model in Tversky (1972) and coincides with it when the attention-index of only sets that contain the default is nonzero.], and the full consideration ($\mathrm{FC}$) models admit an attention-index representation such that:

•

$m\in\mathcal{M}^{\mathrm{LA}}$ if and only if there exists $\eta\in\Delta(2^{X})$ such that for

[TABLE]

for all $A\in\mathcal{A}$ and $D\in 2^{A}$ ;

•

$m\in\mathcal{M}^{\mathrm{MM}}$ if and only if $m\in\mathcal{M}^{\mathrm{LA}}$ with

$$\eta(A)=\prod_{a\in X\setminus A}\left(1-\gamma(a)\right)\prod_{b\in A}\gamma(b)$$

for a given $\gamma:X\to(0,1)$ and for all $A\in 2^{X}$ ;

•

$m\in\mathcal{M}^{\mathrm{EBA}}$ if and only if there exists $\eta\in\Delta(2^{X})$ such that

$$m_{A}(D)=\sum_{C:C\cap A=D}\eta(C)$$

for all $A\in\mathcal{A}$ and $D\in 2^{A}$ ;

•

$m\in\mathcal{M}^{\mathrm{FC}}$ if and only if $m\in\mathcal{M}^{\mathrm{EBA}}$ with

$$\eta(X)=1.$$

The $\mathrm{MM}$ -rule imposes independence in consideration across items, making the model highly stylized and tractable. Both the $\mathrm{LA}$ - and the $\mathrm{EBA}$ -rules generalize this item-independence to allow for substitution and complementarity of attention between different items. The $\mathrm{LA}$ -rule predicts that the probability of considering $D$ is proportional to the attention-index value $\eta(D)$ . The $\mathrm{EBA}$ -rule predicts that the probability of considering a subset $D$ given menu $A$ is equal to the probability that $D$ is the intersection of the subset of alternatives (considered randomly using the attention-index) and the choice set. [Footnote 16: Note that for the $\mathrm{LA}$ -model $\psi^{\mathrm{LA}}(x,y)=x/(x+y)$ and $g^{\mathrm{LA}}(D,A)=\{C\in 2^{X}\setminus\{D\}\>:\>C\cap A=C\}$ , and for the $\mathrm{EBA}$ -model $\psi^{\mathrm{EBA}}(x,y)=x+y$ and $g^{\mathrm{EBA}}(D,A)=\{C\in 2^{X}\setminus\{D\}\>:\>C\cap A=D\}$ .] In our application, we focus on these four particular models because they allow us to learn systematically about the true data generating process governing our experimental application. [Footnote 17: $\mathrm{LA}$ and $\mathrm{EBA}$ are completely distinct generalizations of $\mathrm{MM}$ . Thus, rejecting or accepting either of these models is very informative.] However, our approach extends to any model of consideration that admits an attention-index representation. [Footnote 18: For instance, one can generate a continuum of such models by taking all possible convex combinations of consideration rules $m^{\mathrm{LA}}$ and $m^{\mathrm{EBA}}$ that are generated by the same attention-index $\eta$ .]
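To make the difference between the rules concrete, the following is a minimal Python sketch, not the authors' implementation. It builds an $\mathrm{MM}$ attention-index from item-level attention probabilities and then applies the $\mathrm{LA}$ and $\mathrm{EBA}$ rules to it. The function names (`powerset`, `eta_mm`, `m_la`, `m_eba`) and the values in `gamma` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def powerset(items):
    """All subsets of `items` as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def eta_mm(gamma):
    """MM attention-index: item x is considered independently w.p. gamma[x]."""
    eta = {}
    for D in powerset(gamma):
        weight = 1.0
        for x, g in gamma.items():
            weight *= g if x in D else 1.0 - g
        eta[D] = weight
    return eta

def m_la(eta, menu):
    """LA rule: eta(D) normalized over all subsets of the menu."""
    subsets = powerset(menu)
    total = sum(eta.get(C, 0.0) for C in subsets)
    return {D: eta.get(D, 0.0) / total for D in subsets}

def m_eba(eta, menu):
    """EBA rule: add eta(C) over every C whose intersection with the menu is D."""
    menu = frozenset(menu)
    m = {D: 0.0 for D in powerset(menu)}
    for C, weight in eta.items():
        m[C & menu] += weight
    return m

gamma = {"a": 0.5, "b": 0.25, "c": 0.8}  # hypothetical attention probabilities
eta = eta_mm(gamma)
la = m_la(eta, {"a", "b"})
eba = m_eba(eta, {"a", "b"})
```

With an $\mathrm{MM}$ attention-index the two dictionaries `la` and `eba` agree set by set, which is the sense in which both rules generalize $\mathrm{MM}$; for a generic $\eta$ they will differ.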

Given the definition of the attention-index representation, we can define a restricted version of the $\mathrm{B}$ -rule ( $\mathrm{L\text{-}B}$ -rule). Let $\mathcal{M}^{\mathrm{L}}$ be the set of all consideration rules induced by a given link function $\psi^{\mathrm{L}}$ , the correspondence $g^{\mathrm{L}}$ , and all attention-indices $\eta$

[TABLE]

Definition 4 ( $\mathrm{L\text{-}B}$ -rule).

A complete stochastic choice rule $P$ is a $\mathrm{L\text{-}B}$ -rule if $P$ is a $\mathrm{B}$ -rule with $m\in\mathcal{M}^{\mathrm{L}}$ .

We believe that random consideration rules that admit an attention-index representation have several theoretical advantages over other generalizations in the literature. First, they cover, as special cases, well-known models of consideration and allow us to introduce a new class of semiparametric models of limited consideration with heterogeneous preferences. In other words, the $\mathrm{L\text{-}B}$ -rule unifies existing models into a common structure. This structure helps us to understand the common traits of these models, to classify them, and to provide an identification and testing framework to possibly new models covered in this structure.

Second, they enable the unique identification of the consideration rule and of the underlying full-consideration stochastic choice probabilities from a cross-section of choices with menu variation and heterogeneous preferences (see Theorem 3). We are not aware of any other work that achieves the same and is more general than ours. Allowing for heterogeneous preferences is important in the analysis of any dataset, and is unavoidable in cross-sections. An important alternative generalization is RAM, which imposes only a shape constraint on random consideration. However, it does not allow for heterogeneous preferences, nor does it obtain point identification of consideration or of the underlying full consideration choice rule. Moreover, RAM is completely uninformative about preferences when the stochastic choice rule is regular, [Footnote 19: Regular random choice means that $p(a,A)\geq p(a,B)$ for any two menus $A\subseteq B$ and any $a\in A$ .] as in the random utility models. [Footnote 20: Kashaev and Aguiar (2021) extend RAM to allow for heterogeneous preferences and show that, when the stochastic choice rule is regular, nothing can be said about random consideration or preferences either, even under independence between them.] Also, the $\mathrm{L\text{-}B}$ -rule allows cycles of probabilities that can violate regularity while RAM cannot (see Appendix C.2).
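Regularity, as defined in the footnote above, is straightforward to check on a finite dataset. The sketch below is only an illustration: the function name `is_regular`, the data layout (a dictionary keyed by alternative and menu), and the two small datasets are assumptions made here, not objects from the paper.

```python
def is_regular(p, tol=1e-12):
    """Check p(a, A) >= p(a, B) whenever A is a proper subset of B and a is in A.

    `p` maps (alternative, menu) pairs to choice probabilities,
    where each menu is a frozenset.
    """
    menus = {menu for (_, menu) in p}
    for A in menus:
        for B in menus:
            if A < B:  # A is a proper subset of B
                for a in A:
                    if p[(a, A)] < p[(a, B)] - tol:
                        return False
    return True

ab, abc = frozenset("ab"), frozenset("abc")

# Hypothetical data: adding c lowers the choice probability of a and b.
regular = {("a", ab): 0.6, ("b", ab): 0.4,
           ("a", abc): 0.5, ("b", abc): 0.3, ("c", abc): 0.2}

# Hypothetical data: adding c raises the choice probability of a.
violating = dict(regular)
violating[("a", abc)] = 0.7
violating[("b", abc)] = 0.1
```

Here `is_regular(regular)` holds while `is_regular(violating)` fails, since `a` is chosen more often from the larger menu.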

Third, the concept of attention-index is intuitive, simple, and of behavioral interest, as it provides a useful (unconditional) index of attention for any subset of alternatives. The link function and the attention-index follow the tradition of classical stochastic choice theory of simple scalability where the probability of choice of an alternative in a menu is a nonlinear function (a link function) of some scale/index (Krantz, 1965, Tversky, 1972). In the simple scalability tradition, the scale/index captures the intensity of the stimuli associated with a particular alternative. A classical example is the logit model of choice. Here, we apply the same intuition to consideration–the attention-index captures the net attractiveness of a consideration set.

Fourth, consideration rules that admit an attention-index representation are compatible with optimal random consideration of a representative DM. Here, we show that a $\mathrm{L\text{-}B}$ -rule can be obtained as the result of allocating attention optimally (see Example 2).

In addition, consideration rules that admit an attention-index representation are compatible with a flexible interpretation of randomness. In our framework, the randomness due to limited consideration can arise both at the individual and population level. Indeed, consideration can be random at the individual level and independent and identically distributed (i.i.d.) at the population level; and consideration can be deterministic at the individual level but heterogeneous at the population level. The next example shows how the latter case can be described by a $\mathrm{L\text{-}B}$ -rule.

Example 1 (Heterogeneous Categorization).

Consider a population of DMs with two types of agents endowed with different deterministic attention rules. Assume that half of the DMs are fully attentive, while the other half follows a rule of thumb: they pay full attention to option $b$ if it is present in a given choice set, else the consideration set is empty. The DMs pick the best alternative, according to the (heterogeneous) preference realization governed by some $\pi\in\Delta(R(X))$ . If the consideration set is empty, then the outside option is selected. This population has heterogeneous (deterministic) consideration; however, the population behavior can be fully captured by a random consideration rule with the $\mathrm{EBA}$ restriction. Namely, let $\eta(X)=\frac{1}{2}$ and $\eta(\{b\})=\frac{1}{2}$ . Then the $\mathrm{EBA}\text{-}\mathrm{B}$ -rule can describe this population behavior.
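The claim in Example 1 can be verified by brute force. The sketch below assumes a three-item grand set $X=\{a,b,c\}$ (the example itself does not fix $X$) and compares, menu by menu, the distribution over consideration sets implied by the two deterministic types with the one implied by the $\mathrm{EBA}$ rule under $\eta(X)=\eta(\{b\})=\frac{1}{2}$. The function names are hypothetical.

```python
from itertools import combinations

def eba_rule(eta, menu):
    """EBA: the consideration set is the drawn category intersected with the menu."""
    menu = frozenset(menu)
    m = {}
    for category, weight in eta.items():
        D = category & menu
        m[D] = m.get(D, 0.0) + weight
    return m

def two_type_population(menu):
    """Direct mixture of the two deterministic types described in Example 1."""
    menu = frozenset(menu)
    thumb = frozenset({"b"}) if "b" in menu else frozenset()
    m = {}
    for D in (menu, thumb):  # fully attentive half, then rule-of-thumb half
        m[D] = m.get(D, 0.0) + 0.5
    return m

X = frozenset({"a", "b", "c"})  # assumed grand set for this illustration
eta = {X: 0.5, frozenset({"b"}): 0.5}
menus = [frozenset(c) for r in range(1, 4) for c in combinations(sorted(X), r)]
matches = all(eba_rule(eta, A) == two_type_population(A) for A in menus)
```

Under these assumptions `matches` is true: the two constructions coincide on every nonempty menu.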

The next example demonstrates that the consideration rules that admit an attention-index representation can be derived as a solution to the problem where representative DMs optimally allocate their attention.

Example 2 (Costly Attention Allocation).

Consider a (representative) DM whose preferences are given by a (mean) utility $u:X\to{\mathds{R}}$ and an additive random vector $\boldsymbol{\xi}=(\boldsymbol{\xi}_{x})_{x\in X}$ such that the random utility of a given item $x$ is given by $u(x)+\boldsymbol{\xi}_{x}$ . The taste shocks $\boldsymbol{\xi}$ are distributed according to some continuous distribution. [Footnote 21: This guarantees that the implied random utility rule $\pi$ exists because ties have zero probability.] When the DM is faced with a menu $A$ , she needs to allocate her attention, measured by $m_{A}\in\Delta(2^{A})$ , over all possible consideration sets in $A$ (including the empty set). The attractiveness of a set $D$ is captured by McFadden’s surplus of that set, defined by $\alpha(D)=\mathds{E}\left[\max_{x\in D}\left(u(x)+\boldsymbol{\xi}_{x}\right)\right]$ for all $D\in 2^{A}\setminus\{\emptyset\}$ , where the expectation is taken with respect to $\boldsymbol{\xi}$ . The attractiveness of the empty set is normalized to zero, $\alpha(\emptyset)=0$ . The surplus $\alpha(D)$ is a measure of average attractiveness, capturing how enticing a consideration set is for the representative DM. The difficulty of picking a consideration set, or the cost of attention, is captured by a cognitive cost function $K:[0,1]\to{\mathds{R}}\cup\{\infty\}$ . If $D$ is considered with probability $m(D)$ , then the cognitive cost is $K(m(D))$ . The cost function is menu independent, but depends on the allocated attention $m(D)$ . Following Fudenberg et al. (2015), we assume that $K$ is convex. In this case, the DM’s problem is to find $m_{A}\in\Delta(2^{A})$ that maximizes the expected attractiveness of the menu, given the cognitive cost of processing it. Formally,

[TABLE]

When $K(t)=0$ for all $t\in[0,1]$ , the solution is such that $m_{A}(A)=1$ . That is, the DM is consistent with full consideration ( $\mathrm{FC}$ ). When $K(t)=-t\log(t)/\theta$ , where $\theta$ is the cost parameter, we get that

$$m_{A}(D)=\frac{\exp\left(\theta\alpha(D)\right)}{\sum_{C\subseteq A}\exp\left(\theta\alpha(C)\right)}.$$

That is, optimal consideration in this case is consistent with $\mathrm{LA}$ with the attention-index $\eta(D)=\exp{(\theta\alpha(D))}$ . Note that in this definition of $\eta$ , the independence assumption is still satisfied since $\eta(D)$ is an aggregate quantity and only depends on the mean utilities $(u(x))_{x\in X}$ and the distribution of $\boldsymbol{\xi}$ but not on the realizations of $\boldsymbol{\xi}$ (e.g., when $\boldsymbol{\xi}$ follows the Gumbel distribution, then $\alpha(D)=\log\sum_{x\in D}e^{u(x)}$ ). Of course, we can replace $\alpha(D)$ with any other measure of attractiveness and all our derivations will go through. This means we can generate any model consistent with $\mathrm{LA}$ this way. Finally, when $K(t)=\frac{1}{2}t^{2}$ , we can get a special case of the $\mathrm{EBA}$ model. [Footnote 22: The $\mathrm{EBA}$ model then requires that the DM does not re-optimize in smaller menus $A\subset X$ . Instead, she uses the heuristic that whatever category she draws at $X$ is intersected with the given menu to obtain the consideration set.]
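As an illustration of the Gumbel case mentioned above, the following sketch computes the $\mathrm{LA}$ consideration probabilities implied by $\eta(D)=\exp(\theta\alpha(D))$ with $\alpha(D)=\log\sum_{x\in D}e^{u(x)}$ and $\alpha(\emptyset)=0$. It takes the stated solution as given rather than solving the optimization problem numerically. The function names and the utility values are hypothetical, and $\eta$ is left unnormalized because the $\mathrm{LA}$ rule only depends on ratios of $\eta$.

```python
import math
from itertools import combinations

def subsets(menu):
    """All subsets of the menu as frozensets, including the empty set."""
    items = sorted(menu)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def surplus_gumbel(u, D):
    """alpha(D) = log sum_{x in D} exp(u(x)), with alpha(empty set) = 0."""
    if not D:
        return 0.0
    return math.log(sum(math.exp(u[x]) for x in D))

def la_consideration(u, menu, theta):
    """m_A(D) proportional to exp(theta * alpha(D)) over subsets D of the menu."""
    eta = {D: math.exp(theta * surplus_gumbel(u, D)) for D in subsets(menu)}
    total = sum(eta.values())
    return {D: weight / total for D, weight in eta.items()}

# Hypothetical mean utilities: exp(u(a)) = 1 and exp(u(b)) = 2.
u = {"a": 0.0, "b": math.log(2.0)}
m = la_consideration(u, {"a", "b"}, theta=1.0)
```

With $\theta=1$ the unnormalized weights are $1,1,2,3$ for $\emptyset,\{a\},\{b\},\{a,b\}$, so the full menu is considered with probability $3/7$ and nothing is considered with probability $1/7$. Larger values of `theta` shift attention toward sets with higher surplus.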

Finally, the attention-index representation also has several econometric advantages. It allows for a significant reduction of the dimensionality of the consideration set rules, making them tractable. In general, the number of parameters controlling the random consideration is $\sum_{A\subseteq X}\left(2^{\left\lvert A\right\rvert}-1\right)$ , since each $m_{A}\in\Delta(2^{A})$ has $2^{\left\lvert A\right\rvert}-1$ free components. The single attention-index and the link function reduce the number of unknown parameters to $2^{\left\lvert X\right\rvert}-1$ . It also leads to statistical testing (i.e., we can take into account sampling variability) of known models of random consideration in a cross-section dataset of choices (see Section 5). Thus, one can confront existing models with experimental datasets in a competitive fashion to guide the exploration of models of consideration sets and to inform which models are more successful.
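The size of this reduction is easy to quantify. The sketch below reads the first count as a sum of $2^{\left\lvert A\right\rvert}-1$ free parameters over all menus $A\subseteq X$ (an interpretation of the expression above, assuming every menu is observed) and compares it with $2^{\left\lvert X\right\rvert}-1$. The function names are hypothetical.

```python
from math import comb

def n_params_unrestricted(n):
    """Sum over menus A of (2^|A| - 1), grouping menus by their size k."""
    return sum(comb(n, k) * (2 ** k - 1) for k in range(n + 1))

def n_params_attention_index(n):
    """Free parameters of a single attention-index on the power set of X."""
    return 2 ** n - 1

# Parameter counts for grand sets of size 1 to 6.
counts = {n: (n_params_unrestricted(n), n_params_attention_index(n))
          for n in range(1, 7)}
```

By the binomial theorem the unrestricted count equals $3^{n}-2^{n}$, so with five alternatives it is $211$ against $31$ under the attention-index representation.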

2.3. Characterization and Identification of the $\mathrm{L\text{-}B}$ -model

In this section, we answer the following questions: (i) When can we recover different consideration rules from the data? (ii) What are their observable implications? We answer these questions by decomposing the observed probabilities of choice $P$ into an attention rule $m$ and a distribution of preferences $\pi$ . In other words, we recover from the dataset $P$ the primitives of the $\mathrm{B}$ -rule that generated it, and provide necessary and sufficient conditions that guarantee that a dataset $P$ can be generated by a $\mathrm{L\text{-}B}$ -rule.

Our starting point is to exploit the fact that if a consideration rule $m$ admits an attention-index representation, then the probability of choosing the default alternative is completely determined by the attention-index $\eta$ . In particular, the probability of choosing the outside option is independent of the distribution of preferences due to the independence assumption that we have imposed between preferences and consideration. In addition, recall that in our model the outside option is only chosen when nothing else in the menu is considered. [Footnote 23: For an extension that relaxes this assumption see Appendix B.] If we denote $p_{o}=(p(o,A))_{A\in\mathcal{A}}$ and $\psi_{\emptyset}(\eta)=\left(\psi\left(\eta(\emptyset),\sum_{C\in g(\emptyset,A)}\eta(C)\right)\right)_{A\in\mathcal{A}}$ , abusing notation we can write the system of equations

$$p_{o}=\psi_{\emptyset}(\eta).$$

When $\psi_{\emptyset}$ is invertible, we can uniquely recover the random consideration rule from the probability of choosing the outside alternative from all different menus. Since our objective is the identification of the consideration rule, we provide a natural restriction on the attention-index rule that guarantees invertibility of $\psi$ . This restriction is satisfied by the models of interest in this paper (but not restricted to them).

Definition 5 (Totally monotone consideration).

A consideration rule $m$ admitting an attention-index representation is totally monotone if we can write, for all $A\in\mathcal{A}$

[TABLE]

where $\varphi:[0,1]\times[0,1]\to[0,1]$ is a function that is strictly monotone in each argument.

The probability of not considering any object, conditional on a given choice set, is assumed to be a monotonic function of the cumulative probability of paying attention to at least some alternative in the menu (according to $\eta$ ), and of the probability of not considering anything (unconditionally). Crucially, the dependence on the correspondence $g$ disappears in a totally monotone attention-index representation, with respect to the general attention-index representation. Note, this happens only for the case of not considering anything (i.e., the consideration set is $\emptyset$ ).

Totally monotone attention-index rules are such that the random consideration is monotone as in Cattaneo et al. (2020), namely $m_{A}(\emptyset)\leq m_{B}(\emptyset)$ if $B\subseteq A$ , when $\varphi$ is strictly increasing in the first entry. However, in this case, they imply more. Since the mapping $m_{(\cdot)}(\emptyset):\mathcal{A}\to[0,1]$ is a function of the cumulative probability of considering at least one item in any given menu (i.e., a function of $\sum_{C\subseteq A}\eta(C)$ ), the behavior of the probability of choosing the outside option will be restricted. For instance, for the case of $\mathrm{EBA}$ it will satisfy a form of marginal decreasing propensity of choice (Aguiar, 2017).

We highlight that total monotonicity is testable. Observe that the marginal probability of choosing the outside option, when a new set of alternatives is added to a menu, is weakly decreasing (e.g., $\Delta_{C}p(o,A)=p(o,A\cup C)-p(o,A)\leq 0$ and $\Delta_{D}(\Delta_{C}p(o,A))\leq 0$ ). [Footnote 24: Note, however, that this restriction does not mean that $m$ satisfies the monotonicity property for other menus different from the empty set.] Alternatively, $\varphi$ can be strictly decreasing in the first entry, in which case it provides an antithetic behavior to that of RAM (yet testable). In this sense, this restriction on attention is neither weaker nor stronger than the monotonicity restriction in Cattaneo et al. (2020).

Strict monotonicity of $\varphi$ in each of its entries implies the invertibility of $\psi$ . [Footnote 25: This is a consequence of Möbius invertibility of the mapping $v(\cdot)=\sum_{C\subseteq\cdot}\eta(C)$ (Chateauneuf and Jaffray, 1989).] This sufficient condition for invertibility of the link function $\psi$ is mild. It holds in all the examples of interest in this work. Importantly, it is a testable restriction. Note that since the inverse of $\psi$ is known under the model of interest, we can compute a candidate $\eta$ from the data $P$ . If the computed $\eta$ is not an element of the simplex $\Delta(2^{X})$ then $\psi$ is not invertible.

The next lemma shows that the models we consider admit a totally monotone attention-index representation.

Lemma 2.

Any $m\in\mathcal{M}^{\mathrm{L}}$ , $\mathrm{L}\in\{\mathrm{LA},\mathrm{MM},\mathrm{EBA},\mathrm{FC}\}$ , admits a totally monotone attention-index representation with

•

$\varphi^{\mathrm{LA}}(\eta_{o},t)=\frac{\eta_{o}}{t}$ ;

•

$\varphi^{\mathrm{EBA}}(\eta_{o},t)=1-t+\eta_{o}$ ;

•

$\varphi^{\mathrm{MM}}(\eta_{o},t)=\varphi^{\mathrm{LA}}(\eta_{o},t)$ and $\varphi^{\mathrm{MM}}(\eta_{o},t)=\varphi^{\mathrm{EBA}}(\eta_{o},t)$ ;

•

$\varphi^{\mathrm{FC}}(\eta_{o},t)=\varphi^{\mathrm{EBA}}(\eta_{o},t)$ .

The proof is omitted because of its simplicity for the cases of $\mathrm{LA}$ , $\mathrm{EBA}$ , and $\mathrm{FC}$ . For the case of $\mathrm{MM}$ the statement follows from Brady and Rehbeck (2016) and Aguiar (2017). [Footnote 26: Note that since $\mathrm{MM}$ and $\mathrm{FC}$ are special cases of $\mathrm{LA}$ and $\mathrm{EBA}$ respectively, they share the same link function. However, empirically we will be able to differentiate among them because of the additional restrictions they pose on $\eta$ .]

The key assumption in this section has been that the default that is always present is always dominated. We formulate a sensitivity analysis when the default is not dominated in Appendix B.

2.4. Characterization of the $\mathrm{L\text{-}B}$ -model

As a preliminary step for characterizing the $\mathrm{L\text{-}B}$ -model, we construct a candidate calibrated attention-index $\eta^{\mathrm{L}}$ from the data $P$ . Informally, this calibrated (revealed) attention-index is the result of inverting the link function $\varphi$ with respect to the probability of choosing the default alternative. The link function invertibility is a consequence of the monotonicity assumptions and the existence of a unique Möbius inverse of the cumulative attention-index $v(\cdot)=\sum_{C\subseteq\cdot}\eta(C)$ (Chateauneuf and Jaffray, 1989). We do this recursively. For a given $\varphi^{\mathrm{L}}$ , let $\varphi^{-1,\mathrm{L}}_{1}$ and $\varphi^{-1,\mathrm{L}}_{2}$ be the inverses of $\varphi^{\mathrm{L}}$ with respect to the first and the second argument, respectively. Let $\left\lvert A\right\rvert$ denote the cardinality of a finite set $A$ .

Definition 6 (Calibrated attention-index).

For given $P$ , $\eta^{\mathrm{L}}:2^{X}\to{\mathds{R}}$ is such that (i) $\eta^{\mathrm{L}}(\emptyset)=\varphi^{-1,\mathrm{L}}_{1}(p(o,X),1)$ , and (ii) for all $D\in 2^{X}\setminus X$

[TABLE]

The calibrated attention-index depends only on the dataset $P$ and the model $\mathrm{L}$ . If the calibrated attention-index of a set is negative, then $P$ could not have been generated by model $\mathrm{L}$ . This testable implication is analogous to the Block and Marschak (1960) inequalities.

Now, we construct an object, $m^{\mathrm{L}}$ , that is a distribution over consideration sets if the model $\mathrm{L}$ is correctly specified.

Definition 7.

For a given $P$ , let $m^{\mathrm{L}}=(m_{A}^{\mathrm{L}}(D))_{A\in\mathcal{A},D\in 2^{A}}$ , where $m^{\mathrm{L}}_{A}:2^{A}\to{\mathds{R}}$ is such that for all $A\in\mathcal{A}$ and $D\in 2^{A}$

[TABLE]

We can apply this generic formula for totally monotone attention-index rules to the specific models of interest.

Example 3.

•

$m^{\mathrm{LA}}_{A}(D)=\frac{\eta^{\mathrm{LA}}(D)}{\sum_{C\subseteq A}\eta^{\mathrm{LA}}(C)}$ , where $\eta^{\mathrm{LA}}(D)=\sum_{B\subseteq D}(-1)^{\left\lvert D\setminus B\right\rvert}\frac{p(o,X)}{p(o,B)}$ ;

•

$m^{\mathrm{MM}}_{A}(D)=\frac{\eta^{\mathrm{MM}}(D)}{\sum_{C\subseteq A}\eta^{\mathrm{MM}}(C)}$ , where $\eta^{\mathrm{MM}}(D)=\prod_{a\in X\setminus D}\left(1-\gamma^{\mathrm{MM}}(a)\right)\prod_{b\in D}\gamma^{\mathrm{MM}}(b)$ , and $\gamma^{\mathrm{MM}}:X\rightarrow{\mathds{R}}$ such that $\gamma^{\mathrm{MM}}(a)=1-\frac{p(o,A)}{p(o,A\setminus{\{a\}})}$ for some $A\in\mathcal{A}$ that contains $a$ ;

•

$m^{\mathrm{EBA}}_{A}(D)=\sum_{C:C\cap A=D}\eta^{\mathrm{EBA}}(C)$ , where $\eta^{\mathrm{EBA}}(D)=\sum_{A\subseteq D:D\in\mathcal{A}}(-1)^{\left\lvert D\setminus{A}\right\rvert}p(o,X\setminus{A})$ ;

•

$m^{\mathrm{FC}}_{A}(D)=\mathds{1}\left(\,A=D\,\right)$ .

In general, $m^{\mathrm{L}}$ may not be a distribution (some components may be negative or greater than $1$ ) since $m^{\mathrm{L}}$ is calibrated from observed frequencies. Moreover, $m^{\mathrm{LA}}$ or $m^{\mathrm{MM}}$ may not be well-defined if probabilities of choosing the outside option for some choice sets are zero.
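The calibration formulas for $\eta^{\mathrm{LA}}$ and $\eta^{\mathrm{EBA}}$ in Example 3 are Möbius inversions of the outside-option probabilities, and a round trip makes this visible. The sketch below assumes full menu variation and the convention $p(o,\emptyset)=1$ (both assumptions of this illustration), generates $p(o,\cdot)$ from a known $\eta$ under each model, and then recovers $\eta$ from it. All function names and the numerical values of `eta_true` are hypothetical.

```python
from itertools import combinations

def subsets(S):
    """All subsets of S as frozensets, including the empty set."""
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

def p_o_la(eta, X):
    """Outside-option probability under LA: eta(empty) / sum_{C subset A} eta(C)."""
    return {A: eta[frozenset()] / sum(eta[C] for C in subsets(A))
            for A in subsets(X)}

def p_o_eba(eta, X):
    """Outside-option probability under EBA: mass of categories disjoint from A."""
    return {A: sum(w for C, w in eta.items() if not (C & A)) for A in subsets(X)}

def calibrate_la(p_o, X):
    """eta^LA(D) = sum_{B subset D} (-1)^{|D minus B|} p(o,X) / p(o,B)."""
    X = frozenset(X)
    return {D: sum((-1) ** len(D - B) * p_o[X] / p_o[B] for B in subsets(D))
            for D in subsets(X)}

def calibrate_eba(p_o, X):
    """eta^EBA(D) = sum_{A subset D} (-1)^{|D minus A|} p(o, X minus A)."""
    X = frozenset(X)
    return {D: sum((-1) ** len(D - A) * p_o[X - A] for A in subsets(D))
            for D in subsets(X)}

X = frozenset({"a", "b"})
eta_true = {frozenset(): 0.1, frozenset({"a"}): 0.2,
            frozenset({"b"}): 0.3, frozenset({"a", "b"}): 0.4}  # hypothetical
eta_la = calibrate_la(p_o_la(eta_true, X), X)
eta_eba = calibrate_eba(p_o_eba(eta_true, X), X)
```

Both `eta_la` and `eta_eba` reproduce `eta_true` up to floating-point error. With sample frequencies in place of exact probabilities the same formulas can return negative entries, which is the situation described in the paragraph above.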

To be able to estimate $m^{\mathrm{L}}$ from the data with probability approaching 1, we need the following definition that formalizes the above discussion.

Definition 8 (Well-defined $m^{\mathrm{L}}$ ).

$m^{\mathrm{L}}$ is well-defined if $m_{A}^{\mathrm{L}}\in\Delta(2^{A})$ for all $A\in\mathcal{A}$ .
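In computational terms, Definition 8 asks that each calibrated $m_{A}^{\mathrm{L}}$ be a probability distribution on the subsets of $A$. A minimal check, ignoring sampling variability, could look as follows; the function name, the tolerance, and the two example inputs are assumptions made for this sketch.

```python
def is_well_defined(m, tol=1e-9):
    """Check that every m[menu] is a probability distribution on subsets of menu.

    `m` maps each menu (a frozenset) to a dict {consideration set: weight}.
    """
    for menu, dist in m.items():
        if any(not D <= menu for D in dist):       # support must lie inside the menu
            return False
        if any(w < -tol for w in dist.values()):   # no negative components
            return False
        if abs(sum(dist.values()) - 1.0) > tol:    # weights must add up to one
            return False
    return True

ab = frozenset({"a", "b"})

# Hypothetical calibrated rule that is a valid distribution.
good = {ab: {frozenset(): 0.125, frozenset({"a"}): 0.25,
             frozenset({"b"}): 0.125, ab: 0.5}}

# Hypothetical calibrated rule with a negative component.
bad = {ab: {frozenset(): 0.25, frozenset({"a"}): -0.25,
            frozenset({"b"}): 0.5, ab: 0.5}}
```

Here `is_well_defined(good)` holds and `is_well_defined(bad)` fails. A statistical test of this property has to account for estimation error in $\hat{P}$ rather than rely on a fixed tolerance.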

We are ready to state our main result.

Theorem 1.

For every link function $\mathrm{L}$ , the following are equivalent:

(i) $P$ is a $\mathrm{L\text{-}B}$ -rule;

(ii) $m^{\mathrm{L}}$ is well-defined and $P$ is a $\mathrm{B}$ -rule described by $(m^{\mathrm{L}},\pi)$ for some $\pi\in\Delta(R(X))$ .

Theorem 1 provides a full characterization of well-defined models with link functions. If $P$ is a $\mathrm{L\text{-}B}$ -rule, then $m^{\mathrm{L}}$ has to be well-defined. Theorem 1 implies that to test a given model one does not need to consider all possible distributions over consideration sets. It suffices to check the unique distribution that is calibrated from the observed $P$ according to Definition 7.

Initially, we had to find two objects (the distribution over preferences $\pi$ and the distribution over consideration sets $m$ ) to make the data consistent with the model. Now we just need to find $\pi$ . In other words, we simplified the testing problem. Unfortunately, the testing problem is still not tractable since the set of all possible distributions over preferences $\Delta(R(X))$ is big. To solve this problem we introduce another fictitious object.

Definition 9.

For given model $\mathrm{L}$ and $P$ , let $P_{\pi}^{\mathrm{L}}=(p_{\pi}^{\mathrm{L}}(a,A))_{A\in\mathcal{A},a\in X}$ , where $p_{\pi}^{\mathrm{L}}:X\times\mathcal{A}\to{\mathds{R}}$ is such that for all $A\in\mathcal{A}$ and $a\in A$

[TABLE]

Note that when $P$ has been generated by a $\mathrm{L\text{-}B}$ -rule, $P_{\pi}^{\mathrm{L}}$ corresponds to the underlying full-consideration random utility rule. In fact, we can write a $\mathrm{L}\text{-}\mathrm{B}$ model equivalently as:

[TABLE]

where $p_{\pi}(a,A)=\sum_{\succ\in R(X)}\pi(\succ)\mathds{1}\left(\,a\succ b,\>\forall b\in A\,\right)$ is the underlying $\mathrm{FC}$ distribution over (nondefault) choices that is weighted by the random consideration rule $m$ to produce the observed behavior. When $P$ has been generated by this $\mathrm{L}\text{-}\mathrm{B}$ -rule, it follows that $p_{\pi}^{\mathrm{L}}=p_{\pi}$ . That is why we call $P_{\pi}^{\mathrm{L}}$ the calibrated full consideration rule.
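The decomposition described here, with full-consideration choice probabilities weighted by the consideration rule, can be sketched numerically. The code below assumes that $p(a,A)=\sum_{D\subseteq A:\,a\in D}m_{A}(D)\,p_{\pi}(a,D)$ and $p(o,A)=m_{A}(\emptyset)$, which is one reading of the text and is labeled here as an assumption. Linear orders are encoded as tuples listed from best to worst, and all names and numbers are hypothetical.

```python
def p_full(pi, a, D):
    """Full-consideration probability that a is the best element of D under pi."""
    if a not in D:
        return 0.0
    return sum(w for order, w in pi.items()
               if all(order.index(a) <= order.index(b) for b in D))

def b_rule(pi, m_A, menu):
    """Observed choice probabilities in `menu`; the default is labeled "o"."""
    p = {a: sum(w * p_full(pi, a, D) for D, w in m_A.items() if a in D)
         for a in menu}
    p["o"] = m_A.get(frozenset(), 0.0)  # default chosen only if nothing is considered
    return p

# Hypothetical preference distribution over the two linear orders on {a, b}.
pi = {("a", "b"): 0.75, ("b", "a"): 0.25}

# Hypothetical consideration rule at the menu {a, b}.
menu = frozenset({"a", "b"})
m_A = {frozenset(): 0.125, frozenset({"a"}): 0.25,
       frozenset({"b"}): 0.125, menu: 0.5}

p = b_rule(pi, m_A, menu)
```

In this example `p` assigns $0.625$ to $a$, $0.25$ to $b$, and $0.125$ to the default. Only the term with $D=A$ involves $p_{\pi}(\cdot,A)$, which is why a positive $m_{A}(A)$ matters for recovering the full-consideration rule.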

Similar to $m^{\mathrm{L}}$ , $P_{\pi}^{\mathrm{L}}$ has an interpretation when the $\mathrm{L\text{-}B}$ -rule is consistent with the data. The next theorem provides the last missing piece of our characterization before testing.

Theorem 2.

Suppose that for a given model $\mathrm{L}$ and stochastic choice rule $P$ (i) $m^{\mathrm{L}}$ is well-defined, (ii) $m^{\mathrm{L}}_{A}(A)>0$ for all $A\in\mathcal{A}$ . Then the following are equivalent:

(i) $P$ is a $\mathrm{L\text{-}B}$ -rule;

(ii) $P_{\pi}^{\mathrm{L}}$ is a $\mathrm{FC}\text{-}\mathrm{B}$ -rule.

Note that both $m^{\mathrm{L}}$ and $P_{\pi}^{\mathrm{L}}$ can be computed from $P$ . Thus, Theorem 2 implies that to test a given model $\mathrm{L}$ it is necessary and sufficient to test whether $m^{\mathrm{L}}$ is well-defined, and whether the calibrated $P_{\pi}^{\mathrm{L}}$ is a full consideration rule. Theorems 1 and 2 provide a generalization of the characterization results in Manzini and Mariotti (2014), Brady and Rehbeck (2016), and Aguiar (2017). Moreover, they provide a unified result for all models that admit a (totally monotone) attention-index representation. [Footnote 27: Note that Theorem 2 simplifies the testing problem because it avoids the problem of computing the distribution over choices for every consideration set in every menu. We only need to focus on computing the distribution over choices in each menu.]

In practice, we do not observe $P$ , but can consistently estimate it by the collection of sample frequencies $\hat{P}$ . In Section 3.2 we discuss how to test the $\mathrm{L\text{-}B}$ -rule accounting for sampling variability in $\hat{P}$ .

2.5. Identification

Assuming independence between the distribution of preferences and the random consideration set rule, we uniquely identify the consideration set rule from $P$ if it is a $\mathrm{L\text{-}B}$ -rule, for all models with totally monotone link functions. Moreover, if there is a positive mass of individuals that consider all alternatives in the choice set, the recoverability of preferences is as good as in the case of full consideration.

Theorem 3 (Identification).

Suppose that for a given model $\mathrm{L}$ (i) $P$ is a $\mathrm{L\text{-}B}$ -rule and (ii) $m^{\mathrm{L}}_{A}(A)>0$ for all $A\in\mathcal{A}$ . If $P$ is described by $(m,\pi)$ and $(m^{\prime},\pi^{\prime})$ , then $m=m^{\prime}$ and $p_{\pi}=p_{\pi^{\prime}}$ .

We underline that we achieve a unique decomposition of the dataset $P$ into its attention and preference components. Identification of preferences and consideration rules is not a trivial task. Even for simple datasets where stochastic behavior arises from only one channel (for example limited consideration), models that only allow for stochastic behavior because of preference heterogeneity (e.g. $\mathrm{RUM}$ ), or because of random consideration/attention without additional assumptions (e.g. RAM) may fail to identify underlying preferences even when they perfectly describe observed choices. Our framework shows that the recoverability of preferences is as good as the $\mathrm{RUM}$ benchmark in stark contrast with RAM, where nothing can be learned about preferences for regular random choice.

3. Frames and Testing

3.1. Frames

In this section, we introduce another source of variation in the data–attention frames. This additional source of variation will allow us to differentiate between behavior consistent with $\mathrm{RUM}$ and behavior consistent with $\mathrm{L\text{-}B}$ . Following Salant and Rubinstein (2008), we define the extended choice set as a pair of a choice set $A\in\mathcal{A}$ and an attention frame $f\in F$ . In this extended environment, for a given frame $f\in F$ , we can define a probabilistic choice rule $p_{f}$ . Similarly, a complete stochastic choice rule with frame $f\in F$ is $P_{f}=(p_{f}(a,A))_{A\in\mathcal{A},a\in A\cup\{o\}}$ .

The elements of $F$ contain descriptions of physical items that only vary in presentation but not in the information they contain. Our interpretation of attention frames is the same as Bhattacharya et al. (2021). These descriptions are available to DMs and should not affect their preferences, but may influence their attention.

Example 4 (Stochastic choice rule with frames).

Let $X=\{a,b,c\}$ and consider two frames: (i) $f$ that describes $a=1$ token, $b=2$ tokens and $c=3$ tokens; (ii) and $f^{\prime}$ that describes $a=3-2$ tokens, $b=10-8$ tokens, and $c=2+1$ tokens. An example of an (incomplete) stochastic choice rule with frame $f$ is $P_{f}=(p_{f}(a,\{a,b\})=0,p_{f}(b,\{a,b\})=1,p_{f}(b,\{b,c\})=0,p_{f}(c,\{b,c\})=1).$ An example of an (incomplete) stochastic choice rule with frame $f^{\prime}$ is $P_{f^{\prime}}=(p_{f^{\prime}}(a,\{a,b\})=1/2,p_{f^{\prime}}(b,\{a,b\})=1/2,p_{f^{\prime}}(b,\{b,c\})=1/2,p_{f^{\prime}}(c,\{b,c\})=1/2).$

In Example 4, $f$ and $f^{\prime}$ present the same information about the same alternatives in two distinct ways. In particular, different frames correspond to different numbers of arithmetic operations used to describe the value of the option. The same value is expressed in each frame with a different sequence of arithmetic operations.

The $\mathrm{B}$ -rule with frame $f$ can be defined analogously to the definition of the $\mathrm{B}$ -rule in Section 2. A complete stochastic choice rule with frame $f$ , $P_{f}$ , is a $\mathrm{B}$ -rule if there exists a frame-dependent distribution over random consideration sets, $m_{f}$ , and a frame-dependent distribution over strict linear orders, $\pi_{f}$ , such that

[TABLE]

We take the stance that preferences should not depend on the way the information is presented. Only attention may change due to frames. We formalize these ideas by the assumption of consequentialism, which is a common implicit assumption in the RUM framework of McFadden and Richter (1990).

Definition 10 (Consequentialism).

A collection of $\mathrm{B}$ -rules $(P_{f})_{f\in F}$ described by $((m_{f},\pi_{f}))_{f\in F}$ is said to satisfy consequentialism if there exists $\pi$ such that $\pi_{f}=\pi$ for all $f\in F$ .

Consequentialism means that an attention frame does not alter the payoffs that a DM obtains from choosing a given alternative. In the same way that the standard rational choice framework imposes frame independence (Salant and Rubinstein, 2008), the classical RUM imposes consequentialism (McFadden and Richter, 1990).

At this point, it is useful to formally define the Random Utility Model ( $\mathrm{RUM}$ ) over the whole choice set $X\cup\{o\}$ . $\mathrm{RUM}$ treats the default alternative as just another item with no special status. That is, it is not assumed to be a dominated alternative. Let $R(X\cup\{o\})$ be a set of linear orders over the extended choice set $X\cup\{o\}$ .

Definition 11 (Random Utility Model, $\mathrm{RUM}$ ).

A collection of complete stochastic choice rules with frame $(P_{f})_{f\in F}$ is consistent with random utility if there exists $\pi_{o}\in\Delta(R(X\cup\{o\}))$ such that

[TABLE]

for all $a\in A$ , $A\in\mathcal{A}$ , and all $f\in F$ .

Note that $\mathrm{RUM}$ satisfies consequentialism and $\pi_{o}$ does not depend on the frame. The $\mathrm{L\text{-}B}$ -rule extends the RUM framework to allow for frame dependence only through the random consideration rule.
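Because $\pi_{o}$ in Definition 11 does not depend on the frame, a direct implication is that the choice probabilities must coincide across frames. The sketch below checks this implication on the (incomplete) stochastic choice rules of Example 4. The function name `frame_invariant` and the dictionary layout are assumptions of this illustration, and the check ignores sampling variability.

```python
def frame_invariant(P_by_frame, tol=1e-12):
    """Check that p_f(a, A) is the same for every frame f.

    `P_by_frame` maps a frame label to a dict {(alternative, menu): probability},
    with the same keys in every frame.
    """
    rules = list(P_by_frame.values())
    base = rules[0]
    return all(abs(P[key] - base[key]) <= tol
               for P in rules[1:] for key in base)

ab, bc = frozenset("ab"), frozenset("bc")

# Choice probabilities from Example 4 under frames f and f'.
P_f = {("a", ab): 0.0, ("b", ab): 1.0, ("b", bc): 0.0, ("c", bc): 1.0}
P_f_prime = {("a", ab): 0.5, ("b", ab): 0.5, ("b", bc): 0.5, ("c", bc): 0.5}

example_4_invariant = frame_invariant({"f": P_f, "f_prime": P_f_prime})
```

For the data in Example 4 the check fails, so the pair $(P_{f},P_{f^{\prime}})$ cannot be rationalized by a single frame-independent $\pi_{o}$, whereas a $\mathrm{L\text{-}B}$ -rule may attribute the difference to frame-dependent consideration.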

By assuming that preferences satisfy consequentialism, we are essentially making two assumptions: (i) we assume that the distribution of preferences does not depend on how the alternatives are presented; (ii) consideration in turn can depend on payoff irrelevant information. These assumptions are important, since our experiment has three treatments where a subject must perform a different number of arithmetic operations to evaluate the monetary value of a prize.

These assumptions are auxiliary and are not necessary for either $\mathrm{RUM}$ or $\mathrm{L\text{-}B}$ -rules. However, these assumptions are a reasonable baseline to compare $\mathrm{RUM}$ and random consideration models. The reason these assumptions are not immediately implied is that $\mathrm{RUM}$ is often viewed as descriptive. Taking this one step further, one could assume that the distribution of preferences varies with the difficulty of evaluating a task in our experiment. This would violate our assumption of consequentialism. While frame-dependent preferences could be a reasonable assumption in some settings, they would limit the ability to predict counterfactual choices. For example, one would need to identify preferences in each setting to get a prediction on choices.

Specifically, frame variation allows us to differentiate between $\mathrm{L\text{-}B}$ -rules and $\mathrm{RUM}$ as follows. Without it, $\mathrm{LA}$ neither nests nor is nested by $\mathrm{RUM}$ . For example, $\mathrm{LA}$ allows for the attraction effect, which violates regularity and is therefore inconsistent with $\mathrm{RUM}$ . Also, their intersection is nonempty because $\mathrm{MM}$ is consistent with both $\mathrm{RUM}$ and $\mathrm{LA}$ (Manzini and Mariotti, 2014, Brady and Rehbeck, 2016). Moreover, for a fixed frame $\mathrm{EBA}$ is nested in $\mathrm{RUM}$ and nests $\mathrm{MM}$ , therefore its intersection with $\mathrm{LA}$ is nonempty (Aguiar, 2017) (see Figure 2). [Footnote 28: Suleymanov (2018) shows that $\mathrm{MM}$ is the intersection of $\mathrm{LA}$ and $\mathrm{EBA}$ .] However, with frame variation this is no longer true: $\mathrm{RUM}$ and $\mathrm{EBA}$ intersect, but are not nested within each other. [Footnote 29: All these relationships are preserved when allowing for heterogeneous preferences under the independence assumption of preferences and attention. The reason is that the outside probability does not depend on the distribution of preferences. For more details see Appendix C.]

3.2.

Testing Procedure

Theorem 2 allows us to test whether, for a given frame $f$, a given stochastic choice rule $P_{f}$ is a $\mathrm{L\text{-}B}$-rule: it is necessary and sufficient to test whether $m_{f}^{\mathrm{L}}$ is well-defined (satisfies a set of linear inequalities) and $P_{f,\pi}^{\mathrm{L}}$ is consistent with the full consideration model. Note that the full consideration model is equivalent to the random utility model without an outside option. Testing for $\mathrm{RUM}$ is a well-understood problem and amounts to solving a quadratic optimization with cone constraints (see McFadden and Richter, 1990 and Kitamura and Stoye, 2018). The approach proposed by Kitamura and Stoye (2018) allows us to test these conditions while accounting for the sampling variability induced by using $\hat{P}_{f}$ instead of the unknown $P_{f}$.

We, however, need to slightly modify the testing procedure in Kitamura and Stoye (2018) to take into account the frame variation. First we describe the testing procedure for a fixed frame and then extend it to environments with frame variation.

To introduce the testing procedure, we need to define several objects. Note that, for a fixed frame $f$, the calibrated full consideration rule, $P_{f,\pi}^{\mathrm{L}}$, is a vector of length $d_{p}=\sum_{k=1}^{\left\lvert X\right\rvert}k{\left\lvert X\right\rvert\choose k}$. (Footnote 30: ${n\choose k}=\dfrac{n!}{k!(n-k)!}$ and $n!=1\cdot 2\cdot\dots\cdot n$.) The $k$-th element of $P_{f,\pi}^{\mathrm{L}}$ corresponds to some pair $(a,A)$ such that $a\in A$.

Let $B_{1}$ be the matrix of size $d_{p}\times\left\lvert X\right\rvert!$ whose $(k,l)$ element is equal to

[TABLE]

where $k$ corresponds to a pair $(a,A)$ such that $a\in A$ , and $\succ_{l}$ is $l$ -th linear order on $X$ . We define $G_{1}$ as the matrix of size $(d_{p}+d_{m})\times d_{1}$ , where $d_{1}=\left\lvert X\right\rvert!+d_{m}$ and $d_{m}=\sum_{A\subseteq X}2^{\left\lvert A\right\rvert}$ is the dimension of $m_{f}^{\mathrm{L}}$ , such that

[TABLE]

where $0_{d_{p}\times d_{m}}$ denotes the zero matrix of size $d_{p}\times d_{m}$ , and $I_{d_{m}}$ denotes the identity matrix of size $d_{m}\times d_{m}$ . The next result establishes an equivalent characterization of the $\mathrm{L\text{-}B}$ -rule via $m_{f}^{\mathrm{L}}$ and $P_{f,\pi}^{\mathrm{L}}$ . Let ${\mathds{R}}^{d_{1}}_{+}$ denote component wise nonnegative elements of the $d_{1}$ -dimensional Euclidean space ${\mathds{R}}^{d_{1}}$ .
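As a quick check on these dimensions, the following Python sketch enumerates the rows and columns of $B_{1}$ for the five nondefault alternatives used in the experiment. It assumes that the $(k,l)$ entry of $B_{1}$ is the indicator that $a$ is the best element of $A$ under $\succ_{l}$, which is the usual construction in this literature but is an assumption here because the display is not reproduced above. The names `nonempty_menus` and `build_B1` are illustrative only.

```python
from itertools import combinations, permutations
from math import comb

def nonempty_menus(X):
    """All nonempty subsets of X, as tuples, ordered by size."""
    return [c for k in range(1, len(X) + 1) for c in combinations(X, k)]

def build_B1(X):
    """Rows: pairs (a, A) with a in A. Columns: linear orders on X.
    Assumed entry: 1 if a is the best element of A under the order, else 0."""
    pairs = [(a, A) for A in nonempty_menus(X) for a in A]
    orders = list(permutations(X))  # each tuple lists X from best to worst
    B1 = [[int(min(A, key=order.index) == a) for order in orders]
          for (a, A) in pairs]
    return pairs, orders, B1

X = ("l1", "l2", "l3", "l4", "l5")
pairs, orders, B1 = build_B1(X)

# d_p = sum_k k * C(|X|, k); d_m = sum over all subsets A of X of 2^{|A|}
d_p = sum(k * comb(len(X), k) for k in range(1, len(X) + 1))
d_m = sum(2 ** k * comb(len(X), k) for k in range(len(X) + 1))
print(len(pairs), d_p, len(orders), d_m)
```

Under this reading, each column of $B_{1}$ describes one deterministic preference type, so within any menu exactly one row is switched on per column.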

Theorem 4.

For a fixed frame $f$ the following are equivalent:

(i) $P_{f,\pi}^{\mathrm{L}}$ is a $\mathrm{FC}\text{-}\mathrm{B}$-rule and $m_{f}^{\mathrm{L}}$ is well-defined;

(ii) $\inf_{v\in{\mathds{R}}^{d_{1}}_{+}}\left\lVert g_{f}^{\mathrm{L}}-G_{1}v\right\rVert=0$, where $g_{f}^{\mathrm{L}}=(P_{f,\pi}^{\mathrm{L}\prime},m_{f}^{\mathrm{L}\prime})^{\prime}$.

Proof.

See McFadden and Richter (1990) and Kitamura and Stoye (2018). ∎

Theorem 4 implies that we can test the null hypothesis that $\inf_{v\in{\mathds{R}}^{d_{1}}_{+}}\left\lVert g_{f}^{\mathrm{L}}-G_{1}v\right\rVert=0$ . Fortunately, this testing problem can be directly cast to the testing problem in Kitamura and Stoye (2018).

To take into account the frame variation and consequentialism, we need to modify the matrix $G_{1}$ . Let $d_{f}=\left\lvert F\right\rvert$ and $B$ be a matrix of size $d_{f}\cdot d_{p}\times\left\lvert X\right\rvert!$ that consists of $d_{f}$ matrices $B_{1}$ stacked together. That is, $B=\left(B_{1}^{\prime}\>B_{1}^{\prime}\>\dots\>B_{1}^{\prime}\right)^{\prime}$ . Let $d=\left\lvert X\right\rvert!+d_{f}\cdot d_{m}$ . Similarly to $G_{1}$ define,

[TABLE]

Note that when $d_{f}=1$ (i.e., the frame is fixed), then $G=G_{1}$ .
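To make the stacking concrete, here is a minimal Python sketch of how $G$ could be assembled from $B_{1}$. It assumes a block structure with $B$ (the $d_{f}$ stacked copies of $B_{1}$) in the upper-left corner, an identity matrix of size $d_{f}\cdot d_{m}$ in the lower-right corner, and zeros elsewhere. This layout is inferred from the stated dimensions and from the remark that $G=G_{1}$ when $d_{f}=1$; it is not taken from the displays. The function `build_G` and the toy matrix are hypothetical.

```python
def build_G(B1, d_m, d_f):
    """Assumed block layout: [[B, 0], [0, I]], with B = d_f stacked copies of B1."""
    n_orders = len(B1[0])
    n_m = d_f * d_m
    # Upper block: stacked copies of B1 padded with zeros for the attention part.
    top = [row + [0] * n_m for _ in range(d_f) for row in B1]
    # Lower block: zeros for the preference part, identity for the attention part.
    bottom = [[0] * n_orders + [int(i == j) for j in range(n_m)]
              for i in range(n_m)]
    return top + bottom

# Toy example (not the experiment's matrix): 2 rows, 2 linear orders.
B1_toy = [[1, 0], [0, 1]]
G = build_G(B1_toy, d_m=2, d_f=3)
print(len(G), len(G[0]))  # rows = d_f*(d_p + d_m), columns = |X|! + d_f*d_m
```

Sharing the first block of columns across frames is what imposes a common distribution of preferences, while each frame keeps its own attention coordinates.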

Corollary 1.

The following are equivalent:

(i) $(P_{f})_{f\in F}$ satisfies consequentialism; $P_{f,\pi}^{\mathrm{L}}$ is a $\mathrm{FC}\text{-}\mathrm{B}$-rule and $m_{f}^{\mathrm{L}}$ is well-defined for all $f\in F$.

(ii) $\inf_{v\in{\mathds{R}}^{d}_{+}}\left\lVert g^{\mathrm{L}}-Gv\right\rVert=0$, where $g^{\mathrm{L}}=\left((P^{\mathrm{L}}_{f,\pi})^{\prime}_{f\in F},(m_{f}^{\mathrm{L}})^{\prime}_{f\in F}\right)^{\prime}$.

4. The Experiment

Our testing approach does not have requirements in terms of repeated individual choice data. Exploiting this feature, our experiment was designed to study the performance of different theories of random consideration sets with few observations per individual. In particular, we conducted the experiment in Amazon MTurk for a large cross-section with at most two (disjoint) choice sets per individual (see Section 4.1). The large sample size of the dataset generated by our experiment is fundamental for ensuring high statistical power using the tools in Kitamura and Stoye (2018).

All sessions were run between August 25, 2018 and September 17, 2018 on the MTurk platform with surveys designed in Qualtrics. (Footnote 31: By clicking the link on the MTurk page, subjects were randomly directed to one of the treatments implemented by Qualtrics.) After completing their task, subjects were also asked to complete a short survey regarding their demographic information. Subjects were not allowed to participate in the experiment more than once. Only subjects living in the USA were recruited. We surveyed $2135$ individuals. They were paid on average $\$1.09$, consisting of a $\$0.25$ participation fee and the outcome of a randomly selected task that pays a minimum of $\$0$ and a maximum of $\$2$. All payments were made in USD. The average duration of the session was $251.68$ seconds (slightly over 4 minutes). (Footnote 32: The average duration of each task is about $23$ seconds, and the duration is significantly correlated with the length of the choice set and the frame.) This means that our average payment per hour is roughly $\$15$.

The payment in our experiment is comparable to other well-known experiments conducted in MTurk. To name a few, Horton et al. (2011) studied behavior in MTurk using games with the payment range between $\$0.40$ and $\$1.60$. They find that behavior in MTurk is consistent with behavior in the lab, where the stakes of games are ten times bigger. They also estimate the median minimum wage in MTurk as $\$0.14$ per hour. (Footnote 33: The minimum wage here refers to the reservation wage the MTurk subjects have for performing a given task.) Dean and McNeill (2014) conducted experiments of decision-making. The average payment for completing 15-min long tasks was between $\$1.35$ and $\$1.55$ including the show-up fee of $\$0.25$. Kim (2016) conducted an experiment in MTurk for several weeks with one 10-min task each week. The average earnings from each week’s task were below $\$1.00$. Rand et al. (2012) also conducted a public good game with MTurkers and the payment range was between $\$0.90$ and $\$1.50$ including the show-up fee of $\$0.50$.

4.1.

Experimental Design

In our experiment, we have two independent sources of exogenous variation: full variation in choice sets, and variation in frames. Recall that full variation in choice sets means that all possible choice sets are observed, and variation in frames means that we vary observable information without affecting the rational assessment of the alternatives. These two sources of variation allow us to test consideration-mediated choice theories in a large cross-section of heterogeneous individuals, and differentiate these theories from $\mathrm{RUM}$ . The latter is possible since consideration is allowed to change with frames, but preferences must remain stable due to consequentialism. We vary the frame through changes in the cost of consideration.

To induce preference heterogeneity, we consider lottery alternatives with different expected values and variances. Table 1 shows the alternatives and implied preference rankings if DMs are expected utility maximizers with CRRA Bernoulli utility function. The outside option is dominated for moderate levels of risk aversion (e.g., Holt and Laury, 2002). (Footnote 34: Recall that the assumption that the default is dominated is a testable assumption in our framework.) In addition, without cost of consideration treatments, the outside alternative is easier to understand than the rest because of its simplicity. Hence, it works as a consideration-reference point in the sense of Suleymanov (2018).

Let $X=\{l_{1},l_{2},l_{3},l_{4},l_{5}\}$ be the set of all nondefault alternatives, and let $o$ be the default/outside option. All menus $A\in\mathcal{A}$ are observed in the sample. The outside option is always present and is shown first, while the order of other alternatives is randomized. Menus can be thought of as different treatments.

Our primitive to test $\mathrm{L\text{-}B}$ is $\hat{P}=\left(\hat{p}(a,A)\right)_{a\in A\cup\{o\},A\in\mathcal{A}}$, therefore we proceeded with stratified sampling, setting the minimal number of observations per choice set to be proportional to its cardinality (counting the default), i.e., $n_{A}=\lambda(\left\lvert A\right\rvert+1)$ with $\lambda\geq 30$. This design requires a minimum of $\sum_{A\in\mathcal{A}}n_{A}=30\sum_{A\in\mathcal{A}}(\left\lvert A\right\rvert+1)=3330$ tasks.
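The figure of $3330$ can be verified by summing $n_{A}=\lambda(\left\lvert A\right\rvert+1)$ over the nonempty menus of the five nondefault alternatives with $\lambda=30$. The short Python check below does this by grouping menus by size; the function name `minimum_tasks` is illustrative.

```python
from math import comb

def minimum_tasks(n_alternatives, lam):
    """Sum of lam * (|A| + 1) over all nonempty menus A of the alternatives."""
    return sum(comb(n_alternatives, k) * lam * (k + 1)
               for k in range(1, n_alternatives + 1))

print(minimum_tasks(5, 30))  # 30 * (10 + 30 + 40 + 25 + 6) = 3330
```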

For each menu, the DM faced three consideration frames or cost-treatments: High (H), Medium (M), and Low (L). These frames/cost-treatments were induced by requiring a $k$-length two-digit addition/subtraction to compute each prize in the lottery. The length $k$ was set equal to $5$, $3$, and $1$ for the high, the medium, and the low cost, respectively. Since in our experiment attention frames only change the complexity of the description of lotteries, we assume that preferences of DMs do not depend on the way alternatives are described, and thus are consistent with consequentialism.

The numbers for the cognitive task were randomly generated. Examples can be seen in Figure 3. The default alternative $o$ was presented as is, and there was no need to solve an arithmetic problem to understand it across the different levels of cost.

To prevent possible learning that could attenuate consideration costs, subjects were faced with disjoint choice sets. That is, subjects were either presented with the full choice set and the outside option ($X\cup\{o\}$), or with a partition of $X$ (presented in random order), i.e., $A_{j}\cup\{o\},A_{k}\cup\{o\}$ with $A_{j}\cup A_{k}=X$ and $A_{j}\cap A_{k}=\emptyset$. Our experimental design is summarized in Figure 4.
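The disjoint-menu assignment can be illustrated with a small Python sketch that lists every way of splitting $X$ into two nonempty blocks $A_{j}$ and $A_{k}$. This is only an enumeration of the possible assignments under the stated constraints, not the randomization code used in the experiment; `two_block_partitions` is a hypothetical helper.

```python
from itertools import combinations

def two_block_partitions(X):
    """All unordered splits of X into two nonempty disjoint blocks."""
    X = tuple(X)
    first, rest = X[0], X[1:]
    out = []
    # Fixing the first element in A_j avoids listing each split twice.
    for k in range(len(rest) + 1):
        for extra in combinations(rest, k):
            A_j = (first,) + extra
            A_k = tuple(x for x in X if x not in A_j)
            if A_k:  # skip the case A_j = X
                out.append((A_j, A_k))
    return out

X = ("l1", "l2", "l3", "l4", "l5")
parts = two_block_partitions(X)
print(len(parts))
```

Together with the full menu $X$, these splits cover every nonempty menu exactly once, which is consistent with full choice set variation.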

The default alternative

Our design allows us to use $o$ as the opportunity cost of incurring the cost of consideration and understanding the other lotteries in the choice set. We use a degenerate lottery as the default due to its simplicity. In this sense, we believe the alternative $o$ in our design has effectively zero cost of consideration. Recall that for any choice set/frame the outside option is always present and shown first. Moreover, it is pre-selected as the default alternative. If the subject decides to skip the task, she is informed that $o$ will be chosen for her.

4.2.

Sample

The sample consists of 2135 individuals who selected alternatives from one or two choice sets for all costs of attention, as shown in Figure 4, for a total of 12297 observations. The number of observations per alternative/choice set is shown in Table 4. Based on these observations, the primitive for our analysis is the collection of observed frequencies $(\hat{p}(a,A))_{a\in A,\,A\in\mathcal{A}}$. We compute these frequencies for all costs. Unless otherwise stated, $\hat{p}(a,A)$ refers to the observed frequency in the data pooled across attention costs.

Figure 5 summarizes the distribution of gender, age, education, ethnicity, labor, and income in our sample. Our subjects are a diverse sample of US individuals. By design, demographics are balanced across consideration cost treatments and choice sets (that can also be thought of as treatments).

4.3.

Descriptive Analysis: Evidence for Costly Consideration and Frame Effects

In this section, we describe the behavior of individuals in our sample and present suggestive evidence that our cost treatments effectively induce costly consideration or frame effects. In particular, we observe that the consideration cost treatments/frames: (i) affect the choice frequency of the outside option; (ii) have a heterogeneous effect on the choice frequencies of all other alternatives; and (iii) affect the patterns of choice with respect to the size of the menu. Moreover, all these effects depend monotonically on the level of difficulty of choice induced by each treatment.

Under the null hypothesis of full consideration and consequentialism, the observed frequency of choice of the outside option should remain the same across frames. The reason is that the choice menu remains the same across cost treatments, and payment is at random. However, the outside option is chosen more often as the cost increases (see Figure 6). This is evidence against full consideration and in favor of frame-dependent choice.

We remind the reader here that low, medium, and high cost correspond to $1$, $3$, and $5$ arithmetic operations required to understand the monetary prize of each lottery, respectively. The monotone relation shows that our treatments were effective, and that the frequency of choice of the default is in fact ordered in the way it was expected.

Figure 7 shows the effect of the frames/cost-treatments on the choice frequencies. The harder it is to understand the lotteries, the more likely subjects opt to not consider them and instead choose the outside option. (Footnote 35: Here simplicity comes in the form of how easy (number of arithmetic operations) it is to compute expectation, variance, and expected utility of the lottery in terms of the number of prizes and whether the probabilities are uniform on the support of the lottery or not.) These results support the effectiveness of the induced treatments.

Figure 7 also shows that the effect of the cost treatment is not homogeneous across alternatives. The choice frequency of lottery $1$ (the simplest to understand after the default) increases with the cost treatment; the cost does not have a significant impact on the probability of selecting lottery $4$ ; while it has a negative impact on the other lotteries.

Overall, the probability of selecting any given lottery does not necessarily decrease with the size of the menu, suggesting that regularity is violated in our sample, as shown in Figure 8. Indeed, we test $\mathrm{RUM}$ later and confirm that these violations are significant, since we reject $\mathrm{RUM}$ at the $5$ percent significance level.
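Regularity requires that the choice probability of an alternative does not increase when the menu is enlarged, i.e., $p(a,A)\geq p(a,B)$ whenever $a\in A\subseteq B$. The following Python sketch lists raw violations of this condition in a table of choice frequencies. The numbers are made up for illustration and are not the experimental frequencies; a raw violation of this kind does not account for sampling variability, which is what the formal test addresses.

```python
def regularity_violations(p, tol=0.0):
    """p maps (alternative, frozenset menu) -> choice frequency.
    Returns triples (a, A, B) with A a proper subset of B and p(a,B) > p(a,A) + tol."""
    out = []
    for (a, A), p_small in p.items():
        for (b, B), p_large in p.items():
            if a == b and A < B and p_large > p_small + tol:
                out.append((a, A, B))
    return out

# Hypothetical frequencies (remaining mass goes to the default option).
p_toy = {
    ("a", frozenset("ab")): 0.3,
    ("b", frozenset("ab")): 0.5,
    ("a", frozenset("abc")): 0.4,  # rises when c is added: a violation
    ("b", frozenset("abc")): 0.3,
    ("c", frozenset("abc")): 0.2,
}
print(regularity_violations(p_toy))
```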

Notice that, for a fixed frame/attention cost, some lotteries are harder to compute than others. For instance, lottery $2$ can pay $30$ or $10$ tokens, while lottery $3$ can pay $50$ , $30$ , $10$ , or [math]. So for every frame, lottery $3$ is harder to compute than lottery $2$ . Our attention-index framework allows for alternatives or lotteries with heterogeneous complexity (e.g., the simplest model $\mathrm{MM}$ has a lottery-specific attention parameter). For instance, the simplest lottery $1$ is picked more often as the cost rises, while the hardest to compute lottery $5$ displays the opposite pattern. These observations confirm the presence of heterogeneous attention patterns.

4.4.

Evidence for Total Monotonic Attention

Recall that, under the null hypothesis, the dataset is generated by an attention-index model with a link function that is totally monotone. Hence, the frequency of choosing the outside option is equal to a monotone transformation of cumulative attention associated with the attention-index $\sum_{C\subseteq A}\eta(C)$ :

[TABLE]

In Figure 8, we observe that the frequency of choice of the outside option is not a decreasing function of the cardinality of the choice set. The latter is inconsistent with total monotonicity, and, thus, we may naively conclude that neither $\mathrm{RUM}$ , $\mathrm{LA}$ , nor $\mathrm{EBA}$ can explain our dataset. However, as we confirm in the next section, this slight violation of monotonicity is an artifact of sample variability, and it is not statistically significant.

Choice overload refers to the case when the propensity of not choosing any alternative (i.e., the probability of picking the default alternative) increases with the size of the choice sets (Iyengar and Lepper, 2000). Our findings are informative on whether this effect, which may be present at the individual level, still matters at the population level. Note that neither $\mathrm{RUM}$ nor any model of limited consideration that we study (including Cattaneo et al., 2020) can rationalize choice overload. (Footnote 36: Chernev et al. (2015) provide a meta-analysis of the determinants of choice overload.) Existing models of limited consideration are fundamentally at odds with choice overload, since one of the important reasons to form a consideration set is to simplify choice. (Footnote 37: Other models of stochastic choice that are not models of limited consideration usually can accommodate choice overload. See Fudenberg et al. (2015), Echenique et al. (2018), Kovach and Tserenjigmid (2018), and Natenzon (2018).) We find no statistical support for choice overload in our dataset. However, our choice set is of moderate size.

4.5.

Differentiating Between $\mathrm{LA}$ and $\mathrm{EBA}$

In Figure 8, we also plot the frequencies of choice of the nondefault alternatives and find that there are violations of total monotonicity (e.g., lottery $3$). This evidence suggests that $\mathrm{EBA}$ cannot describe this dataset because total monotonicity should hold for all alternatives (Block and Marschak, 1960, Aguiar, 2017). Because of sample variability, however, this descriptive evidence alone is not enough to conclude that $\mathrm{EBA}$ cannot describe the dataset. Nonetheless, our formal test rejects the null hypothesis that $\mathrm{EBA}$ can describe this dataset. In addition, we must highlight that $\mathrm{LA}$ is the only candidate that can accommodate the nonmonotonicity of the nondefault alternatives observed in this dataset. We confirm this insight in our formal testing section by showing that $\mathrm{LA}$ does a good job at describing this dataset.

The nonmonotonicity that we observe in Figure 8 is usually called the attraction effect (Huber et al., 1982). The attraction effect refers to the phenomenon in which adding a new alternative to the choice set boosts the choice probability of an existing item. Our findings support the presence of the attraction effect in our experimental sample.

5. Testing Random Consideration Models

In this section, we report the results of testing the ability of $\mathrm{RUM}$ , $\mathrm{LA}$ , and $\mathrm{EBA}$ to describe our experimental data. We test these models without imposing any restrictions on preferences except stability over frames. Unless otherwise stated, the tested hypothesis is that, for a particular specification of our model (consideration set stochastic rule), there exists $(m,\pi)$ that is a $\mathrm{L\text{-}B}$ representation for behavior.

5.1.

Econometric Testing

For every frame $f$, although $P_{f}$ is not observed, the realized choice frequencies $\hat{P}_{f}$ are. For every $A\in\mathcal{A}$ and $f\in F$, let $n_{f,A}$ denote the number of individuals in the sample that faced choice set $A$ and frame $f$, and let $\mathbf{a}_{i,f,A}$, $i=1,\dots,n_{f,A}$, be the observed choice of individual $i$ from choice set $A\cup\{o\}$ and frame $f$. We assume that the researcher observes a cross-section of observations (i.e., i.i.d. observations) for every menu and frame. (Footnote 38: For a given frame, this is a standard stochastic choice dataset in the literature on limited consideration.) Then we define the estimated stochastic choice rule as

[TABLE]

with $\hat{p}_{f}(a,A)=n_{f,A}^{-1}\sum_{i=1}^{n_{f,A}}\mathds{1}\left(\,\mathbf{a}_{i,f,A}=a\,\right)$ for any $a\in A\cup\{o\}$ .
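The estimator is simply the share of subjects who picked $a$ among those who faced menu $A$ under frame $f$. A minimal Python sketch of this computation is given below; the data layout (triples of frame, menu, and choice) and the label `"o"` for the default are assumptions made for the illustration.

```python
from collections import Counter, defaultdict

def estimate_choice_rule(observations):
    """observations: iterable of (frame, menu, choice), where menu is a frozenset
    of nondefault alternatives and choice is in menu or equal to 'o'.
    Returns a dict mapping (frame, alternative, menu) -> observed frequency."""
    counts = defaultdict(Counter)
    for frame, menu, choice in observations:
        counts[(frame, menu)][choice] += 1
    p_hat = {}
    for (frame, menu), c in counts.items():
        n_fA = sum(c.values())  # number of subjects facing (frame, menu)
        for a in set(menu) | {"o"}:
            p_hat[(frame, a, menu)] = c[a] / n_fA
    return p_hat

# Hypothetical observations for one menu under the high-cost frame.
menu = frozenset({"l1"})
obs = [("H", menu, "o"), ("H", menu, "l1"), ("H", menu, "o"), ("H", menu, "o")]
print(estimate_choice_rule(obs))
```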

Given the model of interest $\mathrm{L}$ and the estimator of $P_{f}$, $\hat{P}_{f}$, we can compute the estimators of $m_{f}^{\mathrm{L}}$ and $P_{f,\pi}^{\mathrm{L}}$, $\hat{m}_{f}^{\mathrm{L}}$ and $\hat{P}_{f,\pi}^{\mathrm{L}}$, using Definitions 7 and 9. (Footnote 39: To compute $\hat{P}_{f,\pi}^{\mathrm{L}}$, in our empirical application, we minimized the Euclidean distance between $\hat{P}_{f}$ and $\left(\sum_{D\subseteq A}\hat{m}_{f}^{\mathrm{L}}p_{f,\pi}(a,D)\right)_{a\in A,A\in\mathcal{A}}$ subject to $p_{f,\pi}(a,A)\geq 0$, $\sum_{a\in D}p_{f,\pi}(a,D)=1$ for all $a$ and $A$, and $p_{f,\pi}(a,D)=0$ for all $D$ and $a\not\in D$.) Given the results of Corollary 1, a natural test statistic is

[TABLE]

where $n=\min_{f}(\sum_{A}n_{f,A})$ is the smallest sample size across frames and $\hat{g}^{\mathrm{L}}=\left(\left(\hat{P}_{f,\pi}^{\mathrm{L}}\right)_{f\in F}^{\prime},\left(\hat{m}_{f}^{\mathrm{L}}\right)_{f\in F}^{\prime}\right)^{\prime}$.

Let $\hat{g}^{\mathrm{L},*}_{l}$, $l=1,\dots,L$, be bootstrap replications of $\hat{g}^{\mathrm{L}}$. Let $\tau_{n}\geq 0$ be a tuning parameter and $\iota$ be a vector of ones of dimension $d$. (Footnote 40: In our empirical application we conducted tests for different values of $\tau_{n}$ (e.g., $\tau_{n}=\sqrt{\dfrac{\log(\min_{f,A}n_{f,A})}{\min_{f,A}n_{f,A}}}$ following Kitamura and Stoye (2018), and $\tau_{n}=0$). The results are qualitatively the same.) To compute the critical values of $\mathrm{T}_{n}$ we follow the bootstrap procedure proposed in Kitamura and Stoye (2018):

(i) Compute $\hat{\eta}_{\tau_{n}}=Gv_{\tau_{n}}$, where $v_{\tau_{n}}$ solves

[TABLE]

(ii) Compute the bootstrap test statistic

[TABLE]

(iii) Use the empirical distribution of the bootstrap statistic to compute critical values of $\mathrm{T}_{n}$.

For a given significance level $\alpha\in(0,1/2)$, the decision rule for the test is “reject the null hypothesis if $\mathrm{T}_{n}>\hat{c}_{1-\alpha}$”, where $\hat{c}_{1-\alpha}$ is the $(1-\alpha)$-quantile of the empirical distribution of the bootstrap statistic.
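The decision rule can be sketched in Python as follows, taking the test statistic and the bootstrap statistics as already computed. The quantile convention (the smallest order statistic with at least a $1-\alpha$ share of draws at or below it) and the p-value convention (share of bootstrap draws at least as large as $\mathrm{T}_{n}$) are assumptions of this sketch, and the numbers are hypothetical.

```python
import math

def bootstrap_decision(t_n, t_boot, alpha=0.05):
    """Return (critical value, p-value, reject?) from bootstrap draws of the statistic."""
    ordered = sorted(t_boot)
    # Empirical (1 - alpha)-quantile of the bootstrap distribution.
    idx = max(math.ceil((1 - alpha) * len(ordered)) - 1, 0)
    crit = ordered[idx]
    # Share of bootstrap draws at least as large as the observed statistic.
    p_value = sum(t >= t_n for t in ordered) / len(ordered)
    return crit, p_value, t_n > crit

# Hypothetical values, only to show the mechanics.
crit, p_value, reject = bootstrap_decision(3.5, [1.0, 2.0, 3.0, 4.0], alpha=0.25)
print(crit, p_value, reject)
```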

We would like to conclude this section by observing that we can test the model conditional on additional observables (e.g., gender, income brackets, and education level). For discrete (or discretized) covariates one just needs to perform the test for a subgroup of the population.

5.2.

Survival Race of the $\mathrm{L\text{-}B}$-rule vs. $\mathrm{RUM}$: Stability of Preferences

Without frame variation, many models of consideration are empirically indistinguishable from $\mathrm{RUM}$ . For instance, if a dataset is consistent with $\mathrm{EBA}$ or $\mathrm{MM}$ , for a fixed frame, then it is also consistent with $\mathrm{RUM}$ . However, these models and $\mathrm{RUM}$ will typically recover a distinct distribution of preferences. Varying frames, we can test whether the distribution of preferences remains the same across frames.

The $\mathrm{L\text{-}B}$-model assumes that the distribution of preferences in the population is independent of the consideration rule. In our experiment, the choice sets faced by any subject are exactly the same for the three consideration cost treatments. Given our pay-at-random incentive scheme, choices from each choice set can be considered as i.i.d. draws from the underlying random utility distribution under the null hypothesis of stochastic rationality. Therefore, the independence assumption together with our experimental design implies that if one of the $\mathrm{L\text{-}B}$ theories describes the behavior of the high-cost treatment, it must also describe the behavior of the low-cost treatment. That is, if the independence assumption holds, then the distribution of preferences, $\pi$, should be invariant to changes in consideration costs for theories that we cannot reject. We check the validity of the different models of interest under this preference stability restriction.

We apply the procedure described in Section 5.1 to test whether the $\mathrm{RUM}$, $\mathrm{LA}$, and $\mathrm{EBA}$ models can explain the data with the restriction that the distribution of preferences remains frame-independent (i.e., consequentialism). The results of testing are presented in Table 2. In this table, we report the values of the test statistic and the corresponding p-values coming from the bootstrap distribution ($1000$ bootstrap replications for every test statistic were conducted) for different models. (Footnote 41: The p-value is interpreted as the probability of observing a realization of the test statistic that is above the one that is actually observed due to sample variability, if the null hypothesis is indeed correct. Then, the smaller the p-value is, the more evidence the researcher has to reject the hypothesis of the validity of a given model.) First, we strongly reject $\mathrm{RUM}$ at any reasonable significance level. In other words, for $\mathrm{RUM}$ we reject the hypothesis that the same distribution of preferences can rationalize behavior across consideration cost treatments. In contrast, we cannot reject the $\mathrm{LA}$ model at any standard significance level. In addition, we reject the hypothesis that $\mathrm{EBA}$ explains the population behavior under preference stability. Note that we can discriminate between $\mathrm{RUM}$ and $\mathrm{EBA}$ because of variation in frames: $\mathrm{EBA}$ is more general than $\mathrm{RUM}$ because it allows for flexible attention per frame. So the rejection of $\mathrm{EBA}$ with stable preferences does not follow from the rejection of $\mathrm{RUM}$. Taken together, our results show that our experimental subjects behave as if they are maximizing their preferences given a consideration set that follows the $\mathrm{LA}$ restriction.

5.3.

Discussion

Our findings strongly support the hypothesis that the population behaves as if it is consistent with the $\mathrm{LA}$ model of limited consideration and has a stable distribution of preferences across frames. (Footnote 42: We also rejected the hypothesis that the $\mathrm{LA}$ model with homogeneous preferences can explain the data. See Appendix F.) All frame effects observed in our descriptive analysis are fully captured by the variation in the random consideration rule that changes conditional on the frame. In contrast, the traditional $\mathrm{RUM}$ fails to describe the population behavior. To confirm that our testing procedure has power to reject $\mathrm{LA}$ when it is false, in Appendix E we assess the performance of our procedure using Monte Carlo simulations. In particular, we show that our test can reject the false null hypothesis of data being consistent with the $\mathrm{LA}$ model with high frequency in finite samples that are comparable to our experiment.

We highlight that our analysis cannot exclude the possibility that other models of behavior could also explain the population behavior in our experiment. We have only established that the population behavior is as if it is consistent with a $\mathrm{L\text{-}B}$ -rule.

Although we do not impose any restrictions on preferences (e.g., expected utility), our results relate to the work of Freeman et al. (2018). They provide an alternative mechanism for the selection of a riskless lottery (default) over dominant risky choices from pairwise comparisons, when binary choice sets are presented as lists. They propose a theoretical explanation of the selection of the riskless option with a model of reference dependence. The class of reference dependence models used by these authors is a special case of utility maximization. Recall that we find evidence against $\mathrm{RUM}$ in our experiment, thus ruling out the mechanism of Freeman et al. (2018) for our environment with costly consideration. In addition, in our experimental design, subjects are neither required to choose from lists nor restricted to pairwise comparisons. For an extended discussion of the role of misperception see Appendix C.3.

We have maintained the assumption that the default alternative is also the worst alternative for both the $\mathrm{LA}$ and $\mathrm{EBA}$ models. However, $\mathrm{RUM}$ allows the default to be ranked arbitrarily. Hence, the main findings that $\mathrm{LA}$ explains the dataset and $\mathrm{RUM}$ fails to do the same are robust to this assumption. We leave it as an open question whether $\mathrm{EBA}$ can explain this dataset if this assumption is relaxed. Nevertheless, we believe that this assumption is reasonable in our experimental setup.

We have done our empirical analysis without conditioning on observable heterogeneity (e.g., age or gender). Attention and preferences may differ across different demographic groups. Methodologically, our tools can be applied after conditioning on observable heterogeneity, as explained in Kitamura and Stoye (2018). The study of consideration set rules and their relation to demographics is beyond the scope of this paper.

It is noteworthy that if individuals have convex risk preferences to mix between lotteries, repetition of discretized choice tasks may make it hard to estimate risk preferences. Feldman and Rehbeck (2020) show that subjects who mix between lotteries in convex budget sets are more likely to randomize choices in a repeated discretized task. Our experiment, which uses different cost-treatments and randomizes the order of lotteries in presentation, may alleviate such concerns compared to previous experiments in which the same set of lotteries in a fixed order was repeatedly presented to subjects. (Footnote 43: We thank an anonymous referee for pointing out this issue.)

We finish this section by discussing our model and our findings in relation to Rational Inattention (RI) models. Caplin et al. (2019) show that rationally inattentive DMs form (deterministic) consideration sets. Generally, RI primitives cannot be point-identified with standard stochastic choice datasets. Nonetheless, RI models may still have testable implications in standard stochastic choice datasets. In Appendix C.4, we show that a representative RI DM is compatible with deterministic consideration sets (i.e., the presence of zero probability of choice), which is not supported in our data. The case of a population of heterogeneous rationally inattentive DMs and the aggregation of such behavior in the population is left for future research.

6. Conclusion

We have designed and implemented a novel experiment with a large sample that allowed us to statistically discern among competing models of population behavior. By exogenously varying choice sets and the frames induced by the cost of considering alternatives, we can disentangle two sources of stochastic behavior: limited consideration and preference heterogeneity. We use this novel dataset to test $\mathrm{RUM}$ and two models of limited consideration, $\mathrm{LA}$ and $\mathrm{EBA}$ .

These models provide testable implications on choices that uniquely identify the stochastic consideration set rule from data. By calibrating consideration given the theory, we show that testing the $\mathrm{L\text{-}B}$-model can be cast into the framework of Kitamura and Stoye (2018) for testing $\mathrm{RUM}$. That is, we show that there exists a stochastic rule (computed from data) that is consistent with $\mathrm{RUM}$ if and only if observed choices are generated by a population of individuals consistent with the $\mathrm{L\text{-}B}$-model.

We provide evidence against classical $\mathrm{RUM}$, since consideration costs are binding for some individuals in the population. In contrast, we find support for the $\mathrm{LA}$ model with heterogeneous preferences. Crucially, we cannot reject that the distribution of preferences implied by $\mathrm{LA}$ is the same across all attention frames. This means that once we disentangle attention and preferences under $\mathrm{LA}$, the recovered distribution of preferences does not change with the frame.

Appendix A Proofs

A.1.

Proof of Lemma 1

We define $m_{A}(\{a\})=p(a,A)$ , and $m_{A}(D)=0$ for all $D\subseteq A$ , $D\neq\{a\}$ . Let $\tilde{\pi}\in\Delta(R(X))$ be the uniform distribution. The pair $((m_{A})_{A\in\mathcal{M}},\tilde{\pi})$ is a $\mathrm{B}$ -rule. We now prove that it generates any data $P$ . By definition if $P$ can be generated by a $\mathrm{B}$ -rule, then we have that

[TABLE]

for all $A$ and $a\in A$ . Rearranging and replacing the choice of $\tilde{\pi}$ in the above equation we get that

[TABLE]

For given $\succ$ and $m_{A}(\{a\})=p(a,A)$ , we have

[TABLE]

because $\succ$ includes the diagonal $a\succ a$ for all $a\in X$ .

This implies that

[TABLE]

given that

[TABLE]
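As an informal check of the construction in this proof, the following Python sketch builds the singleton consideration rule $m_{A}(\{a\})=p(a,A)$ for a small hypothetical dataset and confirms that it reproduces the data under two different preference distributions. The menu, the probabilities, and the helper names are illustrative assumptions, and the residual mass on the empty consideration set is taken to be $p(o,A)$.

```python
from itertools import permutations

# Hypothetical data on one menu A = {x, y, z}; the default o gets the residual.
A = ("x", "y", "z")
p = {"x": 0.5, "y": 0.2, "z": 0.1}           # p(a, A), assumed values
p_default = 1.0 - sum(p.values())             # p(o, A)

# Singleton consideration rule from the proof: m_A({a}) = p(a, A).
m = {frozenset([a]): p[a] for a in A}
m[frozenset()] = p_default                    # assumption: empty set carries p(o, A)

def best(order, D):
    """Top-ranked element of D under a linear order given as a tuple (best first)."""
    return min(D, key=order.index)

def generated(a, pi):
    """Choice probability of a implied by the B-rule (m, pi)."""
    return sum(
        mass * sum(w for order, w in pi.items() if best(order, D) == a)
        for D, mass in m.items() if a in D
    )

orders = list(permutations(A))
uniform_pi = {order: 1.0 / len(orders) for order in orders}
degenerate_pi = {orders[0]: 1.0}

reproduced_uniform = {a: generated(a, uniform_pi) for a in A}
reproduced_degenerate = {a: generated(a, degenerate_pi) for a in A}
```

Because every consideration set with positive mass is a singleton, the preference distribution never matters, which is why the same data are reproduced under both distributions.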

A.2.

Proof of Theorem 1

(i) implies (ii). A complete stochastic choice rule $P$ is a $\mathrm{B}$ -rule if there exists a pair $(m,\pi)$ such that

[TABLE]

for all $a\in X$ and $A\in\mathcal{A}$ , where we exchanged the summation operator with respect to the consideration sets and the linear orders exploiting independence.

Note that we can write the probability of the default alternative as $p(o,A)=1-\sum_{a\in A}p(a,A)$ . This implies that

[TABLE]

where the summation operator with respect to the items $a\in A$ can be exchanged with the summation over consideration sets. This is possible because the latter summation does not depend on the items $a\in A$ .

Now, we notice that $\sum_{a\in A}\sum_{\succ\in R(X)}\pi(\succ)\mathds{1}\left(\,a\succ b,\>\forall\>b\in D\,\right)=1$ for all $D\subseteq A$ . This implies that the default probability does not depend on the distribution of preferences and can be written in terms of the cumulative distribution of the consideration set distribution:

[TABLE]

We let the capacity $\varphi^{*}:2^{X}\to[0,1]$ be defined by $\varphi^{*}(A)=p(o,A)$ .

The fact that $\eta=\eta^{\mathrm{L}}$ under the correct specification of the link function follows from our monotonicity assumptions and the existence of a unique Möbius inverse of the mapping $v(\cdot)=\sum_{C\subseteq\cdot}\eta(C)$ (Shafer, 1976; Chateauneuf and Jaffray, 1989). We provide specific derivations for each of the models of interest in this paper, to connect them with the existing literature, but they follow directly from the general $\eta^{\mathrm{L}}$ formula.

For given $\mathrm{L}\in\{\mathrm{LA},\mathrm{MM},\mathrm{EBA},\mathrm{FC}\}$ and $P$ :

• If $m\in\mathcal{M}^{\mathrm{LA}}$ , then $m_{A}(D)=\frac{\eta(D)}{\sum_{C\subseteq A}\eta(C)}$ for some $\eta\in\Delta(2^{X})\cap{\mathds{R}}_{++}$ .

This means that $\frac{\varphi^{*}(X)}{\varphi^{*}(A)}=\sum_{D\subseteq A}\eta(D)$ . Then by Shafer (1976) it must be that

[TABLE]

• If $m\in\mathcal{M}^{\mathrm{MM}}$ , then $m_{A}(D)=\frac{\eta(D)}{\sum_{C\subseteq A}\eta(C)}$ for some $\eta\in\Delta(2^{X})\cap{\mathds{R}}_{++}$ with

[TABLE]

and $\gamma:X\rightarrow(0,1)$ . This implies by simple computation that

[TABLE]

for some $A\in\mathcal{A}$ that contains $a$ ;

• If $m\in\mathcal{M}^{\mathrm{EBA}}$ , then $m_{A}(D)=\sum_{C:C\cap A=D}\eta(C)$ for some $\eta\in\Delta(2^{X})$ . Then

[TABLE]

Using Shafer (1976) and Chateauneuf and Jaffray (1989) we conclude that

[TABLE]

• If $m$ is $\mathrm{FC}$ , then $m_{A}(D)=\mathds{1}\left(\,A=D\,\right)$ .
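To make the LA and EBA representations in the list above concrete, here is a minimal Python sketch that maps a hypothetical attention index $\eta$ on $2^{X}$ into the LA rule $m_{A}(D)=\eta(D)/\sum_{C\subseteq A}\eta(C)$ and the EBA rule $m_{A}(D)=\sum_{C:C\cap A=D}\eta(C)$. The three-item universe, the values of $\eta$, and the function names are assumptions chosen for illustration.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of items as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

X = frozenset("abc")
# Hypothetical strictly positive attention index on 2^X that sums to one.
weights = {D: 1.0 + len(D) for D in subsets(X)}
total = sum(weights.values())
eta = {D: w / total for D, w in weights.items()}

def m_la(A):
    """LA rule: renormalize eta over the subsets of the menu A."""
    denom = sum(eta[C] for C in subsets(A))
    return {D: eta[D] / denom for D in subsets(A)}

def m_eba(A):
    """EBA rule: D is considered when the drawn set C intersects A in D."""
    return {D: sum(eta[C] for C in subsets(X) if C & A == D) for D in subsets(A)}

menu = frozenset("ab")
la, eba = m_la(menu), m_eba(menu)
```

Under both rules the masses over $D\subseteq A$ sum to one, and on the full menu $X$ each rule returns $\eta$ itself. The two rules differ on smaller menus: LA rescales the index, while EBA pools every drawn set that has the same intersection with the menu.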

To establish that $m=m^{\mathrm{L}}$ for given $\mathrm{L}\in\{\mathrm{LA},\mathrm{MM},\mathrm{EBA},\mathrm{FC}\}$ and $P$ , we exploit the uniqueness of $m$ , which is a consequence of the invertibility of the Möbius transform and the completeness of $P$ . In particular, if $(m,\pi)$ and $(m^{\prime},\pi)$ represent the same $P$ then it must be that $m^{\prime}=m$ for the cases of $\mathrm{L}\in\{\mathrm{LA},\mathrm{MM},\mathrm{EBA},\mathrm{FC}\}$ . To see that this is true recall that if $P$ is a $\mathrm{L\text{-}B}$ -rule with $(m,\pi)$ , then $1-\sum_{D\subseteq A,D\neq\emptyset}m_{A}(D)=\varphi^{*}(A)$ . The same identity holds when preferences are homogeneous, that is, when there is a linear order $\succ\in R(X)$ such that $\pi(\succ)=1$ . Since this equivalence does not depend on the distribution of preferences and due to the completeness of the dataset, we can use this fact to apply known results from the consideration set literature regarding the uniqueness of $m$ .

Now, by the Möbius inverse, it follows that

[TABLE]

for all $D\in 2^{X}$ . In particular,

• By Theorem $3.1$ in Brady and Rehbeck (2016), it must be that $m$ is uniquely identified by

[TABLE]

where $\eta\in\Delta(2^{X})\cap{\mathds{R}}_{++}$ follows from the requirement that $\sum_{B\subseteq D}(-1)^{\left\lvert D\setminus B\right\rvert}\frac{p(o,X)}{p(o,B)}>0$ for all $D\in 2^{X}$ .

• Given $\gamma^{\mathrm{MM}}(a)=1-p(o,\{a\})\in(0,1)$ for all $a\in X$ (which is well-defined by the completeness of $P$ ) and $\eta^{\mathrm{MM}}(D)=\prod_{a\in X\setminus D}\left(1-\gamma^{\mathrm{MM}}(a)\right)\prod_{b\in D}\gamma^{\mathrm{MM}}(b)$ , it follows that $m$ is uniquely identified by

[TABLE]

for all $A\subseteq D$ . Note that $\prod_{b\in\emptyset}\gamma^{\mathrm{MM}}(b)=1$ by convention. Also observe that uniqueness follows from Theorem $3.3$ in Brady and Rehbeck (2016) since the $\mathrm{MM}$ restriction is a special case of the $\mathrm{LA}$ restriction.

• Given $\eta^{\mathrm{EBA}}(D)=\sum_{A\subseteq D:D\in\mathcal{A}}(-1)^{\left\lvert D\setminus{A}\right\rvert}(1-p(o,X\setminus{A}))\geq 0$ it follows by Theorem $1$ in Aguiar (2017) that $m$ is uniquely identified by

[TABLE]

for all $D\subseteq A$ , where $D\neq\emptyset$ and $m_{A}(\emptyset)=1-\sum_{D\subseteq A,D\neq\emptyset}m_{A}(D)$ .

• The case of $\mathrm{FC}$ is trivial.
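The calibration of the LA attention index from default probabilities can be sketched in Python. The example below generates $p(o,A)$ from an MM rule, using $p(o,A)=\prod_{a\in A}(1-\gamma(a))$ (which follows from the product form of $\eta^{\mathrm{MM}}$ together with the LA normalization and agrees with $\gamma^{\mathrm{MM}}(a)=1-p(o,\{a\})$ ), applies the Möbius inversion $\eta(D)=\sum_{B\subseteq D}(-1)^{\left\lvert D\setminus B\right\rvert}p(o,X)/p(o,B)$ , and checks that the product form is recovered. The values of $\gamma$ and all function names are hypothetical.

```python
from itertools import chain, combinations
from math import prod

def subsets(items):
    """All subsets of items as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

X = frozenset("abc")
gamma = {"a": 0.3, "b": 0.5, "c": 0.8}        # hypothetical attention parameters

def p_default(A):
    """Default probability under MM: no item in A is considered."""
    return prod(1.0 - gamma[a] for a in A)    # empty product is 1 when A is empty

def eta_la(D):
    """Moebius inverse of the map A -> p(o, X) / p(o, A), evaluated at D."""
    return sum((-1) ** len(D - B) * p_default(X) / p_default(B) for B in subsets(D))

def eta_mm(D):
    """Product form of the MM attention index."""
    return prod(gamma[b] for b in D) * prod(1.0 - gamma[a] for a in X - D)

calibrated = {D: eta_la(D) for D in subsets(X)}
gamma_from_data = {a: 1.0 - p_default(frozenset([a])) for a in X}
```

In this sketch the calibrated index is strictly positive and sums to one, consistent with the requirement $\eta\in\Delta(2^{X})\cap{\mathds{R}}_{++}$ , and it coincides with the MM product form, illustrating that the MM restriction is a special case of the LA restriction.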

A.3.

Proof of Theorem 2

(i) implies (ii). If $P$ is a $\mathrm{L\text{-}B}$ -rule then by Theorem 1, under conditions (i) and (ii), it must be that

[TABLE]

where $p_{m,\pi}(a,A)=\sum_{D\subseteq A}m_{A}^{\mathrm{L}}(D)[\sum_{\succ\in R(X)}\pi(\succ)\mathds{1}\left(\,a\succ b,\>\forall\>b\in D\,\right)]$ . Following the recursive formula, we can show that

[TABLE]

This implies that $P^{\mathrm{L}}$ is a $\mathrm{FC}\text{-}\mathrm{B}$ -rule.
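One way to read the step from $P$ to $P^{\mathrm{L}}$ is as a recursion over menu size. The Python sketch below generates choice data from a hypothetical pair $(m,\pi)$ with an LA consideration rule, then inverts $p(a,A)=\sum_{D\subseteq A,\,a\in D}m_{A}(D)\,p_{\pi}(a,D)$ by solving for the term with $D=A$ . This inversion is an assumption about the form of the recursive formula (Definition 9 is not reproduced in this appendix), and it requires $m_{A}(A)>0$ . All primitives and names are illustrative.

```python
from functools import lru_cache
from itertools import chain, combinations, permutations

def subsets(items):
    """All subsets of items as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

X = frozenset("abc")
# Hypothetical primitives: an LA attention index and a distribution over orders.
raw = {D: 1.0 + len(D) for D in subsets(X)}
raw_total = sum(raw.values())
eta = {D: w / raw_total for D, w in raw.items()}
orders = list(permutations(sorted(X)))
pi = {order: (i + 1) / 21.0 for i, order in enumerate(orders)}  # weights 1..6 over 21

def m(A, D):
    """LA consideration probability of D in menu A."""
    return eta[D] / sum(eta[C] for C in subsets(A))

def p_full(a, D):
    """Full-consideration probability that a is the best element of D."""
    return sum(w for order, w in pi.items() if min(D, key=order.index) == a)

def p_data(a, A):
    """Observed choice probability generated by (m, pi)."""
    return sum(m(A, D) * p_full(a, D) for D in subsets(A) if a in D)

@lru_cache(maxsize=None)
def recover(a, A):
    """Back out the full-consideration probability from the data, smaller menus first."""
    lower = sum(m(A, D) * recover(a, D) for D in subsets(A) if a in D and D != A)
    return (p_data(a, A) - lower) / m(A, A)

recovered = {(a, A): recover(a, A) for A in subsets(X) for a in A}
```

In this example the recovered rule coincides with the full-consideration rule implied by $\pi$ on every menu, so it is a rule with full consideration by construction.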

(ii) implies (i). Under conditions (i) and (ii), the fact that

[TABLE]

implies that for all $A\in\mathcal{A}$ and all $a\in A$ ,

[TABLE]

If $P^{\mathrm{L}}$ is a $\mathrm{FC}\text{-}\mathrm{B}$ -rule, then it implies that there exists $\pi\in\Delta(R(X))$ such that

[TABLE]

Hence, $P$ is a $\mathrm{L\text{-}B}$ -rule. In fact, for all $A\in\mathcal{A}$ and all $a\in A$ , it must be that the pair $(m^{\mathrm{L}},\pi)$ generates the dataset $P$ :

[TABLE]

A.4.

Proof of Theorem 3

We first prove that if $P$ is described by $(m,\pi)$ and $(m^{\prime},\pi^{\prime})$ , then it must be that $m=m^{\prime}$ . This follows from Chateauneuf and Jaffray (1989). In particular, Brady and Rehbeck (2016) show the identification results for $\mathrm{L}=\mathrm{LA}$ , while Aguiar (2017) provides identification results for $\mathrm{L}=\mathrm{EBA}$ . For $\mathrm{L}=\mathrm{MM}$ the result holds trivially.

Fixing $m$ , if $P$ is described by both $(m,\pi)$ and $(m,\pi^{\prime})$ , then

[TABLE]

and

[TABLE]

for all $a\in A$ and nonempty $A\subseteq X$ , which follows from Definition 9. By condition (ii), $m_{A}^{\mathrm{L}}(A)>0$ and using the recursive definitions above for binary sets, we can see that $p_{\pi}^{\mathrm{L}}(a,\{a,b\})=p_{\pi^{\prime}}^{\mathrm{L}}(a,\{a,b\})$ for any $a,b\in X$ . For a fixed $m$ the recursive formula leads to the equivalence $p^{\mathrm{L}}_{\pi^{\prime}}=p^{\mathrm{L}}_{\pi}$ .

Appendix B Sensitivity Analysis for the Default

One key assumption in our setup is that the outside option is only picked under full consideration if the choice set only contains the outside option. In our notation, we write this as $p_{\pi}(o,\emptyset)=1$ and $p_{\pi}(o,A)=0$ for all $A\neq\emptyset$ . In this section, we relax this assumption to allow the probability of choosing the default to satisfy $p_{\pi}(o,\emptyset)=1$ and $p_{\pi}(o,A)=\frac{e}{\left\lvert A\right\rvert}$ with $e\in[0,1)$ for all $A\neq\emptyset$ . The sensitivity parameter $e$ is interpreted as the fraction of DMs that choose the dominated default even when there are other available alternatives. This is a violation of the assumption that the default is the worst alternative. It implies that, under the null of consistency with the $\mathrm{B}$ -rule, the probability of choosing the default is:

[TABLE]

This assumption is compatible with $\mathrm{RUM}$ and implies that the probability of $o$ being the best alternative in all linear orders is constant across menus under full consideration. (Footnote 44: Indeed, this assumption allows the default to be chosen when compared to other alternatives. This is achieved by putting a mass (equal to $e/\left\lvert A\right\rvert$ ) on the event that $o$ is the first among all alternatives in a given menu. This is a restriction on the distribution of preferences.) Notice that under this assumption we can calibrate $m_{A}(\emptyset)$ for all $A\in\mathcal{A}$ :

[TABLE]

Then for a given sensitivity parameter $e\in[0,1)$ , we can calibrate the empirical attention-index $\eta^{\mathrm{L}}$ without any changes. Next, we can compute $p_{\pi}$ given the calibrated $m^{\mathrm{L}}$ for $A\cup\{o\}$ , and we can test $\mathrm{RUM}$ here using the tools we have developed. In our current empirical results, since the $\mathrm{LA}$ model passes for $e=0$ , it is unnecessary to do this sensitivity analysis.

Appendix C Comparison with Models of Stochastic Choice

In this appendix we analyze the connection between the three consideration-mediated choice theories discussed in this paper and models that allow for stochastic behavior exclusively in preferences or in consideration.

C.1.

Comparison with RUM

As explained previously, randomness arising from limited consideration as in $\mathrm{EBA}$ and $\mathrm{MM}$ can be rationalized under the umbrella of random utility for a fixed frame. However, $\mathrm{LA}$ allows for behavior that is inconsistent with regularity. Therefore $\mathrm{LA}$ is not nested in $\mathrm{RUM}$ . By construction our model $\mathrm{L\text{-}B}$ generalizes $\mathrm{FC}$ by allowing for independent variation in choices due to limited consideration. In particular, $\mathrm{L\text{-}B}$ is $\mathrm{RUM}$ defined over $X$ (what we call, equivalently, $\mathrm{FC}$ ) when the consideration rule is such that $m_{A}(D)=\mathds{1}\left(\,D=A\,\right)$ .

Moreover, the $\mathrm{L\text{-}B}$ model is more general than $\mathrm{RUM}$ . This follows from the analysis in previous sections. In particular, fixing preferences, $\pi(\succ_{i})=\mathds{1}\left(\,\succ_{i}=\succ\,\right)$ for $\succ_{i}\in R(X)$ , $\mathrm{L\text{-}B}$ with $\mathrm{L}=\mathrm{LA}$ reduces to the original $\mathrm{LA}$ model of Brady and Rehbeck (2016) and is therefore potentially inconsistent with $\mathrm{RUM}$ . Of course, when we add frame variation we can distinguish between $\mathrm{EBA}$ and $\mathrm{RUM}$ .

C.2.

Comparison to RAM

Cattaneo et al. (2020) extend many theories of consideration by proposing the Random Attention Model (RAM). The authors allow for random consideration maps in the context of limited attention models. RAM abstracts away from the particular consideration-set-formation rule by considering a class of nonparametric random attention rules. The authors acknowledge that RAM is best suited for eliciting information about the preference ordering of a single decision-making unit when her choices are observed repeatedly, which justifies the preference homogeneity assumption in their setting.

Many of the canonical models of limited attention proposed in the literature satisfy the Monotonic Attention property of Cattaneo et al. (2020). For instance, RAM nests $\mathrm{LA}$ , $\mathrm{MM}$ and $\mathrm{EBA}$ without preference heterogeneity, among other salient models of consideration sets. Additionally, RAM is a strict generalization of $\mathrm{RUM}$ . However, our $\mathrm{L\text{-}B}$ is not nested in RAM. See Cattaneo et al. (2020) for a complete description of the relationship of RAM to the literature.

Here we show that, in the presence of preference heterogeneity, RAM may fail to rationalize behavior that can be explained by $\mathrm{L\text{-}B}$ . First, we formally define the restrictions imposed by RAM.

RAM imposes a *monotonic attention* restriction on consideration rules: the probability of paying attention to a particular subset does not decrease when the total number of possible consideration sets decreases. Formally,

Definition 12 (Monotonic Attention).

For any $a\in A\setminus D$ , $m_{A}(D)\leq m_{A\setminus\{a\}}(D)$ .
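As an illustration of Definition 12, the following Python sketch checks monotonic attention for an LA consideration rule built from a hypothetical attention index. Under LA, removing an item from the menu shrinks the normalizing sum, so $m_{A\setminus\{a\}}(D)\geq m_{A}(D)$ . The values of $\eta$ and the helper names are assumptions made for the example.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of items as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

X = frozenset("abc")
# Hypothetical positive weights, normalized below into an attention index.
raw = {frozenset(): 3.0, frozenset("a"): 1.0, frozenset("b"): 2.0, frozenset("c"): 1.0,
       frozenset("ab"): 4.0, frozenset("ac"): 1.0, frozenset("bc"): 2.0,
       frozenset("abc"): 6.0}
raw_total = sum(raw.values())
eta = {D: w / raw_total for D, w in raw.items()}

def m_la(A, D):
    """LA consideration probability of D in menu A."""
    return eta[D] / sum(eta[C] for C in subsets(A))

def violations():
    """Triples (A, a, D) with a in A minus D where m_A(D) > m_{A minus a}(D)."""
    found = []
    for A in subsets(X):
        for D in subsets(A):
            for a in A - D:
                if m_la(A, D) > m_la(A - {a}, D) + 1e-12:
                    found.append((A, a, D))
    return found

monotonic_violations = violations()
```

The list of violations is empty for this index, as it would be for any strictly positive $\eta$ under the LA rule.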

Moreover, Cattaneo et al. (2020) provides a characterization of the model in terms of the revealed preference information inferred from data. Formally,

Definition 13 (Revealed Preference, RAM).

Let $p$ be a RAM. Define $P_{R}$ as the transitive closure of $P$ defined as

[TABLE]

Then $a$ is revealed preferred to $b$ if and only if $a\,P_{R}\,b$ .

Then, a choice rule has a RAM representation if and only if $P_{R}$ has no cycles. The following example of a $\mathrm{L\text{-}B}$ -rule, which results from a $m\in\mathcal{M}^{\mathrm{LA}}$ for two linear orders $\succ_{1}$ and $\succ_{2}$ with $\pi(\succ_{i})=0.5$ with $i=1,2$ , cannot be generated by RAM.

Example 5 (RAM violation).

Let $X=\{a,b,c\}$ and consider a $\mathrm{LA}$ model for the random consideration set probability measure with $\eta(D)$ given as in Table 3. Moreover, consider two preference relations $\succ_{1}$ such that $a\succ_{1}b\succ_{1}c$ , and $\succ_{2}$ such that $c\succ_{2}b\succ_{2}a$ . (Footnote 45: In our experiment, this preference heterogeneity can be explained by heterogeneity in risk aversion. For example, let $a\equiv l_{4}$ , $b\equiv l_{3}$ , $c\equiv l_{2}$ , and assume that DMs are expected-utility-maximizers with CRRA Bernoulli utilities. Then $a\succ b\succ c$ for individuals that are risk neutral ( $\sigma=0$ ), while $c\succ b\succ a$ for risk averse individuals ( $\sigma>0.5$ ). Holt and Laury (2002) find that these types are common in their experiment across payment schemes.) The resulting probabilistic choice rule is generated by a $\mathrm{LA}\text{-}\mathrm{B}$ by construction. However, it cannot be rationalized by RAM since both $aPb$ and $bPa$ (i.e., $p(a,\{a,b,c\})>p(a,\{a,c\})$ and $p(b,\{a,b,c\})>p(b,\{b,c\})$ ). This means that a $\mathrm{LA}\text{-}\mathrm{B}$ -rule allows cycles of the revealed preference relation $P$ , which is ruled out by RAM.
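The mechanism behind Example 5 can be sketched in Python. Since Table 3 is not reproduced in this appendix, the code uses a hypothetical attention index $\eta$ (not the values of Table 3) together with the two opposed linear orders, each with probability $0.5$ , and checks that both regularity violations occur at once. All names and numbers in the block are assumptions.

```python
from itertools import chain, combinations

def subsets(items):
    """All subsets of items as frozensets, including the empty set."""
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

# Hypothetical attention index (NOT the values in Table 3); it sums to one.
eta = {frozenset(): 0.30, frozenset("a"): 0.02, frozenset("b"): 0.02,
       frozenset("c"): 0.02, frozenset("ab"): 0.50, frozenset("ac"): 0.02,
       frozenset("bc"): 0.02, frozenset("abc"): 0.10}
# Two equally likely linear orders, each listed best first.
pi = {("a", "b", "c"): 0.5, ("c", "b", "a"): 0.5}

def m_la(A, D):
    """LA consideration probability of D in menu A."""
    return eta[D] / sum(eta[C] for C in subsets(A))

def p(x, menu):
    """Choice probability of x in the menu under the LA-B rule (eta, pi)."""
    A = frozenset(menu)
    return sum(
        m_la(A, D) * sum(w for order, w in pi.items() if min(D, key=order.index) == x)
        for D in subsets(A) if x in D
    )

a_over_b = p("a", "abc") > p("a", "ac")   # removing b lowers the probability of a
b_over_a = p("b", "abc") > p("b", "bc")   # removing a lowers the probability of b
```

With these assumed values, $p(a,\{a,b,c\})=0.33$ exceeds $p(a,\{a,c\})\approx 0.083$ , and $p(b,\{a,b,c\})=0.28$ exceeds $p(b,\{b,c\})\approx 0.083$ , so the revealed preference relation contains both $aPb$ and $bPa$ .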

C.3.

Consideration Cost and Imperfect Perception

One possible concern with our design is that DMs consider an alternative but misperceive its attributes (i.e., compute the wrong utility). We must point out that this concern applies broadly to any experimental design in which subjects have a nontrivial cognitive task. The following analysis assumes that the consideration cost is fixed. First, we need some preliminaries. For a given distribution of preferences $\pi\in\Delta(R(X))$ , with perfect perception, there exists a random utility array $\mathbf{u}=(\mathbf{u}_{a})_{a\in A}$ supported on ${\mathds{R}}^{\left\lvert A\right\rvert}$ such that for a given menu of alternatives $A\in\mathcal{A}$ :

[TABLE]

Now, misperception of any alternative $a\in A$ can be represented by another (possibly wrong) random utility variable $\mathbf{w}_{a}$ supported on the reals. We let $\mathbf{w}=(\mathbf{w}_{a})_{a\in A}$ be the array of such random variables. This random variable represents the subjective value that the DM assigns to alternative $a$ given her own perception of the item. Hence, $\mathbf{w}$ and $\mathbf{u}$ may be different (even though they may be correlated). Without loss of generality, we can define misperception as:

[TABLE]

for all $a\in A$ (and $\mathbf{e}=(\mathbf{e}_{a})_{a\in A}$ ). (Footnote 46: Note that in our experimental design, menus are randomly assigned to a DM. In addition, the presentation of each alternative remains the same across menus, conditional on the cost of consideration. Then, it must be that the distribution of misperception is, by design, the same across menus.)

Under the assumption of independence of preference and attention, the only implication of misperception is that subjects’ behavior will be governed not by $\pi$ but rather by a different distribution of preferences $\pi_{e}$ such that:

[TABLE]

In other words, the behavior of the population of DMs captured by $P$ will still be represented by a $\mathrm{L\text{-}B}$ model with $(\pi_{e},m)$ instead of the true $(\pi,m)$ . This means that our conclusions about how well the different models describe the population are robust to arbitrary misperception error.

The only possible problem induced by misperception of alternatives is that we may lose the capacity to identify the true distribution of preferences. This possibility again is unavoidable in any experimental design that has a cognitive task. Nonetheless, this possibility is testable in our framework. In particular, if misperception exists, it must depend on the cognitive cost. Hence, we have the triple $(\mathbf{e}_{H},\mathbf{e}_{M},\mathbf{e}_{L})$ that represents the misperception random array for the high, medium and low cost, respectively. Then, the distribution of preferences for any $\mathrm{L\text{-}B}$ model would not be stable across attention treatments, with corresponding distributions of preferences $(\pi_{e_{H}},\pi_{e_{M}},\pi_{e_{L}})$ that differ among costs.

However, we cannot reject the null hypothesis that $\mathrm{LA}$ has a stable distribution of preferences among the different cost distributions (i.e., $\pi_{e_{H}}=\pi_{e_{M}}=\pi_{e_{L}}=\pi$ ). In that sense, there is no evidence that misperception is important in our design.

C.4.

Relation with Rational Inattention Models

Rational inattention (RI) models have recently gained a lot of interest as a way to model situations in which choice is hard. However, RI models usually need very rich datasets to be identified or tested. That is, generally they cannot be identified with standard stochastic choice datasets. In that sense, we cannot do a full comparison between RI models and our approach, since the dataset requirements are different. However, we can derive some implications of RI behavior for our dataset.

RI is a model for individual behavior. To the best of our knowledge the aggregate implications of this model are unknown. Hence we will focus on comparing our approach to the case of a representative RI DM. The problem of the representative RI DM is to choose the best possible alternative from a choice set. She has a prior $\mu$ over the true value of alternatives, $V=(v_{k})_{k\in X\cup\{o\}}$ , with $\mu\in\Delta(V)$ . In response to the information structure, the RI DM chooses the optimal information to acquire and her optimal action. We focus here on the subclass of RI problems with an additive cost of perception. The result of this problem is a true-value or state dependent stochastic choice rule $p_{v}(\cdot,A)\in\Delta(A\cup\{o\})$ , defined as:

[TABLE]

For the specification of $\kappa$ , we focus on the generalized entropy proposed in Fosgerau et al. (2017), which generalizes the widely used entropic cost. Fosgerau et al. (2017) show that this state-dependent stochastic choice is observationally equivalent to an additive random utility choice rule conditional on the support. That is, if $p_{v}(\cdot,A)\in\Delta(A\cup\{o\})$ (positive probability of choice), then $p_{v}(\cdot,A)$ is a random utility rule. Even when the underlying utility is fixed (and equal to $v$ without loss of generality), there is randomness in choice due to costly information acquisition. The state-dependent stochastic choice only differs from $\mathrm{RUM}$ when there are items in the choice set that are never chosen. Therefore, the RI DM is compatible with deterministic consideration sets. However, in our experiment we do not observe any element chosen with zero probability. In fact, the lowest probability of choice is $6$ percent across all alternatives in $X\cup\{o\}$ and across all choice sets.

We have to aggregate across states to derive testable implications for the representative RI DM for our dataset. This is because in our setup, the experimenter does not know ex-ante the true value of alternatives. Preferences over lotteries (when there is no first-order stochastic dominance ordering among them) are unknown before choice. This is an important difference between our experiment and the RI experimental literature, since the latter generally relies on enhanced datasets. We focus on collecting datasets that replicate standard stochastic choice data.

Using the fact that a weighted sum of random utility rules is also a random utility rule, we notice that the marginal probability of choice across different states is just the sum of the state-dependent choice probabilities weighted by the likelihood of these states (or the distribution of the true preferences). Then, if $P$ admits a representative RI DM:

[TABLE]

where $\rho\in\Delta(V)$ is the objective probability of the unobserved states.

Lemma 3.

If $p_{v}(a,A)>0$ for all $a\in A\cup\{o\}$ , and all $v\in V$ , it follows that if $P$ admits a representative RI DM then, $P$ also admits a $\mathrm{RUM}$ representation.

The proof of this lemma follows from Fosgerau et al. (2017) and Aguiar et al. (2016), who showed that the weighted sum of $\mathrm{RUM}$ is also $\mathrm{RUM}$ . The case in which one allows heterogeneity in discrete consideration sets, induced by RI, is difficult and left for future research.

Appendix D Experiment

D.1.

Sample

The primitive for the considered models is the estimated stochastic choice rule $\hat{p}_{f}(a,A)$ for $f\in\{\text{H, M, L}\}$ . Therefore, for a fixed level of the cost $f$ , the minimal required sample size was calculated to be proportional to the cardinality of the choice set. To maximize the number of observations for a given set of individuals, some individuals faced two decision tasks. To prevent possible learning, these subjects faced disjoint choice sets. That is, every subject faced either the full choice set $X\cup\{o\}$ or two choice sets that only had the outside option in common. Therefore, because of random assignment, in our experiment

(i) 171 subjects faced only the whole choice set (the targeted number is 180);

(ii) 757 subjects faced pairs of disjoint choice sets: the set of size 4 and the set of size 1 (the targeted number is 750);

(iii) 1207 subjects faced pairs of disjoint choice sets: the set of size 3 and the set of size 2 (the targeted number is 1200).

This implies a total of 2135 subjects (the targeted number is 2130) for a total of 4099 choices (the targeted number is 4080). Additionally, demographic data and preferences over binary comparison of lotteries were asked and incentivized. The effective number of observations per alternative/choice set/cost is summarized in Table 4.
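The totals follow from the group sizes because subjects in the first group made one choice while subjects in the other two groups made two. A short Python check of this bookkeeping, with illustrative variable names:

```python
# Realized and targeted numbers of subjects per assignment group, as reported above.
groups = {
    "full set only": {"realized": 171, "target": 180, "tasks": 1},
    "sizes 4 and 1": {"realized": 757, "target": 750, "tasks": 2},
    "sizes 3 and 2": {"realized": 1207, "target": 1200, "tasks": 2},
}

def totals(key):
    """Total subjects and total choices for the realized or targeted sample."""
    subjects = sum(g[key] for g in groups.values())
    choices = sum(g[key] * g["tasks"] for g in groups.values())
    return subjects, choices

realized_subjects, realized_choices = totals("realized")
target_subjects, target_choices = totals("target")
```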

Appendix E Performance of the Test

In this section we study the performance of our test in terms of statistical power. We test the null hypothesis of $\mathrm{LA}\text{-}\mathrm{B}$ when the true choice process presents choice overload. We consider behavior arising from a mixed population. A fraction $\lambda\in[0,1]$ of the population is consistent with $\mathrm{MM}\text{-}\mathrm{B}$ with $\gamma(x)=1/2$ for all $x\in X$ and preferences consistent with expected utility maximization. The remaining fraction, $1-\lambda$ , follows a simple heuristic: the DM chooses the outside option with probability proportional to the cardinality of the set and, if she decides to pay attention to the menu, chooses uniformly at random from it. The process is then consistent with the following stochastic choice rule

[TABLE]

where $p^{\mathrm{MM}\text{-}\mathrm{B}}(a,A)$ is consistent with $\mathrm{MM}\text{-}\mathrm{B}$ with $(m^{\mathrm{MM}},\pi)$ and $\pi(\succ)=1/120$ for all $\succ\in R(X)$ ; and

[TABLE]

The assumed process implies that a fraction $1-\lambda$ of the population exhibits choice overload. (Footnote 48: This process may intuitively arise when a DM faced with a choice set only knows the size of the choice set and the alternatives in the grand set $X$ . Knowing about the alternatives implies paying a cost $c$ per alternative. Assume that preferences over information are modelled by a willingness to pay attention variable, $w$ , that is distributed uniformly in $[0,1]$ . Then, given a choice set realization, after knowing $\left\lvert A\right\rvert$ DM $i$ decides to pay attention to choice set $A$ if $w_{i}>\left\lvert A\right\rvert\times c$ . This implies that the DM pays attention and decides in the interior of the set with probability $\sum_{a\in A}p(a,A)=1-c\left\lvert A\right\rvert$ ; and $p(o,A)=c\left\lvert A\right\rvert$ .)

As the proportion of the population that exhibits choice overload increases, so should the probability of rejecting the null that population behavior is generated by $\mathrm{L\text{-}B}$ . At the other extreme, when $\lambda=1$ we should not reject the model. In particular, for any $\lambda<32/39$ the process defined by equation (1) exhibits choice overload. However, for high values of $\lambda$ the magnitude of this effect may not be large enough to reject $\mathrm{L\text{-}B}$ .
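The role of $\lambda$ in generating choice overload can be illustrated with a short Python sketch of the default probability in the mixed process. It assumes that the $\mathrm{MM}\text{-}\mathrm{B}$ component with $\gamma(x)=1/2$ yields $p(o,A)=2^{-\left\lvert A\right\rvert}$ (the product form of the MM default probability), that the heuristic component yields $p(o,A)=c\left\lvert A\right\rvert$ as in the footnote, and that choice overload means the default probability rises with menu size. The constant $c$ and the function names are hypothetical; the paper's own equation is not reproduced here.

```python
from fractions import Fraction

def p_default(n, lam, c):
    """Default probability in a menu of size n for the mixed process."""
    return lam * Fraction(1, 2 ** n) + (1 - lam) * c * n

def overload_threshold(n, c):
    """Value of lam below which p_default rises when moving from size n to n + 1."""
    step = Fraction(1, 2 ** (n + 1))   # fall in the MM-B default probability
    return c / (c + step)

c = Fraction(1, 7)                      # hypothetical cost per alternative
threshold = overload_threshold(4, c)    # step from four to five alternatives

below = threshold - Fraction(1, 100)
above = threshold + Fraction(1, 100)
rises_below = p_default(5, below, c) > p_default(4, below, c)
rises_above = p_default(5, above, c) > p_default(4, above, c)
```

With the assumed value $c=1/7$ , the threshold for the step from four to five alternatives equals $32/39$ , which coincides with the bound quoted above. This match is only suggestive, since the value of $c$ is not stated in this appendix.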

Table 5 presents the results for power simulations for sample size $4000$ and $\lambda\in\{0.25,0.50\}$ . For 500 replications, the table displays the proportion of simulations that are rejected at the $10$ percent and $5$ percent significance levels. As expected, the fraction of rejections is larger for smaller values of $\lambda$ . For $\lambda=0.25$ the rejection rate is $100$ percent. We observe that, at a sample size comparable to that of our experiment, the mixed process is rejected with power close to 1 when the choice overload fraction of DMs is moderate.

Appendix F The $\mathrm{L}$ -model with Homogeneous Preferences

In our setup, given an attention rule $\mathrm{L}$ (e.g., $\mathrm{LA}$ or $\mathrm{MM}$ ) and a given frame, we can recover the underlying full consideration probabilistic choice rule $p^{\mathrm{L}}_{\pi}$ . Under the null hypothesis that our experimental dataset can be generated by a $\mathrm{L}$ -rule with a strict preference relation over $X\cup\{o\}$ (such that $o$ is the worst alternative), it must be that $p^{\mathrm{L}}_{\pi}$ is degenerate (i.e., $p^{\mathrm{L}}_{\pi}(a,A)\in\{0,1\}$ for all $A$ and $a\in A$ ). In particular, if we reject that $p^{\mathrm{L}}_{\pi}(a_{1},\{a_{1},a_{2}\})\in\{0,1\}$ for some binary menu $\{a_{1},a_{2}\}$ and for some attention cost, then we have to reject the $\mathrm{L}$ model with homogeneous preferences.

Given that the only model with heterogeneous preferences that was not rejected in our experiment is $\mathrm{LA}$ , to show the importance of preference heterogeneity, we tested whether the $\mathrm{LA}$ model with homogeneous preferences can explain the data. We took menu $\{1,3\}$ and the high cost frame and computed the full consideration probability $\hat{p}^{\mathrm{LA}}_{\pi}(1,\{1,3\})$ implied by the $\mathrm{LA}$ model. If the data can be explained by the $\mathrm{LA}$ model with homogeneous preferences, then $\hat{p}^{\mathrm{LA}}_{\pi}(1,\{1,3\})$ should converge in probability to either $0$ ( $3\succ 1$ with probability 1) or $1$ ( $1\succ 3$ with probability 1). We tested two hypotheses: (i) ${p}_{\pi}(1,\{1,3\})=0$ and (ii) ${p}_{\pi}(1,\{1,3\})=1$ . Both were rejected at the $5$ percent significance level ( $\text{p-value}<10^{-3}$ ). As a result we can conservatively claim that the null hypothesis that our experimental dataset can be explained by the $\mathrm{LA}$ -rule with a single preference relation is rejected at least at the $5$ percent significance level.

Appendix G Experiment Instructions

Bibliography

Abaluck, J. and Adams, A. (2021). What do consumers consider before they choose? identification from asymmetric demand responses. The Quarterly Journal of Economics, Accepted.
Aguiar, V. H. (2017). Random categorization and bounded rationality. Economics Letters, 159, 46–52.
Aguiar, V. H., Boccardi, M. J. and Dean, M. (2016). Satisficing and stochastic choice. Journal of Economic Theory, 166, 445–482.
Apesteguia, J., Ballester, M. Á. et al. (2018). Separating Predicted Randomness from Noise. Tech. rep.
Barseghyan, L., Coughlin, M., Molinari, F. and Teitelbaum, J. C. (2019a). Heterogeneous choice sets and preferences. arXiv preprint arXiv:1907.02337.
Barseghyan, L., Molinari, F. and Thirkettle, M. (2019b). Discrete choice under risk with limited consideration. arXiv preprint arXiv:1902.06629.
Bhattacharya, M., Mukherjee, S. and Sonal, R. (2021). Frame-based stochastic choice rule. Journal of Mathematical Economics, 97, 102553.
Block, H. D. and Marschak, J. (1960). Random orderings and stochastic theories of responses. Contributions to probability and statistics, 2, 97–132.
