Investigating consumers' store-choice behavior via hierarchical variable selection

Toshiki Sato; Yuichi Takano; Takanobu Nakahara

arXiv:1704.00665·stat.AP·April 4, 2017·Adv. Data Anal. Classif.


TL;DR

This paper introduces a hierarchical variable selection method using mixed-integer optimization to improve store-choice models, revealing key consumer preferences across store types based on actual data.

Contribution

It develops two novel MIO models for hierarchical variable selection, enhancing model reliability and computational efficiency in consumer store-choice analysis.

Findings

01

Convenience stores are chosen mainly for accessibility.

02

Drugstores are selected for low-price specific products.

03

Grocery supermarkets are preferred for health foods by women with families.


Tables

Table 1: Store types and chains

Store type	Store chains
Convenience store	7-Eleven, Lawson, FamilyMart, Ministop, Three-F
Drugstore	Matsumotokiyoshi, Sundrug, Tsuruha, Cosmos, Sugi
Grocery supermarket	Aeon, Seven&I, Uny, Daiei, Izumi

Table 2: Datasets on store choice

Abbreviation	Positive example ( $y_{i} > 0$ )	Negative example ( $y_{i} < 0$ )	$n$
CvsD	convenience stores	drugstores	225,630
CvsS	convenience stores	grocery supermarkets	252,513
DvsS	drugstores	grocery supermarkets	139,188

Table 3: Consumer demographics

Consumer demographic	Explanatory variables
Gender/family	female, married, with children
Age	20s, 30s, 40s, 50s, over 60s
Income	class 2 or lower, classes 3–4, classes 5–6, classes 7–9, class 10 or higher
Cross terms	$\{\text{male}, \text{female}\} \times \{\text{married}, \text{with children}\}$; $\{\text{male}, \text{female}\} \times \text{age}$; $\{\text{male}, \text{female}\} \times \text{income class}$

Table 4: Large and medium categories of products

Large category	Medium categories
Food	Processed foods, fresh foods, confectioneries, beverages, other foods
Commodities	Everyday sundries, healthcare, cosmetics, housewares, DIY supplies, pet care/food, other commodities
Cultural supplies	Stationery/office supplies, toys, books, musical instruments, information equipment, other cultural supplies
Durables	Furniture, car supplies, watches/clocks/glasses, optics/photo, home electronics, other durables
Clothes/accessories/sports	Clothes, bedding, accessories, footwear, sports equipment

Table 5: Results of five-fold cross-validation for C-LM set

Dataset	$p$	$s$	Method	$R^{2}$	RMSE	Time (s)
CvsD	69	10	SW	0.3367	2.7090	106.4
			L1	0.3360	2.7104	10.0
			BM	0.3367	2.7090	43.3
			S/WHM	0.3366	2.7091	14.9
		20	SW	0.3384	2.7055	311.1
			L1	0.3375	2.7073	6.8
			BM	0.3381	2.7060	1,000.0
			S/WHM	0.3382	2.7059	1,000.0
CvsS	71	10	SW	0.2105	5.2533	121.9
			L1	0.2090	5.2583	24.7
			BM	0.2105	5.2533	43.7
			S/WHM	0.2106	5.2530	14.4
		20	SW	0.2140	5.2416	360.5
			L1	0.2138	5.2426	14.5
			BM	0.2138	5.2423	1,000.0
			S/WHM	0.2143	5.2406	1,000.0
DvsS	71	10	SW	0.2098	6.8622	66.4
			L1	0.2089	6.8663	28.6
			BM	0.2097	6.8626	18.0
			S/WHM	0.2097	6.8628	3.9
		20	SW	0.2113	6.8555	195.8
			L1	0.2110	6.8570	13.1
			BM	0.2114	6.8552	1,000.0
			S/WHM	0.2116	6.8545	1,000.0
average			SW	0.2535	4.9379	193.7
			L1	0.2527	4.9403	16.3
			BM	0.2534	4.9381	517.5
			S/WHM	0.2535	4.9377	505.5

Table 6: Results of five-fold cross-validation for C-LMS set

Dataset	$p$	$s$	Method	$R^{2}$	RMSE	Time (s)
CvsD	234	10	SW	0.3531	2.6752	348.2
			L1	0.3467	2.6886	10.1
			BM	0.3346	2.7132	1,000.0
			SHM	0.3509	2.6798	1,000.0
			WHM	0.3530	2.6755	1,000.0
		20	SW	0.3642	2.6522	1,047.0
			L1	0.3612	2.6586	2.7
			BM	0.3412	2.6997	1,000.0
			SHM	0.3618	2.6573	1,000.0
			WHM	0.3644	2.6518	1,000.0
		50	SW	0.3701	2.6400	6,018.8
			L1	0.3691	2.6422	17.5
			BM	0.3552	2.6708	1,000.0
			SHM	0.3693	2.6417	1,000.0
			WHM	0.3697	2.6408	1,000.0
CvsS	277	10	SW	0.2340	5.1751	551.9
			L1	0.2265	5.2004	22.6
			BM	0.1671	5.3965	1,000.0
			SHM	0.2330	5.1786	1,000.0
			WHM	0.2340	5.1751	1,000.0
		20	SW	0.2440	5.1413	1,439.6
			L1	0.2369	5.1655	15.0
			BM	0.2004	5.2865	1,000.0
			SHM	0.2486	5.1256	1,000.0
			WHM	0.2441	5.1412	1,000.0
		50	SW	0.2511	5.1172	8,331.7
			L1	0.2495	5.1228	29.9
			BM	0.2314	5.1838	1,000.0
			SHM	0.2511	5.1170	1,000.0
			WHM	0.2511	5.1172	1,000.0
DvsS	281	10	SW	0.2199	6.8176	256.4
			L1	0.2179	6.8263	23.9
			BM	0.1916	6.9399	1,000.0
			SHM	0.2176	6.8278	1,000.0
			WHM	0.2199	6.8176	1,000.0
		20	SW	0.2256	6.7927	786.7
			L1	0.2243	6.7984	10.1
			BM	0.2088	6.8658	1,000.0
			SHM	0.2255	6.7930	1,000.0
			WHM	0.2264	6.7889	1,000.0
		50	SW	0.2286	6.7795	4,551.5
			L1	0.2289	6.7782	21.9
			BM	0.2228	6.8045	1,000.0
			SHM	0.2289	6.7782	1,000.0
			WHM	0.2286	6.7792	1,000.0
average			SW	0.2767	4.8656	2,592.4
			L1	0.2734	4.8757	17.1
			BM	0.2503	4.9512	1,000.0
			SHM	0.2763	4.8666	1,000.0
			WHM	0.2768	4.8653	1,000.0

Table 7: Explanatory variables selected by WHM in CvsD dataset (positive example: convenience store; negative example: drugstore)

Explanatory variable	Coefficient
other commodities (M)	$5.38$	***
nutritional fortification (S)	$3.82$	***
other housewares (S)	$2.82$	***
intercept term	$2.71$	***
other foods (S)	$2.30$	***
hygiene/medical care (S)	$1.40$	***
processed meats (S)	$1.38$	***
delicacies (S)	$1.19$	***
frozen foods (S)	$1.19$	***
sanitary papers (S)	$1.09$	***
alcohol (S)	$0.91$	***
delicatessen (S)	$0.86$	***
ice cream (S)	$0.74$	***
female $\times$ with children	$- 0.45$	***
food (L)	$- 1.17$	***
tofu/konjac (S)	$- 1.23$	***
everyday sundries (M)	$- 1.27$	***
durables (L)	$- 1.57$	***
other foods (M)	$- 1.67$	***
clothes/accessories/sports (L)	$- 2.44$	***
commodities (L)	$- 4.67$	***

Table 8: Explanatory variables selected by WHM in CvsS dataset (positive example: convenience store; negative example: grocery supermarket)

Explanatory variable	Coefficient
other commodities (M)	$6.63$	***
nutritional fortification (S)	$6.50$	***
bread/cereal (S)	$4.59$	***
soup (S)	$3.70$	***
delicatessen (S)	$3.28$	***
intercept term	$3.06$	***
hygiene/medical care (S)	$3.05$	***
noodles (S)	$2.53$	***
frozen foods (S)	$1.73$	***
other foods (S)	$1.02$	***
male $\times$ over 60s	$- 0.59$	***
female $\times$ married	$- 0.78$	***
dessert/yogurt (S)	$- 0.92$	***
milk beverage (S)	$- 1.22$	***
female $\times$ with children	$- 1.54$	***
food (L)	$- 1.71$	***
fresh foods (M)	$- 2.05$	***
coffee/tea (S)	$- 2.20$	***
durables (L)	$- 2.87$	***
commodities (L)	$- 5.61$	***
processed foods (M)	$- 5.78$	***

Table 9: Explanatory variables selected by WHM in DvsS dataset (positive example: drugstore; negative example: grocery supermarket)

Explanatory variable	Coefficient
health foods (S)	$5.42$	***
infant foods (S)	$4.00$	***
bread/cereal (S)	$3.48$	***
grain (S)	$3.39$	***
healthcare (M)	$3.15$	***
everyday sundries (M)	$2.60$	***
cosmetics (M)	$2.50$	***
soup (S)	$2.14$	***
sweets (S)	$2.03$	***
soft drink (S)	$1.62$	***
noodles (S)	$1.59$	***
tofu/konjac (S)	$1.35$	***
seasoning (S)	$1.32$	***
female $\times$ under class 2	$0.67$	***
intercept term	$0.08$
commodities (L)	$- 0.42$	***
female $\times$ with children	$- 0.61$	***
other housewares (S)	$- 2.64$	***
cultural supplies (L)	$- 2.66$	***
food (L)	$- 3.07$	***
processed foods (M)	$- 4.88$	***

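Equations

Explained variable (Sect. 2):

$$y_{i} := (\text{purchase quantity at store A}) - (\text{purchase quantity at store B})$$

Store-choice model (Sect. 2):

$$y_{i} = b + \sum_{j \in G} a_{j} x_{ij} + \varepsilon_{i} \quad (i = 1, 2, \ldots, n)$$

Basic MIO model (Sect. 3.1), with the residual sum of squares as the objective:

$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (1) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (2) \\
& \sum_{j \in G} z_{j} \leq s, && (3) \\
& z_{j} \in \{0, 1\} \quad (j \in G). && (4)
\end{aligned}
$$

Strong hierarchical constraints (Sect. 3.2):

$$z_{j_{1}} \geq z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H)$$

Weak hierarchical constraints (Sect. 3.2):

$$z_{j_{1}} \geq z_{j_{2}}, \quad z_{j_{1}} + z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H)$$

Strong hierarchical MIO model:

$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (5) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (6) \\
& \sum_{j \in G} z_{j} \leq s, && (7) \\
& z_{j_{1}} \geq z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H), && (8) \\
& z_{j} \in \{0, 1\} \quad (j \in G). && (9)
\end{aligned}
$$

Weak hierarchical MIO model:

$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (10) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (11) \\
& \sum_{j \in G} z_{j} \leq s, && (12) \\
& z_{j_{1}} \geq z_{j_{2}}, \quad z_{j_{1}} + z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H), && (13) \\
& z_{j} \in \{0, 1\} \quad (j \in G). && (14)
\end{aligned}
$$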


Taxonomy

Topics: Consumer Retail Behavior Studies · Consumer Market Behavior and Pricing · Consumer Behavior in Brand Consumption and Identification

Full text


Toshiki Sato: Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8573, Japan

Yuichi Takano: School of Network and Information, Senshu University, 2-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa 214-8580, Japan

Takanobu Nakahara: School of Commerce, Senshu University, 2-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa 214-8580, Japan


Abstract

This paper is concerned with a store-choice model for investigating consumers’ store-choice behavior based on scanner panel data. Our store-choice model enables us to evaluate the effects of the consumer/product attributes not only on the consumer’s store choice but also on his/her purchase quantity. Moreover, we adopt a mixed-integer optimization (MIO) approach to selecting the best set of explanatory variables with which to construct a store-choice model. We devise two MIO models for hierarchical variable selection in which the hierarchical structure of product categories is used to enhance the reliability and computational efficiency of the variable selection. We assess the effectiveness of our MIO models through computational experiments on actual scanner panel data. These experiments are focused on the consumer’s choice among three types of stores in Japan: convenience stores, drugstores, and grocery supermarkets. The computational results demonstrate that our method has several advantages over the common methods for variable selection, namely, the stepwise method and $L_{1}$ -regularized regression. Furthermore, our analysis reveals that convenience stores tend to be chosen because of accessibility, drugstores are chosen for the purchase of specific products at low prices, and grocery supermarkets are chosen for health food products by women with families.

Keywords: Store choice · Variable selection · Mixed-integer optimization · Multiple regression analysis · Scanner panel data

1 Introduction

Variable selection, also known as feature/attribute/subset selection, involves selecting a set of relevant explanatory variables from many candidates and using them to construct a statistical model. This procedure facilitates interpretation of the subsequent analysis of the statistical model, and enhances the model’s predictive performance by preventing overfitting (Guyon and Elisseeff 2003). Because datasets are becoming ever larger, computational methods for variable selection are under active investigation in the fields of machine learning and data mining (Blum and Langley 1997; Guyon and Elisseeff 2003; Kohavi and John 1997; Liu and Motoda 2007).

A direct way of selecting the best set of explanatory variables is to evaluate all possible subset models (Furnival and Wilson 2000). However, this approach is unsuitable in practice unless there are sufficiently few candidate variables. Although the stepwise method (Efroymson 1960), regularized/penalized regression (Tibshirani 1996), and metaheuristics (Yusta 2009) are practical approaches to variable selection, they do not necessarily find the best set of explanatory variables under the given goodness-of-fit measures. Hence, alternative approaches based on mixed-integer optimization (MIO) are now receiving considerable attention (Bertsimas and King in press; Bertsimas et al. 2016; Konno and Yamamoto 2009; Maldonado et al. 2014; Miyashiro and Takano 2015a, b; Sato et al. 2016a, 2017; Ustun and Rudin 2016; Wilson and Sahinidis in press) because these have the potential to provide the best set of explanatory variables with respect to several goodness-of-fit measures.

The purpose of this paper is to use MIO-based variable selection to analyze the factors that affect which stores consumers choose. Although there have been various studies of consumer store choice (Baker et al. 2002; Bloemer and De Ruyter 1998; Briesch et al. 2009; Leszczyc and Timmermans 2002; Pan and Zinkhan 2006; Reutterer and Teller 2009), only Sato et al. (2016b) used MIO-based variable selection to analyze it. Specifically, they used panel data of the barcode scans of a consumer’s previous purchases to predict which stores s/he would visit. Their predictive model accounted for the different purchasing patterns among the targeted stores, and hence revealed the product set associated with each store. However, Sato et al. (2016b) did not predict how many purchases the consumer would make. In addition, they did not take into account the hierarchical structure (or inclusion relation) of product categories when constructing their predictive model.

Structured regularization is a method whereby existing structural information (e.g., relationships between explanatory variables) is used in the construction of a statistical model. For instance, in grouped variable selection, sets of interrelated variables are selected simultaneously, whereas in hierarchical variable selection, the selection of variables is prioritized according to a previously known hierarchy (Bien et al. 2013; Huang et al. 2011; Jacob et al. 2009; Jenatton et al. 2011; Kim and Xing 2010; Tibshirani et al. 2005; Yuan and Lin 2006; Zhao et al. 2009). These studies used either the stepwise method or regularized/penalized regression as the algorithm for selecting the structured variables. However, to the best of our knowledge, no existing study has used the MIO-based method to select hierarchical variables.

Hence, we propose a purchase-quantity-based store-choice model for analyzing the factors involved in consumer store choices. This model allows us to explore a consumer’s store choices and his/her purchase quantities simultaneously. To construct the store-choice model, we devise two MIO models for selecting hierarchical variables. We evaluate the effectiveness of each MIO model by comparing its computational performance with those of the stepwise method and $L_{1}$ -regularized regression. We also use computational results from panel data of actual barcode scans to clarify the factors involved in consumer store choice.

The remainder of the paper is organized as follows. In Sect. 2, we present our purchase-quantity-based store-choice model. In Sect. 3, we formulate MIO models of variable selection for the store-choice model. In Sect. 4, we report the computational results of variable selection, and in Sect. 5 we conclude with a brief summary.

2 Store-choice model

Let us consider two stores: store A (positive example) and store B (negative example). Each sample datum $i=1,2,\ldots,n$ corresponds to a visit of a consumer to one or other of these stores. On the basis of these data, Sato et al. (2016b) developed a binary classification model for predicting which store a consumer would visit. In contrast, to include consumer purchase quantity in the store-choice model, we define the explained variable $y_{i}$ as follows:

$$y_{i} := (\text{purchase quantity at store A}) - (\text{purchase quantity at store B})$$

for each sample $i=1,2,\ldots,n$ . Note that each sample (or visit) is associated with one or other of the two stores. Therefore, for each sample $i$ , $y_{i}$ is a positive integer if store A is chosen or a negative integer if store B is chosen.
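As a minimal sketch of this definition, the explained variable can be computed from a list of visits. The record layout (a store label plus a purchase quantity) and the name `explained_variable` are assumptions made for illustration; they are not taken from the scanner panel data.

```python
def explained_variable(store, quantity):
    """Return y_i: +quantity for a visit to store A, -quantity for store B."""
    if quantity <= 0:
        # Assumption: every recorded visit involves at least one purchased item.
        raise ValueError("quantity must be a positive integer")
    if store == "A":
        return quantity
    if store == "B":
        return -quantity
    raise ValueError("store must be 'A' or 'B'")


# Hypothetical visits: (store visited, number of products purchased).
visits = [("A", 3), ("B", 1), ("A", 1), ("B", 5)]
y = [explained_variable(store, quantity) for store, quantity in visits]
```

The sign of each entry of `y` encodes the store choice, and its magnitude encodes the purchase quantity.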

We consider consumer demographics and categories of purchased products as explanatory variables that influence store choice. A set of consumer demographics is denoted by $G_{0}$, and the value $x_{ij}$ for $j\in G_{0}$ is given by the corresponding consumer demographic in each sample $i=1,2,\ldots,n$. Sets of large, medium, and small product categories are denoted by $G_{1}$, $G_{2}$, and $G_{3}$, respectively; the corresponding explanatory variables are all dummy variables. For all $j\in G_{1}\cup G_{2}\cup G_{3}$, $x_{ij}:=1$ if products in category $j$ are purchased in sample $i$; otherwise, $x_{ij}:=0$. The set of all candidate explanatory variables is denoted by $G:=G_{0}\cup G_{1}\cup G_{2}\cup G_{3}$.

On the basis of these explanatory and explained variables, we consider the following linear regression model:

$$y_{i} = b + \sum_{j \in G} a_{j} x_{ij} + \varepsilon_{i} \quad (i = 1, 2, \ldots, n),$$

where $b$ is an intercept term to be estimated, $a_{j}$ is a regression coefficient to be estimated for the $j$ th explanatory variable, and $\varepsilon_{i}$ is a prediction residual for each sample $i=1,2,\ldots,n$ . We refer to this regression model as a store-choice model; it explains the effects of the consumer/product attributes not only on the consumer’s store choice but also on his/her purchase quantity.

3 Mixed-integer optimization models for variable selection

In this section, we first present a basic MIO model for variable selection (Bertsimas et al. 2016; Konno and Yamamoto 2009). We then describe our MIO models for hierarchical variable selection in the store-choice model.

3.1 Basic MIO model for variable selection

We begin by explaining the decision variables with which we formulate the MIO models. Specifically, $\bm{a}:=(a_{j})_{j\in G}\in\mathbb{R}^{p}$ denotes a vector of decision variables representing regression coefficients, where $p$ is the number of candidate explanatory variables, that is, $p:=|G|$ . Next, $\bm{z}:=(z_{j})_{j\in G}\in\{0,1\}^{p}$ denotes a vector of 0–1 decision variables for variable selection; that is, $z_{j}=1$ if the $j$ th explanatory variable is selected; otherwise, $z_{j}=0$ . If $z_{j}=0$ , then we eliminate the $j$ th explanatory variable from the regression model by setting its coefficient, $a_{j}$ , to zero.

We minimize the residual sum of squares subject to the constraint of the upper bound on the number of selected explanatory variables. Consequently, the basic MIO model for variable selection is posed as follows (Bertsimas et al. 2016; Konno and Yamamoto 2009):

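$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (1) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (2) \\
& \sum_{j \in G} z_{j} \leq s, && (3) \\
& z_{j} \in \{0, 1\} \quad (j \in G), && (4)
\end{aligned}
$$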

where $s$ is a user-defined parameter representing the upper bound on the number of selected explanatory variables. The logical implication in (2) can be formulated by using a constraint in the form of a special ordered set of type 1 (SOS1), which is supported by standard MIO software. This constraint implies that no more than one element in the set can have a non-zero value. Therefore, the logical implication in (2) is equivalent to imposing the SOS1 constraints on $\{1-z_{j},a_{j}\}$ for all $j\in G$ . Indeed, if $z_{j}=0$ , then $1-z_{j}$ is non-zero and $a_{j}$ must be zero from the SOS1 constraints.
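For very small $p$, the same problem can be solved by exhaustive enumeration, which may help to make the roles of $\bm{z}$ and $s$ concrete. The following is a minimal sketch under that assumption; it is a stand-in for an MIO solver, not the method used in this paper, and the data and all function names are hypothetical.

```python
from itertools import combinations


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_rss(X, y, subset):
    """Least squares on an intercept plus the columns in `subset`.

    Returns (residual sum of squares, intercept b, {j: a_j}).
    """
    rows = [[1.0] + [row[j] for j in subset] for row in X]
    k = len(subset) + 1
    gram = [[sum(r[u] * r[v] for r in rows) for v in range(k)] for u in range(k)]
    rhs = [sum(r[u] * yi for r, yi in zip(rows, y)) for u in range(k)]
    beta = solve_linear(gram, rhs)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return rss, beta[0], dict(zip(subset, beta[1:]))


def best_subset(X, y, s):
    """Enumerate every subset with at most s variables and keep the lowest RSS."""
    p = len(X[0])
    best = None
    for size in range(s + 1):
        for subset in combinations(range(p), size):
            try:
                rss, b, a = fit_rss(X, y, subset)
            except ValueError:
                continue  # skip collinear subsets
            if best is None or rss < best[0] - 1e-9:
                best = (rss, b, a, subset)
    return best


# Synthetic data: y depends on columns 0 and 2 only.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
     [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
y = [1 + 2 * row[0] - 3 * row[2] for row in X]

rss, b, a, subset = best_subset(X, y, s=2)
z = [1 if j in subset else 0 for j in range(len(X[0]))]
```

Here every unselected variable has $z_{j}=0$ and contributes no coefficient, which mirrors the implication in (2). The enumeration visits on the order of $\binom{p}{s}$ subsets, so it is only practical for toy instances.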

3.2 MIO models for hierarchical variable selection

To enhance the reliability of the regression analysis, we exploit the hierarchical (or inclusion) relationship among product categories. For instance, the small category “seasoning” is contained in the medium category “processed foods,” and these two categories are contained in the large category “food.”

Let $H$ be the set of 3-tuples $(j_{1},j_{2},j_{3})\in G_{1}\times G_{2}\times G_{3}$ of product categories having such a hierarchical relationship. In other words, $(j_{1},j_{2},j_{3})\in H$ means that large category $j_{1}$ contains medium category $j_{2}$ , and medium category $j_{2}$ contains small category $j_{3}$ . On the basis of these hierarchical relationships, we consider the following constraints in the variable selection:

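Strong hierarchical constraints:

$$z_{j_{1}} \geq z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H),$$

Weak hierarchical constraints:

$$z_{j_{1}} \geq z_{j_{2}}, \quad z_{j_{1}} + z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H).$$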

To gain a better understanding of these constraints, we suppose that for $(j_{1},j_{2},j_{3})\in H$ , small category $j_{3}$ is selected as an explanatory variable (i.e., $z_{j_{3}}=1$ ). In that case, we must select both large category $j_{1}$ and medium category $j_{2}$ (i.e., $z_{j_{1}}=z_{j_{2}}=1$ ) when the strong hierarchical constraints are imposed. In contrast, the weak hierarchical constraints require us to select at least one of the categories that is superordinate to the selected one. Indeed, the weak hierarchical constraints are satisfied by selecting large category $j_{1}$ (i.e., $z_{j_{1}}=1$ ) even if medium category $j_{2}$ is not selected.
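The difference between the two sets of constraints can be checked directly on the "food / processed foods / seasoning" example. The sketch below assumes that $\bm{z}$ is stored as a dictionary keyed by category name; the function names are hypothetical.

```python
def satisfies_strong(z, H):
    """Strong constraints: z[j1] >= z[j2] >= z[j3] for every triple in H."""
    return all(z[j1] >= z[j2] >= z[j3] for j1, j2, j3 in H)


def satisfies_weak(z, H):
    """Weak constraints: z[j1] >= z[j2] and z[j1] + z[j2] >= z[j3]."""
    return all(z[j1] >= z[j2] and z[j1] + z[j2] >= z[j3] for j1, j2, j3 in H)


# One (large, medium, small) triple taken from the example in the text.
H = [("food", "processed foods", "seasoning")]

# Small and large categories selected, medium category skipped.
skip_medium = {"food": 1, "processed foods": 0, "seasoning": 1}
# Small category selected without any superordinate category.
small_only = {"food": 0, "processed foods": 0, "seasoning": 1}
# Whole chain selected.
full_chain = {"food": 1, "processed foods": 1, "seasoning": 1}

results = {
    "skip_medium": (satisfies_strong(skip_medium, H), satisfies_weak(skip_medium, H)),
    "small_only": (satisfies_strong(small_only, H), satisfies_weak(small_only, H)),
    "full_chain": (satisfies_strong(full_chain, H), satisfies_weak(full_chain, H)),
}
```

Only `skip_medium` separates the two rules: it violates the strong constraints but satisfies the weak ones.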

These hierarchical constraints can be validated based on the following observations:

Reliability of variable selection.

When a certain product category is a store-choice factor, its superordinate categories are likely to affect the store choice. For this reason, the hierarchical constraints can improve the reliability of selected explanatory variables.

Accuracy of coefficient estimates.

Superordinate categories are preferentially selected by the hierarchical constraints. Since such categories involve many samples, the accuracy of the coefficient estimates can be raised.

Efficiency of MIO computations.

The number of feasible subsets of explanatory variables is greatly reduced by the hierarchical constraints. As a result, the MIO computations can be made more efficient.
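The reduction in the number of feasible subsets can be illustrated by counting on a toy hierarchy. The sketch below assumes one large category containing two medium categories, each of which contains two small categories (seven variables in total). This layout is hypothetical, and the cardinality bound $s$ is ignored for simplicity.

```python
from itertools import product

# Variable indices: 0 = large, 1-2 = medium, 3-6 = small (hypothetical layout).
H = [(0, 1, 3), (0, 1, 4), (0, 2, 5), (0, 2, 6)]
P = 7


def unconstrained(z1, z2, z3):
    return True


def weak(z1, z2, z3):
    return z1 >= z2 and z1 + z2 >= z3


def strong(z1, z2, z3):
    return z1 >= z2 >= z3


def count_feasible(rule):
    """Count 0-1 vectors z that satisfy `rule` on every triple in H."""
    return sum(
        1
        for z in product((0, 1), repeat=P)
        if all(rule(z[j1], z[j2], z[j3]) for j1, j2, j3 in H)
    )


counts = (count_feasible(unconstrained), count_feasible(weak), count_feasible(strong))
```

On this toy hierarchy the counts are 128 without hierarchical constraints, 65 under the weak constraints, and 26 under the strong constraints.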

The strong hierarchical MIO model is formulated by appending the strong hierarchical constraints to the basic MIO model (1)–(4) as follows:

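$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (5) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (6) \\
& \sum_{j \in G} z_{j} \leq s, && (7) \\
& z_{j_{1}} \geq z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H), && (8) \\
& z_{j} \in \{0, 1\} \quad (j \in G). && (9)
\end{aligned}
$$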

Similarly, the weak hierarchical MIO model is framed as follows:

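$$
\begin{aligned}
\underset{\bm{a},\, b,\, \bm{z}}{\text{minimize}} \quad & \sum_{i=1}^{n} \Bigl( y_{i} - b - \sum_{j \in G} a_{j} x_{ij} \Bigr)^{2} && (10) \\
\text{subject to} \quad & z_{j} = 0 \;\Rightarrow\; a_{j} = 0 \quad (j \in G), && (11) \\
& \sum_{j \in G} z_{j} \leq s, && (12) \\
& z_{j_{1}} \geq z_{j_{2}}, \quad z_{j_{1}} + z_{j_{2}} \geq z_{j_{3}} \quad ((j_{1}, j_{2}, j_{3}) \in H), && (13) \\
& z_{j} \in \{0, 1\} \quad (j \in G). && (14)
\end{aligned}
$$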

4 Computational experiments

In this section, we assess the effectiveness of our MIO models for hierarchical variable selection and examine the store-choice factors that are inferred from the computational results.

4.1 Datasets

We used scanner panel data that were provided by the Japanese marketing research company MACROMILL (http://www.macromill.com/global/index.html). These data were collected from home scans by roughly 4,000 consumers during 2012–2013. We focused on consumers' choice among three types of stores in Tokyo: convenience stores, drugstores, and grocery supermarkets. Each store type comprised the leading five chains as determined by sales volume, as given in Table 1.

Each sample corresponded to one consumer visit and was designated as either a positive or negative example according to which store was visited. The three datasets that we analyzed are listed in Table 2, where $n$ is the number of samples.

As shown in Table 3, 37 dummy consumer-demographic variables were created from the results of a questionnaire survey of consumers. We prepared dummy variables corresponding to 5 large categories, 29 medium categories, and 214 small categories of products. These were based on the Japan Item Code File Service (JICFS), which is a standard means of classifying products in Japan. For reference, the large and medium product categories are listed in Table 4. Note that any redundant variable (i.e., zero in all samples) was eliminated in each dataset, which is why the number of candidate explanatory variables differs among datasets.
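The construction of the product-category dummy variables and the elimination of redundant (all-zero) variables can be sketched as follows. The two hierarchy chains are the ones named in this paper; the input format (a list of small categories per visit) and the function names are assumptions made for illustration.

```python
# Small category -> (medium category, large category); a tiny fragment only.
PARENTS = {
    "seasoning": ("processed foods", "food"),
    "sanitary papers": ("everyday sundries", "commodities"),
}
COLUMNS = ["food", "commodities", "cultural supplies",
           "processed foods", "everyday sundries",
           "seasoning", "sanitary papers"]


def dummy_row(small_categories):
    """x_ij = 1 if products in category j were purchased during the visit.

    A purchase in a small category also counts as a purchase in its
    medium and large categories.
    """
    purchased = set(small_categories)
    for small in small_categories:
        purchased.update(PARENTS[small])
    return [1 if column in purchased else 0 for column in COLUMNS]


def drop_all_zero(X, columns):
    """Remove variables that are zero in every sample."""
    keep = [j for j in range(len(columns)) if any(row[j] for row in X)]
    return [[row[j] for j in keep] for row in X], [columns[j] for j in keep]


# Hypothetical visits, each listing the small categories purchased.
visits = [["seasoning"], ["sanitary papers"], ["seasoning", "sanitary papers"]]
X = [dummy_row(v) for v in visits]
X_kept, kept_columns = drop_all_zero(X, COLUMNS)
```

In this toy input no visit includes a product from "cultural supplies", so that column is dropped; the same mechanism explains why $p$ differs among the datasets.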

In the following sections, we use two sets of candidate explanatory variables:

C-LM set:

set of explanatory variables of consumer demographics and {large, medium} product categories;

C-LMS set:

set of explanatory variables of consumer demographics and {large, medium, small} product categories.

Note that the small product categories were excluded from the C-LM set.

4.2 Evaluation of predictive performance

We evaluated the predictive performance of each of the following methods for variable selection by means of five-fold cross-validation:

SW

stepwise method (Efroymson 1960)

L1

$L_{1}$ -regularized regression (Tibshirani 1996)

BM

basic MIO model (1)–(4) (Bertsimas et al. 2016; Konno and Yamamoto 2009)

SHM

strong hierarchical MIO model (5)–(9)

WHM

weak hierarchical MIO model (10)–(14)

All computations were performed on a Macintosh computer with an Intel Xeon X5650 CPU ($2\times 2.66$ GHz) and 64 GB of memory. The stepwise method was started with no explanatory variables, whereupon the variable that led to the largest decrease in Akaike’s information criterion was added or eliminated iteratively until $s$ variables had been selected. This was performed using the step function in R 3.2.0 (http://www.R-project.org). In the case of $L_{1}$-regularized regression, the regression coefficients were estimated using the glmnet package in R 3.2.0 for each value of the regularization parameter chosen from $\{0,0.0001,0.0002,\ldots,0.9999,1\}$. We then chose a set of $s$ variables that had non-zero coefficients, and we estimated those coefficients again using the ordinary least-squares method. The MIO problems were solved using Gurobi Optimizer 6.5 (http://www.gurobi.com). Here, the MIO computation time was limited to 1,000 s; that is, if a computation had not finished within 1,000 s, the best feasible solution obtained thus far was used as the result.

Tables 5 and 6 give the results of five-fold cross-validation for the C-LM and C-LMS sets, respectively. Here, there are $p$ candidate explanatory variables in each dataset, and $s$ is the upper bound on the number of selected explanatory variables, where $s=10,20$ for the C-LM set and $s=10,20,50$ for the C-LMS set. The columns labeled “ $R^{2}$ ” and “RMSE” are the average values of the coefficient of determination and the root-mean-square error measured through the five-fold cross-validation, respectively. These quantify the predictive performance, and the best values among the five methods are indicated in bold. The column labeled “Time (s)” is the average computation time in seconds required for variable selection.
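The two measures can be written down in a few lines. The sketch below uses the standard definitions of $R^{2}$ and RMSE on held-out samples; the interleaved fold assignment is an assumption, since the way the folds were formed is not specified here.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of the predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists; fold f holds the indices i with i % k == f."""
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        yield train, test


# Hypothetical held-out values and predictions.
y_true = [3, -1, 1, -5]
y_pred = [2, 0, 1, -4]
scores = (r_squared(y_true, y_pred), rmse(y_true, y_pred))
folds = list(k_fold_indices(10))
```

In a cross-validation run, a model would be fitted on each `train` list, scored on the matching `test` list, and the five scores averaged.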

We begin by focusing on the results for the C-LM set (Table 5), from which the small product categories were excluded. In this case, SHM and WHM are the same model, so these results are shown as “S/WHM” in the table. We can see that although SW, BM, and S/WHM provided relatively good predictive performance, the average $R^{2}$ and RMSE values of S/WHM were the best among all the methods. The worst predictive performance was always delivered by L1. In contrast to the other methods, S/WHM exploited the hierarchical relationships of product categories in the course of variable selection. As a result, its predictive performance was better than those of the others.

We next discuss the results for the C-LMS set (Table 6), in which the small product categories were included. As in Table 5, good predictive performance was achieved by SW, SHM, and WHM, whereas the average $R^{2}$ and RMSE values were slightly better for WHM than for the others. In contrast, the predictive performance of BM was greatly decreased. The main reason for this was that the C-LMS set involved many candidate explanatory variables, and thus BM failed to provide quality solutions because of the limited computation time. Nevertheless, SHM and WHM maintained high predictive performance for the C-LMS set because the number of feasible solutions was reduced by the hierarchical constraints.

The best predictive performance for the CvsD dataset with $s=50$ was attained by SW (Table 6), but it is noteworthy that SW took more than 6,000 s to compute. If the stepwise method had been stopped in the middle of its computation, the number of selected variables would not have reached $s$. Because such a shortage of selected variables could lead to an unfair comparison, we did not set a time limit on the SW computation. However, we should note that the computation time of SW was 1,047.0 s for the CvsD dataset with $s=20$ (Table 6). In other words, SW would have selected fewer than 20 variables if its computation time had been limited to 1,000 s. In addition, the predictive performance was lower for SW with $s=20$ than for SHM and WHM with $s=50$: for the CvsD dataset, the $R^{2}$ value of SW with $s=20$ was 0.3642, whereas those of SHM and WHM with $s=50$ were 0.3693 and 0.3697, respectively. Taking all aspects into consideration, SW was inferior to SHM and WHM in relation to the selection of many explanatory variables within a limited time. In contrast, although L1 delivered relatively low predictive performance, its computation was extremely rapid.

4.3 Analysis of store-choice factors

Tables 7–9 give the explanatory variables selected by WHM with $s=20$ for the C-LMS set. Here, all the samples in each dataset were used for variable selection, and the time limit for an MIO computation was extended to 10,000 s. In the tables, (L), (M), and (S) stand for large, medium, and small product categories, respectively.

4.3.1 Convenience store versus drugstore

Table 7 gives the results for the CvsD dataset, for which the variables with positive coefficients were the choice factors of convenience stores, and those with negative coefficients were the choice factors of drugstores. The absolute value of a coefficient indicates the number of products purchased together. For instance, in Table 7 we see that “other commodities (M)” was a choice factor of convenience stores, and that the purchase of “other commodities (M)” led to the purchase of roughly five products from such a store.
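As a worked illustration of how these coefficients combine, the fitted model can be evaluated for two hypothetical visits using the values in Table 7. The sketch assumes that purchasing a product sets the dummy variables of all categories containing it to one, and that no other selected variable (including the demographic cross term) is active.

```python
# Coefficients copied from Table 7 (CvsD dataset, WHM).
INTERCEPT = 2.71
COEF = {
    "other commodities (M)": 5.38,
    "commodities (L)": -4.67,
    "frozen foods (S)": 1.19,
    "food (L)": -1.17,
}


def predict(active_variables):
    """Predicted y: intercept plus the coefficients of the active dummies."""
    return INTERCEPT + sum(COEF[name] for name in active_variables)


# Visit 1: a product from "other commodities (M)", hence also "commodities (L)".
gift_card_visit = predict(["other commodities (M)", "commodities (L)"])

# Visit 2: a product from "frozen foods (S)", hence also "food (L)".
frozen_food_visit = predict(["frozen foods (S)", "food (L)"])
```

The predictions are roughly 3.42 and 2.73, respectively. Both are positive, so under these assumptions the model points to a convenience store in each case, with a larger predicted purchase quantity for the first visit.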

In the case of WHM, if a certain category is selected, at least one of its superordinate categories has to be selected because of the weak hierarchical constraints. For instance, Table 7 shows that “other foods (M)” was selected along with “food (L),” and that “sanitary papers (S),” “everyday sundries (M),” and “commodities (L)” were selected according to the hierarchical relationship. Meanwhile, these variables can have coefficients with opposite signs. In fact, the coefficients of “sanitary papers (S)” and “commodities (L)” were $1.09$ and $-4.67$ , respectively. In what follows, we analyze these results in relation to product price, store accessibility, and selection of products.

In Table 7, “commodities (L)” was a choice factor of drugstores, but “other commodities (M)” was a choice factor of convenience stores. The choice factor “commodities (L)” contained featured products of drugstores, such as tissues, toilet paper, and detergent. Hence, consumers tended to purchase these products at drugstores because of their lower prices. In contrast, “other commodities (M)” was composed largely of Amazon/iTunes gift cards and garbage disposal permits from convenience stores. These products were purchased in convenience stores probably because of their accessibility. Additionally, the coefficient of “other commodities (M)” was very large (i.e., 5.38); that is, when those products were purchased, other products were likely to be purchased at the same time. In contrast, food categories such as “frozen foods (S),” “alcohol (S),” “delicatessen (S),” and “ice cream (S)” had relatively small positive coefficients. This means that consumers would purchase only those products in a single visit, and hence would choose convenience stores because of their accessibility.

The choice factor “everyday sundries (M)” was composed primarily of detergents, tissues, and toilet paper purchased from drugstores, probably because of the lower prices. The choice factor “clothes/accessories/sports (L)” consisted mainly of women’s beauty products that are specific to drugstores, such as stockings, tights, and open-toe socks. Moreover, because “female $\times$ with children” and “commodities (L)” were choice factors of drugstores, we reason that it is on economic grounds that women shop at drugstores for commodities for their families.

4.3.2 Convenience store versus grocery supermarket

Table 8 gives the results for the CvsS dataset, for which the variables with positive coefficients were the choice factors of convenience stores, and those with negative coefficients were the choice factors of grocery supermarkets. As in Table 7, the coefficient of “other commodities (M)” was a very large positive value (i.e., 6.63). Accordingly, gift cards and garbage disposal permits were still major choice factors of convenience stores. Other choice factors of convenience stores included “bread/cereal (S),” “soup (S),” “delicatessen (S),” “noodles (S),” and “frozen foods (S).” These easy-to-prepare products were purchased at convenience stores probably because of accessibility. In contrast, the choice factors of grocery supermarkets included “male $\times$ over 60s,” “female $\times$ married,” and “female $\times$ with children” as consumer demographics, and “dessert/yogurt (S),” “milk beverage (S),” and “fresh foods (M)” as product categories. In other words, the typical customers of grocery supermarkets were elderly men and married women who purchased healthy foods there. The large categories “food (L),” “durables (L),” and “commodities (L)” were also choice factors of grocery supermarkets; these are typical products of general merchandise stores and shopping malls.

4.3.3 Drugstore versus grocery supermarket

Table 9 gives the results for the DvsS dataset, for which the variables with positive coefficients were the choice factors of drugstores, and those with negative coefficients were the choice factors of grocery supermarkets. We can see that drugstores were chosen for typical product categories such as “health foods (S),” “infant foods (S),” “healthcare (M),” and “cosmetics (M).” Additionally, “bread/cereal (S),” “soup (S),” and “noodles (S)” were common choice factors of convenience stores in Table 8 and drugstores in Table 9. In other words, convenience stores and drugstores were chosen for similar products when compared to grocery supermarkets. In addition, some food products such as “health foods (S),” “infant foods (S),” “bread/cereal (S),” and “grain (S)” had large positive coefficients. Hence, these products could stimulate impulse purchases at drugstores and so increase average customer sales. In contrast, the large categories “commodities (L)” and “food (L)” were choice factors of grocery supermarkets, which were chosen especially for “processed foods (M)” by “female $\times$ with children.” To summarize the results of Tables 8 and 9, grocery supermarkets should provide a good selection of health food products for women with families.

4.3.4 Intercept term

We conclude this section by discussing the intercept term, which can be interpreted as quantifying a store’s ability to attract consumers once the effects of all other consumer/product attributes have been eliminated. We can see from Tables 7 and 8 that convenience stores had the highest potential to attract consumers. This suggests that impulse purchases would be made frequently at convenience stores.

5 Conclusions

We proposed an MIO-based method of hierarchical variable selection and analyzed consumer store-choice factors based on purchase quantity. Our method improved the predictive performance based on hierarchically structured product categories. It also offered better computational efficiency because the hierarchical constraints reduced the number of feasible solutions appreciably. We verified the effectiveness of our method by comparing its computational performance with those of the stepwise method and $L_{1}$-regularized regression through a five-fold cross-validation.
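To give a feel for how hierarchical constraints shrink the set of feasible solutions, the sketch below enumerates all variable subsets of a small hypothetical hierarchy and counts those that satisfy a weak hierarchical rule (a selected category needs at least one selected ancestor). The category names, the hierarchy, and the "any ancestor" reading are assumptions made for illustration; the actual reduction in the experiments depends on the real category tree and on the MIO solver.

```python
from itertools import combinations

# Hypothetical toy hierarchy: each category maps to its list of ancestors.
ANCESTORS = {
    "L1": [],
    "M1": ["L1"],
    "S1": ["M1", "L1"],
    "S2": ["M1", "L1"],
    "L2": [],
    "M2": ["L2"],
    "S3": ["M2", "L2"],
}


def is_feasible(subset):
    """True if every chosen category with ancestors has at least one ancestor chosen."""
    chosen = set(subset)
    return all(
        not ANCESTORS[c] or any(a in chosen for a in ANCESTORS[c])
        for c in chosen
    )


def count_subsets():
    """Return (number of all subsets, number of subsets satisfying the weak rule)."""
    names = list(ANCESTORS)
    total = 0
    feasible = 0
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            total += 1
            if is_feasible(subset):
                feasible += 1
    return total, feasible


total, feasible = count_subsets()
print(f"{feasible} of {total} subsets satisfy the weak hierarchical rule")
```

For this seven-category toy hierarchy, 45 of the 128 possible subsets remain feasible, so even a loose hierarchical rule removes most candidate subsets.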

We used our variable selection method to examine consumer store choice among convenience stores, drugstores, and grocery supermarkets. We found from the analysis that convenience stores were chosen because of their accessibility, whereas drugstores were chosen in order to purchase specific products at low prices. We also found that grocery supermarkets were chosen especially for health food products and were favored by women with families.

A future direction of study would be to append constraints for eliminating multicollinearity (Bertsimas and King 2016; Tamura et al. 2016, in press) to our MIO models. These constraints could further improve the accuracy of coefficient estimates in the store-choice model. Although our computational experiments were focused on three store types, various analyses of store-choice factors could be performed using our MIO models. For instance, consumer choice between stores of the same type could be investigated, and seasonal/regional store-choice factors could be identified by applying our store-choice model to the data of each season/region. To gain customers in today’s fiercely competitive market, each store/company needs to accelerate its product development, have a good selection of products, and examine how it sets product prices. Accordingly, the results of our store-choice analysis would be useful in understanding the strengths and weaknesses of each store, and for devising its unique marketing strategy.
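As a rough illustration of the multicollinearity constraints mentioned as future work, the sketch below flags pairs of candidate variables whose absolute correlation exceeds a threshold. A pairwise rule of the form $z_j + z_k \le 1$ for such pairs is one simple way to keep both from being selected; this form, the threshold of 0.9, the function names, and the toy data are all assumptions for illustration and are not taken from the cited works or from our MIO models.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def conflicting_pairs(columns, threshold=0.9):
    """List variable pairs (j, k) with |correlation| above the threshold.

    Each returned pair is a candidate for a constraint z_j + z_k <= 1,
    which would forbid selecting both variables at once.
    """
    return [
        (j, k)
        for j, k in combinations(sorted(columns), 2)
        if abs(pearson(columns[j], columns[k])) > threshold
    ]


# Hypothetical toy data: x2 is nearly twice x1, x3 is unrelated.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(conflicting_pairs(columns))
```

On this toy data only the pair of `x1` and `x2` is flagged, so a pairwise constraint would prevent those two variables from entering the model together.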

Acknowledgements.

This research was partially supported by a Grant-in-Aid of Joint Research from the Institute of Information Science, Senshu University.

Bibliography

Baker J, Parasuraman A, Grewal D, Voss GB (2002) The influence of multiple store environment cues on perceived merchandise value and patronage intentions. J Mark 66:120–141
Bertsimas D, King A (2016) An algorithmic approach to linear regression. Oper Res 64:2–16
Bertsimas D, King A (in press) Logistic regression: From art to science. Stat Sci
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44:813–852
Bien J, Taylor J, Tibshirani R (2013) A lasso for hierarchical interactions. Ann Stat 41:1111–1141
Bloemer J, De Ruyter K (1998) On the relationship between store image, store satisfaction and store loyalty. Eur J Mark 32:499–513
Blum AL, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97:245–271
Briesch RA, Chintagunta PK, Fox EJ (2009) How does assortment affect grocery store choice? J Mark Res 46:176–189