Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data

Shenhao Wang; Qingyi Wang; Jinhua Zhao

arXiv:1901.00227·econ.GN·August 28, 2019

TL;DR

This paper introduces a multitask deep neural network framework that effectively combines revealed and stated preference data, outperforming traditional models in travel behavior analysis and providing interpretable insights into autonomous vehicle adoption.

Contribution

The study develops a novel MTLDNN framework that surpasses classical models in prediction accuracy and offers interpretability for travel behavior data analysis.

Findings

1. MTLDNNs outperform benchmark models by about 5% in prediction accuracy.

2. Soft constraints and architectural design are key to performance gains.

3. AV is mainly a substitute for driving, with AV alternative-specific variables more influential than socio-economic factors.

Tables

Table 1: Comparison of Eight Models

	MTLDNN (Top1)	MTLDNN-E (Top10)	DNN-SPT	DNN-JOINT	NL-C	NL-NC	MNL-SPT	MNL-JOINT
Panel 1: Prediction Accuracy
Joint RP+SP (Testing)	60.0%	58.7%	53.4%	53.8%	55.4%	55.0%	55.0%	51.9%
RP (Testing)	69.9%	66.6%	65.8%	65.8%	65.4%	64.7%	64.5%	44.0%
SP (Testing)	58.2%	57.2%	51.1%	51.5%	53.5%	53.2%	53.2%	53.5%
Joint RP+SP (Training)	60.7%	62.2%	52.5%	52.9%	54.0%	54.5%	54.4%	50.3%
RP (Training)	69.1%	71.9%	59.8%	59.8%	58.9%	62.2%	62.1%	37.0%
SP (Training)	59.1%	60.3%	51.1%	51.5%	53.0%	53.0%	53.0%	52.8%
Panel 2: Different Characteristics of Models
Automatic Feature Learning	$\times$	$\times$	$\times$	$\times$
Soft Constraints	$\times$	$\times$
Hard Constraints					$\times$
Data Augmentation	$\times$	$\times$		$\times$	$\times$	$\times$		$\times$

Table 2: Average Elasticity of Choosing AV

Variable	Elasticity
AV Cost	-0.981
AV In-Vehicle Time	-0.905
Age	-0.561
AV Wait Time	-0.375
Income	0.102

Table 3: Hyperparameter Space of MTLDNN

Hyperparameter Dimensions	Values
Shared M1	$[1, 2, 3, 4, 5]$
Domain-specific M2	$[1, 2, 3, 4, 5]$
$λ_{1}$ constant	$[10^{-20}, 10^{-4}, 10^{-2}, 5 \times 10^{-1}]$
$λ_{2}$ constant	$[10^{-20}, 10^{-4}, 10^{-2}, 5 \times 10^{-1}]$
$λ_{3}$ constant	$[10^{-20}, 10^{-4}, 10^{-2}, 5 \times 10^{-1}]$
n hidden	$[25, 50, 100, 200]$
n iteration	$20000$
n mini batch	$200$
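
The hyperparameter space in Table 3 can be enumerated directly, which makes its size explicit. The sketch below assumes the four λ values listed are the complete set, and it assumes that the roughly 1,500 trained models were drawn uniformly at random from the grid; the sampling strategy and the names `SPACE`, `full_grid`, and `sample_configs` are illustrative assumptions, not details taken from the paper.

```python
import itertools
import random

# Hyperparameter space transcribed from Table 3.
SPACE = {
    "shared_m1": [1, 2, 3, 4, 5],
    "specific_m2": [1, 2, 3, 4, 5],
    "lambda1": [1e-20, 1e-4, 1e-2, 5e-1],
    "lambda2": [1e-20, 1e-4, 1e-2, 5e-1],
    "lambda3": [1e-20, 1e-4, 1e-2, 5e-1],
    "n_hidden": [25, 50, 100, 200],
}
# Settings that Table 3 holds fixed across all models.
FIXED = {"n_iteration": 20000, "n_mini_batch": 200}


def full_grid(space):
    """Return every combination in the space as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def sample_configs(space, n, seed=0):
    """Draw n distinct configurations uniformly at random (assumed strategy)."""
    rng = random.Random(seed)
    return rng.sample(full_grid(space), n)


grid = full_grid(SPACE)
configs = sample_configs(SPACE, 1500)
print(len(grid), len(configs))  # prints "6400 1500"
```

Under these assumptions the full grid holds 5 × 5 × 4 × 4 × 4 × 4 = 6,400 combinations, so 1,500 models would cover a little under a quarter of it.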

Table 4: Survey Descriptive Summary Statistics

Age Group	Population (%)	Sample (%)	Income Group	Population (%)	Sample (%)
$20 - 24$	8.42	16.31	No income	10.79	1.46
$25 - 29$	9.04	17.32	Below $2,000	7.49	7.19
$30 - 34$	9.22	15.45	$2,000 - $3,999	10.69	14.9
$35 - 39$	9.75	14.08	$4,000 - $5,999	11.29	17.35
$40 - 44$	10.12	10.09	$6,000 - $7,999	10.89	15.57
$45 - 49$	9.72	10.2	$8,000 - $9,999	9.49	14.77
$50 - 54$	10.19	7.42	$10,000 - $11,999	8.39	10.07
$55 - 59$	9.67	4.93	$12,000 - $14,999	9.09	8.22
$60 - 64$	8.13	2.49	$15,000 - $19,999	9.49	4.78
$65 - 69$	6.39	0.67	Over $20,000	12.39	5.69
$70 - 74$	3.35	0.91
$75 - 79$	2.84	0
$80 - 84$	1.73	0.13
$85 +$	1.43	0

Table 5: Top 10 MTLDNN Architectures

Shared M1	Domain-specific M2	n hidden	$λ_{1}$	$λ_{2}$	$λ_{3}$
1	1	25	1.00E-02	1.00E-02	1.00E-04
3	2	25	1.00E-02	1.00E-04	1.00E-20
1	1	25	1.00E-20	1.00E-02	1.00E-02
1	1	25	1.00E-02	5.00E-01	1.00E-04
1	1	100	1.00E-02	1.00E-20	1.00E-04
1	4	25	1.00E-02	5.00E-01	1.00E-02
1	1	200	1.00E-02	5.00E-01	1.00E-02
1	1	50	1.00E-02	1.00E-02	1.00E-20
3	1	100	1.00E-02	1.00E-04	1.00E-04
2	3	50	1.00E-02	5.00E-01	1.00E-20

Equations

V_{k_{r}, i} = (g_{r}^{M_{2}, k_{r}} \circ g_{r}^{M_{2} - 1} \circ ... \circ g_{r}^{1}) \circ (g_{0}^{M_{1}} \circ g_{0}^{M_{1} - 1} \circ ... \circ g_{0}^{1}) (x_{r, i})

V_{k_{s}, t} = (g_{s}^{M_{2}, k_{s}} \circ g_{s}^{M_{2} - 1} \circ ... \circ g_{s}^{1}) \circ (g_{0}^{M_{1}} \circ g_{0}^{M_{1} - 1} \circ ... \circ g_{0}^{1}) (x_{s, t})

P (y_{k_{r}, i}; w_{r}, w_{0}) = \frac{e ^{V_{k_{r}, i}}}{\sum _{j_{r} = 1}^{K_{r}} e ^{V_{j_{r}, i}}}

P (y_{k_{s}, t}; w_{s}, w_{0}, T) = \frac{e ^{V_{k_{s}, t} / T}}{\sum _{j_{s} = 1}^{K_{s}} e ^{V_{j_{s}, t} / T}}
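
The RP and SP choice probabilities above are both softmax functions of the utilities; the SP version additionally divides each utility by the temperature T. A minimal sketch of that relationship follows. The utility values are hypothetical numbers chosen for illustration, with four RP alternatives and five SP alternatives (the fifth standing in for AV).

```python
import math


def softmax(utilities, temperature=1.0):
    """Choice probabilities exp(V_k / T) / sum_j exp(V_j / T)."""
    scaled = [v / temperature for v in utilities]
    shift = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical utilities, not estimates from the paper.
v_rp = [1.0, 0.2, -0.5, 0.3]
v_sp = [1.0, 0.2, -0.5, 0.3, 0.8]

p_rp = softmax(v_rp)                         # RP: no temperature (T = 1)
p_sp_sharp = softmax(v_sp, temperature=0.5)  # small T: more peaked choices
p_sp_flat = softmax(v_sp, temperature=5.0)   # large T: closer to uniform
```

A smaller T concentrates probability on the highest-utility alternative, while a larger T flattens the distribution. T plays the same role for SP as the scale θ does in the NL probabilities.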

\underset{w_{r}, w_{s}, w_{0}, T}{\min}\ R(X, Y; w_{r}, w_{s}, w_{0}, T; c_{H}) = \underset{w_{r}, w_{s}, w_{0}, T}{\min} \Big\{ -\frac{1}{N_{r}} \sum_{i=1}^{N_{r}} \sum_{k_{r}=1}^{K_{r}} y_{k_{r}, i} \log P(y_{k_{r}, i}; w_{r}, w_{0}; c_{H}) - \frac{\lambda_{0}}{N_{s}} \sum_{t=1}^{N_{s}} \sum_{k_{s}=1}^{K_{s}} y_{k_{s}, t} \log P(y_{k_{s}, t}; w_{s}, w_{0}, T; c_{H}) + \lambda_{1} \|w_{0}\|_{2}^{2} + \lambda_{2} \|w_{s}\|_{2}^{2} + \lambda_{3} \|\tilde{w}_{s} - w_{r}\|_{2}^{2} \Big\}
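
The structure of the MTLDNN objective can be sketched as a cross-entropy fit term for each task plus three squared-norm penalties. The sketch below rests on several assumptions that are not stated in this section: the RP fit term is averaged over the RP sample with unit weight, all weights are flattened into plain lists, and `w_s_tilde` is taken to be the part of the SP task-specific weights that lines up with `w_r`. The function names and the example numbers are hypothetical; the default λ1, λ2, and λ3 follow the first row of Table 5, and `lam0=1.0` is an arbitrary choice.

```python
import math


def cross_entropy(probs, labels):
    """Average negative log-likelihood; labels are chosen-alternative indices."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def sq_norm(vec):
    """Squared L2 norm of a flat list of weights."""
    return sum(v * v for v in vec)


def mtldnn_loss(p_rp, y_rp, p_sp, y_sp, w0, w_r, w_s, w_s_tilde,
                lam0=1.0, lam1=1e-2, lam2=1e-2, lam3=1e-4):
    """Fit terms for RP and SP plus the three soft-constraint penalties."""
    fit = cross_entropy(p_rp, y_rp) + lam0 * cross_entropy(p_sp, y_sp)
    diff = [a - b for a, b in zip(w_s_tilde, w_r)]
    penalty = lam1 * sq_norm(w0) + lam2 * sq_norm(w_s) + lam3 * sq_norm(diff)
    return fit + penalty


# Hypothetical predicted probabilities, chosen alternatives, and weights.
p_rp, y_rp = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1]], [0, 1]
p_sp, y_sp = [[0.4, 0.1, 0.1, 0.1, 0.3]], [4]
w0, w_r, w_s = [0.5, -0.5], [1.0, 2.0], [1.5, 1.0, 3.0]
w_s_tilde = w_s[:2]  # assumption: the SP weights that align with w_r

loss = mtldnn_loss(p_rp, y_rp, p_sp, y_sp, w0, w_r, w_s, w_s_tilde)
```

The λ3 term is what makes the constraint soft: as λ3 grows, the aligned SP weights are pulled toward the RP weights, and as λ3 approaches zero the two tasks are free to diverge.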

U_{k_{r}, i} = V_{k_{r}, i} + ϵ_{k_{r}, i} = β_{k_{r}}^{T} ϕ (x_{r, i}) + ϵ_{k_{r}, i}

U_{k_{s}, t} = V_{k_{s}, t} + ϵ_{k_{s}, t} = β_{k_{s}}^{T} ϕ (x_{s, t}) + ϵ_{k_{s}, t}

\mathrm{Var}(ϵ_{k_{r}, i}) / \mathrm{Var}(ϵ_{k_{s}, t}) = 1 / θ^{2}

P (y_{k_{r}, i}; β_{r}) = \frac{e ^{β_{k_{r}}^{T} ϕ (x_{r, i})}}{\sum _{j_{r} = 1}^{K_{r}} e ^{β_{j_{r}}^{T} ϕ (x_{r, i})}}

P (y_{k_{s}, t}; β_{s}) = \frac{e ^{β_{k_{s}}^{T} ϕ (x_{s, t}) / θ}}{\sum _{j_{s} = 1}^{K_{s}} e ^{β_{j_{s}}^{T} ϕ (x_{s, t}) / θ}}

\displaystyle\underset{\beta_{r},\beta_{s}}{\min}\ R(X,Y;\beta_{r},\beta_{s})=\underset{\beta_{r},\beta_{s}}{\min}\Big{\{}-\frac{1}{N}\big{[}\sum_{i=1}^{N_{r}}\sum_{k_{r}=1}^{K_{r}}y_{k_{r},i}\log P(y_{k_{r},i};\beta_{r})+\sum_{t=1}^{N_{s}}\sum_{k_{s}=1}^{K_{s}}y_{k_{s},t}\log P(y_{k_{s},t};\beta_{s})\big{]}\Big{\}}

\hat{c}_{H} = \underset{c_{H} \in \{c_{H}^{(1)}, c_{H}^{(2)}, ..., c_{H}^{(S)}\}}{\arg\min}\ R(X, Y; \hat{w}_{r}, \hat{w}_{s}, \hat{w}_{0}, \hat{T}; c_{H})
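
The selection rule above picks, among S candidate hyperparameter settings, the one whose trained model attains the smallest risk. A minimal sketch follows. In practice `risk_fn` would train a model for each candidate and evaluate it; here it is replaced by a lookup in a hypothetical table of risks, and all names and numbers are illustrative assumptions.

```python
def select_hyperparameters(candidates, risk_fn):
    """Return the candidate with the smallest estimated risk, and that risk."""
    scored = [(risk_fn(c), i, c) for i, c in enumerate(candidates)]
    # Ties are broken by candidate order so the result is deterministic.
    best_risk, _, best = min(scored, key=lambda item: (item[0], item[1]))
    return best, best_risk


# Hypothetical candidates and risks, for illustration only.
candidates = [
    {"shared_m1": 1, "specific_m2": 1, "n_hidden": 25},
    {"shared_m1": 3, "specific_m2": 2, "n_hidden": 25},
    {"shared_m1": 1, "specific_m2": 4, "n_hidden": 25},
]
toy_risk = {(1, 1): 0.92, (3, 2): 0.95, (1, 4): 0.97}


def risk_fn(c):
    """Stand-in for training a model with setting c and measuring its risk."""
    return toy_risk[(c["shared_m1"], c["specific_m2"])]


best, best_risk = select_hyperparameters(candidates, risk_fn)
```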

Full text

Massachusetts Institute of Technology

Aug 2019

Abstract

How to combine revealed preference (RP) and stated preference (SP) data to analyze travel behavior is an enduring question. This study presents a framework of multitask learning deep neural networks (MTLDNNs) to address it, and demonstrates that MTLDNNs are more generic than the traditional nested logit (NL) method, owing to their capacity for automatic feature learning and soft constraints. About 1,500 MTLDNN models are designed and applied to survey data collected in Singapore, which cover the RP of four current travel modes and the SP with autonomous vehicles (AV) as one new travel mode in addition to those in RP. We find that MTLDNNs consistently outperform six benchmark models, and in particular the classical NL models, by about 5% in prediction accuracy on both the RP and SP datasets. This improvement can be attributed mainly to the soft constraints specific to MTLDNNs, including their innovative architectural design and regularization methods, and not much to the generic capacity for automatic feature learning endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs are also interpretable. The empirical results show that AV is mainly a substitute for driving and that AV alternative-specific variables are more important than socio-economic variables in determining AV adoption. Overall, this study introduces a new MTLDNN framework to combine RP and SP, and demonstrates its theoretical flexibility and empirical power for prediction and interpretation. Future studies can design new MTLDNN architectures to reflect the specific characteristics of RP and SP and extend this work to other behavioral analyses.

1 Introduction

Both revealed preference (RP) and stated preference (SP) data are widely used for demand analysis, each with its own pros and cons. RP data are commonly thought to have stronger external validity, but they are problematic owing to the limited coverage and high correlation of attributes. SP data are necessary when researchers seek to understand the effects of new attributes or alternatives, but they are biased by respondents’ sensitivity to survey formats and by unrealistic hypothetical scenarios. To mitigate these problems, researchers often combine the two by using a nested logit (NL) approach, which assigns the alternatives in RP and SP to two nests with different scale factors [Hensher1993, Bradley1997, Ben_Akiva1990, Ben_Akiva1994]. (This nested logit method can also be seen as a pooled estimation with heteroscedasticity across RP and SP [Louviere1999, Helveston2018].) The NL approach is used to predict future travel demand and, through parameter estimation, to examine the factors that determine the adoption of certain travel alternatives. However, it relies heavily on handcrafted feature engineering based on domain knowledge, which could be too restrictive in comparison to the automatic feature learning in deep neural networks (DNNs), as shown in many empirical studies [LeCun2015, Bengio2013, Collobert2008]. This capacity for automatically learning features is enabled by the theoretically appealing property of DNNs being universal approximators [Hornik1989, Hornik1991, Cybenko1989], and as a result, DNNs have demonstrated extraordinary predictive power across the domains of natural language processing, image recognition, and travel behavioral analysis [Fernandez2014, Krizhevsky2012, LeCun2015]. The theoretically appealing property and the empirical predictive power of DNNs prompt us to ask whether the classical problem of combining RP and SP for demand analysis can be addressed in a DNN framework, in a way that is more generic and flexible than the traditional NL method.

This paper presents the multitask learning deep neural network (MTLDNN) framework to jointly model RP and SP as two different but related tasks. One MTLDNN architecture is visualized in Figure 1; it starts with shared layers and ends with task-specific layers, capturing both the similarities and the differences between tasks [Caruana1997]. This architecture is more generic than the classical NL method because it has the capacity for automatic feature learning and soft constraints. Automatic feature learning in MTLDNN refers to the process of automatically learning the feature transformation based on a powerful model class assumption (e.g., a DNN), as opposed to the handcrafted feature engineering in the NL method, which relies on researchers’ prior knowledge for model specification. Soft constraints refer to the flexibility of MTLDNN architectures and the regularization methods used in training MTLDNNs, as opposed to hard constraints such as parameter sharing between tasks, which is common in the NL approach. Specifically, the prototype MTLDNN architecture in Figure 1 is flexible because it can take various forms with different numbers of shared and task-specific layers, which are built into the hyperparameter space of the MTLDNN model.
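
The shared-then-task-specific structure described above can be sketched as a small feedforward network in plain Python. This is an illustration under stated assumptions, not the authors' implementation: ReLU activations, a common input dimension for RP and SP, uniform random initialization, and the names `init_layer`, `dense`, `build_mtldnn`, and `utilities` are all hypothetical. The sketch counts the output layer of each task as the last of its M2 task-specific layers and requires M1 ≥ 1 and M2 ≥ 1.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) and zero bias for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu=True):
    """Apply one dense layer, optionally followed by ReLU (assumed activation)."""
    weights, bias = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def build_mtldnn(n_inputs, n_hidden, m1, m2, k_rp, k_sp, seed=0):
    """M1 shared layers, then M2 task-specific layers for each of RP and SP."""
    rng = random.Random(seed)
    shared = [init_layer(n_inputs if i == 0 else n_hidden, n_hidden, rng)
              for i in range(m1)]

    def head(k_out):
        hidden = [init_layer(n_hidden, n_hidden, rng) for _ in range(m2 - 1)]
        return hidden + [init_layer(n_hidden, k_out, rng)]

    return {"shared": shared, "rp": head(k_rp), "sp": head(k_sp)}


def utilities(model, x, task):
    """Utilities for one observation: shared layers first, then the task head."""
    h = x
    for layer in model["shared"]:
        h = dense(layer, h)
    head = model[task]
    for layer in head[:-1]:
        h = dense(layer, h)
    return dense(head[-1], h, relu=False)  # the final layer is linear


# Hypothetical example: 6 input features, 4 RP modes, 5 SP modes (with AV).
model = build_mtldnn(n_inputs=6, n_hidden=25, m1=3, m2=2, k_rp=4, k_sp=5)
x = [0.5, -1.0, 0.3, 1.2, 0.0, -0.7]
v_rp = utilities(model, x, "rp")
v_sp = utilities(model, x, "sp")
```

Both tasks pass through the same shared layers, so gradients from RP and SP observations would update the same shared weights, while each head keeps its own parameters. Varying `m1` and `m2` yields the different architectures that the hyperparameter space spans.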