Variable Selection in Functional Linear Concurrent Regression

Rahul Ghosal; Arnab Maity; Timothy Clark; Stefano B Longo

arXiv:1904.08507·stat.AP·November 1, 2019


TL;DR

This paper introduces a new variable selection method for functional linear concurrent regression, effectively identifying influential time-varying factors in complex, noisy, and sparse data settings, with applications in fisheries footprint and dietary studies.

Contribution

The paper extends scalar variable selection techniques like LASSO, SCAD, and MCP to the functional linear concurrent regression context, using group penalties for high accuracy.

Findings

01

High accuracy in variable selection demonstrated through simulations

02

Minimal false positives and negatives even with sparse, noisy data

03

Successful application to fisheries footprint and dietary calcium absorption studies


Tables

Table 3: Comparison of MC absolute bias and mean square error.

Sample size	Method	Bias $\hat{\beta}_{1}(t)$	MSE $\hat{\beta}_{1}(t)$	Bias $\hat{\beta}_{2}(t)$	MSE $\hat{\beta}_{2}(t)$	Bias $\hat{\beta}_{3}(t)$	MSE $\hat{\beta}_{3}(t)$
n=100	FLASSO	0.083	0.033	0.092	0.048	0.112	0.191
	FSCAD	0.011	0.025	0.015	0.038	0.022	0.165
	FMCP	0.011	0.025	0.015	0.038	0.022	0.165
n=200	FLASSO	0.061	0.017	0.069	0.024	0.092	0.109
	FSCAD	0.007	0.013	0.008	0.019	0.010	0.091
	FMCP	0.007	0.013	0.008	0.019	0.010	0.091
n=400	FLASSO	0.047	0.009	0.051	0.013	0.070	0.063
	FSCAD	0.004	0.007	0.004	0.010	0.010	0.050
	FMCP	0.004	0.007	0.004	0.010	0.010	0.050

Table 4: Selection percentages (%) of variables in the calcium absorption study.

Method	Var 1 (Calcium intake)	Var 2 (BSA)	Var 3 (BMI)	Max Var (4-18)
FSCAD	100	0	0	0
FMCP	100	0	0	0
BIC (LM)	100	100	0	9
Cp (LM)	100	100	2	33
PGEE (ind)	100	0	100	31
PGEE (AR-1)	100	0	100	31

Table 5: List of covariates in the fisheries footprint study.

Predictor variables in the fisheries footprint study
agriculture value added	aquaculture production tons
arable land hectares	arable land pct
exportsofgoodsandservicesofgdpn	fao livestock
FDI inflow	foodexportsofmerchandiseexprtst
foodimportsofmerchandiseimprtst	gdp pc 2010
manufacturing value pctGDP	meat consumption FAO
population 15_64 pct	population density
populationtotalsppoptotl	services value growth pct
tractors	trade pct GDP
urban pop	urban pop pct

Equations

$$Y_{i}(t)=\sum_{j=1}^{p}Z_{ij}(t)\beta_{j}(t)+\epsilon_{i}(t),$$

$$L(\mathbf{b})=\frac{1}{n}\int\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}^{T}\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}\,dt+\lambda\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2}.$$

$$\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}Z_{ij}(t_{l})\Big\{\sum_{k=1}^{k_{j}}b_{kj}\theta_{kj}(t_{l})\Big\}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2}.$$

$$\begin{aligned}
R(\bm{\gamma})&=\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}Z_{ij}^{*}(t_{l})^{T}\mathbf{b}_{j}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2},\quad\text{where } Z_{ij}^{*}(t_{l})^{T}=Z_{ij}(t_{l})\,\bm{\theta}_{j}(t_{l})^{T},\\
&=\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}\tilde{Z}_{ij}^{*}(t_{l})^{T}\bm{\gamma}_{j}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2},\quad\text{where } \tilde{Z}_{ij}^{*}(t_{l})=\mathbb{L}_{\psi,j}^{-1}Z_{ij}^{*}(t_{l}),\\
&=\sum_{i=1}^{n}\|\mathbf{Y}_{i}-\mathbb{Z}_{i}^{*}\bm{\gamma}\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2}=\sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2},
\end{aligned}$$

$$\begin{aligned}
\hat{\bm{\gamma}}&=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\|\bm{\gamma}_{j}\|_{2}\\
&=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{LASSO,\lambda}(\|\bm{\gamma}_{j}\|_{2}).
\end{aligned}$$

$$\hat{\bm{\gamma}}=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{SCAD,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2}),$$

$$P_{SCAD,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2})=\begin{cases}\lambda\|\bm{\gamma}_{j}\|_{2} & \text{if }\|\bm{\gamma}_{j}\|_{2}\leq\lambda,\\[4pt]\dfrac{\lambda\phi\|\bm{\gamma}_{j}\|_{2}-0.5\left(\|\bm{\gamma}_{j}\|_{2}^{2}+\lambda^{2}\right)}{\phi-1} & \text{if }\lambda<\|\bm{\gamma}_{j}\|_{2}\leq\lambda\phi,\\[4pt]0.5\lambda^{2}(\phi+1) & \text{if }\|\bm{\gamma}_{j}\|_{2}>\lambda\phi.\end{cases}$$

$$\hat{\bm{\gamma}}=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{MCP,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2}),$$

$$P_{MCP,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2})=\begin{cases}\lambda\|\bm{\gamma}_{j}\|_{2}-\dfrac{\|\bm{\gamma}_{j}\|_{2}^{2}}{2\phi} & \text{if }\|\bm{\gamma}_{j}\|_{2}\leq\lambda\phi,\\[4pt]0.5\lambda^{2}\phi & \text{if }\|\bm{\gamma}_{j}\|_{2}>\lambda\phi.\end{cases}$$

$$G(s,t)=\sum_{k=1}^{\infty}\lambda_{k}\phi_{k}(s)\phi_{k}(t),$$

$$\hat{\Sigma}(s,t)=\sum_{k=1}^{K}\hat{\lambda}_{k}\hat{\phi}_{k}(s)\hat{\phi}_{k}(t)+\hat{\sigma}^{2}I(s=t),$$

$$Y_{i}(t)=\beta_{0}(t)+\sum_{j=1}^{20}Z_{ij}(t)\beta_{j}(t)+\epsilon_{i}(t),\quad i=1,2,\ldots,n,\ t\in[0,100].$$

$$\epsilon_{i}(t)=\xi_{i1}\cos(t)+\xi_{i2}\sin(t)+\mathcal{N}(0,1),$$


Code & Models

Repositories

rahulfrodo/FLCM_Selection (official)


Full text

Variable Selection in Functional Linear Concurrent Regression

Rahul Ghosal1

Arnab Maity1

Timothy Clark2 and Stefano B Longo2

1 North Carolina State University (Department of Statistics)

2 North Carolina State University (Department of Sociology and Anthropology)

Abstract

We propose a novel method for variable selection in functional linear concurrent regression. Our research is motivated by a fisheries footprint study where the goal is to identify important time-varying socio-structural drivers influencing patterns of seafood consumption, and hence fisheries footprint, over time, as well as to estimate their dynamic effects. We develop a variable selection method in functional linear concurrent regression extending the classically used scalar on scalar variable selection methods LASSO, SCAD, and MCP. We show that, in functional linear concurrent regression, the variable selection problem can be addressed as a group LASSO problem and, by natural extension, as a group SCAD or group MCP problem. Through simulations, we illustrate that our method, particularly with the group SCAD or group MCP penalty, can pick out the relevant variables with high accuracy and has minuscule false positive and false negative rates even when data are observed sparsely, are contaminated with noise, and the error process is highly non-stationary. We also demonstrate two real data applications of our method, in studies of dietary calcium absorption and fisheries footprint, for selecting influential time-varying covariates.

keywords:

Functional Linear Concurrent Regression; Variable Selection; Fisheries Footprint

1 Introduction

Function on function regression is an active area of research in functional data analysis, with new statistical methods frequently emerging to address data where both the response and the covariates are functions over some continuous index such as time. The functional concurrent regression model is a special case of function on function regression in which the predictors influence the response only through their values at the current time point (Kim et al., 2018). The commonly used functional linear concurrent regression model assumes a linear relationship between the response and the predictors: the value of the response at a particular time point is modelled as a linear combination of the covariates at that time point, and the coefficients of the functional covariates are univariate smooth functions over time (Ramsay and Silverman, 2005). Multiple methods exist in the literature for estimating these regression functions in functional linear concurrent regression and the closely related varying coefficient model (Hastie and Tibshirani, 1993), using kernel-local polynomial smoothing (Wu et al., 1998; Hoover et al., 1998; Fan and Zhang, 1999; Kauermann and Tutz, 1999), polynomial splines (Huang et al., 2002, 2004), and smoothing splines (Hastie and Tibshirani, 1993; Hoover et al., 1998; Chiang et al., 2001; Eubank et al., 2004), among many others. As in classical scalar regression, when a large number of covariates is present, the primary interest may be to select only the set of influential variables and estimate their effects. While significance tests and confidence bands can help assess the individual effect of each predictor, they are computationally infeasible when the number of covariates is large. This motivates variable selection in functional linear concurrent regression.

Our research in this article is motivated by a fisheries footprint study where the goal is to identify important time-varying socio-structural and economic drivers influencing fisheries footprint (the Global Footprint Network’s measure of the total marine area required to produce the amount of seafood products a nation consumes) and to estimate their time-varying effects. Although a number of variable selection methods have been developed for scalar on function regression (Gertheiss et al., 2013; Fan et al., 2015) and function on scalar regression (Chen et al., 2016), the literature on variable selection in functional linear concurrent regression is relatively sparse. Recently, Goldsmith and Schwartz (2017) developed a variable selection method for the functional linear concurrent model using a variational Bayes approach, with sparsity introduced through a spike and slab prior on the coefficients of the basis expansion of the regression functions. In this article, we propose a variable selection method in functional linear concurrent regression extending the classically used variable selection methods LASSO (Tibshirani, 1996), SCAD (Fan and Li, 2001), and MCP (Zhang, 2010).

Our work is inspired by Gertheiss et al. (2013), who show that the variable selection problem in scalar on function regression can be reduced to a group LASSO (Yuan and Lin, 2006) problem. We show that, in functional linear concurrent regression, the variable selection problem can be addressed as a group LASSO problem and, by natural extension, as a group SCAD or group MCP problem. Chen et al. (2016) also used group MCP for variable selection in function on scalar regression. Our model differs fundamentally from theirs in that the covariates we consider are time-varying functions, possibly observed with measurement error. Our method is similar to Wang et al. (2008), who use a group SCAD penalty for variable selection in varying coefficient models, but we propose a different penalty on the coefficient functions, which simultaneously penalizes departure from sparsity and roughness of the coefficient functions, and our results show that there is much to be gained by using the group MCP penalty. We employ a pre-whitening procedure similar to Chen et al. (2016) to account for possible temporal dependence within functions. We also allow the covariates to be contaminated with measurement error and therefore use functional principal component analysis (FPCA) to obtain denoised trajectories of the covariates, which improves the estimation accuracy of our approach. Through simulations, we illustrate that the proposed method, particularly with the group SCAD or group MCP penalty, can pick out the relevant variables with very high accuracy and has minuscule false positive and false negative rates even when data are observed sparsely and contaminated with measurement error. We demonstrate two real data applications of our method, in a study of dietary calcium absorption (Davis, 2002) and in the fisheries footprint study, for selecting the influential time-varying covariates.

The rest of the article is organized as follows. In Section 2, we present our modelling framework and our variable selection method. In Section 3, we conduct a simulation study to evaluate the performance of our method and summarize the results. In Section 4, we return to the two real data examples, the calcium absorption study and the fisheries footprint study, apply our variable selection method to identify the influential covariates, and present our findings. We conclude in Section 5 with a discussion of some limitations and possible extensions of our work.

2 Methodology

2.1 Modelling Framework and Variable Selection Method

Suppose that the observed data for the $i$-th subject are $\{Y_{i}(t),Z_{i1}(t),Z_{i2}(t),\ldots,Z_{ip}(t)\}$ ($i=1,2,\ldots,n$), where $Y_{i}(\cdot)$ is a functional response and $Z_{i1}(\cdot),Z_{i2}(\cdot),\ldots,Z_{ip}(\cdot)$ are the corresponding functional covariates. We assume the covariates and the response are observed on a fine and regular grid of points $S=\{t_{1},t_{2},\ldots,t_{m}\}\subset[0,T]$ for some $T>0$, and that the covariates are measured without error. In Section 2.3 we discuss how the model and the proposed method extend to more general scenarios in which the covariates are observed sparsely and contaminated with measurement error. We consider a functional linear concurrent regression model of the form,

$$Y_{i}(t)=\sum_{j=1}^{p}Z_{ij}(t)\beta_{j}(t)+\epsilon_{i}(t),\tag{1}$$

where $\beta_{j}(t)$ ($j=1,2,\ldots,p$) are smooth functions (with finite second derivative) representing the functional regression parameters. We assume $Z_{ij}(\cdot)$ are independent and identically distributed (i.i.d.) copies of ${Z}_{j}(\cdot)$ ($j=1,2,\ldots,p$), where the ${Z}_{j}(\cdot)$ are underlying smooth stochastic processes. We further assume $\epsilon_{i}(\cdot)$ are i.i.d. copies of $\epsilon(\cdot)$, a mean zero stochastic process. In stacked form, model (1) can be rewritten as $Y(t)=Z(t)\beta(t)+\epsilon(t)$. In functional linear concurrent regression, estimation is generally done (Ramsay and Silverman, 2005) by minimizing the penalized residual sum of squares $SSE(\beta)=\int r(t)^{T}r(t)dt+\sum_{j=1}^{p}\lambda_{j}\int(L_{j}\beta_{j}(t))^{2}dt$, where $r(t)=Y(t)-Z(t)\beta(t)$ and $L_{j}$ is a linear differential operator. For example, when $L_{j}=I$, we minimize $\int r(t)^{T}r(t)dt+\sum_{j=1}^{p}\lambda_{j}\int(\beta_{j}(t))^{2}dt$. Now suppose $\{\theta_{kj}(t),k=1,2,\ldots,k_{j}\}$ is a set of known basis functions for $j=1,2,\ldots,p$. We model the unknown coefficient functions using the basis expansion $\beta_{j}(t)=\sum_{k=1}^{k_{j}}b_{kj}\theta_{kj}(t)=\bm{\theta}_{j}(t)^{T}\mathbf{b}_{j}$, where $\bm{\theta}_{j}(t)=[\theta_{1j}(t),\theta_{2j}(t),\ldots,\theta_{k_{j}j}(t)]^{T}$ and $\mathbf{b}_{j}=(b_{1j},b_{2j},\ldots,b_{k_{j}j})^{T}$ is a vector of unknown coefficients. In this article we use B-spline basis functions, although other bases can be used as well. The minimization in the example above can then be carried out by minimizing $\int\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}^{T}\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}dt+\mathbf{b}^{T}\mathbb{R}\mathbf{b}$.
Here $\mathbf{b}$, $\bm{\Theta}(t)$ and the penalty matrix $\mathbb{R}$ are defined in stacked form as $\mathbf{b}=(\mathbf{b}_{1}^{T},\mathbf{b}_{2}^{T},\ldots,\mathbf{b}_{p}^{T})^{T}$, $\bm{\Theta}(t)=\mathrm{diag}\{\bm{\theta}_{1}(t)^{T},\bm{\theta}_{2}(t)^{T},\ldots,\bm{\theta}_{p}(t)^{T}\}$ (block diagonal, so that $Z(t)\bm{\Theta}(t)\mathbf{b}=Z(t)\beta(t)$) and $\mathbb{R}=\mathrm{diag}(\mathbb{R}_{1},\mathbb{R}_{2},\ldots,\mathbb{R}_{p})$, where $\mathbb{R}_{j}=\lambda_{j}\int\bm{\theta}_{j}(t)\bm{\theta}_{j}(t)^{T}dt$, so that $\mathbf{b}_{j}^{T}\mathbb{R}_{j}\mathbf{b}_{j}=\lambda_{j}\int\beta_{j}(t)^{2}dt$. For our variable selection method, we define the penalty on the regression functions $\beta_{j}(\cdot)$ as $P_{\lambda,\psi}\{\beta_{j}(\cdot)\}=\lambda[\int\beta_{j}(t)^{2}dt+\psi\int\{\beta_{j}''(t)\}^{2}dt]^{1/2}=\lambda(\mathbf{b}_{j}^{T}\mathbb{R}_{j}\mathbf{b}_{j}+\psi\mathbf{b}_{j}^{T}\mathbb{Q}_{j}\mathbf{b}_{j})^{1/2}=\lambda(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j})^{1/2}$, where $\mathbb{K}_{\psi,j}=\mathbb{R}_{j}+\psi\mathbb{Q}_{j}$, with $\mathbb{R}_{j}$ now denoting $\int\bm{\theta}_{j}(t)\bm{\theta}_{j}(t)^{T}dt$ (without the factor $\lambda_{j}$) and $\mathbb{Q}_{j}=\int\bm{\theta}_{j}''(t)\bm{\theta}_{j}''(t)^{T}dt$. This penalty was originally proposed by Meier et al. (2009) and later used by Gertheiss et al. (2013) for variable selection in scalar on function regression. The parameter $\psi\geq 0$ controls the weight of the roughness penalty. The penalty simultaneously penalizes departure from sparsity and roughness of the coefficient functions, ensuring that the estimated coefficient functions are smooth and that small coefficient functions are shrunk to zero, which introduces sparsity. Subsequently, we propose to minimize the following penalized mean sum of squares of the residuals for performing variable selection,

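$$L(\mathbf{b})=\frac{1}{n}\int\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}^{T}\{Y(t)-Z(t)\bm{\Theta}(t)\mathbf{b}\}\,dt+\lambda\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2}.$$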
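To make the penalty concrete, here is a minimal Python sketch (not the authors' code, which is in R) that builds $\mathbb{K}_{\psi,j}=\mathbb{R}_{j}+\psi\mathbb{Q}_{j}$ for one clamped cubic B-spline basis and evaluates $\lambda(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j})^{1/2}$. The function names, the equally spaced knots and the trapezoidal quadrature are illustrative assumptions. It relies on SciPy's `BSpline`, evaluated with an identity coefficient matrix so that one call returns every basis function (or, with `nu=2`, its second derivative) on the quadrature grid.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(T, n_interior=10, degree=3):
    """Clamped B-spline basis on [0, T] with equally spaced knots (an assumption).
    The identity coefficient matrix makes basis(t) return all basis functions at once."""
    inner = np.linspace(0.0, T, n_interior)
    knots = np.r_[[0.0] * degree, inner, [T] * degree]
    n_basis = len(knots) - degree - 1
    return BSpline(knots, np.eye(n_basis), degree)


def penalty_matrix(basis, T, psi, n_quad=2001):
    """K_psi = R + psi * Q with R = int theta theta^T dt and Q = int theta'' theta''^T dt,
    both approximated by the trapezoidal rule on n_quad points."""
    t = np.linspace(0.0, T, n_quad)
    w = np.full(n_quad, t[1] - t[0])
    w[[0, -1]] /= 2.0                 # trapezoid end-point weights
    B = basis(t)                      # (n_quad, n_basis): theta_k(t)
    B2 = basis(t, nu=2)               # (n_quad, n_basis): theta_k''(t)
    R = B.T @ (w[:, None] * B)
    Q = B2.T @ (w[:, None] * B2)
    return R + psi * Q


def group_penalty(b, K, lam):
    """lam * (b^T K b)^{1/2}: the sparsity-smoothness penalty on one beta_j."""
    return lam * np.sqrt(b @ K @ b)
```

Because clamped B-splines sum to one, $\mathbf{b}_{j}=\mathbf{1}$ gives $\beta_{j}(t)\equiv 1$ with $\beta_{j}''\equiv 0$, so the penalty reduces to $\lambda\sqrt{T}$ for any $\psi$; on the domain $[0,100]$ with $\lambda=2$ it is 20, which makes a convenient sanity check.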

Since we assume the data are observed on a dense, equispaced grid, in practice variable selection is carried out by minimizing the following discretized counterpart of this criterion, in which the integral is replaced by a sum over the grid points,

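$$\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}Z_{ij}(t_{l})\Big\{\sum_{k=1}^{k_{j}}b_{kj}\theta_{kj}(t_{l})\Big\}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2}.$$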
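The discretized criterion only needs, for each subject, the $m\times\sum_j k_j$ matrix whose row $l$ stacks $Z_{ij}(t_l)\,\bm{\theta}_{j}(t_l)^{T}$ over $j$. The Python sketch below assembles that matrix and evaluates the criterion; the array layout (`Z` of shape `(n, m, p)`, one basis-evaluation matrix per covariate) and the function names are assumptions made for illustration.

```python
import numpy as np


def star_design(Z_i, Theta):
    """Design rows for one subject.

    Z_i   : (m, p) covariate values Z_ij(t_l) on the grid t_1, ..., t_m
    Theta : list of p arrays; Theta[j] is (m, k_j) with rows theta_j(t_l)^T
    Returns an (m, sum_j k_j) matrix whose j-th column block is Z_ij(t_l) * theta_j(t_l)^T.
    """
    return np.hstack([Z_i[:, [j]] * Theta[j] for j in range(len(Theta))])


def discretized_criterion(Y, Z, Theta, b_blocks, K_blocks, lam):
    """sum_i sum_l [Y_i(t_l) - sum_j Z_ij(t_l) beta_j(t_l)]^2
    + lam * m * n * sum_j (b_j^T K_j b_j)^{1/2}."""
    n, m = Y.shape
    b = np.concatenate(b_blocks)
    rss = sum(np.sum((Y[i] - star_design(Z[i], Theta) @ b) ** 2) for i in range(n))
    pen = sum(np.sqrt(bj @ Kj @ bj) for bj, Kj in zip(b_blocks, K_blocks))
    return rss + lam * m * n * pen
```

Multiplying `star_design(Z[i], Theta)` by the stacked coefficient vector reproduces $\sum_j Z_{ij}(t_l)\beta_j(t_l)$ at every grid point, which is exactly the fitted value inside the squared residual.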

Now, using the Cholesky decomposition $\mathbb{K}_{\psi,j}=\mathbb{L}_{\psi,j}\mathbb{L}_{\psi,j}^{T}$ and denoting $\bm{\gamma}_{j}=\mathbb{L}_{\psi,j}^{T}\mathbf{b}_{j}$ (so that $\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}=\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}$), the penalized residual sum of squares can be reformulated as,

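$$\begin{aligned}
R(\bm{\gamma})&=\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}Z_{ij}^{*}(t_{l})^{T}\mathbf{b}_{j}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\mathbf{b}_{j}^{T}\mathbb{K}_{\psi,j}\mathbf{b}_{j}\right)^{1/2},\quad\text{where } Z_{ij}^{*}(t_{l})^{T}=Z_{ij}(t_{l})\,\bm{\theta}_{j}(t_{l})^{T},\\
&=\sum_{i=1}^{n}\sum_{l=1}^{m}\Big[Y_{i}(t_{l})-\sum_{j=1}^{p}\tilde{Z}_{ij}^{*}(t_{l})^{T}\bm{\gamma}_{j}\Big]^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2},\quad\text{where } \tilde{Z}_{ij}^{*}(t_{l})=\mathbb{L}_{\psi,j}^{-1}Z_{ij}^{*}(t_{l}),\\
&=\sum_{i=1}^{n}\|\mathbf{Y}_{i}-\mathbb{Z}_{i}^{*}\bm{\gamma}\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2}=\sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\left(\bm{\gamma}_{j}^{T}\bm{\gamma}_{j}\right)^{1/2},
\end{aligned}$$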

where $\mathbf{Y}_{i}=(Y_{i}(t_{1}),Y_{i}(t_{2}),\ldots,Y_{i}(t_{m}))^{T}$, $\bm{\gamma}=(\bm{\gamma}_{1}^{T},\bm{\gamma}_{2}^{T},\ldots,\bm{\gamma}_{p}^{T})^{T}$, and $\mathbb{Z}_{i}^{*}$ is defined as follows,

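$$\mathbb{Z}_{i}^{*}=\begin{pmatrix}\tilde{Z}_{i1}^{*}(t_{1})^{T}&\tilde{Z}_{i2}^{*}(t_{1})^{T}&\cdots&\tilde{Z}_{ip}^{*}(t_{1})^{T}\\\tilde{Z}_{i1}^{*}(t_{2})^{T}&\tilde{Z}_{i2}^{*}(t_{2})^{T}&\cdots&\tilde{Z}_{ip}^{*}(t_{2})^{T}\\\vdots&\vdots&\ddots&\vdots\\\tilde{Z}_{i1}^{*}(t_{m})^{T}&\tilde{Z}_{i2}^{*}(t_{m})^{T}&\cdots&\tilde{Z}_{ip}^{*}(t_{m})^{T}\end{pmatrix},$$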

where $\mathbb{Z}_{i}^{*j}$ denotes the $j$th block column of this matrix. We recognize this minimization problem as a group LASSO (Yuan and Lin, 2006), in which the groups are defined by the covariates: the basis coefficients $\bm{\gamma}_{j}$ of covariate $j$ form one group. In particular, we obtain estimates of $\bm{\gamma}_{j}$ by minimizing the group LASSO penalized least squares criterion

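$$\begin{aligned}
\hat{\bm{\gamma}}&=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+\lambda mn\sum_{j=1}^{p}\|\bm{\gamma}_{j}\|_{2}\\
&=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{LASSO,\lambda}(\|\bm{\gamma}_{j}\|_{2}).
\end{aligned}\tag{4}$$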
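After the reparameterization, criterion (4) is an ordinary group LASSO. The Python sketch below shows both pieces: the Cholesky change of variables $\bm{\gamma}_{j}=\mathbb{L}_{\psi,j}^{T}\mathbf{b}_{j}$, which turns each design block $X_j$ into $X_j\mathbb{L}_{\psi,j}^{-T}$, and a plain proximal-gradient (ISTA) solver with group-wise soft-thresholding, where `pen` plays the role of $\lambda mn$. This is a textbook solver chosen for transparency, not the algorithm implemented in ‘grpreg’, and all function names are hypothetical.

```python
import numpy as np


def cholesky_reparam(X_blocks, K_blocks):
    """Map each design block X_j to X_j L_j^{-T}, where K_j = L_j L_j^T.
    Then X_j b_j = (X_j L_j^{-T}) gamma_j with gamma_j = L_j^T b_j,
    and b_j^T K_j b_j = ||gamma_j||^2."""
    Ls = [np.linalg.cholesky(K) for K in K_blocks]
    X_tilde = [np.linalg.solve(L, X.T).T for X, L in zip(X_blocks, Ls)]
    return X_tilde, Ls


def back_transform(gamma_blocks, Ls):
    """Recover the basis coefficients b_j = L_j^{-T} gamma_j."""
    return [np.linalg.solve(L.T, g) for g, L in zip(gamma_blocks, Ls)]


def block_soft_threshold(v, thresh):
    """Proximal map of thresh * ||v||_2: shrink the whole group toward zero."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v


def group_lasso(X, y, groups, pen, n_iter=5000):
    """Minimize ||y - X g||^2 + pen * sum_j ||g_j||_2 by proximal gradient (ISTA).
    groups: list of index arrays partitioning the columns of X."""
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    g = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = g + 2.0 * step * (X.T @ (y - X @ g))     # gradient step on the squared loss
        for idx in groups:
            g[idx] = block_soft_threshold(z[idx], step * pen)
    return g
```

With an orthonormal design the problem separates by group: for `X = np.eye(4)`, `y = [3, 4, 0.1, 0.1]`, groups `[0, 1]` and `[2, 3]` and `pen = 2`, the first group is shrunk to `[2.4, 3.2]` and the second, whose norm is below the threshold, is set exactly to zero, which is how a covariate drops out of the model.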

We extend this group LASSO formulation to non-convex penalties, which are known (Breheny and Huang, 2015; Mazumder et al., 2011) to produce sparser solutions, especially when there is a large number of variables. In particular, we propose to use two non-convex penalties: SCAD (Fan and Li, 2001) and MCP (Zhang, 2010). These two penalties overcome the high bias of LASSO for large coefficients by relaxing the rate of penalization as the magnitude of a coefficient grows. SCAD and MCP have been shown to ensure selection consistency and estimation consistency under standard assumptions in the scalar regression case. They also enjoy the so-called oracle property, i.e., asymptotically they behave like the oracle MLE fitted with the true set of relevant variables known in advance. Unlike the adaptive LASSO, these methods do not require initial estimates to construct weights. These facts motivate their use in our functional variable selection context. The problem of variable selection then reduces to a group SCAD or group MCP problem in our modeling setup, as follows.

Group SCAD Method

In this method, we perform variable selection and obtain estimates of $\bm{\gamma}$ using a penalized least squares criterion as in (4), but with a group SCAD penalty on the coefficients instead of the group LASSO penalty. In particular, we estimate,

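$$\hat{\bm{\gamma}}=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{SCAD,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2}),\tag{5}$$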

where $P_{SCAD,\lambda,\phi}(||\bm{\gamma}_{j}||_{2})$ is defined in the following way:

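$$P_{SCAD,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2})=\begin{cases}\lambda\|\bm{\gamma}_{j}\|_{2} & \text{if }\|\bm{\gamma}_{j}\|_{2}\leq\lambda,\\[4pt]\dfrac{\lambda\phi\|\bm{\gamma}_{j}\|_{2}-0.5\left(\|\bm{\gamma}_{j}\|_{2}^{2}+\lambda^{2}\right)}{\phi-1} & \text{if }\lambda<\|\bm{\gamma}_{j}\|_{2}\leq\lambda\phi,\\[4pt]0.5\lambda^{2}(\phi+1) & \text{if }\|\bm{\gamma}_{j}\|_{2}>\lambda\phi.\end{cases}$$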
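The SCAD penalty and its derivative can be coded directly from its three-case definition. The derivative makes the bias argument visible: it equals $\lambda$ (the LASSO rate) for small $\|\bm{\gamma}_j\|_2$, decreases linearly on $(\lambda,\lambda\phi]$, and is zero beyond $\lambda\phi$, so large groups are not shrunk further. A Python sketch, with the default $\phi=4$ taken from Remark 1 and function names of our choosing:

```python
import numpy as np


def scad_penalty(r, lam, phi=4.0):
    """P_{SCAD,lam,phi}(r) for r = ||gamma_j||_2 (vectorized over r)."""
    r = np.abs(np.asarray(r, dtype=float))
    linear = lam * r                                           # r <= lam
    quad = (lam * phi * r - 0.5 * (r**2 + lam**2)) / (phi - 1.0)  # lam < r <= lam * phi
    const = 0.5 * lam**2 * (phi + 1.0)                         # r > lam * phi
    return np.where(r <= lam, linear, np.where(r <= lam * phi, quad, const))


def scad_derivative(r, lam, phi=4.0):
    """Rate of penalization: lam, then (lam*phi - r)/(phi - 1), then 0."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r <= lam, lam, np.maximum(lam * phi - r, 0.0) / (phi - 1.0))
```

For example, with $\lambda=1$ and $\phi=4$ the penalty is 0.5 at $\|\bm{\gamma}_j\|_2=0.5$, $5.5/3$ at 2, and stays at its ceiling $0.5\lambda^2(\phi+1)=2.5$ for every norm above 4.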

Group MCP Method

For the group MCP method, we estimate $\bm{\gamma}$ as

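$$\hat{\bm{\gamma}}=\underset{\bm{\gamma}_{j},\,j=1,\ldots,p}{\arg\min}\ \sum_{i=1}^{n}\Big\|\mathbf{Y}_{i}-\sum_{j=1}^{p}\mathbb{Z}_{i}^{*j}\bm{\gamma}_{j}\Big\|_{2}^{2}+mn\sum_{j=1}^{p}P_{MCP,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2}),\tag{6}$$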

where $P_{MCP,\lambda,\phi}(||\bm{\gamma}_{j}||_{2})$ is defined as:

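$$P_{MCP,\lambda,\phi}(\|\bm{\gamma}_{j}\|_{2})=\begin{cases}\lambda\|\bm{\gamma}_{j}\|_{2}-\dfrac{\|\bm{\gamma}_{j}\|_{2}^{2}}{2\phi} & \text{if }\|\bm{\gamma}_{j}\|_{2}\leq\lambda\phi,\\[4pt]0.5\lambda^{2}\phi & \text{if }\|\bm{\gamma}_{j}\|_{2}>\lambda\phi.\end{cases}$$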
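MCP can be sketched in the same way. Its rate of penalization, $\lambda-\|\bm{\gamma}_j\|_2/\phi$, starts at the LASSO rate and falls linearly to zero at $\lambda\phi$, after which the penalty is flat; the default $\phi=3$ again follows Remark 1, and the function names are illustrative.

```python
import numpy as np


def mcp_penalty(r, lam, phi=3.0):
    """P_{MCP,lam,phi}(r) for r = ||gamma_j||_2 (vectorized over r)."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r <= lam * phi, lam * r - r**2 / (2.0 * phi), 0.5 * lam**2 * phi)


def mcp_derivative(r, lam, phi=3.0):
    """Rate of penalization lam - r/phi, truncated at zero for r > lam*phi."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.maximum(lam - r / phi, 0.0)
```

Compared with SCAD, MCP starts relaxing the penalization immediately rather than after $\|\bm{\gamma}_j\|_2=\lambda$, which is one intuition for the lower bias the simulations attribute to the non-convex penalties.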

2.2 Incorporating Covariance Structure into Variable Selection

The variable selection method proposed in Section 2.1 does not account for possible correlation in the error process. In practice, however, temporal correlation is likely to be present within functions. While an independent working correlation structure can yield consistent and unbiased estimates, incorporating the true covariance structure into the variable selection criterion (4), (5), or (6) may yield clear gains in performance, as illustrated by Chen et al. (2016). We follow the pre-whitening procedure employed by Chen et al. (2016) and Kim et al. (2018) to account for the covariance structure. We assume the error process has the form $\epsilon(t)=V(t)+w_{t}$, where $V(t)$ is a smooth mean zero stochastic process with covariance kernel $G(s,t)$ and $w_{t}$ is white noise with variance $\sigma^{2}$. The covariance function of the error process is then $\Sigma(s,t)=\mathrm{cov}\{\epsilon(s),\epsilon(t)\}=G(s,t)+\sigma^{2}I(s=t)$. For data observed on a dense and regular grid, the covariance matrix of the stacked residual vector is $\bm{\Sigma}=\mathrm{diag}\{\bm{\Sigma}_{m\times m},\bm{\Sigma}_{m\times m},\ldots,\bm{\Sigma}_{m\times m}\}$, where $\bm{\Sigma}_{m\times m}$ denotes the covariance kernel $\Sigma(s,t)$ evaluated at $S=\{t_{1},t_{2},\ldots,t_{m}\}$. If $\bm{\Sigma}_{m\times m}$ is known, then after replacing $\mathbf{Y}_{i}$ by $\bm{\Sigma}_{m\times m}^{-1/2}\mathbf{Y}_{i}$ and $\mathbb{Z}_{i}^{*j}$ by $\bm{\Sigma}_{m\times m}^{-1/2}\mathbb{Z}_{i}^{*j}$, the same penalized criterion (4), (5), or (6) can be used to perform variable selection.
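When $\bm{\Sigma}_{m\times m}$ is available, pre-whitening is just a left multiplication of every subject's response vector and design matrix. A minimal Python sketch, computing the symmetric inverse square root from an eigendecomposition (one valid choice among several) and clipping tiny eigenvalues as a numerical safeguard; the helper names are hypothetical:

```python
import numpy as np


def inv_sqrt(S, eps=1e-10):
    """Symmetric inverse square root S^{-1/2} of a covariance matrix."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.clip(vals, eps, None)          # guard against tiny negative eigenvalues
    return (vecs / np.sqrt(vals)) @ vecs.T   # V diag(vals^{-1/2}) V^T


def prewhiten(Y_list, Zstar_list, Sigma_mm):
    """Replace Y_i by Sigma^{-1/2} Y_i and Z_i^* by Sigma^{-1/2} Z_i^* for every subject."""
    W = inv_sqrt(Sigma_mm)
    return [W @ y for y in Y_list], [W @ Z for Z in Zstar_list]
```

As an illustration, the error process of the simulation design in Section 3.1 implies $\Sigma(s,t)=0.25\cos s\cos t+0.5625\sin s\sin t+I(s=t)$ on the 81-point grid; multiplying that matrix on both sides by its inverse square root returns the identity, which is the defining property the criterion relies on.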

In practice $\bm{\Sigma}$ is unknown, and we need an estimator $\hat{\bm{\Sigma}}$. In the context of functional data, we want to estimate $\Sigma(\cdot,\cdot)$ nonparametrically. If the true errors $\epsilon_{ij}$ were available, we could use functional principal component analysis (FPCA), e.g., Yao et al. (2005) or Zhang and Chen (2007), to estimate $\Sigma(s,t)$. If the covariance kernel $G(s,t)$ of the smooth part $V(t)$ is a Mercer kernel (Mercer, 1909), then by Mercer’s theorem $G(s,t)$ has the spectral decomposition

$$G(s,t)=\sum_{k=1}^{\infty}\lambda_{k}\phi_{k}(s)\phi_{k}(t),$$

where $\lambda_{1}\geq\lambda_{2}\geq\cdots\geq 0$ are the ordered eigenvalues and the $\phi_{k}(\cdot)$ are the corresponding eigenfunctions. Thus $\Sigma(s,t)=\sum_{k=1}^{\infty}\lambda_{k}\phi_{k}(s)\phi_{k}(t)+\sigma^{2}I(s=t)$. Given $\epsilon_{ij}=V(t_{ij})+w_{ij}$, one can employ FPCA-based methods to obtain $\hat{\phi}_{k}(\cdot)$, $\hat{\lambda}_{k}$ and $\hat{\sigma}^{2}$. An estimator of $\Sigma(s,t)$ can then be formed as

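$$\hat{\Sigma}(s,t)=\sum_{k=1}^{K}\hat{\lambda}_{k}\hat{\phi}_{k}(s)\hat{\phi}_{k}(t)+\hat{\sigma}^{2}I(s=t),$$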

where $K$ is chosen large enough for the truncated expansion to approximate $G(s,t)$ well, typically such that the percent of variance explained (PVE) by the selected eigencomponents exceeds a pre-specified value such as $99\%$ or $95\%$. In practice the errors $\epsilon_{ij}$ are not observed, so we fit the full model (1) to obtain residuals $e_{ij}=Y_{i}(t_{j})-\hat{Y}_{i}(t_{j})$. Treating the $e_{ij}$ as the errors, we then obtain $\hat{\Sigma}(s,t)$ using FPCA.
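For residual curves on a common dense grid, the estimator can be sketched as below. This is a simplified stand-in, not the smoothing-based FPCA of Yao et al. (2005): it eigendecomposes the raw sample covariance of the residuals, keeps $K$ components by PVE, and estimates $\sigma^{2}$ by the average of the discarded eigenvalues (a probabilistic-PCA-style choice that is our assumption), subtracting it from the retained ones. Eigenvalues are those of the $m\times m$ matrix rather than rescaled to the functional scale, which is all the pre-whitening step needs.

```python
import numpy as np


def estimate_error_cov(E, pve=0.95):
    """E: (n, m) residual curves e_ij on a common grid.
    Returns (Sigma_hat, K) with Sigma_hat = sum_k lam_k phi_k phi_k^T + sigma2 * I."""
    C = np.cov(E, rowvar=False)                      # raw covariance, approx G + sigma^2 I
    vals, vecs = np.linalg.eigh(C)
    vals, vecs = np.clip(vals[::-1], 0.0, None), vecs[:, ::-1]  # descending order
    K = int(np.searchsorted(np.cumsum(vals) / vals.sum(), pve)) + 1
    K = min(K, len(vals) - 1)                        # keep at least one eigenvalue for the noise
    sigma2 = vals[K:].mean()                         # assumed noise-variance estimate
    lam = np.clip(vals[:K] - sigma2, 0.0, None)      # eigenvalues attributed to the smooth part G
    phi = vecs[:, :K]
    Sigma_hat = (phi * lam) @ phi.T + sigma2 * np.eye(C.shape[0])
    return Sigma_hat, K
```

Because $\hat{\sigma}^{2}>0$ whenever the residuals carry any white-noise component, the returned matrix is positive definite and can be passed straight to an inverse-square-root pre-whitening step.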

Remark 1: We use a cubic B-spline basis with the same number of basis functions to model each regression function $\beta_{j}(t)$, with the number of basis functions chosen large enough that the basis is sufficiently rich. To select the smoothing parameter $\psi$ and the penalty parameter $\lambda$, we use the extended Bayesian information criterion (EBIC) (Chen and Chen, 2008) corresponding to the equivalent linear model of criterion (4), (5), or (6); this choice performed well in our simulation study. Chen and Chen (2008) established the consistency of EBIC under standard assumptions and illustrated its superiority over methods such as cross-validation, AIC, and BIC, which tend to over-select variables. For the tuning parameter $\phi$ we use the values $4$ for SCAD and $3$ for MCP, as proposed by the original authors. For model fitting we use the R package ‘grpreg’ (Breheny, 2019).
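Tuning by EBIC amounts to refitting over a grid of $(\lambda,\psi)$ values and keeping the minimizer. The Python sketch below uses the Gaussian form of the criterion of Chen and Chen (2008), $N\log(\mathrm{RSS}/N)+\mathrm{df}\log N+2\gamma_{\mathrm{EBIC}}\log\binom{P}{\mathrm{df}}$, up to an additive constant. How $N$, df and $P$ are counted in the group problem (here the total number of observations, the number of nonzero coefficients and the total number of coefficients), the default $\gamma_{\mathrm{EBIC}}=0.5$, and the `fits` dictionary layout are our assumptions, not details given in the article.

```python
import math


def ebic(rss, n_obs, df, n_params, gamma=0.5):
    """Gaussian EBIC (Chen and Chen, 2008), up to an additive constant."""
    log_binom = (math.lgamma(n_params + 1) - math.lgamma(df + 1)
                 - math.lgamma(n_params - df + 1))        # log C(P, df)
    return n_obs * math.log(rss / n_obs) + df * math.log(n_obs) + 2.0 * gamma * log_binom


def select_tuning(fits, n_obs, n_params, gamma=0.5):
    """fits: {(lam, psi): (rss, df)} from refitting on a grid (hypothetical layout).
    Returns the (lam, psi) pair with the smallest EBIC."""
    return min(fits, key=lambda key: ebic(fits[key][0], n_obs, fits[key][1], n_params, gamma))
```

The extra $2\gamma_{\mathrm{EBIC}}\log\binom{P}{\mathrm{df}}$ term is what distinguishes EBIC from BIC: it charges more for each added coefficient when many candidates are available, counteracting the over-selection mentioned above.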

Remark 2: In practice, we recommend standardizing the variables, either using the Euclidean norm (performed automatically in ‘grpreg’) or using FPCA-based methods ($X^{*}_{j}(t)=\frac{X_{j}(t)-\mu_{j}(t)}{\sigma_{j}(t)}$); the latter is especially useful for highly sparse data, where some B-splines might not have observed data on their support. Standardization can also speed up convergence of the proposed method. We tried both standardization methods in our simulation studies and obtained very similar results.
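A simple version of the pointwise standardization $X^{*}_{j}(t)=\{X_{j}(t)-\mu_{j}(t)\}/\sigma_{j}(t)$ is sketched below for curves stored on a common grid, with unobserved points as NaN. It uses raw cross-sectional means and standard deviations at each grid point, whereas the FPCA-based version in Remark 2 would use smoothed estimates of $\mu_{j}(t)$ and $\sigma_{j}(t)$; it also assumes at least two observed values at every grid point.

```python
import numpy as np


def standardize_pointwise(X):
    """X: (n, m) curves of one covariate on a common grid, NaN where unobserved.
    Returns (X - mu(t)) / sd(t) using pointwise cross-sectional moments."""
    mu = np.nanmean(X, axis=0)                 # mu_j(t) at each grid point
    sd = np.nanstd(X, axis=0, ddof=1)          # sigma_j(t) at each grid point
    return (X - mu) / sd                       # NaNs stay NaN
```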

2.3 Extension to Sparse Data and Noisy Covariates

More generally, we can consider the case where data are observed sparsely and the covariates are observed with measurement error, as is often the case for longitudinal data. Here the observed data are the response $\{(Y_{i}(t_{ij}),t_{ij}),\ j=1,2,\ldots,m_{i}\}$ and the observed covariates $\{(U_{i1}(t_{1ij}),t_{1ij}),\ j=1,2,\ldots,m_{1i}\}$, $\{(U_{i2}(t_{2ij}),t_{2ij}),\ j=1,2,\ldots,m_{2i}\}$, $\ldots$, $\{(U_{ip}(t_{pij}),t_{pij}),\ j=1,2,\ldots,m_{pi}\}$. Let us denote $U_{ik}(t_{kij})$ ($k=1,2,\ldots,p$) by $U_{ijk}$. The $U_{ijk}$ represent the covariates observed with measurement error, i.e., $U_{ijk}=Z_{ik}(t_{kij})+e_{ijk}$ for $i=1,2,\ldots,n$, $j=1,2,\ldots,m_{ki}$ and $k=1,2,\ldots,p$. The measurement errors $e_{ijk}$ are assumed to be white noise with zero mean and variance $\sigma_{k}^{2}$. In the sparse data setup it is generally assumed (Kim et al., 2018) that, although the number of observations $m_{i}$ per subject is small, the pooled set of time points $\bigcup_{i=1}^{n}\bigcup_{j=1}^{m_{i}}\{t_{ij}\}$ is dense in $[0,T]$. We then reconstruct the underlying curves from the observed sparse and noisy curves using FPCA (Yao et al., 2005), by estimating the eigenvalues and eigenfunctions of the true curves. Li and Hsing (2010) proved uniform convergence of the estimated mean, eigenvalues and eigenfunctions for both dense and, in particular, sparse designs under suitable regularity conditions. To predict the scores, we use the PACE method of Yao et al. (2005). These estimates are then combined through the Karhunen-Loève expansion (Karhunen, Loève 1946) to obtain estimates $\hat{Z}_{ik}(\cdot)$ of the true curves $Z_{ik}(\cdot)$ as $\hat{Z}_{ik}(t)=\hat{\mu}_{k}(t)+\sum_{s=1}^{S}\hat{\zeta}_{isk}\hat{\psi}_{sk}(t)$, where the number of eigenfunctions $S$ is chosen by the percent of variance explained (PVE) criterion, i.e., the percentage of variance explained by the first few eigencomponents.
Alternatively, one can use multivariate FPCA (Happ and Greven, 2018) instead of running FPCA on each predictor separately. For sparse data observed on an irregular grid with measurement error, we then use $\{Y_{i}(t_{ij}),\hat{Z}_{i1}(t_{ij}),\hat{Z}_{i2}(t_{ij}),\ldots,\hat{Z}_{ip}(t_{ij}),\ j=1,2,\ldots,m_{i}\}_{i=1}^{n}$ as the input data for variable selection.
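Given estimates of the mean, eigenvalues, eigenfunctions and measurement-error variance (obtained, for example, by the smoothing steps of Yao et al. (2005), which are not reproduced here), the PACE scores and the Karhunen-Loève reconstruction for one subject can be sketched as follows. The scores use the conditional-expectation predictor $\hat{\bm{\zeta}}_{i}=\bm{\Lambda}\bm{\Psi}_{i}^{T}(\bm{\Psi}_{i}\bm{\Lambda}\bm{\Psi}_{i}^{T}+\hat{\sigma}^{2}I)^{-1}(\mathbf{U}_{i}-\bm{\mu}_{i})$, with the covariance of the observations built from the truncated expansion; storing everything on a common grid and indexing observation times by grid position is an assumption of this sketch.

```python
import numpy as np


def pace_reconstruct(u_obs, obs_idx, mu, psi, lam, sigma2):
    """PACE-style scores and Karhunen-Loeve reconstruction for one subject.

    u_obs   : (m_i,) noisy covariate values observed for this subject
    obs_idx : (m_i,) grid positions of those observations
    mu      : (M,) estimated mean function on the full grid
    psi     : (M, S) estimated eigenfunctions on the full grid
    lam     : (S,) estimated eigenvalues
    sigma2  : estimated measurement-error variance
    Returns the (M,) curve mu + psi @ zeta_hat on the full grid.
    """
    psi_i = psi[obs_idx]                                              # (m_i, S)
    cov_u = (psi_i * lam) @ psi_i.T + sigma2 * np.eye(len(obs_idx))   # Psi_i Lam Psi_i^T + sigma2 I
    zeta = lam * (psi_i.T @ np.linalg.solve(cov_u, u_obs - mu[obs_idx]))
    return mu + psi @ zeta
```

Because the predictor borrows strength from the eigenfunctions estimated on the pooled data, a subject with only a handful of noisy points still gets a full denoised trajectory on the grid; as $\hat{\sigma}^{2}\to 0$ with enough observations per subject, the reconstruction interpolates the curve exactly.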

3 Simulation Study

3.1 Simulation Setup

In this section, we evaluate the performance of our variable selection method using a simulation study. To this end, we generate data from the model,

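$$Y_{i}(t)=\beta_{0}(t)+\sum_{j=1}^{20}Z_{ij}(t)\beta_{j}(t)+\epsilon_{i}(t),\quad i=1,2,\ldots,n,\ t\in[0,100].$$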

The regression functions are $\beta_{0}(t)=8\sin(\pi t/50)$, $\beta_{1}(t)=5\sin(\pi t/100)$, $\beta_{2}(t)=4\sin(\pi t/50)+4\cos(\pi t/50)$, $\beta_{3}(t)=25e^{-t/20}$, and $\beta_{j}(t)=0$ for $j=4,5,\ldots,20$, i.e., the last 17 covariates are irrelevant. The original covariates are $Z_{ij}(\cdot)\stackrel{\text{iid}}{\sim}Z_{j}(\cdot)$, where $Z_{j}(t)=a_{j}\sqrt{2}\sin(\pi jt/400)+b_{j}\sqrt{2}\cos(\pi jt/400)$ ($j=1,2,\ldots,20$) with $a_{j}\sim\mathcal{N}(50,2^{2})$ and $b_{j}\sim\mathcal{N}(50,2^{2})$. We further assume that the $Z_{ij}(t)$ are observed with measurement error, i.e., we observe $U_{ij}(t)=Z_{ij}(t)+\delta_{j}$, where $\delta_{j}\sim\mathcal{N}(0,0.6^{2})$. The error process $\epsilon_{i}(t)$ is generated as follows:

$$\epsilon_{i}(t)=\xi_{i1}\cos(t)+\xi_{i2}\sin(t)+\mathcal{N}(0,1),$$

where $\xi_{i1}\stackrel{\text{iid}}{\sim}\mathcal{N}(0,0.5^{2})$ and $\xi_{i2}\stackrel{\text{iid}}{\sim}\mathcal{N}(0,0.75^{2})$. The response $Y_{i}(t)$ and the noisy covariates $U_{ij}(t)$ are observed sparsely, at $m_{i}$ randomly chosen points of $S$, the set of $m=81$ equidistant time points in $[0,100]$, with $m_{i}\stackrel{\text{iid}}{\sim}\mathrm{Unif}\{30,31,\ldots,41\}$. Three sample sizes $n\in\{100,200,400\}$ are considered, and for each sample size we evaluate our method on 500 simulated datasets.
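One simulated dataset can be generated along the following lines (a Python sketch, not the authors' R code). Two readings are our assumptions: the coefficients $a_j,b_j$ are redrawn for each subject so that the $Z_{ij}$ are i.i.d. copies of $Z_j$, and the measurement error is drawn independently at each observed point, in line with the white-noise model of Section 2.3.

```python
import numpy as np


def simulate_dataset(n, rng, p=20, m=81, T=100.0):
    """Sparse, noisy data following the simulation design of Section 3.1."""
    t = np.linspace(0.0, T, m)                               # grid S of m equidistant points
    beta = np.zeros((p + 1, m))                              # row 0 holds beta_0
    beta[0] = 8 * np.sin(np.pi * t / 50)
    beta[1] = 5 * np.sin(np.pi * t / 100)
    beta[2] = 4 * np.sin(np.pi * t / 50) + 4 * np.cos(np.pi * t / 50)
    beta[3] = 25 * np.exp(-t / 20)                           # beta_4, ..., beta_20 stay 0
    j = np.arange(1, p + 1)[:, None]                         # covariate index 1..p
    a = rng.normal(50, 2, size=(n, p, 1))                    # assumed: redrawn per subject
    b = rng.normal(50, 2, size=(n, p, 1))
    Z = (a * np.sqrt(2) * np.sin(np.pi * j * t / 400)
         + b * np.sqrt(2) * np.cos(np.pi * j * t / 400))      # (n, p, m) true covariates
    xi1 = rng.normal(0, 0.5, size=(n, 1))
    xi2 = rng.normal(0, 0.75, size=(n, 1))
    eps = xi1 * np.cos(t) + xi2 * np.sin(t) + rng.normal(0, 1, size=(n, m))
    Y = beta[0] + np.einsum("ipm,pm->im", Z, beta[1:]) + eps
    U = Z + rng.normal(0, 0.6, size=Z.shape)                 # assumed: pointwise measurement error
    m_i = rng.integers(30, 42, size=n)                       # Unif{30, ..., 41}
    idx = [np.sort(rng.choice(m, size=k, replace=False)) for k in m_i]
    Y_obs = [Y[i, ix] for i, ix in enumerate(idx)]
    U_obs = [U[i][:, ix] for i, ix in enumerate(idx)]        # (p, m_i) per subject
    return t, idx, Y_obs, U_obs
```

Usage: `t, idx, Y_obs, U_obs = simulate_dataset(100, np.random.default_rng(1))` returns, for each subject, the sampled grid positions, the sparse response values and a `(20, m_i)` array of noisy covariate values at the same points.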

3.2 Simulation Results

Our primary interest is selecting (identifying) the relevant covariates $Z_{1}(\cdot),Z_{2}(\cdot),Z_{3}(\cdot)$ and estimating their effects $\beta_{1}(t),\beta_{2}(t),\beta_{3}(t)$ accurately. As the covariates are observed sparsely and with measurement error, we apply FPCA as discussed in Section 2.3 with PVE $=99\%$ to obtain the denoised curves $\hat{Z}_{ij}(t)$ before applying our variable selection method. We apply the proposed method with and without the pre-whitening procedure of Section 2.2. Table 1 and Table 2 display the selection percentage of each variable for each of the three selection methods of Section 2 and the three sample sizes $n=100,200,400$, for the non-pre-whitened and pre-whitened cases respectively. We use the acronyms FLASSO (functional LASSO), FSCAD (functional SCAD) and FMCP (functional MCP) for the proposed variable selection methods for the functional linear concurrent model (FLCM). We expect the group LASSO method to have a higher false positive rate and use it as a benchmark for comparison. Tables 1 and 2 show that all three methods (group LASSO, group SCAD, group MCP) select the three true covariates $Z_{1}(\cdot),Z_{2}(\cdot),Z_{3}(\cdot)$ in $100\%$ of the replicates. The group LASSO method has a high false positive selection percentage in both tables, with selection accuracy improving as the sample size increases. The group SCAD and group MCP methods, on the other hand, have false selection percentages in the range $0.2\%$ to $1\%$ in the non-pre-whitened case and exactly $0\%$ in the pre-whitened case. In other words, with pre-whitening, the group SCAD and group MCP methods identify the true model in every replicate.
In scalar regression, SCAD and MCP are known to produce sparser solutions than LASSO because of their concave penalties, and here too, in the context of variable selection in functional linear concurrent regression, we observe their group extensions outperforming group LASSO. The average model sizes for each scenario are also given in Table 1; the group SCAD and group MCP methods produce average model sizes that are smaller than those of group LASSO and closer to the true model size of 3 (exactly 3 with the pre-whitening procedure, Table 2). These results also illustrate the benefit of pre-whitening, and henceforth we use pre-whitening as a preprocessing step when performing variable selection with the proposed methods.
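One way to see why the concave penalties give sparser and less biased fits is through their group thresholding rules in the idealized case of an orthonormal design within each group, as analyzed by Breheny and Huang (2015): the penalized estimate of a group is then a function of its unpenalized least squares estimate `z`. The sketch below assumes that orthonormal setting, ignores group-size weights, and uses γ = 3 (MCP) and γ = 3.7 (SCAD) as illustrative values; it is not the grpreg implementation.

```python
import math

def _norm(z):
    return math.sqrt(sum(v * v for v in z))

def soft(z, t):
    """Group soft-thresholding: (1 - t/||z||)_+ * z."""
    nz = _norm(z)
    if nz <= t:
        return [0.0] * len(z)
    return [(1.0 - t / nz) * v for v in z]

def group_lasso(z, lam):
    # Every surviving group is shrunk by lam in norm, however large it is.
    return soft(z, lam)

def group_mcp(z, lam, gamma=3.0):
    # Rescaled soft-thresholding up to gamma*lam; no shrinkage beyond it.
    if _norm(z) <= gamma * lam:
        return [gamma / (gamma - 1.0) * v for v in soft(z, lam)]
    return list(z)

def group_scad(z, lam, gamma=3.7):
    nz = _norm(z)
    if nz <= 2.0 * lam:                 # LASSO-like region
        return soft(z, lam)
    if nz <= gamma * lam:               # transition region
        return [(gamma - 1.0) / (gamma - 2.0) * v
                for v in soft(z, gamma * lam / (gamma - 1.0))]
    return list(z)                      # no shrinkage for large groups

small, large = [0.3, 0.4], [3.0, 4.0]   # ||small|| = 0.5, ||large|| = 5
for rule in (group_lasso, group_mcp, group_scad):
    print(rule.__name__, rule(small, 1.0), rule(large, 1.0))
```

All three rules set the small group exactly to zero, but for the large group only group LASSO still subtracts λ from its norm; MCP and SCAD return it unchanged, which is the source of LASSO's bias for large coefficients.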

Next, to assess the accuracy of the estimates $\hat{\beta}_{k}(t)$ ($k=1,2,3$), we plot the true regression curves overlaid with their Monte Carlo (MC) mean estimates from the three methods. MC pointwise $95\%$ confidence intervals (the pointwise 2.5th and 97.5th percentiles of the estimated curves over the 500 replicates) for each of the three curves are also displayed to assess the variability of the estimates. Figure 1 displays this plot for $n=200$; the plots for $n=100$ and $n=400$ are similar, with greater accuracy and less variability for larger sample sizes. The group LASSO estimates (dashed line) have a larger bias, which is again expected, as LASSO is known to have a relatively high bias when the magnitude of the regression coefficient is large. The group SCAD (dotted line) and group MCP (dash-dotted line) estimates have almost identical accuracy and variability, as seen in Figure 1; they are superimposed on each other and on the true curves, which are represented by solid lines.
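The MC mean curve and pointwise percentile band described above can be computed from the replicate estimates evaluated on a common grid. This is a minimal sketch with hypothetical helper names; the linear-interpolation percentile rule is an assumption (it matches NumPy's default `percentile` method), since the text does not state which quantile definition was used.

```python
def percentile(xs, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(xs)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pointwise_band(curves, lower=2.5, upper=97.5):
    """curves: R replicate estimates, each a list of values on a common grid.

    Returns the MC mean and the lower/upper percentile band at each grid point.
    """
    R, G = len(curves), len(curves[0])
    mean, lo, hi = [], [], []
    for g in range(G):
        vals = [c[g] for c in curves]
        mean.append(sum(vals) / R)
        lo.append(percentile(vals, lower))
        hi.append(percentile(vals, upper))
    return mean, lo, hi

# Toy replicates on a two-point grid: 101 curves [r, 2r], r = 0..100.
curves = [[float(r), 2.0 * r] for r in range(101)]
print(pointwise_band(curves))  # ([50.0, 100.0], [2.5, 5.0], [97.5, 195.0])
```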

To further evaluate the performance of the estimates, we calculate the absolute bias and the MC mean square error of the estimates, averaged across $100$ equally spaced grid points in $[0,100]$, for all the selection methods and the three sample sizes. These are displayed in Table 3. We again observe the group SCAD and group MCP methods outperforming the group LASSO method in terms of both absolute bias and mean square error, with the performance of all estimators improving with increasing sample size. We compared the mean square errors of the estimates obtained with the pre-whitening procedure with those obtained without it, and found the former to be only marginally higher, which is expected due to the uncertainty associated with estimating the covariance matrix. The mean square errors appear to converge to zero for all three methods as the sample size increases, suggesting consistency of the estimators. The simulation results illustrate the superior performance of the proposed group SCAD (FSCAD) and group MCP (FMCP) based selection methods in the context of the functional linear concurrent model; these are the methods recommended in this article.
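The grid-averaged absolute bias and MSE can be sketched as below. This assumes, as one natural reading of the text, that the pointwise bias $|\bar{\hat{\beta}}(t_g)-\beta(t_g)|$ and pointwise MC MSE are averaged with equal weights over the grid points $t_g$; the helper name is hypothetical.

```python
def mc_bias_mse(curves, truth):
    """Grid-averaged absolute bias and MSE (hypothetical helper).

    curves: R replicate estimates, each a list of values on the grid.
    truth:  true curve on the same grid.
    """
    R, G = len(curves), len(truth)
    abs_bias = mse = 0.0
    for g in range(G):
        errs = [c[g] - truth[g] for c in curves]
        abs_bias += abs(sum(errs) / R)          # |MC mean - truth| at t_g
        mse += sum(e * e for e in errs) / R     # MC mean squared error at t_g
    return abs_bias / G, mse / G

grid = [100.0 * g / 99 for g in range(100)]   # 100 equally spaced points in [0, 100]
print(len(grid), grid[0], grid[-1])           # 100 0.0 100.0
print(mc_bias_mse([[1.0, 1.0], [-1.0, 3.0]], truth=[0.0, 1.0]))  # (0.5, 1.5)
```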

4 Real Data Applications

In this section, we demonstrate the application of our variable selection method to the selection of influential time-varying predictors in two real data studies. For variable selection we use only the FSCAD and FMCP methods, along with the initial pre-whitening procedure, as the group LASSO method yields a substantially higher false positive rate, as illustrated by our simulations. We first consider a small dietary calcium absorption dataset (three time-varying covariates) with added pseudo covariates as an illustration of our method. Adding pseudo covariates is a popular way of assessing the false selection rate in real datasets (Wang et al., 2008; Wu et al., 2007; Miller, 2002), and pseudo variables can also be used effectively to tune variable selection procedures. We show that our proposed method selects the relevant predictors and successfully discards the pseudo variables. Finally, we apply our variable selection method to the fisheries dataset to identify the relevant socio-economic drivers influencing the fisheries footprint of nations over time.

4.1 Study of Dietary Calcium Absorption

We consider the study of dietary calcium absorption in Davis (2002). The subjects in this study are a group of 188 patients. We have data on calcium absorption ($Y(t)$), dietary calcium intake ($Z_{1}(t)$), body mass index (BMI, $Z_{2}(t)$) and body surface area (BSA, $Z_{3}(t)$) of these patients at irregular time points between 35 and 64 years of age. At the beginning of the study the patients were aged between 35 and 45 years, and subsequent observations were taken approximately every 5 years. The number of repeated measurements per patient varies from 1 to 4. Figure 2 displays the individual curves of the patients’ calcium absorption, calcium intake, BSA and BMI against age.

We are primarily interested in finding out which covariates influence the calcium absorption profile of the patients. Kim et al. (2018) also investigated the effect of calcium intake on calcium absorption using an additive nonlinear functional concurrent model, and found the effect to be approximately linear when compared with a functional linear concurrent model. We therefore use functional linear concurrent regression to model the dependence of calcium absorption on calcium intake, BSA and BMI. As the data are observed very sparsely and the original covariates might be observed with measurement error, we apply FPCA methods (PVE $=95\%$) as discussed in Section 2.3 to obtain the denoised trajectories $\hat{Z}_{j}(t)$ for $j=1,2,3$. Among the three covariates, we expect calcium intake to be associated with calcium absorption. To illustrate the selection performance and false positive rate of our variable selection method, we add 15 pseudo covariates simulated from the following functional model. We generate $Z_{ij}(\cdot)\stackrel{\mathrm{iid}}{\sim}Z_{j}(\cdot)$, where, for $j=4,5,\ldots,18$, $Z_{j}(t)=a_{j}\sqrt{2}\sin\{\pi(j-3)t/200\}+b_{j}\sqrt{2}\cos\{\pi(j-3)t/200\}$ with $a_{j}\sim\mathcal{N}(0,2^{2})$ and $b_{j}\sim\mathcal{N}(0,2^{2})$. In total we therefore have 18 covariates: the first 3 are the denoised original covariates and the rest are simulated predictors. We then apply our variable selection method to $Y(t)$ and $\hat{Z}_{1}(t),\hat{Z}_{2}(t),\hat{Z}_{3}(t),Z_{4}(t),Z_{5}(t),\ldots,Z_{18}(t)$. We repeat this a large number of times and record which variables are selected in each iteration. We expect our variable selection method to pick out the truly influential predictors and to ignore the randomly generated functional covariates the majority of the time.
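The pseudo-covariate model above can be simulated directly at each patient's observed ages. This is a minimal sketch: the helper name and seed are hypothetical, and drawing fresh $a_j, b_j$ for every subject is our reading of $Z_{ij}(\cdot)\stackrel{\mathrm{iid}}{\sim}Z_{j}(\cdot)$.

```python
import math
import random

def pseudo_covariates(ages_by_subject, js=range(4, 19), sd=2.0, seed=1):
    """Hypothetical helper: simulate pseudo covariates Z_ij(t), j = 4..18.

    For each subject and each j, independent a_j, b_j ~ N(0, sd^2) are drawn and
    Z_j(t) = a_j*sqrt(2)*sin(pi*(j-3)*t/200) + b_j*sqrt(2)*cos(pi*(j-3)*t/200)
    is evaluated at that subject's observed ages t.
    """
    rng = random.Random(seed)
    root2 = math.sqrt(2.0)
    out = []
    for ages in ages_by_subject:
        curves = {}
        for j in js:
            a, b = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
            w = math.pi * (j - 3) / 200.0
            curves[j] = [a * root2 * math.sin(w * t) + b * root2 * math.cos(w * t)
                         for t in ages]
        out.append(curves)
    return out

ages = [[36.0, 41.0, 46.5], [44.0, 49.0]]   # toy observation ages for two subjects
Z = pseudo_covariates(ages)
print(len(Z), sorted(Z[0])[:3], len(Z[1][18]))  # 2 [4, 5, 6] 2
```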
To illustrate the benefit of using our proposed functional variable selection method for these data, we compare its performance with a backward selection method that uses a model selection criterion such as BIC or Mallows’ Cp under a linear model approach (using an independent working correlation structure), and with a penalized generalized estimating equations (PGEE) procedure (Wang et al., 2012), which was developed to analyze longitudinal data with a large number of covariates. We use the ‘PGEE’ package in R (Inan and Wang, 2017) to implement the penalized generalized estimating equations procedure under two different working correlation structures (independent and AR(1)).

Table 4 shows the selection percentage of each variable under the different methods. Both the proposed FSCAD and FMCP methods identify calcium intake ($Z_{1}(t)$) as a significant predictor $100\%$ of the time, and all other variables, including all the pseudo covariates, are ignored in $100\%$ of the iterations. On the other hand, BIC, Cp, and PGEE exhibit a high false selection percentage for the pseudo variables. The selection of BSA or BMI by these methods also appears to be over-selection, as the individual effects of these variables were not found to be statistically significant. This illustrates that, when the underlying model is functional, naive variable selection methods based on scalar regression techniques can lead to wrong inference, as they do not account for the functional nature of the data.

As calcium intake is the only significant variable selected by both proposed methods, we want to estimate its effect and obtain a measure of uncertainty for this estimate. For this purpose, we use a subject-level bootstrap on our original data (without pseudo covariates), performing variable selection in each bootstrap sample, to obtain an estimated regression curve $\hat{\beta}_{1}(t)$ and a pointwise confidence interval for the effect of calcium intake. This is displayed in Figure 3. We notice that, as calcium intake increases, calcium absorption tends to decrease, particularly until age $60$, since $\hat{\beta}_{1}(t)<0$ up to this age and the confidence interval lies strictly below zero; this might be due to dietary calcium saturation or to interaction with some other elements in the body, although the overall magnitude of the effect seems to decrease with age. Above age $60$, the estimate has high variability, primarily because very few observations ($5.62\%$) lie above this mark (illustrated in Figure 3). The uncertainty in the estimate $\hat{\Sigma}$ obtained using FPCA and the uncertainty due to the bootstrap are both reflected in this variability. Hence, some care should be taken in interpreting the estimated regression curve beyond 60 years because of such high uncertainty.
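The subject-level bootstrap resamples whole patients, so each patient's repeated measurements stay together and the within-subject dependence is preserved. The sketch below is a minimal illustration under stated assumptions: `subject_bootstrap`, `mean_first`, the number of resamples `B` and the seed are hypothetical, the estimator is a toy stand-in (in the analysis above it would include the variable selection and curve estimation steps), and the band uses linear-interpolation percentiles.

```python
import random

def percentile(xs, q):
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(xs)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def subject_bootstrap(data, estimator, B=200, seed=2):
    """Resample whole subjects with replacement and re-estimate (hypothetical helper).

    data:      dict subject_id -> that subject's repeated measurements.
    estimator: maps a list of (possibly repeated) subject records to an
               estimated curve on a fixed grid.
    Returns the pointwise 2.5% and 97.5% percentile band.
    """
    rng = random.Random(seed)
    ids = list(data)
    curves = []
    for _ in range(B):
        sample = [data[rng.choice(ids)] for _ in ids]  # keeps each subject's visits together
        curves.append(estimator(sample))
    G = len(curves[0])
    lower = [percentile([c[g] for c in curves], 2.5) for g in range(G)]
    upper = [percentile([c[g] for c in curves], 97.5) for g in range(G)]
    return lower, upper

def mean_first(sample):
    """Toy estimator: average of each subject's first measurement (a one-point curve)."""
    return [sum(rec[0] for rec in sample) / len(sample)]

data = {i: [float(i), float(i) + 1.0] for i in range(10)}   # toy subjects
lower, upper = subject_bootstrap(data, mean_first)
print(lower, upper)
```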

4.2 Study of Fisheries Footprint

Fisheries production is a source of protein as well as of economic livelihood across the world. With the growing global population, the importance of fish production and consumption has steadily increased through the modern era. The fisheries footprint is defined as the Global Footprint Network’s measure of the total marine area required to sustain consumption levels of aquatic production of fish, crustaceans (e.g., shrimp), shellfish, and seaweed from capture and aquaculture; that is, the fisheries footprint represents the coastal and marine area required to sustain the amount of seafood products a nation consumes. As pointed out by Longo and Clark (2016), the interaction between marine and social systems calls for further sociological analysis.

Over the last two decades, social scientists have accomplished much in advancing scholarly knowledge on the social drivers of ecological impact at a macro scale. Such work is essential, as ecological problems are becoming increasingly interlinked and severe at a global or planetary scale (Steffen et al., 2011). For example, economic development, population structure (e.g., urbanization or age structure), trade relations, and technological change have been shown to affect measures of environmental impact across nations over time (Clark and Longo, 2019; Jorgenson and Clark, 2010; York et al., 2003). This body of literature centers on the ecological effects of globalization and modernization under the socio-structural parameters of a capitalist economy. There is still much debate over the impacts of industrial and agricultural modernization on ecosystems. For example, the development and resource economics literature (World Bank, 2007) advocates the use of innovation and technological improvements to improve marine system sustainability and food security (Valderrama and Anderson, 2010), while environmental sociologists, on the other hand, demonstrate that such innovation, chiefly aquaculture, does not displace the deleterious ecological impacts of capture fisheries (Longo et al., 2019). Despite such progress, however, the fisheries footprint remains an understudied metric, and its drivers are less well understood in social research (Clark et al., 2018; Jorgenson et al., 2005).

The goal of this study, therefore, is to identify the relevant socio-economic drivers, such as levels of economic development, population size, and transformations in food-system dynamics, that influence the fisheries footprint of nations over time, and to capture their time-varying effects. Data for this study were collected from the World Bank, FishStatJ of the UN FAO, and the Ecological Footprint Network for the years 1970-2009, across 136 nations. The main dependent variable of interest is the fisheries footprint. Figure 4 displays the fisheries footprint of the nations over the study years on the log scale; the fisheries footprints of three representative nations are plotted using solid, dashed and dotted lines.

To capture the trend over the years, we plot the mean fisheries footprint of the nations along with its pointwise $95\%$ confidence interval. This is displayed in Figure 5. We notice an overall upward trend as well as heterogeneity across years.
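A per-year mean with a pointwise 95% interval can be sketched as follows. The text does not say how this interval was constructed, so the normal approximation (mean ± 1.96 standard errors across nations) and the skipping of missing country-years are assumptions; the helper name is hypothetical.

```python
import math

def pointwise_mean_ci(panel, z=1.96):
    """Hypothetical helper: per-year mean and normal-approximation 95% CI.

    panel: dict year -> list of (log) fisheries footprint values across nations,
           with None marking a missing value. Needs >= 2 observed values per year.
    """
    out = {}
    for year in sorted(panel):
        x = [v for v in panel[year] if v is not None]
        n = len(x)
        m = sum(x) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
        half = z * sd / math.sqrt(n)
        out[year] = (m, m - half, m + half)
    return out

toy = {1970: [1.0, 2.0, 3.0, None], 1971: [2.0, 2.0]}
print(pointwise_mean_ci(toy)[1971])  # (2.0, 2.0, 2.0)
```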

There are 20 time-varying independent variables in the study, broadly covering population dynamics (e.g., population density, urban population, total population, working-age population percentage), agriculture (e.g., tractors, agriculture value added), food consumption and other fisheries variables (e.g., meat consumption, aquaculture production tons), international relations (e.g., food exports as a percentage of merchandise exports, FDI inflow), and the economy (e.g., GDP per capita in constant U.S. dollars, trade as a percentage of GDP). The full list of variables is given in Table 5.

The predictors here are also time-varying and can be expected to have dynamic effects on the fisheries footprint. Panel data methods such as fixed effects or random effects models are generally used to analyze such data (Torres-Reyna, 2007; Clark and Longo, 2019), with the effects of the covariates taken to be constant. Since we are interested in the dynamic effects of the covariates, we use the functional linear concurrent regression model (1) discussed in this article, which can be seen as a generalization of this approach that allows the effects of the predictors to vary over time. The predictors are very large in magnitude on their original scale and are therefore converted to the log scale; covariates observed as percentages are used without any conversion. Before applying our variable selection method, all the covariates are preprocessed using FPCA methods (PVE $=95\%$) as discussed in Section 2.3.
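The PVE threshold used in the FPCA preprocessing determines how many principal components are retained. A common rule, sketched here under the assumption that it matches Section 2.3, keeps the smallest number of leading components whose eigenvalues explain at least the target proportion of variance; truncating negative eigenvalues at zero is also an assumption, and the helper name is hypothetical.

```python
def n_components_for_pve(eigenvalues, pve=0.95):
    """Smallest K whose leading K eigenvalues explain at least `pve` of the variance.

    Negative eigenvalues (which can arise from a smoothed covariance estimate)
    are truncated at zero before computing proportions.
    """
    lam = sorted((max(v, 0.0) for v in eigenvalues), reverse=True)
    total = sum(lam)
    cum = 0.0
    for k, v in enumerate(lam, start=1):
        cum += v
        if cum / total >= pve:
            return k
    return len(lam)

eig = [5.0, 3.0, 1.5, 0.5, -0.1]   # toy eigenvalues of an estimated covariance
print(n_components_for_pve(eig, 0.90), n_components_for_pve(eig, 0.99))  # 3 4
```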

We use the pre-whitening procedure discussed in Section 2.2 and apply our proposed variable selection method. Out of the 20 covariates, the proposed FSCAD and FMCP methods both identify GDP per capita and urban population as the two significant predictors. Gross domestic product (GDP) is a measure of the market value of all the final goods and services produced in a specific time period, and GDP per capita is a measure of a country’s economic output adjusted for its population. As the major economic indicator, GDP per capita is associated with primary aspects of economic growth, consumer behaviour and trade, and is therefore a key indicator of the fisheries footprint of nations over time. Furthermore, GDP per capita is a common metric in extant social science research to operationalize the extent to which a nation is successfully developing according to the standards of the capitalist world economy (Dietz and Jorgenson, 2013). Figure 6 (left panel) shows the estimated regression curve for GDP per capita obtained with the FSCAD selection method; the estimate from the FMCP method is similar. We observe the net effect of GDP per capita on the fisheries footprint to be positive and linear, although the magnitude of the effect has decreased over time.

Urban population, being the key market, also plays a crucial role in the total seafood consumption of nations and therefore influences the fisheries footprint. Change in urban population reflects urbanization, and urbanization has important effects on food security and farming (Satterthwaite et al., 2010; Cohen and Garrett, 2010). Figure 6 (right panel) shows the estimated regression curve for the effect of urban population on the fisheries footprint. Here too we notice the net effect to be positive, although it appears more or less constant, with a very marginal decrease (on the log scale).

Here, it is important to note that the fisheries footprint represents the metabolic potential of an ecosystem to reproduce itself ecologically. According to the Food and Agriculture Organization of the United Nations (FAO, 2016), about 58 percent of global fish stocks are currently fully exploited, and about 55 percent of ocean territory (conservatively) was subjected to industrial fishing in the past year (Kroodsma et al., 2018). Thus, there is declining metabolic potential for the expansion of capture fisheries, which likely helps to account for why the effects of these variables were stronger in earlier, more ecologically productive decades.

Both variables are important in the sense that they represent primary indicators of economics, food consumption, population dynamics, trade, etc., which directly interact with a nation’s need for seafood and therefore should influence its fisheries footprint. It is therefore not surprising that countries with high GDP per capita and/or high urban population (e.g., the United States, Australia, Singapore) also have a high fisheries footprint. In Figure 7 we display the fisheries footprint, GDP per capita and urban population profiles of the three representative countries mentioned earlier. The overall trend in their fisheries footprint profiles can be described well by their GDP per capita and urban population profiles, both of which were shown to have a positive effect on the fisheries footprint.

Remark 3: We have applied our proposed variable selection method to find the relevant time-varying predictors and their time-varying effects on the fisheries footprint. It is quite plausible that there are country- or region-specific effects on the fisheries footprint, as revealed in the study by Clark and Longo (2019), and one might be interested in estimating these effects. The proposed variable selection method for the FLCM can accommodate such region-specific effects in its existing form, for example by including interaction effects of the covariates with regions.

Remark 4: We have considered concurrent effects of the predictors, while some of the predictors might have lagged effects on the fisheries footprint. For example, in an economic crisis or recession, the predictors could very likely have reverberating impacts on development for a few years. As invested capital takes time to flow through the economy, considering such lagged effects would be interesting. We applied our variable selection method with lagged predictors included along with the original predictors (with lag windows of 1 and 3 years). For a lag of one year, the proposed FMCP and FSCAD methods select almost identical models, with the FSCAD method selecting ‘services value growth pct’ as an additional variable. With a lag window of three years, the FSCAD method selects ‘urban population lag’ instead of urban population, while the FMCP method additionally selects ‘aquaculture production tons’ and ‘services value growth pct lag’ as influential covariates. These results indicate that some of the predictors could have reverberating impacts, and that a more general framework such as the historical functional regression model (Malfait and Ramsay, 2003) might be more suitable for modelling the effects of past covariate values on the response at the current time point.
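Lagged predictors of the kind used in this remark can be built by shifting each yearly series. The sketch assumes that a "lag window" of k years means a single shift by k years (the exact construction is not described here) and that the leading years without a lagged value are marked as missing before FPCA; the helper names are hypothetical.

```python
def lagged(values, k):
    """Shift a yearly series by k >= 0 years: out[t] = values[t - k], None if t < k."""
    if k < 0:
        raise ValueError("lag must be non-negative")
    return [values[t - k] if t >= k else None for t in range(len(values))]

def with_lag(covariates, k):
    """Hypothetical helper: add a '<name> lag' series (shifted by k years) per covariate."""
    out = dict(covariates)
    for name, vals in covariates.items():
        out[name + " lag"] = lagged(vals, k)
    return out

series = {"gdp": [1.0, 2.0, 3.0, 4.0]}          # toy yearly values
print(with_lag(series, 1)["gdp lag"])            # [None, 1.0, 2.0, 3.0]
print(with_lag(series, 3)["gdp lag"])            # [None, None, None, 1.0]
```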

5 Discussion

In this article, we have proposed a variable selection method for functional linear concurrent regression that extends classically used penalized variable selection methods such as LASSO, SCAD, and MCP. We have shown that the problem can be addressed as a group LASSO problem or, through its natural extensions, as a group SCAD or group MCP problem. We have used a pre-whitening procedure to take into account the temporal dependence present within functions, and through numerical simulations we have illustrated that our proposed selection method with the group SCAD or group MCP penalty can select the true underlying variables with high accuracy and has minuscule false positive and false negative rates, even when the data are observed sparsely, are contaminated with measurement error, and the error process is highly non-stationary. We have illustrated the usefulness of the proposed method by applying it to two real datasets, the dietary calcium absorption study data and the fisheries footprint data, to identify the relevant time-varying covariates. In this article we have used a subject-level resampling bootstrap to measure the uncertainty of the regression function estimates; the theoretical properties of such a bootstrap are something we would like to explore more deeply in the future.

This work suggests many interesting research directions. In real data, the dynamic effects of the predictors might not always be linear. In the future, we would like to extend our variable selection method to the nonparametric functional concurrent regression model (Maity, 2017), which is a more general and flexible model for capturing complex relationships between the response and the covariates. As mentioned earlier, it would also be of interest to consider lagged effects of covariates through a more general historical functional regression model (Malfait and Ramsay, 2003).

In developing our method we assumed the covariates to be independent and identically distributed. In many cases this might not be a reasonable assumption. For example, in the fisheries footprint data some countries could be very similar and form clusters; moreover, they might not even be independent, given the interplay of economies and other variables among nations. Even if the covariates are not independent across subjects, the variable selection criterion proposed in this article can still be used in practice as a penalized least squares method. The heterogeneity present among the subjects can be addressed using interaction effects of covariates with regions, which can be clustered based on their level of affluence, similarly to Clark and Longo (2019). Alternatively, one can use subject-specific functional random effects for the covariates, especially if one is interested in individual-specific trajectories; the functional linear mixed model (Liu et al., 2017) might be an appropriate choice in such situations. Extending the proposed variable selection method to such general functional regression models remains an area for future research.

Software

All the methods discussed in this article have been implemented using the ‘grpreg’ package (Breheny, 2019) in R. Illustrations of the implementation of our method in R are available with this article on Wiley Online Library and on GitHub (https://github.com/rahulfrodo/FLCM_Selection).

Acknowledgement

We would like to thank the editor, the associate editor and an anonymous referee for their valuable inputs and suggestions which have greatly helped in improving this article.

Bibliography

Breheny, P. (2019) Regularization paths for regression models with grouped covariates.
Breheny, P. and Huang, J. (2015) Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25, 173–187.
Chen, J. and Chen, Z. (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95, 759–771.
Chen, Y., Goldsmith, J. and Ogden, R. T. (2016) Variable selection in function-on-scalar regression. Stat, 5, 88–101.
Chiang, C.-T., Rice, J. A. and Wu, C. O. (2001) Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. Journal of the American Statistical Association, 96, 605–619.
Clark, T. P. and Longo, S. B. (2019) Examining the effect of economic development, region, and time period on the fisheries footprints of nations (1961–2010). International Journal of Comparative Sociology, 0020715219869976. URL: https://doi.org/10.1177/0020715219869976.
Clark, T. P., Longo, S. B., Clark, B. and Jorgenson, A. K. (2018) Socio-structural drivers, fisheries footprints, and seafood consumption: A comparative international study, 1961-2012. Journal of Rural Studies, 57, 140–146.
Cohen, M. J. and Garrett, J. L. (2010) The food price crisis and urban food (in)security. Environment and Urbanization, 22, 467–482.