Measure Selection for Functional Linear Model

Su I Iao; Hans-Georg Müller

arXiv:2509.00583·stat.ME·September 3, 2025·Comput. Stat. Data Anal.


TL;DR

This paper introduces a flexible functional linear model that adaptively chooses the measure defining the function space, improving predictive accuracy over traditional models especially for complex data.

Contribution

It proposes a novel data-adaptive measure selection method for functional linear models, extending the framework beyond the standard Lebesgue measure.

Findings

1. Improved predictive performance with the adaptive measure approach.
2. Consistent outperformance over traditional models in simulations.
3. Effective application to COVID-19 and health survey data.


Tables

Table 1. Average mean squared prediction errors and standard deviations (in parentheses) for the bounded domain $\mathcal{T}=[0,1]$, based on 1000 Monte Carlo simulations, comparing FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with step function weights).

Scenario	Method	$n$	$N = 20$	$N = 50$	$N = 100$
I	FLM	50	1.004 (0.215)	1.000 (0.214)	1.001 (0.217)
	wFLM	50	0.905 (0.218)	0.813 (0.197)	0.815 (0.200)
	FLM	100	0.975 (0.142)	0.973 (0.142)	0.973 (0.142)
	wFLM	100	0.846 (0.154)	0.722 (0.134)	0.703 (0.137)
	FLM	200	0.967 (0.100)	0.967 (0.101)	0.968 (0.100)
	wFLM	200	0.821 (0.118)	0.668 (0.088)	0.644 (0.082)
	FLM	500	0.958 (0.060)	0.958 (0.061)	0.958 (0.060)
	wFLM	500	0.796 (0.084)	0.636 (0.042)	0.614 (0.039)
II	FLM	50	0.084 (0.029)	0.078 (0.027)	0.076 (0.025)
	wFLM	50	0.057 (0.026)	0.044 (0.024)	0.038 (0.021)
	FLM	100	0.065 (0.018)	0.059 (0.014)	0.058 (0.014)
	wFLM	100	0.041 (0.019)	0.024 (0.014)	0.022 (0.013)
	FLM	200	0.052 (0.011)	0.049 (0.008)	0.049 (0.008)
	wFLM	200	0.031 (0.016)	0.015 (0.008)	0.013 (0.007)
	FLM	500	0.045 (0.004)	0.044 (0.004)	0.043 (0.004)
	wFLM	500	0.022 (0.013)	0.010 (0.003)	0.008 (0.003)

Table 2. Average mean squared prediction errors and standard deviations (in parentheses) for the unbounded domain $\mathcal{T}=[0,\infty)$, based on 500 Monte Carlo simulations, comparing FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with exponential density weights and half-normal density weights).

Method	$n$	$N_{i}=5$–$10$	$N_{i}=20$
FLM	100	1000.49 (1267.99)	782.28 (1063.72)
wFLM (Exp)		116.09 (388.10)	61.53 (435.21)
wFLM (HalfNorm)		105.80 (478.56)	52.83 (385.37)
FLM	200	804.27 (1016.61)	678.97 (956.70)
wFLM (Exp)		82.08 (240.86)	31.27 (330.86)
wFLM (HalfNorm)		76.13 (187.65)	39.82 (324.44)

Table 3. Leave-one-out cross-validation scores and standard deviations (in parentheses) of wFLM (weighted functional linear model with step function weight and exponential density weight function) and FLM (functional linear model with Lebesgue measure) for the COVID-19 data.

wFLM (Step)	wFLM (Exp)	FLM (Lebesgue)
0.513 (0.809)	0.263 (0.328)	0.779 (1.027)

Table 4. Leave-one-out cross-validation scores and standard deviations (in parentheses) of wFLM (weighted functional linear model with exponential density weight function) and FLM (functional linear model with Lebesgue measure) for the NHANES data.

wFLM (Exp)	FLM (Lebesgue)
1.022 (1.633)	1.398 (8.814)

Table 5 (Table S.1). Average training time in minutes, with standard deviations in parentheses, across different sample sizes and numbers of measurement points for FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with step function weights) under Scenario 1 of Section 4.1.

Method	$n$	$N = 20$	$N = 50$	$N = 100$
FLM	50	0.032 (0.006)	0.030 (0.004)	0.033 (0.005)
wFLM	50	0.180 (0.022)	0.236 (0.017)	0.422 (0.055)
FLM	100	0.030 (0.004)	0.030 (0.003)	0.035 (0.006)
wFLM	100	0.224 (0.024)	0.323 (0.024)	0.693 (0.065)
FLM	200	0.031 (0.008)	0.031 (0.004)	0.040 (0.004)
wFLM	200	0.342 (0.056)	0.504 (0.051)	1.223 (0.108)
FLM	500	0.033 (0.005)	0.037 (0.011)	0.057 (0.010)
wFLM	500	0.667 (0.071)	1.106 (0.246)	2.785 (0.213)

Table 6 (Table S.2). Average training time in minutes, with standard deviations in parentheses, across different sample sizes and numbers of measurement points for FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with exponential density weight function) under the setting of Section 4.2.

Method	$n$	$N_{i}=5$–$10$	$N_{i}=20$
FLM	100	0.076 (0.009)	0.291 (0.041)
wFLM	100	0.795 (0.125)	4.395 (0.133)
FLM	200	0.218 (0.041)	1.298 (0.147)
wFLM	200	3.208 (0.126)	17.759 (0.438)

Table 7 (Table S.3). Average mean squared prediction error (AMSPE) and standard deviations (in parentheses) for FLM (functional linear model with Lebesgue measure) across different sample sizes $n$, numbers of measurements $N$, and measurement error levels $\sigma$.

$\sigma$	$n$	$N = 20$	$N = 50$	$N = 100$
0.00	50	1.077 (0.168)	1.077 (0.167)	1.077 (0.166)
	100	0.998 (0.140)	0.999 (0.140)	0.999 (0.140)
	200	0.983 (0.159)	0.983 (0.159)	0.983 (0.158)
	500	0.968 (0.135)	0.968 (0.135)	0.968 (0.135)
0.25	50	1.086 (0.171)	1.083 (0.178)	1.090 (0.170)
	100	1.009 (0.144)	1.003 (0.143)	1.008 (0.143)
	200	0.983 (0.155)	0.983 (0.158)	0.988 (0.157)
	500	0.971 (0.136)	0.970 (0.136)	0.971 (0.136)
0.50	50	1.086 (0.172)	1.082 (0.177)	1.089 (0.169)
	100	1.010 (0.143)	1.003 (0.143)	1.008 (0.143)
	200	0.983 (0.155)	0.983 (0.158)	0.988 (0.157)
	500	0.971 (0.136)	0.970 (0.136)	0.971 (0.136)
0.75	50	1.088 (0.173)	1.082 (0.179)	1.088 (0.168)
	100	1.010 (0.143)	1.003 (0.143)	1.009 (0.145)
	200	0.982 (0.155)	0.983 (0.158)	0.988 (0.157)
	500	0.971 (0.136)	0.970 (0.135)	0.971 (0.136)
1.00	50	1.087 (0.173)	1.082 (0.177)	1.087 (0.167)
	100	1.010 (0.145)	1.003 (0.142)	1.009 (0.145)
	200	0.983 (0.156)	0.984 (0.158)	0.987 (0.157)
	500	0.972 (0.136)	0.970 (0.135)	0.971 (0.136)

Table 8 (Table S.4). Average mean squared prediction error (AMSPE) and standard deviations (in parentheses) for wFLM (weighted functional linear model with step function weights) across different sample sizes $n$, numbers of measurements $N$, and measurement error levels $\sigma$.

$\sigma$	$n$	$N = 20$	$N = 50$	$N = 100$
0.00	50	0.986 (0.177)	0.858 (0.265)	0.784 (0.275)
	100	0.863 (0.172)	0.634 (0.206)	0.605 (0.197)
	200	0.825 (0.160)	0.522 (0.126)	0.511 (0.102)
	500	0.806 (0.145)	0.496 (0.076)	0.496 (0.076)
0.25	50	0.987 (0.192)	0.843 (0.247)	0.814 (0.248)
	100	0.881 (0.179)	0.661 (0.182)	0.637 (0.182)
	200	0.836 (0.168)	0.565 (0.092)	0.564 (0.119)
	500	0.812 (0.148)	0.553 (0.084)	0.543 (0.084)
0.50	50	0.993 (0.190)	0.922 (0.208)	0.938 (0.216)
	100	0.896 (0.175)	0.768 (0.170)	0.746 (0.179)
	200	0.840 (0.161)	0.671 (0.103)	0.651 (0.121)
	500	0.823 (0.147)	0.645 (0.095)	0.623 (0.095)
0.75	50	1.009 (0.188)	0.971 (0.193)	0.977 (0.184)
	100	0.901 (0.165)	0.862 (0.166)	0.843 (0.169)
	200	0.855 (0.159)	0.772 (0.131)	0.749 (0.135)
	500	0.837 (0.145)	0.719 (0.107)	0.696 (0.104)
1.00	50	1.017 (0.191)	0.997 (0.188)	0.995 (0.180)
	100	0.917 (0.165)	0.888 (0.160)	0.874 (0.162)
	200	0.867 (0.158)	0.819 (0.127)	0.806 (0.138)
	500	0.853 (0.144)	0.775 (0.114)	0.760 (0.110)

Table 9 (Table S.5). Average mean squared prediction error (AMSPE) and standard deviations (in parentheses) for FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with exponential density weight function) under varying noise levels $\sigma$, for different sample sizes $n$ and numbers of measurements $N_{i}$.

$\sigma$	$n$	Method	$N_{i}=5$–$10$	$N_{i}=20$
0.00	100	FLM	836.47 (1167.38)	800.22 (1340.22)
	100	wFLM	42.70 (64.91)	4.07 (3.83)
	200	FLM	814.38 (1384.70)	651.86 (968.43)
	200	wFLM	32.38 (23.11)	2.61 (2.10)
0.25	100	FLM	788.45 (958.31)	824.89 (1279.89)
	100	wFLM	96.52 (452.15)	18.41 (111.21)
	200	FLM	898.71 (1457.29)	671.07 (1042.29)
	200	wFLM	40.83 (20.09)	5.37 (8.50)
0.50	100	FLM	912.36 (1156.95)	802.88 (1164.14)
	100	wFLM	81.98 (112.51)	56.91 (412.74)
	200	FLM	898.25 (1160.19)	669.86 (958.95)
	200	wFLM	66.28 (65.12)	17.03 (73.01)
0.75	100	FLM	942.42 (1182.26)	873.84 (1357.36)
	100	wFLM	118.25 (220.73)	57.63 (269.89)
	200	FLM	931.48 (1203.69)	675.96 (924.48)
	200	wFLM	110.87 (292.86)	35.00 (153.27)
1.00	100	FLM	1065.91 (1327.04)	860.24 (1311.72)
	100	wFLM	143.09 (289.21)	86.20 (422.79)
	200	FLM	924.58 (1142.45)	690.19 (934.90)
	200	wFLM	150.31 (503.18)	54.13 (254.85)

Table 10 (Table S.6). Median mean squared error for FLM (functional linear model with Lebesgue measure) and wFLM (weighted functional linear model with exponential density weight function) under varying measurement error levels $\sigma$, for different sample sizes $n$ and numbers of measurements $N_{i}$.

$\sigma$	$n$	Method	$N_{i}=5$–$10$	$N_{i}=20$
0.00	100	FLM	559.87	372.53
	100	wFLM	32.63	2.49
	200	FLM	478.78	393.68
	200	wFLM	25.68	1.93
0.25	100	FLM	546.97	421.66
	100	wFLM	44.14	4.09
	200	FLM	472.03	366.03
	200	wFLM	36.48	3.16
0.50	100	FLM	648.08	423.55
	100	wFLM	56.65	7.87
	200	FLM	489.00	382.13
	200	wFLM	50.19	5.69
0.75	100	FLM	697.27	441.59
	100	wFLM	70.63	12.33
	200	FLM	544.98	393.73
	200	wFLM	63.05	10.10
1.00	100	FLM	797.91	454.03
	100	wFLM	86.47	18.54
	200	FLM	566.82	402.10
	200	wFLM	78.71	15.21

Table 11 (Table S.7). Sensitivity of AMSPE to varying $\lambda_{1}$ (with $\lambda_{2}=0$ fixed) for step-based wFLM with $n=500$.

Method	$\lambda_{1}$	$N = 20$	$N = 50$	$N = 100$
FLM	—	0.971 (0.136)	0.970 (0.136)	0.971 (0.136)
wFLM	0.0	0.830 (0.143)	0.645 (0.094)	0.622 (0.094)
	0.1	0.938 (0.136)	0.781 (0.201)	0.770 (0.199)
	0.2	0.949 (0.134)	0.948 (0.134)	0.949 (0.134)
	0.3	0.951 (0.133)	0.949 (0.130)	0.950 (0.134)
	0.4	0.953 (0.135)	0.950 (0.131)	0.950 (0.134)
	0.5	0.953 (0.135)	0.950 (0.131)	0.950 (0.134)
	1.0	0.953 (0.135)	0.950 (0.131)	0.950 (0.134)
	1.5	0.954 (0.134)	0.950 (0.132)	0.950 (0.134)
	2.0	0.954 (0.134)	0.950 (0.132)	0.950 (0.134)
	3.0	0.954 (0.134)	0.950 (0.132)	0.950 (0.134)
	4.0	0.954 (0.134)	0.950 (0.132)	0.950 (0.134)
	5.0	0.954 (0.134)	0.950 (0.132)	0.950 (0.134)

Table 12 (Table S.8). Sensitivity of AMSPE to varying $\lambda_{2}$ (with $\lambda_{1}=0$ fixed) for step-based wFLM with $n=500$.

Method	$\lambda_{2}$	$N = 20$	$N = 50$	$N = 100$
FLM	—	0.971 (0.136)	0.970 (0.136)	0.971 (0.136)
wFLM	0.0	0.830 (0.143)	0.645 (0.094)	0.622 (0.094)
	0.1	0.823 (0.146)	0.645 (0.095)	0.621 (0.095)
	0.2	0.823 (0.148)	0.645 (0.095)	0.621 (0.095)
	0.3	0.823 (0.147)	0.645 (0.095)	0.621 (0.095)
	0.4	0.823 (0.147)	0.645 (0.095)	0.621 (0.095)
	0.5	0.823 (0.147)	0.645 (0.095)	0.621 (0.095)
	1.0	0.823 (0.147)	0.645 (0.095)	0.621 (0.095)
	1.5	0.819 (0.145)	0.645 (0.095)	0.621 (0.095)
	2.0	0.819 (0.145)	0.645 (0.095)	0.621 (0.095)
	3.0	0.819 (0.145)	0.645 (0.095)	0.621 (0.095)
	4.0	0.819 (0.145)	0.645 (0.095)	0.621 (0.095)
	5.0	0.819 (0.145)	0.645 (0.095)	0.621 (0.095)

Equations

$$\mu_{X}(t)=E\{X(t)\}\quad\text{and}\quad C_{XX}(s,t)=E[\{X(s)-\mu_{X}(s)\}\{X(t)-\mu_{X}(t)\}].$$

$$C_{XX}(s,t)=\sum_{k=1}^{\infty}\rho_{k}\phi_{k}(s)\phi_{k}(t),\quad s,t\in\mathcal{T},$$

$$X_{i}(t)=\mu_{X}(t)+\sum_{k=1}^{\infty}\xi_{ik}\phi_{k}(t),\quad t\in\mathcal{T},$$

$$\langle f,g\rangle=\int_{\mathcal{T}}f(t)g(t)\,d\nu(t),$$

$$\mathcal{W}=\left\{w:\mathcal{T}\mapsto[0,\infty);\ \int_{\mathcal{T}}w(t)\,dt=\int_{\mathcal{T}}d\nu(t)=1\right\}.$$

$$Z=\sqrt{w}\,(X-\mu_{X}),$$

$$C_{ZZ}(s,t)=\sum_{k=1}^{\infty}\rho_{wk}\phi_{Zk}(s)\phi_{Zk}(t),\quad s,t\in\mathcal{T},$$

$$\phi_{wk}=\frac{\phi_{Zk}}{\sqrt{w}},\quad k=1,2,\dots,\quad\text{where }\frac{1}{\sqrt{w}}=0\text{ if }w=0,$$

$$Z(t)=\sum_{k=1}^{\infty}\xi_{wk}\phi_{Zk}(t),$$

$$\xi_{wk}=\int_{\mathcal{T}}Z(t)\phi_{Zk}(t)\,dt=\int_{\mathcal{T}}\{X(t)-\mu_{X}(t)\}\phi_{wk}(t)\,d\nu(t),\quad k=1,2,\dots.$$

$$\psi_{wk}=\psi_{Zk}\Big/\sqrt{\frac{d\nu_{2}}{d\nu_{1}}}.$$

$$\zeta_{wk}=\int_{\mathcal{T}}Z(t)\psi_{Zk}(t)\,d\nu_{1}(t)=\int_{\mathcal{T}}\{X(t)-\mu_{X}(t)\}\psi_{wk}(t)\,d\nu_{2}(t).$$

$$E[Y\mid X]=\beta_{0}+\int_{\mathcal{T}}X(t)\beta(t)\,d\nu(t).$$

$$E[Y\mid X]=\mu_{Y}+\int_{\mathcal{T}}\{X(t)-\mu_{X}(t)\}\beta(t)\,d\nu(t).$$

$$E[Y\mid X]=\mu_{Y}+\int_{\mathcal{T}}Z(t)\beta_{w}(t)\,dt,$$

$$\beta_{w}(t)=\sum_{k=1}^{\infty}\frac{E\{\xi_{wk}(Y-\mu_{Y})\}}{E(\xi_{wk}^{2})}\phi_{Zk}(t)=\sum_{k=1}^{\infty}\rho_{wk}^{-1}\sigma_{kY}\phi_{Zk}(t)=\sum_{k=1}^{\infty}\beta_{k}\phi_{Zk}(t),$$

$$\beta(t)=\sum_{k=1}^{\infty}\beta_{k}\phi_{wk}(t).$$

$$C_{YZ}(t)=\mathrm{cov}(Y,Z(t))=\sum_{k=1}^{\infty}E\{\xi_{wk}(Y-\mu_{Y})\}\phi_{Zk}(t).$$

$$\hat{\beta}_{w}(t)=\sum_{k=1}^{M}\hat{\beta}_{k}\hat{\phi}_{Zk}(t)\quad\text{and}\quad\hat{\beta}(t)=\sum_{k=1}^{M}\hat{\beta}_{k}\hat{\phi}_{wk}(t),$$

$$E[Y^{*}\mid X^{*}]=\mu_{Y}+\sum_{k=1}^{\infty}\rho_{wk}^{-1}\sigma_{kY}\xi_{wk}^{*},$$

$$\xi_{wk}^{*}=\int_{\mathcal{T}}\{X^{*}(t)-\mu_{X}(t)\}\phi_{wk}(t)\,d\nu(t)=\int_{\mathcal{T}}Z^{*}(t)\phi_{Zk}(t)\,dt$$

$$w^{*}=\underset{w\in\mathcal{W}}{\arg\min}\ \mathrm{CVE}(w)=\underset{w\in\mathcal{W}}{\arg\min}\ \frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i,M}^{(-i)}\right)^{2},$$

$$\mathcal{W}_{\mathrm{step}}=\left\{w(t)=\sum_{l=1}^{2^{K}}c_{l}\cdot 1_{t\in\left[\frac{l-1}{2^{K}},\frac{l}{2^{K}}\right)}:\ c_{l}\in\mathbb{R},\ \int_{\mathcal{T}}w(t)\,dt=1\right\}.$$

$$w^{*}=\underset{w\in\mathcal{W}_{\mathrm{step}}}{\arg\min}\ \mathrm{PCVS}(w)=\underset{w\in\mathcal{W}_{\mathrm{step}}}{\arg\min}\ \frac{\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i,M}^{(-i)}\right)^{2}}{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}+\lambda_{1}\mathrm{TV}(w)+\lambda_{2}\frac{\int_{\mathcal{T}}I\{w(t)\neq 0\}\,dt}{|\mathcal{T}|},$$

$$\psi_{k}(t)=\begin{cases}2\cos(k\pi t), & k\in\{1,3,5,7,9\},\\ 2\sin((k-1)\pi t), & k\in\{2,4,6,8,10\}.\end{cases}$$

$$X_{ij}=\mu_{X}(t_{ij})+\sum_{k=1}^{10}\xi_{ik}\psi_{k}(t_{ij})+\epsilon_{ij},$$

$$w(t)=\begin{cases}0, & t\in(-\infty,1/4)\cup(1,+\infty);\\ 1/6, & t\in[1/4,1/2);\\ 1/3, & t\in[1/2,3/4);\\ 1/2, & t\in[3/4,1].\end{cases}$$

$$\mathrm{AMSPE}=\frac{1}{Q}\sum_{q=1}^{Q}\frac{1}{100}\sum_{i=1}^{100}\left(\hat{Y}_{i,q}^{*}-Y_{i,q}^{*}\right)^{2},$$

$$\mathrm{LOOCVS}=\frac{\sum_{i=1}^{n}\left(Y_{i}-\hat{Y}_{i}^{(-i)}\right)^{2}}{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}},$$


Taxonomy

Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Fault Detection and Control Systems

Full text

Su I Iao and Hans-Georg Müller

Department of Statistics, University of California, Davis, One Shields Ave., Davis, CA 95616, U.S.A.

Abstract

Advancements in modern science have led to an increased prevalence of functional data, which are usually viewed as elements of the space of square-integrable functions $L^{2}$. Core methods in functional data analysis, such as functional principal component analysis, are typically grounded in the Hilbert structure of $L^{2}$ and rely on inner products based on integrals with respect to the Lebesgue measure over a fixed domain. A more flexible framework is proposed, where the measure can be arbitrary, allowing natural extensions to unbounded domains and prompting the question of optimal measure choice. Specifically, a novel functional linear model is introduced that incorporates a data-adaptive choice of the measure that defines the space, alongside an enhanced functional principal component analysis. Selecting a good measure can improve the model’s predictive performance, especially when the underlying processes are not well represented under the default Lebesgue measure. Simulations, as well as applications to COVID-19 data and the National Health and Nutrition Examination Survey data, show that the proposed approach consistently outperforms the conventional functional linear model.

Keywords: Functional data analysis, weighted functional principal component analysis, weighted functional linear model, optimal measures.

1 Introduction

Functional data have become increasingly prevalent with the advancement of modern data collection technologies. Typically, functional data are considered independent and identically distributed samples representing realizations of an underlying smooth stochastic process observed at discrete time points. Over the past decades, the field of functional data analysis (FDA) has garnered significant attention, particularly in connection with the successful deployment of methods such as functional principal component analysis (FPCA) (Kleffe, 1973; Castro et al., 1986; Yao et al., 2005a; Hall and Hosseini-Nasab, 2006; Chen and Lei, 2015) and functional linear models (FLM) (Ramsay and Silverman, 2005; Yao et al., 2005b; Hall and Horowitz, 2007). Comprehensive introductions and reviews can be found for example in Ramsay and Silverman (2005), Hsing and Eubank (2015), and Wang et al. (2016).

Both FPCA and FLM utilize the Hilbert space structure of $L^{2}(\mathcal{T})$ , which is conventionally equipped with the Lebesgue measure to facilitate computation, where $\mathcal{T}$ denotes the continuum of interest for the functional data. FLM is often implemented using an FPCA-based approach (Yao et al., 2005b; Hall and Hosseini-Nasab, 2006; Hilgert et al., 2013; Imaizumi and Kato, 2018), where FPCA is first applied to decompose functional predictors into orthogonal principal components. These principal components serve as low-dimensional representations that are subsequently used as covariates in a regression model. However, while this approach is widely used in applications (Liang et al., 2015; Chen et al., 2024; Iao et al., 2024; Zhou et al., 2024), it may not always provide the most effective representation of functional data. The success of FPCA-based FLM largely depends on the efficient representation of the coefficient function of FLM by the leading functional principal components (Cai and Yuan, 2012). In practice, the principal components of $X$ may not align well with the structure of the coefficient function, leading to suboptimal predictive performance. This issue parallels limitations observed in principal component regression (Jolliffe, 1982) and singular value decomposition techniques for linear inverse problems (Donoho, 1995). Notably, when low-variance components of $X$ carry non-negligible predictive power, discarding them can degrade model performance. These considerations motivate the development of alternative eigensystem constructions aimed at improving predictive accuracy and model interpretability.

A promising approach to address this issue is to introduce weighting schemes in functional data analysis. Prior research has explored weighted methods in FPCA (Leng, 2004; Talská et al., 2020), as well as in clustering and classification (Chen et al., 2014; Romano et al., 2020). By defining inner products with respect to an alternative measure, the resulting eigensystem can potentially yield a more effective representation of the coefficient function. Despite these developments, the application of weighted methodologies to functional linear models remains an open research question.

In this work, we propose a novel weighted functional linear model (wFLM) with functional predictors and scalar responses based on a data-driven measure. This framework is designed to operate in a Hilbert space equipped with a general measure, going beyond the classical default Lebesgue measure, and applies to both bounded and unbounded domains. By incorporating a general measure, the proposed approach enables a more flexible representation of functional data, leading to improved model interpretability and predictive performance. In addition to optimizing eigensystem alignment, the weighting approach that we propose here conveys additional benefits when dealing with infinite domains. When the domain is unbounded, for example $\mathcal{T}=[0,\infty)$, the space $L^{2}(\mathcal{T})$ imposes major constraints, as commonly used functions such as polynomials do not belong to this space, whereas they are square-integrable when $\mathcal{T}$ is bounded. If trajectories $X$ do not lie in $L^{2}(\mathcal{T})$, traditional functional data analysis techniques such as FPCA and FLM are not applicable. Adopting a weighting scheme can also reflect that not all regions of a function’s domain are equally important or relevant for the analysis. Replacing the uniform reference measure may be interpreted as emphasizing or downplaying the variability of the stochastic processes on certain subdomains.
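As a small worked illustration (not taken from the paper), consider the linear trajectory $X(t)=t$ on $[0,\infty)$ and, as an assumed weight, the standard exponential density $w(t)=e^{-t}$:

```latex
\int_{0}^{\infty} t^{2}\,dt = \infty,
\qquad
\int_{0}^{\infty} t^{2} e^{-t}\,dt = \Gamma(3) = 2 < \infty .
```

Under these assumptions $X$ is not square integrable with respect to the Lebesgue measure, but it is square integrable with respect to the measure $d\nu(t)=e^{-t}dt$.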

The rest of this paper is organized as follows. In Section 2, we introduce the weighted functional principal component analysis and the weighted functional linear model. Data-adaptive measure selection is developed in Section 3. Simulations are presented in Section 4. Applications to COVID-19 data and the National Health and Nutrition Examination Survey data are discussed in Section 5.

2 Methodology

In this section, we introduce weighted functional principal component analysis (wFPCA) and then proceed to the wFPCA-based weighted functional linear model (wFLM).

2.1 Weighted functional principal component analysis

Revisiting classical functional principal component analysis (FPCA), let $X(t)$ be a square integrable stochastic process on $\mathcal{T}$ for which one has $n$ independent copies $X_{i}(t)$, $i=1,\ldots,n$. The mean and covariance functions of $X$ are

$$\mu_{X}(t)=E\{X(t)\}\quad\text{and}\quad C_{XX}(s,t)=E[\{X(s)-\mu_{X}(s)\}\{X(t)-\mu_{X}(t)\}].\tag{1}$$

By Mercer’s Theorem (see Theorem 4.6.5 in Hsing and Eubank (2015)), the spectral decomposition of the covariance function $C_{XX}(s,t)$ is

$$C_{XX}(s,t)=\sum_{k=1}^{\infty}\rho_{k}\phi_{k}(s)\phi_{k}(t),\quad s,t\in\mathcal{T},$$

where $\rho_{1}>\rho_{2}>\cdots>0$ are the eigenvalues and $\{\phi_{k}\}_{k=1}^{\infty}$ are the corresponding eigenfunctions of the auto-covariance operator. The latter form an orthonormal system on $L^{2}(\mathcal{T})$ with respect to inner products based on the Lebesgue measure. The Karhunen-Loève representation implies that the $i$ th random curve can be represented as

$$X_{i}(t)=\mu_{X}(t)+\sum_{k=1}^{\infty}\xi_{ik}\phi_{k}(t),\quad t\in\mathcal{T},$$

where the principal component scores $\xi_{ik}=\int_{\mathcal{T}}\left\{X_{i}(t)-\mu_{X}(t)\right\}\phi_{k}(t)dt$ are uncorrelated random variables with zero mean and variances $E(\xi_{ik}^{2})=\rho_{k}$ .

An extension of classical FPCA involves defining inner products with respect to more general measures,

$$\langle f,g\rangle=\int_{\mathcal{T}}f(t)g(t)\,d\nu(t),$$

where $\nu$ is a measure that is absolutely continuous with respect to the Lebesgue measure and $f,g$ are $\nu$-square integrable functions on the domain $\mathcal{T}$; see, e.g., Leng (2004); Chen et al. (2014); Talská et al. (2020). The Radon-Nikodym theorem (Ash and Doléans-Dade, 2000) ensures the existence of a measurable function $w(t)=d\nu(t)/dt$, $t\in\mathcal{T}$, which we refer to as the weight function. The weight function $w$ is assumed to reside in the space

$$\mathcal{W}=\left\{w:\mathcal{T}\mapsto[0,\infty);\ \int_{\mathcal{T}}w(t)\,dt=\int_{\mathcal{T}}d\nu(t)=1\right\}.$$

To define the weighted FPCA, we assume that $X(t)$ is square integrable on $\mathcal{T}$ with respect to $\nu$, i.e., $\int_{\mathcal{T}}X^{2}(t)d\nu(t)<\infty$, and introduce a new process that is square integrable with respect to the Lebesgue measure,

$$Z=\sqrt{w}\,(X-\mu_{X}),$$

where $Z$ has $E(Z(t))=0$ and covariance function $C_{ZZ}(s,t)=E\{Z(s)Z(t)\}$ . The spectral decomposition of the covariance $C_{ZZ}$ with respect to Lebesgue measure is

$$C_{ZZ}(s,t)=\sum_{k=1}^{\infty}\rho_{wk}\phi_{Zk}(s)\phi_{Zk}(t),\quad s,t\in\mathcal{T},\tag{3}$$

with eigenvalues $\rho_{w1}>\rho_{w2}>\cdots>0$ and eigenfunctions $\{\phi_{Zk}\}_{k=1}^{\infty}$ . The Karhunen-Loève representation of the process $Z$ is

$$Z(t)=\sum_{k=1}^{\infty}\xi_{wk}\phi_{Zk}(t),\quad t\in\mathcal{T},$$

where $\xi_{wk}$ is the $k$ th principal component score of $Z$ and $\phi_{wk}(t)=\phi_{Zk}(t)/\sqrt{w(t)}$ . To ensure $\phi_{wk}(t)$ is well-defined, we set $1/\sqrt{w(t)}=0$ if $w(t)=0$ . This leads to the following proposition, which appeared previously in Leng (2004).

Proposition 1.

Given a probability measure $\nu$ that is absolutely continuous with respect to the Lebesgue measure, where $d\nu(t)=w(t)dt$ , and a stochastic process $X(t)\in L^{2}(\mathcal{T},\nu)$ , the process $Z=\sqrt{w}(X-\mu_{X})$ is mean zero and square integrable with respect to the Lebesgue measure. For the eigenvalues and eigenfunctions $\{\rho_{wk},\phi_{Zk}\}_{k=1}^{\infty}$ of the process $Z$ with respect to Lebesgue measure as per (3) and the functions

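$$\phi_{wk}=\frac{\phi_{Zk}}{\sqrt{w}},\quad k=1,2,\dots,\quad\text{where }\frac{1}{\sqrt{w}}=0\text{ if }w=0,$$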

it holds that the $\{\rho_{wk},\phi_{wk}\}_{k=1}^{\infty}$ form the eigensystem of the original process $X$ with respect to the probability measure $\nu$ . The Karhunen-Loève representation of $Z$ is given by

$$Z(t)=\sum_{k=1}^{\infty}\xi_{wk}\phi_{Zk}(t),$$

with principal component scores

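$$\xi_{wk}=\int_{\mathcal{T}}Z(t)\phi_{Zk}(t)\,dt=\int_{\mathcal{T}}\{X(t)-\mu_{X}(t)\}\phi_{wk}(t)\,d\nu(t),\quad k=1,2,\dots.\tag{5}$$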

The scores $\xi_{wk}$ can equivalently be interpreted as principal component scores of the process $Z$ under the Lebesgue measure or as principal component scores of the process $X$ under the probability measure $\nu$ .

All proofs are provided in the Supplementary Material. Given a general measure $\nu$ , Proposition 1 yields an easily implementable approach to obtain the Karhunen-Loève expansion of processes $X$ and the weighted FPCA of a process $X$ in $L^{2}(\mathcal{T},\nu)$ . With the measure $\nu$ and random samples $\{X_{i}\}_{i=1}^{n}$ , one can follow the estimation procedures outlined in Yao et al. (2005a) and Zhang and Wang (2016) to obtain estimates $\hat{\mu}_{X}$ , $\hat{C}_{ZZ}$ and further derive estimates $\hat{\rho}_{wk}$ , $\hat{\phi}_{Zk}$ , $\hat{\xi}_{wk}$ and $\hat{\phi}_{wk}$ for the corresponding targets indexed by $k=1,\ldots,M$ (where $M$ is the number of included eigenfunctions, which can be chosen by leave-one-out cross-validation, see Section 3).
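The recipe implied by Proposition 1 can be sketched numerically as follows. This is a minimal illustration under strong simplifying assumptions that are not part of the paper: curves are observed without noise on a common, equally spaced grid, and integrals are approximated by Riemann sums, whereas the paper refers to smoothing-based estimation as in Yao et al. (2005a) and Zhang and Wang (2016). The function name `weighted_fpca` and its arguments are hypothetical.

```python
import numpy as np

def weighted_fpca(X, t, w, M):
    """Weighted FPCA on a dense, equally spaced grid (illustrative sketch).

    X : (n, N) array of noise-free curves evaluated on the grid t
    t : (N,) equally spaced grid on the domain
    w : (N,) float array of weight function values w(t) >= 0
    M : number of eigen-components to keep
    """
    dt = t[1] - t[0]                          # Riemann-sum quadrature weight
    mu = X.mean(axis=0)                       # pointwise sample mean of X
    sqrt_w = np.sqrt(w)
    Z = sqrt_w * (X - mu)                     # Z = sqrt(w) (X - mu_X)
    C_ZZ = Z.T @ Z / X.shape[0]               # sample covariance of Z on the grid
    # Discretized covariance operator: multiply by dt before eigendecomposition
    evals, evecs = np.linalg.eigh(C_ZZ * dt)
    order = np.argsort(evals)[::-1][:M]       # indices of the M largest eigenvalues
    rho_w = evals[order]
    phi_Z = evecs[:, order].T / np.sqrt(dt)   # rows have unit L2 norm w.r.t. dt
    # phi_w = phi_Z / sqrt(w), with the convention 1/sqrt(w) = 0 where w = 0
    inv_sqrt_w = np.divide(1.0, sqrt_w, out=np.zeros_like(sqrt_w), where=sqrt_w > 0)
    phi_w = phi_Z * inv_sqrt_w
    xi_w = Z @ phi_Z.T * dt                   # scores: integral of Z * phi_Z dt
    return mu, rho_w, phi_Z, phi_w, xi_w
```

In this discretized setting the scores can be computed either from $Z$ and $\hat{\phi}_{Zk}$ with Lebesgue quadrature or from $X-\hat{\mu}_{X}$ and $\hat{\phi}_{wk}$ with the weights $w$ included in the quadrature; both give the same values.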

Proposition 1 relies on two key assumptions that are standard and well-motivated in functional data analysis: (1) The stochastic process $X(t)$ resides in the Hilbert space $L^{2}(\mathcal{T},\nu)$, i.e., it is square integrable with respect to the general measure $\nu$. This assumption ensures the existence of well-defined mean and covariance functions and guarantees the applicability of the Karhunen-Loève expansion. The Hilbert space structure of $L^{2}(\mathcal{T},\nu)$ provides the basis for eigen-analysis, including completeness, a well-defined inner product, and the existence of an orthonormal basis. (2) The measure $\nu$ is absolutely continuous with respect to the Lebesgue measure, so that the Radon-Nikodym derivative $w(t)=d\nu(t)/dt$ exists. This is a mild condition, commonly satisfied in practical applications where the weighting function is derived from data or design considerations. In particular, it is satisfied by our proposed data-driven measure, which is constructed to be absolutely continuous by design; see Section 3 for further details. Intuitively, absolute continuity ensures that $\nu$ does not assign positive mass to any set that has zero Lebesgue measure, so no information carried by $\nu$ is lost when working with Lebesgue integrals, allowing for the analysis of the transformed process $Z=\sqrt{w}(X-\mu_{X})$ in the standard $L^{2}(\mathcal{T})$ setting.

More generally, if we consider two measures $\nu_{1}$ and $\nu_{2}$ , where $\nu_{2}$ is absolutely continuous with respect to $\nu_{1}$ , with Radon-Nikodym derivative $d\nu_{2}/d\nu_{1}$ , one can conduct a weighted FPCA within the space $L^{2}(\mathcal{T},\nu_{2})$ by means of the space $L^{2}(\mathcal{T},\nu_{1})$ . The following proposition extends Proposition 1 to this more general setting.

Proposition 2.

Given two general measures $\nu_{1}$ , $\nu_{2}$ and a stochastic process $X(t)\in L^{2}(\mathcal{T},\nu_{2})$ , if we assume $\nu_{2}$ is absolutely continuous with respect to $\nu_{1}$ , then $Z=\sqrt{\frac{d\nu_{2}}{d\nu_{1}}}(X-\mu_{X})$ belongs to $L^{2}(\mathcal{T},\nu_{1})$ . Denote the eigenvalues and eigenfunctions of the process $Z$ with respect to $\nu_{1}$ as $\{\lambda_{wk},\psi_{Zk}\}_{k=1}^{\infty}$ and define a new function

$$\psi_{wk}=\psi_{Zk}\Big/\sqrt{\frac{d\nu_{2}}{d\nu_{1}}}.$$

Then, the eigensystem of $X$ within $L^{2}(\mathcal{T},\nu_{2})$ is $\{\lambda_{wk},\psi_{wk}\}_{k=1}^{\infty},$ and the $k$ th principal component score of the process $X$ with respect to the measure $\nu_{2}$ is

$$\zeta_{wk}=\int_{\mathcal{T}}Z(t)\psi_{Zk}(t)\,d\nu_{1}(t)=\int_{\mathcal{T}}\{X(t)-\mu_{X}(t)\}\psi_{wk}(t)\,d\nu_{2}(t).$$

Proposition 2 generalizes Proposition 1 by establishing a mapping between eigensystems defined under two arbitrary measures, provided one of these measures is absolutely continuous with respect to the other. This result enables the construction of a weighted FPCA framework in $L^{2}(\mathcal{T},\nu_{2})$ by leveraging the eigendecomposition of a rescaled process $Z$ in a potentially simpler or more tractable space $L^{2}(\mathcal{T},\nu_{1})$. The key idea is that the Radon-Nikodym derivative $d\nu_{2}/d\nu_{1}$ determines how the geometry of the space, and hence the structure of the principal components, transforms across different weighting schemes.

When in Proposition 2 $\nu_{1}$ is the Lebesgue measure and $\nu_{2}=\nu$ is an absolutely continuous measure with Radon-Nikodym derivative $w(t)=d\nu(t)/dt$ , Proposition 1 emerges as a special case, where the transformed process $Z(t)=\sqrt{w(t)}\{X(t)-\mu_{X}(t)\}$ and the reweighted eigenfunctions $\phi_{wk}(t)=\phi_{Zk}(t)/\sqrt{w(t)}$ match those in Proposition 1.
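To see informally why the rescaled functions are orthonormal under $\nu_{2}$, write $r=d\nu_{2}/d\nu_{1}$ and $\psi_{wk}=\psi_{Zk}/\sqrt{r}$, with the convention $1/\sqrt{r}=0$ where $r=0$ (mirroring the convention used for $w$ in Proposition 1). The following is a sketch of the calculation, not the proof given in the Supplementary Material:

```latex
\int_{\mathcal{T}} \psi_{wj}(t)\,\psi_{wk}(t)\,d\nu_{2}(t)
  = \int_{\{r>0\}} \frac{\psi_{Zj}(t)}{\sqrt{r(t)}}\,
                   \frac{\psi_{Zk}(t)}{\sqrt{r(t)}}\; r(t)\,d\nu_{1}(t)
  = \int_{\{r>0\}} \psi_{Zj}(t)\,\psi_{Zk}(t)\,d\nu_{1}(t).
```

Since $Z$ vanishes wherever $r=0$, eigenfunctions $\psi_{Zk}$ associated with positive eigenvalues also vanish there ($\nu_{1}$-almost everywhere), so the last integral equals the full $L^{2}(\mathcal{T},\nu_{1})$ inner product of $\psi_{Zj}$ and $\psi_{Zk}$, which is $1$ if $j=k$ and $0$ otherwise.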

2.2 Weighted functional linear model

Consider a general measure $\nu$ that is absolutely continuous with respect to the Lebesgue measure, so that there exists a weight function $w(t)=d\nu(t)/dt\in\mathcal{W}$ . Let $(X,Y)$ be a random pair in $L^{2}(\mathcal{T},\nu)\times\mathbb{R}$ , where $X$ is a functional predictor and $Y$ a scalar response, with means $\mu_{X}=E(X)$ and $\mu_{Y}=E(Y)$ , variance $\sigma_{Y}^{2}={\rm Var}(Y)$ and covariance $C_{XX}$ as per (1). Suppose $\{X_{i},Y_{i}\}_{i=1}^{n}$ are $n$ independent realizations of $(X,Y)$ . In this section, we consider a weighted functional linear model (wFLM) in which $(X,Y)$ are generated by the model

[TABLE]

Here the regression function $\beta(t)$ is smooth and square integrable, i.e., $\int_{\mathcal{T}}\beta^{2}(t)d\nu(t)<\infty$ .

Centering predictor processes $X$ , the functional linear regression model becomes

[TABLE]

Considering the transformed process $Z=\sqrt{w}(X-\mu_{X})$ , the wFLM is equivalent to

[TABLE]

where $\beta_{w}(t)=\beta(t)\sqrt{w(t)}$ . The regression parameter function $\beta_{w}$ can be represented as (Yao et al., 2005b; Hall and Horowitz, 2007)

[TABLE]

where $\{\rho_{wk},\phi_{Zk}\}_{k=1}^{\infty}$ is the eigensystem of the process $Z$ as per (3), $\xi_{wk}$ is the $k$ th principal component score of the process $Z$ as per (5), $\sigma_{kY}=E\{\xi_{wk}(Y-\mu_{Y})\}$ and $\beta_{k}=\rho_{wk}^{-1}\sigma_{kY}$ . Transforming $\beta_{w}$ back to $\beta$ , we obtain the representation

[TABLE]

One can use a well-established local linear smoothing approach to obtain an estimate $\hat{C}_{YZ}(t)$ of the cross-covariance surface

[TABLE]

This leads to the estimators

[TABLE]

where $M$ is the number of included eigen-components, which is a tuning parameter, $\hat{\beta}_{k}=\hat{\rho}_{wk}^{-1}\hat{\sigma}_{kY}$ and $\hat{\sigma}_{kY}=\int_{\mathcal{T}}\hat{C}_{YZ}(t)\hat{\phi}_{Zk}(t)dt$ . Further details about this smoothing approach to obtain estimates of the eigen-components and coefficient functions can be found, e.g., in Yao et al. (2005b).

To predict the scalar response $Y^{*}$ from a new predictor trajectory $X^{*}$ , we utilize equation (6), the basis representation of $\beta_{w}(t)$ as per (7) and the orthonormality of the $\{\phi_{Zk}\}_{k\geq 1}$ . The prediction of the response can be obtained via the conditional expectation

[TABLE]

where

[TABLE]

is the $j$ th functional principal component score of the predictor trajectory $X^{*}$ . The quantities $\mu_{Y}$ , $\mu_{X}$ , $\rho_{wk}$ , $\sigma_{kY}$ can be estimated from the data, as described in Yao et al. (2005a, b) and Zhang and Wang (2016).
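The estimation and prediction steps of this section can be sketched for densely and regularly observed, noise-free trajectories. The sketch below is a simplification under stated assumptions: it uses empirical moments on a grid in place of the local linear smoothing estimators referenced above, and the function names `fit_wflm` and `predict_wflm`, the weight $w(t)=0.5+t$ and the toy data are hypothetical.

```python
import numpy as np

def fit_wflm(X, Y, w, dt, M):
    """Fit a weighted FLM on a regular grid using the first M components of Z."""
    mu_X, mu_Y = X.mean(axis=0), Y.mean()
    Z = np.sqrt(w) * (X - mu_X)                       # Z = sqrt(w) (X - mu_X)
    evals, evecs = np.linalg.eigh(Z.T @ Z / len(Y) * dt)
    top = np.argsort(evals)[::-1][:M]
    rho = evals[top]                                  # eigenvalues rho_wk
    phi_Z = evecs[:, top].T / np.sqrt(dt)             # eigenfunctions of Z
    xi = Z @ phi_Z.T * dt                             # scores xi_wk
    sigma_kY = (xi * (Y - mu_Y)[:, None]).mean(axis=0)
    beta_k = sigma_kY / rho                           # beta_k = sigma_kY / rho_wk
    beta = (beta_k @ phi_Z) / np.sqrt(w)              # beta = beta_w / sqrt(w)
    return {"mu_X": mu_X, "mu_Y": mu_Y, "phi_Z": phi_Z,
            "beta_k": beta_k, "beta": beta, "w": w, "dt": dt}

def predict_wflm(fit, X_new):
    """Predict responses as mu_Y + sum_k beta_k * xi*_k for new trajectories."""
    Z_new = np.sqrt(fit["w"]) * (X_new - fit["mu_X"])
    xi_new = Z_new @ fit["phi_Z"].T * fit["dt"]
    return fit["mu_Y"] + xi_new @ fit["beta_k"]

# Toy data (assumed): three-component trajectories, noise-free linear response.
rng = np.random.default_rng(1)
m = 100
t = (np.arange(m) + 0.5) / m
dt = 1.0 / m
w = 0.5 + t
basis = np.vstack([np.ones(m), np.cos(np.pi * t), np.sin(np.pi * t)])

def make_X(n):
    return (rng.normal(size=(n, 3)) * np.array([2.0, 1.0, 0.5])) @ basis

beta_true = 1.0 + t
X_train, X_new = make_X(200), make_X(50)
Y_train = 3.0 + (X_train * beta_true * w).sum(axis=1) * dt
Y_new = 3.0 + (X_new * beta_true * w).sum(axis=1) * dt

fit = fit_wflm(X_train, Y_train, w, dt, M=3)
pred = predict_wflm(fit, X_new)
```

Because the toy trajectories lie in a three-dimensional space and the responses are noise-free, the predictions with $M=3$ reproduce `Y_new` up to floating point error; with noisy or sparse data, the smoothing-based estimators described in the text would be needed instead.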

3 Choosing the weight function for the functional linear model

So far, the weight function $w$ was assumed to be given. In practical applications, selecting a good weight function from the available data is crucial. Ideally, we aim to find the optimal weight function within a set $W$ of potential weight functions as per (2). The objective is to minimize the cross-validation error,

[TABLE]

where $\hat{Y}^{(-i)}_{i,M}=\hat{\mu}_{Y}^{(-i)}+\sum_{k=1}^{M}\hat{\beta}_{k}^{(-i)}\hat{\xi}_{i,Zk}^{(-i)}$ is the cross-validation prediction for the $i$ th subject and $\hat{\beta}_{k}^{(-i)}=\hat{\sigma}_{kY}^{(-i)}/\hat{\rho}_{wk}^{(-i)}$ is the estimate of $\beta_{k}$ as per (7). Here, the superscript $(-i)$ denotes leave-one-out estimation, where the $i$ th sample is omitted from the estimation process. To accomplish this goal, we present two practical approaches for selecting optimal weight functions tailored to different types of domain $\mathcal{T}$ , aiming to up-weight or down-weight subdomains that are more or less important for obtaining good predictions when applying the functional linear model.

3.1 Step function approach on the finite domain $\mathcal{T}=[0,1]$

Finding an analytical solution for the optimal weight function in Equation (11) is challenging. To efficiently obtain approximate solutions for (11) in practical applications, we employ a dyadic splitting algorithm (Leng, 2004). For the sake of completeness, details about this algorithm are included in the Supplementary Material. This algorithm results in weight functions that are step functions.

We search for the optimal weight function within the subset $W_{\text{step}}\subset W$ , where

[TABLE]

Here, $2^{K}$ is the number of steps and $K$ is the number of times that we split the interval. To ensure that the resulting weight function is interpretable and has no abrupt jumps, we consider a penalized cross-validation score (13),

[TABLE]

where $TV(w)=\sum_{l=1}^{2^{K}-1}|c_{l+1}-c_{l}|$ is the total variation of $w$ and $M$ , $\lambda_{1}$ , $\lambda_{2}$ are tuning parameters, with $M$ denoting the number of included components.
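As a small illustration of step function weights and their total variation, the sketch below evaluates a step function with $2^{K}$ equal-width steps on $[0,1]$ and computes $TV(w)$ from the step heights $c_{l}$. The normalization to unit integral is an assumption about the constraint set, and the helper names are hypothetical.

```python
import numpy as np

def step_weight(c, t):
    """Evaluate the step function with heights c on len(c) equal subintervals of [0, 1]."""
    c = np.asarray(c, dtype=float)
    idx = np.minimum((np.asarray(t, dtype=float) * len(c)).astype(int), len(c) - 1)
    return c[idx]

def normalize_steps(c):
    """Rescale the heights so that the step function integrates to 1 over [0, 1]
    (assumed constraint; equal-width steps make the integral the mean height)."""
    c = np.asarray(c, dtype=float)
    return c / c.mean()

def total_variation(c):
    """TV(w) = sum over l of |c_{l+1} - c_l|."""
    return float(np.abs(np.diff(np.asarray(c, dtype=float))).sum())

# K = 2 splits give four steps.
c = normalize_steps([1.0, 3.0, 2.0, 2.0])     # heights 0.5, 1.5, 1.0, 1.0
tv = total_variation(c)                       # |1.5-0.5| + |1.0-1.5| + |1.0-1.0|
w_at = step_weight(c, [0.1, 0.3, 1.0])
```

A constant weight has total variation zero, so a penalty on $TV(w)$ pulls the selected weight function toward the Lebesgue case.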

For the selection of tuning parameters, we employed cross-validation to simultaneously select $M,\lambda_{1}$ and $\lambda_{2}$ . To ensure computational efficiency, we limit the number of candidate values for $M$ , $\lambda_{1}$ , and $\lambda_{2}$ to expedite the cross-validation procedure. For $M$ , we consider candidate values ranging from 1 to $\text{M}_{Leb}$ , where $\text{M}_{Leb}$ is the best value for $M$ in the FLM under the Lebesgue measure according to the cross-validation error; for $\lambda_{1}$ and $\lambda_{2}$ , we consider the values 0, 0.5 and 1. For a comprehensive sensitivity analysis for the choice of the tuning parameters $\lambda_{1}$ and $\lambda_{2}$ we refer to Section S.8 of the Supplementary Material.

3.2 Parametric density approach on the infinite domain $\mathcal{T}=[0,\infty)$

For infinite domains, we adopt a parametric approach for selecting the weight function $w$ . Specifically, we consider density functions whose support aligns with $\mathcal{T}=[0,\infty)$ or $\mathcal{T}=(-\infty,\infty)$ to ensure that the resulting weighted $L^{2}$ space remains well-defined. We focus on the case $\mathcal{T}=[0,\infty)$ , as extensions to the case $\mathcal{T}=(-\infty,\infty)$ are analogous. For $\mathcal{T}=[0,\infty)$ , suitable choices include distributions from the exponential family, such as the exponential, half-normal, gamma and truncated normal distributions. These parametric choices incorporate prior knowledge or a desired emphasis on specific subregions of the domain. In our applications, we focus on the exponential density $w(t;\lambda)=\lambda e^{-\lambda t}$ for $t\in[0,\infty)$ due to its interpretability, its single parameter $\lambda>0$ , which controls the decay rate, and its strong empirical performance. The exponential density places more weight near the origin and decays monotonically, which is often appropriate in functional data where signal strength may diminish over time. We also considered the half-normal distribution in our simulation studies, where $w(t;\sigma)=\sqrt{\frac{2}{\pi\sigma^{2}}}\exp\left(-\frac{t^{2}}{2\sigma^{2}}\right)$ for $t\in[0,\infty)$ . The half-normal distribution also defines decreasing weights over $[0,\infty)$ , and while its rate of decay differs from that of the exponential distribution, both densities asymptotically approach zero as $t\to\infty$ . As demonstrated in Section 4.2, the predictive performances of exponential and half-normal weights are comparable, suggesting robustness to specific choices. For each choice we selected the optimal parameter via cross-validation using the criterion in Equation (11).

While the exponential density is emphasized in our applications and the half-normal is included in our simulations, the proposed framework is not restricted to these choices and can accommodate weight functions derived from other parametric distributions. For example, gamma and truncated normal densities may place more emphasis on mid- or late-domain regions rather than near the origin, which may be beneficial in settings where important information is concentrated away from $t=0$ . Although these densities differ in shape near the center, they all exhibit exponential decay as $t\to\infty$ , ensuring stability over unbounded domains. Among these different options, the exponential density provides a computationally efficient and conceptually straightforward baseline.
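The two weight families used in this paper are easy to write down in code. The following sketch evaluates the exponential and half-normal densities given above and checks numerically that each integrates to one; the truncation of $[0,\infty)$ at $t=60$, the grid and the parameter values are illustrative assumptions.

```python
import numpy as np

def exp_weight(t, lam):
    """Exponential density w(t; lam) = lam * exp(-lam * t) on [0, inf)."""
    return lam * np.exp(-lam * t)

def half_normal_weight(t, sigma):
    """Half-normal density w(t; sigma) on [0, inf)."""
    return np.sqrt(2.0 / (np.pi * sigma ** 2)) * np.exp(-t ** 2 / (2.0 * sigma ** 2))

def integrate(f_vals, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(t)))

# Truncated grid standing in for [0, inf); the tails beyond t = 60 are negligible here.
t = np.linspace(0.0, 60.0, 60001)
mass_exp = integrate(exp_weight(t, 1.0), t)
mass_hn = integrate(half_normal_weight(t, 2.0), t)

# Both weights decrease monotonically away from the origin.
decreasing_exp = bool(np.all(np.diff(exp_weight(t, 1.0)) <= 0))
decreasing_hn = bool(np.all(np.diff(half_normal_weight(t, 2.0)) <= 0))
```

In a cross-validation search, $\lambda$ or $\sigma$ would be varied over a candidate grid and the value with the smallest cross-validation error retained.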

4 Simulation studies

4.1 Simulations on $\mathcal{T}=[0,1]$

We conducted simulation studies evaluating weight functions for two distinct measures: the Lebesgue measure (uniform density) and an optimal measure as approximated by a step function. These investigations comprised two separate simulation scenarios, each encompassing $Q=1000$ Monte Carlo runs. In each scenario, we considered $12$ settings with sample sizes $n\in\{50,100,200,500\}$ of i.i.d. pairs, each consisting of a scalar response and a predictor trajectory, and numbers of measurements per predictor trajectory $N\in\{20,50,100\}$ ; the locations $t_{ij}$ where these measurements were taken were equidistant within the interval $[0,1]$ .

The predictor trajectories, denoted as $X_{i}(\cdot)$ with corresponding noisy measurements $X_{ij}$ , were generated as follows. For both scenarios, the simulated processes $X$ had mean function $\mu_{X}(t)=2t-5\cos(2\pi t)$ and covariance functions were constructed using 10 eigenfunctions $\psi_{k}(t)$ such that

[TABLE]

For Scenario 1 we chose eigenvalues $\rho_{k}=10\times 0.5^{10-k}$ for $k=1,\ldots,10$ . We generated functional principal component scores $\xi_{ik}$ from $\mathcal{N}\left(0,\rho_{k}\right)$ and obtained the predictor measurements

[TABLE]

where the additional measurement errors $\epsilon_{ij}$ followed a normal distribution with mean 0 and variance $0.5^{2}$ , $j=1,\ldots,N$ and $i=1,\ldots,n$ . The scalar responses were generated according to $Y_{i}=\int_{0}^{1}\beta(t)X_{i}(t)dt+e_{i}=\sum_{k=1}^{10}\beta_{k}\xi_{ik}+e_{i},$ where $\beta(t)=\sum_{k=1}^{10}\beta_{k}\psi_{k}(t)$ , with $\beta_{k}=5\times 0.5^{k-1}$ for $k=1,\ldots,10$ , and the additional measurement errors for the responses $e_{i}$ followed a normal distribution with mean 0 and variance $0.5^{2}$ . Processes $X$ and all errors were independent in both Scenario 1 and Scenario 2.

In Scenario 2 we chose eigenvalues $\rho_{k}=10\times 0.5^{k-1}$ for $k=1,\ldots,10$ . We generated FPC scores $\xi_{ik}$ from $\mathcal{N}\left(0,\rho_{k}\right)$ , and calculated the predictor measurements $X_{ij}$ again as in (14), where all errors were obtained in the same way as in Scenario 1. Here the scalar responses were generated as $Y_{i}=\int_{0}^{1}\beta(t)X_{i}(t)w(t)dt+e_{i}$ , where $\beta(t)=2+3t-3\sin(\pi t)$ , with $e_{i}$ as in Scenario 1 and the weight function $w(t)$ was specified as

[TABLE]

For the $q$ th Monte Carlo run, we generated 100 new noisy predictors $X_{ij,q}^{*}$ and 100 corresponding noise-free responses $Y_{i,q}^{*}$ . We evaluated the predictive performance using the average mean squared prediction error (AMSPE)

[TABLE]

where $\hat{Y}_{i,q}^{*}$ represented the predicted responses estimated by either FLM or wFLM.

Table 1 presents the results for both scenarios. In Scenario 1, wFLM (step) consistently outperformed FLM (Lebesgue) in terms of AMSPE, with larger gains observed for increased sample sizes and measurement points. These results suggest that when the coefficient function $\beta(\cdot)$ cannot be efficiently represented using the leading functional principal components, using the default Lebesgue measure in the FLM may be suboptimal. A more general measure approximated by a step function may then yield a more suitable eigensystem for efficiently representing the regression parameter function $\beta$ , especially when the sample size $n$ and the number of measurement points $N$ are relatively large. Furthermore, in Scenario 2, wFLM (step) also demonstrated superior predictive performance across all settings, with notable improvements when $n=500$ , achieving reductions in AMSPE of $51\%$ , $78\%$ and $82\%$ for numbers of measurements $N=20,50,100$ , respectively.
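For readers who wish to reproduce this type of summary, the sketch below computes an AMSPE and a percentage reduction from arrays of predictions. It assumes that AMSPE is the average over Monte Carlo runs of the per-run mean squared prediction error on the test responses and that a reduction is measured relative to the FLM value; both conventions and the function names are assumptions, not taken from the displayed formula.

```python
import numpy as np

def amspe(y_true, y_pred):
    """Mean and standard deviation over runs of the per-run MSPE.

    y_true, y_pred: arrays of shape (Q, n_test), one row per Monte Carlo run.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mspe = (err ** 2).mean(axis=1)
    return float(mspe.mean()), float(mspe.std(ddof=1))

def relative_reduction(amspe_flm, amspe_wflm):
    """Percentage reduction in AMSPE of wFLM relative to FLM."""
    return 100.0 * (amspe_flm - amspe_wflm) / amspe_flm

# Tiny hand-checkable example with Q = 2 runs and 2 test points per run.
y_true = np.zeros((2, 2))
y_pred = np.array([[1.0, 1.0], [3.0, 3.0]])
mean_mspe, sd_mspe = amspe(y_true, y_pred)      # per-run MSPE values are 1 and 9
reduction = relative_reduction(2.0, 0.5)
```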

4.2 Simulations on $\mathcal{T}=[0,\infty)$

To evaluate the performance of the proposed weighted functional linear model (wFLM) on an unbounded domain, we conducted a simulation study over $\mathcal{T}=[0,\infty)$ . In addition to the exponential density, which aligns with the true underlying measure used to generate the data, we also included the half-normal density to assess the robustness of the method regarding the choice of weighting function. We compared the predictive performance of wFLM with exponential weight function, wFLM with half-normal weight function and the classical FLM that utilizes the Lebesgue measure. We considered four settings with sample sizes $n\in\{100,200\}$ and two choices for the number of measurements $N_{i}$ , with $Q=500$ Monte Carlo runs. The number of measurements $N_{i}$ was either set to $20$ or chosen randomly for each predictor trajectory with equal probability from $\{5,6,7,8,9,10\}$ . The locations of the measurements were exponentially distributed with a rate $1/2$ over the infinite interval $[0,\infty)$ , reflecting unbounded support with irregular and potentially sparse sampling.
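The sampling design just described can be sketched as follows; the function name `sample_design` and the choice to sort the locations are illustrative assumptions.

```python
import numpy as np

def sample_design(n, sparse, rng):
    """Draw measurement locations on [0, inf) for n trajectories.

    sparse=True : N_i uniform on {5, ..., 10}; sparse=False : N_i = 20.
    Locations are exponential with rate 1/2, i.e. scale (mean) 2 in NumPy's
    parameterization.
    """
    designs = []
    for _ in range(n):
        N_i = int(rng.integers(5, 11)) if sparse else 20   # upper bound is exclusive
        designs.append(np.sort(rng.exponential(scale=2.0, size=N_i)))
    return designs

rng = np.random.default_rng(0)
sparse_designs = sample_design(100, sparse=True, rng=rng)
dense_designs = sample_design(100, sparse=False, rng=rng)
```

Note that NumPy parameterizes the exponential distribution by its scale, so a rate of $1/2$ corresponds to `scale=2.0`.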

The predictor trajectories $X_{i}(\cdot)$ and associated noisy measurements $X_{ij}$ were generated as follows. The simulated processes $X$ had mean function $\mu_{X}(t)=5-3\cos(\pi t/5)+2t$ and a covariance function constructed using a set of 9 eigenfunctions $\psi_{k}$ (for more details we refer to Section S.4 in the Supplementary Material), which are orthonormal in $L^{2}([0,\infty),\nu)$ , where $d\nu(t)=e^{-t}dt$ , i.e., $\nu$ has the density of the standard exponential distribution. We chose the eigenvalues $\rho_{wk}=10\times 0.5^{k-1},\,\,k=1,\ldots,9$ and $\rho_{wk}=0,k>9$ , and $\sigma_{X}^{2}=0.5$ as variance of the additional measurement errors $\epsilon_{ij}$ , which were assumed to be normal with mean 0. For each sample $i$ , we generated FPC scores $\xi_{ik}$ from $N(0,\rho_{wk})$ and obtained predictor measurements $X_{ij}=\mu_{X}(t_{ij})+\sum_{k=1}^{9}\xi_{ik}\psi_{k}(t_{ij})+\epsilon_{ij},\,\,j=1,\ldots,N_{i},\,\,i=1,\ldots,n$ . The scalar responses $Y_{i}$ were generated by $Y_{i}=\sum_{k=1}^{9}\beta_{k}\xi_{ik}+e_{i}$ , where $\beta_{k}=10/k^{3}$ and $e_{i}\overset{\text{i.i.d.}}{\sim}N(0,0.5^{2})$ . As before, for each Monte Carlo run we generated 100 new noisy predictors $X_{ij}^{*}$ and 100 corresponding noise-free responses $Y_{i}^{*}$ .

Table 2 reports the average mean squared prediction errors and standard deviations across the simulations. Here both wFLM approaches dramatically outperform the classical FLM for the irregular and sparse measurement settings. For instance, when $n=100$ and $N_{i}\in\{5,\ldots,10\}$ , wFLM (Exp) reduces AMSPE by 88.4% compared to the basic FLM, while wFLM (Half-Normal) yields similar gains (89.0%), despite the fact that the true data-generating measure here is the standard exponential density. This demonstrates a certain robustness of our method regarding the specific choice of a weight function derived from a parametric distribution. Even in the denser, though still irregular, setting with $N_{i}=20$ , both weighting schemes substantially improve prediction accuracy. As expected, performance improves with larger sample size $n=200$ and both wFLM approaches maintain a clear advantage over FLM. These results highlight the flexibility and reliability of the proposed framework for unbounded domains and irregular measurement patterns.

5 Applications

5.1 Predicting COVID-19 new cases

We illustrate the performance of the proposed method with COVID-19 data. Functional data analyses for time-dynamic data of COVID-19 cases have been conducted previously (Carroll et al., 2020; Dubey et al., 2022). We obtained daily confirmed cases across countries from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. These data are publicly available at https://github.com/CSSEGISandData/COVID-19. The data feature the cumulative number of confirmed cases for each country from January 22, 2020, to March 3, 2023 and were accessed on April 11, 2023. For the analysis, we focused on the period from July 1, 2020, to December 31, 2022 (a total of 914 days), and used the seven-day moving average of daily confirmed cases per million as functional predictor. The scalar response was taken as the total confirmed cases from January 1 to January 31 in 2023.

The seven-day moving averages of daily confirmed cases per million people from July 1, 2020 to December 31, 2022 for 29 countries are displayed in Fig 1. The 29 countries for which data were included were located in America or Europe, as they exhibited similar COVID-19 response policies and relatively low bias in reported cases, and countries with zero cases in the 914-day trajectory were excluded.

We investigated the performance of both the selection of a weight function as a step function and as an exponential density. We applied the infinite domain method here as the functional predictor spans a fairly long duration and as it is reasonable to assume that the closer the data points are to the end of 2022, the more crucial their influence becomes for the subsequent total confirmed cases in January 2023. Therefore, it is reasonable to assign higher importance to the domain at the end of 2022 while assigning relatively lower weight to the data for earlier periods. To implement this strategy within the framework of a weight function derived from the exponential distribution, we encoded December 31, 2022, as $t=1$ and July 1, 2020, as $t=914$ . This coding scheme ensured that data with measurement times closer to $t=1$ received more weight than those measured earlier. In contrast, when implementing weight functions as step functions we retained the original domain, with $t=1$ representing July 1, 2020 and $t=914$ representing December 31, 2022.
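The two time codings described above can be made explicit with a few lines of date arithmetic; the helper names are hypothetical.

```python
from datetime import date

START, END = date(2020, 7, 1), date(2022, 12, 31)
N_DAYS = (END - START).days + 1          # length of the predictor domain in days

def forward_index(d):
    """Coding used with step function weights: t = 1 is July 1, 2020."""
    return (d - START).days + 1

def reversed_index(d):
    """Coding used with exponential weights: t = 1 is December 31, 2022."""
    return (END - d).days + 1

t_last_day = reversed_index(END)
t_first_day = reversed_index(START)
```

Under the reversed coding, a decreasing density such as $\lambda e^{-\lambda t}$ assigns the largest weight to the most recent days.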

To compare the performance of the FLM with weight function selection with the original FLM that uses the Lebesgue measure and therefore a constant weight function, we employed a leave-one-out cross-validation score, see Table 3. The leave-one-out cross-validation score is

[TABLE]

where $\hat{Y}_{i}^{(-i)}$ represents the predicted value from the model after omitting the $i$ th country from the training data.

The optimal number of principal components, i.e., the minimizer of the cross-validation score, for the classical FLM was found to be $\text{M}_{Leb}=4$ , while the optimal $M$ for the step weight function and the weight function derived from the exponential distribution were 3 and 2, respectively. Table 3 reveals that wFLM (step) and wFLM (exp) achieve better prediction performance, resulting in a $34\%$ and $66\%$ improvement in prediction accuracy, respectively, compared to the classical FLM with the Lebesgue measure. It clearly emerges that wFLM achieves better prediction accuracy in this application while using fewer principal components as compared to FLM.

5.2 National Health and Nutrition Examination Survey

Behavioral scientists are interested in analyzing the association between cardiovascular risk factors (such as systolic blood pressure and total cholesterol) and physical activity (Luke et al., 2011; Gerage et al., 2015; Ledbetter et al., 2022; Ge et al., 2024). We apply the proposed method to model the effect of physical activity intensity on systolic blood pressure, utilizing data from the National Health and Nutrition Examination Survey (NHANES) 2005-2006; these data are publicly available at https://wwwn.cdc.gov/nchs/nhanes/ContinuousNhanes/Default.aspx?BeginYear=2005. NHANES assesses the health and nutrition status of U.S. adults and children through comprehensive interviews and physical examinations. The survey collects information on demographic, socioeconomic, dietary, and health-related variables, along with medical, dental and physiological measurements. As part of NHANES, participants aged six and older were asked to wear an Actigraph 7164 accelerometer on a waist belt for seven consecutive days, capturing physical activity intensity every minute throughout the day. These accelerometer data have been widely used by researchers to explore the relationship between activity patterns and various health indicators (Troiano et al., 2014; Tudor-Locke et al., 2012). Worn on the right hip, the accelerometer began recording at 12:01 am the day after the participant’s health examination and was removed only during sleep, swimming or bathing.

We restricted our analysis to a subset of male participants who were married, aged over 20 and had four complete blood pressure measurements. This led to a sample size of $n=500$ participants. Denoting the physical activity intensity function at minute $t$ of the $i$ th participant by $U_{i}(t)$ , we observe that its domain is $I_{i}=\{t\in[0,10080]:1\leq U_{i}(t)\}$ , where $10080$ is the total number of minutes over the $7$ days during which the signal was recorded. We transform the $U_{i}$ to define the predictor for the $i$ th subject as $X_{i}(s)=\#\{t\in I_{i}:U_{i}(t)=s\}$ for a given physical activity intensity level $s>0$ . This function represents the total time in minutes during which the physical activity intensity equals $s$ over the $7$ days of observation. This is possible since the activity levels are discrete. Similar transformations of the physical activity intensity have been considered previously by various authors (Chang and McKeague, 2022; Lin et al., 2023), as this transformed function provides a better reflection of the actual activity than $U_{i}(t)$ does.
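The transformation from the minute-level record $U_{i}(t)$ to the predictor $X_{i}(s)$ amounts to counting minutes at each intensity level. A minimal sketch, assuming integer-valued activity counts stored in an array (the function name and the toy record are hypothetical):

```python
import numpy as np

def intensity_occupation(U, max_level=None):
    """Return X with X[s - 1] = number of minutes at intensity s, for s = 1, 2, ...

    U: integer activity counts per minute; minutes with U < 1 are excluded,
    matching the domain I_i = {t : U_i(t) >= 1}.
    """
    U = np.asarray(U, dtype=int)
    active = U[U >= 1]
    counts = np.bincount(active, minlength=(max_level or 0) + 1)
    return counts[1:]                      # drop the slot for intensity 0

# Toy record of nine minutes instead of the 10080 minutes of a full week.
U_toy = [0, 0, 1, 1, 3, 2, 1, 0, 3]
X_toy = intensity_occupation(U_toy)                  # minutes at s = 1, 2, 3
X_padded = intensity_occupation(U_toy, max_level=5)  # common domain s = 1, ..., 5
```

Passing a common `max_level` puts all subjects on the same grid of intensity levels, which is convenient before applying a weight function in $s$.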

The response of interest is the average systolic blood pressure, averaging over the four available measurements. The potential for large values of physical activity intensity $s$ , which serves as argument of the $X_{i}$ , means that the domain has no clear upper bound, motivating the use of an infinite domain $(0,\infty)$ for the functional predictor. We investigate the performance of the proposed wFLM with a weight function derived from the standard exponential distribution in comparison with the ordinary FLM. Here it is reasonable to implement the exponential density weight, as the majority of the physical activity intensity values are small. Most of the time, people engage in sedentary behavior or light physical activity and rarely reach high intensity values, and therefore low levels of physical activity should receive more weight. By cross-validation we found the optimal number of principal components for the ordinary FLM to be $\text{M}_{Leb}=1$ . With $\text{M}=1$ , Table 4 reveals that wFLM (exp) achieves better prediction performance, resulting in a $27\%$ improvement in prediction accuracy compared to FLM (Lebesgue).

6 Discussion

In this paper, we introduced a weighted functional linear model that generalizes the conventional functional linear model by incorporating a data-driven, optimal measure for defining the Hilbert space. This modified model is shown to achieve better predictive performance by emphasizing more relevant regions of the functional domain and thus improving the representation of the coefficient function. Furthermore, this framework naturally extends to infinite domains, addressing challenges in functional data analysis for unbounded domains where traditional methods may struggle.

Through simulation studies and real data applications, we demonstrated that the approach consistently outperforms the standard functional linear model, offering a more flexible and powerful framework for functional linear regression. We also provide basic representations and relationships of eigen-systems for weighted and unweighted functional principal component analysis.

The proposed method could also be harnessed for other tasks in functional data analysis, such as generalized functional linear regression and functional classification (Müller and Stadtmüller, 2005), among others. Future work may explore alternative methods for selecting optimal measures and expanding the model to accommodate more complex functional data structures.

Acknowledgments

This research was supported in part by NSF grant DMS-2310450. We thank the referees for helpful comments.

S.1 Proof of Proposition 1

Proof.

We show that $\phi_{wj}(t)=\phi_{Zj}(t)/\sqrt{w(t)}$ form an orthonormal system with respect to $d\nu$ ,

[TABLE]

We denote the auto-covariance operator of $X$ with respect to $d\nu$ , i.e., in the space $L^{2}(\mathcal{T},\nu)$ , by $A_{w}$ ,

[TABLE]

Next we show that the $\phi_{wj}$ and $\rho_{wj}$ are the eigenfunctions and eigenvalues of $A_{w}$ . For this we observe

[TABLE]

Finally, it is easy to show that the functional principal component scores of process $X$ in $L^{2}(\mathcal{T},\nu)$ are the same as those of process $Z$ in $L^{2}(\mathcal{T})$ ,

[TABLE]

∎

S.2 Proof of Proposition 2

Proof.

Let $w(t)$ be $d\nu_{2}(t)/d\nu_{1}(t)$ . We first show that $\psi_{wk}(t)=\frac{\psi_{Zk}(t)}{\sqrt{w}}$ form an orthonormal system with respect to $d\nu_{2}$ ,

[TABLE]

since $\{\psi_{Zk}\}_{k=1}^{\infty}$ are eigenfunctions of the process $Z=\sqrt{w}(X-\mu_{X})$ . We denote the auto-covariance operator of $X$ with respect to $d\nu_{2}$ , i.e., in the space $L^{2}(\mathcal{T},\nu_{2})$ , by $A_{w}$ ,

[TABLE]

Next we show that the $\psi_{wk}$ and $\rho_{wk}$ are the eigenfunctions and eigenvalues of $A_{w}$ . For this we observe

[TABLE]

Finally, it is easy to show that the functional principal component scores of process $X$ in $L^{2}(\mathcal{T},\nu_{2})$ are the same as those of process $Z$ in $L^{2}(\mathcal{T},\nu_{1})$ ,

[TABLE]

∎

S.3 Dyadic splitting algorithm

Initialization: In the first step, we divide the interval $[0,1]$ into two subintervals: $I_{{1}}=[0,1/2)$ and $I_{{2}}=[1/2,1]$ . We seek a constant weight $w_{{1}}$ for $I_{{1}}$ that minimizes the cross-validation mean square prediction error. The weight $w_{{2}}$ for $I_{{2}}$ is determined automatically based on the constraints imposed on the weight function.

Refinement: Following the initialization, we possess weights for both $I_{{1}}$ and $I_{{2}}$ . We further split $I_{{1}}$ into two equal subintervals: $I_{{1,1}}=[0,1/4)$ and $I_{{1,2}}=[1/4,1/2)$ . While keeping $w_{{2}}$ unchanged on $I_{{2}}$ , we search for a weight $w_{{1,1}}$ on $I_{{1,1}}$ as in the first step, with $w_{{1,2}}$ automatically determined.

We then perform a similar procedure for $I_{{2}}$ from the initialization step, splitting it into $I_{{2,1}}=[1/2,3/4)$ and $I_{{2,2}}=[3/4,1]$ , while retaining the weights on the other intervals. This results in weights $w_{{2,1}}$ on $I_{{2,1}}$ and $w_{{2,2}}$ on $I_{{2,2}}$ , automatically adjusted based on the constraints.

Updating Step: At the $k$ th step, where there are $2^{k-1}$ intervals, we iteratively split each interval from the previous step at its midpoint. We determine the corresponding weights as constants on the left and right subintervals, aiming to minimize the cross-validation mean square prediction error.

Termination: The iteration continues until further splitting fails to reduce the cross-validation mean square prediction error or until it reaches the maximum allowable number of splitting steps (typically set to 3). At this point, we conclude the algorithm, and the current weight function is designated as the output.
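The steps above can be sketched as a greedy search over step heights. The sketch below is a simplified illustration under several assumptions: the constraint is taken to be that the weight integrates to one (so the mass of each parent interval is preserved when it is split), candidate heights are searched on a finite grid of ratios, and a toy objective stands in for the cross-validation mean square prediction error. The names `dyadic_split` and `toy_objective` are hypothetical.

```python
import numpy as np

def dyadic_split(objective, max_levels=3, n_candidates=19):
    """Greedy dyadic refinement of a step weight function on [0, 1].

    objective: maps an array of 2^k equal-width step heights to a score to be
    minimized (in the paper, a cross-validation prediction error).
    """
    heights = np.array([1.0])                      # start from the Lebesgue weight
    best = objective(heights)
    ratios = np.linspace(0.1, 1.9, n_candidates)   # keeps both child heights positive
    for _ in range(max_levels):
        new = np.repeat(heights, 2)                # same function on the finer partition
        for j, h in enumerate(heights):
            best_val = objective(new)
            for r in ratios:
                trial = new.copy()
                # Left child gets r*h; the right child is then determined by
                # preserving the parent's mass: (r*h + (2-r)*h) / 2 = h.
                trial[2 * j], trial[2 * j + 1] = r * h, (2.0 - r) * h
                val = objective(trial)
                if val < best_val:
                    best_val, new = val, trial
        new_val = objective(new)
        if new_val >= best:                        # no improvement: terminate
            break
        heights, best = new, new_val
    return heights, best

# Toy objective (assumption): squared distance to a known step function.
target = np.repeat([0.25, 0.75, 1.2, 1.8], 2)      # 8 cells, integrates to 1

def toy_objective(heights):
    fine = np.repeat(heights, len(target) // len(heights))
    return float(np.mean((fine - target) ** 2))

heights, value = dyadic_split(toy_objective, max_levels=3)
```

With the toy objective, the search recovers the target step function after two levels of splitting; in practice each call to `objective` would involve refitting the wFLM and computing its cross-validation error, which is the main computational cost.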

S.4 Orthonormal basis functions in $L^{2}([0,\infty),\nu)$ with $d\nu(t)=\lambda e^{-\lambda t}dt$

[TABLE]
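One standard orthonormal system in this space is given by the rescaled Laguerre polynomials $L_{k}(\lambda t)$ , $k=0,1,\ldots$ . The sketch below verifies their orthonormality with Gauss-Laguerre quadrature. It is offered as an illustration only; the basis functions actually used in the simulations may differ from this choice, and the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import laguerre

def laguerre_basis(t, K, lam=1.0):
    """Rows k = 0, ..., K-1 contain L_k(lam * t), the Laguerre polynomials rescaled
    so that they are orthonormal with respect to lam * exp(-lam * t) dt."""
    u = lam * np.asarray(t, dtype=float)
    return np.vstack([laguerre.lagval(u, np.eye(K)[k]) for k in range(K)])

def gram_matrix(K, lam=1.0, deg=20):
    """Inner products in L2([0, inf), nu), computed by Gauss-Laguerre quadrature
    after the substitution u = lam * t."""
    u, q = laguerre.laggauss(deg)          # nodes and weights for weight exp(-u)
    B = laguerre_basis(u / lam, K, lam)
    return (B * q) @ B.T

G = gram_matrix(K=9, lam=0.5)
```

The quadrature is exact for the polynomial integrands involved here, so `G` equals the identity matrix up to floating point error.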

S.5 Training time and computational complexity

We present a detailed comparison of the training time for FLM and wFLM under various sample sizes and numbers of measurement points. Tables S.1 and S.2 report average runtime (in minutes) over 100 repetitions, for wFLM (step) under a bounded domain and wFLM (Exp) under an unbounded domain, respectively. All computations were performed on a local machine equipped with an Apple M2 processor running macOS Sequoia.

Table S.1 reports the average training time for classical FLM and the step-based wFLM under Scenario 1 of Section 4.1, where the domain is bounded, $\mathcal{T}=[0,1]$ . The results demonstrate that the step-based wFLM method introduces additional computational cost compared to FLM, but the increase is moderate and scales reasonably with both the sample size $n$ and the number of measurement points $N$ . Table S.2 presents training times for FLM and wFLM with exponential weights in the unbounded domain setting, $\mathcal{T}=[0,\infty)$ , described in Section 4.2. Here, the computational cost is higher. There are two main reasons for this. First, the predictor measurements are irregularly spaced over an infinite interval, which increases computational burden compared to evenly spaced and bounded designs. Second, wFLM (Exp) involves grid searching over a set of candidate parameters, which adds further cost.

Although wFLM requires more computational resources, it consistently outperforms classical FLM in all simulation settings and applications. In particular, for unbounded domains, FLM suffers from fundamental theoretical limitations. The space $L^{2}([0,\infty))$ excludes many commonly used functions, such as polynomials, and thus cannot adequately support standard FPCA or FLM procedures. As shown in Section 4.2 and Section S.7 of the Supplementary Material, FLM performs poorly in these scenarios, whereas wFLM remains stable and accurate. Moreover, even in bounded domains where the true underlying measure is the Lebesgue measure, as in Scenario 1 of Section 4.1 and Supplementary Section S.6, wFLM still yields superior prediction accuracy when the leading functional principal components fail to capture the signal structure effectively. The additional computational cost of wFLM is therefore justified by its substantial gains in predictive performance and theoretical soundness.
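To make the limitation of $L^{2}([0,\infty))$ concrete, consider as an illustrative example (not part of the original derivations) the monomial trajectory $x(t)=t^{p}$ with integer $p\geq 0$ and the exponential weight $w(t)=\lambda e^{-\lambda t}$ . Its squared norm is infinite under the Lebesgue measure but finite under $\nu$ :

$$\int_{0}^{\infty}t^{2p}\,dt=\infty,\qquad\int_{0}^{\infty}t^{2p}\,\lambda e^{-\lambda t}\,dt=\frac{\Gamma(2p+1)}{\lambda^{2p}}=\frac{(2p)!}{\lambda^{2p}}<\infty.$$

Hence every polynomial belongs to $L^{2}([0,\infty),\nu)$ , while no non-zero polynomial belongs to $L^{2}([0,\infty))$ .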

S.6 Sensitivity analysis regarding measurement error variance for step-based wFLM

To evaluate the robustness of the proposed weighted functional linear model with step function weight under varying levels of measurement error, we conducted additional simulations based on Scenario 1 in Section 4.1. The functional predictors $X_{i}(t)$ were generated using the same mean and covariance structure as described in the main simulation setting. Specifically, the mean function was defined as

[TABLE]

and the covariance function was constructed using 10 eigenfunctions $\{\psi_{k}\}_{k=1}^{10}$ with corresponding eigenvalues $\rho_{k}=10\times 0.5^{10-k}$ . The eigenfunctions were given by

[TABLE]

For each subject $i$ , the functional trajectory was constructed as

[TABLE]

where $\xi_{ik}\sim\mathcal{N}(0,\rho_{k})$ are the functional principal component scores and $\epsilon_{ij}\sim\mathcal{N}(0,\sigma^{2})$ are i.i.d. measurement errors. We examined five levels of the measurement error standard deviation: $\sigma\in\{0,0.25,0.5,0.75,1.0\}$ . The scalar responses were generated via the functional linear model:

[TABLE]

where $\beta_{k}=5\times 0.5^{k-1}$ and $e_{i}\sim\mathcal{N}(0,0.5^{2})$ .

We compared FLM and wFLM across sample sizes $n\in\{50,100,200,500\}$ and grid resolutions $N\in\{20,50,100\}$ . Prediction performance was measured using average mean squared prediction error (AMSPE), averaged over 200 Monte Carlo replicates.

The results in Tables S.3 and S.4 show that wFLM consistently achieves lower prediction error than FLM across all levels of measurement error, sample sizes, and numbers of measurements. While FLM exhibits relatively stable performance as measurement error increases, its overall accuracy remains limited. In contrast, wFLM demonstrates strong predictive performance in low-noise settings and retains its advantage even as noise levels grow. These findings highlight the robustness of the proposed weighting scheme and underscore the benefit of adopting an optimized measure rather than the default Lebesgue measure across many settings.

S.7 Sensitivity analysis regarding measurement error variance for weight functions derived from parametric distributions

To further assess the robustness of the proposed wFLM (Exp) method on the unbounded domain $[0,\infty)$ , we conducted additional simulations based on the setting in Section 4.2, now incorporating varying levels of measurement error. Specifically, we varied the standard deviation of the additive noise $\epsilon_{ij}\sim\mathcal{N}(0,\sigma^{2})$ with $\sigma\in\{0,0.25,0.5,0.75,1.0\}$ . All other components of the data-generating process, including the eigenbasis functions $\{\psi_{k}\}$ , the exponential measurement locations and the scalar response model are the same as described in Section 4.2.

We evaluated performance under two settings for the number of measurement points per trajectory: either fixed at $N_{i}=20$ or randomly sampled from $\{5,6,7,8,9,10\}$ with equal probability. The measurement locations were exponentially distributed with rate $1/2$ on the unbounded interval $[0,\infty)$ . Average mean squared prediction error (AMSPE) was computed over $Q=200$ Monte Carlo runs for each setting and method.
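The two sampling designs described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sample_design`, the seed, and the number of subjects are hypothetical choices, and only the measurement locations are generated, not the trajectories.

```python
import random

def sample_design(n_subjects, fixed_n=None, rate=0.5, seed=0):
    """Draw measurement locations on [0, inf) for each subject.

    fixed_n=None draws N_i uniformly from {5, ..., 10};
    otherwise every subject gets N_i = fixed_n locations.
    """
    rng = random.Random(seed)
    design = []
    for _ in range(n_subjects):
        # randint includes both endpoints, matching {5, 6, 7, 8, 9, 10}
        n_i = fixed_n if fixed_n is not None else rng.randint(5, 10)
        # expovariate takes the rate, so the mean location is 1 / rate = 2
        times = sorted(rng.expovariate(rate) for _ in range(n_i))
        design.append(times)
    return design

dense = sample_design(100, fixed_n=20)   # N_i = 20 for every subject
sparse = sample_design(100)              # N_i drawn from {5, ..., 10}
```

Note that `random.expovariate` is parameterized by the rate, whereas some libraries parameterize the exponential distribution by its scale (the reciprocal of the rate), which is a common source of error when reproducing such designs.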

Table S.5 reports the AMSPE and associated standard deviations. Across all settings, wFLM (Exp) consistently outperforms classical FLM, often by a large margin. Notably, while the prediction performance of FLM remains relatively stable across noise levels, it is uniformly worse than that of wFLM, especially in low-noise or denser sampling regimes. In contrast, wFLM exhibits strong gains when the signal is recoverable and degrades only modestly under increasing noise. To assess the impact of potential outliers in AMSPE due to heavy noise or extreme trajectories, we also report the median mean squared prediction error in Table S.6. The median errors confirm and strengthen the trends observed for the mean errors: wFLM achieves drastically lower median errors than FLM, particularly in the $N_{i}=20$ setting, where the signal is better captured. These results indicate that the exponential weighting scheme not only improves prediction but also enhances robustness to measurement noise and sparsity.
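The difference between the mean-based and median-based summaries can be illustrated with a small sketch. The helper names `mspe` and `summarize_mspe` are hypothetical, and the per-run values are made up for illustration only; they are not results from the paper.

```python
from statistics import mean, median, stdev

def mspe(y_true, y_pred):
    """Mean squared prediction error on one test set."""
    return mean((a - b) ** 2 for a, b in zip(y_true, y_pred))

def summarize_mspe(mspe_per_run):
    """Summaries over Monte Carlo runs: AMSPE, its standard deviation, and the median."""
    return {
        "amspe": mean(mspe_per_run),
        "sd": stdev(mspe_per_run),
        "median": median(mspe_per_run),
    }

# Hypothetical per-run errors: four typical runs and one extreme run.
runs = [1.0, 1.2, 0.9, 1.1, 25.0]
summary = summarize_mspe(runs)
print(summary)
```

In this toy example a single extreme run pulls the average far above the typical error while the median is unaffected, which is the reason for reporting both summaries.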

S.8 Sensitivity analysis regarding the choice of tuning parameters $\lambda_{1}$ and $\lambda_{2}$

To evaluate the sensitivity of the step-based wFLM regarding the choice of tuning parameters $\lambda_{1}$ and $\lambda_{2}$ , we conducted additional simulations under Scenario 1 of Section 4.1 in the main text. We fixed the sample size at $n=500$ and considered three settings for the number of measurements $N\in\{20,50,100\}$ . The measurement locations were equally spaced over $[0,1]$ , and the predictor trajectories $X_{i}(t)$ were generated with the same mean and covariance structure as described in the main text.

The parameter $\lambda_{1}$ penalizes the total variation of the step function weights, encouraging smoother transitions between adjacent intervals. Increasing $\lambda_{1}$ flattens the weight function; for large values the weights approach a uniform weight function, and the model approaches the classical FLM. The parameter $\lambda_{2}$ penalizes the number of non-zero subintervals in the weight function. This promotes sparsity by allowing parts of the domain to receive zero weight, which can help isolate the most informative regions and improve prediction.
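The roles of the two penalties can be sketched with a toy calculation. The exact penalized objective is defined in the main text and is not reproduced in this section, so the form below (sum of absolute jumps for the total variation, count of non-zero weights for the sparsity term) is an assumption, and `weight_penalty` is a hypothetical helper.

```python
def weight_penalty(weights, lam1, lam2, tol=1e-12):
    """Assumed penalty for step-function weights on equal-width subintervals.

    lam1 scales the total variation (sum of absolute jumps between neighbors);
    lam2 scales the number of subintervals with non-zero weight.
    """
    total_variation = sum(abs(b - a) for a, b in zip(weights, weights[1:]))
    n_nonzero = sum(1 for w in weights if abs(w) > tol)
    return lam1 * total_variation + lam2 * n_nonzero

# Two step-function weights on four equal subintervals of [0, 1],
# both integrating to one.
uniform = [1.0, 1.0, 1.0, 1.0]      # corresponds to the Lebesgue measure
localized = [0.0, 2.0, 2.0, 0.0]    # all mass on the middle half

print(weight_penalty(uniform, lam1=1.0, lam2=0.0))    # total variation only
print(weight_penalty(localized, lam1=1.0, lam2=0.0))
print(weight_penalty(uniform, lam1=0.0, lam2=1.0))    # sparsity only
print(weight_penalty(localized, lam1=0.0, lam2=1.0))
```

Under this assumed form the total variation term favors the uniform weights, while the sparsity term favors the localized weights, which matches the opposing effects of $\lambda_{1}$ and $\lambda_{2}$ described above.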

We conducted two separate experiments. In the first, we fixed $\lambda_{2}=0$ and varied $\lambda_{1}$ from [math] to $5$ . In the second, we fixed $\lambda_{1}=0$ and similarly varied $\lambda_{2}$ . The results are summarized in Tables S.7 and S.8, showing the average mean squared prediction error (AMSPE) and standard deviations over $200$ Monte Carlo replications.

When $\lambda_{2}$ is fixed at zero, increasing $\lambda_{1}$ leads to a noticeable increase in prediction error between $\lambda_{1}=0$ and $\lambda_{1}=0.1$ , after which the performance stabilizes. This pattern suggests that even a small amount of total variation penalization can quickly push the step-function weight toward uniformity, diminishing its ability to adapt to the underlying signal. In this simulation setting, the Lebesgue measure is known to be suboptimal because the coefficient function $\beta(\cdot)$ cannot be effectively represented by the leading principal components. As $\lambda_{1}$ increases, the weight function becomes less adaptive and more uniform, resembling the classical FLM. This rigidity prevents the model from exploiting beneficial flexibility in weight function selection, which explains the degradation in predictive performance. Conversely, when $\lambda_{1}$ is fixed at zero, increasing $\lambda_{2}$ leads to small gains in prediction accuracy. While the true underlying measure for this simulation is the Lebesgue measure, the inability of the leading principal components to capture $\beta(\cdot)$ motivates alternative weighting. By allowing the model to concentrate weight on more relevant subregions, the step-based wFLM is able to adapt better to the signal structure.

Importantly, for users whose primary goal is predictive accuracy, we recommend setting $\lambda_{1}=0$ , which removes the variation penalty and reduces computational burden. This configuration allows the model to explore more flexible weight structures without constraints. However, if interpretability of the learned measure is also a concern, such as avoiding abrupt shifts between adjacent intervals, a small positive $\lambda_{1}$ can help smooth the estimated weights. Overall, these findings suggest that the model is reasonably robust to tuning choices within a practical range, and the penalization framework provides users with the flexibility to balance prediction performance and interpretability depending on their analytical goals.

