Robust High-Dimensional Time-Varying Coefficient Estimation

Minseok Shin; Donggyu Kim

arXiv:2302.13658 · stat.ME · October 22, 2025

TL;DR

This paper introduces RED-LASSO, a robust method for high-dimensional, time-varying coefficient estimation using high-frequency data, effectively handling heavy tails and sparsity.

Contribution

The paper develops a novel robust estimation procedure combining Huber loss, debiasing, and thresholding for high-dimensional, time-varying models with heavy-tailed data.

Findings

01

Achieves near-optimal convergence rates.

02

Successfully applied to high-frequency trading data.

03

Handles heavy tails and coefficient sparsity effectively.

Abstract

In this paper, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as time variations of coefficient processes. Specifically, we employ the Huber loss and a truncation scheme to handle heavy-tailed observations, while $ℓ_{1}$ -regularization is adopted to overcome the curse of dimensionality. To account for the time-varying coefficient, we estimate local coefficients which are biased due to the $ℓ_{1}$ -regularization. Thus, when estimating integrated coefficients, we propose a debiasing scheme to enjoy the law of large numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this Robust thrEsholding Debiased LASSO (RED-LASSO)…

Tables

Table 1: The average in-sample and out-of-sample $R^{2}$ of the RED-LASSO, ED-LASSO, and LASSO estimators over the five assets.

In-sample $R^{2}$
Period	RED-LASSO	ED-LASSO	LASSO
whole period	0.261	0.196	0.220
2013	0.254	0.151	0.202
2014	0.233	0.201	0.187
2015	0.282	0.272	0.257
2016	0.267	0.085	0.214
2017	0.206	0.137	0.158
2018	0.339	0.335	0.315
2019	0.247	0.191	0.208
Out-of-sample $R^{2}$
Period	RED-LASSO	ED-LASSO	LASSO
whole period	0.248	0.167	0.216
2014	0.214	0.136	0.181
2015	0.270	0.234	0.245
2016	0.248	0.082	0.210
2017	0.194	0.094	0.152
2018	0.329	0.281	0.304
2019	0.231	0.173	0.202

Table 2: The number of non-zero monthly integrated beta estimates from the RED-LASSO (RED), ED-LASSO (ED), and LASSO estimators for the five assets and 60 factors over 84 months.

Type	Symbol	AAPL			BRK.B			GM			GOOG			XOM
		RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO
Commodity	CA	0	20	0	0	22	0	0	27	0	1	28	0	0	29	0
	CL	0	15	0	0	15	2	1	21	1	1	15	0	11	34	48
	GC	1	17	0	2	11	0	4	25	0	2	15	0	0	25	1
	HG	1	16	0	1	23	0	1	19	1	4	18	0	3	16	4
	HO	0	12	0	0	20	0	3	12	1	0	7	0	2	14	42
	ML	0	20	0	0	15	0	0	22	0	1	17	0	0	12	1
	NG	2	8	0	0	9	0	1	10	0	0	3	0	0	6	1
	OJ	1	11	0	0	9	0	0	15	0	0	23	0	1	15	0
	PA	0	9	0	1	7	0	1	11	0	1	13	0	0	11	1
	PL	2	7	0	1	14	0	0	22	0	2	14	0	0	15	1
	RB	1	9	0	2	15	0	2	14	1	0	17	0	2	12	36
	RM	0	15	0	0	14	0	0	15	0	0	12	0	0	10	0
	RS	0	9	0	0	10	0	0	7	0	0	7	0	0	6	0
	SI	0	18	0	1	16	0	3	13	0	2	18	0	0	17	1
	ZC	0	21	0	1	19	0	3	26	0	0	16	0	0	16	0
	ZL	1	19	0	1	15	0	0	20	0	2	15	0	0	19	1
	ZM	1	13	0	1	17	0	1	19	0	1	19	0	2	14	0
	ZO	0	10	0	0	16	0	1	16	0	1	19	0	1	16	0
	ZR	0	12	0	0	14	0	0	12	0	0	16	0	0	17	0
	ZW	1	15	0	0	13	0	0	16	0	3	23	0	0	12	0
Currency	A6	1	15	0	2	23	1	1	24	2	1	17	0	1	11	6
	AD	0	20	0	2	13	0	2	14	2	3	11	0	4	18	13
	B6	0	19	0	0	17	0	4	23	0	1	20	0	1	21	0
	BR	0	6	0	0	14	0	0	10	0	0	13	0	1	11	1
	DX	3	25	1	0	15	0	1	26	0	0	16	0	1	16	1
	E1	1	15	0	2	17	0	2	22	0	1	18	0	0	14	0
	E6	1	16	0	0	26	0	0	25	0	0	15	0	1	17	0
	J1	2	18	2	2	18	9	3	23	5	1	20	1	3	23	3
	RP	0	15	0	0	11	0	1	24	0	0	14	0	0	21	0
	RU	0	7	0	0	9	0	0	9	0	0	11	0	1	8	1
Interest rate	BTP	0	39	0	1	33	0	2	44	0	0	30	0	0	30	0
	ED	0	2	0	0	4	0	0	10	0	0	3	0	0	7	0
	G	0	46	0	2	47	0	2	39	1	1	41	0	1	44	0
	GG	0	27	0	0	19	2	2	27	1	3	20	0	0	27	0
	HR	0	9	0	2	14	0	1	14	0	3	16	0	0	17	0
	US	1	15	1	0	9	5	3	21	1	1	14	0	3	15	1
	ZF	1	14	0	2	12	4	0	19	1	0	10	0	3	14	0
	ZN	0	10	1	0	13	4	1	13	2	1	13	0	2	15	0
	ZQ	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
	ZT	1	14	1	0	9	0	0	8	2	1	10	0	0	3	0
Stock market index	DY	3	34	18	0	38	26	3	40	18	3	30	10	3	43	25
	ES	53	45	81	71	71	84	14	36	67	64	54	81	30	35	76
	EW	8	42	18	20	35	56	28	49	58	6	46	13	6	43	41
	FX	1	25	9	2	35	19	3	27	20	1	26	5	0	21	27
	MME	18	25	28	8	24	33	12	29	26	26	34	23	8	25	51
	MX	1	35	17	1	24	32	3	30	24	4	25	7	1	27	38
	NQ	84	84	84	14	40	37	8	40	36	84	84	84	13	51	21
	RTY	9	26	27	4	28	27	14	34	48	5	39	24	3	29	22
	VX	14	34	29	14	24	37	5	20	21	21	30	23	5	15	29
	X	3	26	11	6	26	32	7	30	26	4	27	5	12	43	58
	XAE	0	19	0	1	17	1	2	22	5	0	20	0	66	71	57
	XAF	1	17	3	39	50	35	4	27	12	2	16	0	1	18	11
	XAI	2	23	2	2	24	4	1	18	5	0	15	0	2	17	6
	YM	43	60	62	67	60	84	13	31	56	6	27	50	55	63	84
Six factors	HML	9	23	0	28	37	12	26	50	9	22	38	1	23	45	42
	SMB	8	30	0	65	68	19	10	29	1	5	22	0	57	63	19
	RMW	2	13	0	25	44	7	37	51	4	8	24	0	59	65	49
	CMA	2	18	2	5	24	0	11	35	4	18	32	5	45	56	39
	MOM	6	30	2	29	46	11	40	54	23	22	37	5	63	72	39
	MKT	13	51	32	84	83	84	82	82	83	19	35	32	84	84	84

Table 3: The monthly average of non-zero frequency over 60 factors and 84 months for the RED-LASSO, ED-LASSO, and LASSO estimators for the five assets.

	AAPL			BRK.B			GM			GOOG			XOM
	RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO	RED	ED	LASSO
Non-zero frequency	3.595	15.095	5.130	6.083	16.845	7.940	4.392	17.511	6.750	4.261	15.333	4.392	6.904	18.261	11.678

Table 4: The symbols of 54 futures in Section 5.

Type	Symbol	Description
Commodity	CA	Cocoa
	CL	Crude Oil WTI
	GC	Gold
	HG	Copper
	HO	NY Harbor ULSD (Heating Oil)
	ML	Milling Wheat
	NG	Henry Hub Natural Gas
	OJ	Orange Juice
	PA	Palladium
	PL	Platinum
	RB	RBOB Gasoline
	RM	Robusta Coffee
	RS	Canola
	SI	Silver
	ZC	Corn
	ZL	Soybean Oil
	ZM	Soybean Meal
	ZO	Oats
	ZR	Rough Rice
	ZW	Wheat
Currency	A6	Australian Dollar
	AD	Canadian Dollar
	B6	British Pound
	BR	Brazilian Real
	DX	US Dollar Index
	E1	Swiss Franc
	E6	Euro FX
	J1	Japanese Yen
	RP	Euro/British Pound
	RU	Russian Ruble
Interest rate	BTP	Euro BTP Long-Bond
	ED	Eurodollar
	G	10-Year Long Gilt
	GG	Euro Bund
	HR	Euro Bobl
	US	30-Year US Treasury Bond
	ZF	5-Year US Treasury Note
	ZN	10-Year US Treasury Note
	ZQ	30-Day Fed Funds
	ZT	2-Year US Treasury Note
Stock market index	DY	DAX
	ES	E-mini S&P 500
	EW	E-mini S&P 500 Midcap
	FX	Euro Stoxx 50
	MME	MSCI Emerging Markets Index
	MX	CAC 40
	NQ	E-mini Nasdaq 100
	RTY	E-mini Russell 2000
	VX	VIX
	X	FTSE 100
	XAE	E-mini Energy Select Sector
	XAF	E-mini Financial Select Sector
	XAI	E-mini Industrial Select Sector
	YM	E-mini Dow

Equations

$$\|\mathbf{A}\|_{1}=\max_{1\leq j\leq p_{2}}\sum_{i=1}^{p_{1}}|A_{ij}|,\quad\|\mathbf{A}\|_{\infty}=\max_{1\leq i\leq p_{1}}\sum_{j=1}^{p_{2}}|A_{ij}|,\quad\text{and}\quad\|\mathbf{A}\|_{\max}=\max_{i,j}|A_{ij}|.$$

$$dY(t)=dY^{c}(t)+dY^{J}(t),\quad dY^{c}(t)=\boldsymbol{\beta}^{\top}(t)d\mathbf{X}^{c}(t)+dZ^{c}(t),\quad\text{and}\quad dY^{J}(t)=J^{y}(t)d\Lambda^{y}(t),$$

$$d\mathbf{X}(t)=d\mathbf{X}^{c}(t)+d\mathbf{X}^{J}(t),\quad d\mathbf{X}^{c}(t)=\boldsymbol{\mu}(t)dt+\boldsymbol{\sigma}(t)d\mathbf{B}(t),\quad d\mathbf{X}^{J}(t)=\mathbf{J}(t)d\boldsymbol{\Lambda}(t),\quad\text{and}\quad dZ^{c}(t)=\nu(t)dW(t),$$

$$d\boldsymbol{\beta}(t)=\boldsymbol{\mu}_{\beta}(t)dt+\boldsymbol{\nu}_{\beta}(t)d\mathbf{W}_{\beta}(t),$$

$$I\beta=(I\beta_{i})_{i=1,\ldots,p}=\int_{0}^{1}\boldsymbol{\beta}(t)dt.$$

$$\sup_{0\leq t\leq 1}\sum_{i=1}^{p}|\beta_{i}(t)|^{\delta}\leq s_{p}\quad\text{and}\quad\sum_{i=1}^{p}|I\beta_{i}|^{\delta}\leq s_{p}\text{ a.s.},$$

$$\mathcal{Y}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}Y\\ \Delta_{i+2}^{n}Y\\ \vdots\\ \Delta_{i+k_{n}}^{n}Y\end{pmatrix},\quad\mathcal{Z}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}Z^{c}\\ \Delta_{i+2}^{n}Z^{c}\\ \vdots\\ \Delta_{i+k_{n}}^{n}Z^{c}\end{pmatrix},\quad\mathcal{X}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}\widehat{\mathbf{X}}^{c\top}\\ \Delta_{i+2}^{n}\widehat{\mathbf{X}}^{c\top}\\ \vdots\\ \Delta_{i+k_{n}}^{n}\widehat{\mathbf{X}}^{c\top}\end{pmatrix},\quad\text{and}\quad\Delta_{i}^{n}\widehat{\mathbf{X}}^{c}=\begin{pmatrix}\Delta_{i}^{n}X_{1}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{1}|\leq v_{1,n}\}}\\ \Delta_{i}^{n}X_{2}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{2}|\leq v_{2,n}\}}\\ \vdots\\ \Delta_{i}^{n}X_{p}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{p}|\leq v_{p,n}\}}\end{pmatrix},$$

$$v_{j,n}=3BV_{j}n^{-1/2},$$

$$l_{\tau}(x)=\begin{cases}x^{2}/2&\text{if }|x|\leq\tau\\ \tau|x|-\tau^{2}/2&\text{if }|x|>\tau,\end{cases}$$

$$\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}=\arg\min_{\boldsymbol{\beta}\in\mathbb{R}^{p}}\mathcal{L}_{\tau,i}(\boldsymbol{\beta})+\eta\|\boldsymbol{\beta}\|_{1},$$

$$\mathcal{L}_{\tau,i}(\boldsymbol{\beta})=\left\|l_{\tau}\left(\mathcal{Y}_{i}-\mathcal{X}_{i}\boldsymbol{\beta}\right)/k_{n}\right\|_{1}.$$

$$\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}=\arg\min\|\boldsymbol{\Omega}\|_{1}\quad\text{s.t.}\quad\left\|\frac{1}{k_{n}\Delta_{n}}\mathcal{X}_{i}^{\top}\mathcal{X}_{i}\boldsymbol{\Omega}-\mathbf{I}\right\|_{\max}\leq\lambda,$$

$$\widehat{\boldsymbol{\beta}}^{\prime}_{i\Delta_{n}}=\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}+\frac{1}{k_{n}\Delta_{n}}\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}^{\top}\mathcal{X}_{i}^{\top}\left(\mathcal{Y}_{i}-\mathcal{X}_{i}\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}\right).$$

$$\psi_{\varpi}(x)=\begin{cases}x&\text{if }|x|\leq\varpi\\ \mbox{sign}(x)\varpi&\text{if }|x|>\varpi,\end{cases}$$

$$\widetilde{\boldsymbol{\beta}}_{i\Delta_{n}}=\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}+\psi_{\varpi}\left(\frac{1}{k_{n}\Delta_{n}}\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}^{\top}\mathcal{X}_{(i+k_{n})}^{\top}\left(\mathcal{Y}_{(i+k_{n})}-\mathcal{X}_{(i+k_{n})}\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}\right)\right),$$

$$\widehat{I\beta}=\sum_{i=0}^{[1/(k_{n}\Delta_{n})]-2}\widetilde{\boldsymbol{\beta}}_{ik_{n}\Delta_{n}}k_{n}\Delta_{n}.$$

$$\widetilde{I\beta}_{i}=s(\widehat{I\beta}_{i})\boldsymbol{1}\left(|\widehat{I\beta}_{i}|\geq h_{n}\right)\quad\text{and}\quad\widetilde{I\beta}=(\widetilde{I\beta}_{i})_{i=1,\ldots,p},$$

$$\max_{1\leq i\leq n}\mathbb{E}\left\{|\Delta_{i}^{n}Z^{c}|^{\gamma}\Big|\mathcal{F}_{(i-1)\Delta_{n}}\right\}\leq Cn^{-\gamma/2},$$

$$\sup_{0\leq t\leq 1}\mathbb{E}\left\{|J^{y}(t)|^{\gamma}\right\}\leq C,\quad\text{and}\quad\sup_{0\leq t\leq 1}\max_{1\leq i\leq p}\mathbb{E}\left\{|J_{i}(t)|^{\gamma}\right\}\leq C\text{ a.s.}$$

$$\sup_{0\leq t\leq 1}\sum_{i=1}^{p}|\mu_{\beta,i}(t)|^{\delta}\leq s_{p}\quad\text{and}\quad\sup_{0\leq t\leq 1}\sum_{i=1}^{p}|\Sigma_{\beta,ii}(t)|^{\delta/2}\leq s_{p}\text{ a.s.}$$

$$\inf\left\{\mathbf{w}^{\top}\nabla^{2}\mathcal{L}_{\tau,i}(\boldsymbol{\beta})\mathbf{w}:\ \mathbf{w}\in\mathcal{W}_{i\Delta_{n}},\ \|\mathbf{w}\|_{2}=1,\ \|\boldsymbol{\beta}-\boldsymbol{\beta}_{0}(i\Delta_{n})\|_{1}\leq D\right\}\geq\kappa/n.$$

$$|\Sigma_{ij}(t)-\Sigma_{ij}(s)|\leq C|t-s|\log p\text{ a.s.}$$

$$\max_{i}\|\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}-\boldsymbol{\beta}_{0}(i\Delta_{n})\|_{1}\leq Cs_{p}(n\eta)^{1-\delta}\quad\text{and}\quad\max_{i}\|\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}-\boldsymbol{\beta}_{0}(i\Delta_{n})\|_{2}\leq Cs_{p}(n\eta)^{1-\delta/2},$$

$$\|\widehat{I\beta}-I\beta_{0}\|_{\max}\leq Cb_{n},$$

Taxonomy

Topics: Stochastic processes and financial applications · Risk and Portfolio Optimization · Monetary Policy and Economic Impact

Full text

Robust High-Dimensional Time-Varying Coefficient Estimation

Minseok Shin and Donggyu Kim*

*Corresponding author. Address: College of Business, KAIST, Seoul 02455, South Korea. E-mail: [email protected].

Korea Advanced Institute of Science and Technology (KAIST)

Abstract

In this paper, we develop a novel high-dimensional coefficient estimation procedure based on high-frequency data. Unlike usual high-dimensional regression procedures such as LASSO, we additionally handle the heavy-tailedness of high-frequency observations as well as time variations of coefficient processes. Specifically, we employ the Huber loss and a truncation scheme to handle heavy-tailed observations, while $\ell_{1}$-regularization is adopted to overcome the curse of dimensionality. To account for the time-varying coefficient, we estimate local coefficients, which are biased due to the $\ell_{1}$-regularization. Thus, when estimating integrated coefficients, we propose a debiasing scheme to enjoy the law of large numbers property and employ a thresholding scheme to further accommodate the sparsity of the coefficients. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We show that the RED-LASSO estimator can achieve a near-optimal convergence rate. In the empirical study, we apply the RED-LASSO procedure to high-dimensional integrated coefficient estimation using high-frequency trading data.

Keywords: Debias, diffusion process, LASSO, factor model, sparsity, Huber loss, heavy-tail.

1 Introduction

With the wide availability of high-frequency financial data, researchers have developed financial models that incorporate such data, and empirical studies have shown that these models better account for market dynamics. For example, auto-regressive-type models have been introduced based on high-frequency-based measures, such as realized volatility and realized beta estimators (Andersen et al., 2006; Corsi, 2009; Engle and Gallo, 2006; Hansen et al., 2012; Kim and Wang, 2016; Kim and Fan, 2019; Shephard and Sheppard, 2010; Song et al., 2021). Empirical studies have demonstrated that capturing the auto-regressive structures of high-frequency measures helps explain financial market dynamics. On the other hand, realized volatility estimators are often employed when analyzing regression models, such as the Capital Asset Pricing Model (CAPM) (Lintner, 1965; Sharpe, 1964) and multi-factor models (Fama and French, 1992). For example, market beta can be estimated by the ratio of the realized covariance between assets and systematic factors to the realized variance of the systematic factors (Barndorff-Nielsen and Shephard, 2004). See Andersen et al. (2006), Mykland and Zhang (2009), and Reiß et al. (2015) for the related literature. Li et al. (2017) derived the asymptotic efficiency bound for betas in a linear continuous-time regression model. Furthermore, to handle the time-varying feature of the beta process (Ferson and Harvey, 1999; Kalnina, 2022; Reiß et al., 2015), Aït-Sahalia et al. (2020) employed time-localized regressions for multi-factor models. Chen (2018) introduced general nonparametric inference for nonlinear volatility functionals of general multivariate Itô semimartingales. These models and estimation methods have shown that incorporating high-frequency data helps better account for the beta dynamics in the finite-dimensional set-up.

In modern financial studies and practice, researchers have found a large number of factor candidates (Bali et al., 2011; Campbell et al., 2008; Cochrane, 2011; Harvey et al., 2016; Hou et al., 2020; McLean and Pontiff, 2016). Thus, we often encounter the curse of dimensionality, and beta estimation methods designed for the finite-dimensional case are neither efficient nor effective. To handle the high dimensionality, we often employ LASSO (Tibshirani, 1996), SCAD (Fan and Li, 2001), and the Dantzig selector (Candes and Tao, 2007) under a sparsity condition on the model parameters. However, direct application of these methods cannot handle the time-varying feature of beta processes. Recently, Kim and Shin (2022) developed a Thresholded dEbiased Dantzig (TED) estimator that can handle the high dimensionality and time variation of beta processes. Specifically, they employed the Dantzig selector (Candes and Tao, 2007) for each time window and estimated the integrated beta with debiasing and truncation schemes. They established the asymptotic properties of the TED estimator under a sub-Gaussianity assumption on the high-frequency log-return data. However, high-frequency financial data often exhibit heavy tails (Cont, 2001; Fan and Kim, 2018; Mao and Zhang, 2018; Shin et al., 2023). Under heavy-tailedness, the existing estimation methods, including the TED estimator (Kim and Shin, 2022), cannot consistently estimate the time-varying betas. These facts call for methodologies that can simultaneously handle heavy-tailed observations, the curse of dimensionality, and time-varying beta processes.

In this paper, we develop a robust integrated beta estimator based on high-dimensional regression jump-diffusion processes. To handle the high dimensionality and time-varying beta, we assume that the beta processes are sparse and follow a continuous diffusion process. To account for the heavy-tailedness of financial data, we assume that the residual process and jump size processes satisfy a finite $\gamma$th moment condition for $\gamma>2$. That is, we assume that the sources of the heavy-tailedness are the residual process and the jumps. We first estimate the instantaneous betas as follows. We employ the $\ell_{1}$-penalty, Huber loss, and truncation method to manage the curse of dimensionality, the heavy-tailedness of the residual process, and the jumps, respectively. We show that the proposed instantaneous beta estimator has the desirable convergence rate. However, the instantaneous beta estimator has non-negligible biases coming from the Huber loss and the $\ell_{1}$-penalty. Thus, to estimate the integrated beta using the instantaneous beta estimators, we need to mitigate the biases. Since the biases are heavy-tailed, the existing debiasing scheme cannot efficiently adjust them. To tackle this problem, we propose a novel debiasing scheme and obtain an integrated beta estimator. We show that the debiased integrated beta estimator has a near-optimal convergence rate and outperforms the simple integration of the instantaneous beta estimators without a debiasing scheme. However, due to the bias adjustment, the debiased integrated beta estimator is not sparse; thus, we further regularize it to accommodate the sparsity. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We also show that the RED-LASSO estimator has a near-optimal convergence rate.

The rest of the paper is organized as follows. Section 2 introduces the high-dimensional regression jump-diffusion process. Section 3 proposes the RED-LASSO estimator and establishes its asymptotic properties. In Section 4, we conduct a simulation study to check the finite sample performance of the proposed estimation method. In Section 5, we apply the proposed estimation procedure to high-frequency financial data. The conclusion is presented in Section 6, and all of the proofs are collected in the Appendix.

2 The model set-up

We first fix some notation. For any given $p_{1}$ by $p_{2}$ matrix $\mathbf{A}=\left(A_{ij}\right)$, let

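$$\|\mathbf{A}\|_{1}=\max_{1\leq j\leq p_{2}}\sum_{i=1}^{p_{1}}|A_{ij}|,\quad\|\mathbf{A}\|_{\infty}=\max_{1\leq i\leq p_{1}}\sum_{j=1}^{p_{2}}|A_{ij}|,\quad\text{and}\quad\|\mathbf{A}\|_{\max}=\max_{i,j}|A_{ij}|.$$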

The Frobenius norm of $\mathbf{A}$ is denoted by $\|\mathbf{A}\|_{F}=\sqrt{\mathrm{tr}(\mathbf{A}^{\top}\mathbf{A})}$, and the matrix spectral norm $\|\mathbf{A}\|_{2}$ is the square root of the largest eigenvalue of $\mathbf{A}\mathbf{A}^{\top}$. We use $C$'s to denote generic constants whose values are free of $n$ and $p$ and may change from appearance to appearance.
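As a quick illustration of this notation, the sketch below evaluates the entry-wise matrix norms on a small matrix in plain Python. It assumes the convention that $\|\mathbf{A}\|_{1}$ is the maximum absolute column sum, $\|\mathbf{A}\|_{\infty}$ is the maximum absolute row sum, and $\|\mathbf{A}\|_{\max}$ is the largest absolute entry. The function names are hypothetical, and the spectral norm is left out because it needs an eigenvalue routine.

```python
import math


def norm_1(A):
    """Maximum absolute column sum of a matrix given as a list of rows."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def norm_inf(A):
    """Maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_max(A):
    """Largest absolute entry."""
    return max(abs(a) for row in A for a in row)


def norm_frobenius(A):
    """Square root of tr(A'A), i.e. of the sum of squared entries."""
    return math.sqrt(sum(a * a for row in A for a in row))


A = [[1.0, -2.0],
     [3.0, 4.0]]
print(norm_1(A), norm_inf(A), norm_max(A), norm_frobenius(A))
```

The column-sum and row-sum norms differ for non-symmetric matrices, which is why the two are kept apart in the notation.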

Let $Y(t)$ and $\mathbf{X}(t)=\left(X_{1}(t),\ldots,X_{p}(t)\right)^{\top}$ be the dependent process and the $p$-dimensional multivariate covariate process, respectively. We employ the following non-parametric time-series regression jump-diffusion model:

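$$dY(t)=dY^{c}(t)+dY^{J}(t),\quad dY^{c}(t)=\boldsymbol{\beta}^{\top}(t)d\mathbf{X}^{c}(t)+dZ^{c}(t),\quad\text{and}\quad dY^{J}(t)=J^{y}(t)d\Lambda^{y}(t),\tag{2.1}$$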

where $Y^{c}(t)$ and $\mathbf{X}^{c}(t)=\left(X^{c}_{1}(t),\ldots,X^{c}_{p}(t)\right)^{\top}$ are the continuous parts of $Y(t)$ and $\mathbf{X}(t)$, respectively, $Y^{J}(t)$ is the jump part of $Y(t)$, $J^{y}(t)$ is a jump size, $\Lambda^{y}(t)$ is a Poisson process with a bounded intensity process, $\boldsymbol{\beta}(t)=\left(\beta_{1}(t),\ldots,\beta_{p}(t)\right)^{\top}$ is a coefficient process, and $Z^{c}(t)$ is a residual process. We note that the superscript $c$ represents the continuous part of the process. The covariate process $\mathbf{X}(t)$ and residual process $Z^{c}(t)$ satisfy

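$$d\mathbf{X}(t)=d\mathbf{X}^{c}(t)+d\mathbf{X}^{J}(t),\quad d\mathbf{X}^{c}(t)=\boldsymbol{\mu}(t)dt+\boldsymbol{\sigma}(t)d\mathbf{B}(t),\quad d\mathbf{X}^{J}(t)=\mathbf{J}(t)d\boldsymbol{\Lambda}(t),\quad\text{and}\quad dZ^{c}(t)=\nu(t)dW(t),\tag{2.2}$$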

where $\mathbf{X}^{J}(t)$ is the jump part of $\mathbf{X}(t)$, $\mathbf{J}(t)=\left(J_{1}(t),\ldots,J_{p}(t)\right)^{\top}$ is a jump size process, $\boldsymbol{\Lambda}(t)$ is a $p$-dimensional Poisson process with bounded intensity processes, $\boldsymbol{\sigma}(t)$ is a $p$ by $q$ matrix, and $\mathbf{B}(t)$ and $W(t)$ are $q$-dimensional and one-dimensional independent Brownian motions, respectively. The stochastic processes $\boldsymbol{\mu}(t)$, $\boldsymbol{\beta}(t)$, $\boldsymbol{\sigma}(t)$, and $\nu(t)$ are adapted càdlàg processes defined on a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_{t},t\in[0,1]\},P)$ with filtration $\mathcal{F}_{t}$ satisfying the usual conditions. We assume that the coefficient $\boldsymbol{\beta}(t)$ satisfies the following diffusion model:

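$$d\boldsymbol{\beta}(t)=\boldsymbol{\mu}_{\beta}(t)dt+\boldsymbol{\nu}_{\beta}(t)d\mathbf{W}_{\beta}(t),\tag{2.3}$$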

where $\boldsymbol{\nu}_{\beta}(t)$ is a $p$ by $r$ matrix, $\mathbf{W}_{\beta}(t)$ is an $r$-dimensional independent Brownian motion, and $\boldsymbol{\mu}_{\beta}(t)$ and $\boldsymbol{\nu}_{\beta}(t)$ are predictable. The main interest of this paper is to investigate the latent regression diffusion process. From this point of view, the jump part can be regarded as noise, and we discuss how to handle it in the following section. The parameter of interest is the integrated beta:

$$I\beta=(I\beta_{i})_{i=1,\ldots,p}=\int_{0}^{1}\boldsymbol{\beta}(t)dt.\tag{2.4}$$

The integrated beta can be considered the average of the spot betas over the unit interval. That is, the integrated beta represents the average effect of the increments of the covariate process. When the beta process is constant, the integrated beta coincides with the usual beta in the regression model.
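The averaging interpretation can be checked numerically. The following is a minimal sketch, assuming the spot betas are observed on an equally spaced grid over $[0,1]$ so that the integral is approximated by a plain average; `integrated_beta` and the two toy paths are hypothetical names and data introduced only for illustration.

```python
def integrated_beta(spot_betas):
    """Approximate the integral of beta(t) over [0, 1] by averaging spot
    betas sampled on an equally spaced grid (one p-vector per grid point)."""
    n = len(spot_betas)
    p = len(spot_betas[0])
    return [sum(b[j] for b in spot_betas) / n for j in range(p)]


# A constant beta path: the integrated beta equals the usual regression beta.
constant_path = [[0.5, 0.0] for _ in range(4)]

# A time-varying first coordinate: the integrated beta is its time average.
varying_path = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]

print(integrated_beta(constant_path))
print(integrated_beta(varying_path))
```

For the constant path the average returns the constant itself, which matches the remark that the integrated beta reduces to the usual regression coefficient in that case.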

In regression-based financial models, there are hundreds of potential factor candidates (Bali et al., 2011; Campbell et al., 2008; Cochrane, 2011; Harvey et al., 2016; Hou et al., 2020; McLean and Pontiff, 2016). To account for this, we allow the dimension $p$ to be large; thus, we need to handle the curse of dimensionality. To do this, we assume that the coefficient beta process $\boldsymbol{\beta}(t)=(\beta_{1}(t),\ldots,\beta_{p}(t))^{\top}$ satisfies the following sparsity condition:

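$$\sup_{0\leq t\leq 1}\sum_{i=1}^{p}|\beta_{i}(t)|^{\delta}\leq s_{p}\quad\text{and}\quad\sum_{i=1}^{p}|I\beta_{i}|^{\delta}\leq s_{p}\text{ a.s.},\tag{2.5}$$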

where $\delta\in[0,1)$, $s_{p}$ diverges slowly in $p$, and $0^{0}$ is defined as 0. This general sparsity condition includes the exact sparsity condition, i.e., $\delta=0$. We note that the exact sparsity condition implies that only a few factors are significant, while most factors do not affect the dependent process. Thus, intuitively, we assume that a relatively small number of factors are significant. We note that since the beta process is an Itô diffusion process, in general, the boundedness in the sparsity condition (2.5) is satisfied with high probability. However, for simplicity, we assume almost sure boundedness.
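To make the sparsity measure concrete, the sketch below computes $\sum_{i}|\beta_{i}|^{\delta}$ for a single coefficient vector, with the convention $0^{0}=0$ handled explicitly. The function name and the example vector are hypothetical. With $\delta=0$ the measure counts the non-zero coefficients, and with $\delta\in(0,1)$ it also penalizes many small but non-zero entries.

```python
def sparsity_measure(beta, delta):
    """Return sum_i |beta_i|^delta with the convention 0^0 = 0."""
    total = 0.0
    for b in beta:
        if b == 0.0:
            # Python evaluates 0.0 ** 0 as 1.0, so zeros are skipped
            # explicitly to respect the 0^0 = 0 convention.
            continue
        total += abs(b) ** delta
    return total


beta = [0.8, 0.0, -0.3, 0.0]
print(sparsity_measure(beta, 0.0))  # number of non-zero coefficients
print(sparsity_measure(beta, 0.5))  # weak-sparsity measure
```

The explicit skip matters only at $\delta=0$, but it keeps the two regimes in one function.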

3 Robust high-dimensional high-frequency regression

3.1 Integrated beta estimation procedure

In this section, we propose a robust integrated beta estimation procedure for the high-dimensional regression diffusion model defined in (2.1)–(2.3). Recently, under a sub-Gaussian assumption, Kim and Shin (2022) proposed an integrated beta estimator that can handle the curse of dimensionality and time-varying beta. However, empirical studies have demonstrated that stock log-return data often exhibit heavy tails (Cont, 2001; Fan and Kim, 2018; Mao and Zhang, 2018; Shin et al., 2023). To account for this, we impose a finite moment condition on the residual process, $Z^{c}(t)$, and the jump sizes, $J^{y}(t)$ and $\mathbf{J}(t)$ (see Assumption 1). Then, we propose a robust estimation procedure. We first estimate the instantaneous betas. To do this, we employ local regression as follows. For any process $g(t)$ and $\Delta_{n}=1/n$, let $\Delta_{i}^{n}g=g(i\Delta_{n})-g((i-1)\Delta_{n})$ for $1\leq i\leq 1/\Delta_{n}$. Define

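$$\mathcal{Y}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}Y\\ \Delta_{i+2}^{n}Y\\ \vdots\\ \Delta_{i+k_{n}}^{n}Y\end{pmatrix},\quad\mathcal{Z}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}Z^{c}\\ \Delta_{i+2}^{n}Z^{c}\\ \vdots\\ \Delta_{i+k_{n}}^{n}Z^{c}\end{pmatrix},\quad\mathcal{X}_{i}=\begin{pmatrix}\Delta_{i+1}^{n}\widehat{\mathbf{X}}^{c\top}\\ \Delta_{i+2}^{n}\widehat{\mathbf{X}}^{c\top}\\ \vdots\\ \Delta_{i+k_{n}}^{n}\widehat{\mathbf{X}}^{c\top}\end{pmatrix},$$

$$\text{and}\quad\Delta_{i}^{n}\widehat{\mathbf{X}}^{c}=\begin{pmatrix}\Delta_{i}^{n}X_{1}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{1}|\leq v_{1,n}\}}\\ \Delta_{i}^{n}X_{2}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{2}|\leq v_{2,n}\}}\\ \vdots\\ \Delta_{i}^{n}X_{p}\boldsymbol{1}_{\{|\Delta_{i}^{n}X_{p}|\leq v_{p,n}\}}\end{pmatrix},$$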

where $k_{n}$ is the number of observations for each local regression, $\boldsymbol{1}_{\{\cdot\}}$ is an indicator function, and $v_{j,n}$, $j=1,\ldots,p$, are the threshold levels. We use $v_{j,n}=C_{j,v}\sqrt{\log p}n^{-1/2}$ for some large constants $C_{j,v}$, $j=1,\ldots,p$. In the numerical study, we choose

$$v_{j,n}=3BV_{j}n^{-1/2},$$

where the bipower variation is $BV_{j}=\dfrac{\pi}{2}\sum_{i=2}^{n}|\Delta_{i-1}^{n}X_{j}|\cdot|\Delta_{i}^{n}X_{j}|$. This choice of $v_{j,n}$ is similar to the usual choice in the literature (Aït-Sahalia et al., 2020; Aït-Sahalia and Xiu, 2019). We note that the thresholding can detect the jumps in the covariate process $\mathbf{X}(t)$ and mitigate their impact on the beta estimators. On the other hand, thresholding is not used for the dependent process $Y(t)$, since the robustification method outlined in (3.3) and (3.5) can handle both the heavy-tailedness of the residual process $Z^{c}(t)$ and the jumps in the dependent process $Y(t)$. Meanwhile, when calculating local regressions, we need to handle the curse of dimensionality and heavy-tailedness. To overcome high dimensionality, we often employ a penalized regression procedure under the sparsity assumption. For example, we often use the LASSO (Tibshirani, 1996) and Dantzig (Candes and Tao, 2007) estimators under sub-Gaussian conditions. However, these estimators cannot handle heavy-tailed observations; furthermore, they are not consistent under heavy-tailedness. To tackle this issue, we use the following Huber loss $l_{\tau}$ (Huber, 1964):

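$$l_{\tau}(x)=\begin{cases}x^{2}/2&\text{if }|x|\leq\tau\\ \tau|x|-\tau^{2}/2&\text{if }|x|>\tau,\end{cases}$$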

where $\tau>0$ is the robustification parameter. We denote $l_{\tau}\left(\mathbf{x}\right)=\left(l_{\tau}\left(x_{1}\right),\ldots,l_{\tau}\left(x_{p_{1}}\right)\right)^{\top}$ for any vector $\mathbf{x}=\left(x_{1},\ldots,x_{p_{1}}\right)^{\top}\in\mathbb{R}^{p_{1}}$. The Huber loss $l_{\tau}$ mitigates the effect of outliers coming from the heavy-tailedness of the residual process $Z^{c}(t)$ and the jump size process $J^{y}(t)$. Thus, by employing the truncation, the Huber loss, and $\ell_{1}$-regularization, we can simultaneously deal with the three issues of jumps, heavy-tailedness, and the curse of dimensionality. Specifically, we propose the following instantaneous beta estimator at time $i\Delta_{n}$:

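$$\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}=\arg\min_{\boldsymbol{\beta}\in\mathbb{R}^{p}}\mathcal{L}_{\tau,i}(\boldsymbol{\beta})+\eta\|\boldsymbol{\beta}\|_{1},$$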

where $\eta>0$ is the regularization parameter, and the empirical loss function is

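$$\mathcal{L}_{\tau,i}(\boldsymbol{\beta})=\left\|l_{\tau}\left(\mathcal{Y}_{i}-\mathcal{X}_{i}\boldsymbol{\beta}\right)/k_{n}\right\|_{1}.$$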
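The local estimation step can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions, not the authors' implementation: the empirical loss is taken to be the average Huber loss of the residuals over the $k_{n}$ rows of one window, the $\ell_{1}$-penalized problem is solved by proximal gradient descent (ISTA), which is a solver chosen here for simplicity and not named in the text, and all function names and the toy data are hypothetical. The truncation thresholds are passed in as inputs and not computed.

```python
def truncate_increment(dx, v):
    """Keep a covariate increment only if |dx| <= v (jump truncation)."""
    return dx if abs(dx) <= v else 0.0


def huber_loss(x, tau):
    """Huber loss: quadratic on [-tau, tau], linear outside."""
    ax = abs(x)
    return 0.5 * x * x if ax <= tau else tau * ax - 0.5 * tau * tau


def huber_score(x, tau):
    """Derivative of the Huber loss: the argument clipped to [-tau, tau]."""
    return max(-tau, min(tau, x))


def soft_threshold(x, t):
    """Proximal operator of t * |x|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def residuals(beta, X, y):
    """Residuals y - X beta for a window given as a list of rows."""
    return [yj - sum(xv * b for xv, b in zip(row, beta)) for row, yj in zip(X, y)]


def objective(beta, X, y, tau, eta):
    """Average Huber loss over the window plus the l1 penalty."""
    loss = sum(huber_loss(r, tau) for r in residuals(beta, X, y)) / len(y)
    return loss + eta * sum(abs(b) for b in beta)


def huber_lasso(X, y, tau, eta, n_iter=500):
    """Proximal-gradient (ISTA) sketch for the Huber loss with an l1 penalty."""
    k_n, p = len(X), len(X[0])
    # The curvature of the smooth part is bounded by the largest eigenvalue
    # of X'X / k_n, which is at most its trace, so 1 / trace is a safe step.
    trace = sum(v * v for row in X for v in row) / k_n
    if trace == 0.0:
        return [0.0] * p
    step = 1.0 / trace
    beta = [0.0] * p
    for _ in range(n_iter):
        r = residuals(beta, X, y)
        grad = [-sum(huber_score(rj, tau) * row[m] for rj, row in zip(r, X)) / k_n
                for m in range(p)]
        beta = [soft_threshold(b - step * g, step * eta) for b, g in zip(beta, grad)]
    return beta


# Toy window: four observations, two covariates, one grossly outlying response.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.0, 50.0, 2.0, 2.0]
beta_hat = huber_lasso(X, y, tau=1.0, eta=0.1)
print(beta_hat, objective(beta_hat, X, y, 1.0, 0.1))
```

Because the Huber score is bounded by $\tau$, the outlying response contributes no more to the gradient than any other large residual, which is the robustness property the loss is chosen for. In practice the increments are of order $n^{-1/2}$, so $\tau$ and $\eta$ would have to be scaled accordingly.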

In Theorem 1, we show that the proposed instantaneous beta estimator $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ is consistent with appropriate $\tau$ and $\eta$. Then, we can estimate the integrated beta by integrating the $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$'s. However, their integration cannot enjoy the law of large numbers property since each $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ is biased due to the regularization term. That is, the error of their integration is dominated by the bias terms, which leads to the same convergence rate as that of $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$. Thus, to reduce the effect of the bias and obtain a faster convergence rate, we propose a debiasing scheme as follows. First, we estimate the inverse instantaneous volatility matrix at time $i\Delta_{n}$, $\boldsymbol{\Omega}(i\Delta_{n})=\boldsymbol{\Sigma}^{-1}(i\Delta_{n})$, where $\boldsymbol{\Sigma}(t)=\boldsymbol{\sigma}(t)\boldsymbol{\sigma}^{\top}(t)$. Specifically, we use the following constrained $\ell_{1}$-minimization for inverse matrix estimation (CLIME) (Cai et al., 2011):

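$$\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}=\arg\min\|\boldsymbol{\Omega}\|_{1}\quad\text{s.t.}\quad\left\|\frac{1}{k_{n}\Delta_{n}}\mathcal{X}_{i}^{\top}\mathcal{X}_{i}\boldsymbol{\Omega}-\mathbf{I}\right\|_{\max}\leq\lambda,$$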

where $\lambda$ is the tuning parameter, which will be specified in Theorem 2. With the inverse volatility matrix estimator $\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}$, the instantaneous beta estimator $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ is usually adjusted as follows:

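$$\widehat{\boldsymbol{\beta}}^{\prime}_{i\Delta_{n}}=\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}+\frac{1}{k_{n}\Delta_{n}}\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}^{\top}\mathcal{X}_{i}^{\top}\left(\mathcal{Y}_{i}-\mathcal{X}_{i}\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}\right).$$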

This debiasing scheme performs well under the sub-Gaussian assumption (Javanmard and Montanari, 2014, 2018; Kim and Shin, 2022; Van de Geer et al., 2014). However, $\Delta_{i}^{n}Z^{c}$ has only a finite $\gamma$th moment for $\gamma>2$; thus, the debiased instantaneous beta estimator has heavy tails. To handle this issue, we employ the Winsorization method as follows. Define the truncation (Winsorization) function

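$$\psi_{\varpi}(x)=\begin{cases}x&\text{if }|x|\leq\varpi\\ \mbox{sign}(x)\varpi&\text{if }|x|>\varpi,\end{cases}$$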

where $\varpi>0$ is a truncation parameter, and we denote $\psi_{\varpi}\left(\mathbf{x}\right)=\left(\psi_{\varpi}\left(x_{1}\right),\ldots,\psi_{\varpi}\left(x_{p_{1}}\right)\right)^{\top}$ for any vector $\mathbf{x}=\left(x_{1},\ldots,x_{p_{1}}\right)^{\top}\in\mathbb{R}^{p_{1}}$. Using this truncation function, we adjust $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ as

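$$\widetilde{\boldsymbol{\beta}}_{i\Delta_{n}}=\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}+\psi_{\varpi}\left(\frac{1}{k_{n}\Delta_{n}}\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}^{\top}\mathcal{X}_{(i+k_{n})}^{\top}\left(\mathcal{Y}_{(i+k_{n})}-\mathcal{X}_{(i+k_{n})}\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}\right)\right),$$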

where the truncation parameter $\varpi$ will be specified in Theorem 2. We note that for the debiasing step, we use non-overlapping windows for $\mathcal{X}$ and $\mathcal{Y}$, which helps us exploit the martingale property. Specifically, since $\boldsymbol{\beta}_{0}((i+k_{n})\Delta_{n})-\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ is measurable at time $(i+k_{n})\Delta_{n}$, we can handle the noise from $\mathcal{X}_{(i+k_{n})}$ and $\mathcal{Y}_{(i+k_{n})}$ using the martingale convergence theorem. We also note that the purpose of the debiasing is to enjoy the law of large numbers property when obtaining the integrated beta estimator. Usually, a debiasing scheme is employed to obtain asymptotic normality, which enables hypothesis testing or confidence interval construction (Javanmard and Montanari, 2014, 2018; Van de Geer et al., 2014; Zhang and Zhang, 2014). However, in this paper, we do not pursue this issue and focus on integrated beta estimation. Then, the integrated beta estimator is defined as follows:

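$$\widehat{I\beta}=\sum_{i=0}^{[1/(k_{n}\Delta_{n})]-2}\widetilde{\boldsymbol{\beta}}_{ik_{n}\Delta_{n}}k_{n}\Delta_{n}.$$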
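The Winsorized debiasing and the block-wise integration can be sketched as follows. This is a minimal illustration under stated assumptions: the inverse volatility estimate `omega_hat` is taken as given (CLIME requires a linear-programming solver and is not implemented here), the correction uses the window that follows the one on which `beta_hat` was fitted, and every function name and number below is hypothetical.

```python
def winsorize(x, varpi):
    """Truncation function: x if |x| <= varpi, otherwise sign(x) * varpi."""
    return max(-varpi, min(varpi, x))


def debias_winsorized(beta_hat, omega_hat, X_next, y_next, k_n, delta_n, varpi):
    """Add a Winsorized bias correction computed on the next window."""
    p = len(beta_hat)
    resid = [yj - sum(xv * b for xv, b in zip(row, beta_hat))
             for row, yj in zip(X_next, y_next)]
    # X' r: one entry per covariate.
    xtr = [sum(row[m] * rj for row, rj in zip(X_next, resid)) for m in range(p)]
    # Omega' (X' r) / (k_n * delta_n), then coordinate-wise Winsorization.
    corr = [sum(omega_hat[l][m] * xtr[l] for l in range(p)) / (k_n * delta_n)
            for m in range(p)]
    return [b + winsorize(c, varpi) for b, c in zip(beta_hat, corr)]


def integrate_blocks(debiased_betas, k_n, delta_n):
    """Sum the debiased block estimates, each weighted by k_n * delta_n."""
    p = len(debiased_betas[0])
    return [sum(b[m] for b in debiased_betas) * k_n * delta_n for m in range(p)]


# One covariate, next window of two observations, correction capped at 1.5.
adjusted = debias_winsorized([1.0], [[2.0]], [[1.0], [2.0]], [2.0, 2.0],
                             k_n=2, delta_n=0.5, varpi=1.5)
print(adjusted)
print(integrate_blocks([[1.0], [3.0]], k_n=2, delta_n=0.25))
```

In the toy call the raw correction is 2.0 and is capped at 1.5, so a single heavy-tailed window cannot dominate the integrated estimate.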

The debiased LASSO integrated beta estimator $\widehat{I\beta}$ can achieve a faster convergence rate than the simple integration of the instantaneous beta estimators. However, due to the bias adjustment term, it cannot account for the sparsity structure of the integrated beta. To accommodate the sparsity, we employ the following thresholding scheme:

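$$\widetilde{I\beta}_{i}=s(\widehat{I\beta}_{i})\boldsymbol{1}\left(|\widehat{I\beta}_{i}|\geq h_{n}\right)\quad\text{and}\quad\widetilde{I\beta}=(\widetilde{I\beta}_{i})_{i=1,\ldots,p},$$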

where the thresholding function $s(\cdot)$ satisfies $|s(x)-x|\leq h_{n}$ and $h_{n}$ is a thresholding level, which will be specified in Theorem 3. For example, we can employ the hard thresholding function $s(x)=x$ or the soft thresholding function $s(x)=x-\mbox{sign}(x)h_{n}$. In the empirical study, we used the hard thresholding function $s(x)=x$. We call this the Robust thrEsholding Debiased LASSO (RED-LASSO) estimator. We describe the RED-LASSO estimation procedure in Algorithm 1.
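The two thresholding rules can be written out directly. The sketch below applies each rule coordinate-wise, combining $s(\cdot)$ with the indicator $\boldsymbol{1}(|x|\geq h_{n})$ so that entries below the level are set to zero; the function names and the example vector are hypothetical.

```python
import math


def hard_threshold(x, h):
    """s(x) = x, kept only when |x| >= h."""
    return x if abs(x) >= h else 0.0


def soft_threshold_rule(x, h):
    """s(x) = x - sign(x) * h, kept only when |x| >= h."""
    if abs(x) < h:
        return 0.0
    return x - math.copysign(h, x)


def threshold_vector(values, h, rule):
    """Apply a thresholding rule to each integrated beta coordinate."""
    return [rule(v, h) for v in values]


debiased = [0.30, -0.05, 0.02, -0.40]
print(threshold_vector(debiased, 0.1, hard_threshold))
print(threshold_vector(debiased, 0.1, soft_threshold_rule))
```

Both rules satisfy $|s(x)-x|\leq h_{n}$ on the retained coordinates: hard thresholding leaves them unchanged, while soft thresholding shrinks them toward zero by exactly $h_{n}$.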

3.2 Theoretical results

In this section, we investigate asymptotic properties of the proposed RED-LASSO estimation procedure. To investigate the theoretical properties, we make the following assumptions.

Assumption 1.

(a)

The residual process $Z^{c}(t)$ and jump size processes, $J^{y}(t)$ and $\mathbf{J}(t)=\left(J_{1}(t),\ldots,J_{p}(t)\right)^{\top}$ , satisfy, for some $\gamma>2$ ,

[TABLE]

(b)

The processes $\boldsymbol{\mu}(t)$, $\boldsymbol{\mu}_{\beta}(t)$, $\boldsymbol{\beta}(t)$, $\boldsymbol{\Sigma}(t)$, and $\boldsymbol{\Sigma}_{\beta}(t)=\boldsymbol{\nu}_{\beta}(t)\boldsymbol{\nu}_{\beta}^{\top}(t)$ are almost surely entry-wise bounded, and $\|\boldsymbol{\Sigma}^{-1}(t)\|_{1}\leq C$ a.s.

(c)

The processes $\boldsymbol{\mu}_{\beta}(t)=\left(\mu_{\beta,1}(t),\ldots,\mu_{\beta,p}(t)\right)^{\top}$ and $\boldsymbol{\Sigma}_{\beta}(t)=\left(\Sigma_{\beta,ij}(t)\right)_{i,j=1,\ldots,p}$ satisfy the following sparsity condition for $\delta\in[0,1)$ :

[TABLE]

(d)

$n^{c_{1}}\leq p\leq c_{2}\exp(n^{c_{3}})$ for some positive constants $c_{1}$, $c_{2}$, and $c_{3}<1/6$, and $s_{p}^{2}\log p\Delta_{n}k_{n}\rightarrow 0$ as $n,p\rightarrow\infty$.

(e)

Define $\mathcal{W}_{t}=\left\{\mathbf{w}\in\mathbb{R}^{p}:\text{ }\left\|\mathbf{w}_{S_{t}^{c}}\right\|_{1}\leq 3\left\|\mathbf{w}_{S_{t}}\right\|_{1}+4\left\|(\boldsymbol{\beta}_{0}(t))_{S_{t}^{c}}\right\|_{1}\right\}$, where $\mathbf{w}_{S_{t}^{c}}$ is the subvector obtained by stacking $\left\{\mathbf{w}_{j}:\text{ }j\in S_{t}^{c}\right\}$, $\mathbf{w}_{S_{t}}$ is the subvector obtained by stacking $\left\{\mathbf{w}_{j}:\text{ }j\in S_{t}\right\}$, $(\boldsymbol{\beta}_{0}(t))_{S_{t}^{c}}$ is the subvector obtained by stacking $\left\{(\boldsymbol{\beta}_{0}(t))_{j}:\text{ }j\in S_{t}^{c}\right\}$, and $S_{t}=\left\{j:\text{ }j\text{th element of }|\boldsymbol{\beta}_{0}(t)|>n\eta\right\}$. Then, there exists a positive constant $\kappa$ such that the following inequality holds for $D=(8+48/\kappa)s_{p}(n\eta)^{1-\delta}$ and $0\leq i\leq n-k_{n}$, where the specific value of $\eta$ is given in Theorem 1:

[TABLE]

(f)

The volatility process $\boldsymbol{\Sigma}(t)=(\Sigma_{ij}(t))_{i,j=1,\ldots,p}$ satisfies the following condition:

[TABLE]

Remark 1.

Assumption 1(a) is the finite moment condition, which allows the dependent process $Y(t)$, covariate process $\mathbf{X}(t)$, and residual process $Z^{c}(t)$ to have heavy tails. We note that the moment condition for $Z^{c}(t)$ is satisfied when the increments $\Delta_{i}^{n}Z^{c}$ are independent random variables with $\mathbb{E}\left\{|\Delta_{i}^{n}Z^{c}|^{\gamma}\right\}\leq Cn^{-\gamma/2}$, or when $\sup_{0\leq t\leq 1}\sup_{t\leq s\leq 1}\mathbb{E}\left\{|\nu(s)|^{\gamma}\Big{|}\mathcal{F}_{t}\right\}\leq C\text{ a.s.}$ The latter condition can be satisfied when $\nu(t)$ consists of a bounded continuous process and an independent jump process. The boundedness condition Assumption 1(b) implies sub-Gaussianity for the continuous part of the covariate process, $\mathbf{X}^{c}(t)$, and the target parameter, $\boldsymbol{\beta}(t)$, which is often required in high-dimensional inference. However, the boundedness condition can be relaxed to a local boundedness condition by Lemma 4.4.9 in Jacod and Protter (2011). Specifically, if an asymptotic result, such as stable convergence in law or convergence in probability, holds under the boundedness condition, it also holds under the local boundedness condition. On the other hand, for the continuous-time regression model, we usually assume that the smallest eigenvalue of $\boldsymbol{\Sigma}(t)$ is bounded from below, which implies that the largest eigenvalue of $\boldsymbol{\Sigma}^{-1}(t)$ is bounded. From this point of view, the condition $\|\boldsymbol{\Sigma}^{-1}(t)\|_{1}\leq C\,\text{ a.s.}$ is not restrictive. Even if this condition is replaced by the sparsity condition $\sup_{0\leq t\leq 1}\max_{1\leq i\leq p}\sum_{j=1}^{p}|\omega_{ij}(t)|^{q}\leq s_{\omega,p}\,\text{ a.s.}$, where $\boldsymbol{\Sigma}^{-1}(t)=(\omega_{ij}(t))_{i,j=1,\ldots,p}$, and $q\in[0,1)$ and $s_{\omega,p}$ are the sparsity related variables, the theoretical results change only up to the $s_{\omega,p}$ order.
Assumption 1(c) is the sparsity condition for the beta process, which is required to investigate the discretization error when estimating instantaneous betas. Assumption 1(e) is the eigenvalue condition for the Hessian matrix $\nabla^{2}{\mathcal{L}}_{\tau,i}(\boldsymbol{\beta})$, which is called the localized restricted eigenvalue ($LRE$) condition (Fan et al., 2018; Sun et al., 2020). This implies strictly positive restricted eigenvalues over a local neighborhood. We note that $n\eta$ converges to zero for the choice of $\eta$ in Theorems 1–2. When the coefficient process $\boldsymbol{\beta}(t)$ satisfies the exact sparsity condition, i.e., $\delta=0$, $\mathcal{W}_{t}$ is replaced by an $\ell_{1}$-cone $\left\{\mathbf{w}\in\mathbb{R}^{p}:\text{ }\left\|\mathbf{w}_{S_{t}^{c}}\right\|_{1}\leq 3\left\|\mathbf{w}_{S_{t}}\right\|_{1}\right\}$, where $S_{t}=\{j:\text{ }j\text{th element of }\boldsymbol{\beta}_{0}(t)\neq 0\}$. Finally, we need the continuity condition Assumption 1(f) to investigate asymptotic behaviors of the CLIME estimator. We note that this condition holds with high probability when $\boldsymbol{\Sigma}(t)$ follows a continuous Itô diffusion process with bounded drift and instantaneous volatility processes.
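To make the set $\mathcal{W}_{t}$ in Assumption 1(e) concrete, the following sketch checks whether a given vector belongs to it. The function name `in_cone`, the example vectors, and the value `level` (standing in for $n\eta$) are hypothetical and chosen only for illustration.

```python
import numpy as np

def in_cone(w, beta0, level):
    """Check ||w_{S^c}||_1 <= 3 ||w_S||_1 + 4 ||(beta0)_{S^c}||_1,
    where S = {j : |beta0_j| > level}."""
    w = np.asarray(w, dtype=float)
    beta0 = np.asarray(beta0, dtype=float)
    S = np.abs(beta0) > level          # boolean mask of the "large" coordinates
    lhs = np.abs(w[~S]).sum()
    rhs = 3.0 * np.abs(w[S]).sum() + 4.0 * np.abs(beta0[~S]).sum()
    return bool(lhs <= rhs)

beta0 = np.array([1.0, -0.5, 0.01, 0.0])  # made-up weakly sparse coefficient
level = 0.05                              # plays the role of n * eta

w_inside = np.array([0.2, 0.1, 0.5, 0.3])   # lhs = 0.8, rhs = 0.9 + 0.04
w_outside = np.array([0.1, 0.0, 0.5, 0.3])  # lhs = 0.8, rhs = 0.3 + 0.04

inside = in_cone(w_inside, beta0, level)
outside = in_cone(w_outside, beta0, level)
```

When `beta0` is exactly sparse and `level` is zero, the last term vanishes and the check reduces to the usual $\ell_{1}$-cone condition.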

The following theorem derives the asymptotic properties of the instantaneous beta estimator $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$. Note that the subscript $0$ represents the true parameters.

Theorem 1.

Under Assumption 1(a)–(e), let $k_{n}=c_{k}n^{c}$ for some constants $c_{k}$ and $c\in[3/8,3/4]$. For any given positive constant $a$, choose $\tau=C_{\tau,a}\sqrt{k_{n}\Delta_{n}}(\log p)^{-3/4}$ and $\eta=C_{\eta,a}\Big[s_{p}n^{-3/2}\sqrt{k_{n}\log p}+n^{-1}k_{n}^{-1/2}(\log p)^{3/4}\Big]$ for some large constants $C_{\tau,a}$ and $C_{\eta,a}$. Then, we have, for large $n$,

[TABLE]

with probability greater than $1-p^{-a}$ .

Remark 2.

Theorem 1 shows the $\ell_{1}$ and $\ell_{2}$ norm error bounds of the instantaneous beta estimator. We note that as $k_{n}$ increases, the statistical estimation error decreases and time variation approximation error increases. To achieve the optimality, we choose $c=1/2$ , which implies that these two errors have the same convergence rate. Then, the instantaneous beta estimator has the $\ell_{1}$ convergence rate of $n^{-(1-\delta)/4}$ and $\ell_{2}$ convergence rate of $n^{-(2-\delta)/8}$ with the $\log$ order and sparsity level terms.
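The choice $c=1/2$ can be checked with a short balancing calculation. The following is a heuristic sketch that suppresses constants and assumes, as is typical for weakly sparse LASSO-type bounds, that the $\ell_{1}$ and $\ell_{2}$ errors scale like $s_{p}(n\eta)^{1-\delta}$ and $\sqrt{s_{p}}(n\eta)^{1-\delta/2}$; it is not part of the formal argument. Substituting $k_{n}=c_{k}n^{c}$ into the two terms of $\eta$ in Theorem 1 gives

$$
n\eta\asymp s_{p}\,n^{-1/2+c/2}\sqrt{\log p}+n^{-c/2}(\log p)^{3/4}.
$$

The first term increases with $c$ and the second decreases with $c$. Their powers of $n$ coincide when $-1/2+c/2=-c/2$, that is, $c=1/2$, which yields $n\eta\asymp n^{-1/4}$ up to $\log p$ and $s_{p}$ factors. Consequently,

$$
(n\eta)^{1-\delta}\asymp n^{-(1-\delta)/4},\qquad(n\eta)^{1-\delta/2}\asymp n^{-(2-\delta)/8},
$$

which agrees with the $\ell_{1}$ and $\ell_{2}$ rates stated in Remark 2.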

To estimate the integrated beta, we can use the integration of the instantaneous beta estimators. However, as discussed in Section 3.1, it cannot enjoy the law of large numbers property due to the heavy-tailed biases. To tackle this problem, we employ the robust debiasing method (3.5) and obtain the debiased LASSO integrated beta estimator $\widehat{I\beta}$ in (3.6). The following theorem establishes the asymptotic behavior of $\widehat{I\beta}$.

Theorem 2.

Under the assumptions in Theorem 1 and Assumption 1(f), choose $k_{n}=c_{k}n^{1/2}$ for some constant $c_{k}$ . For any given positive constant $a$ , let $\lambda=C_{\lambda,a}n^{-1/4}\sqrt{\log p}$ and $\varpi=C_{\varpi}s_{p}^{2-\delta}n^{\delta/4}(\log p)^{(1-3\delta)/4}$ for some constants $C_{\lambda,a}$ and $C_{\varpi}$ . Then, we have, with probability greater than $1-p^{-a}$ ,

[TABLE]

where $b_{n}=s_{p}^{2-\delta}n^{(-2+\delta)/4}(\log p)^{(5-3\delta)/4}+s_{p}n^{-1/2}\left(\log p\right)^{3/2}$ .

Remark 3.

Theorem 2 shows the max norm error bound of the debiased LASSO integrated beta estimator. When the beta process satisfies the exact sparsity condition, i.e., $\delta=0$, the debiased LASSO integrated beta estimator has the convergence rate of $s_{p}^{2}n^{-1/2}\left(\log p\right)^{5/4}+s_{p}n^{-1/2}\left(\log p\right)^{3/2}$, whereas without the debiasing scheme we only obtain the slower rate of $s_{p}^{2}n^{-1/4}\sqrt{\log p}+s_{p}n^{-1/4}\left(\log p\right)^{3/4}$. The $n^{-1/2}$ term is the optimal convergence rate for estimating model parameters given $n$ observations. For the $\log$ order term, the usual optimal rate in high-dimensional inference is $\sqrt{\log p}$. However, we have the $\left(\log p\right)^{3/2}$ term, since an additional $\log p$ factor comes from bounding the time-varying processes, such as the target process $\boldsymbol{\beta}(t)$. In sum, the debiased LASSO integrated beta estimator achieves the optimal convergence rate up to $\log p$ and $s_{p}$ orders.

Theorem 2 reveals that the debiased LASSO integrated beta estimator performs better than the integration of the instantaneous beta estimators. Finally, to account for the sparsity structure, we threshold the debiased LASSO integrated beta estimator and obtain the RED-LASSO estimator. Theorem 3 establishes the $\ell_{1}$ convergence rate of the RED-LASSO estimator.

Theorem 3.

Under the assumptions in Theorem 2, for any given positive constant $a$ , choose $h_{n}=C_{h,a}b_{n}$ for some constant $C_{h,a}$ . Then, we have, with probability greater than $1-p^{-a}$ ,

[TABLE]

Theorem 3 shows that the proposed RED-LASSO estimator is consistent in terms of the $\ell_{1}$ norm. We note that under the sub-Gaussian assumption on the log-return data, Kim and Shin (2022) proposed an integrated beta estimator that has the $\ell_{1}$ convergence rate of $s_{p}a_{n}^{1-\delta}$, where $a_{n}=s_{p}^{2-\delta}n^{(-2+\delta)/4}(\log p)^{(2-\delta)/2}+s_{p}s_{\omega,p}n^{(-2+q)/4}(\log p)^{(2-q)/2}+s_{p}n^{-1/2}\left(\log p\right)^{3/2}$, and $s_{\omega,p}$ and $q$ are the sparsity related terms for the inverse volatility matrix. Thus, the cost of handling the heavy-tailedness is at most $\log p$ order.

3.3 Discussion on the tuning parameter selection

In this section, we discuss how to choose the tuning parameters to implement the RED-LASSO estimation procedure. We first obtain the variables $\Delta_{i}^{n}\widehat{X}^{c}_{j}$ , $j=1,\ldots,p$ , based on the threshold level (3.1). Then, to handle the scale problem, we standardize the variables $\Delta_{i}^{n}{Y}$ and $\Delta_{i}^{n}\widehat{X}^{c}_{j}$ , $j=1,\ldots,p$ , to have a zero mean and unit variance. The re-scaling is employed after obtaining the RED-LASSO estimator. In the local regression stage (3.2), we select $k_{n}=[n^{1/2}]$ . Also, we choose

[TABLE]

where $c_{\tau}$ , $c_{\eta}$ , $c_{\lambda}$ , $c_{\varpi}$ , and $c_{h}$ are tuning parameters. For the simulation and empirical studies, we choose $c_{\tau}$ , $c_{\varpi}$ , and $c_{h}$ that minimize the corresponding mean squared prediction error (MSPE). The results are $c_{\tau}=16$ , $c_{\varpi}=1/64$ , and $c_{h}=1/4$ . Details can be found in Section 5. Also, we select $c_{\eta}\in[0.1,10]$ , which minimizes the corresponding Bayesian information criterion (BIC). Finally, we choose $c_{\lambda}\in[0.1,10]$ that minimizes the following loss function:

[TABLE]

where $\mathbf{I}_{p}$ is the $p$ -dimensional identity matrix.

4 A simulation study

To check the finite sample performance of the proposed RED-LASSO estimator, we conducted simulations. Based on the models (2.1)–(2.3), we generated the data using the heavy-tail and sub-Gaussian processes with frequency $1/n^{all}$ . Specifically, we employed the following time-series regression jump-diffusion model:

[TABLE]

where the jump sizes $J_{i}(t)$ and $J^{y}(t)$ were generated as $0.1$ times i.i.d. $t$-distributed random variables with $df$ degrees of freedom, and $\boldsymbol{\Lambda}(t)=\left(\Lambda_{1}(t),\ldots,\Lambda_{p}(t)\right)^{\top}$ and $\Lambda^{y}(t)$ were generated by Poisson processes with the intensities $\left(20,\ldots,20\right)^{\top}$ and $10$, respectively. We chose $df$ as $2$ and $\infty$ for the heavy-tailed and sub-Gaussian processes, respectively. The initial values of $X(t)$ and $Y(t)$ were set as zero, and we generated $\nu(t)$ as follows:

[TABLE]

where $t_{df,l}$, $l=1,\ldots,n^{all}$, are i.i.d. $t$-distributed random variables with $df$ degrees of freedom, and $\nu^{\prime}(t_{l})$, $l=1,\ldots,n^{all}$, were generated from the following Ornstein-Uhlenbeck process:

[TABLE]

where $\nu^{\prime}(0)=0.5$ and $\mathbf{W}^{\nu}(t)$ is an independent Brownian motion. We note that this structure for $\nu(t)$ is not realistic; it is imposed in order to investigate the effect of the heavy-tailedness of the return process. To generate the volatility process $\boldsymbol{\sigma}(t)$, we first generated the Ornstein-Uhlenbeck process $u(t)$ as follows:

[TABLE]

where $u(0)=1$ and $\mathbf{W}^{u}(t)$ is an independent Brownian motion. Then, we took $\boldsymbol{\sigma}(t)$ as a Cholesky decomposition of $\boldsymbol{\Sigma}(t)=\left(\Sigma_{ij}(t)\right)_{1\leq i,j\leq p}$ , where $\Sigma_{ij}(t)=u(t)0.8^{|i-j|}$ . To generate the coefficient $\boldsymbol{\beta}(t)$ , we considered the exact sparse process, i.e., $\beta_{i}(t)=0$ for $[s_{p}]+1\leq i\leq p$ . Specifically, we generated $\boldsymbol{\beta}(t)$ as follows:

[TABLE]

where $\boldsymbol{\mu}_{\beta}(t)=\left(\mu_{1,\beta}(t),\ldots,\mu_{p,\beta}(t)\right)^{\top}$ , $\boldsymbol{\nu}_{\beta}(t)=\left(\nu_{i,j,\beta}(t)\right)_{1\leq i,j\leq p}$ , and $\mathbf{W}_{\beta}(t)$ is a $p$ -dimensional independent Brownian motion. For $1\leq i\leq[s_{p}]$ , the initial value $\beta_{i}(0)=1$ and $\mu_{i,\beta}(t)=0.1$ for $0\leq t\leq 1$ . The process $\left(\nu_{i,j,\beta}(t)\right)_{1\leq i,j\leq[s_{p}]}$ was taken to be $\xi(t)\mathbf{I}_{[s_{p}]}$ , where $\mathbf{I}_{[s_{p}]}$ is the $[s_{p}]$ -dimensional identity matrix and $\xi(t)$ follows the Ornstein-Uhlenbeck process:

[TABLE]

where $\xi(0)=0.15$ and $\mathbf{W}^{\xi}(t)$ is an independent Brownian motion. We chose $p=100$ , $s_{p}=\log p$ , $n^{all}=4000$ , and we varied $n$ from $1000$ to $4000$ . When implementing the RED-LASSO estimation procedure, the tuning parameters were selected as discussed in Section 3.3.
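As a minimal sketch of one ingredient of this design, the following code builds $\boldsymbol{\Sigma}(t)$ with entries $u(t)0.8^{|i-j|}$ at a single time point and takes its Cholesky factor. It fixes $u(t)$ at its initial value $u(0)=1$ instead of simulating the Ornstein-Uhlenbeck path, and it assumes that $[s_{p}]$ denotes the integer part of $s_{p}$; both are simplifications made for this example.

```python
import numpy as np

p = 100
u_t = 1.0  # u(t) frozen at u(0) = 1 for illustration; the text simulates an OU path

# Sigma_ij(t) = u(t) * 0.8^{|i-j|}
idx = np.arange(p)
Sigma = u_t * 0.8 ** np.abs(idx[:, None] - idx[None, :])

# Lower-triangular Cholesky factor, so that sigma @ sigma.T reproduces Sigma.
sigma = np.linalg.cholesky(Sigma)
recon_err = np.max(np.abs(sigma @ sigma.T - Sigma))

# Number of active coefficients under s_p = log p (integer part assumed).
s_p = np.log(p)
n_active = int(np.floor(s_p))
```

With $p=100$, this gives four non-zero coefficient processes, and the remaining ninety-six coordinates of $\boldsymbol{\beta}(t)$ are identically zero.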

To investigate the effect of the robustification of the RED-LASSO estimator, we employed a thrEsholding Debiased LASSO (ED-LASSO) estimator. The ED-LASSO estimator uses the same estimation procedure as the RED-LASSO estimator with $\tau=\varpi=\infty$ . Since the ED-LASSO estimator does not employ the Huber loss and Winsorization method, the jump adjustment for the dependent process $Y(t)$ is needed. Thus, we used $\mathcal{Y}_{i}^{\prime}$ instead of $\mathcal{Y}_{i}$ for the ED-LASSO estimator, where

[TABLE]

In the simulation and empirical studies, we chose $u_{n}=3\sqrt{BV^{Y}}n^{-1/2}$, where the bipower variation is $BV^{Y}=\dfrac{\pi}{2}\sum_{i=2}^{n}|\Delta_{i-1}^{n}Y|\cdot|\Delta_{i}^{n}Y|$. We note that the ED-LASSO estimator can enjoy the same theoretical properties as the RED-LASSO estimator under the sub-Gaussian process, but it cannot handle the heavy-tailed process. As a benchmark, we also considered the LASSO estimator (Tibshirani, 1996), which accounts for neither the heavy-tailed distribution nor the time-varying beta process. Specifically, we employed the LASSO estimator as follows:

[TABLE]

where the regularization parameter $\eta^{{\rm LASSO}}\in[0.1,10]$ was selected by minimizing the corresponding Bayesian information criterion (BIC). The average estimation errors under the max norm, $\ell_{1}$ norm, and $\ell_{2}$ norm were computed over 1000 simulations.
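The jump threshold $u_{n}=3\sqrt{BV^{Y}}n^{-1/2}$ used for the ED-LASSO estimator can be computed as in the sketch below. The function names and the synthetic increments are hypothetical, and the last step, which sets increments exceeding $u_{n}$ in absolute value to zero, is a common truncation convention assumed here for illustration.

```python
import numpy as np

def bipower_variation(dY):
    """BV = (pi / 2) * sum_{i=2}^{n} |dY_{i-1}| * |dY_i|."""
    a = np.abs(np.asarray(dY, dtype=float))
    return (np.pi / 2.0) * np.sum(a[:-1] * a[1:])

def jump_threshold(dY):
    """u_n = 3 * sqrt(BV) * n^{-1/2}."""
    n = len(dY)
    return 3.0 * np.sqrt(bipower_variation(dY)) / np.sqrt(n)

# Synthetic increments: Gaussian noise with unit integrated variance plus one jump.
rng = np.random.default_rng(0)
n = 1000
dY = rng.normal(scale=n ** -0.5, size=n)
dY[500] += 0.5

u_n = jump_threshold(dY)
# Assumed truncation rule: discard increments larger than u_n in absolute value.
dY_truncated = np.where(np.abs(dY) <= u_n, dY, 0.0)
```

Because the bipower variation multiplies adjacent increments, a single large jump has only a small effect on it, so the threshold stays close to three standard deviations of a typical continuous increment.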

Figure 1 plots the log max, $\ell_{1}$ , and $\ell_{2}$ norm errors of the RED-LASSO, ED-LASSO, and LASSO estimators with $n=1000,2000,4000$ for the heavy-tail and sub-Gaussian processes. From Figure 1, we can find that the estimation errors of the RED-LASSO estimator decrease as the sample size $n$ increases. As expected, the RED-LASSO estimator performed the best for the heavy-tail process. This may be because the RED-LASSO estimator can explain the heavy-tailedness while other estimators cannot. For the sub-Gaussian process, the RED-LASSO and ED-LASSO estimators showed better performance than the LASSO estimator. This is because the LASSO estimator cannot account for the time variation of the beta process. We note that, even for the sub-Gaussian process, the RED-LASSO estimator showed better performance than the ED-LASSO estimator. One possible explanation for this is that the true return process can have some extreme values over time even if the sub-Gaussian random variables are used. From this result, we can conjecture that the RED-LASSO estimator is robust to the heavy-tailedness of the log-return process.

5 An empirical study

In this section, we applied the proposed RED-LASSO estimator to high-frequency trading data from January 2013 to December 2019. We took stock price data, futures price data, and firm fundamentals from the End of Day website, FirstRate Data website, and Center for Research in Security Prices (CRSP)/Compustat Merged Database, respectively. We obtained 5-min log-price data with the previous tick scheme (Wang and Zou, 2010; Zhang, 2011) and processed the data similarly to the procedure in Kim and Shin (2022). The days with half trading hours were not included. For the dependent process, we collected the log-price data of the following five assets: Apple Inc. (AAPL), Berkshire Hathaway Inc. (BRK.B), General Motors Company (GM), Alphabet Inc. (GOOG), and Exxon Mobil Corporation (XOM). These firms have the top market values in their global industry classification standard (GICS) sectors. For the covariate process, we first obtained the log-prices of 54 futures, which are often used as the market macro variables. Specifically, we selected 20 commodity futures, 10 currency futures, 10 interest rate futures, and 14 stock market index futures. The specific list is presented in Table 4 in the Appendix. Then, we constructed the Fama-French five factors (Fama and French, 2015) and the momentum factor (Carhart, 1997), which are widely used in stock market analysis, with the assets listed on NYSE, NASDAQ, and AMEX. We note that MKT, HML, SMB, RMW, CMA, and MOM represent the market, value, size, profitability, investment, and momentum factors, respectively. First, we calculated MKT as the return of a value-weighted portfolio of all assets. Then, we obtained the other factors as follows:

[TABLE]

where small (S) and big (B) portfolios represent the small and big market equities, respectively, while we classified high (H), medium (M), and low (L) portfolios according to their ratio of book equity to market equity. On the other hand, robust (R), neutral (N), and weak (W) portfolios were classified by their profitability, while we obtained conservative (C), neutral (N), and aggressive (A) portfolios using their investment data. Also, up (U), flat (F), and down (D) portfolios were classified by their momentum of the return. The portfolio constituents were updated monthly, and, with 5-min frequency, we obtained the portfolio return as follows:

[TABLE]

where $WRet_{d,i}$ is the portfolio return for the $d$ th day and $i$ th time interval, $N_{d}$ is the number of portfolio components on the $d$ th day, the superscript $j$ is used to represent the $j$ th stock of the portfolio, and $w_{d,i}^{j}$ is calculated by

[TABLE]

where $w_{d}^{j}$ is the market capitalization of the $j$th stock at the market close time on the day $d-1$, and $Ret_{d,0}^{j}$ represents the overnight return from the day $\left(d-1\right)$ to day $d$. To sum up, the five assets and 60 factors were used for the dependent and covariate processes, respectively. The details of the data processing can be found in Aït-Sahalia et al. (2020) and Kim and Shin (2022).

To determine the tuning parameters $c_{\tau}$ , $c_{\varpi}$ , and $c_{h}$ , we employed the mean squared prediction error (MSPE) with the data in $2013$ . For the choice of $c_{\tau}$ , we defined

[TABLE]

where $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}^{j,s}\left(c_{\tau}\right)$ is the instantaneous beta estimator at time $i\Delta_{n}$ with the tuning parameter $c_{\tau}$ for the $j$ th month in $2013$ and $s$ th stock. Then, we selected $c_{\tau}$ by minimizing $\Lambda(c_{\tau})$ over $c_{\tau}\in\left\{2^{l}\mid 0\leq l\leq 10,\,l\in\mathbb{Z}\right\}.$ Based on the selected $c_{\tau}$ , we defined

[TABLE]

where $\widetilde{\boldsymbol{\beta}}_{i\Delta_{n}}^{j,s}\left(c_{\varpi}\right)$ is the debiased instantaneous beta estimator at time $i\Delta_{n}$ with the tuning parameter $c_{\varpi}$ for the $j$ th month in $2013$ and $s$ th stock. We chose $c_{\varpi}$ which minimizes $\Lambda(c_{\varpi})$ over $c_{\varpi}\in\left\{2^{l}\mid-10\leq l\leq 0,\,l\in\mathbb{Z}\right\}$ . Finally, with the selected $c_{\tau}$ and $c_{\varpi}$ , we defined

[TABLE]

where $\widetilde{I\beta}^{j,s}\left(c_{h}\right)$ is the RED-LASSO estimator with the tuning parameter $c_{h}$ and $\widehat{I\beta}^{j,s}$ is the debiased integrated beta estimator for the $j$ th month in $2013$ and $s$ th stock. Then, we selected $c_{h}$ by minimizing $\Lambda(c_{h})$ over $c_{h}\in\left\{2^{l}\mid-5\leq l\leq 5,\,l\in\mathbb{Z}\right\}$ . The results are $c_{\tau}=16$ , $c_{\varpi}=1/64$ , and $c_{h}=1/4$ . We note that the stationarity assumption for the beta process is reasonable, which motivates and justifies the above tuning parameter selection procedure. Then, using the RED-LASSO, ED-LASSO, and LASSO estimation procedures, we obtained the monthly integrated betas for each of the five assets. The tuning parameters were selected based on Section 3.3 and Section 4. For the non-trading period, we set the beta estimates as zero.
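The sequential selection of $c_{\tau}$, $c_{\varpi}$, and $c_{h}$ over grids of powers of two can be sketched as follows. The helper names are hypothetical, and `toy_loss` is only a stand-in for the MSPE criteria $\Lambda(\cdot)$, which require the full estimation pipeline; its minimizers are placed at the values reported in the text purely so that the output is recognizable.

```python
import math

def select_on_grid(loss, exponents):
    """Return the value 2^l, l in exponents, that minimizes loss."""
    grid = [2.0 ** l for l in exponents]
    return min(grid, key=loss)

def toy_loss(target):
    """Stand-in criterion: squared distance to a target on the log2 scale."""
    return lambda c: (math.log2(c) - math.log2(target)) ** 2

# Step 1: c_tau over {2^l : 0 <= l <= 10}.
c_tau = select_on_grid(toy_loss(16.0), range(0, 11))

# Step 2: c_varpi over {2^l : -10 <= l <= 0}; a real criterion would be
# evaluated with the c_tau selected in step 1 held fixed.
c_varpi = select_on_grid(toy_loss(1 / 64), range(-10, 1))

# Step 3: c_h over {2^l : -5 <= l <= 5}, with c_tau and c_varpi held fixed.
c_h = select_on_grid(toy_loss(1 / 4), range(-5, 6))
```

Selecting the parameters one at a time requires $11+11+11$ criterion evaluations, compared with $11^{3}$ for a joint search over the three grids.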

We first compare the performances of the RED-LASSO, ED-LASSO, and LASSO estimators. To do this, we calculated the monthly in-sample and out-of-sample $R^{2}$ with the monthly integrated beta estimates. The out-of-sample $R^{2}$ was calculated using the integrated betas from the previous month, and it was obtained excluding the year 2013 since the tuning parameters were chosen based on the data in 2013. For each year, we calculated the average $R^{2}$ across the five assets and twelve months. Table 1 shows the average in-sample and out-of-sample $R^{2}$ of the RED-LASSO, ED-LASSO, and LASSO estimators. As seen in Table 1, the RED-LASSO estimator shows the best performance for all periods. This may be because only the RED-LASSO estimator can handle both the heavy-tailed distribution of the return process and time-varying property of the beta process.
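The in-sample and out-of-sample comparison can be illustrated with the sketch below. It assumes an uncentered $R^{2}$, namely one minus the ratio of the residual sum of squares to the sum of squared dependent increments, which may differ from the exact definition used for Table 1. The data are synthetic, and the true coefficients are used in place of estimates.

```python
import numpy as np

def r_squared(dY, dX, beta):
    """Uncentered R^2 of the fit dX @ beta to dY (assumed definition)."""
    resid = dY - dX @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum(dY ** 2)

rng = np.random.default_rng(1)
n_obs, p = 500, 5

# Made-up monthly integrated betas that drift slightly between two months.
betas = [np.array([1.0, 0.0, 0.5, 0.0, 0.0]),
         np.array([0.9, 0.0, 0.6, 0.0, 0.0])]

months = []
for b in betas:
    dX = rng.normal(size=(n_obs, p))
    dY = dX @ b + 0.1 * rng.normal(size=n_obs)
    months.append((dY, dX))

dY2, dX2 = months[1]
in_sample = r_squared(dY2, dX2, betas[1])      # same-month betas
out_of_sample = r_squared(dY2, dX2, betas[0])  # previous-month betas
```

In this toy setting the out-of-sample value is lower than the in-sample value only because the coefficients drift between months, which mirrors the role of the time-varying beta in the comparison.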

Table 2 shows the non-zero frequency of the RED-LASSO, ED-LASSO, and LASSO estimators for the five assets and 60 factors over 84 months. Table 3 shows the monthly average of non-zero frequency over factors and time for the RED-LASSO, ED-LASSO, and LASSO estimators for the five assets. As seen in Tables 2 and 3, the RED-LASSO estimator can better account for the sparsity of the integrated betas than the ED-LASSO and LASSO estimators. From this result, we can conjecture that the proposed RED-LASSO provides sparser beta estimates, which is an important property in practice. Furthermore, as discussed above, the RED-LASSO estimator shows the best performance in terms of $R^{2}$ in Table 1. That is, the RED-LASSO estimator can explain the market dynamics well with a simpler model. We note that for the RED-LASSO estimates, the stock market index futures factors had non-zero integrated betas more often than the other futures factors. This result is consistent with the multi-factor models (Asness et al., 2013; Carhart, 1997; Fama and French, 1992, 2015) since the market factors can be partially explained by the stock market index futures factors.

Now, we investigate the result of the RED-LASSO estimator. Figure 2 shows the monthly integrated betas from the RED-LASSO estimator for the five assets and 60 factors. Figure 3 depicts the non-zero frequency of the RED-LASSO estimator for the five groups, consisting of the commodity futures group, currency futures group, interest rate futures group, stock market index futures group, and market factor group. From Figures 2 and 3, we see that integrated betas change over time, and only a small number of factors had non-zero integrated betas in most periods. To investigate time-series of the significant betas, we plotted the integrated beta estimates for the three factors that most frequently had non-zero integrated betas in Figure 4. The AAPL has NQ (E-mini Nasdaq 100), ES (E-mini S&P 500), and YM (E-mini Dow); BRK.B has MKT, ES, and YM; GM has MKT, MOM, and RMW; GOOG has NQ, ES, and MME (MSCI Emerging Markets Index); and XOM has MKT, XAE (E-mini Energy Select Sector), and MOM. In sum, either the NQ factor or MKT factor most frequently had non-zero integrated betas, while the other factors had non-zero integrated betas only for some time periods.

When building regression-based financial models, we often employ the six factors, MKT, HML, SMB, RMW, CMA, and MOM (Asness et al., 2013; Carhart, 1997; Fama and French, 2015, 2016). To investigate their beta behaviors in more detail, we plotted the integrated betas with the RED-LASSO and ED-LASSO estimators for these six factors in Figure 5. As expected, the MKT factor played a significant role for BRK.B, GM, and XOM; however, the six factors had zero integrated betas in most periods for AAPL and GOOG. This may be because technology companies, such as AAPL and GOOG, have recently shown outstanding performance in the U.S. market; thus, the NQ (E-mini Nasdaq-100) factor can explain their movements well, as shown in Figure 4. We note that the results of the two estimators are similar, but the RED-LASSO estimator gives more stable results. Thus, we can conjecture that considering both the heavy-tailed distribution and the time variation of the beta process helps better explain the beta dynamics.

6 Conclusion

In this paper, we developed a novel RED-LASSO estimation procedure that can handle the heavy-tailedness of financial data and account for the time variation and sparsity of the high-dimensional beta process. To estimate the instantaneous beta, we proposed a robust estimator that employs the Huber loss, truncation method, and $\ell_{1}$-penalty. We demonstrated that the proposed instantaneous beta estimator can handle the heavy-tailedness and the curse of dimensionality with a desirable convergence rate. To handle the heavy-tailed bias coming from the Huber loss and $\ell_{1}$-penalty, we developed a robust debiasing scheme and proposed an integrated beta estimator. We showed that the proposed debiasing method sufficiently mitigates the effect of the bias, and the integrated beta estimator can enjoy the law of large numbers property. Then, the debiased integrated beta estimator is further regularized to account for the sparsity of the integrated beta. We demonstrated that the proposed RED-LASSO estimator can achieve the near-optimal convergence rate.

In the empirical study, the RED-LASSO estimation procedure shows the best performance in terms of $R^{2}$ and the sparsity of the beta estimates. It suggests that when estimating integrated beta in the high-dimensional high-frequency set-up, the RED-LASSO estimation method helps account for the features of the time-varying beta process and heavy-tailed distributions of observed log-returns. On the other hand, we did not consider microstructure noises. The microstructure noise could be another source of the heavy tails and accommodating them leads to an application for higher frequency observations. However, if we impose the microstructure noise structure on the regression diffusion model, we have an unbalanced order relationship between the noise and regression variables, which ruins the usual regression structure. Hence, it is difficult to apply the existing estimation methods. It would be interesting and important to develop a robust estimation method that can handle microstructure noises. We leave this issue for a future study.

Funding

This work was supported by the National Research Foundation of Korea [2021R1C1C1003216].

Appendix A

A.1 Proof of Theorem 1

Without loss of generality, it is enough to show the statement for fixed $i$ . For simplicity, we denote $\boldsymbol{\beta}_{0}(i\Delta_{n})$ by $\boldsymbol{\beta}_{0}=(\beta_{10},\ldots,\beta_{p0})^{\top}$ .

Proposition 1.

Under the assumptions in Theorem 1, we have

[TABLE]

with probability greater than $1-p^{-a}$ for any given positive constant $a$ .

Proof of Proposition 1. Define

[TABLE]

We have

[TABLE]

Thus, for $1\leq j\leq p$ , we have

[TABLE]

where

[TABLE]

First, we consider $(I)_{j}$ . By the boundedness condition Assumption 1(b), we can show, with probability at least $1-p^{-2-a}$ ,

[TABLE]

for some positive constant $C_{X}$ . Then, we have

[TABLE]

For $(I)_{j}^{(1)}$ , by (A.3), we have

[TABLE]

For $(I)_{j}^{(2)}$ , let $f(h)=\psi_{\tau}\left(\mathcal{Z}_{ih}+\mathcal{\widetilde{X}}_{ih}\right)\Delta_{i+h}^{n}X_{j}^{c}\boldsymbol{1}\left(|\Delta_{i+h}^{n}X_{j}^{c}|\leq C_{X}\sqrt{\log p}n^{-1/2}\right)$ . Then, we have

[TABLE]

Consider the first term. Similar to the proof of Theorem 1 in Kim and Shin (2022), we can show, for any constant $b\geq 1$,

[TABLE]

Then, by the Cauchy–Schwarz inequality, we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Also, for $1\leq h\leq k_{n}$ , we have

[TABLE]

Thus, we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Consider the second term. By (A.10) and Hölder’s inequality, we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Then, using the fact that

[TABLE]

we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

where the second inequality is due to Hölder’s inequality and the last inequality is from (A.10) and (A.14). By (A.12) and (A.15), we have

[TABLE]

For $(I)_{j}^{(3)}$, by (2.1) in Freedman (1975), we have

[TABLE]

Also, by (A.10) and (A.14), we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

where the second inequality is due to Hölder’s inequality. Thus, we have

[TABLE]

By (A.7), (A.9), (A.19), and (A.22), we have, with probability at least $1-5p^{-1-a}$ ,

[TABLE]

Consider $(II)_{j}$ . For some large constant $C>0$ , define

[TABLE]

By (A.3), we have

[TABLE]

By the boundedness of the intensity process, we have

[TABLE]

Under the event $Q_{1}\cap Q_{2}$ , we have, for large $n$ ,

[TABLE]

Thus, we have

[TABLE]

We note that, for any $x_{1},x_{2},y_{1},y_{2}\in\mathbb{R}$ ,

[TABLE]

Hence, under the event $Q_{1}\cap Q_{2}\cap Q_{3}$ , we have

[TABLE]

which implies

[TABLE]

Combining (A.2), (A.23), and (A.25), we have, with probability greater than $1-p^{-a}$ ,

[TABLE]

$\blacksquare$

Proof of Theorem 1. By Proposition 1, it is enough to show the statement under (A.1). First, we investigate $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}-\boldsymbol{\beta}_{0}$ . Since

[TABLE]

we have

[TABLE]

Then, we have

[TABLE]

Thus, we have

[TABLE]

where $\mathcal{W}_{i\Delta_{n}}$ is defined in Assumption 1(e).

Now, we investigate $\|\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}-\boldsymbol{\beta}_{0}\|_{1}$ and $\|\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}-\boldsymbol{\beta}_{0}\|_{2}$ . By (2.5), we have

[TABLE]

Thus, by (A.27)–(A.28), we have

[TABLE]

where the second inequality is due to the Cauchy–Schwarz inequality. Suppose that

[TABLE]

Then, we have

[TABLE]

From the optimality of $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ and the integral form of the Taylor expansion, we have

[TABLE]

For the first and second terms, we have

[TABLE]

where the second inequality is due to (A.32). For the last term, let

[TABLE]

Then, for any $0\leq t\leq z$ , we have

[TABLE]

where the last inequality is due to (A.32). Thus, by Assumption 1(e), we have

[TABLE]

Combining (A.33)–(A.39), we have

[TABLE]

which implies

[TABLE]

This contradicts (A.31); thus, we obtain the $\ell_{2}$ norm error bound. Then, by (A.29), we can show the $\ell_{1}$ norm error bound. $\blacksquare$

A.2 Proof of Theorem 2

Proof of Theorem 2. We first investigate $\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}$ and $\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}$ . By (2.5), (3.7), and Assumption 1(c), we can show, with probability at least $1-p^{-2-a}$ ,

[TABLE]

For $\widehat{\boldsymbol{\Omega}}_{i\Delta_{n}}$ , similar to the proof of Theorem 1 in Kim and Shin (2022), we can show, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Thus, we have, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Consider $\widetilde{\boldsymbol{\beta}}_{i\Delta_{n}}$ . For each $1\leq m\leq p$ , there exists a standard Brownian motion $W_{m}^{*}(t)$ such that

[TABLE]

Then, by the proof of Theorem 1 in Kim and Shin (2022), we have

[TABLE]

where

[TABLE]

Note that

[TABLE]

Hence, similar to the proof of (A.25), we can show

[TABLE]

where

[TABLE]

Let

[TABLE]

where

[TABLE]

Then, we have

[TABLE]

Consider $(I)$ . By the boundedness of the intensity, we can show $\Pr\left\{\int_{0}^{1}d\Lambda^{y}(t)\leq C\log p\right\}\geq 1-p^{-1-a}$ . Thus, we have

[TABLE]

For $(II)$ , by (A.45)–(A.46), we have, with probability at least $1-2p^{-1-a}$ ,

[TABLE]

Consider $(III)$ . Similar to the proof of (A.20) in Kim and Shin (2022), we can show, with probability at least $1-p^{-1-a}$ ,

[TABLE]

Consider $(IV)$ . By Assumption 1(b) and (f), we can show, with probability at least $1-p^{-1-a}$ ,

[TABLE]

Thus, by Assumption 1(f), we can show, with probability at least $1-p^{-1-a}$ ,

[TABLE]

Then, by (A.24), (A.42), and (A.45), we have, with probability at least $1-p^{-1-a}$ ,

[TABLE]

For $(V)$ , let $g(i)=\boldsymbol{\beta}_{0}((i+k_{n})\Delta_{n})-\widehat{\boldsymbol{\beta}}_{i\Delta_{n}}+\mathcal{B}_{i+k_{n}}$ . We have

[TABLE]

For the first term, by the boundedness of the intensity process and (A.45), we can show, with probability at least $1-p^{-2-a}$ ,

[TABLE]

Thus, from (A.42), we have, with probability at least $1-2p^{-2-a}$ ,

[TABLE]

Then, by (2.1) in Freedman (1975), we have, for $1\leq j\leq p$ ,

[TABLE]

which implies

[TABLE]

For the second term, we have, with probability at least $1-2p^{-2-a}$ ,

[TABLE]

Thus, we have

[TABLE]

Consider $(VI)$ . By the sub-Gaussianity of the beta process, we can show, with probability at least $1-p^{-1-a}$ ,

[TABLE]

For $(VII)$ , by Assumption 1(b), we have

[TABLE]

Combining (A.53)–(A.69), we have, with probability greater than $1-p^{-a}$ ,

[TABLE]

$\blacksquare$
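The probability bookkeeping in the proof above follows the usual union-bound pattern. The sketch below states that pattern in generic notation; $E_{k}$, $K$, $X_{j}$, and $t$ are placeholders introduced for illustration, and the block does not reproduce any of the displayed equations.

```latex
% Combining K events, each failing with probability at most p^{-1-a}:
\Pr\Bigl(\bigcap_{k=1}^{K} E_{k}\Bigr)
  \ge 1 - \sum_{k=1}^{K} \Pr\bigl(E_{k}^{c}\bigr)
  \ge 1 - K p^{-1-a}
  \ge 1 - p^{-a} \quad \text{whenever } p \ge K .

% Extending a coordinate-wise bound with failure probability p^{-2-a}
% to all coordinates 1 <= j <= p:
\Pr\Bigl(\max_{1 \le j \le p} |X_{j}| > t\Bigr)
  \le \sum_{j=1}^{p} \Pr\bigl(|X_{j}| > t\bigr)
  \le p \cdot p^{-2-a}
  = p^{-1-a} .
```

This is why the coordinate-wise and local bounds are stated at the level $p^{-2-a}$ or $p^{-1-a}$: the extra powers of $p$ are spent when the bounds are combined.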

A.3 Proof of Theorem 3

Proof of Theorem 3. By (3.8), there exists $h_{n}$ such that, with probability greater than $1-p^{-a}$ ,

[TABLE]

Thus, it is enough to show the statement under the event $\{\|\widehat{I\beta}-I\beta_{0}\|_{\max}\leq h_{n}/2\}$ . Similar to the proof of Theorem 1 in Kim and Shin (2022), we can show

[TABLE]

$\blacksquare$
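The role of the event $\{\|\widehat{I\beta}-I\beta_{0}\|_{\max}\leq h_{n}/2\}$ can be illustrated numerically. The following is a minimal sketch that assumes hard thresholding as the thresholding rule (one common choice; the paper's own thresholding function is not restated in this section) and uses made-up numbers. The names `hard_threshold`, `beta_true`, `noise`, and `h` are hypothetical.

```python
import numpy as np


def hard_threshold(x, h):
    """Set entries with |x| <= h to zero and keep the others unchanged.

    Hard thresholding is an assumption made for this sketch only.
    """
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > h, x, 0.0)


# Hypothetical sparse target and threshold level (illustration only).
beta_true = np.array([0.0, 0.0, 1.5, -2.0, 0.0])
h = 0.4

# Estimation error bounded by h / 2 in the max norm, mimicking the event above.
noise = np.array([0.15, -0.2, 0.1, -0.05, 0.19])
assert np.max(np.abs(noise)) <= h / 2

beta_hat = beta_true + noise
beta_thr = hard_threshold(beta_hat, h)

# Every true zero has |estimate| <= h / 2 < h, so it is thresholded to zero.
zeros_recovered = bool(np.all(beta_thr[beta_true == 0.0] == 0.0))

# A kept entry errs by at most h / 2; a removed entry has |true| <= 3h / 2.
max_error = float(np.max(np.abs(beta_thr - beta_true)))
```

Under the max norm event, thresholding at level `h` removes all noise on the true zeros, and each remaining coordinate contributes an error of at most `1.5 * h`. The total $\ell_{1}$ error is then governed by the sparsity of the target rather than by the dimension $p$.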

Bibliography

Aït-Sahalia, Y., Kalnina, I., and Xiu, D. (2020). High-frequency factor models and regressions. Journal of Econometrics, 216(1):86–105.
Aït-Sahalia, Y. and Xiu, D. (2019). Principal component analysis of high-frequency data. Journal of the American Statistical Association, 114(525):287–303.
Andersen, T. G., Bollerslev, T., Diebold, F. X., and Wu, G. (2006). Realized beta: Persistence and predictability. Emerald Group Publishing Limited.
Asness, C. S., Moskowitz, T. J., and Pedersen, L. H. (2013). Value and momentum everywhere. The Journal of Finance, 68(3):929–985.
Bali, T. G., Cakici, N., and Whitelaw, R. F. (2011). Maxing out: Stocks as lotteries and the cross-section of expected returns. Journal of Financial Economics, 99(2):427–446.
Barndorff-Nielsen, O. E. and Shephard, N. (2004). Econometric analysis of realized covariation: High frequency based covariance, regression, and correlation in financial economics. Econometrica, 72(3):885–925.
Cai, T., Liu, W., and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association, 106(494):594–607.
Campbell, J. Y., Hilscher, J., and Szilagyi, J. (2008). In search of distress risk. The Journal of Finance, 63(6):2899–2939.
