Inference on Functionals under First Order Degeneracy

Qihui Chen; Zheng Fang

arXiv:1901.04861·econ.EM·January 16, 2019

TL;DR

This paper develops a second order asymptotic framework for inference on functionals with null first order derivatives, identifying bootstrap limitations and proposing corrections for degenerate and nondifferentiable cases.

Contribution

It introduces a unified second order inference framework, analyzes bootstrap failures under degeneracy, and proposes correction methods for reliable inference in such settings.

Findings

01

The standard bootstrap is inconsistent whenever the second order derivative is nonzero.

02

The correction procedure from Babu (1984) can be extended to this setting.

03

Modified bootstrap methods achieve local size control under certain conditions.

Abstract

This paper presents a unified second order asymptotic framework for conducting inference on parameters of the form $ϕ(θ_{0})$, where $θ_{0}$ is unknown but can be estimated by $\hat{θ}_{n}$, and $ϕ$ is a known map that admits a null first order derivative at $θ_{0}$. For a large number of examples in the literature, the second order Delta method reveals a nondegenerate weak limit for the plug-in estimator $ϕ(\hat{θ}_{n})$. We show, however, that the “standard” bootstrap is consistent if and only if the second order derivative $ϕ_{θ_{0}}^{''} = 0$ under regularity conditions, i.e., the standard bootstrap is inconsistent if $ϕ_{θ_{0}}^{''} \neq 0$, and provides degenerate limits unhelpful for inference otherwise. We thus identify a source of bootstrap failures distinct from that in Fang and Santos (2018) because the problem (of consistently bootstrapping a…

Tables (8)

Table 1: Comparison with Fang_Santos2014HDD

		First Order Degeneracy (i.e., $ϕ_{θ_{0}}^{'} = 0$)
		Yes	No
Nondifferentiability (1st or 2nd order)	Yes	This paper: $r_{n}^{2} \{ϕ(\hat{θ}_{n}^{*}) - ϕ(\hat{θ}_{n})\} \overset{L^{*}}{\nrightarrow} ϕ_{θ_{0}}^{''}(𝔾)$	Fang_Santos2014HDD: $r_{n} \{ϕ(\hat{θ}_{n}^{*}) - ϕ(\hat{θ}_{n})\} \overset{L^{*}}{\nrightarrow} ϕ_{θ_{0}}^{'}(𝔾)$
	No	This paper: $r_{n}^{2} \{ϕ(\hat{θ}_{n}^{*}) - ϕ(\hat{θ}_{n})\} \overset{L^{*}}{\nrightarrow} ϕ_{θ_{0}}^{''}(𝔾)$	Standard: $r_{n} \{ϕ(\hat{θ}_{n}^{*}) - ϕ(\hat{θ}_{n})\} \overset{L^{*}}{\to} ϕ_{θ_{0}}^{'}(𝔾)$

Table 2: Simulation Designs

Design	# of Assets	# of Factors	GARCH Parameters	Factor Loadings
D1	$k = 2$	$p = 1$	$(ω_{1}, α_{1}, β_{1}) = (0.2, 0.2, 0.6)$	$Λ = {(1, 1)}^{⊺}$
D2	$k = 2$	$p = 2$	$(ω_{1}, α_{1}, β_{1}) = (0.2, 0.2, 0.6)$	$Λ = I_{2}$
			$(ω_{2}, α_{2}, β_{2}) = (0.2, 0.4, 0.4)$
D3	$k = 3$	$p = 1$	$(ω_{1}, α_{1}, β_{1}) = (0.2, 0.2, 0.6)$	$Λ = {(1, 1, 1)}^{⊺}$
D4	$k = 3$	$p = 2$	$(ω_{1}, α_{1}, β_{1}) = (0.2, 0.2, 0.6)$	$Λ = {[\begin{matrix} 1 & 1 & 1 \\ - 1 & 0 & 1 \end{matrix}]}^{⊺}$
			$(ω_{2}, α_{2}, β_{2}) = (0.2, 0.4, 0.4)$
D5	$k = 3$	$p = 3$	$(ω_{1}, α_{1}, β_{1}) = (0.2, 0.2, 0.6)$	$Λ = I_{3}$
			$(ω_{2}, α_{2}, β_{2}) = (0.2, 0.4, 0.4)$
			$(ω_{3}, α_{3}, β_{3}) = (0.1, 0.1, 0.8)$

Table 3: Rejection rates under the null: Design D1

$T \ Tests$	CF1			CF2			DG				DR
$T \ Tests$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	DG1	DG2	M-DG1	M-DG2	DR	M-DR
$1000$	0.0850	0.0640	$0.0420$	0.0395	$0.0185$	0.0100	$0.3975$	$0.4015$	0.0140	0.0160	0.1740	0.0075
$2000$	0.0940	0.0715	$0.0530$	0.0550	$0.0320$	0.0120	$0.5060$	$0.5045$	0.0290	0.0315	0.2855	0.0125
$5000$	0.1010	0.0740	$0.0515$	0.0505	$0.0290$	0.0075	$0.6215$	$0.6185$	0.0485	0.0510	0.3805	0.0185
$10000$	0.1010	0.0820	$0.0585$	0.0550	$0.0285$	0.0090	$0.6375$	$0.6270$	0.0480	0.0545	0.4005	0.0240
$20000$	0.1005	0.0725	$0.0525$	0.0495	$0.0285$	0.0115	$0.6750$	$0.6705$	0.0425	0.0550	0.4405	0.0225
$40000$	0.1180	0.0900	$0.0670$	0.0700	$0.0410$	0.0165	$0.6865$	$0.6845$	0.0635	0.0625	0.4710	0.0400
$50000$	0.1070	0.0830	$0.0660$	0.0665	$0.0410$	0.0145	$0.6895$	$0.6870$	0.0425	0.0515	0.4430	0.0335

Table 4: Rejection rates under the null: Design D3

$T \ Tests$	CF1			CF2			DG				DR
$T \ Tests$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	DG1	DG2	M-DG1	M-DG2	DR	M-DR
$1000$	0.0605	0.0390	$0.0285$	0.0660	$0.0605$	0.0430	$0.2300$	$0.2400$	0.0025	0.0030	0.0305	0.0000
$2000$	0.0645	0.0385	$0.0280$	0.0655	$0.0570$	0.0380	$0.3425$	$0.3470$	0.0040	0.0040	0.0565	0.0005
$5000$	0.0520	0.0385	$0.0315$	0.0505	$0.0455$	0.0275	$0.3970$	$0.3965$	0.0025	0.0015	0.0715	0.0000
$10000$	0.0690	0.0565	$0.0450$	0.0830	$0.0665$	0.0320	$0.4385$	$0.4415$	0.0030	0.0040	0.0960	0.0000
$20000$	0.0660	0.0600	$0.0490$	0.0850	$0.0660$	0.0335	$0.4765$	$0.4790$	0.0070	0.0065	0.1145	0.0005
$40000$	0.0520	0.0460	$0.0390$	0.0645	$0.0475$	0.0225	$0.5030$	$0.5065$	0.0025	0.0040	0.1175	0.0000
$50000$	0.0745	0.0670	$0.0585$	0.0920	$0.0635$	0.0395	$0.5255$	$0.5290$	0.0065	0.0040	0.1540	0.0005

Table 5: Rejection rates under the null: Design D4

$T \ Tests$	CF1			CF2			DG				DR
$T \ Tests$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	DG1	DG2	M-DG1	M-DG2	DR	M-DR
$1000$	0.0715	0.0445	$0.0265$	0.1305	$0.0915$	0.0415	$0.4795$	$0.4870$	0.0240	0.0240	0.1795	0.0010
$2000$	0.0895	0.0515	$0.0380$	0.1485	$0.0935$	0.0330	$0.6380$	$0.6515$	0.0345	0.0335	0.3210	0.0055
$5000$	0.1055	0.0720	$0.0545$	0.1590	$0.0960$	0.0300	$0.7810$	$0.7820$	0.0400	0.0400	0.4625	0.0075
$10000$	0.1135	0.0615	$0.0485$	0.1440	$0.0750$	0.0290	$0.8055$	$0.8030$	0.0445	0.0370	0.4840	0.0080
$20000$	0.1155	0.0715	$0.0555$	0.1530	$0.0960$	0.0290	$0.8495$	$0.8485$	0.0565	0.0460	0.5555	0.0170
$40000$	0.1280	0.0810	$0.0640$	0.1655	$0.0900$	0.0300	$0.8650$	$0.8670$	0.0635	0.0700	0.5650	0.0145
$50000$	0.1150	0.0775	$0.0660$	0.1650	$0.0855$	0.0260	$0.8610$	$0.8590$	0.0535	0.0685	0.5980	0.0125

Table 6: Rejection rates under the alternative: Design D2

$T \ Tests$	CF1			CF2			DG		DR
$T \ Tests$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	M-DG1	M-DG2	M-DR
$1000$	0.6450	0.5915	$0.5050$	0.7255	$0.6890$	0.5570	0.2420	0.2170	0.3740
$2000$	0.9410	0.9185	$0.8805$	0.9530	$0.9365$	0.8785	0.4935	0.3945	0.8325
$5000$	0.9975	0.9975	$0.9960$	0.9995	$0.9990$	0.9950	0.9070	0.9180	0.9940
$10000$	0.9980	0.9980	$0.9975$	0.9985	$0.9985$	0.9985	0.9995	0.9995	0.9985
$20000$	0.9985	0.9990	$0.9985$	0.9995	$0.9995$	0.9985	1.0000	1.0000	0.9985
$40000$	0.9995	0.9995	$0.9995$	1.0000	$1.0000$	1.0000	1.0000	1.0000	0.9950
$50000$	0.9995	0.9995	$0.9995$	0.9995	$0.9995$	0.9995	1.0000	1.0000	0.9995

Table 7: Rejection rates under the alternative: Design D5

$T \ Tests$	CF1			CF2			DG		DR
$T \ Tests$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	$T^{- 1 / 4}$	$T^{- 1 / 3}$	$T^{- 2 / 5}$	M-DG1	M-DG2	M-DR
$1000$	0.1240	0.0740	$0.0630$	0.3990	$0.3645$	0.3000	0.0385	0.0395	0.0140
$2000$	0.3520	0.2710	$0.2300$	0.6975	$0.6675$	0.5570	0.1065	0.0870	0.1295
$5000$	0.8250	0.7710	$0.7255$	0.9610	$0.9460$	0.8885	0.3470	0.3365	0.6675
$10000$	0.9865	0.9850	$0.9755$	0.9995	$0.9985$	0.9955	0.5945	0.6765	0.9420
$20000$	0.9980	0.9970	$0.9955$	1.0000	$1.0000$	1.0000	0.6385	0.6005	0.9665
$40000$	1.0000	1.0000	$0.9985$	1.0000	$1.0000$	1.0000	0.7225	0.7135	0.9710
$50000$	0.9995	0.9995	$0.9990$	1.0000	$1.0000$	1.0000	0.7755	0.7445	0.9765

$a ≲ b$	$a \leq M b$ for some constant $M$ that is universal in the proof.
$A^{ϵ}$	For $A$ in a metric space $(T, d)$, $A^{ϵ} \equiv \{t \in T : \inf_{a \in A} d(t, a) \leq ϵ\}$.
$𝐌^{m \times k}$	The space of $m \times k$ real matrices.
$ℓ^{\infty}(T)$	For a set $T$, $ℓ^{\infty}(T) \equiv \{f : T \to 𝐑 : \sup_{t \in T} |f(t)| < \infty\}$.
$C(T)$	For a set $T$, $C(T) \equiv \{f : T \to 𝐑 : \sup_{t \in T} |f(t)| < \infty \text{ and } f \text{ is continuous}\}$.
$C^{1}(T)$	For a set $T \subset 𝐑^{k}$, $C^{1}(T)$ is the set of continuously differentiable functions on $T$.
$d_{H}(\cdot, \cdot)$	For sets $A, B$, $d_{H}(A, B)$ is the Hausdorff distance between $A$ and $B$.

Equations (579)

$$r_{n}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime}(\mathbb{G})~,$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~,$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~.$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})-\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})\}~.$$

$$\phi(\theta_{0})=(E[X])^{2}~.$$

$$\phi(\theta_{0})=(\max\{\theta_{0},0\})^{2}~,$$

$$\phi(\theta_{0})=\int(F-F_{0})^{2}\,dF_{0}~.$$

$$\phi(\theta_{0})=\int_{\mathbf{R}}\max\{F^{(1)}(u)-F^{(2)}(u),0\}^{2}w(u)\,du~,$$

$$\phi(\theta_{0})=\sup_{f\in\mathcal{F}}\big\{[\max(E[Z^{(1)}f(W)],0)]^{2}+(E[Z^{(2)}f(W)])^{2}\big\}~.$$

$$\phi(\theta_{0})=\inf_{\gamma\in\Gamma}E[g(X,\gamma)]^{\intercal}WE[g(X,\gamma)]~.$$

$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)}{t_{n}}-\phi_{\theta}^{\prime}(h)\Big\|_{\mathbb{E}}=0~,$$

$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)-t_{n}\phi_{\theta}^{\prime}(h_{n})}{t_{n}^{2}}-\phi_{\theta}^{\prime\prime}(h)\Big\|_{\mathbb{E}}=0~,$$

$$\phi(\theta+t_{n}h_{n})=\phi(\theta)+\sum_{j=1}^{p-1}t_{n}^{j}\phi_{\theta}^{(j)}(h_{n})+t_{n}^{p}\phi_{\theta}^{(p)}(h)+o(t_{n}^{p})~,$$

$$\phi_{\theta}^{\prime}(h)=2\theta h~,\quad\phi_{\theta}^{\prime\prime}(h)=h^{2}~.$$

$$\phi_{\theta}^{\prime\prime}(h)=\min_{\gamma_{0}\in\Gamma_{0}(\theta)}h(\gamma_{0})^{\intercal}W^{1/2}M(\gamma_{0})W^{1/2}h(\gamma_{0})~,$$

$$\phi_{\theta}^{\prime\prime}(h)=h(\gamma_{0})^{\intercal}W^{1/2}M(\gamma_{0})W^{1/2}h(\gamma_{0})~,$$

$$r_{n}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime}(\mathbb{G})\equiv 0~.$$

$$\Big[\phi(\hat{\theta}_{n})-\frac{c_{1-\alpha/2}}{r_{n}},\,\phi(\hat{\theta}_{n})-\frac{c_{\alpha/2}}{r_{n}}\Big]=\{\phi(\hat{\theta}_{n})\}~,$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}=\phi_{\theta_{0}}^{\prime\prime}(r_{n}\{\hat{\theta}_{n}-\theta_{0}\})+o_{p}(1)~.$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~.$$

$$\phi(\theta_{0}+t_{n}h_{n})=\phi(\theta_{0})+t_{n}\phi_{\theta_{0}}^{\prime}(h_{n})+t_{n}^{2}\phi_{\theta_{0}}^{\prime\prime}(h)+o(t_{n}^{2})~,$$

$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~.$$

$$r_{n}^{p}\big[\phi(\hat{\theta}_{n})-\{\phi(\theta_{0})+\sum_{j=1}^{p-1}\phi_{\theta_{0}}^{(j)}(\hat{\theta}_{n}-\theta_{0})\}\big]=\phi_{\theta_{0}}^{(p)}(r_{n}\{\hat{\theta}_{n}-\theta_{0}\})+o_{p}(1)~,$$

$$r_{n}^{p}\big[\phi(\hat{\theta}_{n})-\{\phi(\theta_{0})+\sum_{j=1}^{p-1}\phi_{\theta_{0}}^{(j)}(\hat{\theta}_{n}-\theta_{0})\}\big]\xrightarrow{L}\phi_{\theta_{0}}^{(p)}(\mathbb{G})~.$$

$$d_{BL}(L_{1},L_{2})\equiv\sup_{f\in BL_{1}(\mathbb{D})}\Big|\int f\,dL_{1}-\int f\,dL_{2}\Big|~,$$

$$BL_{1}(\mathbb{D})\equiv\Big\{f:\mathbb{D}\to\mathbf{R}:\sup_{t\in\mathbb{D}}|f(t)|+\sup_{t_{1},t_{2}\in\mathbb{D},\,t_{1}\neq t_{2}}\frac{|f(t_{1})-f(t_{2})|}{\|t_{1}-t_{2}\|_{\mathbb{D}}}\leq 1\Big\}~.$$

$$d_{BL}(\hat{\mathbb{G}}_{n}^{*},\mathbb{G})=\sup_{f\in BL_{1}(\mathbb{D})}\big|E_{W}[f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})]-E[f(\mathbb{G})]\big|~,$$

Full text

Inference on Functionals under First Order Degeneracy

Qihui Chen

School of Management and Economics

The Chinese University of Hong Kong, Shenzhen

[email protected]

Zheng Fang

Department of Economics

Texas A&M University

[email protected]

We would like to thank Brendan Beare, Andres Santos, Yixiao Sun and anonymous referees for valuable suggestions that have helped greatly improve this paper. We are also grateful to Xiaohong Chen, Qi Li and seminar participants for helpful discussions and comments.

Abstract

This paper presents a unified second order asymptotic framework for conducting inference on parameters of the form $\phi(\theta_{0})$ , where $\theta_{0}$ is unknown but can be estimated by $\hat{\theta}_{n}$ , and $\phi$ is a known map that admits null first order derivative at $\theta_{0}$ . For a large number of examples in the literature, the second order Delta method reveals a nondegenerate weak limit for the plug-in estimator $\phi(\hat{\theta}_{n})$ . We show, however, that the “standard” bootstrap is consistent if and only if the second order derivative $\phi_{\theta_{0}}^{\prime\prime}=0$ under regularity conditions, i.e., the standard bootstrap is inconsistent if $\phi_{\theta_{0}}^{\prime\prime}\neq 0$ , and provides degenerate limits unhelpful for inference otherwise. We thus identify a source of bootstrap failures distinct from that in Fang_Santos2014HDD because the problem (of consistently bootstrapping a nondegenerate limit) persists even if $\phi$ is differentiable. We show that the correction procedure in Babu1984bootstrap can be extended to our general setup. Alternatively, a modified bootstrap is proposed when the map is in addition second order nondifferentiable. Both are shown to provide local size control under some conditions. As an illustration, we develop a test of common conditional heteroskedastic (CH) features, a setting with both degeneracy and nondifferentiability – the latter is because the Jacobian matrix is degenerate at zero and we allow the existence of multiple common CH features.

Keywords: First order degeneracy, Second order Delta method, Bootstrap consistency, Babu correction, Common CH features, $J$ -test.

JEL Classification: C12, C15

1 Introduction

There is a large number of inference problems in economics and statistics in which the parameter of interest is of the form $\phi(\theta_{0})$ , where $\theta_{0}$ is an unknown parameter depending on the underlying distribution of the data and $\phi$ is a known map. In these settings, it is common practice to employ the plug-in estimator $\phi(\hat{\theta}_{n})$ , where $\hat{\theta}_{n}$ is an estimator for $\theta_{0}$ , as a building block for conducting inference on $\phi(\theta_{0})$ . The Delta method asserts that if $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}\xrightarrow{L}\mathbb{G}$ for some sequence $r_{n}\uparrow\infty$ , then

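$$r_{n}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime}(\mathbb{G})~,\tag{1}$$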

provided $\phi$ is at least Hadamard directionally differentiable at $\theta_{0}$ , where $\phi_{\theta_{0}}^{\prime}$ is the derivative of $\phi$ at $\theta_{0}$ (Shapiro1991; Dumbgen1993). As powerful as the Delta method has proven to be (Vaart1998; Fang_Santos2014HDD), an implicit and yet crucial assumption for the convergence (1) to be useful for inferential purposes is that $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ or $\phi_{\theta_{0}}^{\prime}$ is nondegenerate, i.e., $\phi_{\theta_{0}}^{\prime}\neq 0$ . Unfortunately, such first order degeneracy arises frequently in asymptotic analysis, with applications including Wald tests or Wald type functionals (Wald1943tests; Engle1984Handbook), unconditional and conditional moment inequality models (AndrewsandSoares2010; Andrews_Shi2013CMI), Cramér-von Mises functionals (Darling1957KSCvM), the study of stochastic dominance (Linton2010), and the $J$ -test for overidentification in GMM settings (Hall_Horowitz1996bootstrap).

In the presence of first order degeneracy, one may resort to a higher order analysis for the sake of a nondegenerate limiting distribution. Shapiro2000inference established that if $\phi$ is second order Hadamard directionally differentiable (see Definition 2.2), a feature shared by the aforementioned examples, then

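$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~,\tag{2}$$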

where $\phi_{\theta_{0}}^{\prime\prime}$ denotes the second order derivative of $\phi$ at $\theta_{0}$ . Thus, when first order degeneracy occurs, (2) suggests that we may base our asymptotic analysis on

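$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})~.\tag{3}$$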

Usefulness of the limiting distribution in (3), however, relies on our ability to consistently estimate it. In this regard, Efron1979’s bootstrap seems to be a potential option. Specifically, if $\hat{\theta}_{n}^{*}$ is a bootstrap analog of $\hat{\theta}_{n}$ that works for estimating the law of $\mathbb{G}$ , then in view of (3) one may hope that

$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}\tag{4}$$

can be employed as an estimator for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ , at least when $\phi$ is smooth. Unfortunately, there are simple examples where the law of (4) conditional on the data, referred to as the standard bootstrap, fails to provide consistent estimates (Babu1984bootstrap).

As the first contribution of this paper, we show that the standard bootstrap (4) is consistent if and only if $\phi_{\theta_{0}}^{\prime\prime}=0$ under mild conditions. Thus, the standard bootstrap is necessarily inconsistent when $\phi_{\theta_{0}}^{\prime\prime}$ is nondegenerate, while when $\phi_{\theta_{0}}^{\prime\prime}$ is degenerate, the resulting asymptotic distribution is degenerate and hence not useful for inference. Therefore, the failure of the standard bootstrap is an inherent implication of first order degeneracy. It is worth noting that the failure of the standard bootstrap persists even when $\phi$ is differentiable. Hence, we identify a source of bootstrap inconsistency distinct from that in Fang_Santos2014HDD, i.e., nondifferentiability of the map $\phi$ , as explained further towards the end of this section.

Heuristically, the reason why the standard bootstrap fails is that even though $r_{n}^{2}\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})=0$ in the “real world”, its bootstrap counterpart is nondegenerate, i.e., $r_{n}^{2}\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})=O_{p}(1)$ , echoing Efron1979’s point that the bootstrap provides approximate frequency statements rather than approximate likelihood statements. This observation was picked up by Babu1984bootstrap who provided a consistent resampling procedure by including the first order correction term:

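$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})-\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})\}~.\tag{5}$$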

As the second contribution, we generalize the above modified bootstrap (5), referred to as the Babu correction, to settings that accommodate infinite dimensional models and a wide range of bootstrap schemes for $\hat{\theta}_{n}^{*}$ . However, we stress that the Babu correction is inappropriate when $\phi$ is only Hadamard directionally differentiable.
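To make the contrast between (4) and (5) concrete, the following is a minimal sketch for the scalar case $\phi(\theta)=\theta^{2}$ with $\theta_{0}=E[X]=0$ and $r_{n}=\sqrt{n}$. It is an illustration under assumptions of our own choosing (standard normal data, the nonparametric bootstrap, and the hypothetical helper name `bootstrap_draws`), not the general procedure. It relies only on the identity $n\{(\bar{X}^{*})^{2}-\bar{X}^{2}\}=n(\bar{X}^{*}-\bar{X})^{2}+2n\bar{X}(\bar{X}^{*}-\bar{X})$: the second term is the nonvanishing first order bootstrap term, and the correction removes exactly that term.

```python
import numpy as np


def bootstrap_draws(x, n_boot, rng):
    """Return draws of the standard and the Babu-corrected bootstrap statistics."""
    n = x.size
    xbar = x.mean()
    # Nonparametric bootstrap: resample the data with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    xbar_star = x[idx].mean(axis=1)
    # Standard bootstrap (4): n * {phi(xbar*) - phi(xbar)} with phi(t) = t**2.
    standard = n * (xbar_star**2 - xbar**2)
    # Correction (5): subtract the first order term phi'_{xbar}(h) = 2 * xbar * h.
    babu = standard - n * 2.0 * xbar * (xbar_star - xbar)
    return standard, babu


rng = np.random.default_rng(0)
x = rng.standard_normal(500)  # assumed design: E[X] = 0, so phi' vanishes at theta_0
standard, babu = bootstrap_draws(x, 2000, rng)

# The corrected draws equal n * (xbar* - xbar)**2 and are therefore nonnegative,
# like the chi-square type limit; the standard draws carry the extra cross term.
print("share of negative standard draws: ", np.mean(standard < 0))
print("share of negative corrected draws:", np.mean(babu < -1e-9))
print("mean of corrected draws vs. sample variance:", babu.mean(), x.var())
```

The corrected draws can be compared directly with the second order limit, whereas the standard draws retain a term of the same order that depends on the realized $\sqrt{n}\bar{X}$.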

As the third contribution, we follow Fang_Santos2014HDD and provide a modified bootstrap which is consistent regardless of the presence of first order degeneracy and nondifferentiability of $\phi$ . The insight we exploit is that the weak limit $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ in (3) is a composition of the limit $\mathbb{G}$ and the derivative $\phi_{\theta_{0}}^{\prime\prime}$ . Therefore, we may estimate the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ by composing a suitable estimator $\hat{\phi}_{n}^{\prime\prime}$ for $\phi_{\theta_{0}}^{\prime\prime}$ with a bootstrap approximation $r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ for $\mathbb{G}$ . Since the conditions on $\hat{\phi}_{n}^{\prime\prime}$ proposed by Fang_Santos2014HDD in order for this approach to work are either demanding or hard to check in our setup, we provide a high level condition that is easy to verify. We further demonstrate that numerical differentiation provides a desirable estimator $\hat{\phi}_{n}^{\prime\prime}$ in general; alternatively, we show how to estimate $\phi_{\theta_{0}}^{\prime\prime}$ by exploiting its structure in particular examples. Our inference procedures are also shown to enjoy the local size control property under a key condition that is algebraically simple.
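The composition idea can be sketched in a few lines for the scalar map $\phi(\theta)=(\max\{\theta,0\})^{2}$. The sketch below is illustrative only: the particular difference quotient, the step size $t_{n}=n^{-1/3}$, the nonparametric bootstrap, and the function names are all assumptions made here for concreteness, and the first order term is dropped from the quotient because it is presumed to vanish under degeneracy.

```python
import numpy as np


def phi(theta):
    """Squared positive part, phi(theta) = max(theta, 0)**2."""
    return np.maximum(theta, 0.0) ** 2


def second_derivative_estimate(theta_hat, h, t_n):
    """Numerical second order derivative at theta_hat in direction h.

    Assumed form: {phi(theta_hat + t_n * h) - phi(theta_hat)} / t_n**2,
    i.e. the first order term is omitted under first order degeneracy.
    """
    return (phi(theta_hat + t_n * h) - phi(theta_hat)) / t_n**2


def modified_bootstrap_critical_value(x, alpha=0.05, n_boot=2000, seed=0):
    """1 - alpha quantile of the estimated derivative composed with bootstrap draws."""
    rng = np.random.default_rng(seed)
    n = x.size
    theta_hat = x.mean()
    t_n = n ** (-1.0 / 3.0)  # assumed step size: t_n -> 0 while sqrt(n) * t_n -> infinity
    idx = rng.integers(0, n, size=(n_boot, n))
    # Bootstrap approximation sqrt(n) * (theta* - theta_hat) to the limit G.
    g_star = np.sqrt(n) * (x[idx].mean(axis=1) - theta_hat)
    draws = second_derivative_estimate(theta_hat, g_star, t_n)
    return np.quantile(draws, 1.0 - alpha)


rng = np.random.default_rng(1)
x = rng.standard_normal(400)       # assumed design with E[X] = 0
statistic = x.size * phi(x.mean())  # r_n**2 * phi(theta_hat) with r_n = sqrt(n)
print("statistic:", statistic)
print("critical value:", modified_bootstrap_critical_value(x))
```

The bootstrap draws enter only through the direction argument of the derivative estimator, so any resampling scheme that approximates the law of $\mathbb{G}$ could be substituted for the nonparametric bootstrap used here.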

Finally, to further demonstrate the applicability of our framework, we develop a test of common conditional heteroskedastic (CH) features studied by Dovonon_Renault2013testing but under weaker assumptions that allow more than one common CH feature. Thus, in addition to the first order identification failure they focused on, we further allow second order (and hence global) identification failures, which renders the functional involved highly (second-order) nondifferentiable as well as first order degenerate. Such a generalization is important because it is unknown a priori how many common features there are and in the context of asset pricing the number can be large (Engle_Ng_Rothschild1990asset). Moreover, the linear normalization in Dovonon_Renault2013testing can falsely exclude the existence of common features even when there does exist a unique common CH feature, a deficiency which we avoid by the unit-length normalization. Monte Carlo simulations indicate our tests substantially alleviate size distortion and have good power performance. We stress that first order degeneracy is of a nature different from that of the degeneracy of Jacobian matrices which is the focus of Dovonon_Renault2013testing; see Section 4 for details. Our approach may also be used to develop tests for other common features (Engle_Kozicki1993CF).

There have been extensive studies on the bootstrap consistency (Hall1992bootstrap; HorowitzBoot). It was realized soon after Efron1979 that the bootstrap is not always successful (BickelandFreedman1981bootstrap); see also Andrews2000Bootstrap for a summary. Babu1984bootstrap provided a simple example of bootstrap failure due to first order degeneracy, and established the validity of the Babu correction for the special case studied there. Shao1994bootstrap and Bertail_Politis_Romano1999subsampling showed that $m$ out of $n$ resampling and subsampling can serve as alternative remedies. There are, however, three reasons we choose not to use these methods. First, they entail the choice of tuning parameters while our proposal can work without such nuisances when $\phi$ is differentiable. Second, when $\phi$ is nondifferentiable, both can lead to invalid tests due to lack of uniform approximations (AndrewsandGuggen2010ET). We provide a simple algebraic condition which, together with regularity of $\hat{\theta}_{n}$ , delivers local uniformity of our inferential procedure. Third, they have been shown to be dominated by other inferential methods, for example, in moment inequality models (AndrewsandSoares2010) which our framework includes as special cases. Datta1995bootstrap revisited Babu’s example and offered a bias correction procedure that depends on a first stage shrinkage type estimator. Somewhat similar methods were later proposed in Andrews2000Bootstrap and Giurcanu2012bootsrtap. These methods are not easily extendable to more general settings.

Bootstrap inconsistency due to nondifferentiability of $\phi$ was studied in Dumbgen1993 and recently in Fang_Santos2014HDD who formally established that (first order) differentiability of $\phi$ is a necessary as well as sufficient condition for the standard bootstrap to work under regularity conditions. Our work complements theirs by identifying a different source of bootstrap failure. Specifically, given bootstrap consistency of $\hat{\theta}_{n}^{*}$ and if $\phi$ is first order degenerate (and hence fully differentiable!), then Fang_Santos2014HDD implies that the standard bootstrap $r_{n}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}$ is consistent for the law of $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ which is degenerate (and unhelpful for inference). We further show that the law of the second order limit $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ cannot be consistently estimated by the second order standard bootstrap (4) unless $\phi_{\theta_{0}}^{\prime\prime}$ itself is degenerate – this remains true regardless of whether $\phi$ is (second order) differentiable or not! Moreover, extra work is needed in order to show our bootstrap inferential procedures work well in the local uniformity sense. In applications, first order degeneracy and second order nondifferentiability are often mixed together, for example, in Romano_Shaikh2010, AndrewsandSoares2010, Linton2010, and Andrews_Shi2013CMI. The numerical differentiation approach of estimating derivatives was somewhat implicit in Dumbgen1993’s rescaled bootstrap, recently employed by Song2014minimax and studied by Hong_Li2015numericaldelta. We provide a more general condition that may be used to verify “consistency” of derivative estimators (not necessarily constructed via numerical differentiation). 
Our theory has been utilized in ChenFang2016Rank to develop a rank test where, unlike previous studies, the true rank is potentially strictly less than the hypothesized value, a longstanding problem in the literature.

We now introduce some notation. For a set $T$ , we let $\ell^{\infty}(T)$ denote the space of bounded real-valued functions defined on $T$ and $C(T)$ the space of real-valued continuous functions on a compact set $T$ (endowed with some topology). Both $\ell^{\infty}(T)$ and $C(T)$ are equipped with the uniform norm, i.e., $\|f\|_{\infty}\equiv\sup_{t\in T}|f(t)|$ . For a normed space $\mathbb{D}$ endowed with norm $\|\cdot\|_{\mathbb{D}}$ and $m\in\mathbf{N}$ , we equip the product space $\prod_{j=1}^{m}\mathbb{D}$ with the product norm $\max_{j=1}^{m}\|\theta^{(j)}-\vartheta^{(j)}\|_{\mathbb{D}}$ , denoted $\|\cdot\|_{\mathbb{D}}$ with some abuse of notation, for $\theta,\vartheta\in\prod_{j=1}^{m}\mathbb{D}$ , where $\theta^{(j)}$ and $\vartheta^{(j)}$ are the $j$ th coordinates of $\theta$ and $\vartheta$ respectively. For a subset $A\subset T$ , we write $1\{A\}$ for the indicator function of $A$ .

The remainder of the paper is structured as follows. Section 2 formalizes the general setup, shows the wide applicability of our framework by introducing related examples, and establishes the asymptotic framework by presenting a mild extension of the second order Delta method. Section 3 characterizes the inherent difficulties caused by first order degeneracy, extends the Babu correction to our general setup, and offers a flexible modified bootstrap procedure. Section 4 develops a test for common CH features that allows multiple common CH features, while Section 5 concludes. Appendix A demonstrates that our inferential procedure is robust to local perturbations of the distribution of the data under regularity conditions. The remaining appendices collect all the proofs and additional discussions.

2 Setup and Background

In this section, we formalize the general setup, introduce related examples, and review notions of differentiability based on which we present the second order Delta method.

2.1 General Setup

The treatment in this paper is general in the sense that we allow both the parameter $\theta_{0}$ and the map $\phi$ to take values in infinite dimensional spaces, though attention is confined to real-valued $\phi$ when studying tests. In particular, we assume $\theta_{0}\in\mathbb{D}_{\phi}\subset\mathbb{D}$ and $\phi:\mathbb{D}_{\phi}\to\mathbb{E}$ , where $\mathbb{D}$ and $\mathbb{E}$ are normed spaces with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively. Moreover, the data generating process is general as well in that the model can be parametric, semiparametric or nonparametric and that the data $\{X_{i}\}_{i=1}^{n}$ need not be i.i.d. However, we do impose the i.i.d. assumption in our local analysis, but only for simplicity. The results there can presumably be extended to general asymptotically normal experiments (Vaart_Wellner1990prohorov).

The common probability space on which all (random) maps are defined is the canonical one. For example, in the simplest i.i.d. setup, we think of the data $\{X_{i}\}_{i=1}^{n}$ as the coordinate projections on the first $n$ coordinates in the product probability space $(\prod_{i=1}^{\infty}\mathscr{X},\bigotimes_{i=1}^{\infty}\mathcal{A},\prod_{i=1}^{\infty}P)$ where $(\mathscr{X},\mathcal{A})$ is the sample space each $X_{i}$ lives in and $P$ is the common Borel probability measure that governs each $X_{i}$ . In the presence of bootstrap weights, we further think of the product space as the “first $\infty$ ” coordinates of the even “larger” product space $\big((\prod_{i=1}^{\infty}\mathscr{X})\times\mathscr{W},(\bigotimes_{i=1}^{\infty}\mathcal{A})\otimes\mathcal{W},(\prod_{i=1}^{\infty}P)\times Q\big)$ , where $(\mathscr{W},\mathcal{W},Q)$ governs the infinite sequence of bootstrap weights.

Given the generality of our setup, weak convergence throughout the paper is meant in the Hoffmann-Jørgensen sense (Vaart1996). Expectations and probabilities should therefore be interpreted as outer expectations and outer probabilities respectively defined relative to the canonical probability space, though we obviate the distinction in the notation. The notation is made explicit in the appendices whenever differentiating between inner and outer expectations is necessary.

2.2 Related Examples

To fix ideas, we now turn to related examples that serve to illustrate the wide applicability of our framework. The first example is taken from Babu1984bootstrap, which provides an easy illustration of bootstrap inconsistency in the presence of first order degeneracy even if the transformation $\phi$ is smooth.

Example 2.1 (Wald Functional: Squared Mean).

Let $X\in\mathbf{R}$ be a random variable, and suppose that we are interested in conducting inference on

$$\phi(\theta_{0})=(E[X])^{2}\ .$$

Here, $\theta_{0}=E[X]$ , $\mathbb{D}=\mathbb{E}=\mathbf{R}$ , and $\phi:\mathbf{R}\to\mathbf{R}$ is defined by $\phi(\theta)=\theta^{2}$ . In fact, $\phi$ is a special case of the more general quadratic functionals of the form $\|W\theta\|^{2}$ for $\theta\in\mathbf{R}^{k}$ and $W$ a $k\times k$ weighting matrix. This seemingly toy example also arises in VAR models for inference on impulse responses (Benkwitz_Neumann_Lutekpohl2000) and in some nonseparable models with structural measurement errors (Hoderlein_Winter2010). ∎

The second example is a special case of the unconditional moment inequality models studied in CHT2007, Romano_Shaikh2008; Romano_Shaikh2010, AndrewsandGuggen2009ET, and AndrewsandSoares2010.

Example 2.2 (Unconditional Moment Inequalities).

Let $X\in\mathbf{R}$ be a scalar random variable and suppose we want to test the moment inequality $E[X]\leq 0$ . The modified method of moments approach is based on estimating the functional

$$\phi(\theta_{0})=(\max\{E[X],0\})^{2}\ ,$$

where $\theta_{0}=E[X]$ , $\mathbb{D}=\mathbb{E}=\mathbf{R}$ , and $\phi:\mathbf{R}\to\mathbf{R}$ is defined by $\phi(\theta)=(\max\{\theta,0\})^{2}$ . The functional $\phi$ can be easily adapted to handle general moment inequality models.∎

The third example concerns the classical Cramér-von Mises functional employed to test goodness of fit (Darling1957KSCvM; Vaart1998).

Example 2.3 (Cramér-von Mises Functional).

Suppose that we are interested in testing if the distribution function of a random vector $X\in\mathbf{R}^{d_{x}}$ is a given function $F_{0}$ . The Cramér-von Mises approach considers the functional

$$\phi(\theta_{0})=\int(F-F_{0})^{2}\,dF_{0}\ .$$

Here, $\theta_{0}=F$ , $\mathbb{D}=\ell^{\infty}(\mathbf{R}^{d_{x}})$ , $\mathbb{E}=\mathbf{R}$ , and $\phi:\ell^{\infty}(\mathbf{R}^{d_{x}})\to\mathbf{R}$ is defined to be $\phi(\theta)=\int(\theta-F_{0})^{2}\,dF_{0}$ . More generally, it is possible to test if $F$ belongs to a parametric family $\{F_{\gamma}:\gamma\in\Gamma\}$ by studying $\phi(\theta_{0})=\inf_{\gamma\in\Gamma}\int(\theta_{0}-F_{\gamma})^{2}\,dF_{\gamma}$ . ∎
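As a concrete illustration of the plug-in estimator $\phi(\hat{\theta}_{n})$ in this example, the following minimal sketch evaluates $\int(F_{n}-F_{0})^{2}\,dF_{0}$ for a scalar sample, under the assumption (ours, for illustration only) that $F_{0}$ is the uniform distribution on $[0,1]$ and $F_{n}$ is the empirical cdf. The function names are hypothetical. The sketch also checks the value against the textbook computing formula for the Cramér-von Mises statistic, which equals $n\phi(F_{n})$ in this special case.

```python
import numpy as np

def cvm_functional(sample):
    """Exact value of phi(F_n) = int_0^1 (F_n(u) - u)^2 du when F_0 is Uniform(0, 1)."""
    u = np.sort(np.asarray(sample, dtype=float))
    n = u.size
    # F_n equals i/n on [u_(i), u_(i+1)), with u_(0) = 0 and u_(n+1) = 1.
    knots = np.concatenate(([0.0], u, [1.0]))
    levels = np.arange(n + 1) / n
    a, b = knots[:-1], knots[1:]
    # On each piece: int_a^b (c - x)^2 dx = ((b - c)^3 - (a - c)^3) / 3.
    return float(np.sum(((b - levels) ** 3 - (a - levels) ** 3) / 3.0))

def cvm_statistic_closed_form(sample):
    """Textbook computing formula: 1/(12n) + sum_i (u_(i) - (2i - 1)/(2n))^2."""
    u = np.sort(np.asarray(sample, dtype=float))
    n = u.size
    i = np.arange(1, n + 1)
    return float(1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(size=200)          # data drawn under the null F = F_0
phi_hat = cvm_functional(x)        # plug-in estimate of phi(theta_0) = 0
scaled = x.size * phi_hat          # r_n^2 * phi(theta_hat) with r_n = sqrt(n)
print(phi_hat, scaled, cvm_statistic_closed_form(x))
```

Under the null, $\phi(\theta_{0})=0$ and it is the $n$-scaled quantity, not the $\sqrt{n}$-scaled one, that has a nondegenerate limit.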

The fourth example, closely related to but significantly different from Example 2.3, is based on Linton2010 for testing stochastic dominance.

Example 2.4 (Stochastic Dominance).

Let $X=(X^{(1)},X^{(2)})^{\intercal}\in\mathbf{R}^{2}$ be continuously distributed, and define the marginal cdfs $F^{(j)}(u)\equiv P(X^{(j)}\leq u)$ for $j\in\{1,2\}$ . For a weighting function $w:\mathbf{R}\rightarrow\mathbf{R}^{+}\equiv\{x\in\mathbf{R}:x\geq 0\}$ , Linton2010 estimate

$$\phi(\theta_{0})=\int\max\{F^{(1)}(u)-F^{(2)}(u),0\}^{2}w(u)\,du \tag{8}$$

to construct a test of whether $X^{(1)}$ first order stochastically dominates $X^{(2)}$ . In this example, we set $\theta_{0}=(F^{(1)},F^{(2)})$ , $\mathbb{D}=\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ , $\mathbb{E}=\mathbf{R}$ and $\phi(\theta)=\int\max\{\theta^{(1)}(u)-\theta^{(2)}(u),0\}^{2}w(u)du$ for any $\theta\equiv(\theta^{(1)},\theta^{(2)})\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ . We note that the Cramér-von Mises type functionals in Andrews_Shi2013CMI; Andrews_Shi2014CMI share the common structure of the functional $\phi$ in (8) and hence can be handled by our framework as well.∎

The fifth example is a special case of the Kolmogorov-Smirnov type functionals for inference on conditional moment inequalities studied by Andrews_Shi2013CMI.

Example 2.5 (Conditional Moment Inequalities).

Let $Z\in\mathbf{R}^{2}$ and $W\in\mathbf{R}^{d_{w}}$ be random vectors satisfying $E[Z^{(1)}|W]\leq 0$ and $E[Z^{(2)}|W]=0$ . For a suitably chosen class of nonnegative functions $\mathcal{F}$ on $\mathbf{R}^{d_{w}}$ , the above conditional moment inequality is equivalent to $E[Z^{(1)}f(W)]\leq 0$ and $E[Z^{(2)}f(W)]=0$ for all $f\in\mathcal{F}$ . Andrews_Shi2013CMI propose testing the above restriction by estimating the functional

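$$\phi(\theta_{0})=\sup_{f\in\mathcal{F}}\big\{[\max(E[Z^{(1)}f(W)],0)]^{2}+(E[Z^{(2)}f(W)])^{2}\big\}\ .$$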

Here, $\theta_{0}\in\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ satisfies $\theta_{0}(f)=E[Zf(W)]$ for all $f\in\mathcal{F}$ , $\mathbb{D}=\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ , $\mathbb{E}=\mathbf{R}$ , and $\phi:\mathbb{D}\to\mathbb{E}$ is given by $\phi(\theta)=\sup_{f\in\mathcal{F}}\{[\max(\theta^{(1)}(f),0)]^{2}+[\theta^{(2)}(f)]^{2}\}$ . ∎

Our final example is concerned with the $J$ -test of overidentification in GMM settings proposed by Sargan1958iv; Sargan1959IV and further developed in Hansen1982.

Example 2.6 (Overidentification Test).

Let $X\in\mathbf{R}^{d_{x}}$ be a random vector and consider the model defined by the moment restriction $E[g(X,\gamma_{0})]=0$ for some $\gamma_{0}\in\Gamma\subset\mathbf{R}^{k}$ where $g:\mathbf{R}^{d_{x}}\times\Gamma\to\mathbf{R}^{m}$ is a known function with $m>k$ . The conventional $J$ -test can be recast by estimating the functional $\phi$ defined as: for some known $m\times m$ symmetric positive definite matrix $W$ ,

$$\phi(\theta_{0})=\inf_{\gamma\in\Gamma}E[g(X,\gamma)]^{\intercal}WE[g(X,\gamma)]\ .$$

Here, $\theta_{0}\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ is defined by $\theta_{0}(\gamma)=E[g(X,\gamma)]$ , $\mathbb{D}=\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ , $\mathbb{E}=\mathbf{R}$ , and $\phi:\prod_{j=1}^{m}\ell^{\infty}(\Gamma)\to\mathbf{R}$ is defined by $\phi(\theta)=\inf_{\gamma\in\Gamma}\theta(\gamma)^{\intercal}W\theta(\gamma)$ . The bootstrap for the $J$ statistic has been studied by Hall_Horowitz1996bootstrap and Andrews2002higher. Note that $\theta_{0}$ is always identified even though $\gamma_{0}$ is potentially partially identified, which makes $\phi$ second order nondifferentiable as will be shown below. ∎

2.3 Concepts of Differentiability

All examples in the previous subsection exhibit first order degeneracy, i.e., there exist points $\theta$ in $\mathbb{D}$ such that the first order derivative $\phi_{\theta}^{\prime}$ is zero, and in some cases $\phi$ is not even differentiable at $\theta$ , which can be seen from Examples 2.1 and 2.2 respectively. As such, we resort to a second order expansion that handles first order degeneracy and meanwhile accommodates potential nondifferentiability of $\phi$ . Let us proceed by recalling notions of first order differentiability (Shapiro1990; Fang_Santos2014HDD).

Definition 2.1.

Let $\mathbb{D}$ and $\mathbb{E}$ be normed spaces equipped with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively, and $\phi:\mathbb{D}_{\phi}\subseteq\mathbb{D}\to\mathbb{E}$ .

(i)

The map $\phi$ is said to be Hadamard differentiable at $\theta\in\mathbb{D}_{\phi}$ tangentially to a set $\mathbb{D}_{0}\subseteq\mathbb{D}$ , if there is a continuous linear map $\phi_{\theta}^{\prime}:\mathbb{D}_{0}\to\mathbb{E}$ such that:

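$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)}{t_{n}}-\phi_{\theta}^{\prime}(h)\Big\|_{\mathbb{E}}=0$$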

for all sequences $\{h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}$ such that $t_{n}\to 0$ , $h_{n}\to h\in\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all $n$ .

(ii)

The map $\phi$ is said to be Hadamard directionally differentiable at $\theta\in\mathbb{D}_{\phi}$ tangentially to a set $\mathbb{D}_{0}\subseteq\mathbb{D}$ , if there is a continuous map $\phi_{\theta}^{\prime}:\mathbb{D}_{0}\to\mathbb{E}$ such that: [Footnote 1: We note that the “tangential set” in Shapiro1991 refers to the domain of $\phi$ (i.e., $\mathbb{D}_{\phi}$ in our context), whereas here it refers to the domain $\mathbb{D}_{0}$ of the derivative $\phi_{\theta}^{\prime}$ .]

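$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)}{t_{n}}-\phi_{\theta}^{\prime}(h)\Big\|_{\mathbb{E}}=0 \tag{12}$$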

for all sequences $\{h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}_{+}$ such that $t_{n}\downarrow 0$ , $h_{n}\to h\in\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all $n$ .

Inspecting Definition 2.1, we see that the main difference between Hadamard differentiability and directional differentiability lies in the linearity of the derivative. This turns out to be the exact gap between these two notions of differentiability. In particular, (12) ensures that the directional derivative $\phi_{\theta}^{\prime}$ is necessarily continuous and positively homogeneous of degree one, though potentially nonlinear (Shapiro1990).

Given the notions of differentiability just introduced, and in view of the remarkable fact that the Delta method remains valid even under Hadamard directional differentiability for the purpose of deriving asymptotic distributions (Shapiro1991; Dumbgen1993), it seems a natural next step to invoke the Delta method. However, in the presence of first order degeneracy, the resulting limiting distribution is degenerate at zero, which poses substantial challenges for inference. In essence, the Delta method is a stochastic version of the Taylor expansion. Therefore, one could go one step further and explore the quadratic term when the linear term is degenerate. We thus follow Shapiro2000inference and define:

Definition 2.2.

Let $\phi:\mathbb{D}_{\phi}\subseteq\mathbb{D}\to\mathbb{E}$ be a map as in Definition 2.1.

(i)

Suppose that $\phi:\mathbb{D}_{\phi}\to\mathbb{E}$ is Hadamard differentiable tangentially to $\mathbb{D}_{0}\subset\mathbb{D}$ such that the derivative $\phi_{\theta}^{\prime}:\mathbb{D}_{0}\to\mathbb{E}$ is well defined on $\mathbb{D}$ . We say that $\phi$ is second order Hadamard differentiable at $\theta\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ if there is a bilinear map $\Phi_{\theta}^{\prime\prime}:\mathbb{D}_{0}\times\mathbb{D}_{0}\to\mathbb{E}$ such that: for $\phi_{\theta}^{\prime\prime}(h)\equiv\Phi_{\theta}^{\prime\prime}(h,h)$ ,

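$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)-t_{n}\phi_{\theta}^{\prime}(h_{n})}{t_{n}^{2}}-\phi_{\theta}^{\prime\prime}(h)\Big\|_{\mathbb{E}}=0$$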

for all sequences $\{h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}^{+}$ such that $t_{n}\to 0$ , $h_{n}\to h\in\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all $n$ .

(ii)

Suppose that $\phi:\mathbb{D}_{\phi}\to\mathbb{E}$ is Hadamard directionally differentiable tangentially to $\mathbb{D}_{0}\subset\mathbb{D}$ such that the derivative $\phi_{\theta}^{\prime}:\mathbb{D}_{0}\to\mathbb{E}$ is well defined on $\mathbb{D}$ . We say that $\phi$ is second order Hadamard directionally differentiable at $\theta\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ if there is a map $\phi_{\theta}^{\prime\prime}:\mathbb{D}_{0}\to\mathbb{E}$ such that: [Footnote 2: Compared with Shapiro2000inference, we omit the factor $\frac{1}{2}$ in the denominator for notational compactness.]

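$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)-t_{n}\phi_{\theta}^{\prime}(h_{n})}{t_{n}^{2}}-\phi_{\theta}^{\prime\prime}(h)\Big\|_{\mathbb{E}}=0$$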

for all sequences $\{h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}^{+}$ such that $t_{n}\downarrow 0$ , $h_{n}\to h\in\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all $n$ .

The second order derivative $\phi_{\theta}^{\prime\prime}$ in both cases is necessarily continuous on $\mathbb{D}_{0}$ , which can be shown in a straightforward manner as in the proof of Proposition 3.1 in Shapiro1990. Similar in spirit to Definition 2.1, the key difference between the above two notions of second order differentiability is that the former is a quadratic form corresponding to a bilinear map while the latter is in general only positively homogeneous of degree two, i.e., $\phi_{\theta}^{\prime\prime}(th)=t^{2}\phi_{\theta}^{\prime\prime}(h)$ for all $t\geq 0$ and all $h\in\mathbb{D}_{0}$ . Note that it is possible that $\phi$ is first order Hadamard differentiable but only second order Hadamard directionally differentiable (see Example 2.2). In all our examples, $\phi$ is first order Hadamard differentiable though $\phi_{\theta}^{\prime}$ may be degenerate; see Subsection 2.3.1. We stress that requiring $\phi_{\theta}^{\prime}$ to be well defined on the entirety of $\mathbb{D}$ does not demand differentiability on $\mathbb{D}$ . Instead, it just means that $\phi_{\theta}^{\prime}$ can take elements potentially not in $\mathbb{D}_{0}$ as arguments. Finally, we note that first and second order (directional) derivatives share the same domain $\mathbb{D}_{0}$ .
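To make the distinction between a quadratic form and a map that is merely positively homogeneous of degree two concrete, the sketch below evaluates the second order difference quotient for the map $\phi(\theta)=(\max\{\theta,0\})^{2}$ of Example 2.2 at $\theta=0$. A direct calculation (ours, not quoted from the paper) gives $\phi_{0}^{\prime}=0$ and $\phi_{0}^{\prime\prime}(h)=(\max\{h,0\})^{2}$; the function names and the step size `t` are illustrative choices.

```python
def phi(theta):
    """Map from Example 2.2: phi(theta) = max(theta, 0)^2."""
    return max(theta, 0.0) ** 2

def second_order_quotient(h, t, theta=0.0):
    """(phi(theta + t*h) - phi(theta)) / t^2; the first order term is zero at theta = 0."""
    return (phi(theta + t * h) - phi(theta)) / t ** 2

t = 1e-4
q_pos = second_order_quotient(3.0, t)      # approximates max(3, 0)^2 = 9
q_neg = second_order_quotient(-3.0, t)     # approximates max(-3, 0)^2 = 0
q_scaled = second_order_quotient(6.0, t)   # degree-two homogeneity: 4 * q_pos
print(q_pos, q_neg, q_scaled)
```

A quadratic form $\Phi(h,h)$ with $\Phi$ bilinear would satisfy $\phi^{\prime\prime}(-h)=\phi^{\prime\prime}(h)$, which fails here, so the derivative is directional only.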

If $\phi_{\theta}^{\prime\prime}$ in turn is degenerate, one can go beyond the second order, a possibility we do not pursue at length in this paper; see Remark 2.1.

Remark 2.1.

Suppose that $\phi:\mathbb{D}_{\phi}\subseteq\mathbb{D}\to\mathbb{E}$ is $(p-1)$ -th order Hadamard directionally differentiable tangentially to $\mathbb{D}_{0}\subset\mathbb{D}$ such that the derivative $\phi_{\theta}^{(j)}:\mathbb{D}_{0}\to\mathbb{E}$ is well defined on $\mathbb{D}$ for all $j=1,\ldots,p-1$ , where $p\geq 2$ . Then we say that $\phi$ is $p$ th order Hadamard directionally differentiable at $\theta\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ if there is a map $\phi_{\theta}^{(p)}:\mathbb{D}_{0}\to\mathbb{E}$ such that:

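$$\lim_{n\to\infty}\Big\|\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)-\sum_{j=1}^{p-1}t_{n}^{j}\phi_{\theta}^{(j)}(h_{n})}{t_{n}^{p}}-\phi_{\theta}^{(p)}(h)\Big\|_{\mathbb{E}}=0$$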

for all sequences $\{h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}^{+}$ such that $t_{n}\downarrow 0$ , $h_{n}\to h\in\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all $n$ . Note that, similar to the treatment of $\phi_{\theta}^{\prime\prime}$ , the factors $1/j!$ are incorporated in the definition of the derivatives $\phi_{\theta}^{(j)}$ to reflect their nature as approximating maps. Demyanov1974Minimax established the above high order expansion for $\mathbb{D}=\mathbf{R}^{k}$ with $k\in\mathbf{N}$ and $\mathbb{E}=\mathbf{R}$ ; see also Demyanov2009Minimax. [Footnote 3: We thank an anonymous referee for bringing this reference to our attention.] ∎

2.3.1 Examples Revisited

From now on, we shall focus on Examples 2.1 and 2.6 exclusively for conciseness; Examples 2.2, 2.3, 2.4 and 2.5 will be treated in Appendix C.

Example 2.1 (Continued).

In this example, the functional involved is second order Hadamard differentiable. Trivially we have

$$\phi_{\theta}^{\prime}(h)=2\theta h\ ,\qquad\phi_{\theta}^{\prime\prime}(h)=h^{2}\ .$$

Note that the first order derivative $\phi_{\theta}^{\prime}$ is degenerate when $\theta=0$ , whereas $\phi_{\theta}^{\prime\prime}$ is everywhere nondegenerate. The bilinear map $\Phi_{\theta}^{\prime\prime}:\mathbf{R}^{2}\to\mathbf{R}$ here is given by $\Phi_{\theta}^{\prime\prime}(h,g)=hg$ . ∎
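For completeness, the derivatives in Example 2.1 can be verified directly from Definition 2.2(i). The short derivation below is a worked check under the definitions above (with the factor $\frac{1}{2}$ omitted, as in the text), not a quotation from the paper.

```latex
\begin{align*}
\phi(\theta+t_{n}h_{n})-\phi(\theta)
  &= (\theta+t_{n}h_{n})^{2}-\theta^{2}
   = t_{n}\cdot 2\theta h_{n}+t_{n}^{2}h_{n}^{2}, \\
\frac{\phi(\theta+t_{n}h_{n})-\phi(\theta)-t_{n}\phi_{\theta}^{\prime}(h_{n})}{t_{n}^{2}}
  &= h_{n}^{2}\;\longrightarrow\; h^{2}=\Phi_{\theta}^{\prime\prime}(h,h)
  \quad\text{as } h_{n}\to h .
\end{align*}
```

In particular, the quotient does not depend on $\theta$, so $\phi_{\theta}^{\prime\prime}(h)=h^{2}$ at every $\theta$, while $\phi_{\theta}^{\prime}(h)=2\theta h$ vanishes identically only at $\theta=0$.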

In Example 2.6, the domain $\mathbb{D}_{0}$ of the derivative $\phi_{\theta_{0}}^{\prime\prime}$ is a strict subset of $\mathbb{D}$ .

Example 2.6 (Continued).

Consider $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ such that $\theta(\gamma_{0})=0$ for some $\gamma_{0}\in\Gamma$ . Then $\phi$ is Hadamard differentiable at $\theta$ and $\phi_{\theta}^{\prime}(h)=0$ for all $h\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ . Suppose further that $\Gamma$ is compact and that $\Gamma_{0}(\theta)\equiv\{\gamma_{0}\in\Gamma:\theta(\gamma_{0})=0\}$ is in the interior of $\Gamma$ . For $C^{1}(\Gamma)$ the space of continuously differentiable functions on $\Gamma$ , if $\theta\in\prod_{j=1}^{m}C^{1}(\Gamma)$ , then by Lemma C.3, under additional regularity conditions, $\phi$ is second order Hadamard directionally differentiable at $\theta$ tangentially to $\prod_{j=1}^{m}C(\Gamma)$ with the derivative given by: for any $h\in\prod_{j=1}^{m}C(\Gamma)$ ,

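$$\phi_{\theta}^{\prime\prime}(h)=\min_{\gamma_{0}\in\Gamma_{0}(\theta)}h(\gamma_{0})^{\intercal}W^{1/2}M(\gamma_{0})W^{1/2}h(\gamma_{0})\ ,$$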

where $M(\gamma_{0})=I_{m}-W^{1/2}J(\gamma_{0})[J(\gamma_{0})^{\intercal}WJ(\gamma_{0})]^{-1}J(\gamma_{0})^{\intercal}W^{1/2}$ with $J(\gamma_{0})\equiv\frac{d\theta(\gamma)}{d\gamma^{\intercal}}\big{|}_{\gamma=\gamma_{0}}$ the Jacobian matrix and $I_{m}$ the identity matrix of size $m$ . Here, invertibility of $J(\gamma_{0})$ is an implied requirement in Lemma C.3; see Remark C.2. Note that if $\gamma_{0}$ is point identified, then $\phi$ becomes second order Hadamard differentiable with

$$\phi_{\theta}^{\prime\prime}(h)=h(\gamma_{0})^{\intercal}W^{1/2}M(\gamma_{0})W^{1/2}h(\gamma_{0})\ ,$$

which in turn yields $\chi^{2}(m-k)$ as the asymptotic distribution of the $J$ -statistic under optimal weighting. We emphasize that the regularity conditions in Lemma C.3 are sufficient for applying our framework but by no means necessary – as explained in Section 4, those sufficient conditions exclude the setup of Dovonon_Renault2013testing, and so we shall provide an alternative set of sufficient conditions there. ∎
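One way to see where the $m-k$ degrees of freedom come from is that $M(\gamma_{0})$ is an orthogonal projection matrix of rank $m-k$. The sketch below checks this numerically for a randomly generated Jacobian and weighting matrix; the dimensions, the random inputs, and the variable names are hypothetical and serve only to illustrate the algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 5, 2                                  # m moments, k parameters, m > k
J = rng.normal(size=(m, k))                  # hypothetical Jacobian with full column rank
A = rng.normal(size=(m, m))
W = A @ A.T + m * np.eye(m)                  # hypothetical symmetric positive definite weight

# Symmetric square root of W via its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(W)
W_half = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

# M = I_m - W^{1/2} J (J' W J)^{-1} J' W^{1/2}, as defined in the text.
M = np.eye(m) - W_half @ J @ np.linalg.inv(J.T @ W @ J) @ J.T @ W_half

print(np.allclose(M @ M, M), np.allclose(M, M.T), np.trace(M))
```

Since $M(\gamma_{0})$ is symmetric and idempotent with trace $m-k$, a quadratic form in a standard normal vector with matrix $M(\gamma_{0})$ follows a $\chi^{2}(m-k)$ distribution, in line with the statement above for the optimally weighted case.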

2.4 Second Order Delta Method

The Delta method for potentially directionally differentiable maps as well as differentiable ones has proven powerful in asymptotic analysis (Vaart1998; Shapiro1991; Fang_Santos2014HDD; Hansen2015regression). Unfortunately, it is insufficient to handle substantial challenges for inference arising from first order degeneracy. Heuristically, if $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}\xrightarrow{L}\mathbb{G}$ and $\phi_{\theta_{0}}^{\prime}=0$ , then the Delta method implies that

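$$r_{n}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime}(\mathbb{G})=0\ .$$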

For real-valued $\phi$ , the usual confidence interval for $\phi(\theta_{0})$ at asymptotic level $1-\alpha$ is

[TABLE]

where $c_{\alpha}$ is the $\alpha$ -th quantile of $\phi_{\theta_{0}}^{\prime}(\mathbb{G})\equiv 0$ and is zero for all $\alpha\in(0,1)$ . Clearly, $P(\phi(\theta_{0})\in\{\phi(\hat{\theta}_{n})\})=0$ if, for example, $\phi(\hat{\theta}_{n})$ is a continuous random variable.

To circumvent the above difficulty, we impose the following conditions in order to obtain a suitable second order Delta method.

Assumption 2.1.

(i) $\mathbb{D}$ and $\mathbb{E}$ are normed spaces with norms $\|\cdot\|_{\mathbb{D}}$ and $\|\cdot\|_{\mathbb{E}}$ respectively; (ii) $\phi:\mathbb{D}_{\phi}\subset\mathbb{D}\to\mathbb{E}$ is second order Hadamard directionally differentiable at $\theta_{0}\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}\subset\mathbb{D}$ ; (iii) $\phi_{\theta_{0}}^{\prime}(h)=0$ for all $h\in\mathbb{D}_{0}$ .

Assumption 2.2.

(i) There is $\hat{\theta}_{n}:\{X_{i}\}_{i=1}^{n}\to\mathbb{D}_{\phi}$ such that $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}\xrightarrow{L}\mathbb{G}$ in $\mathbb{D}$ for some $r_{n}\uparrow\infty$ ; (ii) $\mathbb{G}$ is tight and its support is in $\mathbb{D}_{0}$ [Footnote 4: The support of $\mathbb{G}$ is the set of points in $\mathbb{D}$ all of whose open neighborhoods have positive probability.]; (iii) $\mathbb{D}_{0}$ is closed under vector addition, i.e., $h_{1}+h_{2}\in\mathbb{D}_{0}$ whenever $h_{1},h_{2}\in\mathbb{D}_{0}$ .

Assumption 2.1 formalizes the requirement that the map $\phi:\mathbb{D}_{\phi}\rightarrow\mathbb{E}$ be second order Hadamard directionally differentiable at $\theta_{0}$ , and the defining feature of this paper, namely, degeneracy of the first order derivative. Assumption 2.2(i) defines another key ingredient: there is an estimator $\hat{\theta}_{n}$ for $\theta_{0}$ that admits a weak limit $\mathbb{G}$ at a potentially non- $\sqrt{n}$ rate $r_{n}$ ; see Remark 3.1. Assumption 2.2(ii) ensures that the support of $\mathbb{G}$ is included in the domain of the derivative $\phi_{\theta_{0}}^{\prime\prime}$ so that $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ is well defined, while tightness of $\mathbb{G}$ is only a minimal requirement. Assumption 2.2(iii) is a mild condition, which shall play a technical role in the proof of our bootstrap results.

Given Assumptions 2.1 and 2.2, we now present a second order Delta method building upon Shapiro2000inference and Romish2004delta but without requiring $\mathbb{D}_{\phi}$ to be convex.

Theorem 2.1.

If Assumptions 2.1(i)(ii) and 2.2(i)(ii) hold, then [Footnote 5: The term $\phi_{\theta_{0}}^{\prime\prime}(r_{n}\{\hat{\theta}_{n}-\theta_{0}\})$ is interpreted as some continuous extension of $\phi_{\theta_{0}}^{\prime\prime}$ (which always exists in our setup) evaluated at $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}$ whenever $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}\notin\mathbb{D}_{0}$ ; see the comment preceding the proof of Theorem 2.1. Since (18) is an asymptotic result, the choice of the continuous extension is irrelevant.]

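$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}=\phi_{\theta_{0}}^{\prime\prime}(r_{n}\{\hat{\theta}_{n}-\theta_{0}\})+o_{p}(1)\ , \tag{18}$$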

and hence

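$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\ . \tag{19}$$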

The essence of Theorem 2.1 is in complete accord with that underlying the first order Delta method. In particular, the definition of second order Hadamard directional differentiability is engineered so that the second order Delta method is nothing more than a stochastic version of the Taylor expansion of order two, i.e.,

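$$\phi(\theta_{0}+t_{n}h_{n})=\phi(\theta_{0})+t_{n}\phi_{\theta_{0}}^{\prime}(h_{n})+t_{n}^{2}\phi_{\theta_{0}}^{\prime\prime}(h_{n})+o(t_{n}^{2})\ ,$$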

where $t_{n}$ corresponds to $r_{n}^{-1}$ , and $h_{n}$ to $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}$ . Note that Theorem 2.1 is valid regardless of the nature of the differentiability (i.e., fully differentiable or directionally differentiable) and the presence of first order degeneracy. When $\phi_{\theta_{0}}^{\prime}$ is degenerate, the convergence (19) simplifies to

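$$r_{n}^{2}\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})\}\xrightarrow{L}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\ .$$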
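The degenerate first order limit and the nondegenerate second order limit can be seen in a small Monte Carlo experiment for the squared mean of Example 2.1. The sketch below assumes, purely for illustration, i.i.d. $N(0,\sigma^{2})$ data with $\theta_{0}=0$ and $r_{n}=\sqrt{n}$, so that $\mathbb{G}\sim N(0,\sigma^{2})$ and $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})=\mathbb{G}^{2}$ has mean $\sigma^{2}$; the sample size, number of replications, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma = 400, 5000, 2.0

# Sample means of `reps` independent datasets of size n with theta_0 = E[X] = 0.
xbar = rng.normal(loc=0.0, scale=sigma, size=(reps, n)).mean(axis=1)

first_order = np.sqrt(n) * xbar ** 2    # r_n   * {phi(theta_hat) - phi(theta_0)}
second_order = n * xbar ** 2            # r_n^2 * {phi(theta_hat) - phi(theta_0)}

print("mean of first order statistic :", first_order.mean())
print("mean of second order statistic:", second_order.mean())
print("sigma^2                       :", sigma ** 2)
```

The first order statistic concentrates near zero as $n$ grows, whereas the second order statistic has a simulated mean close to $\sigma^{2}$, consistent with the limit $\mathbb{G}^{2}$.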

Finally, we note that higher order versions of the Delta method can be developed along the lines of Remark 2.1; see Remark 2.2.

Remark 2.2.

Suppose that Assumptions 2.1(i) and 2.2(i)(ii) hold and $\phi$ is $p$ -th order Hadamard directionally differentiable at $\theta_{0}\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ . It follows that

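$$r_{n}^{p}\Big\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\sum_{j=1}^{p-1}\phi_{\theta_{0}}^{(j)}(\hat{\theta}_{n}-\theta_{0})\Big\}=\phi_{\theta_{0}}^{(p)}(r_{n}\{\hat{\theta}_{n}-\theta_{0}\})+o_{p}(1)\ ,$$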

and hence

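$$r_{n}^{p}\Big\{\phi(\hat{\theta}_{n})-\phi(\theta_{0})-\sum_{j=1}^{p-1}\phi_{\theta_{0}}^{(j)}(\hat{\theta}_{n}-\theta_{0})\Big\}\xrightarrow{L}\phi_{\theta_{0}}^{(p)}(\mathbb{G})\ .$$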

3 The Bootstrap

Establishing asymptotic distributions as in Theorem 2.1 is the first step towards conducting statistical inference on $\phi(\theta_{0})$ , the usefulness of which relies on our ability to accurately estimate the limiting law. In this section, we discuss how first order degeneracy of $\phi$ can complicate inference using the standard bootstrap based on first and especially second order asymptotics, and provide alternative consistent resampling schemes.

3.1 Bootstrap Setup

Throughout, we let $\hat{\theta}_{n}^{*}$ denote a “bootstrapped version” of $\hat{\theta}_{n}$ , which is defined as a function mapping the data $\{X_{i}\}_{i=1}^{n}$ and random weights $\{W_{i}\}_{i=1}^{n}$ that are independent of $\{X_{i}\}_{i=1}^{n}$ into the domain $\mathbb{D}_{\phi}$ of $\phi$ . This general definition allows us to include diverse resampling schemes as special cases, such as the nonparametric, Bayesian, block, and score bootstrap and, more generally, the multiplier and exchangeable bootstrap. Next, making sense of bootstrap consistency necessitates a metric that quantifies distances between probability measures. As is standard in the literature, we employ the bounded Lipschitz metric $d_{\operatorname{BL}}$ formalized by Dudley1966Baire; Dudley1968distance: for two Borel probability measures $L_{1}$ and $L_{2}$ on $\mathbb{D}$ , define

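$$d_{\operatorname{BL}}(L_{1},L_{2})\equiv\sup_{f\in\operatorname{BL}_{1}(\mathbb{D})}\Big|\int f\,dL_{1}-\int f\,dL_{2}\Big|\ ,$$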

where we recall that $\operatorname{BL}_{1}(\mathbb{D})$ denotes the set of Lipschitz functionals whose absolute level and Lipschitz constant are bounded by one, i.e.,

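$$\operatorname{BL}_{1}(\mathbb{D})\equiv\Big\{f:\mathbb{D}\to\mathbf{R}:\sup_{x\in\mathbb{D}}|f(x)|\leq 1\text{ and }|f(x)-f(y)|\leq\|x-y\|_{\mathbb{D}}\text{ for all }x,y\in\mathbb{D}\Big\}\ .$$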

Since weak convergence in the Hoffmann-Jørgensen sense to separable limits can be metrized by $d_{\operatorname{BL}}$ (Dudley1990nonlinear; Vaart_Wellner1990prohorov), we may now measure the distance between the “conditional law” of $\hat{\mathbb{G}}_{n}^{*}\equiv r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ given $\{X_{i}\}$ and the limiting law of $r_{n}\{\hat{\theta}_{n}-\theta_{0}\}$ by

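$$\sup_{f\in\operatorname{BL}_{1}(\mathbb{D})}\big|E_{W}[f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})]-E[f(\mathbb{G})]\big|\ , \tag{21}$$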

where $E_{W}$ denotes expectation with respect to the bootstrap weights $\{W_{i}\}_{i=1}^{n}$ holding the data $\{X_{i}\}_{i=1}^{n}$ fixed. Employing the distribution of $r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ conditional on the data as an approximation to the distribution of $\mathbb{G}$ is then asymptotically justified if their distance, equivalently (21), converges in probability to zero.

We formalize the above discussion by imposing the following assumptions on $\hat{\theta}_{n}^{*}$ .

Assumption 3.1.

(i) $\hat{\theta}_{n}^{*}:\{X_{i},W_{i}\}_{i=1}^{n}\rightarrow\mathbb{D}_{\phi}$ with $\{W_{i}\}_{i=1}^{n}$ independent of $\{X_{i}\}_{i=1}^{n}$ ; (ii) $\hat{\theta}_{n}^{*}$ satisfies $\sup_{f\in\text{BL}_{1}(\mathbb{D})}|E_{W}[f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})]-E[f(\mathbb{G})]|=o_{p}(1)$ .

Assumption 3.2.

(i) $E[f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})^{*}]-E[f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})_{*}]\to 0$ for all $f\in\text{BL}_{1}(\mathbb{D})$ where $f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})^{*}$ and $f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})_{*}$ denote minimal measurable majorant and maximal measurable minorant (with respect to $\{X_{i},W_{i}\}_{i=1}^{n}$ jointly) respectively; (ii) $f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})$ is a measurable function of $\{W_{i}\}_{i=1}^{n}$ outer almost surely in $\{X_{i}\}_{i=1}^{n}$ for any continuous and bounded $f:\mathbb{D}\rightarrow\mathbf{R}$ .

Assumption 3.1(i) formally defines the bootstrap analog $\hat{\theta}_{n}^{*}$ of $\hat{\theta}_{n}$ , while Assumption 3.1(ii) simply imposes the consistency of the “law” of $r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ conditional on the data for the law of $\mathbb{G}$ , i.e., the bootstrap “works” for the estimator $\hat{\theta}_{n}$ . Assumption 3.2 is of technical concern. In particular, Assumption 3.2(i) can often be established as a result of bootstrap consistency (Vaart1996), while Assumption 3.2(ii) is easy to verify for particular resampling schemes. For example, if $\{W_{i}\}_{i=1}^{n}\mapsto f(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})$ is continuous, then Assumption 3.2(ii) is fulfilled. When $\theta_{0}$ is Euclidean-valued, i.e., $\mathbb{D}=\mathbf{R}^{k}$ with $k\in\mathbf{N}$ , one can dispense with Assumption 3.2.
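The description of $\hat{\theta}_{n}^{*}$ as a function of the data and independent random weights can be illustrated for the sample mean. In the sketch below, which is an illustration under our own assumptions rather than the paper's implementation, the nonparametric bootstrap corresponds to multinomial weights and the Bayesian bootstrap to rescaled Dirichlet weights; the function name `bootstrap_means` and all tuning values are hypothetical.

```python
import numpy as np

def bootstrap_means(x, n_boot, scheme, rng):
    """Return n_boot draws of theta*_n = (1/n) * sum_i W_i X_i for the chosen weights."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if scheme == "nonparametric":
        # Multinomial counts: each row sums to n (Efron's bootstrap).
        w = rng.multinomial(n, np.full(n, 1.0 / n), size=n_boot)
    elif scheme == "bayesian":
        # Dirichlet(1, ..., 1) weights rescaled so that each row sums to n.
        w = n * rng.dirichlet(np.ones(n), size=n_boot)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return w @ x / n

rng = np.random.default_rng(5)
x = rng.normal(size=300)
n = x.size
theta_hat = x.mean()

draws_np = bootstrap_means(x, 2000, "nonparametric", rng)
draws_bb = bootstrap_means(x, 2000, "bayesian", rng)

# Conditional spread of r_n * (theta* - theta_hat), to be compared with the sample sd.
sd_np = np.sqrt(n) * (draws_np - theta_hat).std()
sd_bb = np.sqrt(n) * (draws_bb - theta_hat).std()
print(sd_np, sd_bb, x.std())
```

In both schemes the weights are generated independently of the data, as required by Assumption 3.1(i), and the conditional spread of $r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ is close to the sample standard deviation, which is the behavior that Assumption 3.1(ii) formalizes.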

3.2 Failures of the Standard Bootstrap

We now turn to the challenges for inference using the standard bootstrap caused by first order degeneracy. As is well known in the literature, the law of

$$r_{n}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\} \tag{22}$$

conditional on the data provides a consistent estimator of the law of $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ provided $\phi$ is Hadamard differentiable (Vaart1996), which in particular includes the case when $\phi_{\theta_{0}}^{\prime}=0$ . In other words, the standard bootstrap, meaning the law of (22) conditional on the data, is consistent for the law of $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ regardless of the presence of first order degeneracy.

Substantial difficulties, however, arise from using (22) for inferential purposes when first order degeneracy does occur. Ignoring the first order degeneracy or perhaps as a way to avoid ridiculous confidence intervals such as (17), one might consider the following confidence interval for real-valued $\phi(\theta_{0})$ :

[TABLE]

where $\tilde{c}_{1-\alpha}$ is the $(1-\alpha)$ -th bootstrapped quantile for $\alpha\in(0,1)$ defined as

[TABLE]

However, establishing the validity of (23) as a level $1-\alpha$ confidence interval for $\phi(\theta_{0})$ is problematic because $\tilde{c}_{1-\alpha}\xrightarrow{p}0$ for all $\alpha\in(0,1)$ and $0$ is a discontinuity point of the cdf of the limit (see Lemma B.1).

In fact, simple algebra reveals that (23) is numerically identical to

[TABLE]

where $\bar{c}_{\alpha}$ is defined as

[TABLE]

In other words, $\bar{c}_{\alpha}$ is the $\alpha$ -th bootstrapped quantile of the standard bootstrap based on second order asymptotics:

$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}\ . \tag{25}$$

As illustrated by Babu1984bootstrap for the squared mean example, the conditional law of (25) is inconsistent for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ when $\theta_{0}=0$ , the point at which first order degeneracy arises. We next demonstrate that the bootstrap failure in this simple example is a reflection of a deeper principle: the second order standard bootstrap is consistent if and only if $\phi_{\theta_{0}}^{\prime\prime}$ is degenerate, under regularity conditions.

Theorem 3.1.

Suppose that Assumptions 2.1, 2.2, 3.1 and 3.2 hold, and that $\mathbb{G}$ is centered Gaussian. Then $\phi_{\theta_{0}}^{\prime\prime}=0$ on the support of $\mathbb{G}$ if and only if

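$$\sup_{f\in\operatorname{BL}_{1}(\mathbb{E})}\big|E_{W}[f(r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\})]-E[f(\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}))]\big|=o_{p}(1)\ . \tag{26}$$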

If, in addition, $\phi$ is second order Hadamard differentiable, then the conclusion holds without requiring $\mathbb{G}$ to be centered Gaussian.

The sufficiency part of the theorem is somewhat expected and not a deep result, while the necessity is perhaps surprising and has far-reaching implications for statistical inference as we shall detail shortly. The proof of the latter consists of two steps: in the first step, we show that bootstrap consistency as in (26) implies existence of a bilinear map $\Phi_{\theta_{0}}^{\prime\prime}$ corresponding to $\phi_{\theta_{0}}^{\prime\prime}$ , in similar fashion as the proof of Theorem 3.1 in Fang_Santos2014HDD; in the second step, we establish that $\Phi_{\theta_{0}}^{\prime\prime}$ and hence $\phi_{\theta_{0}}^{\prime\prime}$ is necessarily degenerate. Both steps involve the insights of equating distributions through their characteristic functionals as in Vaart1991differentibility and Hirano_Porter2012.

Theorem 3.1 implies that, in the presence of first order degeneracy, if the second order derivative $\phi_{\theta_{0}}^{\prime\prime}$ is nondegenerate, then the standard bootstrap based on second order asymptotics is necessarily inconsistent whenever $\mathbb{G}$ is centered Gaussian. If $\phi_{\theta_{0}}^{\prime\prime}$ is degenerate, we have a degenerate limiting distribution that cannot be directly used for inference. We thus conclude that bootstrap failure is an inherent implication of models with first order degeneracy.

Heuristically, the reason why the standard bootstrap fails is that even though $r_{n}^{2}\phi_{\theta_{0}}^{\prime}(\hat{\theta}_{n}-\theta_{0})=0$ in the “real world”, its bootstrap counterpart is non-negligible. To see this, consider the squared mean example. If $\theta_{0}=0$ , then

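$$r_{n}^{2}\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})=2\,r_{n}\hat{\theta}_{n}\cdot r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}\ ,$$

which is of order $O_{p}(1)$ rather than $o_{p}(1)$ because $r_{n}\hat{\theta}_{n}=r_{n}\{\hat{\theta}_{n}-\theta_{0}\}\xrightarrow{L}\mathbb{G}$ .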

This is an emphatic reflection of Efron1979’s caveat that the bootstrap, as well as other resampling schemes, provides frequency approximations rather than likelihood approximations. These heuristics suggest that the standard bootstrap might work if the first order term $r_{n}^{2}\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})$ is included, which turns out to be true for sufficiently smooth maps; see Theorem 3.2.
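The heuristic above can be checked numerically for the squared mean. The sketch below assumes, for illustration only, i.i.d. $N(0,1)$ data with $\theta_{0}=0$, the sample mean as $\hat{\theta}_{n}$, and Efron's nonparametric bootstrap. It splits the standard second order bootstrap statistic $n\{(\bar{X}_{n}^{*})^{2}-\bar{X}_{n}^{2}\}$ into a quadratic part and the cross term $2n\bar{X}_{n}(\bar{X}_{n}^{*}-\bar{X}_{n})$; the variable names and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_boot = 500, 2000
x = rng.normal(loc=0.0, scale=1.0, size=n)    # theta_0 = E[X] = 0
xbar = x.mean()

# Nonparametric bootstrap: resample indices with replacement, one row per draw.
idx = rng.integers(0, n, size=(n_boot, n))
xbar_star = x[idx].mean(axis=1)

standard = n * (xbar_star ** 2 - xbar ** 2)    # r_n^2 {phi(theta*) - phi(theta_hat)}
quadratic = n * (xbar_star - xbar) ** 2        # phi''(r_n {theta* - theta_hat})
cross = 2.0 * n * xbar * (xbar_star - xbar)    # r_n^2 phi'_{theta_hat}(theta* - theta_hat)

share_negative = float(np.mean(standard < 0))
print("share of negative standard bootstrap draws:", share_negative)
print("sd of cross term:", cross.std())
```

The limit $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})=\mathbb{G}^{2}$ is nonnegative, yet a positive share of the standard bootstrap draws is negative; the discrepancy is carried entirely by the cross term, which does not vanish conditional on the data.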

It is worth noting that Theorem 3.1 holds even if $\phi$ is smooth. Consequently, first order degeneracy is a source of bootstrap inconsistency completely different from that discussed in Fang_Santos2014HDD, i.e., nondifferentiability of $\phi$ . In addition, we note that, without the qualifier that $\mathbb{G}$ is centered Gaussian, bootstrap consistency (26) holds if and only if $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)\overset{d}{=}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ for all $h\in\mathrm{Supp}(\mathbb{G})$ under mild support conditions; see Theorem A.1 in Fang_Santos2014HDD.

Finally, to further articulate the relations between the current work and that of Fang_Santos2014HDD, we present Table 1, which compares the settings covered by the two papers.

3.3 The Babu Correction

We now extend the Babu correction under our more general setup. We proceed by imposing the following assumption.

Assumption 3.3.

(i) The map $\phi:\mathbb{D}_{\phi}\subset\mathbb{D}\to\mathbb{E}$ is second order Hadamard differentiable at $\theta_{0}\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ ; (ii) $\phi$ is first order Hadamard differentiable at every point in some neighborhood of $\theta_{0}$ tangentially to $\mathbb{D}_{0}$ such that [Footnote 6: The appearance of the factor 2 is due to omission of the factor $1/2$ in Definition 2.2.]

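$$\lim_{n\to\infty}\Big\|\frac{\phi_{\theta_{0}+t_{n}g_{n}}^{\prime}(h_{n})-\phi_{\theta_{0}}^{\prime}(h_{n})}{t_{n}}-2\Phi_{\theta_{0}}^{\prime\prime}(g,h)\Big\|_{\mathbb{E}}=0$$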

for all sequences $\{g_{n},h_{n}\}\subset\mathbb{D}$ and $\{t_{n}\}\subset\mathbf{R}^{+}$ such that $t_{n}\downarrow 0$ , $(g_{n},h_{n})\to(g,h)\in\mathbb{D}_{0}\times\mathbb{D}_{0}$ as $n\to\infty$ and $\theta+t_{n}g_{n},\theta+t_{n}h_{n}\in\mathbb{D}_{\phi}$ for all sufficiently large $n$ , where $\Phi_{\theta_{0}}^{\prime\prime}:\mathbb{D}_{0}\times\mathbb{D}_{0}\to\mathbb{E}$ is the bilinear map underlying $\phi_{\theta_{0}}^{\prime\prime}$ .

Assumption 3.3(i) defines the scope of the Babu correction: it shall be applied to smooth maps, which excludes, for example, the functional associated with the $J$ -test in GMM settings when first order or global identification fails; see Section 4. Assumption 3.3(ii) is stronger than $\phi$ being simply second order Hadamard differentiable, in that it requires the existence of the first order derivative at all points in a neighborhood of $\theta_{0}$ such that the limit displayed in Assumption 3.3(ii) holds. Assumption 3.3 is fulfilled for the setup considered in Babu1984bootstrap and for Examples 2.1 and 2.3, but violated for the remaining examples.

Under Assumption 3.3, the corrected bootstrap

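$$r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})-\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})\} \tag{28}$$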

is consistent for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ regardless of the degeneracy of $\phi_{\theta_{0}}^{\prime}$ .

Theorem 3.2.

If Assumptions 2.1(i)(ii), 2.2, 3.1, 3.2 and 3.3 hold, then

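$$\sup_{f\in\operatorname{BL}_{1}(\mathbb{E})}\big|E_{W}[f(r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})-\phi_{\hat{\theta}_{n}}^{\prime}(\hat{\theta}_{n}^{*}-\hat{\theta}_{n})\})]-E[f(\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}))]\big|=o_{p}(1)\ .$$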

Theorem 3.2 generalizes Babu1984bootstrap considerably in that it accommodates semiparametric and nonparametric models, and allows wider resampling schemes beyond the nonparametric bootstrap of Efron1979. The Babu correction works nicely with smooth maps in the sense of Assumption 3.3, but unfortunately is inadequate to handle nonsmooth ones. This is because when $\phi$ is only second order directionally differentiable, the derivative $\phi_{\theta_{0}}^{\prime\prime}$ is often not “continuous” in $\theta_{0}$ , implying that the Babu correction (28) is unable to estimate $\phi_{\theta_{0}}^{\prime\prime}$ properly and therefore yields inconsistent estimates. For this reason, we next provide yet another resampling method which accommodates (second order) nondifferentiable maps.
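For the squared mean of Example 2.1, the corrected statistic reduces algebraically to $n(\bar{X}_{n}^{*}-\bar{X}_{n})^{2}$, since $\phi_{\theta}^{\prime}(h)=2\theta h$. The sketch below illustrates this under assumptions of our own choosing: i.i.d. $N(0,1)$ data with $\theta_{0}=0$, the sample mean as $\hat{\theta}_{n}$, and Efron's nonparametric bootstrap. The helper names `phi` and `phi_prime` are hypothetical.

```python
import numpy as np

def phi(theta):
    """Squared mean functional of Example 2.1."""
    return theta ** 2

def phi_prime(theta, h):
    """First order derivative of phi at theta in direction h."""
    return 2.0 * theta * h

rng = np.random.default_rng(3)
n, n_boot = 500, 4000
x = rng.normal(size=n)                 # theta_0 = E[X] = 0, so phi' is degenerate
theta_hat = x.mean()

idx = rng.integers(0, n, size=(n_boot, n))
theta_star = x[idx].mean(axis=1)

# Corrected bootstrap: subtract the first order term evaluated at theta_hat.
corrected = n * (
    phi(theta_star) - phi(theta_hat) - phi_prime(theta_hat, theta_star - theta_hat)
)

print("min of corrected draws :", corrected.min())
print("mean of corrected draws:", corrected.mean())
print("sample variance        :", x.var())
```

The corrected draws are nonnegative up to rounding and their mean is close to the sample variance, in line with the limit $\mathbb{G}^{2}$ whose mean is the variance of $X$.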

3.4 A Modified Bootstrap

In this subsection, we shall present a modified bootstrap following Fang_Santos2014HDD that is consistent for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ , and adaptive to both the presence of first order degeneracy and nondifferentiability of $\phi$ .

The heuristics underlying our proposal, however, are connected to those in Fang_Santos2014HDD in a subtle way. In the context of first order asymptotics where $\phi$ is only directionally differentiable, inconsistency of the standard bootstrap arises from its inability to properly estimate the directional derivative $\phi_{\theta_{0}}^{\prime}$ . In our setup, however, there are examples in which the derivative $\phi_{\theta_{0}}^{\prime\prime}$ is a known map; see Examples 2.1 and 2.3, both of which involve differentiable maps. The standard bootstrap in these settings fails because a non-negligible term is neglected. However, in all other examples where $\phi$ is not smooth enough, Fang_Santos2014HDD’s arguments will come into play as well.

In any case, the second order weak limit $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ is a composition of the derivative $\phi_{\theta_{0}}^{\prime\prime}$ and the limit $\mathbb{G}$ of $\hat{\theta}_{n}$ , as is the first order limit $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ . Thus, the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ can be estimated by composing a suitable estimator $\hat{\phi}_{n}^{\prime\prime}$ for $\phi_{\theta_{0}}^{\prime\prime}$ with a consistent bootstrap approximation for the law of $\mathbb{G}$ , in exactly the same fashion as the resampling scheme proposed by Fang_Santos2014HDD. That is, we propose employing the law of

$$\hat{\phi}_{n}^{\prime\prime}(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})\qquad(30)$$

conditional on the data as an approximation for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ , where $\hat{\phi}_{n}^{\prime\prime}:\mathbb{D}\to\mathbb{E}$ is a suitable estimator of $\phi_{\theta_{0}}^{\prime\prime}$ . Certainly, we would like $\hat{\phi}_{n}^{\prime\prime}$ to converge to $\phi_{\theta_{0}}^{\prime\prime}$ in some sense as $n\to\infty$ . This can be made precise as follows.

Assumption 3.4.

$\hat{\phi}_{n}^{\prime\prime}:\mathbb{D}\rightarrow\mathbb{E}$ is a function of $\{X_{i}\}_{i=1}^{n}$ satisfying that for every sequence $\{h_{n}\}\subset\mathbb{D}$ and every $h\in\mathbb{D}_{0}$ such that $h_{n}\to h$ as $n\to\infty$ ,

[TABLE]

Assumption 3.4 says that $\hat{\phi}_{n}^{\prime\prime}$ converges in probability to $\phi_{\theta_{0}}^{\prime\prime}$ along any convergent sequence $h_{n}\to h$ as $n\to\infty$ . In cases when $\phi_{\theta_{0}}^{\prime\prime}$ is a known map, we may simply set $\hat{\phi}_{n}^{\prime\prime}=\phi_{\theta_{0}}^{\prime\prime}$ for all $n\in\mathbf{N}$ . It is worth noting that Assumption 3.4 is equivalent to requiring: for every compact set $K\subset\mathbb{D}_{0}$ and every $\epsilon>0$ ,

[TABLE]

where $K^{\delta}\equiv\{a\in\mathbb{D}:\inf_{b\in K}\|a-b\|_{\mathbb{D}}<\delta\}$ ; see Lemma B.2. Condition (32) was employed in Fang_Santos2014HDD who also provided several sufficient conditions for it to hold. For example, if $\hat{\phi}_{n}^{\prime\prime}:\mathbb{D}\rightarrow\mathbb{E}$ is Lipschitz continuous, then pointwise consistency of $\hat{\phi}_{n}^{\prime\prime}$ suffices for (32). Unfortunately, second order derivatives often lack uniform continuity and hence those sufficient conditions are inapplicable. Nonetheless, condition (31) is straightforward to verify in all our examples.

Given the equivalence of conditions (31) and (32), consistency of our modified bootstrap (30) follows from Theorem 3.2 in Fang_Santos2014HDD.

Theorem 3.3.

Under Assumptions 2.1(i)(ii), 2.2, 3.1, 3.2 and 3.4, it follows that

[TABLE]

Theorem 3.3 shows that the law of $\hat{\phi}_{n}^{\prime\prime}(r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\})$ conditional on the data is indeed consistent for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ , regardless of the degree of smoothness of $\phi$ and degeneracy of $\phi_{\theta_{0}}^{\prime}$ . Interestingly, the resampling scheme in Theorem 3.3 is a mixture of the classical bootstrap and analytical asymptotic approximations. Finally, we note that Assumption 3.4 allows us to think of Theorem 3.3 as a variant of the extended continuous mapping theorem.
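To make the composition in Theorem 3.3 concrete, the following Python sketch works through a deliberately simple scalar case that is not taken from the paper's examples: $\phi(\theta)=\theta^{2}$ with $\theta_{0}=E[X]=0$ , $\hat{\theta}_{n}$ the sample mean and $r_{n}=\sqrt{n}$ . Under the convention (assumed here) that $\phi_{\theta_{0}}^{\prime\prime}(h)$ is the limit of $t^{-2}\{\phi(\theta_{0}+th)-\phi(\theta_{0})\}$ when $\phi_{\theta_{0}}^{\prime}=0$ , the derivative $h\mapsto h^{2}$ is a known map. The function names, the i.i.d. resampling scheme and the simulated data are illustrative assumptions.

```python
import math
import random
import statistics


def modified_bootstrap_draws(x, phi_dd, n_boot=500, seed=0):
    """Draws of phi_dd(sqrt(n) * (theta_star - theta_hat)) given the data x."""
    rng = random.Random(seed)
    n = len(x)
    theta_hat = statistics.fmean(x)
    draws = []
    for _ in range(n_boot):
        # Nonparametric bootstrap: resample the data with replacement.
        resample = [x[rng.randrange(n)] for _ in range(n)]
        theta_star = statistics.fmean(resample)
        h = math.sqrt(n) * (theta_star - theta_hat)
        # Compose the (here known) derivative with the bootstrap process.
        draws.append(phi_dd(h))
    return draws


def phi_dd(h):
    # Second order derivative of phi(theta) = theta**2 at theta_0 = 0.
    return h * h


data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) for _ in range(400)]
draws = modified_bootstrap_draws(x, phi_dd)
```

In this toy case the bootstrap mean of the draws equals the sample variance of the data up to simulation error, which matches the mean of the limit $\mathbb{G}^{2}$ when $\mathbb{G}\sim N(0,\operatorname{Var}(X))$ .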

Theorems 3.2 and 3.3 are useful for hypothesis testing. Specifically, consider

[TABLE]

Under first order degeneracy, as is the case in all our examples, we employ the test of rejecting $\mathrm{H}_{0}$ if $r_{n}^{2}\phi(\hat{\theta}_{n})>\hat{c}_{1-\alpha}$ where $\hat{c}_{1-\alpha}$ is the critical value constructed from the Babu correction or our proposed bootstrap, i.e.,

[TABLE]

or

[TABLE]

Note that $\hat{c}_{1-\alpha}$ is generally infeasible but can be estimated by Monte Carlo simulations (Efron1979; Hall1992bootstrap; HorowitzBoot). The pointwise size control of our test then follows according to Theorems 3.2 and 3.3. In fact, under additional restrictions, it can provide local size control. This property is particularly attractive because of the irregularity arising from nondifferentiability of $\phi$ . In this case, pointwise asymptotic approximations can be misleading (Imbens_Manski; AndrewsandGuggen2009ETA). Interestingly, it turns out that there is another source of irregularity due to the nature of first order degeneracy (see Lemma A.1). We relegate the detailed discussions to Appendix A in order to make our presentation concise.
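The Monte Carlo approximation of $\hat{c}_{1-\alpha}$ and the resulting decision rule can be sketched as follows. This is a minimal illustration, assuming the bootstrap draws of either corrected statistic have already been simulated; the helper names `empirical_quantile` and `reject_null` are hypothetical.

```python
import math


def empirical_quantile(draws, level):
    """Smallest c whose empirical cdf value over the draws is at least `level`."""
    ordered = sorted(draws)
    # The small offset guards against floating point overshoot in level * n.
    idx = max(math.ceil(level * len(ordered) - 1e-12) - 1, 0)
    return ordered[idx]


def reject_null(stat, draws, alpha=0.05):
    """Reject H0 when the statistic r_n^2 * phi(theta_hat) exceeds the critical value."""
    return stat > empirical_quantile(draws, 1.0 - alpha)
```

For instance, with draws $1,\ldots,20$ and $\alpha=0.25$ the critical value is $15$ , so a statistic of $16$ leads to rejection while $10$ does not.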

We now briefly compare the Babu correction, the above composition procedure and the recentered bootstrap (Hall_Horowitz1996bootstrap; HorowitzBoot). In some cases (for instance, Example 2.1 and the regular $J$ -test), they coincide with each other. However, the Babu correction applies to general smooth functionals, rather than just quadratic forms, and hence can be thought of as a generalization of the recentered bootstrap. The composition procedure, which works for an even larger class of functionals, is a direct approach by exploiting the structure of the limits, and hence is more tractable.

Remark 3.1.

Examples where the convergence rate is not $\sqrt{n}$ include inference based on kernel estimators with undersmoothing (Hall1992bootstrap), smoothed maximum score estimators (Horowitz2002maxscore), and cointegration regressions (ChangParkSong2006BootCoint). For nonstandard convergence rates, however, the bootstrap process $r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ can fail to consistently estimate the law of $\mathbb{G}$ , violating Assumption 3.1(ii). Fortunately, as far as Theorem 3.3 is concerned, any consistent estimator, which need not satisfy Assumption 3.1(ii), will do. For example, in cube-root estimation problems, one could instead employ some smoothed bootstrap $r_{n}\{\tilde{\theta}_{n}^{*}-\tilde{\theta}_{n}\}$ where $\tilde{\theta}_{n}^{*}$ and $\tilde{\theta}_{n}$ are some smoothed estimators, or $m$ out of $n$ resampling (or subsampling) $m_{n}\{\hat{\theta}_{m_{n}}^{*}-\hat{\theta}_{n}\}$ where $\hat{\theta}_{m_{n}}^{*}$ is a bootstrap estimator based on subsamples of size $m_{n}$ . In the context of estimating nonincreasing density functions, see Kosorok2008Grenander and Sen_Banerjee_Woodroofe2010; for bootstrapping the maximum score estimators, see Delgado_Poo_Wolf2001 and Patra_Seijo_Sen2015.∎

3.5 Estimation of the Derivative

Given the posited bootstrap consistency for the law of $\mathbb{G}$ , the remaining crucial piece towards consistent bootstrap for the law of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ based on Theorem 3.3 is then an estimator $\hat{\phi}_{n}^{\prime\prime}$ of the derivative $\phi_{\theta_{0}}^{\prime\prime}$ that satisfies Assumption 3.4. There are two general approaches for estimation of $\phi_{\theta_{0}}^{\prime\prime}$ : one by exploiting the structure of $\phi_{\theta_{0}}^{\prime\prime}$ , and the other one based on numerical differentiation as we describe now.

When first order degeneracy occurs, we have

[TABLE]

We may thus estimate $\phi_{\theta_{0}}^{\prime\prime}$ via numerical differentiation as follows: for any $h\in\mathbb{D}$ ,

[TABLE]

If $t_{n}$ tends to zero at a suitable rate, the sense of which is made precise by the following assumption, then $\hat{\phi}_{n}^{\prime\prime}$ is a good estimator for $\phi_{\theta_{0}}^{\prime\prime}$ in the sense of Assumption 3.4.

Assumption 3.5.

$\{t_{n}\}_{n=1}^{\infty}$ is a sequence of scalars such that $t_{n}\downarrow 0$ and $r_{n}t_{n}\to\infty$ .

Assumption 3.5 allows a wide range of tuning parameters that can deliver first order validity of our method. The optimal choice of $t_{n}$ is challenging and beyond the scope of the present paper; we hope to address it in future work. The next proposition confirms the validity of the numerical estimator (38).

Proposition 3.1 (Hong_Li2015numericaldelta).

If Assumptions 2.1, 2.2(i)(ii), and 3.5 hold, then the numerical estimator $\hat{\phi}_{n}^{\prime\prime}$ in (38) satisfies Assumption 3.4.

The numerical differentiation approach of estimating the derivatives, in the context of the Delta method, dates back to at least Dumbgen1993 in his proposal of the rescaled bootstrap. However, the original presentation leaves this connection implicit, and so the bootstrap procedure is sometimes mistaken for the $m$ out of $n$ resampling. Effectively, the rescaled bootstrap amounts to estimating the derivative numerically and the law of $\mathbb{G}$ using $n$ bootstrap samples; see Beare_Fang2016Grenander for more details. The recent work of Hong_Li2015numericaldelta provided a range of extensions of the numerical Delta method that have wide applications in econometrics.

Proposition 3.1 provides a way of estimating the derivative $\phi_{\theta_{0}}^{\prime\prime}$ that is tractable in the sense that there is no need to explore the particular structures of $\phi$ or $\phi_{\theta_{0}}^{\prime\prime}$ as long as the tuning parameter $t_{n}$ is properly chosen. On the other hand, the expression of $\phi_{\theta_{0}}^{\prime\prime}$ itself often suggests an intuitive estimator as we elaborate in the next subsection.
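As a rough illustration of the numerical approach, the sketch below assumes that the estimator takes the second order difference quotient form $\{\phi(\hat{\theta}_{n}+t_{n}h)-\phi(\hat{\theta}_{n})\}/t_{n}^{2}$ , which is what first order degeneracy suggests. The map $\phi(\theta)=\max(\theta,0)^{2}$ is a hypothetical nonsmooth example with $\theta_{0}=0$ and $\phi_{\theta_{0}}^{\prime\prime}(h)=\max(h,0)^{2}$ , and the value used for $\hat{\theta}_{n}$ is a stand-in rather than an actual estimate.

```python
def numerical_second_derivative(phi, theta_hat, h, t_n):
    """Difference quotient {phi(theta_hat + t_n * h) - phi(theta_hat)} / t_n**2."""
    return (phi(theta_hat + t_n * h) - phi(theta_hat)) / t_n ** 2


def phi(theta):
    # Hypothetical map: first order degenerate at 0, with second order
    # directional derivative h -> max(h, 0)**2 at that point.
    return max(theta, 0.0) ** 2


n = 10 ** 8
theta_hat = n ** -0.5   # stand-in for an estimate within O(1 / r_n) of theta_0 = 0
t_n = n ** -0.25        # t_n -> 0 while sqrt(n) * t_n -> infinity
est_pos = numerical_second_derivative(phi, theta_hat, 2.0, t_n)   # target value 4
est_neg = numerical_second_derivative(phi, theta_hat, -2.0, t_n)  # target value 0
```

The approximation error is driven by the ratio of the estimation error $\hat{\theta}_{n}-\theta_{0}$ to $t_{n}$ , which is why the step size must vanish more slowly than $r_{n}^{-1}$ .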

3.5.1 Examples Revisited

Example 2.1 is trivial since $\phi_{\theta_{0}}^{\prime\prime}$ is a known map and hence one can simply set $\hat{\phi}_{n}^{\prime\prime}=\phi_{\theta_{0}}^{\prime\prime}$ for all $n\in\mathbf{N}$ . Example 2.6 is more complicated.

Example 2.6 (Continued).

In the classical case when $\Gamma_{0}(\theta)$ is a singleton, we may estimate $\phi_{\theta_{0}}^{\prime\prime}$ based on the GMM estimator $\hat{\gamma}_{n}$ and the estimated Jacobian matrix $\hat{J}_{n}$ . Generally, there are two unknown objects involved in the second order derivative: the identified set $\Gamma_{0}(\theta)$ and $J(\cdot)$ . Let $\mathbf{M}^{m\times k}$ be the space of $m\times k$ matrices. Suppose that $\hat{\Gamma}_{n}\subset\Gamma$ is a $d_{H}$ -consistent estimator for $\Gamma_{0}(\theta)$ , and $\hat{J}_{n}:\Gamma\to\mathbf{M}^{m\times k}$ an estimator for $J:\Gamma\to\mathbf{M}^{m\times k}$ such that $\sup_{\gamma\in\Gamma}\|\hat{J}_{n}(\gamma)-J(\gamma)\|\xrightarrow{p}0$ . Then we may estimate $\phi_{\theta_{0}}^{\prime\prime}$ by

[TABLE]

where $B_{n}\equiv\{v\in\mathbf{R}^{k}:\|v\|\leq t_{n}^{-1}\}$ for $t_{n}\downarrow 0$ satisfying $t_{n}\sqrt{n}\to\infty$ . Consistency of $\hat{\Gamma}_{n}$ can be established by appealing to CHT2007, while uniform consistency of $\hat{J}_{n}$ can be derived using Glivenko-Cantelli type arguments. Following the proof of Lemma D.3, it is straightforward to show that $\hat{\phi}_{n}^{\prime\prime}$ satisfies Assumption 3.4. ∎

4 Application: Testing for Common CH Features

In this section, we apply our framework to develop a robust test of common conditionally heteroskedastic (CH) factor structure by allowing multiple common CH features. Let $\{Y_{t}\}_{t=1}^{T}$ be a $k$ -dimensional time series. According to Engle_Kozicki1993CF, a feature that is present in each component of $Y_{t}$ is said to be common to $Y_{t}$ if there exists a linear combination of $Y_{t}$ that fails to have the feature. A canonical example is the notion of cointegration developed by Engle_Granger1987Co-In in order to characterize the common feature of stochastic trend.

4.1 The Setup

Following Engle_Ng_Rothschild1990asset and Dovonon_Renault2013testing, suppose that the $k$ -dimensional process $\{Y_{t}\}$ satisfies

[TABLE]

where $\Lambda$ is a $k\times p$ matrix of full column rank with $p\leq k$ , $D_{t}$ a $p\times p$ diagonal matrix with diagonal (random) elements $\sigma_{jt}^{2}$ for $j=1,\ldots,p$ , $\Omega$ a $k\times k$ positive semidefinite matrix, and $\{\mathcal{F}_{t}\}_{t=1}^{\infty}$ a filtration to which $\{Y_{t}\}_{t=1}^{\infty}$ and $\{\sigma_{jt}^{2}:j=1,\ldots,p\}_{t=1}^{\infty}$ are adapted. By Engle_Kozicki1993CF, we say that $\{Y_{t}\}$ has a common CH feature if there exists some nonzero $\gamma_{0}\in\mathbf{R}^{k}$ such that $\operatorname{Var}(\gamma_{0}^{\intercal}Y_{t}|\mathcal{F}_{t})$ is constant. The conditional covariance structure (40) has some attractive properties that help to understand, for example, asset excess returns in a parsimonious way (Engle_Ng_Rothschild1990asset). Thus, tests of common CH features can be used to detect the underlying common factor structures that simplify capturing interrelations of economic and financial variables under consideration.

With the help of instrumental variables, a common CH feature can be reformulated by unconditional moments that fit into the classical GMM framework. The following assumption is taken directly from Dovonon_Renault2013testing.

Assumption 4.1.

(i) $\Lambda$ is of full column rank; (ii) $\operatorname{Var}(\sigma_{t}^{2})$ is nonsingular for $\sigma_{t}^{2}\equiv(\sigma_{1t}^{2},\ldots,\sigma_{pt}^{2})^{\intercal}$ ; (iii) $E[Y_{t+1}|\mathcal{F}_{t}]=0$ ; (iv) $Z_{t}$ is an $m\times 1$ $\mathcal{F}_{t}$ -measurable random vector such that $\operatorname{Var}(Z_{t})$ is nonsingular; (v) $\operatorname{Cov}(Z_{t},\sigma_{t}^{2})$ has full column rank $p$ ; (vi) $\{Y_{t},Z_{t}\}$ is stationary and ergodic such that $E[\|Z_{t}\|^{2}]<\infty$ and $E[\|Y_{t}\|^{4}]<\infty$ .

Assumption 4.1(i)-(ii) ensure that there are exactly $k-p$ linearly independent vectors $\gamma_{0}$ , spanning the null space of $\Lambda^{\intercal}$ , such that $\operatorname{Var}(\gamma_{0}^{\intercal}Y_{t}|\mathcal{F}_{t})$ is constant. In other words, the common CH features $\gamma_{0}$ are nonzero solutions of the equation $\Lambda^{\intercal}\gamma_{0}=0$ . [Footnote 7: If $\gamma_{0}$ is a common CH feature, so is $a\gamma_{0}$ for any nonzero $a\in\mathbf{R}$ . For mathematical purposes, however, the number of common CH features is defined to be the dimension of the null space of $\Lambda^{\intercal}$ .] Assumption 4.1(iii) is a normalization condition that helps to simplify the exposition. Assumption 4.1(iv) defines the instrument $Z_{t}$ formed from the information set $\mathcal{F}_{t}$ , while Assumption 4.1(v) implicitly requires that the number of instruments is no less than that of factors. Assumption 4.1(vi) further specifies the data generating process. We refer the readers to Dovonon_Renault2013testing for further details on Assumption 4.1.
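The count of $k-p$ linearly independent common CH features can be checked mechanically as the dimension of the null space of $\Lambda^{\intercal}$ . The sketch below uses exact rational arithmetic; the loading matrix is a hypothetical example with $k=3$ and $p=2$ , and the helper names are illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def num_common_features(Lambda):
    """Dimension of the null space of Lambda' for a k x p loading matrix."""
    k = len(Lambda)
    Lambda_T = [list(col) for col in zip(*Lambda)]
    return k - rank(Lambda_T)


def is_common_feature(gamma, Lambda):
    """True if gamma is nonzero and Lambda' gamma = 0."""
    p = len(Lambda[0])
    zero = all(sum(Lambda[i][j] * gamma[i] for i in range(len(Lambda))) == 0
               for j in range(p))
    return zero and any(g != 0 for g in gamma)


Lambda = [[1, 0], [0, 1], [1, 1]]   # hypothetical loadings, full column rank
n_features = num_common_features(Lambda)
```

Here `n_features` equals $k-p=1$ , and $\gamma_{0}=(1,1,-1)^{\intercal}$ spans the corresponding null space.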

Assumption 4.1 allows us to characterize common CH features as nonzero $\gamma_{0}$ satisfying the vector of unconditional moment equalities (Dovonon_Renault2013testing):

[TABLE]

where $c(\gamma_{0})=E[(\gamma_{0}^{\intercal}Y_{t+1})^{2}]$ . It is then tempting to employ Hansen’s $J$ statistic to test the existence of common CH features (Engle_Kozicki1993CF). Unfortunately, as noted by Dovonon_Renault2013testing, the Jacobian matrix evaluated at the truth is degenerate at zero, rendering standard theory inapplicable. As shall be illustrated, however, such degeneracy is of a nature different from first order degeneracy. By expanding the moment function to the second order, Dovonon_Renault2013testing showed that the asymptotic distribution of the $J$ statistic is highly nonstandard. Nonetheless, Dovonon_Goncalves2017bootstrapping developed a corrected bootstrap that can consistently estimate the limiting law when the bootstrap of Hall_Horowitz1996bootstrap fails to do so.

However, a key assumption in previous studies is that there exists a unique nonzero $\gamma_{0}$ such that (41) is satisfied, ensured by exclusion restrictions and linear normalization $\sum_{j=1}^{k}\gamma_{0}^{(j)}=1$ (Dovonon_Renault2013testing; Dovonon_Goncalves2017bootstrapping; Lee_Liao2017LocalIDfailure). This is undesirable for the following reasons. First, it is unknown a priori how many (linearly independent) CH features are common to the series under consideration. Second, as pointed out by Engle_Ng_Rothschild1990asset in the context of asset pricing, empirical work often considers large numbers of assets and the numbers of common CH features are expected to be large as well. Third, the linear normalization may in fact lead to no $\gamma_{0}$ satisfying (41) (i.e. non-existence). For example, suppose $\Lambda=[1,1]^{\intercal}$ . Then any common CH feature $\gamma_{0}$ must satisfy $\gamma_{0}^{(1)}+\gamma_{0}^{(2)}=0$ , contradicting the linear normalization $\gamma_{0}^{(1)}+\gamma_{0}^{(2)}=1$ proposed in Dovonon_Renault2013testing. Fourth, in addition to the possibility that exclusion restrictions may be hard to form, the linear normalization does not guarantee a unique common CH feature (i.e. non-uniqueness). To see this, suppose $\Lambda=[1,-1,-1]^{\intercal}$ . Then for any common CH feature satisfying the normalization, we must have $\gamma_{0}^{(1)}-\gamma_{0}^{(2)}-\gamma_{0}^{(3)}=0$ and $\gamma_{0}^{(1)}+\gamma_{0}^{(2)}+\gamma_{0}^{(3)}=1$ , which admit infinitely many solutions, i.e., the uniqueness is undermined in this case. These arguments motivate us to modify the $J$ -test in a way that accommodates partial identification as well as degenerate Jacobian matrices. Such an extension is nontrivial because second order (and hence global) identification, a condition that Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping heavily rely on, fails. [Footnote 8: Given first order identification failure, second order identification is equivalent to global identification in the current context because the moment function is quadratic in $\gamma_{0}$ .]
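The two counterexamples can be verified directly. The short sketch below checks that, for $\Lambda=[1,-1,-1]^{\intercal}$ , every member of the one-parameter family $(1/2,s,1/2-s)^{\intercal}$ is a common CH feature obeying the sum-to-one normalization, and that for $\Lambda=[1,1]^{\intercal}$ a unit-norm feature exists even though no sum-to-one feature does. The family and the helper name are illustrative choices, not objects from the cited papers.

```python
import math
from fractions import Fraction


def is_common_feature(gamma, Lambda):
    """True if gamma is nonzero and Lambda' gamma = 0 (Lambda is a single column)."""
    return any(g != 0 for g in gamma) and sum(l * g for l, g in zip(Lambda, gamma)) == 0


# Non-existence: with Lambda = (1, 1)', a feature requires g1 + g2 = 0, which is
# incompatible with g1 + g2 = 1. A unit-norm feature nevertheless exists.
unit_feature = (1 / math.sqrt(2), -1 / math.sqrt(2))

# Non-uniqueness: with Lambda = (1, -1, -1)', each (1/2, s, 1/2 - s) is a
# common feature whose entries sum to one.
Lambda3 = (1, -1, -1)
family = [(Fraction(1, 2), s, Fraction(1, 2) - s)
          for s in (Fraction(0), Fraction(1, 3), Fraction(-2))]
all_valid = all(is_common_feature(g, Lambda3) and sum(g) == 1 for g in family)
```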

4.2 A Modified $J$ Test

To exclude the zero solution and avoid falsely excluding the existence of CH features, we employ the following normalization

[TABLE]

Next, to map the current setup into our developed framework, we define a function $\phi:\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})\to\mathbf{R}$ by: for any $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ ,

[TABLE]

Then in view of the moment conditions (41), the hypothesis that there exists at least one common CH feature can be reformulated as

[TABLE]

where $\theta_{0}:\mathbb{S}^{k}\to\mathbf{R}^{m}$ is defined as $\theta_{0}(\gamma)\equiv E[Z_{t}\{(\gamma^{\intercal}Y_{t+1})^{2}-c(\gamma)\}]$ . In this formulation, we have taken the identity matrix $I_{m}$ as the weighting matrix for simplicity.

Given our treatment of Example 2.6, one might next try appealing to the results developed there. Unfortunately, they are not directly applicable. First, the parameter space $\Gamma$ of $\gamma_{0}$ is required to have nonempty interior (see Lemma C.3), whereas in the current context $\Gamma=\mathbb{S}^{k}$ which has empty interior. Second, there is a technical condition there that prevents the Jacobian matrix from being degenerate even when there does exist a unique common CH feature; see Remark C.2 for details. Consequently, we have to re-verify the differentiability conditions for the map (43). By Lemma D.1, under the null, $\phi$ is Hadamard differentiable with degenerate derivative, and second order Hadamard directionally differentiable at $\theta_{0}$ tangentially to $\prod_{j=1}^{m}C(\mathbb{S}^{k})$ with the derivative

[TABLE]

for any $h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ , where $\Gamma_{0}=\{\gamma_{0}\in\mathbb{S}^{k}:\theta_{0}(\gamma_{0})=0\}$ is the identified set of $\gamma_{0}$ , and $G\in\mathbf{M}^{m\times k^{2}}$ with the $j$ th row given by $\operatorname{vec}(\Delta_{j})^{\intercal}$ and

[TABLE]

We now make some remarks before proceeding further. First, we stress that first order degeneracy refers to the first order derivative $\phi_{\theta_{0}}^{\prime}$ of the functional $\phi$ , mapping from the function space $\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ to $\mathbf{R}$ , being degenerate, while the degeneracy Dovonon_Renault2013testing focused on refers to degeneracy of the Jacobian matrix $J(\gamma_{0})\equiv\frac{d}{d\gamma}\theta_{0}(\gamma)|_{\gamma=\gamma_{0}}$ of the moment function $\theta_{0}$ that maps from the parameter space $\Gamma\subset\mathbf{R}^{k}$ of $\gamma_{0}$ to $\mathbf{R}^{m}$ . Thus, the two types of degeneracy are conceptually different. Second, perhaps more importantly, they are also different in terms of the consequences. By Theorem 3.1 and in view of (45), $\phi$ being first order degenerate means that the second order standard bootstrap is inconsistent regardless of whether the Jacobian matrix is degenerate or not, while degeneracy of the Jacobian matrix generates the additional complication that $\phi$ is second order nondifferentiable as reflected by the inside minimization in (45). Third, further allowing multiple (linearly independent) common CH features reinforces the nondifferentiability of $\phi$ as can be seen from the outside minimization in (45).

Next, let the estimator $\hat{\theta}_{T}:\mathbb{S}^{k}\to\mathbf{R}^{m}$ be defined by $\hat{\theta}_{T}(\gamma)=\frac{1}{T}\sum_{t=1}^{T}Z_{t}\{(\gamma^{\intercal}Y_{t+1})^{2}-\hat{c}(\gamma)\}$ with $\hat{c}(\gamma)=\frac{1}{T}\sum_{t=1}^{T}(\gamma^{\intercal}Y_{t+1})^{2}$ . Given the established differentiability of $\phi$ , the asymptotic distribution of $\phi(\hat{\theta}_{T})$ is then an immediate consequence of Theorem 2.1 provided $\hat{\theta}_{T}$ converges weakly. Towards this end, we impose the following assumption as in Dovonon_Renault2013testing.

Assumption 4.2.

$Z_{t}$ , $\operatorname{vec}(Y_{t}Y_{t}^{\intercal})$ and $\operatorname{vec}(Y_{t}Y_{t}^{\intercal})\otimes Z_{t}$ fulfill CLT. [Footnote 9: The symbol $\otimes$ denotes the Kronecker product.]

Assumptions 4.1 and 4.2 together imply that

[TABLE]

where $\mathbb{G}$ is a zero mean Gaussian process with the covariance functional satisfying: for any $\gamma_{1}$ , $\gamma_{2}\in\Gamma_{0}$ and $\mu_{z}\equiv E[Z_{t}]$ ,

[TABLE]

The proposition below delivers the limiting distribution of test statistic $T\phi(\hat{\theta}_{T})$ .

Proposition 4.1.

Let Assumptions 4.1 and 4.2 hold. Then we have under $\mathrm{H}_{0}$

[TABLE]

The asymptotic distribution in (47) is a highly nonlinear functional of the Gaussian process $\mathbb{G}$ in general, which turns out to be consistent with the limits obtained in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping whenever their second order (and global) identification condition holds; see Remark 4.1. In the latter setting, Dovonon_Goncalves2017bootstrapping showed that the recentered bootstrap of Hall_Horowitz1996bootstrap is inconsistent and thus proposed corrected versions of the standard GMM bootstrap. Unfortunately, their methods are not directly applicable to our setup that allows multiple common CH features (i.e. partial identification), because they crucially rely on the second order and global identification.

We next demonstrate how our bootstrap works. First, let $\{Y_{t+1}^{*},Z_{t}^{*}\}_{t=1}^{T}$ be a bootstrap sample, which can be obtained by block bootstrap, nonoverlapping or overlapping (Carlstein1986subseries; Kunsch1989Jackknife). Because the limiting process $\{\mathbb{G}(\gamma):\gamma\in\Gamma_{0}\}$ is determined by a martingale difference sequence indexed by $\gamma\in\Gamma_{0}$ , the dependence structure of the data does not enter into the limit and we may thus employ Efron1979’s nonparametric bootstrap or more general bootstrap schemes. In any case, we set

[TABLE]

To accommodate diverse resampling schemes, we simply impose the high level condition that $\hat{\theta}_{T}^{*}$ satisfies Assumptions 3.1 and 3.2 (DehlingMikoschSorensen2002EPDep).
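For concreteness, the sample moment $\hat{\theta}_{T}(\gamma)$ defined above and a bootstrap analogue can be computed as in the following sketch. It assumes that the bootstrap version applies the same formula to resampled pairs $(Z_{t},Y_{t+1})$ and uses i.i.d. resampling with replacement for simplicity; a block scheme would replace the index draw. The function names and the tiny data set are hypothetical.

```python
import random


def theta_hat(gamma, Y_next, Z):
    """Sample moment (1/T) * sum_t Z_t * {(gamma'Y_{t+1})**2 - c_hat(gamma)}."""
    T = len(Y_next)
    sq = [sum(g * y for g, y in zip(gamma, row)) ** 2 for row in Y_next]
    c_hat = sum(sq) / T
    m = len(Z[0])
    return [sum(Z[t][j] * (sq[t] - c_hat) for t in range(T)) / T for j in range(m)]


def bootstrap_theta_hat(gamma, Y_next, Z, rng):
    """Same moment evaluated on pairs (Z_t, Y_{t+1}) resampled with replacement."""
    T = len(Y_next)
    idx = [rng.randrange(T) for _ in range(T)]
    return theta_hat(gamma, [Y_next[i] for i in idx], [Z[i] for i in idx])


# Toy data with T = 3, k = 2 and m = 2 (rows are time periods).
Y_next = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
value = theta_hat((1.0, 0.0), Y_next, Z)
star = bootstrap_theta_hat((1.0, 0.0), Y_next, Z, random.Random(0))
```

Since the moment is quadratic in $\gamma$ , it satisfies $\hat{\theta}_{T}(-\gamma)=\hat{\theta}_{T}(\gamma)$ , so the two antipodal points of the sphere always enter the identified set together.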

It remains to estimate the derivative (45). The numerical differentiation approach can be implemented as in the beginning of Section 3.5. That is, we estimate $\phi_{\theta_{0}}^{\prime\prime}$ by

[TABLE]

where $\kappa_{T}$ satisfies Assumption 3.5. We now describe how to estimate $\phi_{\theta_{0}}^{\prime\prime}$ by exploiting its structure. Let $B_{T}\equiv\{v\in\mathbf{R}^{k}:\|v\|\leq\kappa_{T}^{-1/2}\}$ and $\hat{\Gamma}_{T}\equiv\{\gamma\in\mathbb{S}^{k}:\|\hat{\theta}_{T}(\gamma)\|^{2}-\phi(\hat{\theta}_{T})\leq\kappa_{T}^{2}\}$ , where $\kappa_{T}$ is to be specified. [Footnote 10: One can theoretically ignore $\phi(\hat{\theta}_{T})$ in the expression of $\hat{\Gamma}_{T}$ . As pointed out by CHT2007, however, such a modification helps avoid an empty set of solutions and improve power.] Then we may estimate $\phi_{\theta_{0}}^{\prime\prime}(h)$ by:

[TABLE]

where $\hat{G}\in\mathbf{M}^{m\times k^{2}}$ with its $j$ th row given by $\operatorname{vec}(\hat{\Delta}_{j})^{\intercal}$ for

[TABLE]

In fact, we may further restrict the bounded set $B_{T}$ to reduce the computation burden for $\hat{\phi}_{T}^{\prime\prime}$ ; see Remark D.1. Clearly, the sequence $\{\kappa_{T}\}$ should tend to zero at a suitable rate as $T\to\infty$ . This is made precise as follows.

Assumption 4.3.

$\{\kappa_{T}\}$ satisfies (i) $\kappa_{T}\downarrow 0$ , and (ii) $\sqrt{T}\kappa_{T}\to\infty$ .
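The set estimator $\hat{\Gamma}_{T}$ can be approximated on a grid when $k=2$ , as in the sketch below. It assumes that $\phi(\theta)=\min_{\gamma\in\mathbb{S}^{k}}\|\theta(\gamma)\|^{2}$ , the form suggested by the identity weighting and by the definition of $\hat{\Gamma}_{T}$ ; the moment function `toy_theta`, the grid size and the function names are hypothetical stand-ins for $\hat{\theta}_{T}$ and for an actual optimizer.

```python
import math


def phi_and_set_estimate(theta, kappa, n_grid=360):
    """Grid versions of min ||theta(gamma)||^2 and of the set estimator (k = 2)."""
    grid = [(math.cos(2 * math.pi * i / n_grid), math.sin(2 * math.pi * i / n_grid))
            for i in range(n_grid)]
    crit = [sum(v * v for v in theta(g)) for g in grid]
    phi_value = min(crit)
    # Keep every grid point whose criterion is within kappa**2 of the minimum.
    gamma_hat = [g for g, c in zip(grid, crit) if c - phi_value <= kappa ** 2]
    return phi_value, gamma_hat


def toy_theta(gamma):
    # Hypothetical moment function, quadratic in gamma, zero iff g1 + g2 = 0.
    s = (gamma[0] + gamma[1]) ** 2
    return [s, 2.0 * s]


T = 10_000
kappa_T = T ** -0.25   # a rate with kappa_T -> 0 and sqrt(T) * kappa_T -> infinity
phi_value, gamma_hat = phi_and_set_estimate(toy_theta, kappa_T)
```

In this toy case the retained points cluster around the two antipodal solutions $\pm(1,-1)^{\intercal}/\sqrt{2}$ , and the cluster shrinks as $\kappa_{T}$ decreases.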

Assumption 4.3 regulates the rates at which the tuning parameters $\kappa_{T}$ should approach zero, in order to deliver first order validity of our bootstrap inference procedure. The optimal choice of $\kappa_{T}$ is concerned with higher order accuracy of our method, which we do not touch in this paper. Combining the bootstrap $\hat{\theta}_{T}^{*}$ in (48) and the derivative estimator, we are then able to consistently estimate the law of the weak limit in (47) following Theorem 3.3, which in turn allows us to construct critical values. Specifically, let $\hat{c}_{1-\alpha}$ be the $1-\alpha$ quantile of $\hat{\phi}_{T}^{\prime\prime}(\sqrt{T}\{\hat{\theta}_{T}^{*}-\hat{\theta}_{T}\})$ conditional on the data: [Footnote 11: As usual, $P_{W}$ denotes the probability taken with respect to the bootstrap weights $\{W_{T}\}$ , though in the current setup they are implicitly defined. Alternatively, one can think of $P_{W}$ as the probability with respect to the bootstrap sample $\{Z_{t}^{*},Y_{t+1}^{*}\}$ holding data fixed.]

[TABLE]

The following proposition confirms that the test of rejecting the existence of common CH features when $T\phi(\hat{\theta}_{T})>\hat{c}_{1-\alpha}$ is valid.

Proposition 4.2.

Suppose Assumptions 3.1, 3.2, 4.1, 4.2, and 4.3 hold. If the cdf of the limit in (47) is continuous and strictly increasing at its $1-\alpha$ quantile for $\alpha\in(0,1)$ , then we have under $\mathrm{H}_{0}$ ,

$$\lim_{T\to\infty}P(T\phi(\hat{\theta}_{T})>\hat{c}_{1-\alpha})=\alpha~.$$

Proposition 4.2 implies our test has pointwise asymptotic exact size $\alpha$ and thus is not conservative (in the pointwise sense). Establishing local size control, unfortunately, is challenging in this case, because asymptotic distributions of the statistic under local perturbations do not have definitive relations (to us) to the corresponding pointwise limits in terms of first order dominance. It appears that the problem of developing (at least) locally valid and non-conservative overidentification tests is prevalent in the literature of partial identification (CHT2007; AndrewsandSoares2010).

Finally, we stress that the quadratic structure of the moment function plays no essential roles in our framework. Building upon Example 2.6, one may work with a general moment function that admits a zero Jacobian matrix, but without the requirement that the parameter space have nonempty interior. It is also possible to deal with GMM problems with a rank deficient but possibly nonzero Jacobian matrix. For example, consider testing whether a matrix $\Pi_{0}\in\mathbf{M}^{m\times k}$ with $m\geq k$ has rank $k$ . This amounts to testing

[TABLE]

Here, the moment function is $\gamma\mapsto\theta_{0}(\gamma)\equiv\Pi_{0}\gamma$ which is non-quadratic and whose Jacobian matrix, namely, $\Pi_{0}$ , may have rank less than or equal to $k-1$ . Note also that the parameter space $\mathbb{S}^{k}$ of $\gamma$ has empty interior. We refer the reader to ChenFang2016Rank for more detailed discussions.
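To see why the rank problem fits the same template, note that $\min_{\gamma\in\mathbb{S}^{k}}\|\Pi_{0}\gamma\|^{2}$ equals the smallest eigenvalue of $\Pi_{0}^{\intercal}\Pi_{0}$ , which is zero exactly when $\Pi_{0}$ has rank below $k$ . The sketch below checks this for $k=2$ by comparing a grid minimization with the closed-form eigenvalue; the matrices and function names are hypothetical, and the use of the minimized squared norm as the functional is an assumption made for illustration.

```python
import math


def min_sq_norm_on_circle(Pi, n_grid=3600):
    """Grid minimum of ||Pi gamma||^2 over unit vectors gamma in R^2."""
    best = math.inf
    for i in range(n_grid):
        # Half the circle suffices since gamma and -gamma give the same value.
        a = math.pi * i / n_grid
        g0, g1 = math.cos(a), math.sin(a)
        best = min(best, sum((row[0] * g0 + row[1] * g1) ** 2 for row in Pi))
    return best


def smallest_eigenvalue_gram(Pi):
    """Smallest eigenvalue of the 2 x 2 matrix Pi'Pi in closed form."""
    a = sum(row[0] ** 2 for row in Pi)
    b = sum(row[0] * row[1] for row in Pi)
    d = sum(row[1] ** 2 for row in Pi)
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b ** 2)


Pi_deficient = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # rank 1: columns proportional
Pi_full = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]        # rank 2
```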

Remark 4.1.

The weak limit in Proposition 4.1 is consistent with the one in Dovonon_Renault2013testing, when there does exist a unique common CH feature which satisfies their linear normalization and when the weighting matrix is the identity matrix (for reasons we have mentioned at the beginning of this section) – otherwise the two are not comparable. At first sight, our test statistic is different from Dovonon_Renault2013testing’s because we adopted a different normalization, resulting in a different parameter space. [Footnote 12: Dovonon_Renault2013testing also recentered $Z_{t}$ in their construction, though this does not change the statistic numerically.] Close inspection, however, shows that the asymptotic distributions are in fact identical, up to a multiplicative constant. Specifically, let $\gamma_{0}$ be the (nonzero) unique CH feature such that $\sum_{j=1}^{k}\gamma_{0}^{(j)}=1$ . Then $\Gamma_{0}=\{\pm\gamma_{0}/\|\gamma_{0}\|\}$ and so by Proposition 4.1, the asymptotic distribution of our $J$ -statistic is simply the law of

[TABLE]

where we simply replaced $v$ with $v/(\sqrt{2}\|\gamma_{0}\|^{2})$ . By Theorem 3.1 and Corollary 3.1 in Dovonon_Renault2013testing (see also Dovonon_Goncalves2017bootstrapping), their $J$ -statistic (with $W$ being the identity matrix) converges in law to

[TABLE]

where $\bar{G}\in\mathbf{M}^{m\times(k-1)^{2}}$ with the $j$ th row $\mathrm{vec}(A\Delta_{j}A^{\intercal})^{\intercal}$ for $A=[I_{k-1},-\jmath_{k-1}]$ and $\jmath_{k-1}$ the $(k-1)\times 1$ vector of ones. By Lemma D.6, however, the two limits in (53) and (54) differ only by the multiplicative constant $\|\gamma_{0}\|^{-4}$ , establishing the claimed consistency. If the common CH feature also satisfies our normalization, i.e., $\|\gamma_{0}\|=1$ , then the two limits are identical. We reiterate that our main motivation is to build upon Dovonon_Renault2013testing by allowing multiple common CH features and adopting a normalization that would not falsely exclude the existence of any common features. [Footnote 13: Any other linear normalization $c^{\intercal}\gamma_{0}=r$ for known $c\in\mathbf{R}^{k}$ and $r\in\mathbf{R}$ , including, for example, $\gamma_{0}^{(1)}=1$ , would share the same deficiency as the linear normalization above; see our next section.] ∎

4.3 Simulation Studies

In this section, we examine the finite sample performance of our framework based on Monte Carlo simulations, and show how the identification assumption in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping may suffer from their linear normalization. One may then try the multiple testing versions of these tests by testing a few linearly independent linear restrictions, but we show they may be too conservative.

As in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping, we consider the following CH factor model:

[TABLE]

where $Y_{t}$ is a $k\times 1$ vector that can be thought of as asset returns, $F_{t}$ is a $p\times 1$ vector of CH factors, $\Lambda$ is a $k\times p$ matrix of factor loadings, and $U_{t}$ is a vector of idiosyncratic shocks independent of $F_{t}$ . Following Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping, we let $\{U_{t}\}$ be an i.i.d. sequence from $N(0,I_{k}/2)$ , and the $j$ th component $f_{j,t+1}$ of $F_{t+1}$ follow a Gaussian-GARCH(1,1) model such that

[TABLE]

where $\omega_{j},\alpha_{j},\beta_{j}>0$ , $\{\epsilon_{j,t}\}\sim N(0,1)$ i.i.d. across both $j$ and $t$ , and $\{\sigma_{j0}\}$ are independent across $j$ and of $\{\epsilon_{j,t}\}$ . It follows that $\{f_{j,t}\}$ are independent across $j$ for each $t$ . The remaining specifications are detailed in Table 2. Our designs are the same as those in Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping except that different values for $\Lambda$ are used to illustrate the restrictiveness of the linear normalization. Designs D1 and D2 generate two assets while Designs D3, D4 and D5 generate three assets. In Designs D1, D3 and D4, the factor loading matrices $\Lambda$ ensure the existence of common CH features and thus serve to investigate size performance, while no common CH features exist in Designs D2 and D5, which help us inspect power performance.

The tests are implemented with $m=2$ and instruments $Z_{t}=(Y_{1,t}^{2},Y_{2,t}^{2})^{\intercal}$ for Designs D1 and D2, and with $m=3$ and $Z_{t}=(Y_{1,t}^{2},Y_{2,t}^{2},Y_{3,t}^{2})^{\intercal}$ for Designs D3, D4 and D5. For derivative estimation, we set the tuning parameters $\kappa_{T}=T^{-1/4},T^{-1/3},T^{-2/5}$ for both the derivative estimator in (50) and the numerical derivative estimator in (49). These choices are meant to satisfy Assumption 4.3. Again, we do not address the issue of optimality in this paper, but instead hope to make the point that, even with these crude choices, our methods show substantial improvement over existing ones. The results for these two estimators are denoted as CF1 and CF2, respectively. To show the restrictiveness of the linear normalization $\gamma\in\{\gamma^{\prime}\in\mathbf{R}^{k}:\sum_{i=1}^{k}\gamma_{i}^{\prime}=1\}$ as in Dovonon_Renault2013testing, Dovonon_Goncalves2017bootstrapping and Lee_Liao2017LocalIDfailure, we report the results based on Dovonon_Goncalves2017bootstrapping’s corrected and continuously-corrected bootstrap as well as those based on the asymptotic test of Dovonon_Renault2013testing, denoted as DG1, DG2 and DR respectively. The sample sizes are $T=1,000$ , $2,000$ , $5,000$ , $10,000$ , $20,000$ , $40,000$ and $50,000$ . To minimize the initial value effect, the data are obtained by generating $T+100$ samples and dropping the first $100$ samples. We conduct $2,000$ Monte Carlo replications with $200$ empirical bootstrap repetitions for each replication. The nominal level is $5\%$ throughout.
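As a rough illustration of the data-generating step described above, the following sketch simulates a CH factor model with Gaussian-GARCH(1,1) factors and a burn-in of $100$ observations. The loading matrix and GARCH parameters are hypothetical placeholders (the actual values are those in Table 2), the recursion uses the usual GARCH(1,1) timing convention, and starting each conditional variance at its unconditional level is an assumption made here for simplicity.

```python
import math
import random


def simulate_garch_factor(n_obs, omega, alpha, beta, rng, burn_in=100):
    """Simulate one Gaussian-GARCH(1,1) factor, dropping the first burn_in draws."""
    # Assumption: start the conditional variance at its unconditional level.
    sigma2 = omega / (1.0 - alpha - beta)
    path = []
    for _ in range(n_obs + burn_in):
        f = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        path.append(f)
        # Standard GARCH(1,1) update of the conditional variance.
        sigma2 = omega + alpha * f * f + beta * sigma2
    return path[burn_in:]


def simulate_ch_model(n_obs, loadings, garch_params, seed=0):
    """Return rows Y_t = Lambda F_t + U_t with U_t ~ N(0, I_k / 2)."""
    rng = random.Random(seed)
    factors = [simulate_garch_factor(n_obs, *params, rng) for params in garch_params]
    shock_sd = math.sqrt(0.5)
    data = []
    for t in range(n_obs):
        row = []
        for lam_row in loadings:
            common = sum(lam * fac[t] for lam, fac in zip(lam_row, factors))
            row.append(common + rng.gauss(0.0, shock_sd))
        data.append(row)
    return data


# Hypothetical design: two assets, one factor (not the values in Table 2).
LOADINGS = [[1.0], [0.5]]
GARCH_PARAMS = [(0.2, 0.2, 0.6)]  # (omega, alpha, beta)
sample = simulate_ch_model(1000, LOADINGS, GARCH_PARAMS, seed=42)
# Instruments Z_t = (Y_{1,t}^2, Y_{2,t}^2) as used for the two-asset designs.
instruments = [[y * y for y in row] for row in sample]
```

Fixing the seed makes each replication reproducible; a Monte Carlo study would loop over seeds and apply the test to each simulated sample.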

The results are summarized in Tables 3-7. As expected, Dovonon_Goncalves2017bootstrapping’s resampling methods exhibit substantial size distortion, often close to or over $50\%$ ; so does the asymptotic test DR. This does not appear to be a finite sample issue as the distortion is especially severe in large samples. Rather, it is because the linear normalization excludes common CH features that actually exist in the data and in this way leads to wrong conclusions. Our tests considerably reduce the null rejection rates for all the chosen tuning parameters, though both CF1 and CF2 exhibit some degree of over- and under-rejection, attributable to the tuning parameters. Another interesting finding is that our bootstrap based on numerical differentiation (CF2) appears to be more sensitive to the choice of tuning parameters, which is somewhat expected because the structural method (CF1) exploits more information about the derivative. We leave a thorough comparison between these two methods for future study.

Alternatively, one may test a few linearly independent linear restrictions by adopting multiple testing versions of the DG and the DR tests, so as to avoid falsely excluding the existence of common CH features. One then rejects the existence of common CH features if all the restrictions are rejected at level $\alpha=5\%$ . [Footnote 14: Since the null is a union of “sub-nulls”, no Bonferroni-type correction is needed.] However, the resulting tests, though valid, may be too conservative. To illustrate, we test the null that $\gamma_{0}$ satisfies (i) $\gamma_{0}^{(1)}+\gamma_{0}^{(2)}=1$ or (ii) $\gamma_{0}^{(1)}=1$ for D1 and D2, and satisfies (i) $\gamma_{0}^{(1)}+\gamma_{0}^{(2)}+\gamma_{0}^{(3)}=1$ , (ii) $\gamma_{0}^{(1)}=1$ , or (iii) $\gamma_{0}^{(2)}=1$ for D3, D4 and D5. We implement the multiple testing procedures based on Dovonon_Goncalves2017bootstrapping with the identity weighting matrix and Dovonon_Renault2013testing with the optimal weighting matrix, and label them as M-DG1, M-DG2 and M-DR, respectively. As expected, the M-DR test suffers from substantial under-rejection for D1, D3 and D4 even in large samples. M-DG1 and M-DG2 improve the situation somewhat, but the under-rejection is still significant for D3. Tables 6 and 7 indicate that our tests are more powerful than M-DG1, M-DG2 and M-DR in all cases. In particular, for D5 the rejection rates of our tests are close to one when $T$ is large while those of M-DG1 and M-DG2 are not. Results for multiple testing procedures based on Dovonon_Goncalves2017bootstrapping with the optimal weighting matrix share similar patterns and are available upon request. We reiterate that the multiple testing procedure would not help with partial identification, and both Dovonon_Renault2013testing and Dovonon_Goncalves2017bootstrapping crucially rely on point identification.
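The decision rule for the union of sub-nulls can be sketched in a few lines. The function name and the use of p-values are assumptions made for illustration; the tests in the text compare statistics with critical values, which leads to the same accept/reject decision for each restriction.

```python
def reject_union_null(p_values, alpha=0.05):
    """Reject the union of sub-nulls only if every sub-null is rejected at level alpha."""
    if not p_values:
        raise ValueError("at least one restriction is required")
    # No Bonferroni-type adjustment: each restriction is tested at level alpha.
    return all(p <= alpha for p in p_values)


# Hypothetical p-values for two linear restrictions on gamma_0.
print(reject_union_null([0.01, 0.20]))  # one restriction survives: do not reject
print(reject_union_null([0.01, 0.03]))  # every restriction rejected: reject
```

Because rejection requires every restriction to fail, the rule can be quite conservative, which matches the under-rejection reported for the multiple testing procedures.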

5 Conclusion

In this paper, we developed a general statistical framework for conducting inference on functionals exhibiting first order degeneracy, i.e., the first order derivative of the parameter is zero. Our first contribution implies that the standard bootstrap necessarily fails to work in these settings. In light of this failure, we provided two general solutions: one generalizes the Babu correction, and the other one is a modified bootstrap following Fang_Santos2014HDD. Our framework includes many existing results as special cases. To further demonstrate the applicability of our theory, we developed a test of common CH features studied by Dovonon_Renault2013testing but under weaker assumptions that allow the existence of more than one common CH feature.

References

Online Supplemental Appendix to “Inference on Functionals under First Order Degeneracy”

Qihui Chen

School of Management and Economics

The Chinese University of Hong Kong, Shenzhen

[email protected]

Zheng Fang

Department of Economics

Texas A&M University

[email protected]

The following list includes notation that will be used throughout the supplement.

Appendix A Local Analysis

In this appendix, we show how our bootstrap procedures can provide local size control. We start by characterizing local perturbations of the data generating process and their implications for the testing statistic $r_{n}^{2}\phi(\hat{\theta}_{n})$ .

A.1 Local Perturbations

We first introduce relevant concepts following BKRW993Efficient. In what follows we specialize our setup to the i.i.d. setting for simplicity. [Footnote 15: Generally, we may consider models that are locally asymptotically quadratic (Vaart1998; Ploberger_Phillips2012optimal).] In particular, the data $\{X_{i}\}_{i=1}^{n}$ is presumed to have a common probability measure $P\in\mathcal{P}$ , where $\mathcal{P}$ is a collection of Borel probability measures that possibly generate the data. Further, we think of the parameter $\theta_{0}$ as a map $\theta:\mathcal{P}\to\mathbb{D}_{\phi}$ , i.e., $\theta_{0}=\theta(P)$ . Formally, we impose the following:

Assumption A.1.

(i) $\{X_{i}\}_{i=1}^{n}$ is an i.i.d. sequence with each $X_{i}\in\mathbf{R}^{d_{x}}$ distributed according to $P\in\mathcal{P}$ ; (ii) $\theta_{0}\equiv\theta(P)$ for some known map $\theta:\mathcal{P}\rightarrow\mathbb{D}_{\phi}$ and $\phi(\theta_{0})=0$ .

Given the model $\mathcal{P}$ defined in Assumption A.1, we now formalize the notion of local perturbations to the true probability measure $P$ . Intuitively, a local perturbation can be thought of as a sequence of probability measures contained in $\mathcal{P}$ that approaches $P$ . Since the set of probability measures is not a vector space, an appropriate embedding is needed to make precise sense of this idea. This is simplified by considering one dimensional parametric models containing $P$ and contained in $\mathcal{P}$ (Stein1956efficient).

Definition A.1.

A function $t\mapsto P_{t}$ mapping a neighborhood $(-\epsilon,\epsilon)$ of zero into $\mathcal{P}$ is called a differentiable path passing through $P$ if $P_{0}=P$ and for some $h:\mathbf{R}^{d_{x}}\to\mathbf{R}$ ,

[TABLE]

Intuitively, a differentiable path is just a parametric model in $\mathcal{P}$ and indexed by $t\in(-\epsilon,\epsilon)$ such that it is getting close to $P$ sufficiently fast as $t\to 0$ . The function $h$ is referred to as the score function of $P$ and satisfies $\int h\,dP=0$ and $h\in L^{2}(P)$ .

The perturbations on $P$ are fundamental in that they affect everything that is built on the model, which in particular includes the parameter $\theta:\mathcal{P}\to\mathbb{D}_{\phi}$ and the estimator $\hat{\theta}_{n}:\{X_{i}\}_{i=1}^{n}\to\mathbb{D}_{\phi}$ . In this paper, we shall only consider $\theta$ and $\hat{\theta}_{n}$ that are well behaved with respect to these local perturbations. This is formalized by the following assumption.

Assumption A.2.

(i) For every differentiable path $\{P_{t}\}$ in $\mathcal{P}$ with score function $h$ , $\theta:\mathcal{P}\to\mathbb{D}_{\phi}$ is regular in the sense that there exists $\theta_{0}^{\prime}(h)\in\mathbb{D}_{0}$ such that $\|\theta(P_{t})-\theta(P)-t\theta_{0}^{\prime}(h)\|_{\mathbb{D}}=o(t)$ (as $t\rightarrow 0$ ); (ii) $\hat{\theta}_{n}$ is a regular estimator for $\theta(P)$ . [Footnote 16: Formally, $\hat{\theta}_{n}$ is a regular estimator if for every differentiable path $\{P_{t}\}$ in $\mathcal{P}$ with score function $h$ , we have $r_{n}\{\hat{\theta}_{n}-\theta(P_{n})\}\stackrel{{\scriptstyle L_{n}}}{{\rightarrow}}\mathbb{G}$ , where $P_{n}\equiv P_{1/r_{n}}$ and $L_{n}$ denotes the law under $\prod_{i=1}^{n}P_{n}$ .]

Assumption A.2(i) is a smoothness condition on the parameter $\theta:\mathcal{P}\to\mathbb{D}_{\phi}$ and the model $\mathcal{P}$ , which rules out parameters defined by, for example, densities or conditional densities with jumps (Ibragimov_Hasminskii1981; Chernozhukov_Hong2004nonregular). In our examples, $\theta_{0}$ takes the form of expectations, so Assumption A.2(i) is met under standard conditions as long as the model $\mathcal{P}$ is sufficiently rich to include differentiable paths (BKRW993Efficient; Brown_Newey1998expectation). Assumption A.2(ii) means that $\hat{\theta}_{n}$ is asymptotically invariant to local perturbations, excluding superefficient estimators such as Hodges’s estimator or Stein’s estimator (Vaart1997Superefficiency). Since $\theta_{0}$ are population means in all our examples, Assumption A.2(ii) is satisfied if we take $\hat{\theta}_{n}$ to be the corresponding sample averages; see, for example, Theorem 3.10.12 in Vaart1996 and Jeganathan1995LANtimeseries. Assumptions A.2(i) and (ii) are in fact closely related, though neither alone implies the other. In particular, regularity of $\hat{\theta}_{n}$ plus a mild condition implies regularity of $\theta:\mathcal{P}\to\mathbb{D}_{\phi}$ , and vice versa (Vaart1991differentibility; Hirano_Porter2012).

The local behaviors of our test statistic can now be characterized as follows.

Lemma A.1.

Let $\{P_{t}\}$ be a differentiable path with score function $h$ . Suppose that Assumptions 2.1, 2.2, A.1 and A.2 hold. Then,

[TABLE]

where $L_{n}$ denotes the law under $\prod_{i=1}^{n}P_{n}$ with $P_{n}\equiv P_{1/r_{n}}$ by abuse of notation.

Lemma A.1 indicates that the asymptotic distribution of $r_{n}^{2}\phi(\hat{\theta}_{n})$ varies as a function of the score $h$ , and in this sense exhibits second order irregularity, even if the map $\phi$ is both first and second order differentiable and $\hat{\theta}_{n}$ is regular. This is perhaps surprising ex ante and yet somewhat expected ex post. One important implication of Lemma A.1 is that one should carefully evaluate how sensitive the statistical procedures under consideration are in the presence of first order degeneracy.
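A toy scalar example, not taken from the paper, may help fix ideas. Suppose $\theta(P)$ is a population mean, $\phi(\theta)=\theta^{2}$ , $\theta_{0}=0$ and $r_{n}=\sqrt{n}$ . If the true mean drifts as $\eta/\sqrt{n}$ , then $\sqrt{n}\bar{X}_{n}$ is approximately $N(\eta,\sigma^{2})$ , so $n\bar{X}_{n}^{2}$ behaves like $(G+\eta)^{2}$ with $G\sim N(0,\sigma^{2})$ , whose mean $\sigma^{2}+\eta^{2}$ moves with the drift. The sketch below checks this by Monte Carlo; the function name and all numerical settings are illustrative assumptions.

```python
import random


def mean_of_statistic(eta, n=100, reps=4000, sigma=1.0, seed=0):
    """Monte Carlo mean of n * (sample mean)^2 when the true mean is eta / sqrt(n)."""
    rng = random.Random(seed)
    mu = eta / n ** 0.5
    total = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        total += n * xbar * xbar
    return total / reps


for eta in (0.0, 1.0, 2.0):
    # The limiting mean is sigma^2 + eta^2, which moves with the local drift eta.
    print(eta, round(mean_of_statistic(eta), 2), 1.0 + eta ** 2)
```

The simulated means should track $1+\eta^{2}$ up to Monte Carlo error, illustrating how the limit law depends on the local perturbation even though $\phi$ is smooth.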

A.2 Local Size and Power

Having derived the asymptotic distributions of $r_{n}^{2}\phi(\hat{\theta}_{n})$ under local perturbations, we are now in a position to establish local power performance and local size control of our test. We consider differentiable paths $\{P_{t}\}$ in $\mathcal{P}$ that also belong to the set

[TABLE]

Thus, a path $\{P_{t}\}\in\mathcal{H}$ is such that $\{P_{t}\}$ satisfies the null hypothesis whenever $t\leq 0$ , but switches to satisfying the alternative hypothesis at all $t>0$ . One can think of $\mathcal{H}$ as a simple device to study local size and power in a compact way. Further, we denote the power function at sample size $n$ for the test that rejects whenever $r_{n}^{2}\phi(\hat{\theta}_{n})>\hat{c}_{1-\alpha}$ by

[TABLE]

where we write $P_{n}\equiv P_{\eta/r_{n}}$ and $P_{n}^{n}\equiv\prod_{i=1}^{n}P_{n}$ . The following additional assumption ensures local size control of our test.

Assumption A.3.

(i) $\mathbb{E}=\mathbf{R}$ ; (ii) The cdf of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ is strictly increasing and continuous at its $(1-\alpha)$ -th quantile $c_{1-\alpha}$ ; (iii) There exists a strictly increasing function $\tau:\phi_{\theta_{0}}^{\prime\prime}(\mathbb{D}_{0})\to\mathbf{R}$ such that $\tau(0)=0$ and $\tau\circ\phi_{\theta_{0}}^{\prime\prime}:\mathbb{D}_{0}\to\mathbf{R}$ is subadditive.

Assumption A.3(i) formalizes the requirement that $\phi$ be scalar valued. Assumption A.3(ii) requires strict monotonicity of the cdf of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ at $c_{1-\alpha}$ which ensures consistency of the critical value $\hat{c}_{1-\alpha}$ , and continuity which ensures the test controls size at least pointwise in $P$ . Subadditivity of $\tau\circ\phi_{\theta_{0}}^{\prime\prime}$ as required in Assumption A.3(iii) is crucial for establishing local size control of our test. This condition was imposed directly on the first order derivative in Fang_Santos2014HDD. In our setup, $\phi_{\theta_{0}}^{\prime\prime}$ itself often violates subadditivity because it is closely related to quadratic forms. Nonetheless, in all but Example 2.6, $\tau\circ\phi_{\theta_{0}}^{\prime\prime}$ is subadditive for $\tau:\mathbf{R}^{+}\to\mathbf{R}^{+}$ given by $\tau(\nu)=\sqrt{\nu}$ . [Footnote 17: For Example 2.6, it turns out that $\sqrt{\phi_{\theta_{0}}^{\prime\prime}(\cdot)}$ is subadditive when $\gamma_{0}$ is point identified, though the main motivation for us being general there is to accommodate partial identification as well as the Jacobian matrix being degenerate.]
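To see why the square root helps, consider the illustrative case $\phi_{\theta_{0}}^{\prime\prime}(h)=h^{\intercal}Ah$ for a positive definite matrix $A$ (an assumption made only for this sketch). Then $\sqrt{h^{\intercal}Ah}$ is a norm and hence subadditive, whereas the quadratic form itself is not, since $\phi_{\theta_{0}}^{\prime\prime}(2h)=4\phi_{\theta_{0}}^{\prime\prime}(h)$ . The following check uses a hypothetical $2\times 2$ matrix.

```python
import math
import random


def quad_form(a, h):
    """Return h' A h for a square matrix A given as nested lists."""
    return sum(h[i] * a[i][j] * h[j] for i in range(len(h)) for j in range(len(h)))


A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical positive definite matrix
rng = random.Random(0)
sqrt_violations = 0
for _ in range(10000):
    g = [rng.gauss(0.0, 1.0) for _ in range(2)]
    h = [rng.gauss(0.0, 1.0) for _ in range(2)]
    s = [gi + hi for gi, hi in zip(g, h)]
    lhs = math.sqrt(quad_form(A, s))
    rhs = math.sqrt(quad_form(A, g)) + math.sqrt(quad_form(A, h))
    if lhs > rhs + 1e-12:
        sqrt_violations += 1

# The quadratic form itself is not subadditive: q(h + h) = 4 q(h) > 2 q(h).
h0 = [1.0, 0.0]
doubling_gap = quad_form(A, [2.0, 0.0]) - 2.0 * quad_form(A, h0)
print(sqrt_violations, doubling_gap)
```

No violations of subadditivity should be found for the square root, while the positive `doubling_gap` shows the failure for the untransformed quadratic form.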

The following theorem derives the asymptotic limits of the power function $\pi_{n}(P_{\eta/r_{n}})$ .

Theorem A.1.

Let Assumptions 2.1, 2.2, 3.1, 3.2, 3.4, A.1, A.2 and A.3(i)(ii) hold. It then follows that for any differentiable path $\{P_{t}\}$ in $\mathcal{H}$ with score function $h$ , and every $\eta\in\mathbf{R}$ we have

[TABLE]

If in addition Assumption A.3(iii) also holds, then we can conclude that for any $\eta\leq 0$

[TABLE]

The first claim of the theorem establishes a lower bound for the power function under local perturbations to the null which includes in particular local alternatives. In fact, the lower bound is sharp whenever $c_{1-\alpha}$ is a continuity point of the cdf of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+\eta\theta_{0}^{\prime}(h))$ , in which case (A.3) holds with equality. The role of Assumption A.3(iii) can be seen from (A.3) and the inequalities

[TABLE]

where the second equality is due to $\phi_{\theta_{0}}^{\prime\prime}(\theta_{0}^{\prime}(\eta h))=0$ and $\tau(0)=0$ . [Footnote 18: This is because $\phi_{\theta_{0}}^{\prime\prime}(\eta\theta_{0}^{\prime}(h))=\lim_{n\rightarrow\infty}n\{\phi(\theta(P_{n}))-\phi(\theta(P))\}=0$ by Assumption 2.1 and $\{P_{n}\}$ being a local perturbation under the null.]
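For intuition, the limiting rejection probability can be evaluated in closed form in a toy scalar case, assuming $\phi_{\theta_{0}}^{\prime\prime}(h)=h^{2}$ and $\mathbb{G}\sim N(0,1)$ , with $\delta$ standing in for the drift $\eta\theta_{0}^{\prime}(h)$ . In that case $c_{1-\alpha}=1.96^{2}$ at $\alpha=5\%$ , and perturbations under the null have $\phi_{\theta_{0}}^{\prime\prime}(\delta)=0$ , i.e., $\delta=0$ . These choices are illustrative assumptions, not the paper's designs.

```python
import math


def normal_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def limiting_rejection_prob(delta, crit=1.959964):
    """P((G + delta)^2 > crit^2) for G ~ N(0, 1)."""
    return 1.0 - normal_cdf(crit - delta) + normal_cdf(-crit - delta)


size_at_null = limiting_rejection_prob(0.0)  # approximately 0.05
power_at_two = limiting_rejection_prob(2.0)  # roughly one half
print(round(size_at_null, 3), round(power_at_two, 3))
```

With zero drift the rejection probability equals the nominal level, while a nonzero drift raises it, in line with the lower bound on local power.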

To conclude this section, we note that it is possible to develop a testing procedure adaptive to potential first order degeneracy, that is, in settings where $\phi$ is not always first order degenerate under the null. We emphasize that $r_{n}^{2}\phi(\hat{\theta}_{n})$ fails to be a valid statistic since it diverges to infinity at those nondegenerate points, and so does

[TABLE]

because $\theta_{0}$ might not be identified given $\phi(\theta_{0})=0$ . By introducing an appropriate selection rule, we can combine first and second order asymptotics to provide a more general testing procedure; see Remark A.1. Development of adaptiveness not only serves to maintain generality of our theory, but also is necessary when constructing confidence sets for $\phi(\theta_{0})$ ; see Remark A.2.

Remark A.1.

If $\phi_{\theta_{0}}^{\prime}$ is only degenerate at some but not all points under the null, then one may employ the statistic

[TABLE]

where $\kappa_{n}\downarrow 0$ satisfies $\kappa_{n}r_{n}\to\infty$ as $n\to\infty$ . Heuristically, if $\phi_{\theta_{0}}^{\prime}$ is nondegenerate, then $r_{n}\phi(\hat{\theta}_{n})/\kappa_{n}=O_{p}(1)/o_{p}(1)\xrightarrow{p}\infty$ and thus with probability approaching one $T_{n}=r_{n}\phi(\hat{\theta}_{n})$ which has nondegenerate weak limit $\phi_{\theta_{0}}^{\prime}(\mathbb{G})$ . If $\phi_{\theta_{0}}^{\prime}$ is degenerate, then $r_{n}\phi(\hat{\theta}_{n})/\kappa_{n}=r_{n}^{2}\phi(\hat{\theta}_{n})/(\kappa_{n}r_{n})=O_{p}(1)/(\kappa_{n}r_{n})\xrightarrow{p}0$ and therefore with probability approaching one $T_{n}=r_{n}^{2}\phi(\hat{\theta}_{n})$ which has nondegenerate weak limit $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ . Accordingly we may construct the corresponding critical value as

[TABLE]

where for $\alpha\in(0,1)$ and some estimator $\hat{\phi}_{n}^{\prime}$ of $\phi_{\theta_{0}}^{\prime}$ ,

[TABLE]

The indicator functions above serve as a rule for selecting proper statistics based on degeneracy of (a finite sample analogue of) $\phi_{\theta_{0}}^{\prime}$ . ∎
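The selection heuristic in Remark A.1 can be sketched for a scalar $\phi$ as follows. The exact form of the statistic is not reproduced here, so the rule below, which thresholds $|r_{n}\phi(\hat{\theta}_{n})|/\kappa_{n}$ at one, is only a hypothetical stand-in consistent with the heuristic argument; the function name and the threshold are assumptions.

```python
def adaptive_statistic(phi_hat, r_n, kappa_n, threshold=1.0):
    """Pick first or second order scaling from the size of r_n * phi_hat / kappa_n."""
    ratio = abs(r_n * phi_hat) / kappa_n
    if ratio > threshold:
        # Large ratio: behave as if the first order derivative is nondegenerate.
        return "first-order", r_n * phi_hat
    # Small ratio: behave as if the first order derivative is degenerate.
    return "second-order", r_n ** 2 * phi_hat


n = 10_000
r_n = n ** 0.5        # root-n rate
kappa_n = n ** -0.25  # kappa_n -> 0 while kappa_n * r_n -> infinity
print(adaptive_statistic(0.1, r_n, kappa_n))   # large ratio: first order scaling
print(adaptive_statistic(1e-4, r_n, kappa_n))  # small ratio: second order scaling
```

The critical value would be selected by an analogous rule, so that the statistic and its reference distribution use the same order of asymptotics.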

Remark A.2.

Confidence regions for $\nu_{0}\equiv\phi(\theta_{0})\in\mathbb{E}$ can be constructed by test inversion based on the statistic

[TABLE]

where $\psi:\mathbb{D}_{\phi}\to\mathbf{R}$ is given by $\psi(\theta)\equiv\|\phi(\theta)-\nu_{0}\|_{\mathbb{E}}$ . Critical values can be constructed in a similar fashion as in Remark A.1. By the chain rule (Shapiro1990, Proposition 3.6), it is straightforward to see that $\psi^{\prime}_{\theta_{0}}=\|\phi^{\prime}_{\theta_{0}}\|_{\mathbb{E}}$ and so $\phi^{\prime}_{\theta_{0}}=0$ if and only if $\psi^{\prime}_{\theta_{0}}=0$ . Moreover, $\psi^{\prime\prime}_{\theta_{0}}=\|\phi^{\prime\prime}_{\theta_{0}}\|_{\mathbb{E}}$ when $\psi^{\prime}_{\theta_{0}}=0$ . In general, confidence regions thus constructed are less conservative than the plug-in type confidence regions $\phi(\mathcal{C}_{n,\theta})$ with $\mathcal{C}_{n,\theta}$ some level $1-\alpha$ confidence region for $\theta_{0}$ . Pointwise validity of $\mathcal{C}_{n,\theta}$ is straightforward to establish, but the local properties appear to be challenging to develop. ∎

Finally, we present the proofs of Lemma A.1 and Theorem A.1.

Proof of Lemma A.1: By Assumptions 2.2(i)(ii), A.1 and A.2, we have for $P_{n}\equiv P_{1/r_{n}}$ ,

[TABLE]

Combination of Assumptions 2.1(i)(ii), $\phi(\theta(P))=\phi_{\theta_{0}}^{\prime}=0$ , and result (A.7) allows us to invoke the second order Delta method to conclude that

[TABLE]

This completes the proof of the lemma.∎

Proof of Theorem A.1: Under the assumptions in Theorem 3.3 and Assumptions A.3(i)(ii), we can show following the proof of Corollary 3.2 in Fang_Santos2014HDD that $\hat{c}_{1-\alpha}\xrightarrow{p}c_{1-\alpha}$ under $P^{n}$ . By Theorem 12.2.3 and Corollary 12.3.1 in TSH2005, $P_{n}^{n}$ and $P^{n}$ are mutually contiguous. It follows that

[TABLE]

Lemma A.1, Assumption A.3(i)(ii) and result (A.9) allow us to conclude by the portmanteau theorem that

[TABLE]

This establishes the first claim of the theorem.

For the second claim, note that if $\eta\leq 0$ , then

[TABLE]

where we exploited $\phi(\theta(P))=\phi(\theta(P_{n}))=0$ for all $n$ and Assumption 2.1(iii). Hence,

[TABLE]

where the second inequality is due to Lemma A.1, result (A.9) and the portmanteau theorem, the second equality is by $\tau$ being strictly increasing, the third inequality is by $\tau\circ\phi_{\theta_{0}}^{\prime\prime}$ being subadditive, and the third equality is due to result (A.11), $\tau(0)=0$ and $\tau$ being strictly increasing. This proves the second claim of the theorem.∎

Appendix B Proofs of Main Results

By Assumption 2.2(ii), the support $\mathbb{D}_{L}$ of $\mathbb{G}$ satisfies $\mathbb{D}_{L}\subset\mathbb{D}_{0}$ . Since only the differentiability of $\phi$ on $\mathbb{D}_{L}$ is relevant, we may assume without loss of generality that $\mathbb{D}_{0}=\mathbb{D}_{L}$ in what follows. Moreover, by Proposition I.3.3 in Vakhania_Tarieladze_Chobanyan1987probability, the support $\mathbb{D}_{0}$ of $\mathbb{G}$ is closed. It then follows from Theorem 4.1 in Dugundji1951extension and Assumption 2.1(i) that $\phi_{\theta_{0}}^{\prime\prime}$ can be continuously extended from $\mathbb{D}_{0}$ to $\mathbb{D}$ . Throughout the appendix, we thus interpret $\phi_{\theta_{0}}^{\prime\prime}$ as its continuous extension whenever it takes arguments $h\in\mathbb{D}\backslash\mathbb{D}_{0}$ with $\mathbb{D}_{0}$ being the support of $\mathbb{G}$ .

Proof of Theorem 2.1: The second claim follows from the first by the Slutsky theorem and the continuous mapping theorem, in view of Assumption 2.2(i)(ii) and continuity of $\phi_{\theta_{0}}^{\prime\prime}$ on $\mathbb{D}$ (interpreted as some continuous extension). Nonetheless, for pedagogical purposes, we go backwards and start by proving the second claim first. For each $n\in\mathbf{N}$ , let $\mathbb{D}_{n}\equiv\{h\in\mathbb{D}:\theta_{0}+h/r_{n}\in\mathbb{D}_{\phi}\}$ and define $g_{n}:\mathbb{D}_{n}\to\mathbb{E}$ by

[TABLE]

By Assumption 2.1(ii), $\|g_{n}(h_{n})-\phi_{\theta_{0}}^{\prime\prime}(h)\|_{\mathbb{E}}\to 0$ whenever $h_{n}\to h\in\mathbb{D}_{0}$ . Moreover, $\mathbb{G}\in\mathbb{D}_{0}$ (almost surely) is separable since it is tight by Assumption 2.2(ii). The second claim then follows by Theorem 1.11.1(i) in Vaart1996.

As for the first claim, define $f_{n}:\mathbb{D}_{n}\times\mathbb{D}\to\mathbb{E}\times\mathbb{E}$ by

[TABLE]

Assumption 2.1(ii) then allows us to conclude again by Theorem 1.11.1(i) in Vaart1996 that

[TABLE]

By the continuous mapping theorem applied to result (B.1), we have

[TABLE]

The first claim then follows from result (B.2) and Lemma 1.10.2(iii) in Vaart1996. ∎

Proof of Theorem 3.1: Inspecting the structure of the problem, we see that the bootstrap consistency (26) is equivalent to $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)\overset{d}{=}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ for all $h\in\mathbb{D}_{0}$ by exactly the same arguments as the proof of Theorem A.1 in Fang_Santos2014HDD. Thus, it boils down to showing that $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)\overset{d}{=}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ for all $h\in\mathbb{D}_{0}$ if and only if $\phi_{\theta_{0}}^{\prime\prime}(h)=0$ for all $h\in\mathbb{D}_{0}$ . One direction is immediate since if the latter holds, then both $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)$ and $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ are degenerate at $0$ for all $h\in\mathbb{D}_{0}$ , and hence are equal in distribution. The converse consists of two steps.
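A scalar illustration, which is an assumption of this sketch and not part of the proof, shows how the distributional equality fails when $\phi_{\theta_{0}}^{\prime\prime}\neq 0$ . Take $\phi_{\theta_{0}}^{\prime\prime}(h)=h^{2}$ and $\mathbb{G}\sim N(0,1)$ : then $(\mathbb{G}+h)^{2}-h^{2}=\mathbb{G}^{2}+2h\mathbb{G}$ is negative with positive probability whenever $h\neq 0$ , whereas $\mathbb{G}^{2}$ never is.

```python
import random


def fraction_negative(h, draws=20000, seed=0):
    """Share of draws of (G + h)^2 - h^2 that are negative, with G ~ N(0, 1)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(draws):
        g = rng.gauss(0.0, 1.0)
        if (g + h) ** 2 - h ** 2 < 0.0:
            count += 1
    return count / draws


print(fraction_negative(0.0))  # h = 0 reproduces G^2, which is never negative
print(fraction_negative(1.0))  # h = 1 puts sizable mass below zero
```

For $h=1$ the negative region is $-2<\mathbb{G}<0$ , which has probability close to one half, so the two laws cannot coincide.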

To begin with, note that by Assumption 2.2(ii), $\mathbb{G}$ being centered Gaussian and Lemma A.7 in Fang_Santos2014HDD, we may assume without loss of generality that the support of $\mathbb{G}$ is $\mathbb{D}$ and that $\mathbb{D}$ is separable. Let $P$ be the law of $\mathbb{G}$ on $\mathbb{D}$ . Since $\mathbb{D}$ is separable, it follows that the Borel $\sigma$ -algebra, the $\sigma$ -algebra generated by the weak topology, and the cylindrical $\sigma$ -algebra coincide by Theorem 2.1 in Vakhania_Tarieladze_Chobanyan1987probability. Furthermore, by Theorem 7.1.7 in Bogachev2007, $P$ is Radon with respect to the Borel $\sigma$ -algebra, and hence also with respect to the cylindrical $\sigma$ -algebra.

Step 1: Show that $\phi_{\theta_{0}}^{\prime\prime}$ corresponds to a bilinear map if $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)\overset{d}{=}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ for all $h\in\mathbb{D}$ .

For completeness, we introduce additional notation following Section 3.7 in Davydov1998local. First, let $\mathbb{D}^{*}$ denote the dual space of $\mathbb{D}$ , and $\langle x,x^{*}\rangle_{\mathbb{D}}=x^{*}(x)$ for any $x\in\mathbb{D}$ and $x^{*}\in\mathbb{D}^{*}$ . Similarly denote the dual space of $\mathbb{E}$ by $\mathbb{E}^{*}$ and the corresponding bilinear form by $\langle\cdot,\cdot\rangle_{\mathbb{E}}$ . Since $\mathbb{G}$ is Gaussian, $\mathbb{D}^{*}\subset L^{2}(P)$ (Bogavcev1998gaussian, p.42). We may thus embed $\mathbb{D}^{*}$ into $L^{2}(P)$ . Denote by $\mathbb{D}_{P}^{\prime}$ the closure of $\mathbb{D}^{*}$ , viewed as a subset of $L^{2}(P)$ . By some abuse of notation write $x^{\prime}(x)=\langle x^{\prime},x\rangle_{\mathbb{D}}$ for any $x^{\prime}\in\mathbb{D}^{\prime}_{P}$ and $x\in\mathbb{D}$ . Finally, for each $h\in\mathbb{D}$ we let $P^{h}$ denote the law of $\mathbb{G}+h$ , write $P^{h}\ll P$ whenever $P^{h}$ is absolutely continuous with respect to $P$ , and define the set:

[TABLE]

Since $P$ is Radon with respect to the cylindrical $\sigma$ -algebra of $\mathbb{D}$ , it follows by Theorem 7.1 in Davydov1998local that there exists a continuous linear map $I:\mathbb{H}_{P}\rightarrow\mathbb{D}^{\prime}_{P}$ satisfying for every $h\in\mathbb{H}_{P}$ :

[TABLE]

Fix an arbitrary $e^{*}\in\mathbb{E}^{*}$ and $h\in\mathbb{H}_{P}$ . Since $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+h)-\phi_{\theta_{0}}^{\prime\prime}(h)\overset{d}{=}\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ for all $h\in\mathrm{Supp}(\mathbb{G})$ , it follows that $\langle e^{*},\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}+rh)-\phi_{\theta_{0}}^{\prime\prime}(rh)\rangle_{\mathbb{E}}$ and $\langle e^{*},\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\rangle_{\mathbb{E}}$ must be equal in distribution for all $r\in\mathbf{R}$ . [Footnote 19: The proof of Lemma A.3 in Fang_Santos2014HDD never exploits that $\phi_{\theta_{0}}^{\prime}$ is a first order derivative beyond continuity of $\phi_{\theta_{0}}^{\prime}$ and $\phi_{\theta_{0}}^{\prime}(0)=0$ which are satisfied by $\phi_{\theta_{0}}^{\prime\prime}$ .] In particular, their characteristic functions must equal each other, and hence for all $r\geq 0$ and $t\in\mathbf{R}$ :

[TABLE]

where in the second equality we have exploited $\phi_{\theta_{0}}^{\prime\prime}$ being positively homogeneous of degree two. Setting $C(t)\equiv E[\exp\{it\langle e^{*},\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\rangle_{\mathbb{E}}\}]$ , we have by (B.4) that

[TABLE]

for all $r\geq 0$ and $t\in\mathbf{R}$ .

We next aim to equate second order right derivatives of both sides in the identity (B.5). The second order right derivative of the left hand side at $r=0$ is given by

[TABLE]

On the other hand, exploiting result (B.3), linearity of $I:\mathbb{H}_{P}\rightarrow\mathbb{D}^{\prime}_{P}$ and that $h\in\mathbb{H}_{P}$ implies $rh\in\mathbb{H}_{P}$ for all $r\in\mathbf{R}$ and in particular for all $r\in[0,1]$ , we may rewrite the right hand side of (B.5) as

[TABLE]

The integrand on the right hand side of (B) is differentiable with respect to $r$ for all $r\in[0,1]$ and the resulting derivative is dominated by $\exp\{|\langle x,Ih\rangle_{\mathbb{D}}|\}\times\{|\langle x,Ih\rangle_{\mathbb{D}}|+\sigma^{2}(h)\}$ which is integrable against $P$ since $\langle\mathbb{G},Ih\rangle_{\mathbb{D}}\sim N(0,\sigma^{2}(h))$ by Proposition 2.10.3 in Bogavcev1998gaussian and $Ih\in\mathbb{D}^{\prime}_{P}$ . Thus by Theorem 2.27(ii) in Folland1999, the first order derivative of the right hand side in (B) at $r\in[0,1]$ exists and is given by

[TABLE]

In turn, result (B.8) allows us to conclude that the second order right derivative of the right hand side in (B) at $r=0$ exists and is given by

[TABLE]

Since equation (B.5) holds for all $r\geq 0$ and $t\in\mathbf{R}$ , it follows from results (B.6) and (B.9) that for all $t\in\mathbf{R}$ :

[TABLE]

Note that $t\mapsto C(t)$ is the characteristic function of $\langle e^{*},\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\rangle_{\mathbb{E}}$ and hence it is continuous. Thus, since $C(0)=1$ there exists a $t_{0}>0$ such that $C(t_{0})t_{0}\neq 0$ . For such $t_{0}$ it follows from (B.10) that

[TABLE]

Define a map $\Phi_{\theta_{0}}^{\prime\prime}:\mathbb{D}\times\mathbb{D}\to\mathbb{E}$ by

[TABLE]

It then follows from (B.11) that, for any $e^{*}\in\mathbb{E}^{*}$ and any $g,h\in\mathbb{D}$ ,

[TABLE]

where $\sigma(g,h)\equiv E[\langle\mathbb{G},Ig\rangle\langle\mathbb{G},Ih\rangle]$ . Since $I:\mathbb{H}_{P}\to\mathbb{D}_{P}^{\prime}$ is linear, $(h,g)\mapsto\langle e^{*},\Phi_{\theta_{0}}^{\prime\prime}(g,h)\rangle_{\mathbb{E}}$ is bilinear on $\mathbb{H}_{P}\times\mathbb{H}_{P}$ . Moreover, $(h,g)\mapsto\langle e^{*},\Phi_{\theta_{0}}^{\prime\prime}(g,h)\rangle_{\mathbb{E}}$ is continuous on $\mathbb{H}_{P}\times\mathbb{H}_{P}$ due to continuity of $\phi_{\theta_{0}}^{\prime\prime}$ (and hence $\Phi_{\theta_{0}}^{\prime\prime}$ ) and $e^{*}\in\mathbb{E}^{*}$ . We thus conclude from $\mathbb{H}_{P}$ being a dense subspace of $\mathbb{D}$ by Proposition 7.4(ii) in Davydov1998local that $(h,g)\mapsto\langle e^{*},\Phi_{\theta_{0}}^{\prime\prime}(g,h)\rangle_{\mathbb{E}}$ is continuous and bilinear on $\mathbb{D}\times\mathbb{D}$ . Since $e^{*}\in\mathbb{E}^{*}$ is arbitrary, it follows from Lemma A.2 in Vaart1991differentibility that $\Phi_{\theta_{0}}^{\prime\prime}:\mathbb{D}\times\mathbb{D}\to\mathbb{E}$ is bilinear and continuous. By identity (B.12), we have $\phi_{\theta_{0}}^{\prime\prime}(h)=\Phi_{\theta_{0}}^{\prime\prime}(h,h)$ for all $h\in\mathbb{D}$ . Hence, $\phi_{\theta_{0}}^{\prime\prime}$ is a quadratic form corresponding to the bilinear map $\Phi_{\theta_{0}}^{\prime\prime}$ .
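The passage from a quadratic form to its bilinear map can be checked numerically. Since the display defining $\Phi_{\theta_{0}}^{\prime\prime}$ is not reproduced here, the sketch assumes one standard polarization formula, $\Phi(g,h)=\frac{1}{4}\{q(g+h)-q(g-h)\}$ , applied to a hypothetical quadratic form $q(h)=h^{\intercal}Ah$ with symmetric $A$ ; the definition used in the paper may be written differently.

```python
def quad(a, h):
    """Return h' A h for a square matrix A given as nested lists."""
    return sum(h[i] * a[i][j] * h[j] for i in range(len(h)) for j in range(len(h)))


def polarize(q, g, h):
    """Recover the symmetric bilinear map from the quadratic form q by polarization."""
    plus = [gi + hi for gi, hi in zip(g, h)]
    minus = [gi - hi for gi, hi in zip(g, h)]
    return 0.25 * (q(plus) - q(minus))


A = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical symmetric matrix


def q(h):
    return quad(A, h)


g, h = [1.0, 2.0], [3.0, -1.0]
bilinear = polarize(q, g, h)  # equals g' A h = 6.5
diagonal = polarize(q, h, h)  # equals q(h) = 16.0
print(bilinear, diagonal)
```

Evaluating the polarized map on the diagonal returns the quadratic form itself, which mirrors the identity $\phi_{\theta_{0}}^{\prime\prime}(h)=\Phi_{\theta_{0}}^{\prime\prime}(h,h)$ used at the end of Step 1.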

Step 2: Conclude that $\phi_{\theta_{0}}^{\prime\prime}=0$ on the support of $\mathbb{G}$ . Note that if $\phi$ is second order Hadamard differentiable, then one can directly start with Step 2.

By Lemma A.3 in Fang_Santos2014HDD, for all $h\in\mathbb{D}$ ,

[TABLE]

where the third equality exploited bilinearity of $\Phi_{\theta_{0}}^{\prime\prime}$ . Fix an arbitrary $e^{*}\in\mathbb{E}^{*}$ . By result (B), we have for all $r\in\mathbf{R}$ and $h\in\mathbb{D}$ ,

[TABLE]

where the last step used linearity of $\Phi_{\theta_{0}}^{\prime\prime}$ in its second argument. We now equate second derivatives of both sides at $r=0$ . The second derivative of the left hand side is trivially zero, while that of the right hand side, by the recursive use of dominated convergence arguments, is given by $E[\exp\{it\langle e^{*},\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})\rangle_{\mathbb{E}}\}\{2it\langle e^{*},\Phi_{\theta_{0}}^{\prime\prime}(\mathbb{G},h)\rangle_{\mathbb{E}}\}^{2}]$ . Thus we have

[TABLE]

for all $t\in\mathbf{R}$ , which in turn implies that for all $t\in\mathbf{R}\setminus\{0\}$ ,

[TABLE]

Picking a sequence $t_{n}\downarrow 0$ , replacing $t$ with $t_{n}$ in (B.16) and letting $n\to\infty$ leads to, by the dominated convergence theorem: for all $e^{*}\in\mathbb{E}^{*}$ and all $h\in\mathbb{D}$ ,

[TABLE]

Consequently, $\langle e^{*},\Phi_{\theta_{0}}^{\prime\prime}(g,h)\rangle_{\mathbb{E}}=0$ for all $h\in\mathbb{D}$ and $P$ -almost every $g\in\mathbb{D}$ . Since $e^{*}$ is arbitrary, we conclude by Lemma 6.10 in AliprantisandBorder2006 that $\Phi_{\theta_{0}}^{\prime\prime}(g,h)=0$ for all $h\in\mathbb{D}$ and $P$ -almost every $g\in\mathbb{D}$ . Hence, $\phi_{\theta_{0}}^{\prime\prime}(h)=0$ for $P$ -almost every $h\in\mathbb{D}$ .

Finally, denote by $\Omega$ the collection of all $h\in\mathbb{D}$ such that $\phi_{\theta_{0}}^{\prime\prime}(h)=0$ . Then we have $P(\Omega)=1$ by Assumption 2.2(ii) and the above discussion. We claim that $\Omega$ is dense in $\mathbb{D}$ . To see this, suppose otherwise; then there must exist some $h_{0}\in\mathbb{D}$ and some $\delta>0$ such that $B(h_{0},\delta)\cap\Omega=\emptyset$ . Note that i) $P(B(h_{0},\delta))>0$ since $h_{0}\in\mathrm{Supp}(P)=\mathbb{D}$ , and ii) $\phi_{\theta_{0}}^{\prime\prime}(h)\neq 0$ for all $h\in B(h_{0},\delta)$ by the definition of $\Omega$ . These contradict the fact that $P(\Omega)=1$ . Since $\phi_{\theta_{0}}^{\prime\prime}$ is continuous on $\mathbb{D}$ , we may conclude from $\Omega$ being dense in $\mathrm{Supp}(P)$ and $\phi_{\theta_{0}}^{\prime\prime}=0$ on $\Omega$ that $\phi_{\theta_{0}}^{\prime\prime}=0$ on $\mathbb{D}$ . ∎

Proof of Theorem 3.2: Let $\mathbb{D}_{n}\equiv\{h\in\mathbb{D}:\theta_{0}+h/r_{n}\in\mathbb{D}_{\phi}\}$ and define for each $n\in\mathbf{N}$ the map $\Psi_{n}:\mathbb{D}_{n}\times\mathbb{D}_{n}\to\mathbb{E}$ by

[TABLE]

If $\{g_{n},h_{n}\}_{n=1}^{\infty}\subset\mathbb{D}_{n}$ satisfies $(g_{n},h_{n})\to(g,h)\in\mathbb{D}_{0}\times\mathbb{D}_{0}$ as $n\to\infty$ , then Assumption 3.3 allows us to conclude that

[TABLE]

Since $\phi_{\theta_{0}}^{\prime\prime}$ admits a continuous extension on $\mathbb{D}$ , by the corresponding extension of $\Phi_{\theta_{0}}^{\prime\prime}$ according to identity (B.12), it follows from (B) that

[TABLE]

Next, let $\mathbb{G}_{n}\equiv r_{n}\{\hat{\theta}_{n}-\theta_{0}\}$ , $\mathbb{G}_{n}^{*}\equiv r_{n}\{\hat{\theta}_{n}^{*}-\hat{\theta}_{n}\}$ and $\mathbb{G}_{n}^{\dagger}\equiv r_{n}\{\hat{\theta}_{n}^{*}-\theta_{0}\}=\mathbb{G}_{n}^{*}+\mathbb{G}_{n}$ . By Assumptions 2.1, 2.2, 3.1 and 3.2(i), it follows from Lemma A.2 in Fang_Santos2014HDD that for $\mathbb{G}_{1},\mathbb{G}_{2}$ independently distributed according to $\mathbb{G}$ ,

[TABLE]

By the continuous mapping theorem and result (B.20) we have

[TABLE]

Combining the separability of $\mathbb{G}_{1}$ and $\mathbb{G}_{2}$ by Assumption 2.2(ii), results (B.19) and (B.21), we conclude by Theorem 1.11.1(i) in Vaart1996 that

[TABLE]

By Lemma 1.10.2 in Vaart1996 we have from (B.22) that

[TABLE]

Now fix $\epsilon>0$ . Note that

[TABLE]

By Lemma 1.2.6 in Vaart1996,

[TABLE]

Results (B.23), (B) and (B.25), together with $\epsilon$ being arbitrary, then yield

[TABLE]

Result (B.21) and Assumption 2.2(ii) imply that $(\mathbb{G}_{n},\mathbb{G}_{n}^{\dagger})$ is asymptotically measurable and asymptotically tight. In turn, Lemmas 1.4.3 and 1.4.4 in Vaart1996 imply that $(\mathbb{G}_{n},\mathbb{G}_{n}^{\dagger},\mathbb{G}_{1},\mathbb{G}_{1}+\mathbb{G}_{2})$ is asymptotically tight and asymptotically measurable. Fix an arbitrary subsequence $\{n_{k}\}$ . Then Theorem 1.3.9 in Vaart1996 implies that $(\mathbb{G}_{n},\mathbb{G}_{n}^{\dagger},\mathbb{G}_{1},\mathbb{G}_{1}+\mathbb{G}_{2})$ converges weakly along a further subsequence of $\{n_{k}\}$ to a tight Borel law in $\prod_{j=1}^{4}\mathbb{D}$ , which is equal to $(\mathbb{G}_{1},\mathbb{G}_{1}+\mathbb{G}_{2},\mathbb{G}_{1},\mathbb{G}_{1}+\mathbb{G}_{2})$ by marginal convergence. This is a weak limit where the dependence structure between the first two components and the last two components is known and in fact unique. Since $\{n_{k}\}$ is arbitrary, it follows that

[TABLE]

Since $\Psi:\mathbb{D}\times\mathbb{D}\to\mathbb{E}$ and hence $(\Psi,\Psi):\prod_{j=1}^{4}\mathbb{D}\to\prod_{j=1}^{2}\mathbb{E}$ is continuous, it follows from result (B.27) and the continuous mapping theorem that

[TABLE]

Combination of the continuous mapping theorem and Lemma 1.10.2(iii) in Vaart1996 yields that

[TABLE]

By the triangle inequality, we have

[TABLE]

By Lemma 1.2.6 in Vaart1996 and result (B.29)

[TABLE]

Combination of (B.26), (B), (B) and the triangle inequality leads to

[TABLE]

The theorem follows by combining (B.26) and (B.32) and noticing that

[TABLE]

where the second equality is due to bilinearity of $\Phi_{\theta_{0}}^{\prime\prime}$ .∎
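The role of bilinearity in the last step can be checked numerically in the simplest scalar case. The sketch below assumes $\phi(\theta)=\theta^{2}$ with $\theta_{0}=0$ (an illustrative choice, so that $\phi_{\theta_{0}}^{\prime\prime}(h)=h^{2}$ and $\Phi_{\theta_{0}}^{\prime\prime}(g,h)=gh$ ), uses the nonparametric bootstrap of a sample mean with $r_{n}=\sqrt{n}$ , and verifies that $r_{n}^{2}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}$ equals $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}_{n}^{*})+2\Phi_{\theta_{0}}^{\prime\prime}(\mathbb{G}_{n},\mathbb{G}_{n}^{*})$ . The function and variable names are hypothetical.

```python
import numpy as np

def phi(theta):
    # Illustrative functional with phi'(0) = 0 and phi''(h) = h**2 (assumption).
    return theta ** 2

def bootstrap_decomposition(x, seed=0):
    """Return both sides of the scalar identity for one bootstrap draw."""
    rng = np.random.default_rng(seed)
    n = x.size
    r_n = np.sqrt(n)
    theta_hat = x.mean()
    theta_star = rng.choice(x, size=n, replace=True).mean()
    G_n = r_n * (theta_hat - 0.0)            # r_n (theta_hat - theta_0), theta_0 = 0
    G_star = r_n * (theta_star - theta_hat)  # r_n (theta_star - theta_hat)
    lhs = r_n ** 2 * (phi(theta_star) - phi(theta_hat))
    rhs = G_star ** 2 + 2.0 * G_n * G_star   # phi''(G*) + 2 Phi''(G, G*)
    return lhs, rhs

x = np.random.default_rng(1).standard_normal(500)
lhs, rhs = bootstrap_decomposition(x)
print(lhs, rhs)
```

The cross term $2\mathbb{G}_{n}\mathbb{G}_{n}^{*}$ does not vanish conditionally on the data, which is the scalar analogue of why the bootstrap law differs from that of $\phi_{\theta_{0}}^{\prime\prime}(\mathbb{G})$ unless the second order derivative is zero.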

Proof of Theorem 3.3: Inspecting the proof of Theorem 3.2 in Fang_Santos2014HDD, we see that $\phi_{\theta_{0}}^{\prime}$ being a first order derivative is actually never exploited there. The conclusion of the theorem then follows in view of Lemma B.2 when combined with exactly the same arguments in Fang_Santos2014HDD. ∎

Proof of Proposition 3.1: Let $\{h_{n}\}\subset\mathbb{D}$ and $h\in\mathbb{D}_{0}$ such that $h_{n}\to h$ . Since $\phi_{\theta_{0}}^{\prime}=0$ by Assumption 2.1(iii), we may rewrite $\hat{\phi}_{n}^{\prime\prime}(h_{n})$ :

[TABLE]

where $g_{n}\equiv(t_{n}r_{n})^{-1}r_{n}\{\hat{\theta}_{n}-\theta_{0}\}+h_{n}$ . By Assumptions 2.2(i), 3.5, Lemma 1.10.2 in Vaart1996 and $h_{n}\to h$ , we have $g_{n}\xrightarrow{p}h$ . By Assumptions 2.1(ii), 2.2(ii) and Theorem 1.11.1(ii) in Vaart1996, we thus have

[TABLE]

By Assumptions 2.1 and 2.2, it follows from Theorem 2.1 and $r_{n}t_{n}\to\infty$ that

[TABLE]

Combining results (B), (B.34) and (B.35) we thus arrive at the desired conclusion. ∎
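The rewriting in terms of $g_{n}$ is consistent with a numerical second order derivative of the form $\hat{\phi}_{n}^{\prime\prime}(h)=\{\phi(\hat{\theta}_{n}+t_{n}h)-\phi(\hat{\theta}_{n})\}/t_{n}^{2}$ . Since the defining display is not reproduced here, that form is an assumption of the following minimal sketch, as are the choice $\phi(\theta)=\max\{\theta,0\}^{2}$ with $\theta_{0}=0$ , the rate $r_{n}=\sqrt{n}$ and the step size $t_{n}=n^{-1/4}$ (so that $t_{n}\downarrow 0$ and $r_{n}t_{n}\to\infty$ ).

```python
def phi(theta):
    # Illustrative functional (assumption): phi''_0(h) = max(h, 0)**2 at theta_0 = 0.
    return max(theta, 0.0) ** 2

def numerical_second_derivative(phi, theta_hat, h, t_n):
    """Assumed estimator: {phi(theta_hat + t_n h) - phi(theta_hat)} / t_n^2."""
    return (phi(theta_hat + t_n * h) - phi(theta_hat)) / t_n ** 2

n = 10 ** 6
r_n = n ** 0.5
t_n = n ** (-0.25)       # t_n -> 0 while r_n * t_n -> infinity
theta_hat = 1.0 / r_n    # stand-in for an estimate within O(1/r_n) of theta_0 = 0

est_pos = numerical_second_derivative(phi, theta_hat, 1.0, t_n)   # target max(1, 0)**2 = 1
est_neg = numerical_second_derivative(phi, theta_hat, -1.0, t_n)  # target max(-1, 0)**2 = 0
print(est_pos, est_neg)
```

The estimation error is of order $(r_{n}t_{n})^{-1}$ in this example, which is why the step size must shrink more slowly than $r_{n}^{-1}$ .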

Lemma B.1.

Suppose that Assumptions 2.2(i)(ii) and 3.1(ii) hold, and that $\phi:\mathbb{D}_{\phi}\subset\mathbb{D}\to\mathbb{E}\equiv\mathbf{R}$ is Hadamard differentiable at $\theta_{0}\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ with $\phi_{\theta_{0}}^{\prime}$ satisfying Assumption 2.1(iii). Then $\hat{c}_{1-\alpha}\xrightarrow{p}0$ , where for $\alpha\in(0,1)$ ,

[TABLE]

Proof: This lemma is somewhat similar to Lemma 5 in AndrewsandGuggen2010ET and we include the proof here only for completeness. Fix $\alpha\in(0,1)$ and let $c_{1-\alpha}\equiv\inf\{c\in\mathbf{R}:P(\phi_{\theta_{0}}^{\prime}(\mathbb{G})\leq c)\geq 1-\alpha\}$ . Note that $c_{1-\alpha}=0$ for all $\alpha\in(0,1)$ . Since $\phi$ is Hadamard differentiable at $\theta_{0}\in\mathbb{D}_{\phi}$ tangentially to $\mathbb{D}_{0}$ , it follows by Theorem 3.9.15 in Vaart1996 that

[TABLE]

This, together with Lemma 10.11 in Kosorok2008, gives us: for all $t\in\mathbf{R}\setminus\{0\}$ ,

[TABLE]

Fix $\epsilon>0$ . Clearly, $c_{1-\alpha}\pm\epsilon\in\mathbf{R}\setminus\{0\}$ for all $\epsilon>0$ and all $\alpha\in(0,1)$ . Hence, by (B.37),

[TABLE]

By definition of $\hat{c}_{1-\alpha}$ , it follows from (B.38) that

[TABLE]

Since $\epsilon$ is arbitrary, the conclusion of the lemma then follows from result (B.39).∎
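A small simulation can illustrate the collapse of the first order bootstrap critical value. The sketch below assumes that $\hat{c}_{1-\alpha}$ is the conditional $(1-\alpha)$ quantile of $r_{n}\{\phi(\hat{\theta}_{n}^{*})-\phi(\hat{\theta}_{n})\}$ (our reading of the elided display), takes $\phi(\theta)=\theta^{2}$ , $\theta_{0}=0$ and $r_{n}=\sqrt{n}$ as illustrative choices, and approximates the conditional law with a finite number of nonparametric bootstrap draws. All names are hypothetical.

```python
import numpy as np

def phi(theta):
    # Illustrative functional with a null first order derivative at 0 (assumption).
    return theta ** 2

def bootstrap_critical_value(x, alpha=0.05, n_boot=300, seed=0):
    """Approximate (1 - alpha) quantile of r_n {phi(theta*) - phi(theta_hat)} given x."""
    rng = np.random.default_rng(seed)
    n = x.size
    r_n = np.sqrt(n)
    theta_hat = x.mean()
    idx = rng.integers(0, n, size=(n_boot, n))   # resampling indices, one row per draw
    theta_star = x[idx].mean(axis=1)
    stats = r_n * (phi(theta_star) - phi(theta_hat))
    return float(np.quantile(stats, 1.0 - alpha))

x = np.random.default_rng(42).standard_normal(10_000)
c_hat = bootstrap_critical_value(x)
print(c_hat)
```

In this design the bootstrap statistic equals $\{(\mathbb{G}_{n}^{*})^{2}+2\mathbb{G}_{n}\mathbb{G}_{n}^{*}\}/r_{n}$ , so its quantiles shrink at rate $r_{n}^{-1}$ .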

Lemma B.2.

Let Assumption 2.1 hold, and $\hat{\phi}_{n}^{\prime\prime}:\mathbb{D}\rightarrow\mathbb{E}$ be an estimator depending on $\{X_{i}\}_{i=1}^{n}$ . Then the following are equivalent:

(i)

For every compact set $K\subset\mathbb{D}_{0}$ and every $\epsilon>0$ ,

[TABLE]

(ii)

For every compact set $K\subset\mathbb{D}_{0}$ , every $\delta_{n}\downarrow 0$ and every $\epsilon>0$ ,

[TABLE]

(iii)

For every sequence $\{h_{n}\}\subset\mathbb{D}$ and every $h\in\mathbb{D}_{0}$ such that $h_{n}\to h$ as $n\to\infty$ ,

[TABLE]

Proof: The equivalence between (i) and (ii) is intuitive and straightforward to establish. Suppose that (i) holds. Fix a compact set $K\subset\mathbb{D}_{0}$ , a sequence $\{\delta_{n}\}$ with $\delta_{n}\downarrow 0$ , and $\epsilon,\eta>0$ . We want to show that there exists some $N_{0}>0$ such that for all $n\geq N_{0}$ ,

[TABLE]

But from (i) we know that there is some $\delta_{0}>0$ such that

[TABLE]

which in turn implies that there is some $N_{1}$ satisfying for all $n\geq N_{1}$

[TABLE]

Since $\delta_{n}\downarrow 0$ , there exists some $N_{2}$ such that $\delta_{n}\leq\delta_{0}$ for all $n\geq N_{2}$ and hence

[TABLE]

Setting $N_{0}\equiv\max\{N_{1},N_{2}\}$ , we see that (B.43) follows from (B.45) and (B.46).

Conversely, suppose that (ii) holds, fix a compact set $K\subset\mathbb{D}_{0}$ and $\epsilon>0$ , and we aim to establish (i) or equivalently, there exists some $\delta_{0}>0$ such that (B.45) holds. Pick a sequence $\delta_{n}\downarrow 0$ . Then there exists some $N_{0}$ such that (B.43) holds with “ $\leq$ ” replaced by “ $<$ ”. Setting $\delta_{0}\equiv\delta_{N_{0}}$ , we may then conclude (B.45) from (B.43).

Now suppose (ii) (and hence (i)) holds again and let $\{h_{n}\}\subset\mathbb{D}$ such that $h_{n}\to h\in\mathbb{D}_{0}$ . Fix $\delta>0$ . There must be some $N_{1}$ such that $\|h_{n}-h\|_{\mathbb{D}}<\delta$ for all $n\geq N_{1}$ . By the triangle inequality we have: for all $n\geq N_{1}$ ,

[TABLE]

Part (iii) then follows from (B) and part (i).

Finally, suppose that (iii) holds. Fix a compact set $K\subset\mathbb{D}_{0}$ and $\epsilon>0$ . Let $\delta_{n}\downarrow 0$ . Note that if $\sup_{h\in K^{\delta_{n}}}\|\hat{\phi}_{n}^{\prime\prime}(h)-\phi_{\theta_{0}}^{\prime\prime}(h)\|_{\mathbb{E}}>\epsilon$ , then there must exist some $h_{n}\in K^{\delta_{n}}$ such that $\|\hat{\phi}_{n}^{\prime\prime}(h_{n})-\phi_{\theta_{0}}^{\prime\prime}(h_{n})\|_{\mathbb{E}}>\epsilon$ and this is true for all $n\in\mathbf{N}$ . It follows that

[TABLE]

Note that $h_{n}\in K^{\delta_{n}}$ is possibly random and satisfies $d(h_{n},K)\equiv\inf_{a\in K}\|h_{n}-a\|_{\mathbb{D}}\leq\delta_{n}\to 0$ as $n\to\infty$ . Fix an arbitrary subsequence $\{n_{k}\}$ . Since $K$ is compact, it follows by Lemma A.6 in Fang2014Plugin that there exists a further subsequence $\{n_{k_{j}}\}$ and some deterministic $h\in K$ such that $h_{n_{k_{j}}}\xrightarrow{p}h$ as $j\to\infty$ . By the triangle inequality,

[TABLE]

Since $h_{n_{k_{j}}}\xrightarrow{p}h$ as $j\to\infty$ , the first term on the right hand side above tends to zero along $\{n_{k_{j}}\}$ by (iii) and Lemma B.3, while the second term tends to zero along $\{n_{k_{j}}\}$ by Theorem 1.9.5 in Vaart1996. Since $\{n_{k}\}$ is arbitrary, combination of results (B.48) and (B) then leads to (ii). ∎

Lemma B.3 (Extended Continuous Mapping Theorem).

Let $\mathbb{D}$ and $\mathbb{E}$ be metric spaces equipped with metrics $d$ and $\rho$ respectively, $g_{n}:\mathbb{D}_{n}\subset\mathbb{D}\to\mathbb{E}$ a possibly random map for each $n\in\mathbf{N}$ , and $g:\mathbb{D}_{0}\subset\mathbb{D}\to\mathbb{E}$ a nonrandom map. Suppose that $g_{n}(x_{n})\xrightarrow{p}g(x)$ whenever $x_{n}\to x$ for $x_{n}\in\mathbb{D}_{n}$ and $x\in\mathbb{D}_{0}$ . If $X_{n}\xrightarrow{p}X$ such that $X$ is Borel measurable, separable and satisfies $P(X\in\mathbb{D}_{0})=1$ , then $g_{n}(X_{n})\xrightarrow{p}g(X)$ .

Proof: We closely follow the proof of Proposition A.8.6 in BKRW993Efficient (see also Vaart_Wellner1990prohorov). Fix $\epsilon>0$ throughout. First, we show that $g:\mathbb{D}_{0}\to\mathbb{E}$ is continuous. By assumption, for each $x\in\mathbb{D}_{0}$ we have

[TABLE]

where $\text{Osc}_{g_{n}}(B(x,\delta))\equiv\sup_{y,z\in B(x,\delta)}\rho(g_{n}(y),g_{n}(z))$ for $B(x,\delta)\equiv\{y\in\mathbb{D}_{n}:d(y,x)<\delta\}$ . This can be easily seen by the triangle inequality:

[TABLE]

Notice that again by assumption, the triangle inequality and result (B.50) we have

[TABLE]

as $n\to\infty$ followed by $d(x,y)\to 0$ . Since $g$ is a nonrandom function, we must have $\rho(g(y),g(x))\to 0$ as $d(y,x)\to 0$ and hence $g$ is continuous on $\mathbb{D}_{0}$ .

Next, for $x\in\mathbb{D}_{0}$ define

[TABLE]

This is well defined by a simple reductio ad absurdum argument as in BKRW993Efficient. We now show that $k(\cdot,\epsilon):\mathbb{D}_{0}\to\mathbf{N}$ is measurable. This is done by proving that $k(\cdot,\epsilon)$ is lower semicontinuous, i.e., $x_{m}\to x$ for $\{x,x_{m}\}\subset\mathbb{D}_{0}$ implies

[TABLE]

Fix $x\in\mathbb{D}_{0}$ and $\{x_{m}\}\subset\mathbb{D}_{0}$ such that $x_{m}\to x$ as $m\to\infty$ . Then there must exist some subsequence $\{m^{\prime}\}$ of $\{m\}$ such that $\liminf_{m\to\infty}k(x_{m},\epsilon)=\lim_{m^{\prime}\to\infty}k(x_{m^{\prime}},\epsilon)$ . Since $k(\cdot,\epsilon)$ is integer valued, we further have $\liminf_{m\to\infty}k(x_{m},\epsilon)=k(x_{m^{\prime}},\epsilon)\equiv k^{\prime}$ for all $m^{\prime}$ sufficiently large. If $k^{\prime}=\infty$ , then the inequality (B.52) follows trivially. Otherwise, suppose that $k^{\prime}<\infty$ . For any $y$ with $d(x,y)<1/k^{\prime}$ , there exists an $m_{0}$ such that $d(x_{m^{\prime}},y)<1/k^{\prime}$ for all $m^{\prime}\geq m_{0}$ . By definition of $k(x,\epsilon)$ , it follows that for all $n\geq k^{\prime}$ ,

[TABLE]

Letting $m^{\prime}\uparrow\infty$ , we have by $x_{m^{\prime}}\to x$ and continuity of $g$ and $P$ that for all $n\geq k^{\prime}$ ,

[TABLE]

Hence, $k(x,\epsilon)\leq k^{\prime}=\liminf_{m\to\infty}k(x_{m},\epsilon)$ , so $k(\cdot,\epsilon)$ is lower semicontinuous and therefore Borel measurable.

Since $P(X\in\mathbb{D}_{0})=1$ , we may assume without loss of generality that $X$ takes values in $\mathbb{D}_{0}$ . In turn, it follows that $k(X,\epsilon)$ is a Borel $\mathbf{N}$ -valued random variable. Thus there exists some $k_{0}\equiv k_{0}(\epsilon)$ such that

[TABLE]

Since $X_{n}\xrightarrow{p}X$ , there exists some $n_{0}\equiv n_{0}(\epsilon)$ such that for all $n\geq n_{0}(\epsilon)$ ,

[TABLE]

Now define

[TABLE]

It follows that for all $n\geq\max\{n_{0},k_{0}\}$ ,

[TABLE]

by definition of $k(x,\epsilon)$ , results (B.55) and (B.56), and we are done since $\epsilon$ is arbitrary.∎

Appendix C Results for Examples 2.1 - 2.6

Example 2.2: Moment Inequalities

In this example, it is a simple exercise to show that

[TABLE]

Thus, $\phi$ is Hadamard differentiable with the derivative $\phi_{\theta}^{\prime}$ degenerate at $\theta\leq 0$ . Moreover, $\phi$ is second order Hadamard directionally differentiable. The derivative $\phi_{\theta}^{\prime\prime}$ is nondegenerate at 0, though degenerate whenever $\theta<0$ . Exploiting the structure in (C.1), we may easily estimate the derivative by

[TABLE]

where $\kappa_{n}\downarrow 0$ satisfies $\sqrt{n}\kappa_{n}\uparrow\infty$ , and $\overline{X}_{n}\equiv\frac{1}{n}\sum_{i=1}^{n}X_{i}$ . Interestingly, construction of $\hat{\phi}_{n}^{\prime\prime}$ as above amounts to the generalized moment selection procedure as in AndrewsandSoares2010 for conducting inference in moment inequalities models.
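To make the selection idea concrete, the following sketch assumes the scalar functional $\phi(\theta)=\max\{\theta,0\}^{2}$ , for which the second order directional derivative is $h^{2}$ when $\theta>0$ , $\max\{h,0\}^{2}$ when $\theta=0$ , and $0$ when $\theta<0$ . The plug-in rule below, which replaces the three regimes with comparisons of $\overline{X}_{n}$ against $\pm\kappa_{n}$ , is one plausible form of such an estimator and is not claimed to coincide with the display above; the names are hypothetical.

```python
def phi_second_derivative(theta, h):
    # Second order directional derivative of max(theta, 0)**2 (assumed functional).
    if theta > 0:
        return h ** 2
    if theta == 0:
        return max(h, 0.0) ** 2
    return 0.0

def phi_second_derivative_hat(x_bar, kappa_n, h):
    """Selection-type estimate: classify the regime of theta_0 using the band kappa_n."""
    if x_bar > kappa_n:
        return h ** 2             # treated as theta_0 > 0
    if abs(x_bar) <= kappa_n:
        return max(h, 0.0) ** 2   # treated as theta_0 = 0 (binding)
    return 0.0                    # treated as theta_0 < 0 (slack)

n = 10 ** 4
kappa_n = n ** (-0.25)            # kappa_n -> 0 and sqrt(n) * kappa_n -> infinity
print(phi_second_derivative_hat(0.003, kappa_n, 2.0))
print(phi_second_derivative_hat(-0.5, kappa_n, 2.0))
```

Because $\sqrt{n}\kappa_{n}\uparrow\infty$ , sampling noise of order $n^{-1/2}$ eventually falls inside the band, so the binding regime is selected with probability approaching one when $\theta_{0}=0$ .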

Example 2.3: Cramer-von Mises Functionals

Cramer-von Mises functionals can be viewed as generalized Wald functionals. It is straightforward to show that $\phi$ is first and second order Hadamard differentiable at any $\theta\in\ell^{\infty}(\mathbf{R}^{d_{x}})$ with derivatives satisfying:

[TABLE]

for all $h\in\ell^{\infty}(\mathbf{R}^{d_{x}})$ . Note that the first order derivative $\phi_{\theta}^{\prime}$ is degenerate when $\theta=F_{0}$ , while the second order derivative $\phi_{\theta}^{\prime\prime}$ is nowhere degenerate. The corresponding bilinear map $\Phi_{\theta}^{\prime\prime}:\ell^{\infty}(\mathbf{R}^{d_{x}})\times\ell^{\infty}(\mathbf{R}^{d_{x}})\to\mathbf{R}$ is given by $\Phi_{\theta}^{\prime\prime}(h,g)=\int hg\,dF_{0}$ . In this example, there is no need for derivative estimation because $\phi_{\theta_{0}}^{\prime\prime}$ is a known map.
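The quadratic structure can be verified on a discretized version of the problem. The sketch below assumes $\phi(\theta)=\int(\theta-F_{0})^{2}\,dF_{0}$ (our reading of the functional in this example) and replaces $dF_{0}$ with equal weights on a grid, which is a hypothetical stand-in. It checks that the second order difference quotient at $\theta=F_{0}$ equals $\Phi_{\theta}^{\prime\prime}(h,h)$ and that $\Phi_{\theta}^{\prime\prime}$ is recovered from $\phi_{\theta}^{\prime\prime}$ by polarization.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 201)
weights = np.full(grid.size, 1.0 / grid.size)   # discrete stand-in for dF_0
F0 = np.cumsum(weights)                          # cdf implied by the weights

def phi(theta):
    # Assumed Cramer-von Mises functional: integral of (theta - F0)^2 dF0.
    return float(np.sum((theta - F0) ** 2 * weights))

def Phi2(h, g):
    # Bilinear map from the text: integral of h * g dF0.
    return float(np.sum(h * g * weights))

def phi2(h):
    # Quadratic form associated with Phi2.
    return Phi2(h, h)

h = np.sin(2.0 * np.pi * grid)
g = grid ** 2
t = 1e-3

quotient = (phi(F0 + t * h) - phi(F0)) / t ** 2   # first order term vanishes at F0
polarized = (phi2(h + g) - phi2(h - g)) / 4.0     # polarization identity
print(quotient, phi2(h), polarized, Phi2(h, g))
```

Since $\phi$ is exactly quadratic around $F_{0}$ , the quotient does not depend on $t$ up to rounding error.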

Example 2.4: Stochastic Dominance

Lemma C.1.

Let $w:\mathbf{R}\rightarrow\mathbf{R}^{+}$ satisfy $\int_{\mathbf{R}}w(u)du<\infty$ and $\phi:\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})\rightarrow\mathbf{R}$ be given by $\phi(\theta)=\int_{\mathbf{R}}\max\{\theta^{(1)}(u)-\theta^{(2)}(u),0\}^{2}w(u)du$ for any $\theta=(\theta^{(1)},\theta^{(2)})\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ . Then it follows that

(i)

$\phi$ is first order Hadamard differentiable at any $\theta\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ with $\phi_{\theta}^{\prime}:\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})\rightarrow\mathbf{R}$ satisfying for any $h=(h^{(1)},h^{(2)})\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$

[TABLE]

where $B_{+}(\theta)\equiv\{u\in\mathbf{R}:\theta^{(1)}(u)>\theta^{(2)}(u)\}$ .

(ii)

$\phi$ is second order Hadamard directionally differentiable at any $\theta\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ and the derivative $\phi_{\theta}^{\prime\prime}:\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})\rightarrow\mathbf{R}$ is given by: for any $h=(h^{(1)},h^{(2)})\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$

[TABLE]

where $B_{0}(\theta)\equiv\{u\in\mathbf{R}:\theta^{(1)}(u)=\theta^{(2)}(u)\}$ .

Proof: Fix $\theta\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ . Further, let $t_{n}\downarrow 0$ , $\{h_{n}\}=\{(h_{n}^{(1)},h_{n}^{(2)})\}$ be a sequence in $\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ satisfying $\|h_{n}^{(1)}-h^{(1)}\|_{\infty}\vee\|h_{n}^{(2)}-h^{(2)}\|_{\infty}=o(1)$ for some $h=(h^{(1)},h^{(2)})\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ , and

[TABLE]

Observe that since $\theta^{(1)}(u)-\theta^{(2)}(u)<0$ for all $u\in B_{-}(\theta)$ , and $\|h_{n}^{(1)}-h_{n}^{(2)}\|_{\infty}=O(1)$ due to $\|h^{(1)}-h^{(2)}\|_{\infty}<\infty$ , the dominated convergence theorem yields that:

[TABLE]

and

[TABLE]

Combining results (C.3) - (C) yields

[TABLE]

which establishes the first claim of the lemma.

Next fix $\theta\in\ell^{\infty}(\mathbf{R})\times\ell^{\infty}(\mathbf{R})$ and let $\{h_{n}\}$ and $\{t_{n}\}$ be as before. Therefore, by the dominated convergence theorem we have

[TABLE]

and

[TABLE]

It follows from results (C.6)-(C) that

[TABLE]

This completes the proof of the second claim and we are done.∎

Note that if $B_{+}(\theta)$ has Lebesgue measure zero, i.e., $\theta^{(1)}\leq\theta^{(2)}$ almost everywhere, then $\phi^{\prime}_{\theta}(h)=0$ and $\phi^{\prime\prime}_{\theta}(h)$ simplifies to $\phi^{\prime\prime}_{\theta}(h)=\int_{B_{0}(\theta)}\max\{h^{(1)}(u)-h^{(2)}(u),0\}^{2}w(u)du$ . If in addition the contact set $B_{0}(\theta)$ has Lebesgue measure zero, then $\phi^{\prime\prime}_{\theta}$ in turn is degenerate, corresponding to the degenerate limits obtained in Theorem 1 of Linton2010. Let $\hat{B}_{0}(\theta_{0})$ be an estimator of $B_{0}(\theta_{0})$ . Then we may estimate $\phi_{\theta_{0}}^{\prime\prime}$ by

[TABLE]

It is a simple exercise to verify that Assumption 3.4 is satisfied provided

[TABLE]

where $A\triangle B$ denotes the symmetric difference between sets $A$ and $B$ . Such a construction corresponds to the bootstrap procedure studied in Linton2010.
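A grid-based sketch of the contact set approach follows. It assumes the estimator takes the form $\hat{\phi}_{n}^{\prime\prime}(h)=\int_{\hat{B}_{0}}\max\{h^{(1)}(u)-h^{(2)}(u),0\}^{2}w(u)du$ , matching the simplified derivative when $B_{+}(\theta_{0})$ is null, and that $\hat{B}_{0}$ collects the points where the two estimated functions differ by at most a tuning parameter $\kappa_{n}$ . Both choices, the stylized inputs (which are not genuine distribution functions) and all names are assumptions made for illustration.

```python
import numpy as np

def estimate_contact_set(theta1_hat, theta2_hat, kappa_n):
    # Boolean mask over the grid: points treated as belonging to the contact set.
    return np.abs(theta1_hat - theta2_hat) <= kappa_n

def phi_second_derivative_hat(h1, h2, contact, w, du):
    # Riemann sum of max(h1 - h2, 0)^2 w over the estimated contact set.
    integrand = np.maximum(h1 - h2, 0.0) ** 2 * w
    return float(np.sum(integrand[contact]) * du)

u = np.linspace(0.0, 1.0, 1001)
du = u[1] - u[0]
w = np.ones_like(u)                       # integrable weight on [0, 1]
theta1 = u.copy()
theta2 = u + np.maximum(u - 0.5, 0.0)     # theta1 <= theta2, contact set [0, 0.5]

contact_hat = estimate_contact_set(theta1, theta2, kappa_n=0.01)
est_up = phi_second_derivative_hat(np.ones_like(u), np.zeros_like(u), contact_hat, w, du)
est_down = phi_second_derivative_hat(np.zeros_like(u), np.ones_like(u), contact_hat, w, du)
print(est_up, est_down)
```

With $h^{(1)}-h^{(2)}=1$ the target value is the weighted length of the contact set, here $0.5$ , and the estimate overshoots by roughly the width of the band induced by $\kappa_{n}$ .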

Example 2.5: Conditional Moment Inequalities

Lemma C.2.

Let $\mathcal{F}$ be compact under some metric $d$ and $\phi:\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})\to\mathbf{R}$ be given by $\phi(\theta)=\sup_{f\in\mathcal{F}}\{[\max(\theta^{(1)}(f),0)]^{2}+[\theta^{(2)}(f)]^{2}\}$ . Then it follows that:

(i)

$\phi$ is Hadamard differentiable at any $\theta\in\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ satisfying $\theta^{(1)}\leq 0$ and $\theta^{(2)}=0$ , and its derivative satisfies $\phi_{\theta}^{\prime}(h)=0$ for any $h\in\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ .

(ii)

$\phi$ is second order Hadamard directionally differentiable at any $\theta\in C(\mathcal{F})\times C(\mathcal{F})$ satisfying $\theta^{(1)}\leq 0$ and $\theta^{(2)}=0$ tangentially to $C(\mathcal{F})\times C(\mathcal{F})$ , and the derivative is given by: for any $h\in C(\mathcal{F})\times C(\mathcal{F})$ ,

[TABLE]

where $\mathcal{F}_{0}\equiv\{f\in\mathcal{F}:\theta^{(1)}(f)=0\}$ , and $\sup\emptyset\equiv 0$ .

Remark C.1.

Note that if $\mathcal{F}_{0}=\emptyset$ , then $\phi_{\theta}^{\prime\prime}$ simplifies to $\phi_{\theta}^{\prime\prime}(h)=\sup_{f\in\mathcal{F}}[h^{(2)}(f)]^{2}$ .∎

Proof: Let $\theta\in\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ satisfying $\theta^{(1)}\leq 0$ and $\theta^{(2)}=0$ , $\{h_{n}\}\subset\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ such that $h_{n}\to h\in\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ , and $t_{n}\downarrow 0$ . Combining $\theta^{(1)}\leq 0$ , $\theta^{(2)}=0$ so that $\phi(\theta)=0$ and the triangle inequality, we have

[TABLE]

as desired in part (i), where in the last step we used the fact that $h_{n}^{(1)}=h_{n}^{(2)}=O(1)$ .

As for the second claim, let $\theta\in C(\mathcal{F})\times C(\mathcal{F})$ satisfying $\theta^{(1)}\leq 0$ and $\theta^{(2)}=0$ , $\{h_{n}\}\subset\ell^{\infty}(\mathcal{F})\times\ell^{\infty}(\mathcal{F})$ such that $h_{n}\to h\in C(\mathcal{F})\times C(\mathcal{F})$ , and $t_{n}\downarrow 0$ . By $\theta^{(1)}\leq 0$ and $\theta^{(2)}=0$ , Lipschitz continuity of the sup operator and the triangle inequality we have

[TABLE]

Since $\|h_{n}-h\|_{\infty}=o(1)$ and $\theta^{(1)}\leq 0$ , it follows that

[TABLE]

and that

[TABLE]

Combination of results (C), (C) and (C.14) leads to

[TABLE]

Next, fix $\delta>0$ . By definition of $\mathcal{F}_{0}^{\delta}$ , compactness of $\mathcal{F}$ and continuity of $\theta^{(1)}$ , we see that $\sup_{f\in\mathcal{F}\setminus\mathcal{F}_{0}^{\delta}}\theta^{(1)}(f)<0$ . Since also $t_{n}h^{(1)}=o(1)$ and $h^{(1)}\in C(\mathcal{F})$ , it follows that $\theta^{(1)}(f)+t_{n}h^{(1)}(f)<0$ for all $f\in\mathcal{F}\setminus\mathcal{F}_{0}^{\delta}$ and for all $n$ large. In turn we have

[TABLE]

where the last step is due to $h^{(2)}\in C(\mathcal{F})$ . On the other hand, we have,

[TABLE]

where the first inequality is due to $\theta(f)=0$ for all $f\in\mathcal{F}_{0}$ and $\theta^{(1)}\leq 0$ , the second inequality exploits the definition and compactness of $\mathcal{F}_{0}^{\delta}$ , and the equality is due to uniform continuity of $h^{(1)}$ on $\mathcal{F}$ since $h^{(1)}\in C(\mathcal{F})$ and $\mathcal{F}$ is compact.

Finally, combining results (C), (C), and $\phi(\theta)=0$ we have:

[TABLE]

It follows from $\phi_{\theta}^{\prime}=0$ , (C.15) and (C) that

[TABLE]

as desired for the second claim of the lemma. ∎
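The second claim can be checked by finite differences when $\mathcal{F}$ is a finite set, in which case every function on $\mathcal{F}$ is continuous. The sketch below assumes that the derivative takes the form $\phi_{\theta}^{\prime\prime}(h)=\max\{\sup_{f\in\mathcal{F}_{0}}([\max(h^{(1)}(f),0)]^{2}+[h^{(2)}(f)]^{2}),\sup_{f\in\mathcal{F}\setminus\mathcal{F}_{0}}[h^{(2)}(f)]^{2}\}$ . This is our reading of the elided display, chosen to agree with the case $\mathcal{F}_{0}=\emptyset$ , and the numerical inputs are hypothetical.

```python
import numpy as np

def phi(theta1, theta2):
    # phi(theta) = sup_f { max(theta1(f), 0)^2 + theta2(f)^2 } on a finite F.
    return float(np.max(np.maximum(theta1, 0.0) ** 2 + theta2 ** 2))

def phi_second_derivative(theta1, h1, h2):
    # Assumed closed form of the second order directional derivative.
    contact = theta1 == 0.0                      # F_0 = {f : theta1(f) = 0}
    on_contact = np.maximum(h1[contact], 0.0) ** 2 + h2[contact] ** 2
    off_contact = h2[~contact] ** 2
    # The trailing 0.0 encodes the convention sup(empty set) = 0.
    return float(np.concatenate([on_contact, off_contact, [0.0]]).max())

def second_order_quotient(theta1, theta2, h1, h2, t):
    # phi(theta) = 0 and phi'_theta = 0 here, so only the leading difference remains.
    return (phi(theta1 + t * h1, theta2 + t * h2) - phi(theta1, theta2)) / t ** 2

theta1 = np.array([0.0, -1.0, 0.0, -0.5, -2.0])
theta2 = np.zeros(5)
h2 = np.array([0.5, 2.0, 1.0, -1.0, 0.0])
h1_a = np.array([1.0, 3.0, -2.0, 0.5, 1.0])   # supremum attained off the contact set
h1_b = np.array([3.0, 3.0, -2.0, 0.5, 1.0])   # supremum attained on the contact set

for h1 in (h1_a, h1_b):
    print(second_order_quotient(theta1, theta2, h1, h2, 1e-3),
          phi_second_derivative(theta1, h1, h2))
```

For small $t$ the slack points contribute only through $h^{(2)}$ , since $\theta^{(1)}(f)+th^{(1)}(f)$ remains negative there.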

Suppose that $\hat{\mathcal{F}}_{0}$ and $\hat{\mathcal{F}}_{0,c}$ are respectively estimators of $\mathcal{F}_{0}\equiv\{f\in\mathcal{F}:\theta_{0}^{(1)}(f)=0\}$ and $\mathcal{F}\setminus\mathcal{F}_{0}$ that satisfy

[TABLE]

(Footnote 20: We note that for two generic sets $A$ and $B$ in a metric space, neither $d_{H}(A,B)$ controls $d_{H}(A^{c},B^{c})$ nor $d_{H}(A^{c},B^{c})$ controls $d_{H}(A,B)$ (Lemenant_Milakis_Spinolo2014).)

Based on $\hat{\mathcal{F}}_{0}$ and $\hat{\mathcal{F}}_{0,c}$ and in view of Lemma B.3 in Fang_Santos2014HDD, we may estimate the derivative as follows:

[TABLE]

The estimation of $\mathcal{F}_{0}$ and $\mathcal{F}\setminus\mathcal{F}_{0}$ is in accordance with the generalized moment selection in Andrews_Shi2013CMI; see also Kaido_Santos2013.

Example 2.6: Overidentification Test

Lemma C.3.

Let $\Gamma\subset\mathbf{R}^{k}$ be a compact set, and $\phi:\prod_{j=1}^{m}\ell^{\infty}(\Gamma)\to\mathbf{R}$ be given by $\phi(\theta)=\inf_{\gamma\in\Gamma}\theta(\gamma)^{\intercal}W\theta(\gamma)$ where $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ and $W$ is an $m\times m$ symmetric positive definite matrix. Then we have

(i)

$\phi$ is Hadamard differentiable at any $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ satisfying $\theta(\gamma)=0$ for some $\gamma\in\Gamma$ with the derivative given by $\phi_{\theta}^{\prime}(h)=0$ for all $h\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ .

(ii)

If $\Gamma_{0}(\theta)\equiv\{\gamma_{0}\in\Gamma:\theta(\gamma_{0})=0\}$ is in the interior of $\Gamma$ , $\theta\in\prod_{j=1}^{m}C^{1}(\Gamma)$ satisfies $\phi(\theta)=0$ , and for all small $\epsilon>0$ , $\inf_{\gamma\in\Gamma\setminus\Gamma_{0}(\theta)^{\epsilon}}\|\theta(\gamma)\|\geq C\epsilon^{\kappa}$ for some $\kappa\in(0,1]$ and some $C>0$ , then $\phi$ is second order Hadamard directionally differentiable at $\theta$ tangentially to $\prod_{j=1}^{m}C(\Gamma)$ with the derivative given by: for any $h\in\prod_{j=1}^{m}C(\Gamma)$

[TABLE]

where $J:\Gamma_{0}(\theta)\to\mathbf{M}^{m\times k}$ is the Jacobian matrix defined by $J(\gamma_{0})\equiv\frac{d\theta(\gamma)}{d\gamma^{\intercal}}\big{|}_{\gamma=\gamma_{0}}$ .

Proof: Fix $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ and let $t_{n}\downarrow 0$ and $\{h_{n},h\}\subset\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ such that $h_{n}\to h$ . For a vector $a\in\mathbf{R}^{m}$ , define the norm $\|a\|_{W}=\sqrt{a^{\intercal}Wa}$ . It follows that

[TABLE]

where the second inequality is because $\theta(\gamma_{0})=0$ for all $\gamma_{0}\in\Gamma_{0}(\theta)$ and the last step is due to $h_{n}=O(1)$ by assumption. This establishes part (i).

For part (ii), fix $\theta\in\prod_{j=1}^{m}C^{1}(\Gamma)$ with $\phi(\theta)=0$ and let $t_{n}\downarrow 0$ and $\{h_{n}\}\subset\prod_{j=1}^{m}\ell^{\infty}(\Gamma)$ such that $h_{n}\to h\in\prod_{j=1}^{m}C(\Gamma)$ . First of all, note that for $\gamma_{0}\in\Gamma_{0}(\theta)$ ,

[TABLE]

where the first inequality is by Lipschitz continuity of the $\inf$ operator and the triangle inequality, and the last inequality follows from $h_{n}\to h$ and $\theta(\gamma_{0})=0$ for $\gamma_{0}\in\Gamma_{0}(\theta)$ .

Next, for each fixed $a\geq(3\lambda_{0}^{-1/2}C^{-1}\max_{\gamma\in\Gamma}\|h(\gamma)\|_{W})^{1/\kappa}$ with $h\neq 0$ and $\lambda_{0}>0$ the smallest eigenvalue of $W$ , by assumption and the triangle inequality we have: for all $n$ sufficiently large so that $t_{n}^{\kappa}\geq t_{n}$ ,

[TABLE]

where the strict inequality is due to $h\neq 0$ . This in turn implies that for all $n$ large,

[TABLE]

Now for $\gamma_{0}\in\Gamma_{0}(\theta)$ , set $V_{n,\gamma_{0}}(a)\equiv\{v\in\mathbf{R}^{k}:\gamma_{0}+t_{n}v\in\Gamma,\|v\|\leq a\}$ and $V(a)\equiv\{v\in\mathbf{R}^{k}:\|v\|\leq a\}$ . Note that $\bigcup_{\gamma_{0}\in\Gamma_{0}(\theta)}V_{n,\gamma_{0}}(a)=\Gamma_{0}(\theta)^{at_{n}}$ . Since $\theta$ and $h$ are continuous, it then follows that

[TABLE]

In turn, notice that

[TABLE]

where the first inequality follows from the formula $|b^{2}-c^{2}|\leq|b+c||b-c|$ and that $\gamma_{0}$ is any fixed element in $\Gamma_{0}(\theta)$ , and the last step follows from uniform continuity of $h$ on $\Gamma$ because $h$ is continuous on $\Gamma$ and $\Gamma$ is compact.

Since $\theta\in\prod_{j=1}^{m}C^{1}(\Gamma)$ , we further have,

[TABLE]

By the mean value theorem applied entry-wise to $\theta(\gamma_{0}+t_{n}v)-\theta(\gamma_{0})$ , there exist $\tilde{\gamma}_{n}^{(1)}(\gamma_{0},v),\ldots,\tilde{\gamma}_{n}^{(m)}(\gamma_{0},v)$ all between $\gamma_{0}$ and $\gamma_{0}+t_{n}v$ such that

[TABLE]

where by abuse of notation we write

[TABLE]

Since $\theta\in\prod_{j=1}^{m}C^{1}(\Gamma)$ and $\Gamma$ is compact, $J(\cdot)$ is uniformly continuous on $\Gamma$ and hence

[TABLE]

Since all norms in finite dimensional spaces are equivalent, it follows from results (C), (C), (C.29), (C) and $\theta(\gamma_{0})=0$ for all $\gamma_{0}\in\Gamma_{0}(\theta)$ that

[TABLE]

By assumption, $\Gamma_{0}(\theta)$ is in the interior of $\Gamma$ and so $V_{n,\gamma_{0}}(a)=V(a)$ for all $n$ sufficiently large. It follows that

[TABLE]

where the second equality exploits the fact that $V(a)$ is symmetric. For each $\gamma_{0}\in\Gamma_{0}(\theta)$ , by the projection theorem there is some $v^{*}\in\mathbf{R}^{k}$ such that

[TABLE]

Thus, by choosing $a$ large if necessary so that $v^{*}\in V(a)$ , we have from results (C.31), (C) and (C.33) that

[TABLE]

Combining (C.34), $\phi(\theta)=0$ and part (i), we then arrive at part (ii).∎
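The projection step amounts to a weighted least squares problem. As a minimal sketch, assume the inner problem is $\min_{v\in\mathbf{R}^{k}}\|h(\gamma_{0})+J(\gamma_{0})v\|_{W}^{2}$ with $J(\gamma_{0})$ of full column rank (our reading of the elided displays). The code factors $W=LL^{\intercal}$ , solves the transformed ordinary least squares problem, and compares the minimized value with the closed form $h^{\intercal}Wh-h^{\intercal}WJ(J^{\intercal}WJ)^{-1}J^{\intercal}Wh$ . The matrices and names are hypothetical.

```python
import numpy as np

def inner_minimum(h0, J, W):
    """Minimize (h0 + J v)' W (h0 + J v) over v via Cholesky-whitened least squares."""
    L = np.linalg.cholesky(W)             # W = L @ L.T, so a' W a = ||L.T a||^2
    A = L.T @ J
    b = L.T @ h0
    v_star, *_ = np.linalg.lstsq(A, -b, rcond=None)
    resid = h0 + J @ v_star
    return v_star, float(resid @ W @ resid)

W = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.5]])           # symmetric positive definite
J = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # m = 3, k = 2, full column rank
h0 = np.array([1.0, 2.0, 0.0])

v_star, value = inner_minimum(h0, J, W)
JtW = J.T @ W
closed_form = float(h0 @ W @ h0 - h0 @ JtW.T @ np.linalg.solve(JtW @ J, JtW @ h0))
print(v_star, value, closed_form)
```

Taking $a$ large enough that $v^{*}\in V(a)$ corresponds here to the unconstrained minimizer lying inside the ball over which the infimum is computed.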

Remark C.2.

The condition that “for all small $\epsilon>0$ , $\inf_{\gamma\in\Gamma\setminus\Gamma_{0}(\theta)^{\epsilon}}\|\theta(\gamma)\|\geq C\epsilon^{\kappa}$ for some $\kappa\in(0,1]$ and some $C>0$ ” in Lemma C.3 effectively imposes restrictions on the Jacobian matrix that prevent one from directly applying Lemma C.3 to the setup of Dovonon_Renault2013testing where $\Gamma_{0}(\theta)=\{\gamma_{0}\}$ is a singleton. To see this, let $\theta$ be the moment function $\rho$ in Dovonon_Renault2013testing. Then, for any $\gamma\in\Gamma\setminus\Gamma_{0}(\theta)^{\epsilon}$ with $\|\gamma-\gamma_{0}\|=a\epsilon$ for $a>1$ , we have by Dovonon_Renault2013testing,

[TABLE]

for some constant $C^{\prime}>0$ depending on the eigenvalues of the Hessian matrices (evaluated at $\gamma_{0}$ ) of the maps $\gamma\mapsto\theta^{(j)}(\gamma)$ , where for the second equality we exploited the facts that (i) $\theta(\gamma_{0})=0$ , (ii) the Jacobian matrix is degenerate, and (iii) $\|\gamma-\gamma_{0}\|=a\epsilon$ . But by assumption, for the same $\gamma$ ,

[TABLE]

for all $\epsilon>0$ sufficiently small since $\kappa\in(0,1]$ , a contradiction. The conclusion holds more generally: the condition in fact excludes Jacobian matrices of deficient rank, regardless of whether $\gamma_{0}$ is point or partially identified. To see this, let $J(\gamma_{0})a=0$ for some nonzero $a\in\mathbf{R}^{k}$ . Then we may choose $\gamma=\gamma_{0}+\lambda a\in\Gamma\setminus\Gamma_{0}(\theta)^{\epsilon}$ for some suitable $\lambda\in\mathbf{R}$ and for all small $\epsilon>0$ – this is possible since $\Gamma_{0}(\theta)$ is required to be in the interior of $\Gamma$ . Then the previous arguments apply with such a choice of $\gamma$ and any $\gamma_{0}\in\Gamma_{0}(\theta)$ . ∎
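A one-dimensional toy case makes the conflict visible. The sketch below assumes $\Gamma=[-1,1]$ and $\theta(\gamma)=\gamma^{2}$ , so that $\Gamma_{0}(\theta)=\{0\}$ and the Jacobian vanishes at $\gamma_{0}=0$ ; these are illustrative choices, not the moment function of Dovonon_Renault2013testing. It evaluates $\inf_{|\gamma|\geq\epsilon}|\theta(\gamma)|/\epsilon^{\kappa}$ on a grid and shows that the ratio shrinks with $\epsilon$ for $\kappa=1$ , so no constant $C>0$ can serve as a lower bound.

```python
import numpy as np

gamma_grid = np.linspace(-1.0, 1.0, 20001)   # grid on Gamma = [-1, 1]

def theta(gamma):
    # Toy moment function with a degenerate Jacobian at gamma_0 = 0 (assumption).
    return gamma ** 2

def separation_ratio(eps, kappa):
    """Grid approximation of inf_{|gamma| >= eps} |theta(gamma)| / eps**kappa."""
    outside = np.abs(gamma_grid) >= eps
    return float(np.min(np.abs(theta(gamma_grid[outside]))) / eps ** kappa)

ratios = [separation_ratio(eps, 1.0) for eps in (0.1, 0.01, 0.001)]
print(ratios)
```

The same computation with a nondegenerate Jacobian, for instance $\theta(\gamma)=\gamma$ , would give a ratio that stays bounded away from zero.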

Appendix D Proofs for Section 4

Lemma D.1.

Let $\phi:\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})\to\mathbf{R}$ be given by $\phi(\theta)=\inf_{\gamma\in\mathbb{S}^{k}}\|\theta(\gamma)\|^{2}$ . Then

(i)

$\phi$ is Hadamard differentiable at any $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ satisfying $\theta(\gamma_{0})=0$ for some $\gamma_{0}\in\mathbb{S}^{k}$ and the derivative satisfies $\phi_{\theta}^{\prime}(h)=0$ for all $h\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ .

(ii)

$\phi$ is second order Hadamard directionally differentiable at any $\theta_{0}(\gamma)\equiv E[Z_{t}\{(\gamma^{\intercal}Y_{t+1})^{2}-c(\gamma)\}]$ under Assumption 4.1 tangentially to $\prod_{j=1}^{m}C(\mathbb{S}^{k})$ with the derivative given by: for all $h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ ,

[TABLE]

where $\Gamma_{0}=\{\gamma_{0}\in\mathbb{S}^{k}:\theta_{0}(\gamma_{0})=0\}$ is the (nonempty) identified set of $\gamma_{0}$ , and $G\in\mathbf{M}^{m\times k^{2}}$ with the $j$ th row given by $\operatorname{vec}(\Delta_{j})^{\intercal}$ and

[TABLE]

Proof: Fix $\theta\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ satisfying $\theta(\gamma_{0})=0$ for some $\gamma_{0}\in\mathbb{S}^{k}$ , $\{h_{n}\}\subset\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ such that $h_{n}\rightarrow h\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ , and $t_{n}\downarrow 0$ . It follows that

[TABLE]

where in the last step we used the fact that $\sup_{\gamma\in\mathbb{S}^{k}}\|h_{n}(\gamma)\|=O(1)$ . So $\phi_{\theta}^{\prime}(h)=0$ for any $h\in\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ , as desired for the first claim of the lemma.
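The first claim rests on a simple bound: evaluating the infimum at a zero $\gamma_{0}$ of $\theta$ gives $0\leq\phi(\theta+th)-\phi(\theta)\leq t^{2}\|h(\gamma_{0})\|^{2}$ , so the first order quotient is at most $t\sup_{\gamma}\|h(\gamma)\|^{2}$ . The sketch below checks this on a discretized unit circle with $m=1$ and $k=2$ ; the particular $\theta$ and $h$ are hypothetical choices made only for illustration.

```python
import numpy as np

angles = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
gamma = np.column_stack([np.cos(angles), np.sin(angles)])   # grid on the unit circle

def phi(theta_vals):
    # phi(theta) = inf over the grid of ||theta(gamma)||^2; rows index gamma.
    return float(np.min(np.sum(theta_vals ** 2, axis=1)))

theta = (gamma[:, 0] * gamma[:, 1])[:, None]   # vanishes at gamma_0 = (1, 0)
h = (1.0 + gamma[:, 0])[:, None]               # bounded direction
sup_h_sq = float(np.max(np.sum(h ** 2, axis=1)))

def first_order_quotient(t):
    return (phi(theta + t * h) - phi(theta)) / t

for t in (1e-1, 1e-2, 1e-3):
    print(t, first_order_quotient(t), t * sup_h_sq)
```

The quotient is squeezed between $0$ and a term of order $t$ , which is the numerical counterpart of $\phi_{\theta}^{\prime}=0$ .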

Now consider $\theta_{0}(\gamma)\equiv E[Z_{t}\{(\gamma^{\intercal}Y_{t+1})^{2}-c(\gamma)\}]$ and suppose that Assumption 4.1 holds. Pick $\{h_{n}\}\subset\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ such that $h_{n}\rightarrow h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ , and $t_{n}\downarrow 0$ . Note that $\phi(\theta_{0})=0$ under Assumption 4.1. Then first, we have

[TABLE]

Next, let $\Gamma_{0}^{\epsilon}\equiv\{\gamma\in\mathbb{S}^{k}:\min_{s\in\Gamma_{0}}\|s-\gamma\|\leq\epsilon\}$ and $\Gamma_{1}^{\epsilon}\equiv\{\gamma\in\mathbb{S}^{k}:\min_{s\in\Gamma_{0}}\|s-\gamma\|\geq\epsilon\}$ . By Equation (7) in Dovonon_Renault2013testing, $\theta_{0}(\gamma)=\text{Cov}(Z_{t},\sigma_{t}^{2})\,\text{Diag}(\Lambda^{\intercal}\gamma\gamma^{\intercal}\Lambda)$ , where for a $p\times p$ matrix $A$ , $\text{Diag}(A)$ denotes the $p\times 1$ vector consisting of diagonal entries. Also, let $\lambda_{\min}(\cdot)$ and $\lambda_{\min}^{+}(\cdot)$ denote the smallest and the smallest positive singular values, respectively. We then have for $C\equiv p^{-1/2}\lambda_{\min}^{+}(\Lambda^{\intercal})\lambda_{\min}(\text{Cov}(Z_{t},\sigma_{t}^{2}))/2$ ,

[TABLE]

where the first inequality follows from a simple application of the singular value decomposition of $\text{Cov}(Z_{t},\sigma_{t}^{2})$ , the second inequality exploits the generalized mean inequality, and the last inequality is by Lemma D.4. Note that $\lambda_{\min}(\text{Cov}(Z_{t},\sigma_{t}^{2}))>0$ by Assumption 4.1(v). Let $\Delta\equiv[3C^{-1}\max_{\gamma\in\mathbb{S}^{k}}\|h(\gamma)\|]^{1/2}>0$ for the nontrivial case $\max_{\gamma\in\mathbb{S}^{k}}\|h(\gamma)\|>0$ . Then it follows by the triangle inequality that for $n$ sufficiently large such that $t_{n}\leq\sqrt{t_{n}}$ ,

[TABLE]

and therefore

[TABLE]

For $\gamma_{0}\in\Gamma_{0}$ , let $V_{n,\gamma_{0}}^{\Delta}\equiv\{v\in\mathbf{R}^{k}:\gamma_{0}+\sqrt{t_{n}}v\in\mathbb{S}^{k}$ and $\|v\|\leq\Delta\}$ and $V_{\gamma_{0}}^{\Delta}\equiv\{v\in\mathbf{R}^{k}:\gamma_{0}^{\intercal}v=0$ and $\|v\|\leq\Delta\}$ . Then we have

[TABLE]

where the first equality is due to the definition of $\Gamma_{0}^{\sqrt{t_{n}}\Delta}$ and the second follows by

[TABLE]

where $\gamma_{0}$ in the first inequality is any fixed element in $\Gamma_{0}$ , and the last equality follows by the uniform continuity of $h$ over $\mathbb{S}^{k}$ . Noting that $(\gamma^{\intercal}Y_{t+1})^{2}=\gamma^{\intercal}Y_{t+1}Y_{t+1}^{\intercal}\gamma$ and so $c(\gamma)=\gamma^{\intercal}E[Y_{t+1}Y_{t+1}^{\intercal}]\gamma$ , we may write

[TABLE]

where we made use of some facts on the vec operator (Abadir and Magnus, p. 282). In turn, by (D.4) and the definition of $\Gamma_{0}$ , we have

[TABLE]

where the second equality follows by the fact that $V_{n,\gamma_{0}}^{\Delta}$ converges to $V_{\gamma_{0}}^{\Delta}$ uniformly in $\gamma_{0}\in\Gamma_{0}$ with respect to the Hausdorff metric by Lemma D.5 and Lemma B.3 in Fang_Santos2014HDD, and the third equality by the facts that $G\operatorname{vec}(vu^{\intercal})=0$ for all $v\in\Gamma_{0}$ and all $u\in\mathbf{R}^{k}$ (to be proved shortly) and that the inside minimum can be attained in $V_{\gamma_{0}}^{\Delta}$ for all $\Delta$ large enough. Combining (D.2), (D.3) and (D.5) yields

[TABLE]

as desired. It remains to show $G\operatorname{vec}(vu^{\intercal})=0$ for all $v\in\Gamma_{0}$ and all $u\in\mathbf{R}^{k}$ . Fix $v\in\Gamma_{0}$ and $u\in\mathbf{R}^{k}$ . By arguments similar to those (in reverse order) that led to (D.4), we obtain

[TABLE]

Next, note that, by the law of iterated expectations, we have

[TABLE]

where the third inequality follows by the model specified in display (40) and Assumption 4.1(ii). Result (D.7) in turn implies that, for all $j=1,\ldots,m$ ,

[TABLE]

where $v^{\intercal}\Lambda=0$ because $v\in\Gamma_{0}=\{\gamma_{0}\in\mathbb{S}^{k}:\theta_{0}(\gamma_{0})=0\}$ which is equal to the intersection of $\mathbb{S}^{k}$ and the null space of $\Lambda^{\intercal}$ – see our discussions below Assumption 4.1. The claim now follows by combining (D.6) and (D.8). ∎
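The vec-operator step used in this proof rests on the identity $\gamma^{\intercal}M\gamma=\operatorname{vec}(M)^{\intercal}\operatorname{vec}(\gamma\gamma^{\intercal})$ for a $k\times k$ matrix $M$, which is what allows a quadratic form in $\gamma$ to be written as a linear map applied to $\operatorname{vec}(\gamma\gamma^{\intercal})$. The following is a minimal numerical sketch of that identity only; the matrix `M`, the dimension `k` and the helper names are illustrative assumptions and do not come from the paper.

```python
import random

def vec(M):
    """Stack the columns of a square matrix (given as a list of rows)."""
    k = len(M)
    return [M[i][j] for j in range(k) for i in range(k)]

def outer(a, b):
    """Outer product a b^T as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]

def quad_form(g, M):
    """Quadratic form g^T M g."""
    k = len(g)
    return sum(g[i] * M[i][j] * g[j] for i in range(k) for j in range(k))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
k = 4  # illustrative dimension
M = [[random.gauss(0, 1) for _ in range(k)] for _ in range(k)]  # arbitrary matrix

# Draw a direction and normalize it so that gamma lies on the unit sphere.
gamma = [random.gauss(0, 1) for _ in range(k)]
norm = sum(x * x for x in gamma) ** 0.5
gamma = [x / norm for x in gamma]

lhs = quad_form(gamma, M)                       # gamma^T M gamma
rhs = dot(vec(M), vec(outer(gamma, gamma)))     # vec(M)^T vec(gamma gamma^T)
print(abs(lhs - rhs))
```

The printed discrepancy should be at the level of floating-point rounding, since both sides are the same double sum $\sum_{i,j}M_{ij}\gamma_{i}\gamma_{j}$ taken in a different order.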

Remark D.1.

The derivative (45) can be rewritten as:

[TABLE]

where $\Gamma_{0}^{\perp}\equiv\{\lambda\in\mathbf{R}^{k}:\lambda^{\intercal}\gamma_{0}=0~{},\,\forall\,\gamma_{0}\in\Gamma_{0}\}$ denotes the orthogonal complement of $\Gamma_{0}$ . Then for $\hat{\Gamma}_{T,\perp}=\{\gamma\in\mathbf{R}^{k}:\sup_{\lambda\in\hat{\Gamma}_{T}}|\gamma^{\intercal}\lambda|\leq\kappa_{T}^{1/2}\}$ and $B_{T}\equiv\{v\in\mathbf{R}^{k}:\|v\|\leq\kappa_{T}^{-1/2}\}$ , we may estimate $\phi_{\theta_{0}}^{\prime\prime}(h)$ by

[TABLE]

Lemma D.2.

Under Assumptions 4.1 and 4.2, we have

[TABLE]

where $\mathbb{G}$ is a zero mean Gaussian process with the covariance functional satisfying: for any $\gamma_{1}$ , $\gamma_{2}\in\Gamma_{0}$ and $\mu_{z}=E[Z_{t}]$ ,

[TABLE]

Proof: By elementary rearrangements we have

[TABLE]

where $\hat{\mu}_{z}=\frac{1}{T}\sum_{t=1}^{T}Z_{t}$ , $\hat{c}(\gamma)=\frac{1}{T}\sum_{t=1}^{T}(\gamma^{\intercal}Y_{t+1})^{2}$ , and

[TABLE]

By Assumptions 4.1(vi) and 4.2, and the law of large numbers for stationary and ergodic sequences and the compactness of $\mathbb{S}^{k}$ , we have

[TABLE]

Once again by Assumptions 4.1(vi) and 4.2, together with $\sqrt{T}G_{T}(\gamma)=\sqrt{T}\tilde{G}\operatorname{vec}({\gamma\gamma^{\intercal}})$ where $\tilde{G}\in\mathbf{M}^{m\times k^{2}}$ has its $j$th row given by $(\operatorname{vec}(\tilde{\Delta}_{j}))^{\intercal}$ for

[TABLE]

we have by the compactness of $\mathbb{S}^{k}$ that

[TABLE]

for some Gaussian process $\mathbb{G}(\gamma)$ . In particular, for $\gamma\in\Gamma_{0}$ the summand in $G_{T}(\gamma)$ is a martingale difference sequence, so for any $\gamma_{1}$ , $\gamma_{2}\in\Gamma_{0}$ , the covariance functional satisfies

[TABLE]

This completes the proof of the lemma. ∎

Lemma D.3.

Suppose Assumptions 4.1, 4.2 and 4.3 hold. Let $\hat{\phi}_{T}^{\prime\prime}$ be constructed as in (50). Then we have: whenever $h_{T}\to h$ as $T\rightarrow\infty$ for a sequence $\{h_{T}\}\subset\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ and $h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ , it follows that

[TABLE]

Proof: Pick a sequence $\{h_{T}\}\subset\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ and $h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ such that $h_{T}\to h$ as $T\to\infty$ . Define

[TABLE]

Then we have

[TABLE]

where “ $\lesssim$ ” follows from $h_{T}\to h$ , and the last step is by Assumptions 4.2 and 4.3.

Next, under Assumptions 4.1, 4.2 and 4.3, we have by Theorem 3.1 in CHT2007 that $d_{H}(\hat{\Gamma}_{T},\Gamma_{0})\xrightarrow{p}0$ as $T\to\infty$ , with $a_{T}=T$ , $b_{T}=\sqrt{T}$ , and $\hat{c}=T\kappa_{T}^{2}$ . Let

[TABLE]

Since $h\in\prod_{j=1}^{m}C(\mathbb{S}^{k})$ and $\mathbb{S}^{k}$ is compact, together with $d_{H}(\hat{\Gamma}_{T},\Gamma_{0})\xrightarrow{p}0$ , it follows that

[TABLE]

Since $\bar{\phi}_{T}^{\prime\prime}(h)$ is monotonically decreasing as $T\uparrow\infty$ , we further have

[TABLE]

The lemma then follows from results (D), (D) and (D.12). ∎

Proof of Proposition 4.2: By Lemmas D.2 and D.3, Assumptions 3.1 and 3.2, and the cdf of the weak limit being strictly increasing at $c_{1-\alpha}$ , we have $\hat{c}_{1-\alpha}\overset{p}{\to}c_{1-\alpha}$ by exactly the same argument as in the proof of Corollary 3.2 in Fang_Santos2014HDD. (Note that $\phi_{\theta_{0}}^{\prime\prime}$ trivially admits a continuous extension on $\prod_{j=1}^{m}\ell^{\infty}(\mathbb{S}^{k})$ with the first min replaced by $\inf$ .) Then under $\mathrm{H}_{0}$ , the conclusion follows by combining Proposition 4.1, Slutsky's theorem, the fact that ${c}_{1-\alpha}$ is a continuity point of the cdf of the weak limit, and the portmanteau theorem. ∎

Lemma D.4.

Let $\Lambda$ and $\Gamma_{1}^{\epsilon}$ be given as in the proof of Lemma D.1. Then under Assumption 4.1 and $\mathrm{H}_{0}$ , for all sufficiently small $\epsilon>0$ , we have

[TABLE]

where $\sigma_{\min}^{+}(\Lambda^{\intercal})$ denotes the smallest positive singular value of $\Lambda^{\intercal}$ .

Proof: To begin with, note that i) $\Gamma_{0}=\operatornamewithlimits{arg\,min}_{\gamma\in\mathbb{S}^{k}}\|\Lambda^{\intercal}\gamma\|$ by Assumption 4.1, ii) $\Gamma_{0}\neq\emptyset$ under the null, iii) $\sigma_{\min}^{+}(\Lambda^{\intercal})$ is well-defined by Assumption 4.1(i) so that $\Gamma_{0}\subsetneqq\mathbb{S}^{k}$ . Let $\Lambda^{\intercal}=P\Sigma Q^{\intercal}$ be the singular value decomposition of $\Lambda^{\intercal}$ , where $P\in\mathbf{M}^{p\times p}$ and $Q\in\mathbf{M}^{k\times k}$ are orthonormal, and $\Sigma\in\mathbf{M}^{p\times k}$ is a diagonal matrix with diagonal entries in descending order. Since $\Lambda$ is of full column rank, $\sigma_{\min}^{+}(\Lambda^{\intercal})$ is equal to the $p$ th diagonal entry of $\Sigma$ with $p<k$ .

Fix $\gamma\in\Gamma_{1}^{\epsilon}$ . Let $a_{\gamma}\equiv Q^{\intercal}\gamma$ and write $a_{\gamma}=[a_{\gamma}^{(1)\intercal},a_{\gamma}^{(2)\intercal}]^{\intercal}$ for $a_{\gamma}^{(1)}\in\mathbf{R}^{p}$ and $a_{\gamma}^{(2)}\in\mathbf{R}^{k-p}$ . Suppose first that $\|a_{\gamma}^{(2)}\|\neq 0$ . Then we have

[TABLE]

since $Q[0,a_{\gamma}^{(2)\intercal}]^{\intercal}/\|a_{\gamma}^{(2)}\|\in\Gamma_{0}$ by direct calculations. In turn, result (D.13) implies

[TABLE]

Moreover, we know from $Q\in\mathbf{M}^{k\times k}$ being orthonormal and $\gamma\in\mathbb{S}^{k}$ that

[TABLE]

Combining results (D.13) and (D.14) we may thus conclude that

[TABLE]

implying that $\|a_{\gamma}^{(1)}\|\geq\frac{\epsilon}{\sqrt{2}}$ . This also holds for all sufficiently small $\epsilon>0$ when $\|a_{\gamma}^{(2)}\|=0$ in which case $\|a_{\gamma}^{(1)}\|=1$ in view of (D.15). Consequently, we have

[TABLE]

for all sufficiently small $\epsilon>0$ . This completes the proof of the lemma. ∎
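To make the geometry of this argument concrete, the sketch below checks the inequality that the steps of the proof imply, namely $\|\Lambda^{\intercal}\gamma\|\geq\sigma_{\min}^{+}(\Lambda^{\intercal})\,d(\gamma,\Gamma_{0})/\sqrt{2}$, in the simplest case $k=2$, $p=1$. This is an illustration under assumptions: the vector `lam` playing the role of $\Lambda$ is hypothetical, the exact display of the lemma is not reproduced, and with $p=1$ the set $\Gamma_{0}$ reduces to two antipodal points so the distance can be computed by brute force.

```python
import math

lam = (2.0, 1.0)                         # hypothetical Lambda with k = 2, p = 1
sigma = math.hypot(*lam)                 # the only singular value of Lambda^T
u = (-lam[1] / sigma, lam[0] / sigma)    # unit vector orthogonal to lam

def dist_to_gamma0(g):
    """Distance from g to Gamma_0 = {u, -u} (null space of Lambda^T on the circle)."""
    return min(math.hypot(g[0] - u[0], g[1] - u[1]),
               math.hypot(g[0] + u[0], g[1] + u[1]))

def slack(g):
    """|Lambda^T g| minus the lower bound sigma * dist / sqrt(2)."""
    lhs = abs(lam[0] * g[0] + lam[1] * g[1])
    return lhs - sigma * dist_to_gamma0(g) / math.sqrt(2)

# Evaluate the slack on a grid of points on the unit circle.
n_grid = 1000
grid = [(math.cos(2 * math.pi * i / n_grid), math.sin(2 * math.pi * i / n_grid))
        for i in range(n_grid)]
min_slack = min(slack(g) for g in grid)
print(min_slack >= -1e-9)
```

Since every $\gamma\in\Gamma_{1}^{\epsilon}$ has distance at least $\epsilon$ from $\Gamma_{0}$, a nonnegative slack on the whole circle yields the bound $\|\Lambda^{\intercal}\gamma\|\geq\sigma_{\min}^{+}(\Lambda^{\intercal})\epsilon/\sqrt{2}$ on $\Gamma_{1}^{\epsilon}$ in this toy case.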

Lemma D.5.

Let $V_{n,\gamma_{0}}^{\Delta}$ and $V_{\gamma_{0}}^{\Delta}$ be defined as in the proof of Lemma D.1. Then $d_{H}(V_{n,\gamma_{0}}^{\Delta},V_{\gamma_{0}}^{\Delta})\to 0$ uniformly in $\gamma_{0}\in\Gamma_{0}$ as $n\to\infty$ .

Proof: First, note that $V_{n,\gamma_{0}}^{\Delta}=\{v\in\mathbf{R}^{k}:\gamma_{0}+\sqrt{t_{n}}v\in\mathbb{S}^{k}\text{ and }\|v\|\leq\Delta\}$ . For $u\in V_{n,\gamma_{0}}^{\Delta}$ , set $u^{*}\equiv u-(\gamma_{0}^{\intercal}u)\gamma_{0}$ . It is a simple exercise to verify that $u^{*}\in V_{\gamma_{0}}^{\Delta}$ . It follows that

[TABLE]

In turn, result (D.18) implies that: for all $\gamma_{0}\in\Gamma_{0}$ ,

[TABLE]

On the other hand, for $v\in V_{\gamma_{0}}^{\Delta}$ , set $v^{*}=v-b_{n}\gamma_{0}$ for $b_{n}=(1-\sqrt{1-t_{n}\|v\|^{2}})/\sqrt{t_{n}}$ if $\|v\|<\Delta$ , and $v^{*}=a_{n}v-b_{n}\gamma_{0}$ for $a_{n}=1-\sqrt{t_{n}}$ and $b_{n}=(1-\sqrt{1-t_{n}(1-\sqrt{t_{n}})^{2}\|v\|^{2}})/\sqrt{t_{n}}$ if $\|v\|=\Delta$ . In any case, $v^{*}\in V_{n,\gamma_{0}}^{\Delta}$ for all $n$ sufficiently large by direct calculations. Therefore,

[TABLE]

uniformly in $\gamma_{0}\in\Gamma_{0}$ , where we exploited the facts that $b_{n}=O(\sqrt{t_{n}})$ uniformly in $\gamma_{0}\in\Gamma_{0}$ and that $V_{\gamma_{0}}^{\Delta}$ is bounded. The lemma then follows from (D.19) and (D.20). ∎
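The "direct calculations" behind the construction of $v^{*}$ can be checked numerically. The sketch below covers only the interior case $\|v\|<\Delta$ and takes $b_{n}=(1-\sqrt{1-t_{n}\|v\|^{2}})/\sqrt{t_{n}}$, which is the value that makes $\gamma_{0}+\sqrt{t_{n}}v^{*}$ have unit norm when $\gamma_{0}^{\intercal}v=0$. The specific `gamma0`, `v` and the sequence of `t` values are illustrative assumptions.

```python
import math

def shift_to_sphere(v, gamma0, t):
    """Return v* = v - b * gamma0 and b, for v orthogonal to the unit vector gamma0."""
    norm2 = sum(x * x for x in v)
    b = (1.0 - math.sqrt(1.0 - t * norm2)) / math.sqrt(t)
    v_star = [vi - b * gi for vi, gi in zip(v, gamma0)]
    return v_star, b

gamma0 = [0.0, 0.0, 1.0]   # a point on the unit sphere (illustrative)
v = [0.6, -0.3, 0.0]       # tangent direction: gamma0^T v = 0
norm2_v = sum(x * x for x in v)

results = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    v_star, b = shift_to_sphere(v, gamma0, t)
    point = [g + math.sqrt(t) * x for g, x in zip(gamma0, v_star)]
    point_norm = math.sqrt(sum(x * x for x in point))
    # ||v* - v|| equals b because gamma0 has unit norm.
    results.append((t, point_norm, b))

for t, point_norm, b in results:
    print(t, point_norm, b)
```

In each row the second entry should equal one up to rounding, and the third entry, which is $\|v^{*}-v\|$, shrinks at the rate $\sqrt{t_{n}}$, in line with $b_{n}=O(\sqrt{t_{n}})$.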

Our final lemma shows that the results in Section 4 are consistent with Dovonon_Renault2013testing in the case they studied, in which the weighting matrix is the identity matrix. We note that the essential difference between $G$ and $\bar{G}$ in (53) and (54) is the following: the former consists of the second order derivatives of the moment function with respect to all $k$ entries of $\gamma$ , whereas the latter consists of the second order derivatives of the moment function with the $k$ -th entry $\gamma^{(k)}$ of $\gamma$ substituted by $\gamma^{(k)}=1-\sum_{j=1}^{k-1}\gamma^{(j)}$ .

Lemma D.6.

The limit $J^{W}$ with $W=I_{m}$ in Theorem 3.1 of Dovonon_Renault2013testing can be represented as: for $\mathbb{G}$ and $G$ defined in Section 4,

[TABLE]

Proof: First, note that by Dovonon_Renault2013testing, $J^{W}$ with $W=I_{m}$ can be represented as in (54) where $\mathbb{G}(\gamma_{0})$ is centered Gaussian with variance $E[(Z_{t}-E[Z_{t}])(Z_{t}-E[Z_{t}])^{\intercal}\{(\gamma_{0}^{\intercal}Y_{t+1})^{2}-E[(\gamma_{0}^{\intercal}Y_{t+1})^{2}]\}^{2}]$ . Next, simple algebra shows that

[TABLE]

where $A\equiv[I_{k-1},-\jmath_{k-1}]$ , where $\jmath_{k-1}$ is the $(k-1)\times 1$ vector of ones. It follows that

[TABLE]

as desired, where the second equality exploited the facts that $\theta_{0}(\gamma_{0})=0$ and that $G\mathrm{vec}(v\gamma_{0}^{\intercal})=0$ for any $v\in\mathbf{R}^{k}$ , and the third equality follows from the fact that the $(k-1)$ columns in $A^{\intercal}\in\mathbf{M}^{k\times(k-1)}$ and $\gamma_{0}$ form a basis for $\mathbf{R}^{k}$ . To see this last fact, note first that the columns of $A^{\intercal}$ are clearly linearly independent; moreover, if $\gamma_{0}=A^{\intercal}c^{\ast}$ for some nonzero $c^{\ast}\in\mathbf{R}^{k-1}$ , then $\gamma_{0}^{\intercal}\jmath_{k}=0$ by simple algebra, contradicting the linear normalization that $\sum_{j=1}^{k}\gamma_{0}^{(j)}\neq 0$ . ∎
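The basis claim at the end of this proof can also be seen through a determinant: stacking the $k-1$ columns of $A^{\intercal}$ next to $\gamma_{0}$ gives a $k\times k$ matrix whose determinant equals $\sum_{j=1}^{k}\gamma_{0}^{(j)}$ (add the first $k-1$ rows to the last row to obtain an upper triangular matrix). The sketch below verifies this numerically for $k=4$; the vectors `gamma0` and `gamma_bad` and the helper names are illustrative assumptions.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (fine for small k)."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def basis_matrix(gamma0):
    """k x k matrix [A^T, gamma0], where A = [I_{k-1}, -ones]."""
    k = len(gamma0)
    rows = []
    for i in range(k - 1):
        # Row i of A^T is the i-th unit vector of length k-1.
        rows.append([1.0 if j == i else 0.0 for j in range(k - 1)] + [gamma0[i]])
    # Last row of A^T is -ones^T.
    rows.append([-1.0] * (k - 1) + [gamma0[k - 1]])
    return rows

gamma0 = [0.5, -0.2, 0.4, 0.3]       # entries sum to one (linear normalization)
gamma_bad = [1.0, -1.0, 0.0, 0.0]    # entries sum to zero

d_good = det(basis_matrix(gamma0))
d_bad = det(basis_matrix(gamma_bad))
print(d_good, d_bad)
```

A nonzero determinant for `gamma0` confirms that the columns form a basis for $\mathbf{R}^{k}$, while the zero determinant for `gamma_bad` shows that the argument does rely on $\sum_{j=1}^{k}\gamma_{0}^{(j)}\neq 0$.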
