Are Unobservables Separable?

Andrii Babii; Jean-Pierre Florens

arXiv:1705.01654·math.ST·April 2, 2021

TL;DR

This paper develops a novel nonparametric test for the separability of unobservables in models with endogenous observables, using a nonseparable IV framework and applying it to US expenditure data.

Contribution

It introduces a new nonparametric test for unobservables separability based on a nonseparable IV model with a novel Donsker-type CLT for residuals.

Findings

1. The test rejects separability in Engel curves for most commodities.

2. The large-sample distribution of the test statistic is nonstandard.

3. The application uses a dataset drawn from the 2015 US Consumer Expenditure Survey.

Abstract

It is common to assume in empirical research that observables and unobservables are additively separable, especially when the former are endogenous. This is done because it is widely recognized that identification and estimation challenges arise when interactions between the two are allowed for. Starting from a nonseparable IV model, where the instrumental variable is independent of unobservables, we develop a novel nonparametric test of separability of unobservables. The large-sample distribution of the test statistics is nonstandard and relies on a novel Donsker-type central limit theorem for the empirical distribution of nonparametric IV residuals, which may be of independent interest. Using a dataset drawn from the 2015 US Consumer Expenditure Survey, we find that the test rejects separability in Engel curves for most of the commodities.

Tables

Table 1: Testing separability of Engel curves. The table shows $m$ out of $n$ bootstrap p-values of Kolmogorov-Smirnov and Cramér-von Mises tests for 13 commodities.

Commodity	KS	Commodity	KS	CvM
Food home	0.00	Gas and oil	0.00	0.00
Food away	0.00	Personal care	0.00	0.00
Clothing	0.00	Health	0.00	0.00
Tobacco	0.00	Insurance	0.00	0.00
Alcohol	0.00	Reading	0.00	0.53
Trips	0.00	Transportation	0.01	0.03
Entertainment	0.08

Equations

$$Q=\operatorname*{arg\,max}_{q\in\mathbf{R}^{J}:\;P^{\top}q=I}U(q,\varepsilon),$$

$$Y=\Phi(Z,\varepsilon),\qquad\varepsilon\perp\!\!\!\perp W,$$

$$Y=\psi(Z)+g(\varepsilon).$$

$$r(w)\triangleq\mathds{E}[Y|W=w]f_{W}(w)=\int\varphi(z)f_{ZW}(z,w)\mathrm{d}z\triangleq(T\varphi)(w),$$

$$H_{0}:\;U\perp\!\!\!\perp W\quad\text{vs.}\quad U\not\perp\!\!\!\perp W.$$

$$H^{s}(\mathbf{R}^{p})=\left\{f\in L_{2}(\mathbf{R}^{p}):\;\|f\|_{s}\triangleq\|L^{s}f\|<\infty\right\};$$

$$\hat{r}(w)=\frac{1}{nh_{n}^{q}}\sum_{i=1}^{n}Y_{i}K_{w}\left(h_{n}^{-1}(W_{i}-w)\right),\qquad(\hat{T}\phi)(w)=\int\phi(z)\hat{f}_{ZW}(z,w)\mathrm{d}z,$$

$$\hat{f}_{ZW}(z,w)=\frac{1}{nh_{n}^{p+q}}\sum_{i=1}^{n}K_{z}\left(h_{n}^{-1}(Z_{i}-z)\right)K_{w}\left(h_{n}^{-1}(W_{i}-w)\right),$$

$$\hat{\varphi}=\operatorname*{arg\,min}_{\phi}\left\|\hat{T}\phi-\hat{r}\right\|^{2}+\alpha_{n}\|\phi\|_{s}^{2},$$

$$\hat{\varphi}=L^{-s}(\alpha_{n}I+\hat{T}_{s}^{*}\hat{T}_{s})^{-1}\hat{T}_{s}^{*}\hat{r},$$

$$\hat{F}_{\hat{U}W}(u,w)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{\hat{U}_{i}\leq u,W_{i}\leq w\}},\qquad\hat{F}_{\hat{U}}(u)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{\hat{U}_{i}\leq u\}},\qquad\hat{F}_{W}(w)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{W_{i}\leq w\}}$$

$$\mathbb{G}_{n}(u,w)=\sqrt{n}\left(\hat{F}_{\hat{U}W}(u,w)-\hat{F}_{\hat{U}}(u)\hat{F}_{W}(w)\right).$$

$$\mathbb{G}_{n}(u,w)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\mathbbm{1}_{\{U_{i}\leq u,W_{i}\leq w\}}-\mathbbm{1}_{\{U_{i}\leq u\}}F_{W}(w)-\mathbbm{1}_{\{W_{i}\leq w\}}F_{U}(u)+F_{UW}(u,w)+\delta_{u,w}(U_{i},W_{i})\right\}+o_{P}(1)$$

$\delta_{u,w}(U_{i},W_{i})$

$\rho(u,z,w)$

$$T_{2,n}=\iint|\mathbb{G}_{n}(u,w)|^{2}\mathrm{d}\hat{F}_{\hat{U}W}(u,w)\quad\text{and}\quad T_{\infty,n}=\sup_{u,w}|\mathbb{G}_{n}(u,w)|.$$

$$\mathbb{H}_{n}(u,w)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{h_{u,w}(U_{i},W_{i})-\mathds{E}[h_{u,w}(U_{i},W_{i})]\right\},$$

$$\mathbb{H}_{n}\rightsquigarrow\mathbb{H}\quad\text{in}\quad L_{\infty}(\mathbf{R}\times\mathbf{R}^{q}),$$

$$(u,w,u^{\prime},w^{\prime})\mapsto\mathds{E}\left[\left(h_{u,w}(U,W)-\mathds{E}[h_{u,w}(U,W)]\right)\left(h_{u^{\prime},w^{\prime}}(U,W)-\mathds{E}[h_{u^{\prime},w^{\prime}}(U,W)]\right)\right].$$

$$\begin{aligned}(u,w,u^{\prime},w^{\prime})\mapsto\mathds{E}\Big[&\left(\mathbbm{1}_{\{U\leq u,W\leq w\}}-\mathbbm{1}_{\{U\leq u\}}F_{W}(w)-\mathbbm{1}_{\{W\leq w\}}F_{U}(u)+F_{UW}(u,w)+\delta_{u,w}(U,W)\right)\\&\times\left(\mathbbm{1}_{\{U\leq u^{\prime},W\leq w^{\prime}\}}-\mathbbm{1}_{\{U\leq u^{\prime}\}}F_{W}(w^{\prime})-\mathbbm{1}_{\{W\leq w^{\prime}\}}F_{U}(u^{\prime})+F_{UW}(u^{\prime},w^{\prime})+\delta_{u^{\prime},w^{\prime}}(U,W)\right)\Big].\end{aligned}$$

$d_{2}$

$$H_{1,n}:\;F_{UW}(u,w)=F_{U}(u)F_{W}(w)+n^{-1/2}H(u,w),\qquad\forall u,w,$$

$$T_{2,n}\rightsquigarrow\iint|\mathbb{H}(u,w)|^{2}\mathrm{d}F_{UW}(u,w)\quad\text{and}\quad T_{\infty,n}\rightsquigarrow\sup_{u,w}|\mathbb{H}(u,w)|,$$

$$T_{2,n}\rightsquigarrow\iint|\mathbb{H}(u,w)+2H(u,w)|^{2}\mathrm{d}F_{UW}(u,w)\quad\text{and}\quad T_{\infty,n}\rightsquigarrow\sup_{u,w}|\mathbb{H}(u,w)+2H(u,w)|.$$

$$Y=\varphi(Z)+\theta ZU+U,\qquad\begin{pmatrix}Z\\W\\U\end{pmatrix}\sim_{i.i.d.}N\left(\begin{pmatrix}0\\0\\0\end{pmatrix},\begin{pmatrix}1&0.4&0.3\\0.4&1&0\\0.3&0&1\end{pmatrix}\right).$$

$$T_{\infty,n}=\sup_{u,w}|\mathbb{G}_{n}(u,w)|\quad\text{and}\quad T_{2,n}=\iint|\mathbb{G}_{n}(u,w)|^{2}\mathrm{d}\hat{F}_{\hat{U}W}(u,w),$$

$$\|\hat{\varphi}-\varphi\|_{c}^{2}\lesssim_{P}\alpha_{n}^{-\frac{a+c}{a+s}}\left\|\hat{r}-\hat{T}\varphi\right\|^{2}+\alpha_{n}^{\frac{b-c}{a+s}}.$$

$$\|\hat{\varphi}-\varphi\|_{c}^{2}=O_{P}\left(\alpha_{n}^{-\frac{a+c}{a+s}}\left(\frac{1}{nh_{n}^{q}}+h_{n}^{2t}\right)+\alpha_{n}^{\frac{b-c}{a+s}}\right).$$

$$\sqrt{n}\left(\hat{F}_{\hat{U}}(u)-F_{U}(u)\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{\mathbbm{1}_{\{U_{i}\leq u\}}-F_{U}(u)+U_{i}\left[T(T^{*}T)^{-1}f_{UZ}(u,.)\right](W_{i})\right\}+o_{P}(1)$$

$$\sqrt{n}\left(\hat{F}_{\hat{U}}(u)-F_{U}(u)\right)=\sqrt{n}\left(\hat{F}_{U}(u)-F_{U}(u)\right)+\sqrt{n}\left(\Pr\left(U\leq u+\hat{\Delta}(Z)\,|\,X\right)-F_{U}(u)\right)+o_{P}(1).$$

$$\sqrt{n}\left(\Pr\left(U\leq u+\hat{\Delta}(Z)\,|\,X\right)-\Pr(U\leq u)\right)$$

Full text

Andrii Babii

Department of Economics, University of North Carolina–Chapel Hill, Gardner Hall, CB 3305, Chapel Hill, NC 27599-3305.

Jean-Pierre Florens

Department of Economics, Toulouse School of Economics, 1 Esplanade de l’Université, 31080 Toulouse Cedex 06, France.

Keywords: unobservables, endogeneity, separability test, Engel curves, heterogeneity in unobservables, distribution of nonparametric IV residuals.

1 Introduction

It is common to assume in empirical research that observables and unobservables are additively separable, especially when the former are endogenous. This is done because it is widely recognized that identification and estimation challenges arise when interactions between the two are allowed for. However, the economic theory and considerations often lead to nonseparable models. Prominent examples are demand functions, where the price or income effects might be heterogeneous in unobserved preferences; production functions, where observed input choices may be heterogeneous in input choices unobserved by the econometrician; labor supply functions with heterogeneous wage effects; wage equations, where the returns to schooling might vary with unobserved ability; or more generally, treatment effect models, where causal effects are heterogeneous in unobservables.

In response to these empirical challenges, there is a growing literature studying the nonparametric identification of nonseparable models with endogeneity; see Chernozhukov and Hansen (2005), Chernozhukov, Imbens, and Newey (2007), Florens, Heckman, Meghir, and Vytlacil (2008), Imbens and Newey (2009), Torgovitsky (2015), and D’Haultfœuille and Février (2015) among many others. It is well understood that the fully nonparametric estimation of a nonseparable model may lead to a difficult nonlinear ill-posed inverse problem; see Carrasco, Florens, and Renault (2007), Horowitz and Lee (2007), Gagliardini and Scaillet (2012), and Dunker, Florens, Hohage, Johannes, and Mammen (2014).

Since a fully nonparametric estimation of a nonseparable model is more challenging and since separable models rule out the heterogeneity of marginal effects in unobservables, detecting separability is desirable in empirical applications. If the separability is rejected, then the more sophisticated nonseparable models should not be neglected, while if it turns out that the structural relation is separable, then the conventional empirical practice could be well-justified.

Despite the significant efforts focused on understanding the identification and the estimation of nonseparable IV models and the widespread use of separable IV models in empirical practice, little work has been done on developing formal testing procedures that could discriminate empirically between the two. Lu and White (2014) and Su, Tu, and Ullah (2015) are notable exceptions that develop separability tests under the conditional independence restriction and additional identifying restrictions imposed by the nonseparable model. The conditional independence restriction is different from the mean-independence restriction imposed by the separable nonparametric IV model and does not allow justifying the separable nonparametric IV model that we are interested in here. Other recent specification tests for the nonseparable model include the monotonicity test of Hoderlein, Su, White, and Yang (2016), the endogeneity test of Fève, Florens, and Van Keilegom (2018), and the specification test for the quantile IV regression of Breunig (2020).

In this paper, we design a novel fully nonparametric separability test. Our test is based on the independence condition of the nonseparable model and does not rely on additional identifying restrictions, such as monotonicity in unobservables. The test is based on the insight that the structural function in the separable model can be estimated using the nonparametric IV approach; see Florens (2003), Newey and Powell (2003), Hall and Horowitz (2005), Blundell, Chen, and Kristensen (2007), and Darolles, Fan, Florens, and Renault (2011). If the separable model is correct, then the nonparametric IV residuals should approximate unobservables that are independent of the instrumental variables in the nonseparable IV model. This intuition suggests that it should be possible to detect the separability with the classical Kolmogorov-Smirnov or Cramér-von Mises independence tests between the nonparametric IV residuals and the instrumental variable. To the best of our knowledge, no such test is currently available in the literature, and it is not known whether the empirical distribution of the nonparametric IV residuals satisfies the Donsker property.

Formalizing this intuition is far from trivial since the regression residuals are different from the true regression errors and the nonparametric IV regression is an example of a linear ill-posed inverse problem and requires regularization. Moreover, the empirical distribution function of the nonparametric IV residuals is a non-smooth function of the estimated nonparametric IV regression. The weak convergence of the empirical distribution of regression residuals in the parametric linear case is a classical problem in statistics; see, e.g., Durbin (1973), Loynes (1980), and Mammen (1996). The extension to the nonparametric regression is more challenging, and it is remarkable that the empirical distribution of nonparametric regression residuals still converges weakly, as was shown in Akritas and Van Keilegom (2001). The additively separable nonparametric IV regression differs from the problems discussed above in two important directions. First, its finite-sample and asymptotic performance depend on both the smoothness of the regression function and the smoothing properties of the conditional expectation operator. Second, it features an additional dependence between the endogenous regressor and the regression error that cannot be neglected in practice.

In this paper, we show that the empirical distribution function of nonparametric IV residuals converges weakly to a Gaussian process at a parametric rate, even though residuals are obtained from the nonparametrically estimated ill-posed inverse problem. To the best of our knowledge, this is the first result on the distribution of the nonparametric IV residuals, which can be used to develop various residual-based specification tests and is of independent interest. Building on this result, we obtain the large sample approximation to the distributions of independence separability tests. The distributions of residual-based independence tests are non-standard and not amenable to standard bootstrap approximations. Therefore, we suggest using the $m$ out of $n$ bootstrap or subsampling to compute the critical values.

Our results are based on the insight that the Tikhonov regularization in Sobolev spaces, considered in Florens, Johannes, and Van Bellegem (2011), Gagliardini and Scaillet (2012), Carrasco, Florens, and Renault (2014), and Gagliardini and Scaillet (2017), among others, provides a natural link between the modern empirical process theory and the theory of ill-posed inverse problems. With regard to this literature, we obtain new results for the Tikhonov regularization with a Sobolev penalty that can be applied to generic ill-posed inverse problems, including various nonparametric IV estimators, e.g., based on kernel smoothing. In particular, the Tikhonov regularization with a Sobolev penalty achieves sufficiently fast convergence rates for the semiparametric theory. In contrast, the simple one-step Tikhonov regularization without Sobolev penalization suffers from the well-known saturation effects; see Darolles, Fan, Florens, and Renault (2011).

The paper is organized as follows. In Section 2, we present two motivating examples, where economic considerations lead to nonseparable models with endogeneity and discuss a testable implication of separability. In Section 3, we characterize the large sample approximation to the distribution of the residual-based Kolmogorov-Smirnov and Cramér-von Mises independence tests and introduce a resampling procedure to compute the critical values. We also study the behavior of these tests under the fixed and the local alternative hypotheses. We report on a Monte Carlo study in Section 4 which provides insights about the validity of our asymptotic approximations in finite samples. In Section 5, we test the separability of Engel curves for a large set of commodities and find that the separability is rejected most of the time. Conclusions appear in Section 6. All technical details, auxiliary results, and proofs are collected in the Appendix and the Supplementary Material.

2 Separability of unobservables

2.1 Motivating examples

The instrumental variable models with additively separable unobservables constitute a workhorse of modern empirical practice. However, the additive separability of unobservables is a restrictive modeling assumption that essentially rules out the heterogeneity of estimated causal structural effects in unobservables; see, e.g., Heckman (2001) or Imbens (2010). Indeed, the structural economic models typically lead to nonseparable unobservables as illustrated below.

Example 2.1 (Demand function).

Consider a random utility maximization problem

$$Q=\operatorname*{arg\,max}_{q\in\mathbf{R}^{J}:\;P^{\top}q=I}U(q,\varepsilon),$$

where $U(.,.)$ is a utility function, $Q$ is a vector of demanded quantities, $\varepsilon$ is an individual preference variable, unobserved by the econometrician, $P$ is a vector of prices, and $I$ is the income. The solution to this optimization problem leads to the nonseparable demand functions $Q_{j}=\Phi(P,I,\varepsilon)$ for each good $j=1,\dots,J$ as shown in Brown and Walker (1989) and Lewbel (2001); see also Hoderlein and Vanhems (2018) for the welfare analysis based on the nonseparable model. The nonseparable demand functions may lead in turn to the nonseparable Engel curves.
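As a concrete illustration, consider a hypothetical two-good Cobb-Douglas specification (chosen here for exposition and not taken from the paper), in which the preference variable $\varepsilon\in(0,1)$ acts as a budget-share parameter:

```latex
\[
U(q,\varepsilon)=\varepsilon\log q_{1}+(1-\varepsilon)\log q_{2}
\quad\Longrightarrow\quad
Q_{1}=\frac{\varepsilon I}{P_{1}},\qquad
Q_{2}=\frac{(1-\varepsilon)I}{P_{2}}.
\]
% The income effect depends on the unobservable:
\[
\frac{\partial Q_{1}}{\partial I}=\frac{\varepsilon}{P_{1}},
\qquad
\frac{\partial^{2}Q_{1}}{\partial I\,\partial\varepsilon}=\frac{1}{P_{1}}\neq 0 .
\]
```

Since the cross-derivative is nonzero, $Q_{1}$ cannot be written as $\psi(P,I)+g(\varepsilon)$ in levels, although in this particular case $\log Q_{1}$ is additively separable.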

Example 2.2 (Production function/frontier).

Simar, Vanhems, and Van Keilegom (2016) consider a production process with unobserved heterogeneity that leads to the production function/frontier $\phi$ such that $Y=\phi(Z,\varepsilon)-U$, where $Y$ is an output, $Z$ are observed inputs, $\varepsilon$ is an environmental factor, and $U\geq 0$ is a measure of inefficiency. In this example, the nonseparable model is generated by the fact that the environmental factor is taken into account along with other input choices by firms, and, at the same time, the former is not observed by the econometrician.

2.2 A testable implication

Let $(Y,Z,W)$ be observed random variables admitting a nonseparable representation

$$Y=\Phi(Z,\varepsilon),\qquad\varepsilon\perp\!\!\!\perp W,\tag{1}$$

where $Y\in\mathbf{R}$ is the outcome, $Z\in\mathbf{R}^{p}$ are regressors, $\varepsilon\in\mathbf{R}$ is unobservable, $W\in\mathbf{R}^{q}$ is a vector of instrumental variables, and $\Phi:\mathbf{R}^{p}\times\mathbf{R}\to\mathbf{R}$ is a structural function. We assume that $W$ are valid instrumental variables satisfying the exclusion restriction, $\varepsilon\perp\!\!\!\perp W$, and the relevance condition, $W\not\perp\!\!\!\perp Z$. Note that the independence exclusion restriction is a commonly used identifying condition for nonseparable models; see Chernozhukov, Fernández-Val, Newey, Stouli, and Vella (2020), Blundell, Horowitz, and Parey (2017), Torgovitsky (2017), Torgovitsky (2015), D’Haultfœuille and Février (2015), Dunker, Florens, Hohage, Johannes, and Mammen (2014), Gagliardini and Scaillet (2012), and Horowitz and Lee (2007) for recent examples and applications, as well as Chernozhukov and Hansen (2013), Matzkin (2013), and Imbens (2010) for the review of earlier econometrics literature on the identification of nonseparable models.

The independence condition $\varepsilon\perp\!\!\!\perp W$ does not rule out the heteroskedasticity in the distribution of $Y$ conditionally on $Z$ or $W$, which is often observed in the empirical practice. It also does not rule out the heteroskedasticity in the distribution of unobservables $\varepsilon$ conditionally on covariates $Z$. However, it rules out the heteroskedasticity of unobservables conditionally on the instrumental variable, which could be less restrictive, since the instrumental variable is univariate in typical applications. This leads to an interesting trade-off between the heterogeneity of causal structural effects in unobservables allowed for in the nonseparable model and the heteroskedasticity of unobservables conditionally on the instrumental variable allowed for in the separable model.

To develop the separability test, several strategies can be adopted. For instance, one could nonparametrically estimate the nonseparable model and check whether the separability holds. This approach corresponds to the principle behind the Wald test for parametric models. Alternatively, since the nonparametric identification and estimation of the separable model is easier, one could estimate the separable model and check the independence condition of the nonseparable model. This approach corresponds to the principle behind Rao’s score test in the parametric setting and is the one adopted in this paper.

We say that the model in equation (1) has a separable representation if there exist measurable functions $\psi:\mathbf{R}^{p}\to\mathbf{R}$ and $g:\mathbf{R}\to\mathbf{R}$ such that

$$Y=\psi(Z)+g(\varepsilon).$$

If the model has a separable representation, then the structural function can be estimated consistently using the nonparametric IV approach; see Darolles, Fan, Florens, and Renault (2011), Blundell, Chen, and Kristensen (2007), Horowitz and Lee (2007), and Newey and Powell (2003). The nonparametric IV regression function $\varphi:\mathbf{R}^{p}\to\mathbf{R}$ solves the functional equation

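$$r(w)\triangleq\mathds{E}[Y|W=w]f_{W}(w)=\int\varphi(z)f_{ZW}(z,w)\mathrm{d}z\triangleq(T\varphi)(w),\tag{2}$$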

where $T:L_{2}(\mathbf{R}^{p})\to L_{2}(\mathbf{R}^{q})$ is an integral operator. Let $U\triangleq Y-\varphi(Z)$ be the nonparametric IV regression error. Note that even if the model is nonseparable, we still have $\mathds{E}[U|W]=0$ with $U=Y-\varphi(Z)$ for $\varphi$ solving the functional equation (2). The following result provides a convenient testable implication of separability, provided that $U$ is unambiguously defined; see the Appendix for a formal proof.

Proposition 2.1.

Suppose that there exists a unique solution to equation (2). If the model in equation (1) admits a separable representation, then $U\perp\!\!\!\perp W$ .

It is worth mentioning that the independence between $U$ and $W$ is only a testable implication of additive separability of unobservables. However, when the model is nonseparable, we have $U=\Phi(Z,\varepsilon)-\varphi(Z)\triangleq h(Z,\varepsilon)$ , for some non-degenerate function $h$ of $(Z,\varepsilon)$ , which in many cases is not independent of $W$ , because $Z\not\perp\!\!\!\perp W$ by the relevance condition. Therefore, the independence test between $U$ and $W$ will have power against many interesting deviations from the separability. Note also that Proposition 2.1 relies on the injectivity of $T$ , which is known as a completeness condition, see Newey and Powell (2003) and Babii and Florens (2020), and does not require that the nonseparable model is identified; see, e.g., Chernozhukov and Hansen (2005) and Chen, Chernozhukov, Lee, and Newey (2014). Lastly, note that the additive separability is different from the multiplicative separability when $Y=\psi(Z)g(\varepsilon)$ . However, when $Y,\psi(Z)$ , and $g(\varepsilon)$ are positive, we obtain the additively separable model after taking logs.
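The logic behind Proposition 2.1 can be sketched in a few lines. This is only an informal outline, not the formal proof from the Appendix; it additionally assumes that $\mu\triangleq\mathds{E}[g(\varepsilon)]$ is finite and that $\psi+\mu$ belongs to the domain of $T$, neither of which is stated in the text above.

```latex
% Step 1: under the separable representation and independence of eps and W,
\[
\mathds{E}[Y\mid W]
=\mathds{E}[\psi(Z)\mid W]+\mathds{E}[g(\varepsilon)\mid W]
=\mathds{E}[\psi(Z)+\mu\mid W],
\qquad \mu\triangleq\mathds{E}[g(\varepsilon)].
\]
% Step 2: multiplying by f_W(w) shows that psi + mu solves equation (2);
% uniqueness of the solution then gives
\[
\varphi=\psi+\mu
\quad\Longrightarrow\quad
U=Y-\varphi(Z)=g(\varepsilon)-\mu .
\]
% Step 3: U is a measurable function of eps alone, hence
\[
\varepsilon\perp\!\!\!\perp W
\quad\Longrightarrow\quad
U\perp\!\!\!\perp W .
\]
```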

3 Independence test

In this section, we introduce tests of the independence condition characterized in Proposition 2.1. Formally, we focus on testing

$$H_{0}:\;U\perp\!\!\!\perp W\quad\text{vs.}\quad U\not\perp\!\!\!\perp W.$$

$H_{0}$ is testable, provided that the nuisance parameter $\varphi$ in $U=Y-\varphi(Z)$ is replaced by the appropriate estimator.

3.1 Tikhonov regularization in Sobolev spaces

We focus on the Tikhonov-regularized estimator penalized by the Sobolev norm to estimate the nuisance parameter $\varphi$ ; see Carrasco, Florens, and Renault (2014), Gagliardini and Scaillet (2012), and Florens, Johannes, and Van Bellegem (2011). The attractive feature of this estimator is that it does not suffer from the well-known saturation bias and can achieve a sufficiently fast convergence rate for our asymptotic theory and more generally for semiparametric applications; see Corollary A.1.1 in the Appendix.

Let $(L_{2}(\mathbf{R}^{p}),\|.\|)$ denote the space of functions square-integrable with respect to the Lebesgue measure. Let $\langle x\rangle^{s}\triangleq(1+|x|^{2})^{s/2}$ be a polynomial weight function with $s\in\mathbf{R}$ , where $x\in\mathbf{R}^{p}$ and $|.|$ is a Euclidean norm on $\mathbf{R}^{p}$ . Consider the operator $L^{s}f=F^{-1}(\langle.\rangle^{s}Ff)$ defined for all $f$ such that $\|\langle.\rangle^{2}Ff\|<\infty$ , where $F$ is a Fourier transform on $L_{2}(\mathbf{R}^{p})$ with scaling $(2\pi)^{-p/2}$ . Then the self-adjoint operator $L$ generates a Hilbert scale of Sobolev spaces

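$$H^{s}(\mathbf{R}^{p})=\left\{f\in L_{2}(\mathbf{R}^{p}):\;\|f\|_{s}\triangleq\|L^{s}f\|<\infty\right\};$$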

see Krein and Petunin (1966) for more details on Banach and Hilbert scales.
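To make the norm $\|f\|_{s}=\|L^{s}f\|$ concrete, the following worked example takes $p=1$ and $s=1$. It uses Plancherel's identity and the rule $F(f^{\prime})(\xi)=\mathrm{i}\xi(Ff)(\xi)$, both of which hold under the unitary scaling $(2\pi)^{-p/2}$; the Gaussian test function is an illustrative choice and does not come from the paper.

```latex
\[
\|f\|_{1}^{2}
=\int_{\mathbf{R}}(1+\xi^{2})\,|(Ff)(\xi)|^{2}\,\mathrm{d}\xi
=\|f\|^{2}+\|f^{\prime}\|^{2}.
\]
% Example: f(x) = exp(-x^2/2), so f'(x) = -x exp(-x^2/2).
\[
\|f\|^{2}=\int_{\mathbf{R}}e^{-x^{2}}\,\mathrm{d}x=\sqrt{\pi},
\qquad
\|f^{\prime}\|^{2}=\int_{\mathbf{R}}x^{2}e^{-x^{2}}\,\mathrm{d}x=\frac{\sqrt{\pi}}{2},
\qquad
\|f\|_{1}^{2}=\frac{3\sqrt{\pi}}{2}.
\]
```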

Let $(\hat{T},\hat{r})$ be the kernel estimators of $(T,r)$ in equation (2) computed as

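$$\hat{r}(w)=\frac{1}{nh_{n}^{q}}\sum_{i=1}^{n}Y_{i}K_{w}\left(h_{n}^{-1}(W_{i}-w)\right),\qquad(\hat{T}\phi)(w)=\int\phi(z)\hat{f}_{ZW}(z,w)\mathrm{d}z,$$

$$\hat{f}_{ZW}(z,w)=\frac{1}{nh_{n}^{p+q}}\sum_{i=1}^{n}K_{z}\left(h_{n}^{-1}(Z_{i}-z)\right)K_{w}\left(h_{n}^{-1}(W_{i}-w)\right),$$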

where $K_{z}:\mathbf{R}^{p}\to\mathbf{R}$ and $K_{w}:\mathbf{R}^{q}\to\mathbf{R}$ are kernel functions and $h_{n}\to 0$ is a sequence of bandwidth parameters.
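A minimal numerical sketch of the kernel estimators of $(T,r)$ for scalar $Z$ and $W$ ($p=q=1$) may help fix ideas. The Gaussian kernel, the simulated data, the bandwidth, and the Riemann-sum grid are all illustrative assumptions; in particular, a Gaussian kernel is second-order and need not satisfy the higher-order kernel requirement imposed later in the paper. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def r_hat(w, Y, W, h):
    """(1 / (n h)) * sum_i Y_i K((W_i - w) / h) for scalar W (q = 1)."""
    return np.mean(Y * gaussian_kernel((W - w) / h)) / h


def f_zw_hat(z, w, Z, W, h):
    """Kernel estimate of the joint density at (z, w); z may be an array."""
    z = np.atleast_1d(z)
    kz = gaussian_kernel((Z[None, :] - z[:, None]) / h)  # shape (len(z), n)
    kw = gaussian_kernel((W - w) / h)                    # shape (n,)
    return (kz * kw).mean(axis=1) / h ** 2


def T_hat_apply(phi, w, Z, W, h, z_grid):
    """Riemann-sum approximation of the integral of phi(z) f_zw_hat(z, w) dz."""
    dz = z_grid[1] - z_grid[0]
    return np.sum(phi(z_grid) * f_zw_hat(z_grid, w, Z, W, h)) * dz


# Simulated data (hypothetical design, for illustration only).
rng = np.random.default_rng(0)
n = 200
W = rng.normal(size=n)
Z = 0.5 * W + rng.normal(size=n)
Y = np.sin(Z) + rng.normal(size=n)

z_grid = np.linspace(-10.0, 10.0, 2001)
w0, h = 0.0, 0.3

r0 = r_hat(w0, Y, W, h)
# Applying T_hat to the constant function 1 should recover the kernel
# estimate of the marginal density of W, because the kernel integrates to 1.
T_one = T_hat_apply(np.ones_like, w0, Z, W, h, z_grid)
f_w_hat = np.mean(gaussian_kernel((W - w0) / h)) / h
```

The final comparison between `T_one` and `f_w_hat` is a quick consistency check on the discretization of the integral operator.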

We estimate $\varphi$ using the Tikhonov-regularized estimator penalized by the Sobolev norm with $s\geq 0$

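$$\hat{\varphi}=\operatorname*{arg\,min}_{\phi}\left\|\hat{T}\phi-\hat{r}\right\|^{2}+\alpha_{n}\|\phi\|_{s}^{2},$$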

where $\hat{T}$ and $\hat{r}$ are as described above. It is easy to see that this problem has a closed-form solution

$$\hat{\varphi}=L^{-s}(\alpha_{n}I+\hat{T}_{s}^{*}\hat{T}_{s})^{-1}\hat{T}_{s}^{*}\hat{r},$$

where $\hat{T}_{s}=\hat{T}L^{-s}$ and $\hat{T}_{s}^{*}$ is the adjoint operator to $\hat{T}_{s}$ .
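The closed form can be checked in a finite-dimensional analogue. In the sketch below, a matrix `A` stands in for $\hat{T}$, a vector `r` for $\hat{r}$, and a symmetric positive-definite matrix `Ls` for $L^{s}$. The specific construction of `Ls` from a first-difference matrix is a hypothetical discretization chosen for illustration, not the paper's implementation. The code compares the closed form $L^{-s}(\alpha I+A_{s}^{\top}A_{s})^{-1}A_{s}^{\top}r$, with $A_{s}=AL^{-s}$, against the normal equations of the penalized least-squares problem.

```python
import numpy as np


def sobolev_like_penalty(m, s):
    """Symmetric positive-definite stand-in for L^s (illustrative discretization)."""
    D = np.diff(np.eye(m), axis=0)      # (m-1) x m first-difference matrix
    M = np.eye(m) + D.T @ D             # discrete analogue of 1 + |xi|^2
    evals, evecs = np.linalg.eigh(M)
    return (evecs * evals ** (s / 2.0)) @ evecs.T   # M^(s/2)


def tikhonov_closed_form(A, r, Ls, alpha):
    """phi = Ls^{-1} (alpha I + A_s' A_s)^{-1} A_s' r with A_s = A Ls^{-1}."""
    Ls_inv = np.linalg.inv(Ls)
    A_s = A @ Ls_inv
    m = A.shape[1]
    y = np.linalg.solve(alpha * np.eye(m) + A_s.T @ A_s, A_s.T @ r)
    return Ls_inv @ y


def tikhonov_normal_equations(A, r, Ls, alpha):
    """Minimizer of ||A phi - r||^2 + alpha ||Ls phi||^2 via its normal equations."""
    return np.linalg.solve(A.T @ A + alpha * Ls.T @ Ls, A.T @ r)


rng = np.random.default_rng(0)
A = rng.normal(size=(30, 20))
r = rng.normal(size=30)
Ls = sobolev_like_penalty(20, s=1.0)

phi_closed = tikhonov_closed_form(A, r, Ls, alpha=0.1)
phi_direct = tikhonov_normal_equations(A, r, Ls, alpha=0.1)
```

The two solutions coincide because the substitution $\phi=L^{-s}y$ turns the Sobolev-penalized problem into a standard Tikhonov problem in $y$.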

3.2 Distribution of statistics

Let $\hat{U}_{i}=Y_{i}-\hat{\varphi}(Z_{i})$ be the nonparametric IV residuals and let

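$$\hat{F}_{\hat{U}W}(u,w)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{\hat{U}_{i}\leq u,W_{i}\leq w\}},\qquad\hat{F}_{\hat{U}}(u)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{\hat{U}_{i}\leq u\}},\qquad\hat{F}_{W}(w)=\frac{1}{n}\sum_{i=1}^{n}\mathbbm{1}_{\{W_{i}\leq w\}}$$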

be the empirical distribution functions. To test $H_{0}$ , we focus on the following residual-based independence empirical process

$$\mathbb{G}_{n}(u,w)=\sqrt{n}\left(\hat{F}_{\hat{U}W}(u,w)-\hat{F}_{\hat{U}}(u)\hat{F}_{W}(w)\right).$$

Note that this process involves residuals $\hat{U}_{i}$ instead of the true regression errors $U_{i}$ , hence, its asymptotic behavior can be significantly different from the asymptotic behavior of classical independence empirical processes; see van der Vaart and Wellner (1996), Chapter 3.8. In particular, the estimation of the nuisance component $\varphi$ may affect the asymptotic distribution of the independence empirical process.
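Given residuals and a scalar instrument ($q=1$), the process $\sqrt{n}(\hat{F}_{\hat{U}W}-\hat{F}_{\hat{U}}\hat{F}_{W})$ can be evaluated on the grid of sample points, where the step functions involved attain all of their values. The sketch below also computes a Kolmogorov-Smirnov statistic (the supremum of the absolute process) and a Cramér-von Mises statistic (the squared process integrated against the empirical joint distribution, i.e., averaged over the points $(\hat{U}_{i},W_{i})$). The function names are hypothetical, and the residuals are taken as given rather than computed from a nonparametric IV fit.

```python
import numpy as np


def independence_process(u_hat, w):
    """Evaluate G_n on the grid {u_hat_k} x {w_l} for scalar w (q = 1)."""
    n = len(u_hat)
    ind_u = (u_hat[None, :] <= u_hat[:, None]).astype(float)  # [k, i] = 1{u_i <= u_k}
    ind_w = (w[None, :] <= w[:, None]).astype(float)          # [l, i] = 1{w_i <= w_l}
    F_joint = ind_u @ ind_w.T / n   # [k, l] = (1/n) sum_i 1{u_i <= u_k, w_i <= w_l}
    F_u = ind_u.mean(axis=1)        # empirical df of residuals at u_k
    F_w = ind_w.mean(axis=1)        # empirical df of the instrument at w_l
    return np.sqrt(n) * (F_joint - np.outer(F_u, F_w))


def ks_cvm(u_hat, w):
    """Return (T_inf, T_2): sup-type and integrated-square-type statistics."""
    G = independence_process(u_hat, w)
    T_inf = np.abs(G).max()
    # Integration against the empirical joint df averages over (u_i, w_i),
    # which are the diagonal entries of the grid evaluation.
    T_2 = np.mean(np.diag(G) ** 2)
    return T_inf, T_2


# Toy input with perfectly dependent "residuals" and instrument.
u_toy = np.array([1.0, 2.0, 3.0, 4.0])
w_toy = np.array([1.0, 2.0, 3.0, 4.0])
T_inf_toy, T_2_toy = ks_cvm(u_toy, w_toy)
```

Because the residual-based process has a nonstandard limit, these statistics should not be compared with critical values tabulated for classical independence tests.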

To understand the behavior of $\mathbb{G}_{n}$ , we introduce several assumptions.

Assumption 3.1.

For some $a,b>0$:

(i) Operator smoothing: $\|T\phi\|_{v}\sim\|\phi\|_{v-a}$ for all $\phi\in L_{2}(\mathbf{R}^{p})$ and $v\in\mathbf{R}$.

(ii) Parameter smoothness: $\varphi\in H^{b}(\mathbf{R}^{p})$.

Assumption 3.1 (i) describes the smoothing property of the operator $T$ . Roughly speaking, the action of $T$ increases the Sobolev smoothness by $a$ , which is called the degree of ill-posedness. Intuitively, the more $T$ smooths out features of $\varphi$ , the harder it is to recover it from the equation (2). Condition (ii) describes the smoothness of the structural function $\varphi$ and is a standard smoothness restriction in the nonparametric literature.

Assumption 3.2.

(i) $(Y_{i},Z_{i},W_{i})_{i=1}^{n}$ are i.i.d. observations of $(Y,Z,W)$ with $\mathds{E}|Y|^{2}<\infty$, $\mathds{E}\|W\|<\infty$, $\mathds{E}\|Z\|<\infty$, and $\mathds{E}\left[U^{2}|W\right]\leq C<\infty$; (ii) the distribution of $(U,Z,W)$ is absolutely continuous with respect to the Lebesgue measure with densities $f_{Z},f_{W},f_{ZW},f_{U|Z}\in L_{\infty}$ and $f_{Z},f_{ZW}\in L_{2}$; (iii) $f_{ZW}\in H^{t}(\mathbf{R}^{p+q})$ for some $t>0$; (iv) $K_{z}$ and $K_{w}$ are products of a univariate continuous kernel $K\in L_{2}(\mathbf{R})\cap L_{\infty}(\mathbf{R})$ of bounded variation with $\int K(u)\mathrm{d}u=1$, $\int|u|^{l}|K(u)|\mathrm{d}u<\infty$, and $\int u^{k}K(u)\mathrm{d}u=0$ for $k\in\{1,\dots,l\}$ and $l\geq t$.

Assumption 3.2 describes several mild conditions on the distribution of the data and the kernel functions that are largely standard for kernel estimators; see also Darolles, Fan, Florens, and Renault (2011), Appendix B for a discussion of generalized boundary kernels that can be used when supports are bounded. To introduce the next assumption, let $\partial_{u}$ be a partial derivative with respect to the variable $u$ , let $\|.\|_{\infty}$ denote the uniform norm, and put $x\vee y=\max\{x,y\}$ and $x\wedge y=\min\{x,y\}$ .

Assumption 3.3.

(i) $\|\partial_{u}f_{UZ}\|_{\infty}<\infty$ and $\sup_{u}\|f_{UZ}(u,.)\|_{\kappa}<\infty$ with $\kappa>2a\vee(a+q/2)$; (ii) $\left\|\int_{\{v\leq.\}}\partial_{u}f_{UZW}(.,.,v)\mathrm{d}v\right\|_{\infty}<\infty$ and $\sup_{u,w}\left\|\int^{w}f_{UZW}(u,.,v)\mathrm{d}v\right\|_{\kappa}<\infty$ with $\kappa>2a\vee(a+q/2)$.

Assumption 3.3 imposes some relatively mild smoothness conditions on the distribution of the data.

Assumption 3.4.

$h_{n}\to 0$ and $\alpha_{n}\to 0$ as $n\to\infty$ are such that (i) $nh_{n}^{q}\alpha_{n}^{2(a+c)/(a+b)}\to\infty$, $nh_{n}^{p+q}\alpha_{n}\to\infty$, and $h_{n}^{2t}/\alpha_{n}\to 0$; (ii) $\sqrt{n}\alpha_{n}^{2b/(a+b)}\to 0$, $\sqrt{n}h_{n}^{q}\alpha_{n}^{2a/(a+b)}\to\infty$, and $\sqrt{n}h_{n}^{2t}/\alpha_{n}^{2a/(a+b)}\to 0$; (iii) $n\alpha_{n}^{2}\to 0$, $nh_{n}^{2(b\wedge 2t)}\to 0$, and $nh_{n}^{p+2q}\to\infty$; where $2s=b-a\geq 0$, $b>c$, $s\geq c>p/2$, $t>(p+q)/2$, and $a,b,t,p,q$ are as in Assumptions 3.1 and 3.2.

Assumption 3.4 (i) provides a set of sufficient conditions for $\|\hat{\varphi}-\varphi\|_{c}=o_{P}(1)$ with $c>p/2$, while condition (ii) states additional requirements for $\|\hat{\varphi}-\varphi\|=o_{P}(n^{-1/4})$; see Corollary A.1.1 in the Appendix. The former condition is needed for the asymptotic equicontinuity argument, while the latter requires that the nuisance parameter $\varphi$ is estimated at a sufficiently fast rate, which is often encountered in the semiparametric literature. Lastly, condition (iii) ensures that a certain uniform asymptotic expansion holds. To illustrate that conditions on tuning parameters are feasible, suppose for simplicity that $p=q=c=1$ and that $h_{n}\sim n^{-c_{1}}$ and $\alpha_{n}\sim n^{-c_{2}}$ for some $c_{1},c_{2}\in(0,1)$. Then (i) requires that $c_{1}+2c_{2}(a+c)/(a+b)<1$, $2c_{1}+c_{2}<1$, and $t>c_{2}/(2c_{1})$. For (ii), we additionally need $c_{2}>(a+b)/(4b)$, $c_{1}+2c_{2}a/(a+b)<0.5$, and $t>1/(4c_{1})+c_{2}a/(c_{1}(a+b))$. Lastly, (iii) requires that $c_{2}>0.5$, $c_{1}>1/(2(b\wedge 2t))$, and $c_{1}<1/3$. Therefore, we require $(c_{1},c_{2})\in\{(x,y)\in\mathbf{R}^{2}:0.5<y<1-2x,x\in(1/(2(b\wedge 2t)),1/3)\}$, which is non-empty provided that $b\wedge 2t>3/2$. Given this choice, the following smoothness conditions are imposed in Assumption 3.4: $t>[1/(4c_{1})+c_{2}a/(c_{1}(a+b))]\wedge[c_{2}/(2c_{1})]$ and $b>[2c_{2}(a+c)/(1-c_{1})-a]\wedge[2c_{2}a/(0.5-c_{1})-a]\wedge[1/(c_{2}-0.25)]$.
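The exponent inequalities in this illustration are easy to check numerically. The helper below is a hypothetical convenience function, not part of the paper: it encodes the three groups of inequalities for $p=q=1$ with $h_{n}\sim n^{-c_{1}}$ and $\alpha_{n}\sim n^{-c_{2}}$, reading fractions such as $1/4c_{1}$ as $1/(4c_{1})$. The values $a=1$, $b=5$, $t=4$ are arbitrary illustrative choices.

```python
def tuning_conditions(c1, c2, a, b, t, c=1.0):
    """Check the exponent inequalities for h_n ~ n^(-c1), alpha_n ~ n^(-c2), p = q = 1."""
    m = min(b, 2.0 * t)  # b ^ 2t in the paper's notation
    return {
        "(i)": (
            c1 + 2.0 * c2 * (a + c) / (a + b) < 1.0
            and 2.0 * c1 + c2 < 1.0
            and t > c2 / (2.0 * c1)
        ),
        "(ii)": (
            c2 > (a + b) / (4.0 * b)
            and c1 + 2.0 * c2 * a / (a + b) < 0.5
            and t > 1.0 / (4.0 * c1) + c2 * a / (c1 * (a + b))
        ),
        "(iii)": (
            c2 > 0.5
            and c1 > 1.0 / (2.0 * m)
            and c1 < 1.0 / 3.0
        ),
    }


def is_feasible(c1, c2, a, b, t, c=1.0):
    """True when all three groups of inequalities hold."""
    return all(tuning_conditions(c1, c2, a, b, t, c).values())


# Illustrative smoothness values (assumptions, not taken from the paper).
ok = is_feasible(0.15, 0.6, a=1.0, b=5.0, t=4.0)
too_slow_bandwidth = is_feasible(0.30, 0.6, a=1.0, b=5.0, t=4.0)
```

With these values, the pair $(c_{1},c_{2})=(0.15,0.6)$ satisfies every inequality, while $(0.30,0.6)$ violates $2c_{1}+c_{2}<1$.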

The following result provides an approximation to the residual-based independence empirical process that is convenient for our purposes:

Theorem 3.1.

Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then

[TABLE]

uniformly over $(u,w)\in\mathbf{R}\times\mathbf{R}^{q}$ with

[TABLE]

It is worth mentioning that Theorem 3.1 does not require $U\perp\!\!\!\perp W$. The proof of this result can be found in the Appendix and relies on asymptotic equicontinuity arguments. Roughly speaking, we show that the consistency of the nonparametric IV estimator in the Sobolev norm together with the Donsker property of Sobolev balls imply that certain terms associated with residuals are asymptotically negligible. At the same time, the estimation of the nuisance component $\varphi$ has a first-order asymptotic effect due to the $\delta_{u,w}(U_{i},W_{i})$ term, while the higher-order terms are negligible provided that $\|\hat{\varphi}-\varphi\|=o_{P}(n^{-1/4})$. This rate condition is typically encountered in semiparametric problems; see Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018) and Chernozhukov, Escanciano, Ichimura, Newey, and Robins (2016) for recent contributions, Andrews (1994) for earlier treatment, and Babii (2021), Section 3.3 for a related discussion in the setting of ill-posed inverse problems.

It is worth mentioning that, in some cases, the estimation of nuisance parameters does not have any first-order asymptotic effect, which is known as the Neyman orthogonality property in the semiparametric literature. In particular, this is the case for the independence empirical process based on the nonparametric conditional mean regression residuals; see Einmahl and Van Keilegom (2008). Interestingly, if we had $W\perp\!\!\!\perp(U,Z)$ , then $\rho=0$ , and the estimation of $\varphi$ would not have any first-order asymptotic effect.

Theorem 3.1 can be readily used to construct the residual-based Cramér-von Mises and Kolmogorov-Smirnov statistics

[TABLE]
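As a computational illustration, the sketch below evaluates both statistics from a vector of residuals $\hat{U}_{i}$ and a scalar instrument $W_{i}$, using the process $\mathbb{G}_{n}(u,w)=\sqrt{n}(\hat{F}_{\hat{U}W}(u,w)-\hat{F}_{\hat{U}}(u)\hat{F}_{W}(w))$ from Section 4. It rests on assumptions that are not taken from the paper: $q=1$, the supremum is taken over the grid of sample points (where the piecewise-constant process attains its extreme values), and the Cramér-von Mises integral is taken with respect to the empirical joint distribution. The function name is hypothetical.

```python
import numpy as np

def independence_statistics(u_hat, w):
    """Return (KS, CvM) statistics for independence of residuals and W.

    Assumes scalar W and integration against the empirical joint
    distribution in the CvM statistic (illustrative choices).
    """
    u_hat = np.asarray(u_hat, dtype=float)
    w = np.asarray(w, dtype=float)
    n = u_hat.shape[0]
    # A[i, j] = 1{U_i <= U_j}, B[i, k] = 1{W_i <= W_k}
    A = (u_hat[:, None] <= u_hat[None, :]).astype(float)
    B = (w[:, None] <= w[None, :]).astype(float)
    joint = A.T @ B / n          # joint empirical CDF at (U_j, W_k)
    f_u = A.mean(axis=0)         # marginal empirical CDF of residuals at U_j
    f_w = B.mean(axis=0)         # marginal empirical CDF of W at W_k
    g = np.sqrt(n) * (joint - np.outer(f_u, f_w))
    ks = np.abs(g).max()         # sup over the n x n grid of sample points
    cvm = np.mean(np.diag(g) ** 2)  # average of G_n(U_i, W_i)^2
    return ks, cvm
```

The indicator matrices require $O(n^{2})$ memory, which is manageable for the sample sizes used in the simulations but would call for a blocked evaluation in larger samples.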

To understand the behavior of the two statistics under the null and the alternative hypotheses, consider a centered version of the process in Theorem 3.1

[TABLE]

where $h_{u,w}(U,W)=\mathbbm{1}_{\{U\leq u,W\leq w\}}-\mathbbm{1}_{\{U\leq u\}}F_{W}(w)-\mathbbm{1}_{\{W\leq w\}}F_{U}(u)+F_{UW}(u,w)+\delta_{u,w}(U,W)$ . The following Donsker-type central limit theorem holds:

Proposition 3.1.

Suppose that assumptions of Theorem 3.1 are satisfied. Then

[TABLE]

where $\mathbb{H}$ is a tight centered Gaussian process with uniformly continuous sample paths and the covariance function

[TABLE]

Note that under the null hypothesis $H_{0}:U\perp\!\!\!\perp W$ , we have $\mathds{E}[h_{u,w}(U,W)]=0$ and the covariance function of $\mathbb{H}$ simplifies to

[TABLE]

For the alternative hypothesis, $H_{1}:U\not\perp\!\!\!\perp W$ , put

[TABLE]

Consider also a sequence of local alternative hypotheses

[TABLE]

where the function $H$ is such that $F_{UW}$ is a proper CDF. There exist several ways to construct such local alternatives with prespecified marginal distributions $F_{U}$ and $F_{W}$. For instance, Morgenstern’s family is $F_{UW}(u,w)=F_{U}(u)F_{W}(w)+aF_{U}(u)F_{W}(w)(1-F_{U}(u))(1-F_{W}(w))$ with $a\in[-1,1]$; see Devroye (1986), Chapter XI, Theorem 3.2. The following corollary describes the behavior of the independence test under the null and fixed/local alternative hypotheses:

Corollary 3.1.

Suppose that assumptions of Theorem 3.1 are satisfied. Then under $H_{0}$

[TABLE]

while under $H_{1}$ , we have $T_{2,n},T_{\infty,n}\xrightarrow{\mathrm{a.s.}}\infty$ , provided that $d_{2},d_{\infty}>0$ . Moreover, under $H_{1,n}$

[TABLE]

Corollary 3.1 shows that the residual-based independence test can detect parametric local alternatives. The asymptotic distributions under $H_{0}$ are not pivotal, in contrast to the nonparametric regression without endogeneity, cf. Einmahl and Van Keilegom (2008). While obtaining the distribution-free statistics is possible in simpler residual-based testing problems, see Escanciano, Pardo-Fernández, and Van Keilegom (2018), these methods do not seem to extend naturally to our setting. Therefore, the bootstrap could be an attractive alternative for simulating the critical values of the test. Interestingly, the naive nonparametric and the multiplier bootstraps do not work.
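To make the construction of alternatives with prespecified marginals concrete, the following sketch evaluates Morgenstern's family $F_{UW}=F_{U}F_{W}+aF_{U}F_{W}(1-F_{U})(1-F_{W})$ mentioned above. The function names are hypothetical. The second function returns the mixed partial derivative of the same expression with respect to the two marginal probabilities, $1+a(1-2x)(1-2y)$, which is nonnegative whenever $|a|\leq 1$; this is why the restriction $a\in[-1,1]$ yields a proper CDF.

```python
import numpy as np

def morgenstern_cdf(fu, fw, a):
    """Joint CDF F_U F_W + a F_U F_W (1 - F_U)(1 - F_W) at marginal values fu, fw."""
    if not -1.0 <= a <= 1.0:
        raise ValueError("a must lie in [-1, 1]")
    fu = np.asarray(fu, dtype=float)
    fw = np.asarray(fw, dtype=float)
    return fu * fw * (1.0 + a * (1.0 - fu) * (1.0 - fw))

def morgenstern_copula_density(x, y, a):
    """Mixed partial derivative of x*y*(1 + a*(1-x)*(1-y)) on the unit square."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 + a * (1.0 - 2.0 * x) * (1.0 - 2.0 * y)
```

Setting $a=0$ recovers independence, and evaluating the joint CDF at $F_{W}=1$ returns $F_{U}$, so the marginals are preserved for every admissible $a$.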

3.3 Critical values

The asymptotic distributions in Corollary 3.1 are nonstandard and depend on several nuisance nonparametric components. This calls for resampling methods to compute the critical values. As can be seen from the proof of Theorem 3.1, our uniform asymptotic expansion relies on the differentiability of the CDF. This leads to a dependence of the asymptotic distribution on the probability density function $f_{UZW}$ in Corollary 3.1; see also the proof of Theorem A.1 and Corollary A.2.1. Such a uniform asymptotic expansion cannot be obtained in the same way for the bootstrapped statistics since, in the bootstrap world, the empirical distribution function is not differentiable.

The lack of smoothness of the empirical distribution function suggests that the standard bootstrap procedures may fail in approximating the asymptotic distribution of the test statistics. A problem of a similar nature occurs with the bootstrap of cube-root consistent estimators; see, e.g., Babii and Kumar (2021) and references therein. Another complication with the bootstrap is that we typically need to resample from a distribution obeying the constraints of the null hypothesis and that the validity of the bootstrap has to be established case by case. Note also that the (smoothed) residual bootstrap, cf. Neumeyer and Van Keilegom (2019), does not preserve the dependence between the endogenous regressor and the unobservables and does not mimic the data generating process of the IV regression under the null hypothesis. In Section 4, we find in Monte Carlo experiments that the standard nonparametric bootstrap does not work.

Consequently, we suggest relying on the subsampling or the $m$ out of $n$ bootstrap to compute the critical values of the test. The resampling procedure is as follows:

1. Draw a sample of size $m$ from $(Y_{i},Z_{i},W_{i})_{i=1}^{n}$ without replacement (subsampling) or with replacement ($m$ out of $n$ bootstrap), where $m=m_{n}$ is a sequence such that $m_{n}\to\infty$ and $m_{n}/n\to 0$ as $n\to\infty$.

2. Compute the Kolmogorov-Smirnov or the Cramér-von Mises statistic using the simulated sample.

3. Repeat the first two steps many times and compute the critical values using empirical quantiles of the statistics over all simulated samples. Alternatively, compute p-values as $1-F_{n}^{*}(T_{n})$, where $T_{n}$ is the statistic computed from $(Y_{i},Z_{i},W_{i})_{i=1}^{n}$ and $F_{n}^{*}$ is the empirical distribution function of the bootstrapped statistics.
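The resampling procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: `resampling_pvalue` and `toy_statistic` are hypothetical names, the rows of `data` stand for the observations $(Y_{i},Z_{i},W_{i})$, and `statistic` is any user-supplied function that maps a sample to a scalar. In the application, that function would re-estimate the nonparametric IV regression on each resample and include the $\sqrt{m}$ normalization.

```python
import numpy as np

def resampling_pvalue(data, statistic, m, n_rep=1000, replace=False, seed=0):
    """Subsampling (replace=False) or m out of n bootstrap (replace=True).

    Returns the p-value 1 - F*_n(T_n) and the simulated statistics.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    t_n = statistic(data)                      # statistic on the full sample
    stats = np.empty(n_rep)
    for b in range(n_rep):
        idx = rng.choice(n, size=m, replace=replace)   # step 1: draw m rows
        stats[b] = statistic(data[idx])                # step 2: recompute
    p_value = 1.0 - np.mean(stats <= t_n)      # step 3: 1 - F*_n(T_n)
    return p_value, stats

def toy_statistic(sample):
    """Placeholder statistic (not the paper's): sqrt(m) * |mean of first column|."""
    return np.sqrt(sample.shape[0]) * abs(sample[:, 0].mean())
```

Critical values at level $\alpha$ can be read off the returned draws as `np.quantile(stats, 1 - alpha)`.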

An attractive feature of subsampling is that it is valid for general hypothesis testing problems; see Politis, Romano, and Wolf (2001), Theorem 3.1, and there is no need to show its validity in each specific application. An adaptive data-driven rule to select $m_{n}$ is considered, e.g., in Bickel and Sakov (2008).

4 Monte Carlo experiments

To evaluate the finite-sample performance of the test, we simulate samples as

[TABLE]

We set $\varphi(x)=\cos(x)$ and consider samples of size $n=500$ and $n=1,000$ observations; see Supplementary Material for additional simulation results. Note that the degree of separability of unobservables is governed by $\theta\in\mathbf{R}$ . The separable model corresponds to $\theta=0$ , while any $\theta\neq 0$ corresponds to the alternative nonseparable model. It is worth mentioning that under $H_{1}$ , the nonparametric IV regression does not estimate consistently the nonseparable structural function $(z,u)\mapsto\cos(z)+\theta zu$ , which depends on unobservables. The nonparametric IV regression estimates instead the function $z\mapsto\phi(z)$ solving the functional equation $\mathds{E}[Y|W]=\mathds{E}[\phi(Z)|W]$ . The difference between the two functions is precisely what gives the power to the test.

We set the number of Monte Carlo replications and the number of bootstrap replications to $1,000$ throughout all our experiments. We also discretize all continuous quantities on the grid of 100 equidistant points in $[-4,4]$. The estimates $\hat{r}$ and $\hat{T}$ in equation (3) are obtained using the sixth-order Epanechnikov kernel. The corresponding bandwidth parameters are computed using Silverman’s rule of thumb: $h_{z}=3.53\hat{\sigma}_{z}n^{-1/13}$ and $h_{w}=3.53\hat{\sigma}_{w}n^{-1/13}$, where $\hat{\sigma}_{z}$ and $\hat{\sigma}_{w}$ are sample standard deviations of observed $Z$ and $W$. This choice satisfies Assumption 3.4 and requires that the regularization parameter is $\alpha_{n}\sim n^{-c_{2}}$ with $c_{2}\in(0.5,11/13)$. To satisfy this requirement, we set $\alpha_{n}=n^{-4/5}$.
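For concreteness, the tuning parameters stated above can be computed as in the following sketch. The function name is hypothetical, and the use of the unbiased sample standard deviation (`ddof=1`) is an assumption, since the text only refers to sample standard deviations.

```python
import numpy as np

def tuning_parameters(z, w):
    """Bandwidths h_z, h_w and regularization parameter alpha_n from the text."""
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    n = z.shape[0]
    h_z = 3.53 * np.std(z, ddof=1) * n ** (-1.0 / 13.0)  # rule of thumb for Z
    h_w = 3.53 * np.std(w, ddof=1) * n ** (-1.0 / 13.0)  # rule of thumb for W
    alpha_n = n ** (-4.0 / 5.0)                          # c_2 = 4/5 in (0.5, 11/13)
    return h_z, h_w, alpha_n
```

The bandwidth exponent corresponds to $c_{1}=1/13$, so the upper bound $11/13$ on $c_{2}$ matches the requirement $2c_{1}+c_{2}<1$.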

We look at the distributions of Kolmogorov-Smirnov and Cramér-von Mises statistics, computed respectively as

[TABLE]

where $\mathbb{G}_{n}(u,w)=\sqrt{n}(\hat{F}_{\hat{U}W}(u,w)-\hat{F}_{\hat{U}}(u)\hat{F}_{W}(w))$ and the empirical distribution functions are computed as in equation (4). Lastly, we use the adaptive rule of Bickel and Sakov (2008) to estimate the size of the subsample. The rule consists of choosing $\hat{m}=m_{\hat{j}}$ with $\hat{j}=\operatorname*{arg\,min}_{j\geq 0}\sup_{x}|F_{j}^{*}(x)-F_{j+1}^{*}(x)|$, where $F_{j}^{*}$ is the empirical distribution of the simulated statistics using a subsample of size $m_{j}=[q^{j}n]$, $j=0,1,2,\dots,4$, $[a]$ is the integer part of $a$, and $q=0.5$.
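The adaptive rule can be sketched as follows. This is an illustration under stated assumptions, not the code used for the experiments: the function names are hypothetical, `statistic` is a user-supplied function of the resampled rows, resampling is done with replacement, and, with five candidate sizes $m_{0},\dots,m_{4}$, the minimum is taken over the four consecutive pairs $j=0,\dots,3$.

```python
import numpy as np

def sup_ecdf_distance(x, y):
    """Sup-distance between the empirical CDFs of two samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])   # both ECDFs jump only at these points
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return np.abs(fx - fy).max()

def adaptive_subsample_size(data, statistic, q=0.5, n_sizes=5, n_rep=1000, seed=0):
    """Pick m_j = [q^j n] minimizing the distance between F*_j and F*_{j+1}."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    sizes = [int(q ** j * n) for j in range(n_sizes)]   # integer part of q^j n
    draws = []
    for m in sizes:
        sims = [statistic(data[rng.choice(n, size=m, replace=True)])
                for _ in range(n_rep)]
        draws.append(np.array(sims))
    dists = [sup_ecdf_distance(draws[j], draws[j + 1])
             for j in range(n_sizes - 1)]
    j_hat = int(np.argmin(dists))
    return sizes[j_hat], sizes, dists
```

The selected size can then be passed to the resampling step that produces critical values or p-values.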

Figure 1 shows the distribution of the test statistics under the null hypothesis and the two alternative hypotheses for different sample sizes. The two distributions are sufficiently distinct once the alternative hypothesis becomes more separated from the null hypothesis.

We plot in Figure 2 the power curves when the level of the test is fixed at $5\%$ . The power of the test increases once alternative hypotheses become more distant from the null hypothesis. The Cramér-von Mises test seems to have higher power for the class of considered alternatives. We can also see that the figure illustrates the consistency of the test in the sense that its power becomes closer to one as the sample size increases under the alternative hypotheses.

In Figure 3, we explore the performance of the bootstrap. We plot the exact finite sample distribution of both test statistics and the distribution of bootstrapped statistics under $H_{0}$. In panels (a) and (b), we plot the distribution of the naive bootstrap, drawing a sample of size $n$ randomly with replacement from $(Y_{i},Z_{i},W_{i})_{i=1}^{n}$. In panels (c) and (d), we plot the distribution of the $m$ out of $n$ bootstrap. The naive bootstrap fails and does not mimic the distribution of the Kolmogorov-Smirnov/Cramér-von Mises statistics. The distribution of the $m$ out of $n$ bootstrap, on the other hand, is close to the finite sample distributions of both statistics. We also observe that the adaptive choice seems to work slightly better for the Cramér-von Mises statistics.

5 Are Engel curves separable?

Engel curves are fundamental for the analysis of consumers’ behavior and have implications for aggregate economic outcomes. The Engel curve describes the relationship between the demand for a particular commodity and the household’s budget. Interesting applications of estimated Engel curves include a measurement of welfare losses associated with tax distortions in Banks, Blundell, and Lewbel (1997), an estimation of growth and inflation in Nakamura, Steinsson, and Liu (2016), and an estimation of income inequality across countries in Almås (2012). The nonparametric IV approach to the estimation of Engel curves was pioneered in the seminal paper of Blundell, Chen, and Kristensen (2007), who focus on the estimation of Engel curves in the UK.

We draw a dataset from the 2015 US Consumer Expenditure Survey; see Babii (2020) for the estimated Engel curves with the uniform confidence bands using this dataset. We restrict our attention to married couples with a positive income during the last 12 months, yielding 10,055 observations. The dependent variable is a share of expenditures on a particular commodity while the endogenous regressor is a natural logarithm of the total expenditures. We instrument the expenditures using the gross income. In particular, Blundell, Chen, and Kristensen (2007) point out that the gross income will be exogenous for consumption expenditures assuming that heterogeneity in earnings is not related to unobserved preferences over consumption; see also Chen and Christensen (2018) and Babii (2020).

In Table 1, we report the $m$ out of $n$ bootstrap p-values, with the adaptive choice of $m$; see Section 4 for more details on the practical implementation of tests. We report results for both the Kolmogorov-Smirnov (KS) and the Cramér-von Mises (CvM) tests. Remarkably, the 5% level tests reject separability for all commodities with the exception of Entertainment (KS) and Reading (CvM). Moreover, the 1% level tests reject separability in all cases, except for Reading (CvM) and Transportation (CvM). This suggests that Engel curves for these commodities may exhibit substantial heterogeneities in unobservables.

6 Conclusions

This paper offers a new perspective on the separability of unobservables in economic models with endogeneity. Starting from the nonseparable model where the instrumental variable is independent of unobservables, our first contribution is to develop a novel fully nonparametric separability test. The test is based on the estimation of a separable nonparametric IV regression and the verification of the independence restriction imposed by the nonseparable IV model. To obtain a large sample approximation to the distribution of our test statistics, we develop novel uniform asymptotic expansions of the empirical distribution function of nonparametric IV residuals and obtain new results for the Tikhonov regularization in Sobolev spaces. We show that, despite the uncertainty coming from an ill-posed inverse nonparametric IV regression, the empirical distribution function of residuals and the residual-based independence empirical process still satisfy the Donsker central limit theorem. In contrast to the nonparametric regression without endogeneity, we find that the parameter uncertainty affects the asymptotic distribution of the residual-based independence tests, which are highly nonstandard. In our Monte Carlo experiments, we find that the naive bootstrap fails in approximating the distribution of the test statistics under the null hypothesis; hence we rely on the $m$ out of $n$ bootstrap (or subsampling) procedure to compute its critical values.

Using the 2015 US Consumer Expenditure Survey data, we find that the $1\%$ level test rejects the separability of Engel curves for most of the commodities. This indicates that the Engel curves may be heterogeneous in unobservables and that the nonseparable modeling of Engel curves may be useful; see, e.g., Blundell, Horowitz, and Parey (2017) for the estimation of nonseparable demand functions.

The paper offers several other directions for future research. First, it might be interesting to test the separability of unobservables in other structural relations that are commonly estimated using the additively separable models in the empirical practice, such as the production function, the labor supply function, the demand function, or the wage equation. Second, given the plethora of residual-based specification tests for regression models without endogeneity, our results could also be used to develop similar tests for econometric models with endogeneity; see Pardo-Fernández, Van Keilegom, and González-Manteiga (2007) and Escanciano, Pardo-Fernández, and Van Keilegom (2018).

Acknowledgement

This work was supported by the French National Research Agency under Grant ANR-19-CE40-0013-01/ExtremReg project. We thank Ivan Canay, Tim Christensen, Elia Lapenta, Pascal Lavergne, Thierry Magnac, Nour Meddahi, and Ingrid Van Keilegom for helpful discussions. All remaining errors are ours.

APPENDIX: ADDITIONAL RESULTS AND PROOFS

Notation.

For two sequences $(a_{n})_{n\in\mathbf{N}}$ and $(b_{n})_{n\in\mathbf{N}}$ , we denote $a_{n}\lesssim b_{n}$ if $a_{n}=O(b_{n})$ and $a_{n}\sim b_{n}$ if both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ . For two sequences of random variables $(X_{n})_{n\in\mathbf{N}}$ and $(Y_{n})_{n\in\mathbf{N}}$ , we denote $X_{n}\lesssim_{P}Y_{n}$ for $X_{n}=O_{P}(Y_{n})$ . For a bounded linear operator $T:\mathcal{X}\to\mathcal{Y}$ on normed spaces, we use $\|T\|_{\mathrm{op}}=\inf\{c\geq 0:\;\|Tx\|\leq c\|x\|,\forall x\in\mathcal{X}\}$ to denote its operator norm, where with some abuse of notation, we use $\|.\|$ to denote the norm of both spaces.

A.1 Tikhonov regularization in Sobolev spaces

This section discusses convergence rates for the Tikhonov-regularized estimator in Sobolev spaces. The following result extends Carrasco, Florens, and Renault (2014), Proposition 3.1 to the case of the unknown operator.

Theorem A.1.

Suppose that Assumption 3.1 is satisfied, $\|\hat{T}-T\|_{\mathrm{op}}^{2}\lesssim_{P}\alpha_{n}$ , and $2s\geq b-a$ . Then for every $c\in[0,s]$

[TABLE]

It is worth emphasizing that this result is not specific to the nonparametric IV regression and can be applied to a generic ill-posed inverse problem $T\varphi=r$ , where $(T,r)$ is estimated with $(\hat{T},\hat{r})$ . Moreover, in the case of nonparametric IV regression, it can be easily applied to nonparametric/machine learning estimators $(\hat{T},\hat{r})$ other than the kernel smoothing. Next, we specialize the generic result of Theorem A.1 to the nonparametric IV regression with $(T,r)$ estimated via kernel smoothing, see equation (3).
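To illustrate the generic problem $T\varphi=r$ with estimated $(\hat{T},\hat{r})$, the sketch below solves a discretized version by plain Tikhonov regularization, $\hat{\varphi}=(\alpha_{n}I+\hat{T}^{*}\hat{T})^{-1}\hat{T}^{*}\hat{r}$. This is a simplified illustration and not the estimator analyzed in Theorem A.1: it assumes that the operator has been discretized into a matrix and it omits the Sobolev-scale operator $L^{s}$. The function name is hypothetical.

```python
import numpy as np

def tikhonov_solve(T_hat, r_hat, alpha):
    """Solve (alpha I + T^T T) phi = T^T r for a discretized operator T_hat."""
    T_hat = np.asarray(T_hat, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    k = T_hat.shape[1]
    # Normal equations with a ridge term that stabilizes small singular values.
    return np.linalg.solve(alpha * np.eye(k) + T_hat.T @ T_hat, T_hat.T @ r_hat)
```

Any pair $(\hat{T},\hat{r})$, whether obtained by kernel smoothing or by another nonparametric estimator, can be passed to the same routine once it is represented on a grid.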

Corollary A.1.1.

Suppose that Assumptions 3.1 and 3.2 are satisfied, $\frac{1}{nh_{n}^{p+q}}\vee h_{n}^{2t}=O\left(\alpha_{n}\right)$ , and $2s\geq b-a$ . Then for every $c\in[0,s]$

[TABLE]

A.2 Distribution of nonparametric IV residuals

In this section, we present results on the weak convergence of the empirical distribution of nonparametric IV residuals. These results are used to obtain the large sample approximation to the distribution of independence tests and are of independent interest.

Theorem A.1.

Suppose that Assumptions 3.1, 3.2, 3.3 (i), and 3.4 are satisfied. Then

[TABLE]

uniformly over $u\in\mathbf{R}$ .

Proof.

By Lemma A.4.1, the following expansion holds uniformly in $u\in\mathbf{R}$

[TABLE]

By Taylor’s theorem, there exists some $\tau\in[0,1]$ such that

[TABLE]

By Lemma B.1.1 in the Supplementary Material,

[TABLE]

while under Assumptions 3.3 (i) and 3.4

[TABLE]

Combining all estimates, we obtain uniformly in $u\in\mathbf{R}$

[TABLE]

∎

As a consequence of Theorem A.1, we obtain the following Donsker-type central limit theorem for the empirical distribution of nonparametric IV residuals.

Corollary A.2.1.

Suppose that assumptions of Theorem A.1 are satisfied. Then

[TABLE]

where $\mathbb{G}$ is a tight centered Gaussian process with uniformly continuous sample paths and the covariance function

[TABLE]

Proof.

The process given in Theorem A.1 is an empirical process indexed by the following class of functions $\mathcal{F}=\left\{(v,w)\mapsto\mathbbm{1}_{\{v\leq u\}}+v\left(T(T^{*}T)^{-1}f_{UZ}(u,.)\right)(w),\;u\in\mathbf{R}\right\}$, which is a sum of the Donsker class and $\mathcal{H}=\left\{(v,w)\mapsto v\left(T(T^{*}T)^{-1}f_{UZ}(u,.)\right)(w),\;u\in\mathbf{R}\right\}$. By van der Vaart and Wellner (1996), Example 2.10.5, it is enough to show that $\mathcal{H}$ is Donsker. This follows from the fact that under Assumption 3.1 in the Supplementary Material by Engl, Hanke, and Neubauer (2000), since for $\kappa-a>q/2$

[TABLE]

where the last inequality follows under Assumption 3.3 (i). Therefore, $\mathcal{H}\subset\{(v,w)\mapsto vg(w):\;g\in H^{\kappa-a}_{M}\}$ , where $H_{M}^{\kappa-a}$ is a Sobolev ball of radius $M$ . Since $\kappa>a+q/2$ , this shows that the class $\mathcal{H}$ is Donsker; see Nickl and Pötscher (2007), Corollaries 4 and 5. The covariance function simplifies since $\mathds{E}[U|W]=0$ . ∎

A.3 Proofs of main results

In this section we provide proofs of main results of the paper.

Proof of Proposition 2.1.

Since $T:L_{2}(\mathbf{R}^{p})\to L_{2}(\mathbf{R}^{q})$ is injective, the nonparametric IV regression $\varphi\in L_{2}(\mathbf{R}^{p})$ is unique. Therefore, $U=Y-\varphi(Z)$ is a well-defined unique random variable. If the model in equation (1) admits a separable representation, then since $\varepsilon\perp\!\!\!\perp W$

[TABLE]

Therefore, $\varphi(Z)=\psi(Z)+\mathds{E}g(\varepsilon)$ by the injectivity of $T$ , and whence $U=g(\varepsilon)-\mathds{E}g(\varepsilon)$ . This shows that $U\perp\!\!\!\perp W$ because $\varepsilon\perp\!\!\!\perp W$ . ∎

Proof of Theorem 3.1.

By Lemma A.4.2, uniformly in $(u,w)$

[TABLE]

where

[TABLE]

The first term is a classical independence empirical process

[TABLE]

where the second line follows by the maximal inequality.

Next, under Assumption 3.3 (i), by Taylor’s theorem, for some $\tau\in[0,1]$

[TABLE]

Under Assumptions 3.3 by Corollary A.1.1

[TABLE]

Similarly, we have uniformly in $(u,w)$

[TABLE]

Therefore, uniformly in $(u,w)\in\mathbf{R}\times\mathbf{R}^{q}$

[TABLE]

where the last line follows by the same argument as in the proof of Theorem A.1 under Assumption 3.3 (i). ∎

Proof of Proposition 3.1.

$\mathbb{H}_{n}$ is an empirical process indexed by the class of functions

[TABLE]

By van der Vaart and Wellner (1996), Example 2.10.7, it suffices to show that each of the functions in the sum constitutes a Donsker class. To that end, recall first that the indicator functions are classical examples of Donsker classes. Therefore, all terms in $\mathcal{F}$, but the last one, are either Donsker or can be factored as Donsker classes and a deterministic bounded function not depending on the argument of the indicator function. Lastly, under Assumption 3.1 (i), by Engl, Hanke, and Neubauer (2000), Corollary 8.22

[TABLE]

where the latter follows under Assumption 3.3 (ii). Therefore, we obtain that $\{(v,w)\mapsto v(T(T^{*}T)^{-1}g(\tilde{v},\tilde{w},.))(w):\;\tilde{v}\in\mathbf{R},\tilde{w}\in\mathbf{R}^{q}\}\subset\{(v,w)\mapsto vg(w):\;g\in H^{\kappa-a}_{M}\}$ , where $H_{M}^{\kappa-a}$ is a Sobolev ball of radius $M$ . Since $\kappa>a+q/2$ , this shows that $\mathcal{F}$ is Donsker; see Nickl and Pötscher (2007), Corollaries 4 and 5. ∎

Proof of Corollary 3.1.

Since under $H_{0}$ , $\mathbb{G}_{n}\leadsto\mathbb{H}$ by Proposition 3.1, the asymptotic distribution of $T_{\infty,n}$ under $H_{0}$ is readily obtained by the continuous mapping theorem; see van der Vaart and Wellner (1996), Theorem 1.3.6. For the Cramér-von Mises statistics, write

[TABLE]

with

[TABLE]

By Proposition 3.1, under $H_{0}$ , $\mathbb{G}_{n}\leadsto\mathbb{H}$ and $\sqrt{n}(\hat{F}_{\hat{U}W}(u,w)-F_{UW}(u,w))$ also converges weakly by Proposition 3.1 and Theorem A.2.1, whence by the Skorokhod construction

[TABLE]

The first expression in Eq. A.1 implies that $R_{1n}\xrightarrow{\mathrm{a.s.}}0$. Since $\mathbb{H}$ has a.s. bounded and continuous trajectories, the second expression in Eq. A.1 in conjunction with the Helly-Bray theorem shows that $R_{2n}\xrightarrow{\mathrm{a.s.}}0$. Therefore, the asymptotic distribution of the Cramér-von Mises test follows by the continuous mapping theorem.

Under the fixed alternative hypothesis, since $\mathds{E}[U|W]=0$ , by Theorem 3.1, the Glivenko-Cantelli theorem, and a similar argument we obtain

[TABLE]

Therefore, by Slutsky’s theorem $T_{2,n}\xrightarrow{\mathrm{a.s.}}\infty$ and $T_{\infty,n}\xrightarrow{\mathrm{a.s.}}\infty$ , which proves the second statement. For the local alternatives, note that

[TABLE]

Therefore, by Corollary 3.1 and continuous mapping theorem

[TABLE]

For the Cramér-von Mises statistics, write

[TABLE]

where

[TABLE]

Therefore, the result follows by Proposition 3.1 and the same argument as under $H_{0}$ with the only difference that now we have the bias $2H$ in the limiting distribution. ∎

A.4 Auxiliary technical results

In this section, we provide several auxiliary technical results.

Lemma A.4.1.

Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then

[TABLE]

where $\hat{\Delta}=\hat{\varphi}-\varphi$ and $\mathscr{X}=(Y_{i},Z_{i},W_{i})_{i=1}^{\infty}$ .

Proof.

The main idea of the proof is to embed the process inside the supremum into an empirical process indexed by $u$ and a Sobolev ball containing $\hat{\Delta}$ with a probability tending to one. We first show that the process is Donsker, whence the supremum in Eq. A.2 is $O_{P}(n^{-1/2})$ . Finally, the required $o_{P}(n^{-1/2})$ order will follow from the fact that the process is degenerate.

Let $H_{M}^{c}$ be a ball of radius $M<\infty$ in the Sobolev space $H^{c}(\mathbf{R}^{p})$ . For $u\in\mathbf{R}$ and $\Delta\in H^{c}_{M}$ , define $f_{u,\Delta}(U,Z)=\mathbbm{1}_{(-\infty,u+\Delta(Z)]}(U)$ , $\mathcal{G}_{1}=\left\{f_{u,\Delta}:\;u\in\mathbf{R},\Delta\in H^{c}_{M}(\mathbf{R}^{p})\right\}$ , $\mathcal{G}_{2}=\left\{f_{u,0}:\;u\in\mathbf{R}\right\}$ , and $\mathcal{G}=\mathcal{G}_{1}-\mathcal{G}_{2}$ . Note that $\mathcal{G}_{2}$ is a classical Donsker class of indicator functions. If we can show that $\mathcal{G}_{1}$ is Donsker, then $\mathcal{G}$ will be Donsker as a sum of two Donsker classes; see van der Vaart and Wellner (1996), Theorem 2.10.6. To this end, we check that the bracketing entropy condition is satisfied for $\mathcal{G}_{1}$ .

By Nickl and Pötscher (2007), Corollary 4 the bracketing number of $H_{M}^{c}$ satisfies $\log N_{[\;]}(\varepsilon,H_{M}^{c},\|.\|_{L^{2}_{Z}})\lesssim\varepsilon^{-p/c}$ , where $(L^{2}_{Z},\|.\|_{L^{2}_{Z}})$ denotes the space of functions, square-integrable with respect to $f_{Z}$ . Put $M_{\varepsilon}=N_{[\;]}(\varepsilon,H_{M}^{c},\|.\|_{L^{2}_{Z}})$ and fix $u\in\mathbf{R}$ . Let $\left[\underline{\Delta}_{j},\overline{\Delta}_{j}\right]_{j=1}^{M_{\varepsilon}}$ be a collection of $\varepsilon$ -brackets for $H^{c}_{M}$ , i.e., for any $\Delta\in H^{c}_{M}$ , there exists $1\leq j\leq M_{\varepsilon}$ such that $\underline{\Delta}_{j}\leq\Delta\leq\overline{\Delta}_{j}$ and $\left\|\overline{\Delta}_{j}-\underline{\Delta}_{j}\right\|_{L_{Z}^{2}}\leq\varepsilon$ , and whence $\mathbbm{1}_{\left(-\infty,u+\underline{\Delta}_{j}\right]}\leq\mathbbm{1}_{\left(-\infty,u+\Delta\right]}\leq\mathbbm{1}_{\left(-\infty,u+\overline{\Delta}_{j}\right]}$ . Now for each $1\leq j\leq M_{\varepsilon}$ , partition the real line into intervals defined by grids of points $-\infty=\underline{u}_{j,1}<\underline{u}_{j,2}<\dots<\underline{u}_{j,M_{1\varepsilon}}=\infty$ and $-\infty=\overline{u}_{j,1}<\overline{u}_{j,2}<\dots<\overline{u}_{j,M_{2\varepsilon}}=\infty$ , so that each segment has probabilities

[TABLE]

Denote the largest $\underline{u}_{j,k}$ such that $\underline{u}_{j,k}\leq u$ by $\underline{u}_{j}^{*}$ and the smallest $\overline{u}_{j,k}$ such that $u\leq\overline{u}_{j,k}$ by $\overline{u}_{j}^{*}$. Consider the following family of brackets $\left[\mathbbm{1}_{\left(-\infty,\underline{u}_{j}^{*}+\underline{\Delta}_{j}\right]},\mathbbm{1}_{\left(-\infty,\overline{u}_{j}^{*}+\overline{\Delta}_{j}\right]}\right]_{j=1}^{M_{\varepsilon}}$. Under Assumption 3.2 (ii)

[TABLE]

Therefore, we constructed brackets of size $O(\varepsilon)$ , covering $\mathcal{G}_{1}$ , and we have used at most $O\left(\varepsilon^{-2}M_{\varepsilon}\right)$ such brackets. Since $c>p/2$ , we have $\int_{0}^{1}\sqrt{\log N_{[\;]}(\varepsilon,\mathcal{G},\|.\|_{L_{Z}^{2}})}\mathrm{d}\varepsilon<\infty$ . This shows that the empirical process $\sqrt{n}(P_{n}-P)g,g\in\mathcal{G}$ is Donsker, hence, asymptotically equicontinuous; see van der Vaart and Wellner (1996), Theorem 1.5.7. Then for any $\varepsilon>0$

[TABLE]

where $\mathrm{Pr}^{*}$ denotes the outer probability measure.

Next, we show that for every $u\in\mathbf{R}$ , $\rho^{2}(\hat{f}_{u})=\mathds{E}[\hat{f}_{u}^{2}]-(\mathds{E}[\hat{f}_{u}])^{2}=o_{P}(1)$ with $\hat{f}_{u}=\mathbbm{1}_{(-\infty,u+\hat{\Delta}(Z)]}(U)-\mathbbm{1}_{(-\infty,u]}(U)$ , where the expectation is computed with respect to $(U,Z)$ only. Indeed,

[TABLE]

where the third line follows by the Cauchy-Schwarz inequality and Corollary A.1.1 under Assumptions 3.1, 3.2, and 3.4. Similarly,

[TABLE]

Lastly, let $\|\hat{\nu}_{n}\|_{\infty}$ denote the supremum in Eq. A.2. Then

[TABLE]

where the second probability tends to zero as we have just shown and the last probability tends to zero since under the maintained assumptions, by Corollary A.1.1, $\|\hat{\varphi}-\varphi\|_{c}=o_{P}(1)$ . Therefore, it follows from the asymptotic equicontinuity in Eq. A.3 that $\limsup_{n\to\infty}\mathrm{Pr}^{*}(\sqrt{n}\|\hat{\nu}_{n}\|_{\infty}>\varepsilon)=0$ , which concludes the proof. ∎

Lemma A.4.2.

Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then uniformly over $(u,w)\in\mathbf{R}\times\mathbf{R}^{q}$

[TABLE]

and

[TABLE]

where $\hat{\Delta}=\hat{\varphi}-\varphi$ and $\mathscr{X}=(Y_{i},Z_{i},W_{i})_{i=1}^{\infty}$.

Proof.

Note that the first expression and the expression in the statement of Lemma A.4.1 multiplied by $F_{W}$ differ only by

[TABLE]

which is $O_{P}(n^{-1})$ by Corollary A.2.1 and the classical Donsker central limit theorem. By Lemma A.4.1, we obtain the first statement since $F_{W}$ is uniformly bounded by one.

The proof of the second statement is similar to the proof of Lemma A.4.1 and is omitted. ∎

B.1 Additional proofs and auxiliary results

This section contains proofs of several results from the main part of the paper as well as several auxiliary results.

Proof of Theorem A.1.

Decompose

[TABLE]

with

[TABLE]

For the first term

[TABLE]

where the second line follows by Engl, Hanke, and Neubauer (2000), Corollary 8.22 with $\nu=(s-c)/(a+s)\leq 1$ ; the third line by the definition of operator norm; the fourth line by the isometry of functional calculus; and the last since $\sup_{\lambda}|\lambda^{d}/(\alpha_{n}+\lambda)|\lesssim\alpha_{n}^{d-1}$ for all $d\in[0,1]$ .
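The last bound used above can be verified directly. The following short derivation is a sketch added for the reader's convenience, with the hidden constant equal to one; it uses only the weighted arithmetic-geometric mean inequality for $\lambda\geq 0$, $\alpha_{n}>0$, and $d\in[0,1]$.

```latex
% Claim: for \alpha_n > 0, \lambda \ge 0 and d \in [0,1],
%   \lambda^{d} / (\alpha_n + \lambda) \le \alpha_n^{d-1}.
\begin{align*}
\lambda^{d}\alpha_n^{1-d}
  &\le d\lambda + (1-d)\alpha_n && \text{(weighted AM-GM inequality)} \\
  &\le \lambda + \alpha_n       && \text{(since } d,\,1-d\in[0,1]\text{)}.
\end{align*}
% Dividing both sides by \alpha_n^{1-d}(\alpha_n + \lambda) > 0 gives
\[
\sup_{\lambda\ge 0}\frac{\lambda^{d}}{\alpha_n+\lambda}\le \alpha_n^{d-1}.
\]
```

The boundary cases $d=0$ and $d=1$ reduce to $\alpha_{n}\leq\alpha_{n}+\lambda$ and $\lambda\leq\alpha_{n}+\lambda$, respectively.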

Similarly, since for bounded linear operators $A$ and $B$ , $\|AB\|_{\mathrm{op}}\leq\|A\|_{\mathrm{op}}\|B\|_{\mathrm{op}}$ ,

[TABLE]

Next, since $L^{s}\varphi\in H^{b-s}$ and $s\geq(b-a)/2$ , by Engl, Hanke, and Neubauer (2000), Corollary 8.22, there exists $\psi\in L_{2}$ such that $L^{s}\varphi=(T_{s}^{*}T_{s})^{\frac{b-s}{2(a+s)}}\psi$ . Therefore,

[TABLE]

Next, decompose

[TABLE]

with

[TABLE]

and

[TABLE]

Similarly, decompose

[TABLE]

with $S_{1n}$ and $S_{2n}$ defined below. In particular,

[TABLE]

where the last two lines follow by Engl, Hanke, and Neubauer (2000), Corollary 8.22 with $\nu=s/(a+s)\leq 1$ and previous computations. Similarly,

[TABLE]

The result follows from combining all estimates. ∎

Proof of Corollary A.1.1.

By the Cauchy-Schwarz inequality

[TABLE]

where the second line follows from the well-known risk bound; see, e.g., Giné and Nickl (2015), p. 403-404 under Assumption 3.2. Therefore, by Theorem A.1

[TABLE]

The proof of

[TABLE]

under Assumption 3.2 can be found in Babii and Florens (2020). ∎

Lemma B.1.1.

Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then

[TABLE]

Proof.

Similarly to the proof of Theorem A.1, decompose

[TABLE]

with

[TABLE]

We show below that $\|II_{n}+III_{n}+IV_{n}+V_{n}\|_{\infty}=o_{P}(1)$ . To that end, first since $T_{s}=TL^{-s}$

[TABLE]

where the third line follows under Assumptions 3.1 (i) and 3.3 (i); and the fourth by the same arguments as in the proof of Corollary A.1.1 under Assumptions 3.2 and 3.4 (ii).

Second,

[TABLE]

where the first equality follows since $L$ is self-adjoint; the second line by the Cauchy-Schwarz inequality since $\sup_{u}\|L^{a}f_{UZ}(u,.)\|<\infty$ under Assumption 3.3 (i) and by Assumption 3.1 (i); the third since $L^{s}\varphi=(T^{*}_{s}T_{s})^{\frac{b-s}{2(a+s)}}\psi$ for some $\psi\in L_{2}$ by Engl, Hanke, and Neubauer (2000), Corollary 8.22; the fourth by the isometry of the functional calculus; and the last since $2s=b-a$ and since $n\alpha_{n}^{2}\to 0$ under Assumption 3.4 (iii).

Next, decompose $III_{n}(u)=R_{1n}(u)+R_{2n}(u)$ with

[TABLE]

By the Cauchy-Schwarz inequality and previous computations

[TABLE]

and

[TABLE]

Therefore, under Assumption 3.4 since $\kappa>2a$

[TABLE]

Similarly, decompose $IV_{n}(u)=S_{1n}(u)+S_{2n}(u)$ with

[TABLE]

Likewise, by the Cauchy-Schwarz inequality and previous computations

[TABLE]

and

[TABLE]

where we use $\|\hat{T}-T\|_{\mathrm{op}}\lesssim_{P}\alpha_{n}^{1/2}$ and $2s=b-a$; see also the proof of Theorem A.1. Therefore, $\|IV_{n}\|_{\infty}=o_{P}(1)$.

Combining all estimates, we obtain uniformly over $u\in\mathbf{R}$

[TABLE]

Next, note that

[TABLE]

with $[\varphi\ast K_{z}](z)\triangleq\int\varphi(v)h_{n}^{-p}K_{z}\left(h_{n}^{-1}(z-v)\right)\mathrm{d}v$, whence

[TABLE]

with $[f_{ZW}\ast K_{w}](z,w)\triangleq\int f_{ZW}(z,v)h_{n}^{-q}K_{w}\left(h_{n}^{-1}(w-v)\right)\mathrm{d}v$. Using this observation, decompose equation (B.1) further

[TABLE]

with

[TABLE]

By the Cauchy-Schwarz inequality

[TABLE]

where the second line follows by the triangle inequality and Assumption 3.3 (i); the third by Assumption 3.2 (i), the Cauchy-Schwarz inequality, and since $f_{Z}$ and $f_{W}$ are uniformly bounded under Assumption 3.2 (ii); and the last by the standard bias computations under Assumptions 3.1 (ii) and 3.2, and Young’s inequality under Assumption 3.2 (ii) and (iv).

Similarly, by the Cauchy-Schwarz inequality and Assumption 3.1 (i)

[TABLE]

where the second line follows under the i.i.d. assumption; the third since $\mathbb{E}[U|W]\leq C$ under Assumption 3.2 (i); the fourth since $f_{W}$ is uniformly bounded under Assumption 3.2 (ii); and the last by the standard bias computations under Assumptions 3.1 (ii) and 3.2 (iv).

Lastly, by the Cauchy-Schwarz inequality

[TABLE]

where the second inequality follows under Assumption 3.2 (i); the third line under Assumptions 3.1, 3.2 (i)-(ii), and 3.3 (i); and the last by the isometry of the functional calculus.

Combining these estimates under Assumptions 3.3 (i) and 3.4, we obtain the result

[TABLE]

∎

B.2 Additional Monte Carlo experiments

In this section, we report the results of additional Monte Carlo experiments in which the structural function is $\varphi(x)=\exp(-x^{2}/4)$. The rest of the data-generating process is the same as in the main part of the paper.

Figure B.1 shows the distribution of the test statistics under the null hypothesis and under the two alternative hypotheses for different sample sizes. The distribution under an alternative becomes clearly distinct from the distribution under the null as the alternative moves further away from the null.
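To make the two statistics concrete, here is a minimal sketch of Kolmogorov-Smirnov and Cramér-von Mises type statistics for independence between two scalar samples. It assumes the statistics are functionals of the process $\sqrt{n}\,(\hat F_{UW}-\hat F_{U}\hat F_{W})$ evaluated on the grid of sample points, and that the residuals `u` are already available; the function name `independence_stats` is hypothetical, and the paper's exact statistic, residual estimator, and bootstrap critical values are not reproduced here.

```python
import numpy as np

def independence_stats(u, w):
    """KS- and CvM-type statistics for independence of scalar samples u and w.

    Assumed form (not taken from the paper's displays): functionals of
    sqrt(n) * (F_UW - F_U * F_W) evaluated at all pairs (u_j, w_k).
    """
    u = np.asarray(u, dtype=float)
    w = np.asarray(w, dtype=float)
    n = u.size
    # iu[i, j] = 1{u_i <= u_j}, iw[i, k] = 1{w_i <= w_k}
    iu = (u[:, None] <= u[None, :]).astype(float)
    iw = (w[:, None] <= w[None, :]).astype(float)
    # Joint empirical CDF at (u_j, w_k): average over i of the indicator products.
    joint = iu.T @ iw / n
    # Product of the marginal empirical CDFs at the same grid points.
    product = np.outer(iu.mean(axis=0), iw.mean(axis=0))
    process = np.sqrt(n) * (joint - product)
    ks = float(np.abs(process).max())    # sup-norm functional
    cvm = float((process ** 2).mean())   # averaged squared functional
    return ks, cvm
```

For example, `independence_stats(u, w)` with independently drawn `u` and `w` should typically return much smaller values than `independence_stats(u, u)`. The pairwise indicator matrices need memory of order $n^{2}$, which is fine for moderate sample sizes.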

Figure B.2 plots the power curves when the level of the test is fixed at $5\%$. The power of the test increases as the alternative hypotheses move further from the null hypothesis and as the sample size grows. The Cramér-von Mises test appears to have higher power for the class of alternatives considered.
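A point on such a power curve is typically an empirical rejection frequency across Monte Carlo replications. The sketch below assumes that a p-value is available for each replication; the function name `rejection_rate` and its inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def rejection_rate(p_values, level=0.05):
    """Empirical rejection frequency and its Monte Carlo standard error.

    p_values: one p-value per Monte Carlo replication (assumed input).
    level:    nominal level of the test.
    """
    p_values = np.asarray(p_values, dtype=float)
    rate = float(np.mean(p_values < level))
    # Binomial standard error of the estimated rejection probability.
    std_err = float(np.sqrt(rate * (1.0 - rate) / p_values.size))
    return rate, std_err
```

Evaluating this over a grid of alternatives, at a fixed sample size, would trace out one power curve; under the null it estimates the size of the test.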

Overall, the findings are largely similar to those of the experiments presented in the main part of the paper.

Bibliography

Akritas, M., and I. Van Keilegom (2001): “Non-parametric estimation of the residual distribution,” Scandinavian Journal of Statistics, 28(3), 549–567.
Almås, I. (2012): “International income inequality: measuring ppp bias by estimating Engel curves for food,” American Economic Review, 102(2), 1093–1117.
Andrews, D. (1994): “Chapter 37: Empirical process methods in econometrics,” Handbook of Econometrics, 4, 2247–2294.
Babii, A. (2020): “Honest confidence sets in nonparametric IV regression and other ill-posed models,” Econometric Theory, 36(4), 658–706.
Babii, A. (2021): “High-dimensional mixed-frequency IV regression,” arXiv preprint arXiv:2003.13478.
Babii, A., and J.-P. Florens (2020): “Is completeness necessary? Estimation in nonidentified linear models,” arXiv preprint arXiv:1709.03473.
Babii, A., and R. Kumar (2021): “Isotonic regression discontinuity designs,” Journal of Econometrics (forthcoming).
