Relaxing the Exclusion Restriction in Shift-Share Instrumental Variable Estimation
Nicolas Apfel

TL;DR
This paper introduces methods to relax the strict exclusion restriction in shift-share instrumental variable estimation, allowing for invalid shares and improving causal inference in economic studies.
Contribution
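It proposes using methods from statistical learning to relax the exclusion restriction in shift-share IV estimation by selecting invalid shares, extends one of these methods to multiple endogenous regressors, and demonstrates their application in empirical examples.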
Findings
The estimated effect of immigration on wages becomes lower and sometimes negative.
Results on Chinese import exposure and employment are mostly robust to the new methods.
The new methods reconcile some previous discrepancies in causal estimates.
Abstract
Many economic studies use shift-share instruments to estimate causal effects. Often, all shares need to fulfil an exclusion restriction, making the identifying assumption strict. This paper proposes to use methods that relax the exclusion restriction by selecting invalid shares. I apply the methods in two empirical examples: the effect of immigration on wages and of Chinese import exposure on employment. In the first application, the coefficient becomes lower and often changes sign, but this is reconcilable with arguments made in the literature. In the second application, the findings are mostly robust to the use of the new methods.
| 0 | | | | | | |
|---|---|---|---|---|---|---|
| A | 0 | 0 | 0 | 0 | 0 | 1 |
| B | 0 | 0 | 0 | 0 | 1 | 1 |
| C | 0 | 0 | 0 | 1 | 1 | 1 |
| D | 0 | 1 | 1 | 1 | 1 | 1 |
| E | 0 | 0 | 1 | 1 | 1 | 1 |
| Hansen (p) | 0.0005 | 0.001 | 0.07 | 0.21 | - | - |
| Author | Journal | Citation |
|---|---|---|
| Amior (2020) | Working Paper (WP) | “The enclave instrument’s validity depends on the exogeneity of the initial (origin-specific) migrant population shares (Goldsmith-Pinkham, Sorkin and Swift, 2018).” |
| Edo, Giesing, Öztunc, and Poutvaara (2019) | Europ. Econ. Rev. (EER) | “The identifying assumption is that the distribution of immigrants in 1968 is not correlated with voting […]. This exclusion restriction means that, for instance, local economic shocks in 1968 are not correlated with voting more than 20 years later […]. The assumption would be invalid if the initial distribution of immigrants is correlated with persistent local factors that influence future votes.” |
| Bratti and Conti (2018) | Reg. Stud. | “The main identifying assumption is that […] the between-province variation within the same NUTS-2 region in the distribution of immigrants by different nationalities in 1995 was approximately random with respect to provinces’ future innovation prospects.” |
| Aydemir and Kirdar (2017) | EER | “The validity of our instrument requires that the ratio of earlier repatriates to non-repatriates across locations be unrelated to the change in unemployment rate from 1985 to 1990 in any way other than through its effect on the number of 1989 repatriates. […] The key concern as to the validity of our instrument is that if earlier repatriates chose their locations based on economic circumstances, we could expect their location of residence in 1985 to be related to the change in the economic conditions from 1985 to 1990 in that location.” |
| Hunt (2017) | J. of Human Res. (JHR) | “The instrument will be invalid if nonimmigration shocks to high school completion are correlated with 1940 immigrant densities”. |
| Foged and Peri (2016) | AEJ: Applied | “The plausibility of the exclusion restriction is predicated on the independence of the dispersal policy from labor demand conditions.” Note: the dispersal policy determines the shares. |
| Moreno-Galbis and Tritah (2016) | EER | “For our exclusion restriction to be valid, we require the natives’ distribution within each educational group across occupations […] to be independent from immigrants’ labor supply shock.” |
| Basso and Peri (2015) | WP | “This instrument is based on the idea that the distribution of foreign born of nationality in CZ in is uncorrelated with subsequent demand shifts and productivity changes in that CZ.” |
| Bosetti, Cattaneo, and Verdolini (2015) | J. of Int. Econ. | “The underlying exclusion restriction for this instrument is that the 1991 settlement of migrants by origin is not correlated with the economic situation after 1996.” |
| Cattaneo, Fiorio, and Peri (2015) | JHR | “The assumption behind this instrument is that the distribution of immigrants of specific nationality across countries or occupations in 1991 is the result of historical settlements and past historical events.” “It should, therefore, be correlated with the share of foreign-born, but not with the region-sector specific demand shocks.” |
| Dustmann and Glitz (2015) | J. of Labor Econ. (JoLE) | “The idea is that immigrants tend to settle in areas in which other immigrants of the same country of origin have already settled earlier […] but that these historical settlement patterns are not related to current demand-induced changes in local labor supply.” “Under the plausible assumption that current regional demand-induced labor market shocks are uncorrelated with past immigrant settlement patterns, this instrument leads to estimates that have a causal interpretation.” |
| Kerr, Kerr, and Lincoln (2015) | JoLE | The authors mention a concern about the shift-share IV: “whether the initial distribution of country groups for skilled immigrants used in the interaction is correlated with something else that affects the measured outcomes.” |
| Orrenius and Zavodny (2015) | JoLE | “To be valid, the instrument requires assuming that the distribution of immigrants by country or region of origin across states 10 years ago is not correlated with shocks that affect the probability that natives in a state major in a STEM field 10 years later.” |
| Peri, Shih, and Sparber (2015) | JoLE | The IV’s “validity is based, in large part, on the assumption that the 1980 employment share of foreign STEM workers varied across cities because of factors related to the persistent agglomeration of foreign communities in some localities. These historical differences […] affected the change in the supply of foreign STEM workers but were unrelated to shocks affecting city-level native wage and employment growth. […] For example, the initial distribution of foreign STEM may be correlated with persistent city factors that influenced future labor market outcomes, resulting in omitted variables bias.” |
| D’Amuri and Peri (2014) | J. of the Europ. Econ. Assoc. (JEEA) | “The underlying assumption is that while new immigrants tend to settle where existing immigrant communities already exist, in order to exploit ethnic networks and amenities, their historical presence is unrelated to current cell-specific changes in labor demand. […] Current changes in labor demand have no correlation with the past presence of immigrants, which only affects the supply of labor and skills in that cell.” |
| Dustmann, Frattini, and Preston (2013) | Rev. of Econ. Stud. (REStud) | “We instrument the change in this ratio using two alternative but closely related instruments: the 1991 ratio of immigrants to natives for each of these regions, from the Census of Population, interacted with year dummies, and four period lags of the ratio of immigrants to natives in each region from the LFS.” Note that shares are used as IVs directly. |
| Smith (2012) | JoLE | “The exclusion restriction for this instrument requires that the composition of the immigrant population in t-1 […] affects changes in native labor market outcomes only through its effect on changes in immigrant stocks.” |
| Cortes and Tessada (2011) | AEJ: Applied | “The instrument will help in identifying the causal effect of immigration concentration on time use of native women as long as the following conditions hold: (1) The unobserved factors determining that more immigrants decided to locate in city versus city (both cities in the same region) in 1970 are not correlated with changes in the relative economic opportunities for skilled women offered by the two cities during the 1980s and 1990s.” |
| Farré, González, and Ortega (2011) | BE J. of Econ. Anal. & Pol. | “Our exogeneity assumption is that regional shocks to the demand for female skilled labor between 1999 and 2008 are uncorrelated with immigrant location patterns prior to 1991.” |
| Dustmann, Fabbri, and Preston (2005) | Economic J. | “Pre-existing immigrant concentrations are unlikely to be correlated with current economic shocks if measured with a sufficient time lag, since existing concentrations are determined not by current economic conditions, but by historic settlement patterns of previous immigrants.” |
| Ottaviano and Peri (2005) | NBER WP | “Since the instrument uses only the initial composition of foreign-born residents in a city and subsequent average immigration rates in the U.S. by nationality, it is not correlated with any city-specific factor that would affect actual immigration in the city during the decade. As a consequence it is by construction orthogonal to any city-specific shock to productivity, amenities and labor market conditions.” |
| DV | Method | SL | Migration | Battle-related deaths | One-sided violence | Non-state violence | Population | FH Civil Liberties | FH Political | FH Status | Polity | Press Freedom Status | Press Freedom Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| dlweekly | AL (HS) | 0.1 | - | x | - | - | - | - | x | - | - | - | x |
| dlweekly | AL (AR) | 0.01302 | - | x | x | x | - | x | x | x | - | x | x |
| dlweekly_hskill | AL (HS) | 0.1 | - | x | - | - | - | - | x | - | - | - | - |
| dlweekly_hskill | AL (AR) | 0.01302 | - | x | x | x | - | x | x | x | - | - | x |
| dlweekly_lskill | AL (HS) | 0.1 | - | x | - | - | - | - | x | - | - | - | x |
| dlweekly_lskill | AL (AR) | 0.01302 | - | x | x | - | x | x | x | - | - | x | x |

Note: This table reports the countries chosen as invalid in tables LABEL:tab:BP-SSIV and LABEL:tab:BP-mult-SSIV for the reanalysis of Basso and Peri (2015). The left columns display the method and the outcome variable used. x denotes a shock selected as invalid.
| Analysis | Table, Column | Countries / Excluded SIC codes |
|---|---|---|
| China Shock | ||
| AL | LABEL:tab:ADH, 3 | Broadwoven Fabric Mills, Manmade Fiber and Silk (2221) |
| CIM | LABEL:tab:ADH, 5 | Poultry Slaughtering and Processing (2015), (2221), Aluminum Foundries (3365), Ordnance and Accessories, Nec (3489), Industrial Patterns (3543), Household Audio and Video Equipment (3651), Electronic Components, Nec (3679), Measuring and Controlling Devices, Nec (3829) |
| AL AR | LABEL:tab:ADH, 6 | Food (20): 8, Textile Mill Products (22): 4, Apparel & other (23): 4, Lumber & Wood (24): 1, Furniture (25): 2, Paper (26): 2, Chemicals and allied (28): 4, Leather (31): 2, Stone, Clay, Glass, and Concrete (32): 3, Primary Metal Industries (33): 2, Fabricated Metal Prdcts, Except Machinery & Transport (34): 4, Industrial and Commercial Machinery and Computer Equipment (35): 10, Electronic, Electrical Eqpmnt & Cmpnts, Excpt Computer Eqpmnt (36): 8, Transportation Equipment (37): 5, Mesr/Anlyz/Cntrl Instrmnts; Photo/Med/Opt Gds (38): 2, Miscellaneous Manufacturing Industries (39): 2 |
| CIM AR | LABEL:tab:ADH, 7 | Food (20): 13, Tobacco (21): 1, Textile Mill Products (22): 5, Apparel & other (23): 13, Lumber & Wood (24): 6, Furniture (25): 5, Paper (26): 3, Chemicals and allied (28): 8, Petroleum Refining and Related (29): 2, Rubber (30): 3, Leather (31): 3, Stone, Clay, Glass, and Concrete (32): 6, Primary Metal Industries (33): 8, Fabricated Metal Prdcts, Except Machinery & Transport (34): 6, Industrial and Commercial Machinery and Computer Equipment (35): 16, Electronic, Electrical Eqpmnt & Cmpnts, Excpt Computer Eqpmnt (36): 12, Transportation Equipment (37): 6, Mesr/Anlyz/Cntrl Instrmnts; Photo/Med/Opt Gds (38): 8, Miscellaneous Manufacturing Industries (39): 4 |
Nicolas Apfel, School of Economics, University of Surrey
*Keywords: Causal inference, Invalid instruments, Lasso, Shift-Share instrument
JEL classification: C36, C52, F22, F66*
1 Introduction
The shift-share instrument is often used in applied economics to obtain estimates of causal effects. Its numerous applications span three decades, beginning with Bartik’s (1991) seminal paper, and include several fields, such as migration, labor and international economics, among many others. A shift-share strategy combines shares at an earlier point in time with current aggregate-level changes to create an instrumental variable (IV). For example, the share of migrants from a certain origin country is interacted with the inflow from that country. The methods proposed in this paper are not restricted to the examples mentioned here; they can be applied to a wide range of studies that use shift-share instruments.
The key assumptions and properties of shift-share instruments have been investigated only in very recent work. Centrally, in an important setting the exclusion restriction must hold for all initial shares for the estimator to be consistent. In practice, this means that a potentially large number of shares must not have a direct effect on the outcome. This assumption is very strict, because it requires the researcher to have perfect structural knowledge about all shares, which is typically unavailable. The natural question to ask, therefore, is: Is consistent estimation still possible when the exclusion restriction is violated for some but not all shares?
In this paper, my main contribution is to show how consistent shift-share estimation is possible when not all shares fulfil the exclusion restriction. The paper serves as a practitioner’s guide on how to select invalid shares in the shift-share setting, using two methods developed in statistical learning. I also extend one of these methods to allow for multiple endogenous regressors. This had not been possible so far and is a further contribution of the paper.
The proposed methods go beyond the econometric diagnostics typically applied in this setting. Rotemberg weights, proposed by Goldsmith-Pinkham, Sorkin, and Swift (2020), report the sensitivity of the estimator to misspecification of the different shares. These weights often fail to provide clear-cut guidance as to which shares should finally be included in the construction of the instrument, because they only tell the researcher how much of the bias of the entire shift-share estimator stems, in relative terms, from the bias of a single industry. Instead, the methods proposed in this paper give the researcher an estimate of the identity of valid and invalid instruments and a consistent estimate that adjusts for the absolute bias. This bears the advantage that valid instruments do not need to be discarded just because their potential invalidity could lead to bias.
Applying the new methods to the estimation of the effects of immigration and of Chinese import exposure on the US labor market illustrates that the shares selected as invalid in these applications are consistent with those discussed as problematic in the literature. So far, no way to locate invalid shares has been proposed. This paper fills this gap and provides a principled approach to share selection. To make the methods more widely accessible to practitioners, I also provide simple-to-use Stata programs.
I begin by presenting the shift-share IV and its key identifying assumption, the exclusion restriction. In the shift-share approach, there are multiple class-specific shares and shifts which are interacted to produce the final instrument. Goldsmith-Pinkham, Sorkin, and Swift (2020) show that if the exclusion restriction holds for each class-specific share, the IV estimator is consistent. That is, shares should not be correlated with the outcome variable through unobservable shocks or long-term effects. The exclusion restriction from the perspective of the shares is a sufficient condition for the consistency of the shift-share IV estimator. Instruments that fulfil the exclusion restriction are called valid, while those that do not fulfil it are called invalid. This definition of validity assumes that all instruments are related to the treatment. The exclusion restriction is very restrictive because it must hold for all classes. While the general idea behind the shift-share IV is credible, structural relationships between instruments and outcome variables are typically difficult to rule out for each single class.
In Section 3, I show how to obtain consistent estimators when many shares are invalid. To achieve consistency, invalid shares are selected using two methods: the adaptive Least absolute shrinkage and selection operator (AL) by Windmeijer, Farbmacher, Davies, and Smith (2019) and the Confidence Interval Method (CIM) by Windmeijer, Liang, Hartwig, and Bowden (2020). Both methods were developed primarily for use in Mendelian randomization, the application of instrumental variable estimation in genetic epidemiology. In these applications, genetic markers are used as IVs when estimating the effect of an exposure on a health outcome.
The key advantage of these methods is that they consistently select the shares that violate the exclusion restriction. When the majority of instruments is valid, both methods have so-called oracle properties; when only the largest group of shares fulfils the exclusion restriction (the plurality rule), the CIM still has oracle properties. This means that asymptotically the post-selection estimators perform as well as if the researcher knew the identity of the invalid IVs. Intuitively, both estimators exploit the fact that just-identified estimates, each using only one valid IV, converge to the same value. None of the existing methods allows for multiple endogenous regressors. I therefore propose a simple extension of the AL, in which the exclusion restriction becomes stricter as the number of endogenous regressors increases.
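The intuition that valid just-identified estimates agree can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions of my own: the instruments are independent standard-normal variables standing in for shares, three of nine instruments have hypothetical direct effects on the outcome, and only the median of the just-identified estimates is computed. The median is consistent when more than half of the instruments are valid; it captures the majority-rule idea behind the AL but is not the AL or CIM procedure itself.

```python
import numpy as np

def just_identified_estimates(y, x, Z):
    """IV slope using one instrument (column of Z) at a time, with an intercept."""
    Zc = Z - Z.mean(axis=0)
    return (Zc.T @ (y - y.mean())) / (Zc.T @ (x - x.mean()))

rng = np.random.default_rng(1)
L, K, beta = 20_000, 9, 1.0
pi = np.ones(K)                         # hypothetical first-stage coefficients
alpha = np.zeros(K)
alpha[:3] = [0.8, -0.6, 0.5]            # hypothetical direct effects: instruments 0-2 invalid
Z = rng.normal(size=(L, K))             # stand-ins for shares, independent for simplicity
u = rng.normal(size=L)                  # unobserved shock that makes x endogenous
x = Z @ pi + 0.8 * u + rng.normal(size=L)
y = beta * x + Z @ alpha + u + rng.normal(size=L)

beta_k = just_identified_estimates(y, x, Z)
# Valid instruments converge to beta; invalid ones to beta + alpha_k / pi_k.
beta_median = np.median(beta_k)
print(np.round(beta_k, 2), round(beta_median, 2))
```

With three of nine instruments invalid, the valid majority pins down the median. If more than half were invalid, the median would no longer be consistent, which is where a plurality-based method such as the CIM becomes relevant.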
To show the implications and generality of the presented methods in practice, I apply them to two empirical examples in Sections 4 and 5: the effect of immigration on wages in the US, as in Altonji and Card (1991) and Card (2001), and the effect of Chinese import exposure on manufacturing employment, following Autor, Dorn, and Hanson (2013). These two examples are representative of a long series of applications in international and migration economics that rely heavily on shift-share IVs.
The new methods suggest a lower effect of immigration on wages in the US. The standard estimates, which do not account for invalid shares, are positive. Using the estimators that adjust for shares selected as invalid, the estimates become smaller and often even change sign, while retaining statistical significance where the standard estimates were significant. For example, the effect on high-skilled wages changes from 0.52 to -0.53 (). Among the countries selected as invalid are many that were suspected of invalidity in the literature, such as the Philippines (Card, 2009). When using a model with lagged immigration, the standard shift-share analysis produces estimates with signs opposite to those expected. With the proposed extension, the coefficients switch back to the expected direction. Overall, the proposed methods can make a large qualitative difference and should be used as a robustness check in migration settings.
When estimating the effect of import competition on employment, a large number of instruments is selected as invalid in some settings. Notably, many of the industry classes that were discussed as problematic in Autor, Dorn, and Hanson (2013) and that are likely to affect estimates (Goldsmith-Pinkham, Sorkin, and Swift, 2020) are selected as invalid.
These two applications illustrate the value of the proposed methods. The identity of the chosen shares is consistent with economic intuition, because many of the origin countries and industries chosen as invalid are also discussed as potentially problematic in the literature. This supports the plausibility of the outcomes of the new methods. The methods also select many shares that are similar to the discussed groups but were not specifically pointed out as problematic in the literature. Results can change qualitatively when using the adjusted estimators. This shows that the methods complement economic intuition and can help in designing an appropriate shift-share instrument.
In an online Appendix, I show an extension of the method to the multiple regressor case and provide a summary and illustrations of the methods, as well as simulations. First, the simulations confirm that in applications with shares as IVs, the performance of the proposed estimators approaches that of the estimator using only valid shares as the sample size increases; this holds already for relatively small sample sizes. Second, allowing for weak instruments and stronger direct effects of the instruments on the outcome does not change the fact that the estimators quickly converge to oracle performance. In a third set of simulations, I test the performance of my extension to multiple endogenous regressors. These simulations confirm that the allowed number of invalid IVs decreases as the number of regressors increases.
This paper relates to two strands of the literature. First, recent work has brought forward two ways of motivating shift-share research designs: one justified by quasi-random shocks and one that stresses the exclusion restriction for shares only. Borusyak, Hull, and Jaravel (2020) show consistency of the shift-share estimator when shocks are quasi-randomly assigned, conditional on the shares. In the same setting, Adão, Kolesár, and Morales (2019) discuss issues in inference. Unlike these two papers, Goldsmith-Pinkham, Sorkin, and Swift (2020) put forward an interpretation of shift-share designs according to which the shares express differential exposure to common shocks. In this interpretation, identification of the causal effect relies on the exclusion restriction for shares. Which setting is appropriate depends on the economic question at hand.
This paper mainly relates to the exogenous-share setting. Still, in situations in which it is natural to think of the instrument from the perspective of random shifts and many class-specific shifts are available, these shifts can be used to create multiple shift-share IVs or can be used as IVs directly. The selection methods then select among the shifts instead of among the shares. For example, in the Autor, Dorn, and Hanson (2013) application, when imports are available for several high-income countries, the shifts can be used separately in a regression where the observations are industry-specific.
The approach that I propose is helpful when both of these approaches fail. When the shifts are not random, consistency as derived by Borusyak, Hull, and Jaravel (2020) does not follow. When the identifying assumption is motivated through the shares and the general motivation of share exogeneity still stands, but some of the class-specific shares are subject to criticism, the methods proposed here can offer useful insights.
This paper also relates to the literature that proposes machine learning methods for causal inference (e.g. Athey and Imbens, 2019). Prima facie, shift-share IV estimation does not resemble a high-dimensional problem, because there is only one instrument. Arguably, however, each of the shares can be used as a separate instrument, and the need to select invalid shares substantially increases the complexity of the problem, making machine learning methods appropriate. Mullainathan and Spiess (2017) have pointed out that, in economic research, machine learning methods lend themselves mostly to predictive tasks and less to causal inference. By using such methods to select invalid instruments, this paper provides a remedy for a commonly seen endogeneity problem that threatens the reliability of causal inference in a wide range of economic studies.
2 Shift-share instrumental variables and the exclusion restriction
In this section, I present the shift-share setup and the exclusion restriction in terms of shares. I show under which conditions the exclusion restriction is fulfilled and describe a setting in which it is plausible that some shares are valid and others invalid. The section concludes with a discussion of indications that such a setting applies.
2.1 Endogeneity problem and instrument
First, consider a linear model with a constant treatment effect $\beta$:
$$y_{lt} = \beta x_{lt} + u_{lt} + \varepsilon_{lt}, \tag{1}$$
where $l$ indicates the location and $t$ the time period. A discussion of the constant treatment effect assumption can be found later in the text. The outcome variable is denoted by $y_{lt}$, $x_{lt}$ is the treatment, $\varepsilon_{lt}$ is an idiosyncratic error term with $\mathbb{E}[x_{lt}\varepsilon_{lt}] = 0$, and $u_{lt}$ denotes unobservable shocks which might be correlated with the treatment, i.e. possibly $\mathbb{E}[x_{lt}u_{lt}] \neq 0$. For example, the outcome variable is employment growth in a certain region and year, the independent variable is growth of the immigrant share, and the unobserved shocks are labor demand shocks which might be correlated with the growth of the immigrant share. I abstract from covariates for ease of exposition.
In model 1, assume the treatment variable has the structure
$$x_{lt} = \sum_{k=1}^{K} z_{lkt}\, g_{lkt}, \tag{2}$$
where $k$ indicates a class (e.g. the industry or the origin country of migrants), $z_{lkt}$ is the class-specific share in a certain region and $g_{lkt}$ is the region-specific growth rate (or shift) of that class at time $t$. For example, $z_{lkt}$ is the share of Mexicans in California in 2020 and $g_{lkt}$ is the inflow of migrants from Mexico to California in 2020. In the migration example, these shifts and shares are available for $K$ classes, i.e. origin countries.
In many settings, $x_{lt}$ can be subject to endogeneity problems such as correlation with unobserved shocks and reverse causality. In this model, the regressor is endogenous when $\mathbb{E}[x_{lt}u_{lt}] \neq 0$. In the migration context, Mexican migrants may have chosen to settle in California precisely because of the high wages there. Part of the correlation measured by ordinary least-squares regressions would thus be due to migrant selection into regions.
To circumvent this problem, a shift-share approach replaces components of the treatment variable by shares and shifts which are presumably unrelated to changes in the outcome variable. For example, the share of Mexicans in California relative to Mexicans in the US is replaced with the same share at a base period earlier in time (say 1990), while the growth rate of Mexican immigrants in California is replaced by its equivalent at the national level. The resulting shift-share IV is
$$B_{lt} = \sum_{k=1}^{K} z_{lk0}\, g_{kt}, \tag{3}$$
where $z_{lk0}$ is the share of class $k$ in location $l$ in the base period, $g_{kt}$ is the national growth rate of class $k$ (i.e. the shift) at time $t$, and $B_{lt}$ is then used to instrument for $x_{lt}$.
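To make the construction concrete, the following Python sketch builds the shift-share instrument for a single period as a matrix product of base-period shares and national shifts, and compares OLS with the just-identified IV estimate on simulated data. The data-generating process, the share distribution and all parameter values are hypothetical and chosen only for illustration; in this design all shares are valid.

```python
import numpy as np

def shift_share_instrument(shares0, shifts):
    """Shift-share IV for one period: B_l = sum_k shares0[l, k] * shifts[k]."""
    return shares0 @ shifts

def iv_slope(y, x, b):
    """Just-identified IV slope with an intercept: cov(b, y) / cov(b, x)."""
    bc = b - b.mean()
    return (bc @ (y - y.mean())) / (bc @ (x - x.mean()))

rng = np.random.default_rng(0)
L, K, beta = 5_000, 10, 1.0
shares0 = rng.dirichlet(np.ones(K), size=L)    # hypothetical base-period shares (rows sum to 1)
shifts = np.linspace(-1.5, 1.5, K)              # hypothetical national shifts g_k
B = shift_share_instrument(shares0, shifts)

u = rng.normal(size=L)                          # unobserved shock, e.g. a local demand shock
x = 3.0 * B + 0.8 * u + rng.normal(size=L)      # treatment is correlated with u
y = beta * x + u + rng.normal(size=L)           # no direct effects of the shares

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
iv = iv_slope(y, x, B)
# In this design OLS should be biased upward and IV should be close to beta.
print(f"OLS: {ols:.2f}, shift-share IV: {iv:.2f}")
```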
2.2 Exclusion restriction
The exclusion restriction is the key identifying assumption for any instrumental variable approach. In this setting, the exclusion restriction is stated in terms of shares, as proposed by Goldsmith-Pinkham, Sorkin, and Swift (GSS, 2020). To illustrate when the exclusion restriction holds, is violated, or is partially violated, I set up a simple model. The structural equation is augmented by the shares $z_{lk0}$, with coefficients $\alpha_k$ that model their direct effects on the outcome. This is the definition of validity found in Kang, Zhang, Cai, and Small (2016).
The model becomes
$$y_{lt} = \beta x_{lt} + \sum_{k=1}^{K} z_{lk0}\,\alpha_k + u_{lt} + \varepsilon_{lt},$$
$$x_{lt} = \pi B_{lt} + v_{lt}, \tag{4}$$
$$x_{lt} = \sum_{k=1}^{K} z_{lk0}\,\pi_k + v_{lt}. \tag{5}$$
Equation 4 denotes the first stage. Relevance is given when $\pi \neq 0$. When all shares are used separately as instruments, relevance is given when $\pi_k \neq 0$ for all $k$ in equation 5. This paper focuses on the exclusion restriction. Relevance is plausible because the underlying idea of this instrument is that immigrants settle in regions where they find communities of earlier migrants from the same country of origin, for example because they rejoin family members or because a network from their country of origin eases their arrival. This is why the shift-share instrument has also been called the “network”, “enclave” or “past settlement” instrument. The higher probability of settling in regions with communities from the same origin country creates a correlation between past and present settlement, which makes the instrument relevant.
Shares might fail validity because they have a direct effect on the outcome, as measured by $\alpha_k$, but they might also need to be discarded because they are related to the outcome through the unobservable shocks $u_{lt}$. To show this, I allow for a non-zero correlation between current and past unobservable shocks:
$$u_{lt} = \rho\, u_{l0} + \eta_{lt}.$$
Now, assume that the past unobservable shocks can be written as
$$u_{l0} = \sum_{k=1}^{K} z_{lk0}\,\delta_k + \nu_{l}.$$
Then, the structural equation becomes
$$y_{lt} = \beta x_{lt} + \sum_{k=1}^{K} z_{lk0}\,(\alpha_k + \rho\,\delta_k) + e_{lt},$$
where $e_{lt} = \rho\,\nu_l + \eta_{lt} + \varepsilon_{lt}$. In order for the initial shares to be valid instruments, they should not be directly related to the outcome ($\alpha_k = 0$), and either they should not be related to the initial shocks ($\delta_k = 0$) or there should be no serial correlation between initial and current shocks ($\rho = 0$).
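To connect this decomposition to share-level selection, the probability limit of the just-identified estimator $\hat\beta_k$ that uses share $k$ alone can be derived. This is a sketch under an extra simplifying assumption that the text does not make: the shares are mutually uncorrelated and uncorrelated with the composite errors $e_{lt}$ and $v_{lt}$, with $\pi_k \neq 0$. Writing the structural equation as $y_{lt} = \beta x_{lt} + \sum_j z_{lj0}(\alpha_j + \rho\,\delta_j) + e_{lt}$ and the share-level first stage as $x_{lt} = \sum_j z_{lj0}\,\pi_j + v_{lt}$:

$$
\begin{aligned}
\operatorname{plim}\hat\beta_k
&= \frac{\operatorname{Cov}(z_{lk0}, y_{lt})}{\operatorname{Cov}(z_{lk0}, x_{lt})}
= \frac{\beta\,\pi_k \operatorname{Var}(z_{lk0}) + (\alpha_k + \rho\,\delta_k)\operatorname{Var}(z_{lk0})}{\pi_k \operatorname{Var}(z_{lk0})} \\
&= \beta + \frac{\alpha_k + \rho\,\delta_k}{\pi_k}.
\end{aligned}
$$

Under the validity conditions ($\alpha_k = 0$, and $\delta_k = 0$ or $\rho = 0$) the bias term vanishes, so all valid shares point to the same value $\beta$, while each invalid share points to $\beta$ shifted by its direct and indirect effect scaled by its first-stage coefficient.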
Next, I summarize the above and introduce the definition of share validity in the context of this model to state the exclusion restriction more easily.
**Definition 1 (Validity).** A share $z_{lk0}$ is called valid when (1) $\alpha_k = 0$ and (2) at least one of the following two conditions holds: $\delta_k = 0$ or $\rho = 0$.
In the migration example, the first part of validity means that there is no adjustment through other factors of production. The second part means that unobserved shocks are not related to initial shares and/or these shocks are not correlated over time. The strict exclusion restriction can now be stated as
**Assumption 1 (Strict exclusion restriction).** All shares are valid.
Under the strict exclusion restriction and relevance, the shift-share IV estimator is consistent (Proposition 2 in GSS). Note that when Assumption 1 is fulfilled, the shifts play no role in the validity of the instrument.
Another way to achieve consistency of the estimator is to rely on random shifts. This is the setting in Borusyak, Hull, and Jaravel (2020) and Adão, Kolesár, and Morales (2019). Which of the settings should be considered depends on the application. The methods proposed here are, however, also applicable to the random-shocks setting of Borusyak, Hull, and Jaravel (2020) when there are multiple shifts. The following sections discuss how this can be achieved.
Applications in labor and migration economics often rely on share exogeneity: they stress that past shares are not directly related to the outcome of interest and are hence valid. Twenty-one examples of this are listed in Table 2 in the Appendix; the list is not exhaustive. In these papers, the reader can find explicit statements that share exogeneity motivates the validity of the shift-share IV strategy.
2.3 Violations of the exclusion restriction
The definition of validity in the preceding section makes it clear that violations of the exclusion restriction can come from two different sources. First, a non-zero invalidates the shares. In the migration setting, Jaeger, Ruist, and Stuhler (2020) warn that there might be direct effects through general equilibrium adjustments. The concern is that the economy reacts dynamically to migrant inflows. If this is the case, there is a direct correlation between instrument and outcomes, through native labor, capital and other general equilibrium adjustment channels, invalidating the instrument. One way that this might apply is illustrated by Borjas (2003): if migrants choose to move to regions with persistently high wages and native workers choose to migrate in response to the immigration of foreign workers, then the effect of immigration is positively biased.
Second, when unobserved shocks today and in the initial period are correlated, and the initial shocks are related to the initial shares, this induces a non-zero correlation between the instrument and the error term. A violation is plausible, because serial correlation of unobservables is commonly discussed in the literature (see Table 2 and Jaeger, Ruist, and Stuhler, 2020), and initial migrants might well have been attracted by economic conditions. In principle, the bias could go in either direction, because migrants might endogenously select into regions with higher wages or into regions with lower growth potential.
The exclusion restriction is strict in the sense that it must hold for all shares. What looks like a single exclusion restriction in a just-identified model is in fact a set of exclusion restrictions. Therefore, the researcher needs to feel comfortable defending the exclusion restriction for Mexicans, Cubans, Canadians, Indians and all origin countries used when constructing the IV.
In practice, it is very difficult, if not impossible, to credibly uphold the strict exclusion restriction. While building an intuition about which shares are valid might be feasible, arguing that none of them had a long-term effect or was correlated with initial shocks is very restrictive. Thinking about which factors determined migrant settlement at an initial point in time makes clear how difficult and hypothetical such an argument is bound to be. Institutional knowledge about which origin country groups were mostly drawn to cities that were experiencing a boom at the time of settlement is typically unavailable, especially in settings in which a large number of countries of origin is used. Such detailed knowledge about the structural mechanisms at work is available for very few countries, if any.
To date, there have been no attempts to make shift-share designs robust to violations of the exclusion restriction in Assumption 1. GSS propose computing sensitivity-to-misspecification (Rotemberg) weights, which indicate by what percentage the bias of the shift-share IV estimator changes if the bias from a given share increases by one percent. The authors point out that one should argue carefully for the validity of shares associated with large weights. While these weights indicate the relative importance with which an individual invalid share contributes to the bias of the estimator, the bias can still be considerable in absolute terms, even if only shares with low weights violate the exclusion restriction. Therefore, arguing for the validity of the shares with the largest weights does not suffice to make the case for a small bias in absolute terms.
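The gap between relative and absolute bias can be seen from the algebra behind Rotemberg weights. The Python sketch below is an illustration only: the function `rotemberg_decomposition` and the simulated data are hypothetical, and controls are assumed to have been partialled out already. It checks numerically that the just-identified shift-share IV estimate equals a weighted sum of share-by-share IV estimates, with weights that sum to one. Because the weights sum to one, the estimation error is the same weighted sum of share-specific errors, so a share with a small weight but a large share-specific bias can still move the estimate considerably.

```python
import numpy as np


def rotemberg_decomposition(Z, g, x, y):
    """Split the shift-share IV estimate into share-specific parts.

    Z : (n, K) shares, g : (K,) shifts, x : (n,) exposure, y : (n,) outcome.
    Assumes controls have already been partialled out of x and y.
    Returns the shift-share IV estimate, the K just-identified IV
    estimates that use one share at a time, and the K Rotemberg weights.
    """
    b = Z @ g                      # shift-share instrument
    beta_ssiv = (b @ y) / (b @ x)  # just-identified IV with instrument b
    zx = Z.T @ x                   # first-stage moment of each share
    beta_k = (Z.T @ y) / zx        # IV estimate using share k alone
    alpha_k = g * zx / (g @ zx)    # Rotemberg weights, sum to one
    return beta_ssiv, beta_k, alpha_k


# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n, K = 500, 5
Z = rng.dirichlet(np.ones(K), size=n)
g = rng.normal(size=K)
x = Z @ g + rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

beta_ssiv, beta_k, alpha_k = rotemberg_decomposition(Z, g, x, y)
# The estimate is exactly the weighted sum of the share-specific estimates.
print(np.isclose(beta_ssiv, alpha_k @ beta_k))  # True
print(np.isclose(alpha_k.sum(), 1.0))           # True
```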
2.4 Some valid and some invalid share setting
These reasons for violations of the strict exclusion restriction indicate that in many settings one can at best hope that some, but not all, shares are valid. The share validity assumption of GSS might be credible, but not for all shares.
The migration example fits a setting with a partial violation of the exclusion restriction for the following reasons. First, when current and base-year shocks are correlated, some migrant groups might be related to labor demand shocks in the base year, while others are not. The absence of correlation with unobservable shocks is credible for some shares, because only some origin country groups might have migrated mainly for economic reasons. This is in line with Jaeger (2007), who finds that migrants with employment visas were the most responsive to economic conditions in their location choice. If the visa composition varies by origin group, then some shares might have been driven mostly by factors orthogonal to economic conditions. Jaeger (2007) also finds that the share of employment-based visas was low at the beginning of the 1970s. The increase in employment visas over the following decades implies that origin country groups are more likely to be invalid for later base periods.
Second, there are multiple sets of shares, which vary by base year. The shocks in some base years are correlated with the current shocks. The correlation is then non-zero for some base years, while it is zero for others, once it breaks down after a few decades. Third, some origin country groups might have had long-term effects on wages, while the effects of others wore off quickly. This might be the case when origin country shares that consisted mostly of people with family visas did not affect other factors of production in the long term.
In applications, the discussion of single shares as potentially problematic indicates that researchers think of a setting in which some shares are valid, while others are invalid. Another telltale sign of such a setting is when researchers report Rotemberg weights and, as a robustness exercise, exclude the shares with the highest weights. The questions behind this diagnostic are: “By how much does the bias of the estimator change if a certain share is invalid? What happens if we assume that the most influential shares are invalid and exclude them from the estimation?” These questions presuppose that some shares may be valid, while others are invalid.
The literature also provides evidence that such a some-valid-some-invalid setting is relevant in practice. Tabellini (2020) raises the concern that specific origin country shares violate the exclusion restriction, because Italian or Irish migrants could have chosen their city of residence endogenously, based on opportunities to influence the local economy and politics. Hunt (2017) and Wozniak and Murray (2012) use adapted versions of the shift-share instrument in which certain origin countries are excluded from the construction.
Arguably, the random shocks setting can be used when the strict exclusion restriction fails, but when there are few shocks or they are not as good as random, which is often the case, this alternative approach is of little help. This offers a further setting in which some IVs are valid and others invalid. Several shift-share IVs can be constructed by using various push factors of emigration as shifts, such as economic, conflict-related or civil-liberties-related variables. One might argue that economic variables at origin are the most likely to be correlated with economic conditions at destination, while political variables at origin are more likely to be unrelated to local economic outcomes at destination. This setting is not based on the validity of shares and illustrates that the methods are more widely applicable, including in the Borusyak, Hull, and Jaravel (2020) context.
3 Selection of valid IVs in shift-share estimation
In this section, I introduce how to obtain modified estimators which are robust to invalid shares. I present the general procedure, the leveraged methods and extensions of these methods.
3.1 Two-step procedure
The idea of the procedure is to preselect valid shares with the methods presented below. I first introduce some notation. Let $\mathbf{Z}_{\mathcal{V}}$ be the matrix of valid IVs, with $\mathcal{V}$ the set of valid IVs and $\hat{\mathcal{V}}$ the set of IVs selected as valid. Let $\mathbf{Z}_{\mathcal{I}}$ be the matrix of invalid IVs, with $\mathcal{I}$ the set of invalid IVs and $\hat{\mathcal{I}}$ the set of IVs selected as invalid. Further, $|\mathcal{V}|$ is the number of valid and $|\mathcal{I}|$ the number of invalid IVs.
In short, the procedure works as follows:
1. Use the adaptive Lasso or the Confidence Interval Method with $\mathbf{y}$ as outcome, $\mathbf{x}$ as exposure and the share matrix $\mathbf{Z}$ as instruments. The share matrix consists of all shares that the researcher believes to be valid. [Footnote 1: The outcome and exposure are denoted by vectors $\mathbf{y}$ and $\mathbf{x}$, while the matrix $\mathbf{Z}$ collects the share instruments. Generally, scalars are in lower case, vectors in bold lower case and matrices in bold upper case.]
2. Use the shares selected as valid, collected in the set $\hat{\mathcal{V}}$, to construct the corrected IV
[TABLE]
and estimate the model with
   (a) 2SLS or limited-information maximum likelihood (LIML) with the selected shares, or
   (b) the adjusted shift-share IV.
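Step 2 amounts to summing share times shift over the selected shares only. The Python sketch below assumes that the first-step selection is available as a boolean mask over shares (a hypothetical input, since the selection itself comes from AL or CIM). It returns both the adjusted instrument and the matrix of shares selected as invalid, which are kept as controls. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def adjusted_shift_share_iv(Z, g, valid):
    """Build the corrected shift-share IV from shares selected as valid.

    Z     : (n, K) matrix of shares
    g     : (K,) vector of shifts
    valid : length-K boolean mask, True for shares selected as valid
    Returns the adjusted instrument and the (n, K_invalid) matrix of
    shares selected as invalid, to be included as controls.
    """
    valid = np.asarray(valid, dtype=bool)
    b_adj = Z[:, valid] @ g[valid]  # sum over valid shares of share * shift
    Z_invalid = Z[:, ~valid]        # invalid shares, kept as controls
    return b_adj, Z_invalid


# Toy example: three shares, the second one selected as invalid.
Z = np.array([[0.25, 0.25, 0.50],
              [0.50, 0.25, 0.25]])
g = np.array([1.0, 2.0, 3.0])
b_adj, Z_invalid = adjusted_shift_share_iv(Z, g, [True, False, True])
print(b_adj)            # [1.75 1.25]
print(Z_invalid.shape)  # (2, 1)
```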
It is important that the shares selected as invalid are controlled for. They can only be omitted from the regression if they are uncorrelated with the valid shares, which is unlikely to be the case in practice. Consistency of the proposed method follows directly if the selection methods used in the first step consistently select the invalid shares.
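Why the invalid shares should enter as controls can be seen in a small simulation. The Python sketch below uses hypothetical parameter values: six shares, two of which have direct effects on the outcome. It compares 2SLS with all shares as instruments to the oracle estimator, which instruments with the valid shares and includes the invalid shares as exogenous controls. The helper `tsls` is a textbook 2SLS written for this illustration, not the code used in the paper.

```python
import numpy as np


def tsls(y, x, Z, W=None):
    """2SLS coefficient on one endogenous regressor x.

    Z : (n, L) excluded instruments; W : optional (n, p) exogenous controls.
    An intercept is always included as a control.
    """
    n = len(y)
    W = np.ones((n, 1)) if W is None else np.column_stack([np.ones(n), W])
    X = np.column_stack([x, W])   # second-stage regressors
    Zf = np.column_stack([Z, W])  # all exogenous variables
    X_hat = Zf @ np.linalg.lstsq(Zf, X, rcond=None)[0]  # first stage
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]     # second stage
    return coef[0]


# Hypothetical data-generating process.
rng = np.random.default_rng(1)
n, K, beta = 20_000, 6, 0.5
Z = rng.uniform(0.0, 0.2, size=(n, K))            # shares of six origin groups
pi = np.full(K, 5.0)                              # first-stage coefficients
alpha = np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0])  # direct effects: shares 0, 1 invalid
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
x = Z @ pi + v                                    # endogenous exposure
y = beta * x + Z @ alpha + u

invalid = alpha != 0
naive = tsls(y, x, Z)                                 # treats all shares as valid
oracle = tsls(y, x, Z[:, ~invalid], W=Z[:, invalid])  # controls for invalid shares
print(round(naive, 2), round(oracle, 2))
```

With these hypothetical values, the naive estimate is biased upwards by roughly 0.2, while the oracle estimate is close to the true value of 0.5. In this design the shares are drawn independently, so the bias comes from the direct effects entering the error term.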
When validity is plausible only with random shifts and there are multiple shifts, one can also run an industry-level regression with multiple shifts and select shifts instead of shares, analogously to the procedure above. Regardless of which source of validity is emphasized, the preselection of variables starts with theoretical arguments. The set of shares (or shifts) used is hence the intersection of those considered valid by the researcher and those selected as valid by the algorithm.
3.2 Relaxed exclusion restriction
In this section, I present the critical assumptions needed for identification with the two methods used in this paper: the adaptive least absolute shrinkage and selection operator (AL) of Windmeijer, Farbmacher, Davies, and Smith (2019) and the Confidence Interval Method (CIM) of Windmeijer, Liang, Hartwig, and Bowden (2020). Descriptions of the methods can be found in Appendix A and in the original papers.
The properties of the methods that are leveraged to improve shift-share estimation are the so-called “oracle properties”: consistent selection of the invalid IVs and convergence in distribution to the ideal (oracle) estimator, that is, the estimator one would use with perfect knowledge of which IVs are invalid. I describe the oracle estimator in more detail in Appendix A.5.
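Definition 2.
Oracle properties:
1. Consistent selection of invalid IVs: $\lim_{n \to \infty} P(\hat{\mathcal{I}} = \mathcal{I}) = 1$, where $\hat{\mathcal{I}}$ is the set of IVs selected as invalid and $\mathcal{I}$ the set of invalid IVs.
2. Convergence in distribution: $\sqrt{n}\,(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2_{\text{or}})$,
where $\sigma^2_{\text{or}}$ is the asymptotic variance of the oracle estimator.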
In other words, an estimator with oracle properties performs as well as if one knew the true identity of the invalid IVs. The AL has oracle properties when a majority of the IVs is valid. All IVs also need to be relevant, as noted in equation 5.
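Assumption 2.
Majority condition: more than half of the IVs are valid, that is, the number of valid IVs exceeds the number of invalid IVs.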
The Confidence Interval Method has oracle properties when the largest group of IVs is valid. The plurality condition in Windmeijer, Liang, Hartwig, and Bowden (2020) states that the group of valid IVs is larger than any other group. A group is defined as a set of IVs whose IV-specific estimands asymptotically deviate from the true effect by the same constant. For the valid group, this constant is zero. Formally, the plurality exclusion restriction is:
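Assumption 3.
Plurality exclusion restriction: the group of valid IVs is strictly larger than every group of invalid IVs whose estimands share a common non-zero deviation from the true effect.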
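The grouping idea behind CIM can be sketched in simplified form: each share yields a just-identified estimate with an interval around it, and the largest set of overlapping intervals is the candidate valid group. On a line, intervals that pairwise overlap always share a common point, so it suffices to find the point covered by the most intervals. The Python function `largest_overlapping_group`, the fixed critical value `crit` and the example numbers are hypothetical; how CIM chooses the interval width and tests the selected group is described in WLHB and Appendix A.2 and is not reproduced here.

```python
import numpy as np


def largest_overlapping_group(est, se, crit):
    """Indices of the largest group of IV-specific estimates whose
    intervals est_j +/- crit * se_j share a common point.

    Simplified illustration of the grouping step only.
    """
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    lo, hi = est - crit * se, est + crit * se
    best = np.array([], dtype=int)
    # A point covered by the most intervals can always be taken at a left endpoint.
    for p in lo:
        members = np.flatnonzero((lo <= p) & (p <= hi))
        if members.size > best.size:
            best = members
    return best


# Hypothetical just-identified estimates and standard errors for six shares.
est = [1.00, 1.02, 0.98, 2.00, 3.00, 2.05]
se = [0.02] * 6
print(largest_overlapping_group(est, se, crit=1.96))  # [0 1 2]
```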
To compare these two assumptions, consider the following example with five IVs. For three of them, the direct effect on the outcome is zero, so their IV-specific estimands equal the true effect, while the estimands of the remaining two deviate from it by two different non-zero constants. In this example, the valid group contains three IVs, while each invalid IV forms a group of size one. Clearly, the majority assumption is fulfilled. The plurality assumption is also fulfilled, because the largest group of IVs is valid. When only two IVs are valid and the third now has an estimand that deviates from the true effect by yet another constant, the majority assumption is violated, because only 2/5 of the IVs are valid, but plurality still holds, because there is one valid group of two IVs and three singleton groups. Therefore, the plurality assumption can hold even when the majority assumption is violated. Next, I discuss the choice of methods. I introduce how the two methods work in Appendices A.1 and A.2.
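The example can be checked mechanically. The Python sketch below (the function name and the numerical estimands are hypothetical, with a true effect of 1.0) groups IV-specific estimands by their deviation from the true effect and reports whether the majority and plurality conditions hold. The last call adds a case in which a group of invalid IVs is larger than the valid group, so that both conditions fail.

```python
from collections import Counter


def majority_and_plurality(estimands, beta, digits=9):
    """Check the majority and plurality conditions for IV-specific estimands.

    IVs whose estimands deviate from beta by the same constant form a group;
    the valid group is the one with deviation zero.
    """
    groups = Counter(round(b - beta, digits) for b in estimands)
    n_valid = groups.pop(0.0, 0)                   # size of the valid group
    largest_invalid = max(groups.values(), default=0)
    majority = n_valid > len(estimands) / 2
    plurality = n_valid > largest_invalid
    return majority, plurality


print(majority_and_plurality([1.0, 1.0, 1.0, 1.5, 2.0], beta=1.0))  # (True, True)
print(majority_and_plurality([1.0, 1.0, 0.7, 1.5, 2.0], beta=1.0))  # (False, True)
print(majority_and_plurality([1.0, 1.0, 1.5, 1.5, 1.5], beta=1.0))  # (False, False)
```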
3.3 Choice of methods
The procedure builds on two methods from an emerging literature that investigates IV estimation in presence of invalid IVs. The proposed methods are the only ones which combine the following four benefits.
First, they are computationally feasible. Andrews (1999) requires a search over all possible models, which is computationally infeasible when the number of IVs is moderately large. Second, they do not require a priori knowledge of an initial set of valid IVs. Caner, Han, and Lee (2018) also allow for invalid IVs, but only when a set of valid IVs is known a priori. Third, the methods do not need assumptions on the correlation between first-stage and structural parameters. Kolesár, Chetty, Friedman, Glaeser, and Imbens (2015) assume that first-stage and direct effects are uncorrelated, but in applications this assumption is rather strict. Finally, the direct effects of invalid IVs on the outcome need not be close to zero. This must be the case in Conley, Hansen, and Rossi (2012), where, in addition, prior knowledge about the possible values of the direct effects is needed. The methods used in this paper allow for arbitrarily strong direct effects. In fact, their performance even improves when the direct effects are large.
In the following, I apply the methods to two real-world examples. I first reproduce the original estimates using the standard shift-share IV, which uses all shares irrespective of their validity. I then compare these estimates with those of the adjusted estimators, using AL and CIM. In the Appendix (Section B), I also apply the methods in a Monte Carlo simulation that illustrates how they perform with weaker IVs and strong violations.
4 Example 1: The effect of immigration on wages
4.1 Setting
The first empirical application is the estimation of the effect of immigration on wages in the United States. Basso and Peri (2015) estimate the linear model [Footnote 2: I use this paper as a reference even though it is unpublished, because the number of locations is large, which is helpful since the methods I use rely on asymptotic arguments. Many of the published papers for which data are available have few observations. For example, Card (2009), which GSS use as an illustration, uses only 124 city observations.]
[TABLE]
with three time periods (1990, 2000, 2010) and 722 commuting zones. On the left-hand side, the dependent variable is, in separate regressions, one of three outcomes: the change in log weekly wages, and the change in log weekly wages of high- and of low-skilled workers. On the right-hand side, the regressor is the change in the share of immigrants in total employment, and its coefficient is the coefficient of interest. The model also includes decade fixed effects and an error term. Commuting-zone fixed effects are accounted for by first-differencing.
As discussed before, OLS estimation does not account for migrant sorting into regions: migrants might select into more prosperous or into declining regions, creating a correlation between migrant location and outcomes that cannot be attributed to the impact of immigration. To tackle this problem, a shift-share IV that combines origin-specific migrant shares in 1970 with changes in migrant populations is used. The shift-share IV is
[TABLE]
where the shares are those of immigrants from each origin country in a location in the base period 1970, with 19 origin country groups, and the shifts are the changes in the number of immigrants from each origin country. All origin countries are assumed a priori to plausibly fulfill the exclusion restriction.
The SSIV estimates are in column 1 of LABEL:tab:BP. The coefficient estimate for the change in log weekly wages of natives is 0.09, but it is insignificant. The 2SLS estimate is 0.479 and the LIML estimate is 0.568. For the change in log weekly wages of the high-skilled, the coefficients are 0.35 for SSIV, 0.519 for 2SLS and 0.672 for LIML; the coefficient is significant for 2SLS. For low-skilled workers, the effect on wages is negative (-0.66) and statistically significant in the shift-share analysis, and close to 0.1 for 2SLS and LIML. Overall, the standard estimates suggest positive effects on high-skilled wages and null or negative effects on low-skilled wages. The first-stage F-statistics are 171.3 for the overidentified models and 21.8 for the SSIV.
The possible violations in the migration context were discussed in Section 2.3. However, it is unclear whether these concerns apply to any origin country groups and, if so, to which ones. The remainder of this section shows which shares are selected as invalid and how large the adjusted estimates are.
4.2 Results
The results of applying AL and CIM to the immigration example are in Table LABEL:tab:BP. Each panel presents the results for one outcome. A list of the selected countries can be found in Table LABEL:tab:ExclCountriesBP. Overall, with the new methods, the coefficients on immigration decrease and often switch sign. The decrease tends to be stronger when CIM is used in the selection step.
When using the significance level of 0.01375 in the downward testing procedure, as proposed in WLHB, no country is selected as invalid. Thus, the adjusted estimators are identical to the original ones (col. 1). Bowsher (2002) shows that the HS test has low power when many IVs are used. For this test, a larger significance level is the more conservative choice, because the null hypothesis that all remaining shares are valid is then rejected more readily; this reverses the logic of conventional tests of coefficient significance (Roodman, 2009). Hence, a more conservative strategy is to set the threshold to a more conventional level. Increasing this threshold in the testing procedure leads to the selection of a few countries for low-skilled wages (Panel C), but does not change the results qualitatively.
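The overidentification test in the downward testing procedure can be illustrated with the homoskedastic Sargan statistic; the Hansen version replaces the variance estimate with a heteroskedasticity-robust one. The Python sketch below is a hypothetical implementation written for illustration, with the chi-squared tail probability computed from a standard recursion to avoid external dependencies. It assumes that the threshold rule is $0.1/\ln(n)$, as in WFDS; with $n = 1444$, an assumed sample size (722 commuting zones times two periods), this rule gives approximately the 0.01375 quoted above. The simulated data are hypothetical.

```python
import math

import numpy as np


def chi2_sf(stat, df):
    """P(chi2_df > stat) for integer df >= 1, via the recursion
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) exp(-x/2) / Gamma(k/2 + 1)."""
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(stat / 2)) if k == 1 else math.exp(-stat / 2)
    while k < df:
        q += (stat / 2) ** (k / 2) * math.exp(-stat / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def sargan_test(y, x, Z, W=None):
    """Sargan statistic and p-value after 2SLS with one endogenous regressor.

    Z : (n, L) excluded instruments; W : optional controls (intercept added).
    Under joint validity the statistic is asymptotically chi2 with L - 1 df.
    """
    n = len(y)
    W = np.ones((n, 1)) if W is None else np.column_stack([np.ones(n), W])
    X = np.column_stack([x, W])
    Zf = np.column_stack([Z, W])
    X_hat = Zf @ np.linalg.lstsq(Zf, X, rcond=None)[0]
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    u = y - X @ coef                                 # 2SLS residuals
    Pu = Zf @ np.linalg.lstsq(Zf, u, rcond=None)[0]  # projection on instruments
    stat = n * (Pu @ Pu) / (u @ u)
    return stat, chi2_sf(stat, Z.shape[1] - 1)


# Hypothetical data: two of six shares have direct effects on the outcome.
rng = np.random.default_rng(1)
n, K = 1444, 6
Z = rng.uniform(0.0, 0.2, size=(n, K))
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
x = Z @ np.full(K, 5.0) + v
y = 0.5 * x + Z @ np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0]) + u

threshold = 0.1 / math.log(n)  # roughly 0.01375 for n = 1444
stat, p = sargan_test(y, x, Z)
print(p < threshold)           # all shares treated as valid: rejected here
```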
One might be concerned that the IVs are weak and hence the HS test is unreliable. To address this concern, I also use the Anderson-Rubin test in the downward selection procedure. As a threshold, I use the level originally proposed in WFDS. Now, both methods select many IVs as invalid, often more than half of the shares. If a majority is invalid, AL does not have oracle properties. I therefore rely on CIM. The preferred analyses are hence those in the last column of table LABEL:tab:BP for 2SLS and LIML. All estimates decrease strongly and mostly become negative.
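For reference, the Anderson-Rubin statistic for a hypothesized value $\beta_0$ regresses $y - \beta_0 x$ on the instruments and tests whether they have any explanatory power; its null distribution does not depend on instrument strength. The Python sketch below (hypothetical function name and simulated data) computes only this basic statistic; how WFDS embed the AR test in the downward testing procedure is not reproduced here.

```python
import numpy as np


def anderson_rubin(y, x, Z, beta0):
    """Anderson-Rubin F-statistic for H0: beta = beta0, intercept as only control.

    Regresses y - beta0 * x on the instruments and tests that all instrument
    coefficients are zero. Under H0, L * AR is asymptotically chi2 with L df,
    whatever the strength of the instruments.
    """
    n, L = Z.shape
    Zf = np.column_stack([np.ones(n), Z])
    e = y - beta0 * x
    e = e - e.mean()                                 # partial out the intercept
    fitted = Zf @ np.linalg.lstsq(Zf, e, rcond=None)[0]
    explained = fitted @ fitted                      # sum of squares due to Z
    resid = (e - fitted) @ (e - fitted)
    return (explained / L) / (resid / (n - L - 1))


# Hypothetical design: ten valid instruments of moderate strength.
rng = np.random.default_rng(3)
beta, n, L = 0.5, 1000, 10
pi = np.full(L, np.sqrt(100 / (n * L)))  # concentration parameter of about 100
Z = rng.normal(size=(n, L))
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
x = Z @ pi + v
y = beta * x + u

ar_true = anderson_rubin(y, x, Z, beta0=beta)         # near 1 on average under H0
ar_false = anderson_rubin(y, x, Z, beta0=beta + 1.0)  # much larger: H0 is false
print(ar_true, ar_false)
```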
For overall weekly wages, the coefficients become negative and are statistically insignificant. For wages of the high-skilled, the estimates from the overidentified models become negative and statistically significant when selecting via CIM, in stark contrast with the original results. Interestingly, the absolute size of the coefficients is very similar to that of the original ones; only the sign has changed. For wages of the low-skilled, 15 countries are now selected as invalid by CIM. The 2SLS and LIML coefficients are negative, but none is significant. This might be because the F-statistic becomes low. For most other analyses, the F-statistics remain reasonably high.
It is reassuring that the differences between 2SLS and LIML estimates are smaller for the corrected than for the standard estimators. The remaining differences between the adjusted shift-share (SSIV), 2SLS and LIML estimates may have several sources. First, LIML approximately eliminates the finite-sample bias of 2SLS that arises with weak instruments. Second, the weighting of the just-identified IV estimates differs between SSIV and 2SLS. The weights shown in GSS depend on the shifts, whereas the weighting implicit in the 2SLS estimator, which uses only the shares, does not take shifts into account. Therefore, different results may also arise because the estimators target different weighted combinations of just-identified estimates.
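The finite-sample argument can be illustrated with a short Monte Carlo experiment. The Python sketch below writes 2SLS and LIML as k-class estimators (kappa = 1 gives 2SLS; LIML sets kappa to the smallest eigenvalue of a ratio of residual cross-product matrices) and compares them in a hypothetical design with 40 weak instruments and strong endogeneity. Function names, parameter values and the number of replications are illustrative assumptions.

```python
import numpy as np


def _residualize(A, B):
    """Residuals from least-squares regressions of the columns of B on A."""
    return B - A @ np.linalg.lstsq(A, B, rcond=None)[0]


def k_class(y, x, Z, kappa):
    """k-class estimate of beta in y = c + beta * x + u with instruments Z.

    kappa = 1 gives 2SLS; kappa = liml_kappa(y, x, Z) gives LIML.
    """
    ones = np.ones((len(y), 1))
    X = np.column_stack([x, ones])
    A = X - kappa * _residualize(np.column_stack([ones, Z]), X)  # (I - kappa M_Z) X
    return np.linalg.solve(A.T @ X, A.T @ y)[0]


def liml_kappa(y, x, Z):
    """Smallest eigenvalue of (Y' M_Z Y)^(-1) (Y' M_1 Y), with Y = [y, x]."""
    ones = np.ones((len(y), 1))
    Y = np.column_stack([y, x])
    Y1 = _residualize(ones, Y)                        # intercept partialled out
    YZ = _residualize(np.column_stack([ones, Z]), Y)  # instruments partialled out
    eig = np.linalg.eigvals(np.linalg.solve(YZ.T @ YZ, Y1.T @ Y1))
    return eig.real.min()


# Hypothetical design: 40 weak instruments, strong endogeneity.
rng = np.random.default_rng(2)
beta, n, L, reps = 0.5, 2000, 40, 25
pi = np.full(L, np.sqrt(80 / (n * L)))  # concentration parameter of about 80
err_2sls, err_liml = [], []
for _ in range(reps):
    Z = rng.normal(size=(n, L))
    u, v = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=n).T
    x = Z @ pi + v
    y = beta * x + u
    err_2sls.append(k_class(y, x, Z, 1.0) - beta)
    err_liml.append(k_class(y, x, Z, liml_kappa(y, x, Z)) - beta)

print(np.median(err_2sls), np.median(err_liml))
```

In this design, 2SLS is pulled towards OLS, with a median error of approximately $L\sigma_{uv}/(\mu^2 + L) \approx 0.27$ for $L = 40$, $\sigma_{uv} = 0.8$ and $\mu^2 \approx 80$, whereas the median LIML error is close to zero.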
4.3 Results for dynamic effects
To address the critique of Jaeger, Ruist, and Stuhler (2020), who argue that using a single regressor conflates short- and long-term effects, I include lagged immigration. The equation now becomes
[TABLE]
where the first coefficient of interest captures the contemporaneous impact and the second the lagged impact of immigration. I include an additional shift-share IV, now using 1980 as the base period. When using 2SLS or LIML, the number of shares increases to 38 (19 per base year).
The results are shown in Table LABEL:tab:BP-mult. The standard estimates always suggest positive short-run and negative long-run effects, across estimators and outcomes. This is exactly the opposite of what Jaeger, Ruist, and Stuhler (2020) expect: partial equilibrium effects should be negative, and general equilibrium adjustments are expected to offset these negative effects. These unexpected estimates might be due to the same endogeneity problems as before: the foreign-born shares in both base years might be correlated with the error term.
Using the extension of the adaptive Lasso presented in Appendices A.1 and A.6 with the Hansen-Sargan downward testing procedure, the UK and Ireland are selected as invalid only once, for wages of the high-skilled, and the adjusted estimates keep the same signs.
When using the Anderson-Rubin test instead, nine to eleven countries are selected for each outcome, with substantial overlap across outcomes. Most of the selected shares have base year 1980. I focus on 2SLS and LIML results, because the Cragg-Donald statistic of the SSIV is very low. The estimates now have the expected signs: the coefficient on contemporaneous immigration is negative and that on lagged immigration is positive.
4.4 Overidentification from multiple shifts
One might fundamentally question the share exogeneity interpretation of the shift-share design in the migration setting. In principle all shares could directly affect wages. If this is the case, the shift-share IV can be motivated via random shifts as in Borusyak, Hull, and Jaravel (2020). This offers an alternative starting point for the selection methods. This shows that the new methods are not restricted to the exogenous share world of Goldsmith-Pinkham, Sorkin, and Swift (2020).
If validity remains a concern for all shares, an additional robustness check is to motivate the exclusion restriction through quasi-random shifts, using country-of-origin push factors related to war, civil liberties or natural disasters. One example of such an approach is Llull (2017). The selection methods can then be applied to the different shifts in an overidentified model. There is reason to believe that some of these instruments are valid while others are invalid: some shifts are related to war, others to politics and still others to other country-of-origin factors. Some might be correlated with unobservable shocks that drive wages at destination, while for others it is difficult to think of a reason why that would be the case.
If multiple shifts are available, multiple shift-share instruments can be generated and used in an overidentified model. The SSIVs constructed with shocks that fulfill the conditions in Borusyak, Hull, and Jaravel (2020) can then be selected. Alternatively, one could directly use the class-level regression and use the shocks themselves as IVs; this approach is used in the international trade application. Borusyak, Hull, and Jaravel (2020) show consistency of the IV estimator while accounting for the data being non-i.i.d. This does not pose a challenge for the selection methods, because their key assumption is that a large enough group of IV-specific estimators is consistent, regardless of how that consistency is established.
Moreover, selection of a particular group does not necessarily mean that the other groups are invalid instruments. Different shocks might produce heterogeneous effects. The migration inflow due to war might be different from that due to a decrease in civil liberties, which is more likely to induce migration of the elite.
To illustrate this approach, I use eleven shifts to construct eleven shift-share instruments, still using the 1970 country shares. [Footnote 3: The shifts and their sources are as follows: migration (as in the preceding subsections); battle-related deaths, one-sided violence and non-state violence (Uppsala Conflict Data Program, www.ucdp.uu.se); population (World Development Indicators); Civil Liberties, Political Rights and Freedom House Status (Freedom House, 2020); Polity Score (Polity V project); Press Freedom Status and Press Freedom Score (Freedom House, 2017).] I directly estimate the dynamic model suggested by Jaeger et al. (2020), including lagged immigration. My findings can be found in table LABEL:tab:BP-Shifts2. In brief, the main results stay the same: with the HS testing procedure, only a few IVs are selected as invalid, while with the Anderson-Rubin test, more IVs are selected. With the AR testing procedure, all estimates turn negative but are insignificant. This could be due to a loss of relevance, as the first-stage F-statistic becomes low. The shifts selected as invalid can be found in Table 3. The IVs selected most often are those constructed with battle-related deaths, with the political freedom indicator (in every analysis) and with the Press Freedom Score (five times). Since the last two measure similar concepts, it makes sense that they form a group.
4.5 Discussion
There are five key takeaways from the application of AL and CIM to the estimation of the effect of immigration on wages. First, the results from the adjusted estimators suggest a strong positive bias of the standard estimates. This is in line with most of the literature, which expects an upward bias. This does not seem to be due to weaker instruments after selection, because the first-stage statistics are still reasonably high, and the LIML estimator, which has better finite-sample properties in the presence of weak IVs, suggests the same direction of the bias.
Second, the selection of shares is consistent with economic intuition. The selection of Central and Eastern Europe (including Russia) in almost all analyses can be explained by emigration from the Soviet Union in the 1970s and from the post-Soviet countries in the 1990s. These emigrants predominantly chose coastal cities that had large country-of-origin communities, but also cities that had experienced lasting prosperity. The conditions that made these places attractive might be correlated over time, which makes a violation of the exclusion restriction likely. The share of migrants from the UK and Ireland, which Tabellini (2020) names as an example of possibly invalid shares, is selected nine times. When shares from multiple base years are used, mostly IVs with base year 1980 are selected. This is consistent with a larger share of employment-based visas in 1980 than in 1970, when family-related visas were more common, and it is in line with the common practice in the literature of choosing longer lags to break a possible correlation between shares and current unobservable shocks.
Third, the application shows the added value of the methods relative to existing econometric tools. The Rotemberg weights proposed by GSS help to understand which share's invalidity is most likely to bias results, but they neither tell the researcher whether this bias is large in absolute terms nor indicate which countries should be excluded. A few of the countries flagged as potentially problematic by high weights have been selected. The Philippines received the highest sensitivity-to-misspecification weight in GSS. Indeed, they have been selected seven times by AL and CIM, and adjusting for them results in large qualitative changes of the coefficients.
However, even origin countries with low Rotemberg weights could account for a large part of the inconsistency of the estimators if they are invalid. Notably, many country groups that are not flagged by the top-5 Rotemberg weights, such as Central and Eastern Europe, have been selected as invalid, while some with high weights have not been selected. This shows how the new methods can guide the selection of shares beyond researchers' discretion.
Fourth, the fraction of shares selected as invalid can be high. Up to 15 of the 19 shares are selected as invalid, suggesting that the majority condition is likely to be violated. Hence, the adaptive Lasso cannot be relied upon to consistently select the valid IVs in the migration setting with one regressor. The countries selected as invalid also overlap substantially across outcome variables and methods. This is reassuring, as it indicates that the share selection is not erratic.
Fifth, when lagged immigration is included, the coefficient estimates have the expected signs only with the proposed AL adjustment. The selected origin-country shares are mostly those with base year 1980. This is consistent with Jaeger, Ruist, and Stuhler (2020), who worry that spatial adjustments might take around ten years. [Footnote 4: “Research on regional evolutions in the U.S. concludes, however, that spatial adjustments can take around a decade or more.” (p. 10)] In my analysis, I also use data from 1990. Hence, my analysis is in line with Jaeger, Ruist, and Stuhler's (2020) result that using contemporaneous and lagged immigration can help uncover the effects of immigration. It also supports the common practice of taking longer lags of the country-of-origin distribution to plausibly fulfill the exclusion restriction.
5 Example 2: The China Shock
5.1 Setting
Autor, Dorn, and Hanson (2013, ADH) study the impact of Chinese imports on employment in manufacturing in the US. The regression equation is
[TABLE]
where the left-hand side is the decadal change in manufacturing employment in a commuting zone, the coefficient on import exposure is the coefficient of interest, and import exposure is defined as the sum over industries of the share of the commuting zone's workers employed in each industry in the current period times the growth of imports from China in that industry. This regression is estimated in first differences to remove commuting-zone fixed effects and is augmented by a time dummy and a set of commuting-zone-level controls. The sample period ranges from 1990 to 2007, and there are 397 industry shares, indexed by four-digit SIC codes.
The endogeneity concern in this analysis is that both employment and imports might be correlated with unobserved shocks to US demand. To address this problem, a shift-share instrument is used that replaces the share of workers with the same share ten years earlier and uses the growth of Chinese imports in other high-income countries rather than in the US. ADH find a coefficient of -0.596. I report the same coefficient for the original estimate in row 1, column 1 (1,1) of table LABEL:tab:ADH. When all shares are used as separate instruments in a 2SLS estimation, the coefficient is -0.183, smaller in absolute value (2,1). The same model is also estimated by LIML (3,1). These are the baseline coefficients to which the adjusted estimation results will be compared.
5.2 Results
From the viewpoint of GSS, the analysis can be understood as a pooled exposure research design, in which employment shares capture local exposure to common import shocks. My results show which industry shares one should worry about when relying on share validity.
ADH discuss the possible invalidity of three specific industries: the computer industry, construction materials as well as apparel, footwear and textiles. GSS show that electronic computers display the highest sensitivity-to-misspecification weight, making the validity of this specific share especially important.
The results of the AL-adjusted IV estimators are presented in table LABEL:tab:ADH. With AL, the coefficients change little. With the default threshold of the overidentification test as in WFDS, the test does not reject the null hypothesis, so all shares can be used to construct the shift-share IV and all coefficients are identical to the original estimates (column 2 of table LABEL:tab:ADH). To account for the low power of the HS test with many instruments, I raise the threshold. Now, only one industry is selected. When this industry is excluded from the construction of the instrument (column 3), the estimate is virtually unchanged.
When applying CIM, the industry selected by AL and seven additional industries are selected as invalid. The estimates become larger in absolute terms, but the confidence interval still includes the original estimate. Hence, the results are also robust to omitting the shares selected as invalid.
When the post-selection model is estimated by LIML, the estimates are very different from 2SLS. This might indicate that many IVs are weak and that 2SLS is therefore biased. To address this, I use the Anderson-Rubin test instead of the HS-test in the downward testing procedure. Adaptive Lasso with the AR-test now selects 63 shares as invalid. When selecting with CIM and the AR-test (column 7 of table LABEL:tab:ADH), as many as 128 shares are chosen as invalid. The SSIV estimate now moves to -0.92, but the 95% confidence interval still includes the original estimate. The 2SLS estimate becomes positive, with a coefficient of 0.12, while the LIML coefficient is positive and large.
The industries chosen as invalid are listed in table 4 of the supplementary material. These industries concord with those discussed in ADH. The first industry labeled as problematic was the computer industry. Industries belonging to electronic and computer equipment (SIC 35 and 36) are among those chosen most often, constituting up to 29 percent of the shares selected as invalid. The second industry class discussed in ADH is related to construction. Up to 16 percent of the selected shares come from industries associated with construction (SIC 32, 33, 34). The third industry discussed in ADH is apparel, footwear and textiles. Again, up to 16 percent of the selected shares come from these industries (SIC 22, 23, 31).
The analysis offers information beyond the sensitivity to misspecification captured by Rotemberg weights. Games, Toys and Children Vehicles as well as Household Audio and Video Equipment have the second- and third-largest Rotemberg weights in GSS, and both are selected by AL and CIM. The industry with the fourth-largest Rotemberg weight is selected by CIM, and the one with the fifth-largest weight is selected by both methods. However, the SIC-4 industry with the largest weight is not selected, while numerous industries from the SIC-2 industry related to it and SIC codes from the food sector are selected. This shows how the proposed methods can guide share selection in expected ways, but also prompt researchers to consider the possible endogeneity of other industries.
Overall, if one believes that some shares are valid and some invalid, the selection procedures single out industries that are in line with those discussed by ADH. The results are relatively robust to the use of the new methods.
5.3 Overidentification from multiple shifts
If share exogeneity is not credible, one can also understand the exogeneity of the shift-share instrument from a random-shift perspective, as in Borusyak, Hull, and Jaravel (2020), who run an equivalent industry-level regression that uses the shift variable as an instrument. In table C4 of their paper, they use an overidentified model with all eight shifts from other high-income countries instead of the aggregated shift (in this analysis, the authors add the lagged sum of shares for each period as control variables; I follow this modeling choice to keep the results comparable). The usual concern is that imports of high-income countries are correlated with unobservable shocks. The estimates lie at roughly -0.24 for both 2SLS and LIML. It is reassuring to see that the coefficients of the two methods coincide.
I reproduce the results from this table in unreported estimations. The HS- and AR-tests do not reject at any conventional significance level, and therefore the selection algorithms select all eight shifts as valid. This robustness to the use of the new methods illustrates how in this example identification should be thought of in terms of shifts. This is in line with Borusyak, Hull, and Jaravel’s (2020) and Adão, Kolesár, and Morales’s (2019) interpretation of the exclusion restriction as shock exogeneity.
Researchers can also leverage a larger set of import shifts from even more countries, for which they are confident that most of the shifts are valid, and select shifts via AL and CIM. This example hence shows that the two methods can be helpful even in settings where the exogenous-share interpretation is controversial.
6 Conclusion
This paper proposes adjusted shift-share IV estimators which require that only a majority or plurality of shares is valid. New statistical methods are used to select invalid shares. The STATA-programs in the supplementary material offer a simple way to apply the proposed methods.
In the migration setting, many shares are chosen as invalid and the adjusted estimates are much lower than the original ones, suggesting negative effects of immigration on wages. When lagged migration is included, the adjusted coefficients have the expected signs. In this setting, the proposed methods can be helpful for retrieving a causal estimate. In the China shock example, the results are mostly robust to the use of the new methods. They are also robust when the exclusion restriction is motivated through random shifts. In simulations I show that the estimators can continue to perform well even in settings with weak instruments. Severe violations of the exclusion restriction even improve the performance of the estimators in small-sample settings.
In the appendices I provide detailed descriptions of the employed methods, briefly discuss the implications of weak IVs and heterogeneous effects, provide an extension to multiple endogenous regressors and additional simulations.
The methods are complementary to the recent literature on shift-share IVs. Before using them, it is important to think carefully about which source of validity is most plausible. The methods can be most helpful when researchers think about the shift-share instrument from the perspective of valid shares and whenever some shares are suspected to be directly correlated with the outcome variable. If there are many class-specific shocks, for example for multiple high-income countries, there is also scope for applying the methods in the quasi-random shocks setting. When I do so in this paper, my conclusions do not change qualitatively.
I conclude with two shortcomings of the methods. First, the original methods only allow for one endogenous regressor. In the Appendix, I develop an extension of AL to multiple endogenous regressors, which requires stricter, qualified majority assumptions. Simulations confirm that these stricter conditions are needed. A further improvement would be to develop methods that can be readily extended to the case of multiple endogenous regressors without making the exclusion restriction stricter.
Second, the validity of all shares might be a concern. Given that the validity of shares relies on similar arguments, it is possible that the corresponding estimates are all inconsistent in similar ways. In this case, consistent selection cannot be guaranteed. In fact, even though the majority and plurality assumptions are considerable relaxations of the strict exclusion restriction, they are still strict. Importantly, researchers should find a set of variables whose validity can be credibly defended from a theoretical point of view. The methods proposed here complement thorough theoretical considerations and do not replace them; a convincing justification of the exclusion restriction is still imperative for the new estimators.
Acknowledgements
I’d like to thank Kirill Borusyak, David Dorn, Ben Elsner, Helmut Farbmacher, Paul Goldsmith-Pinkham, Chirok Han, Ines Helm, Stephan Huber, Peter Hull, Xiaoran Liang, Jan Stuhler, Frank Windmeijer, Joachim Winter and seminar participants at LMU Munich, TU Munich, DAGStat 2019, IAAEU, Regensburg, the Ammersee Workshop, EALE 2019 and the EALE/SOLE/AASLE World Meeting for helpful comments and discussions. I also thank Gaetano Basso and Giovanni Peri for sharing their data and code. I acknowledge funding through the International Doctoral Program “Evidence-Based Economics” of the Elite Network of Bavaria.
Supplementary material
- STATA-program to run Confidence Interval Method
- Code to reproduce results in sections 4 and 5 and in Appendix B.
- Methodological appendix with details on the methods and additional simulations
Online Supporting Material for “Relaxing the Exclusion Restriction in Shift-Share Instrumental Variable Estimation”
Appendix A Methodological appendix
A.1 Adaptive Lasso
A.1.1 Method
I first present the AL for IV selection developed by Windmeijer, Farbmacher, Davies, and Smith (2019, WFDS). In short, the method selects invalid instruments and then applies 2SLS, using the instruments selected as valid as IVs and including those selected as invalid as controls.
The method consists of three parts. In the first part, an initial consistent estimator is obtained through the median of IV estimates of exactly identified models (Han, 2008). From this median estimate, a plug-in estimate of the direct effects of the instruments, $\alpha$, can be directly obtained. This estimator is consistent when Assumption 2 holds. The intuition for why the median estimate is consistent is the following: if a majority of IVs is valid, more than half of the IV estimators that use only one IV at a time are consistent. More than half of the estimates hence converge to the same value, and the median picks one of the consistent estimates. Why shouldn't the analysis stop here? Windmeijer, Farbmacher, Davies, and Smith (2019) show that even though it is consistent, the median estimator has an asymptotic bias. Also, its limiting distribution is that of an order statistic of normally distributed estimates, which is unknown, making inference on the parameter difficult. Moreover, the median estimate does not use the information contained in the additional valid IVs, missing out on efficiency gains.
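The first part can be sketched in Python as follows. This is a minimal illustration, not the STATA implementation from the supplementary material: it assumes the ratio form used by WFDS, in which the just-identified estimate for share $j$ is the ratio of its reduced-form to its first-stage OLS coefficient (both from regressions on all shares), and it abstracts from covariates. The function name `median_and_plugin` is hypothetical.

```python
import numpy as np

def median_and_plugin(Z, d, y):
    """Median of just-identified IV estimates and plug-in estimate of alpha.

    Z: (n, L) matrix of shares, d: (n,) endogenous regressor, y: (n,) outcome.
    """
    gamma_hat = np.linalg.lstsq(Z, d, rcond=None)[0]   # first stage: d on all shares
    Gamma_hat = np.linalg.lstsq(Z, y, rcond=None)[0]   # reduced form: y on all shares
    beta_j = Gamma_hat / gamma_hat                     # one estimate per share
    beta_m = np.median(beta_j)                         # consistent under the majority rule
    alpha_tilde = Gamma_hat - gamma_hat * beta_m       # plug-in estimate of direct effects
    return beta_j, beta_m, alpha_tilde

# Noiseless toy example: five shares, the last two have direct effects on y.
rng = np.random.default_rng(0)
Z = rng.uniform(0.0, 0.1, size=(500, 5))
gamma = np.full(5, 0.6)
alpha = np.array([0.0, 0.0, 0.0, 0.2, 0.3])
beta = -1.0
d = Z @ gamma
y = beta * d + Z @ alpha
beta_j, beta_m, alpha_tilde = median_and_plugin(Z, d, y)
```

Without noise, three of the five ratios equal $\beta=-1$, so the median recovers it and the plug-in vector equals the direct effects. With sampling noise, the entries for valid shares are only close to zero, which is why the penalized second step follows.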
In the second part, the AL uses the initial consistent estimates as weights. The AL minimization problem is
[TABLE]
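where the instrument matrix enters through its linear projection on the subspace orthogonal to the first-stage fitted values of the endogenous regressor, and the penalty weights are based on the initial consistent estimate of $\alpha$, directly obtained from the median estimate. For a given value of the penalty parameter $\lambda$, some entries of the estimated $\alpha$ are shrunk to zero. In the third part, IVs associated with an estimated direct effect of zero are used as valid instruments in the 2SLS estimation, and those associated with non-zero coefficients are used as controls.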
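The penalized step can be sketched with cyclic coordinate descent. The sketch assumes the objective has the standard adaptive Lasso form of Zou (2006), $\frac{1}{2}\|\mathbf{r}-\tilde{\mathbf{Z}}\mathbf{a}\|_2^2+\lambda\sum_j w_j|a_j|$ with weights $w_j=1/|\tilde\alpha_j|^{\nu}$, for a generic response $\mathbf{r}$ and projected instrument matrix $\tilde{\mathbf{Z}}$. Coordinate descent is swapped in here as a simple stand-in for the LARS algorithm used in the paper; the function name and the default $\nu=1$ are hypothetical choices.

```python
import numpy as np

def adaptive_lasso_cd(Z_tilde, r, alpha_init, lam, nu=1.0, max_iter=1000, tol=1e-12):
    """Minimise 0.5*||r - Z_tilde @ a||^2 + lam * sum_j |a_j| / |alpha_init_j|**nu.

    Assumes all entries of alpha_init are non-zero (plug-in estimates rarely are exactly 0).
    """
    Z_tilde = np.asarray(Z_tilde, dtype=float)
    weights = 1.0 / np.abs(np.asarray(alpha_init, dtype=float)) ** nu
    col_sq = (Z_tilde ** 2).sum(axis=0)
    a = np.zeros(Z_tilde.shape[1])
    resid = np.asarray(r, dtype=float).copy()     # residual r - Z_tilde @ a with a = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(len(a)):
            resid += Z_tilde[:, j] * a[j]         # remove coordinate j from the fit
            rho = Z_tilde[:, j] @ resid
            # weighted soft-thresholding update for coordinate j
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            resid -= Z_tilde[:, j] * new          # add the updated coordinate back
            max_change = max(max_change, abs(new - a[j]))
            a[j] = new
        if max_change < tol:
            break
    return a

# Orthonormal design: the solution is weighted soft-thresholding of Z'r.
a_hat = adaptive_lasso_cd(np.eye(3), [3.0, 0.5, -3.0], alpha_init=[1.0, 1.0, 0.5], lam=1.0)
selected_invalid = a_hat != 0   # non-zero entries flag shares selected as invalid
```

Here `a_hat` is `[2, 0, -1]`: a small initial estimate (0.5 for the third share) doubles its penalty weight, yet the entry survives because its signal is large. Raising `lam` sets all entries to zero, which corresponds to the start of the selection path where no share is chosen as invalid.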
In summary, the estimation procedure works as follows:
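1. Compile the vector of exactly identified estimates.
2. Take the median of these estimates.
3. Calculate the plug-in estimate of the direct effects $\alpha$ from the median.
4. Estimate $\alpha$ by adaptive Lasso.
5. Estimate the model by 2SLS, with IVs chosen as invalid included as controls and those chosen as valid used as IVs.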
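Step 5 can be sketched as a plain 2SLS in which the shares selected as invalid enter the outcome equation as controls, so that only the valid shares act as excluded instruments. This is a minimal sketch that abstracts from covariates and from aggregating shares with shifts; the function name `tsls_with_controls` is hypothetical.

```python
import numpy as np

def tsls_with_controls(y, d, Z, invalid):
    """2SLS of y on d; columns of Z flagged in `invalid` are included as controls.

    All columns of Z form the instrument set: the controls instrument themselves,
    and the remaining (valid) shares are the excluded instruments.
    """
    invalid = np.asarray(invalid, dtype=bool)
    X = np.column_stack([d, Z[:, invalid]])            # endogenous regressor + controls
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    coef = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    return coef[0], coef[1:]                           # (beta_hat, coefficients on controls)

# Noiseless toy example with beta = 0: shares 4 and 5 have direct effects on y.
rng = np.random.default_rng(1)
Z = rng.uniform(0.0, 0.1, size=(500, 5))
d = Z @ np.full(5, 0.6)
y = 0.0 * d + Z @ np.array([0.0, 0.0, 0.0, 0.2, 0.3])
beta_adj, alpha_adj = tsls_with_controls(y, d, Z, [False, False, False, True, True])
beta_naive, _ = tsls_with_controls(y, d, Z, [False] * 5)
```

Treating all shares as valid (`beta_naive`) picks up the direct effects and is biased upwards, whereas controlling for the two invalid shares recovers $\beta=0$ and their direct effects.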
The key requirement for the AL to have oracle properties is that it uses an initial consistent estimate. Hence, the key assumption for the AL to have oracle properties is also that the majority exclusion restriction holds (according to Theorem 1 and Proposition 3 in WFDS, the adaptive Lasso has oracle properties when the majority exclusion restriction is fulfilled). Compared to the strict exclusion restriction, this assumption is already a considerable relaxation. Moreover, whether the AL has oracle properties does not depend on differences in the strength of the instruments or on their correlation.
For any given sample, the AL depends on the value of the tuning parameter $\lambda$. Windmeijer, Farbmacher, Davies, and Smith (2019) show that selection under the majority condition is consistent for any sequence of penalty parameters satisfying rate conditions in the number of observations $n$. They propose to use the Hansen-Sargan statistic as a stopping rule, testing at each AL step, following Andrews's (1999) downward testing procedure. For each selection of IVs on the AL path, the J-statistic is evaluated once. The authors first specify a significance level, following Belloni, Chen, Chernozhukov, and Hansen (2012). Successively smaller sets of valid IVs are then tested, and the procedure stops as soon as the p-value of the test exceeds the prespecified significance level.
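The stopping rule can be sketched as follows. As assumptions, the sketch uses the homoskedastic Sargan statistic in place of the Hansen J statistic, abstracts from covariates, and computes chi-square tail probabilities with a small pure-Python helper; all function names are hypothetical. The path is a list of boolean masks of shares selected as invalid, ordered from fewest to most invalid shares, and testing stops at the first model whose p-value exceeds the chosen significance level.

```python
import math
import numpy as np

def chi2_sf(x, k):
    """P(X > x) for X ~ chi-square with integer k >= 1 degrees of freedom."""
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if k % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < k / 2.0:
        if y > 0:   # recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def sargan_pvalue(y, d, Z, invalid):
    """Sargan test after 2SLS with the `invalid` shares included as controls."""
    invalid = np.asarray(invalid, dtype=bool)
    X = np.column_stack([d, Z[:, invalid]])
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    u = y - X @ np.linalg.solve(X_hat.T @ X, X_hat.T @ y)   # 2SLS residuals
    u_proj = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]         # P_Z u
    J = len(y) * (u_proj @ u_proj) / (u @ u)
    df = int((~invalid).sum()) - 1                            # overidentifying restrictions
    return chi2_sf(J, df) if df >= 1 else float("nan")

def downward_test(pvalues, level):
    """Index of the first model on the path whose p-value exceeds `level`."""
    for i, p in enumerate(pvalues):
        if p > level:
            return i
    return None

# Hypothetical usage with simulated data: shares 4 and 5 have direct effects.
rng = np.random.default_rng(2)
n = 2000
Z = rng.uniform(0.0, 0.1, size=(n, 5))
d = Z @ np.full(5, 6.0) + rng.normal(size=n)
y = d + Z @ np.array([0.0, 0.0, 0.0, 5.0, 5.0]) + rng.normal(size=n)
path = [np.array([False] * 5),
        np.array([False, False, False, True, False]),
        np.array([False, False, False, True, True])]
pvals = [sargan_pvalue(y, d, Z, mask) for mask in path]
chosen = downward_test(pvals, level=0.05)
```

Applied to the p-values of the illustrative path in table 1 with a hypothetical level of 0.05, `downward_test([0.0005, 0.001, 0.07, 0.21], 0.05)` returns 2, i.e. the third model on the path.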
In applications, one might be interested in including additional endogenous regressors. However, the original methods do not allow for this. Therefore, I propose a simple extension of the AL, using an extension of the median estimator. In the multiple endogenous regressor case, each just-identified estimate uses $P$ IVs, and the estimates are stacked into a matrix. In this extension, I take the median along each dimension. This gives the following vector of marginal medians:
[TABLE]
The key assumption for this estimator to be consistent is that the fraction of exactly identified models that use only valid instruments exceeds 0.5. I call this the “qualified majority condition”, because the condition on the number of valid instruments becomes stricter than the simple majority assumption.
Assumption 4 (Qualified majority condition).
For the simulations and applications, it is important to know how many valid IVs the new condition requires. If we fix the number of candidate instruments at 20, the minimum number of valid instruments needed to achieve an initial consistent estimate is 15 for $P=2$ and 17 for $P=3$. For and , it is 28. These conditions are much stricter than the simple majority condition in WFDS. A more detailed presentation of the method and a discussion of the multiple regressor setting can be found in Appendix A.6.
A.1.2 Illustration
In order to illustrate the proposed methods, consider the following toy example. Assume that one is interested in the effect of the change in immigration on the change in wages. The parameter of interest is $\beta$ in equation 3, the true effect of immigrants on wages. There are $n$ cross-sectional observations from commuting zones in the US. The matrix of instruments is composed of employment shares of immigrants in 1970. There are five origin countries, A, B, C, D and E, and hence the instrument matrix has five columns. For countries A, B and C, the exclusion restriction is fulfilled and there is no direct effect of immigration on wages. However, the shares of countries D and E are invalid, because the base-period settlement of migrants from these countries was driven by economic factors and hence was not as-good-as-random. The five countries are chosen by a researcher who overlooks the non-random settlement of migrants from countries D and E.
With AL, the first step is to use each share separately to obtain a vector of just-identified estimates. The IV estimates are illustrated in figure 1. The dotted vertical line shows the true effect $\beta$, which is identified with the valid country shares A, B, and C. The IV estimators that use the shares of countries D and E, in contrast, are inconsistent. The just-identified estimates for country shares A to E are illustrated by the grey circles. Because the majority exclusion restriction is fulfilled, the median of these estimates is, in large samples, always one of the estimates that use a valid share. From this median, a consistent estimate of $\alpha$ can be obtained by
[TABLE]
These estimates do not clearly indicate which of the shares are valid and which are invalid, but they can be plugged into the AL minimization problem in equation 15. For a specific $\lambda$, this gives a new vector of estimates of $\alpha$ in which some entries are equal to zero. Zero entries denote shares selected as valid and non-zero entries shares selected as invalid: if, for example, the entries for the first to third shares are zero, countries A, B and C are selected as valid, while D and E are chosen as invalid. The shares selected as invalid finally enter the 2SLS, LIML or SSIV estimator directly as controls, while the valid shares serve as instruments.
Beginning with a very large value of $\lambda$, more weight is given to the penalty term of the adaptive Lasso minimization problem, all elements of the estimated $\alpha$ are set to zero, and no country is chosen as invalid. Adaptive Lasso is estimated via the LARS algorithm of Efron, Hastie, Johnstone, and Tibshirani (2004), which produces a path of models, illustrated in table 1. The lower $\lambda$ gets, the more shares are chosen as invalid.
The Hansen-J over-identification test is performed for each model along this path, and the corresponding p-value is compared with the pre-specified significance level at each step. In this illustrative example, the Hansen test correctly leads to the oracle model in column 3, with countries A, B and C selected as valid and D and E as invalid, because this is the first model along the path for which the p-value of the test exceeds the significance level.
A.2 Confidence Interval Method
A.2.1 Method
Next, I present the Confidence Interval Method, which relies on an exclusion restriction which is relaxed even further. Windmeijer, Liang, Hartwig, and Bowden (WLHB, 2020) propose the Confidence Interval Method (CIM) which builds on Guo, Kang, Cai, and Small’s (2018) two-stage hard-thresholding. The idea behind the CIM is that IV estimators which use one valid instrument at a time converge to the same value. The method works as follows:
1. Set a critical value and calculate a confidence interval (CI) for each just-identified estimate.
2. Order the confidence intervals by their lower endpoints.
3. Compare the lower endpoint of each CI to the upper endpoints of all CIs preceding it in this order. If the upper endpoint of a preceding interval is larger than the lower endpoint of the current interval, the two estimates are said to belong to the same group. Counting these overlaps from each instrument's CI downwards gives the size of the group associated with that instrument.
4. The largest group corresponds to the set of estimates with the most overlapping confidence intervals.
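Steps 1 to 4 can be sketched for a fixed critical value as follows. This is an illustrative sketch, not the WLHB code: it takes the just-identified estimates and their standard errors as given, breaks ties in favour of the first largest group found, and the function name `cim_largest_group` is hypothetical.

```python
import numpy as np

def cim_largest_group(est, se, crit):
    """Indices of the largest group of overlapping CIs est +/- crit * se."""
    est, se = np.asarray(est, dtype=float), np.asarray(se, dtype=float)
    lower, upper = est - crit * se, est + crit * se        # step 1: one CI per estimate
    order = np.argsort(lower, kind="stable")               # step 2: sort by lower endpoint
    best = []
    for pos, j in enumerate(order):
        # step 3: CIs up to and including j whose upper endpoint reaches lower_j
        group = [k for k in order[: pos + 1] if upper[k] >= lower[j]]
        if len(group) > len(best):                         # step 4: keep the largest group
            best = group
    return sorted(int(k) for k in best)

# Toy version of the example: shares A, B, C agree; D and E do not.
est = [0.0, 0.1, -0.1, 1.0, 1.5]
valid = cim_largest_group(est, se=[0.1] * 5, crit=1.96)   # [0, 1, 2], i.e. A, B and C
```

With a very large critical value all intervals overlap and every share is kept; repeating the call over a decreasing grid of critical values yields a selection path to which the Hansen-Sargan stopping rule can be applied.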
Again, the result depends on the value of a tuning parameter; this time, the critical value plays this role. For large critical values, all CIs overlap and hence all instruments are chosen as valid. Gradually decreasing the critical value narrows the confidence intervals and decreases the number of IVs chosen as valid. Analogously to AL, the HS test is used in a testing procedure to choose the critical value. WLHB formally prove consistency of the Hansen-Sargan testing procedure.
The exclusion restriction needed for AL is stricter than the exclusion restriction needed for CIM. Why should a researcher then rely on AL? First, AL is more established than CIM. Second, the path of AL is more stable than that of CIM. A cautious researcher should use both methods and compare their results. If many IVs are chosen as invalid, suggesting a violation of the majority assumption, one should concentrate on the results of CIM.
A.2.2 Illustration
The Confidence Interval Method computes confidence intervals for each just-identified estimate and orders them by the lower endpoint of the CIs. Then it counts how often a given CI overlaps with the preceding CIs. The largest overlapping group is chosen as valid. Figure 2 illustrates the method in the toy example presented above. The second comparison, from the confidence interval of country C downwards, already selects the largest group, which includes countries A, B, and C. The other groups include only one or two IVs. In practice, the algorithm starts with a large critical value for the CIs, so that all confidence intervals overlap. Decreasing this critical value produces a selection path analogous to the AL selection path illustrated in table 1, and the algorithm stops when the p-value of the Hansen-J test exceeds the prespecified significance level.
The two-stage least squares estimators adjusted by AL or CIM now estimate the following model:
[TABLE]
[TABLE]
where the shares of people from countries D and E are additionally used as controls and the rest of the country shares are used as instruments.
The illustration of the methods in figures 1 and 2 can also be used to understand the case of heterogeneous effects. Imagine that countries A, B and C still constitute the largest group, but the shares of countries D and E are now valid IVs that consistently estimate a different effect. The Confidence Interval Method treats countries D and E as reflecting a different causal mechanism and reports the effect identified by countries A, B and C.
A.3 Discussion: Weak instruments and heterogeneous effects
There are two main concerns with the proposed methods: weak instruments and heterogeneous effects. In applications, weak IV bias is a concern, because each share is used individually to predict the endogenous variable. There are three answers to this problem. First, the limited information maximum likelihood (LIML) estimator, which has better finite-sample properties than two-stage least squares, can be used after selection. Second, I also use the Anderson-Rubin test statistic in the downward testing procedure, because it can detect violations of the exclusion restriction in the presence of weak instruments. Third, in simulations I show that the algorithms also perform well when IVs are weak. Still, these are only practical answers to weak instruments. How to address weak instruments in valid IV selection is left for future research.
The methods presented in this paper rely on the constant treatment effect assumption. Goldsmith-Pinkham, Sorkin, and Swift (2020) allow for location-specific coefficients. When all the first-stage coefficients have the same sign (monotonicity), each class-specific IV estimates a weighted average of location-specific effects,
[TABLE]
according to Proposition 4.1 in Goldsmith-Pinkham, Sorkin, and Swift (2020). The definition of the weights can be found there. In a LATE framework, the shift-share estimator is therefore a weighted combination of class-specific weighted averages, and these class-specific estimands might differ.
Can the selection methods proposed before deal with heterogeneous effects? One practical solution is to perform the analogous analyses for clusters of industries, inside of which the constant treatment effect assumption is believed to hold. If the constant effect assumption is more generally violated, another approach is needed. Combining heterogeneous treatment effects with valid IV selection is also the object of current research and an exhaustive answer is beyond the scope of this paper.
A.4 Additional notation
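For a matrix $\mathbf{A}$, the projection matrix is $\mathbf{P}_{\mathbf{A}}=\mathbf{A}(\mathbf{A}'\mathbf{A})^{-1}\mathbf{A}'$ and the annihilator matrix is $\mathbf{M}_{\mathbf{A}}=\mathbf{I}_n-\mathbf{P}_{\mathbf{A}}$; $\hat{\mathbf{D}}=\mathbf{P}_{\mathbf{Z}}\mathbf{D}$ are the fitted values from running a regression of the endogenous regressors on the instruments.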
A.5 Oracle estimator
In this subsection I introduce the oracle model as defined in Windmeijer, Liang, Hartwig, and Bowden (2020).
[TABLE]
where denotes the true set of invalid IVs. The oracle estimator then is
[TABLE]
where . Under Assumptions 6, 7, 9 (below) and as
[TABLE]
where .
A.6 Details on adaptive Lasso with multiple endogenous regressors
In this Appendix, I provide additional details on the AL as presented in section A.1.
A.6.1 Model and assumptions
With multiple endogenous regressors, the first stages are
[TABLE]
There are now $P$ endogenous regressors, collected in the matrix $\mathbf{D}$, and the candidate instrument vectors are collected in the matrix $\mathbf{Z}$. The error term of the outcome equation and the error terms of the $P$ first stages can be correlated; this covariance measures the endogeneity of the regressors in $\mathbf{D}$. The coefficient vector of interest is the $P\times 1$ vector $\boldsymbol{\beta}$, and each regressor has its own vector of first-stage coefficients. Define the number of valid instruments as the number of instruments with a zero direct effect. If we estimate all possible exactly identified models, taking $P$ of the candidate instruments at a time, we obtain one estimate vector for each combination of $P$ instruments.
Considering the model from the main text (equations 3 and 4) and following WFDS, I assume the following:
Assumption 6 (Existence of first stages).
For all possible values of , let be the combinations of the -row of for all . Then assume
[TABLE]
Assumption 7 (Rank condition).
[TABLE]
Assumption 8 (Error structure).
[TABLE]
[TABLE]
Assumption 9.
[TABLE]
Assumptions 4, 6 and 7 imply Assumption 3 in Han (2008), except that here we abstract from covariates. Assumptions 7, 8 and 9 ensure the existence of the Hansen-Sargan test statistic. Assumption 6 is the standard relevance assumption for each just-identified model. It implies that the first-stage coefficients are all non-zero and is crucial for identification.
As shown below, the vector of marginal medians of the matrix of estimates from exactly identified models is a consistent estimator of $\boldsymbol{\beta}$. The proof closely follows the one implied by Han (2008).
A.6.2 Additional examples of the qualified majority
For the two-regressor case, P = 2, it follows:
Corollary 1.
When $P=2$ and the number of candidate instruments grows to infinity, the fraction of valid IVs has to exceed 70.7107% ($1/\sqrt{2}$).
Proof:
[TABLE]
Multiply both sides in 19 with and take
. ∎
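A sketch of the argument behind Corollary 1, assuming Assumption 4 takes the form of the qualified majority condition described in Section A.1 and writing $g$ for the number of valid instruments and $L$ for the number of candidate instruments (notation assumed here):

```latex
\frac{\binom{g}{2}}{\binom{L}{2}}
  = \frac{g(g-1)}{L(L-1)} > \frac{1}{2}
\;\Longleftrightarrow\;
\frac{g}{L}\cdot\frac{g-1}{L} > \frac{1}{2}\cdot\frac{L-1}{L}.

% Let L -> infinity with g/L -> phi; then (g-1)/L -> phi and (L-1)/L -> 1:
\lim_{L\to\infty}\frac{g}{L}=\phi
\;\Longrightarrow\;
\phi^{2} \ge \frac{1}{2}
\;\Longrightarrow\;
\phi \ge \frac{1}{\sqrt{2}} \approx 0.707107 .
```

In the limit, therefore, about 70.71% of the candidate instruments must be valid for a majority of the exactly identified models to use only valid instruments.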
To get a better sense of how the majority condition changes with the number of endogenous regressors in the model, we fix the number of candidate instruments at 100. The minimal number of valid instruments needed for oracle properties is found by plugging candidate values into Assumption 4 and sequentially decreasing the number of invalid instruments until the condition is fulfilled. For $P=3$ the fraction of valid instruments needs to be at least 80 percent; for $P=4$, 85; for $P=5$, 88; for $P=6$, 90; and for two larger values of $P$, 96 and 99, respectively. The relationship between the number of invalid IVs and the fraction of models using only valid IVs is visualized in Figure 3.
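The search described above can be sketched in a few lines, assuming Assumption 4 requires that more than half of the $\binom{L}{P}$ exactly identified models use only valid instruments, with $L$ candidate instruments and $P$ endogenous regressors; `min_valid` is a hypothetical helper name.

```python
from math import comb

def min_valid(L, P):
    """Smallest number of valid IVs g such that C(g, P) / C(L, P) > 1/2."""
    total = comb(L, P)
    for g in range(P, L + 1):
        if 2 * comb(g, P) > total:   # integer arithmetic avoids rounding issues
            return g
    return None

print([min_valid(20, P) for P in (1, 2, 3)])          # [11, 15, 17]
print([min_valid(100, P) for P in (2, 3, 4, 5, 6)])   # [71, 80, 85, 88, 90]
```

For $P=1$ the condition reduces to the simple majority rule, and for $P=2$ the ratio `min_valid(L, 2) / L` approaches $1/\sqrt{2}$ as $L$ grows, in line with Corollary 1. When $P=L$, the function returns $L$: all instruments must be valid.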
With growing number of endogenous regressors, the number of exogenous instruments needed also grows. In the limiting case, when the number of endogenous regressors is maximal, it is equal to the number of candidate instruments. In this case, we have only one exactly identified model. The method does not provide any benefit then, because it cannot discard any instrument, as the model would then be underidentified. Assumption 4 now becomes a consensus rule: all of the instruments need to be valid.
A.6.3 Consistency of the vector of marginal medians
I rewrite estimators as in the proof of Proposition A1 in Windmeijer, Liang, Hartwig, and Bowden (2020). First, partition the matrix , where is a and is a matrix. is the equivalent partition of the matrix of first-stage coefficients. , then , with
[TABLE]
Each of the estimators can be written as
[TABLE]
Note that the part in parentheses is equal to
[TABLE]
The resulting vector will be
[TABLE]
Hence, the inconsistency of is .
The matrix of estimates consists of the stacked row vectors of the exactly identified estimates. The easiest way to compute a median for this multidimensional set of points is to take the median along each dimension separately (I have considered alternatives to the vector of marginal medians, such as the Tukey median or Oja's simplex median, but have not implemented them because they are not continuous functions). The resulting $P$-dimensional vector is called the vector of marginal medians:
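The vector of marginal medians can be sketched as follows. The sketch assumes that each exactly identified estimate uses $P$ instruments as excluded IVs and the remaining instruments as controls; by the Frisch-Waugh-Lovell theorem this equals the inverse of the corresponding $P\times P$ block of first-stage OLS coefficients times the corresponding reduced-form coefficients. Covariates are omitted and the function name `marginal_median` is hypothetical.

```python
from itertools import combinations
import numpy as np

def marginal_median(Z, D, y):
    """Vector of marginal medians of all exactly identified estimates.

    Z: (n, L) instruments, D: (n, P) endogenous regressors, y: (n,) outcome.
    """
    P = D.shape[1]
    G = np.linalg.lstsq(Z, D, rcond=None)[0]       # (L, P) first-stage coefficients
    Gam = np.linalg.lstsq(Z, y, rcond=None)[0]     # (L,) reduced-form coefficients
    ests = np.vstack([
        np.linalg.solve(G[list(S), :], Gam[list(S)])   # IVs in S, all others as controls
        for S in combinations(range(Z.shape[1]), P)
    ])                                             # (C(L, P), P) matrix of estimates
    return np.median(ests, axis=0), ests           # median along each dimension

# Noiseless example: L = 6, P = 2, only the sixth IV is invalid,
# so 10 of the 15 exactly identified models use valid IVs only.
rng = np.random.default_rng(3)
Z = rng.uniform(0.0, 0.1, size=(300, 6))
D = Z @ rng.uniform(0.2, 1.0, size=(6, 2))
beta = np.array([0.5, -1.0])
y = D @ beta + Z @ np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.4])
beta_mm, ests = marginal_median(Z, D, y)
```

Because the qualified majority condition holds in this example (10 of 15 combinations exclude the invalid IV), each column median falls on a valid combination and `beta_mm` equals $\boldsymbol{\beta}$ up to rounding.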
[TABLE]
Proposition 1.
Under Assumptions 4, 6 and 7, the vector of marginal medians converges in probability to $\boldsymbol{\beta}$, the true $P$-vector of coefficients.
Proof: Consider the 2SLS estimator of a given coefficient when a certain combination of IVs is used. Then
[TABLE]
hence
[TABLE]
where the additional term is the corresponding entry of the inconsistency vector, i.e. the inconsistency from using at least one invalid instrument in the given set of IVs. For a given regressor, this vector collects the inconsistency terms from all possible combinations of IVs, with one entry per combination.
The median function is continuous. Hence, by the continuous mapping theorem (CMT):
[TABLE]
Under Assumption 4, the inconsistency term is zero for a majority of the entries in each column. Then
[TABLE]
A.6.4 Adaptive Lasso
Given the consistent estimator of $\boldsymbol{\beta}$, the following procedure is analogous to WFDS. For a detailed overview of the original AL method, refer to WFDS and Zou (2006). A consistent estimator of $\boldsymbol{\alpha}$ can be obtained by rewriting the moment conditions:
[TABLE]
The adaptive Lasso (AL) estimator proposed by Zou (2006) can be used, where the penalty term is weighted by the initial consistent estimate:
[TABLE]
The AL estimator is then retrieved from the conditions $E(\hat{\mathbf{D}}'\mathbf{u})=E\big(\hat{\mathbf{D}}'(\mathbf{y}-\hat{\mathbf{Z}}\boldsymbol{\alpha}-\hat{\mathbf{D}}\boldsymbol{\beta})\big)=0$:
[TABLE]
Appendix B Monte Carlo simulations
The following simulations illustrate that AL and CIM select the invalid shares as invalid, when the sample is reasonably large. The adjusted estimators perform better in terms of bias as compared to the standard 2SLS estimator, which uses all the shares, irrespective of their validity. Moreover, I show that even with weak instruments good selection results are possible and that strong violations of the exclusion restriction even improve their performance. Finally, I show that with multiple regressors a higher fraction of IVs needs to be valid for the AL to have oracle properties.
B.1 Single regressor
B.1.1 Setup
The data are generated from the model in equations 3 and 4. The coefficient of interest is set to 0 and the elements of the first-stage coefficient vector to 0.6. To create ten shares, I draw a matrix with ten columns from a uniform distribution between 0 and 0.1. In the first simulation, the vector of direct effects of the IVs is set such that three of the ten shares are invalid, so that a majority of shift-share products is still valid. In a second simulation, the vector is set such that only the largest group of IVs, four out of ten, is valid. The error terms are distributed as
[TABLE]
I vary the sample size from 400 to 6000 observations in steps of 400. The number of repetitions is 100 for each parameter combination, each time drawing the errors anew.
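A data-generating process in the spirit of this setup can be sketched as follows. The exact error distribution and direct-effect values are not reproduced here, so the bivariate normal errors with correlation `rho` and the values in `alpha` below are hypothetical placeholders. The sketch uses the shares themselves as instruments and computes the median absolute deviation (MAD) of the standard 2SLS estimator, which treats all shares as valid, across replications.

```python
import numpy as np

def simulate(n, alpha, beta=0.0, gamma=0.6, rho=0.5, rng=None):
    """One sample with len(alpha) uniform(0, 0.1) shares, first stage and outcome."""
    rng = np.random.default_rng() if rng is None else rng
    L = len(alpha)
    Z = rng.uniform(0.0, 0.1, size=(n, L))
    cov = np.array([[1.0, rho], [rho, 1.0]])              # hypothetical error covariance
    u, eps = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    d = Z @ np.full(L, gamma) + eps                        # first stage
    y = beta * d + Z @ np.asarray(alpha, dtype=float) + u  # outcome with direct effects
    return y, d, Z

def standard_tsls(y, d, Z):
    """2SLS treating all shares as valid instruments."""
    d_hat = Z @ np.linalg.lstsq(Z, d, rcond=None)[0]
    return (d_hat @ y) / (d_hat @ d)

# Majority setting: seven valid shares, three with (hypothetical) direct effects.
alpha = [0.0] * 7 + [0.2] * 3
rng = np.random.default_rng(4)
y0, d0, Z0 = simulate(400, alpha, rng=rng)
estimates = np.array([standard_tsls(*simulate(400, alpha, rng=rng)) for _ in range(100)])
mad = np.median(np.abs(estimates - 0.0))   # median absolute deviation from beta = 0
```

Replacing `standard_tsls` with an estimator that first selects invalid shares, and looping over sample sizes, would reproduce the structure of the comparison reported below.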
B.1.2 Results
The two baseline estimators are the standard and the oracle 2SLS. The standard 2SLS is the estimator for which all shares are assumed to be valid, and the oracle shift-share estimator is the ideal estimator for which only valid shares are used as IVs and invalid ones are used as controls. I compare these two estimators with the 2SLS estimator adjusted by AL and CIM. I report the median absolute deviation (MAD), the mean number of IVs selected as invalid, the frequency with which all invalid IVs have been selected as invalid and the F-statistic of the oracle model.
The main result is that the adjusted 2SLS estimators outperform the standard 2SLS estimator in the majority setting for all sample sizes and approach the performance of the ideal estimator with growing sample size. In the plurality setting only the CIM approaches the performance of the oracle estimator. This is in line with the predictions: When the majority rule holds, both methods should work well and AL is expected to fail when only the plurality rule holds but majority does not.
The graphs on the left of Figure 4 depict the setting in which a majority (seven out of ten) of shares is valid. The MAD of the standard estimator is about 0.1 and does not decrease as the sample size grows. The MAD of the oracle 2SLS estimator is below 0.05 and gets closer to zero as $n$ increases. Notably, the 2SLS estimators adjusted by AL (dotted line) and by CIM (dash-dotted line) attain the same MAD as the oracle estimator (grey solid line) already at a moderate sample size. The second and third rows show why: from a sample size of 1000 upwards, only three shares are chosen as invalid on average, and in 100% of the cases the chosen IVs are the invalid ones. Whenever one appeals to the asymptotic properties of an estimator, it is legitimate to ask how large the sample size needs to be. In the setting of this simulation, the adjusted estimators attain oracle properties already at relatively small sample sizes.
The graphs on the right of Figure 4 show the results from the setting in which only a plurality of shares is valid (four out of ten). Here, the estimator adjusted by AL fares only slightly better than the standard estimator, with a MAD of about 0.15 that does not decrease monotonically with growing sample size. On average, about eight shares are selected as invalid as the sample gets large, but in no Monte Carlo replication are all invalid IVs correctly selected as invalid. This can be seen from the right graph in row three: the dotted line and the solid black line coincide. Selection via the CIM achieves a performance equal to that of the oracle 2SLS from a sample size of about 2000 on. As the sample size grows, the average number of IVs selected as invalid reaches six, and the frequency with which all invalid IVs are selected as invalid approaches one.
B.2 Weak IVs and strong violations
Next, I ask how weak instruments and stronger direct effects change the performance of the estimators. In the bottom graphs of Figure 4, the F-statistic grows steadily with the sample size but is below 10 for many sample sizes. Still, even when F is below 5, the performance of AL and CIM is close to that of the oracle estimator. This suggests that there are settings in which weak IVs do not severely affect the selection performance.
For the additional simulations in Figures 5 and 6, I use the same setup as before but change the first-stage parameter and the direct-effect coefficients. The following table summarizes the changes. Rows 1 and 5 replicate the original setup, already reported in Table 4.
[TABLE]
The results of these simulations are shown in Figures 5 and 6. In a nutshell, lowering the maximal F-statistic changes the results only slightly, and increasing the entries of the direct-effect vector, which makes the direct effects stronger, even improves performance in small samples.
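The first-stage F-statistic used to gauge instrument strength can be computed from two least-squares fits. The following is a minimal sketch of the conventional (non-robust) F-test that all excluded-instrument coefficients are zero, assuming a single endogenous regressor and no controls besides an intercept. It is not necessarily the exact statistic plotted in the figures, and the simulated design is hypothetical.

```python
import numpy as np


def first_stage_f(x, Z):
    """Non-robust first-stage F statistic for H0: all excluded-IV coefficients are zero.

    x: (n,) endogenous regressor; Z: (n, k) excluded instruments.
    Both the restricted and the unrestricted first stage include an intercept.
    """
    x = np.asarray(x, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n, k = Z.shape
    X_u = np.column_stack([np.ones(n), Z])          # unrestricted regressors
    coef, *_ = np.linalg.lstsq(X_u, x, rcond=None)
    rss_u = np.sum((x - X_u @ coef) ** 2)          # unrestricted RSS
    rss_r = np.sum((x - x.mean()) ** 2)            # intercept-only RSS
    return ((rss_r - rss_u) / k) / (rss_u / (n - k - 1))


# Hypothetical design: ten shares drawn from U(0, 1) and a weak first stage
rng = np.random.default_rng(0)
n, k = 500, 10
Z = rng.uniform(size=(n, k))
x = Z @ np.full(k, 0.1) + rng.normal(size=n)
print(first_stage_f(x, Z))
```

Because the statistic scales with the sample size for a fixed first-stage coefficient, it grows steadily as the sample size increases, as in Figure 4.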
B.3 Multiple regressors
In a further simulation, I compare the performance of the adaptive Lasso that uses the vector of marginal medians as initial estimator with that of the standard and the oracle 2SLS estimators, now with multiple endogenous regressors. I use 20 IVs in total and gradually increase the number of invalid instruments from zero to 18, or to 17 in one of the settings. I expect the performance of the AL to break down at the predicted cutoffs, that is, when there are more than ten, five and three invalid IVs with one, two and three endogenous regressors, respectively.
In the simulations with multiple endogenous regressors, the following is the same in all settings: there are 100 iterations per number of invalid IVs, the sample size is held fixed, and there are 20 IVs. The coefficient of interest is set to 0. is a matrix drawn from a uniform distribution between 0 and 1. The vector indicating invalidity, in equation 3, is set to , when one IV is invalid, to when two IVs are invalid, etc.
The error term from the structural equation is normally distributed with . The first-stage error terms are constructed as , obtaining the variances , and the covariances .
For the case with one regressor, the first-stage coefficient vector is a vector of ones. With two endogenous regressors, the matrix of first-stage coefficients consists of two vectors. The first-stage coefficient vectors for the first and second regressors are and . With three endogenous regressors, the matrix is composed of the two just-mentioned vectors and an additional one: .
The results are shown in Figure 7. With one endogenous regressor, as expected, the performance of the adaptive Lasso with the vector of marginal medians breaks down as soon as the majority rule is violated, i.e. when more than ten out of twenty IVs are invalid. With two endogenous regressors, the performance of the adaptive Lasso diverges from that of the oracle estimator when there are seven or more invalid IVs. The consistency of the vector of marginal medians is guaranteed only as long as five or fewer IVs are invalid. In this particular setting, however, the AL continues to perform well when six IVs are invalid. This can happen when some just-identified estimates that use invalid IVs end up above and some below the consistent estimates, so that the qualified majority assumption is stricter than needed. With three endogenous regressors, the AL again performs as well as the oracle as long as five or fewer instruments are invalid, although consistency is only guaranteed with up to three invalid IVs.
Overall, the results suggest that the extension of the adaptive Lasso also has oracle properties as long as sufficiently many instruments are valid, with the qualified majority condition becoming stricter as the number of endogenous regressors grows.
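One way to see where the cutoffs of five and three can come from is sketched below. It assumes, as the cutoffs suggest but the text does not state explicitly, that with P endogenous regressors every subset of P instruments yields one just-identified estimate, computed from the reduced-form and first-stage coefficients, and that the initial estimator is their coordinate-wise (marginal) median. The median is consistent if more than half of these estimates use only valid IVs. With one regressor, this reduces to the familiar ratios of reduced-form to first-stage coefficients and to the majority rule. All function names are hypothetical, and no controls or intercept are included.

```python
from itertools import combinations
from math import comb

import numpy as np


def marginal_median_estimator(y, X, Z):
    """Coordinate-wise median of all just-identified estimates.

    y: (n,) outcome; X: (n, P) endogenous regressors; Z: (n, L) instruments.
    Each subset S of P instruments gives Pi_S^{-1} Gamma_S, where Gamma and Pi
    are the reduced-form and first-stage coefficients on all instruments.
    """
    y, X, Z = (np.asarray(a, dtype=float) for a in (y, X, Z))
    Gamma = np.linalg.lstsq(Z, y, rcond=None)[0]   # reduced form, shape (L,)
    Pi = np.linalg.lstsq(Z, X, rcond=None)[0]      # first stage, shape (L, P)
    P = Pi.shape[1]
    estimates = [
        np.linalg.solve(Pi[list(S)], Gamma[list(S)])
        for S in combinations(range(Z.shape[1]), P)
    ]
    return np.median(np.array(estimates), axis=0)


def share_of_valid_subsets(L, n_invalid, P):
    """Fraction of P-subsets of L instruments that contain only valid IVs."""
    return comb(L - n_invalid, P) / comb(L, P)


# Largest number of invalid IVs (out of 20) for which more than half of the
# just-identified estimates use only valid IVs, for P = 1, 2, 3
for P in (1, 2, 3):
    print(P, max(k for k in range(21) if share_of_valid_subsets(20, k, P) > 0.5))
# 1 9
# 2 5
# 3 3
```

Under this assumption, the counts reproduce the cutoffs of five and three invalid IVs for two and three regressors, and they show why the condition tightens as P grows.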
Appendix C Figures
Appendix D Tables
Appendix E Documentation for ado-files
The following subsections document the Stata programs ssada and sscim. Preliminaries: save ssada and sscim to your personal ado-directory.
E.1 Adaptive Lasso shift-share
The Stata implementation of AL shift-share is called ssada. The code is a variation of sivreg (Farbmacher, 2017) and shares its syntax. The differences are that ssada allows analytical weights, lets the user choose the standard errors reported in the post-AL regression, and returns locals containing the valid and invalid IVs. The package moremata is required.
Syntax
```
ssada depvar indepvars [if] [in] [aw], ///
    endog(varlist) exog(varlist) id(string) [options]
```
Options

**Required:**

| Option | Description |
|---|---|
| endog | Endogenous variable |
| exog | Exogenous controls as well as the potentially endogenous shares used to construct the shift-share IV. The shares should be named with a common stub, e.g. stub1, stub2, stub3, … |
| id | String denoting the variables by which observations are identified |

**Optional:**

| Option | Description |
|---|---|
| aw | Analytical weights (aweight) are allowed |
| vce | Specifies the type of standard error reported, as in the standard vce() option. Default is robust |
| c | Real number specifying the significance level for the Andrews-Hansen stopping rule. Default is 0.1 |
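The option c sets the significance level of the stopping rule. The following is a minimal sketch of how such a rule can work: candidate sets of invalid IVs are visited from fewest to most, and the first set for which the overidentification test is not rejected at level c is kept. This is an illustration under stated assumptions, not the ssada code. A homoskedastic Sargan statistic stands in for the Hansen J statistic, the path of candidate sets is supplied by hand rather than generated by the adaptive Lasso, variables are assumed to be demeaned (no intercept), and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def sargan_pvalue(y, x, Z, invalid):
    """p-value of a homoskedastic Sargan test when the IVs in `invalid` are
    added as regressors and the remaining IVs act as excluded instruments.

    y, x: (n,) outcome and endogenous regressor; Z: (n, L) instruments.
    """
    n, L = Z.shape
    idx = np.array(sorted(invalid), dtype=int)
    W = np.column_stack([x, Z[:, idx]])               # x plus IVs treated as invalid
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]  # first-stage fitted values
    coef = np.linalg.lstsq(W_hat, y, rcond=None)[0]   # 2SLS coefficients
    u = y - W @ coef                                  # 2SLS residuals
    u_fit = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]  # projection of u on all IVs
    stat = n * (u_fit @ u_fit) / (u @ u)
    return chi2.sf(stat, L - W.shape[1])


def first_accepted(y, x, Z, path, c=0.1):
    """Walk along candidate sets of invalid IVs (fewest first) and return the
    first set for which the overidentification test is not rejected at level c."""
    for invalid in path:
        if sargan_pvalue(y, x, Z, invalid) > c:
            return set(invalid)
    return None


# Hypothetical simulated data: IV 0 has a direct effect on y, the others are valid
rng = np.random.default_rng(42)
n, L = 2000, 5
Z = rng.normal(size=(n, L))
v = rng.normal(size=n)
x = Z @ np.ones(L) + v
y = 0.5 * x + 1.0 * Z[:, 0] + 0.5 * v + rng.normal(size=n)
print(first_accepted(y, x, Z, path=[set(), {0}, {0, 1}], c=0.1))
```

Stopping at the first non-rejected set favours models that treat as few shares as possible as invalid.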
Stored results
ssada stores the results of the last post-AL ivregress command in e(). In addition, the following macros are returned:
[TABLE]
E.2 Confidence Interval Method shift-share
The setup of sscim is adopted from ssada (a large part of lines 1-150 of the code).
Syntax
```
sscim depvar indepvars [if] [in] [aw], ///
    endog(varlist) exog(varlist) ssstub(string) [options]
```
Options

**Required:**

| Option | Description |
|---|---|
| endog | Endogenous variable |
| exog | Exogenous controls as well as the potentially endogenous shares used to construct the shift-share IV |
| ssstub | Stub of the share variables. The shares should be named with this stub, e.g. stub1, stub2, stub3, … |

**Optional:**

| Option | Description |
|---|---|
| aw | Analytical weights (aweight) are allowed |
| vce | Specifies the type of standard error reported, as in the vce() option of ivreg2, e.g. vce("cluster(state)"). Default is "robust" |
| c | Real number specifying the significance level for the Andrews-Hansen stopping rule. Default is 0.1 |
| psif | Specifies the initial critical value with which the confidence intervals are calculated. Set this larger than one if more than one IV is already selected as invalid at the start. Default is 1 |
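The core step of the confidence interval method is to find, for a given critical value, the largest group of shares whose just-identified confidence intervals overlap. The shares outside that group are treated as invalid. The following is a minimal sketch of this grouping step only. The argument psi plays the role of the critical value that the psif option initialises, but how sscim updates the critical value and combines it with the Andrews-Hansen stopping rule is not reproduced here. The function name and the input values are hypothetical.

```python
def largest_overlapping_group(estimates, std_errors, psi):
    """Return indices of the largest set of IVs whose intervals
    [b_j - psi * se_j, b_j + psi * se_j] share a common point."""
    intervals = [(b - psi * s, b + psi * s) for b, s in zip(estimates, std_errors)]
    best = []
    # In one dimension, intervals have a common point iff the largest lower bound
    # lies inside all of them, so it suffices to try every lower bound.
    for point, _ in intervals:
        group = [j for j, (lo, hi) in enumerate(intervals) if lo <= point <= hi]
        if len(group) > len(best):
            best = group
    return best


# Hypothetical just-identified estimates: three agree, two are outlying
estimates = [0.0, 0.05, -0.03, 1.0, 1.1]
std_errors = [0.05] * 5
valid = largest_overlapping_group(estimates, std_errors, psi=1.96)
invalid = sorted(set(range(len(estimates))) - set(valid))
print(valid, invalid)  # [0, 1, 2] [3, 4]
```

A larger critical value widens every interval and therefore tends to merge groups, so fewer shares are selected as invalid.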
Stored results
sscim stores the results of the last post-CIM ivregress command in e(), as well as the following macros:
[TABLE]
Post-estimation
For both ssada and sscim, the same post-estimation commands as for ivregress are available.
References
- Adão, R., M. Kolesár, and E. Morales. 2019. “Shift-Share Designs: Theory and Inference”. The Quarterly Journal of Economics 134 (4), 1949–2010.
- Altonji, J. G. and D. Card. 1991. “The Effects of Immigration on the Labor Market Outcomes of Less-Skilled Natives”. In Immigration, Trade, and the Labor Market, pp. 201–234. University of Chicago Press.
- Amior, M. 2020. “The Contribution of Immigration to Local Labor Market Adjustment”.
- Andrews, D. W. 1999. “Consistent Moment Selection Procedures for Generalized Method of Moments Estimation”. Econometrica 67 (3), 543–563.
- Athey, S. and G. W. Imbens. 2019. “Machine Learning Methods That Economists Should Know About”. Annual Review of Economics 11, 685–725.
- Autor, D. H., D. Dorn, and G. H. Hanson. 2013. “The China Syndrome: Local Labor Market Effects of Import Competition in the United States”. American Economic Review 103 (6), 2121–68.
- Aydemir, A. B. and M. G. Kirdar. 2017. “Quasi-Experimental Impact Estimates of Immigrant Labor Supply Shocks: The Role of Treatment and Comparison Group Matching and Relative Skill Composition”. European Economic Review 98, 282–315.
- Bartik, T. J. 1991. “Who Benefits from State and Local Economic Development Policies?”. W.E. Upjohn Institute for Employment Research.
