Selective Instrumental Variable Regression

Shu Shen is an assistant professor of economics. Her project, which seeks to advance the application (and improve the statistical precision) of an important econometric tool in social science research, was awarded an ISS Individual Research Grant in 2015. She provided this update in February 2016.

What motivated you to pursue this project?

Researchers in the social sciences, including economists, are particularly interested in uncovering causal relationships between social actions and economic phenomena, such as the wealth and health returns to an additional year of education. Such questions are of vital importance but are hard to answer, because many confounding factors occur at the same time as the social actions themselves. In addition, economists and other social scientists generally do not have the luxury of conducting randomized experiments in their research.

Instrumental variable (IV) regression is an important econometric tool that helps researchers estimate the causal impact of a self-determined variable (say, years of education) on an economic outcome (say, wages or health). The method relies on researchers finding instruments that are correlated with the independent variable but affect the economic outcome only through their relationship with that variable.
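As a rough illustration of this idea (a minimal sketch on simulated data, not part of the project itself), the Python snippet below compares an OLS slope with a simple IV slope. Everything in it is an assumption made for the example: the data-generating process, the variable names, the true effect of 0.5, and the use of a single instrument with no other covariates, in which case 2SLS reduces to the ratio cov(z, y) / cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """IV slope with one instrument z; equals 2SLS in this just-identified case."""
    return cov(z, y) / cov(z, x)


# Hypothetical data-generating process, chosen only for illustration.
rng = random.Random(0)
true_beta = 0.5          # assumed causal effect of x on y
n = 20000
z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)           # instrument, e.g. a policy change
    confounder = rng.gauss(0, 1)   # unobserved factor driving both x and y
    xi = zi + confounder + rng.gauss(0, 1)
    yi = true_beta * xi + confounder + rng.gauss(0, 1)
    z.append(zi)
    x.append(xi)
    y.append(yi)

ols_est = ols_slope(x, y)    # contaminated by the confounder
iv_est = iv_slope(z, x, y)   # uses only the variation in x induced by z
print(f"true effect: {true_beta}, OLS: {ols_est:.3f}, IV: {iv_est:.3f}")
```

In this setup the instrument is strong by construction, so the IV estimate should land near the assumed true effect while the OLS estimate is pulled away from it by the confounder.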

One practical problem with using natural experiments as instruments in IV regression analysis is that the instrument sometimes has only a weak relationship with the self-determined independent variable (e.g., years of schooling). Weak instruments are problematic because they not only bias two-stage least squares (2SLS) estimators toward ordinary least squares (OLS) estimators but also invalidate classical inferential procedures.

In this project, we investigate how utilizing potential heterogeneity in the correlation between the instrument and the self-determined independent variable can help improve the statistical precision in IV regression.

How has it progressed since you received an ISS Individual Research Grant?

We have refined our theoretical method and have been applying it to study the return to compulsory schooling, using changes in the minimum school-leaving age in the United States as the instrument.

What notable or surprising findings can you share at this point?

Applied researchers sometimes restrict IV regression to subsamples with a strong first stage to help alleviate the weak IV problem when there is first-stage heterogeneity. We formally show that such a naïve selection method, based on the subsample first-stage correlation, is invalid and tends to generate excessive rejections in hypothesis testing.

Intuitively, because the naïve method uses subsamples with large t-statistics on the instrument in first-stage regressions, it picks out not only subsamples with (true) strong first-stage effects but also subsamples with large correlations between the first-stage error term and the instrument. Because the error terms in the first stage and the second stage are correlated, selecting subsamples on the basis of the first-stage t-statistic results in violations of the exclusion restriction and in over-rejection in second-stage significance tests.
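The intuition above can be checked with a small simulation. The Python sketch below is a toy design of our own for illustration, not the project's method or its Monte Carlo setup: the number of subgroups, the first-stage coefficients (0.2 in half of the subgroups, 0 in the rest), the error correlation of 0.8, the selection cutoff of 1.5, and all function names are assumptions. The true second-stage effect is set to zero, so a valid 5% test should reject about 5% of the time.

```python
import math
import random


def first_stage_t(z, x):
    """t-statistic on z in a no-intercept regression of x on z."""
    szz = sum(zi * zi for zi in z)
    pi_hat = sum(zi * xi for zi, xi in zip(z, x)) / szz
    rss = sum((xi - pi_hat * zi) ** 2 for zi, xi in zip(z, x))
    return pi_hat / math.sqrt(rss / (len(z) - 1) / szz)


def iv_t_stat(z, x, y):
    """Conventional t-statistic for H0: beta = 0 in a just-identified IV fit."""
    szx = sum(zi * xi for zi, xi in zip(z, x))
    szy = sum(zi * yi for zi, yi in zip(z, y))
    szz = sum(zi * zi for zi in z)
    beta = szy / szx
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (len(y) - 1) * szz) / abs(szx)
    return beta / se


def pool(groups):
    """Concatenate (z, x, y) triples from several subgroups."""
    z = [v for g in groups for v in g[0]]
    x = [v for g in groups for v in g[1]]
    y = [v for g in groups for v in g[2]]
    return z, x, y


def simulate(reps=300, n_groups=20, n=50, rho=0.8, cutoff=1.5, seed=0):
    """Rejection rates of a nominal 5% test: full sample vs. naive selection."""
    rng = random.Random(seed)
    full_rej = naive_rej = naive_runs = 0
    for _ in range(reps):
        data = []
        for g in range(n_groups):
            pi = 0.2 if g % 2 == 0 else 0.0   # assumed first-stage heterogeneity
            z, x, y = [], [], []
            for _ in range(n):
                zi = rng.gauss(0, 1)
                v = rng.gauss(0, 1)                                    # first-stage error
                u = rho * v + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)  # correlated second-stage error
                z.append(zi)
                x.append(pi * zi + v)
                y.append(u)                   # true beta is 0, so y is pure error
            data.append((z, x, y))
        if abs(iv_t_stat(*pool(data))) > 1.96:
            full_rej += 1
        # Naive rule: keep only subgroups whose first-stage t-statistic looks strong.
        keep = [d for d in data if first_stage_t(d[0], d[1]) > cutoff]
        if keep:
            naive_runs += 1
            if abs(iv_t_stat(*pool(keep))) > 1.96:
                naive_rej += 1
    return full_rej / reps, naive_rej / max(naive_runs, 1)


full_rate, naive_rate = simulate()
print(f"full-sample rejection rate: {full_rate:.3f}")
print(f"naive-selection rejection rate: {naive_rate:.3f}")
```

Under these assumptions, the subgroups that pass the cutoff are disproportionately those in which the first-stage error happens to line up with the instrument, so the naive rejection rate should come out well above the nominal 5% level and above the full-sample rate.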

What is the next step?

We are currently conducting large-scale Monte Carlo simulations to compare the proposed method with existing methods.

Learn more about Shu Shen at her faculty webpage.