# IV regressions without instruments (technical)

Arthur Lewbel published a very interesting paper in 2012 in the Journal of Business & Economic Statistics (ungated version here). The paper attracted quite a bit of attention because it lays out a method for running two-stage least squares regressions (in order to identify causal effects) without the need for an outside instrumental variable. Consider a triangular model

$(1) \quad Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}Y_2 + \epsilon_1$

$(2) \quad Y_2 = \beta_{02} + \beta_{12}X_1 + \epsilon_2$

and

$\epsilon_1 = \alpha_1 U + V_1$

$\epsilon_2 = \alpha_2 U + V_2$

The common factor $U$ (think of the textbook example of unobserved ability in a wage regression) creates a correlation between the errors that leads to an endogeneity problem when estimating (1). You can see that there is no exclusion restriction available in equation (2), because $X_1$ appears in both equations. Nevertheless, it is possible to estimate the parameters in (1) consistently when the following two assumptions are fulfilled:

$(A1) \quad Cov(Z, \epsilon_2^2) \neq 0$

$(A2) \quad Cov(Z, \epsilon_1 \cdot \epsilon_2) = 0$

$Z$ is an observed random vector, which can be (but doesn’t have to be) a subset of the regressor vector $X$. (A2) places restrictions on the covariance matrix of the model errors which are satisfied in the above case of a common unobserved factor. In addition, the method requires heteroskedasticity in $\epsilon_2$ (in both $\epsilon_1$ and $\epsilon_2$ for non-triangular models), which arises frequently in applied work.
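To make the mechanics concrete, here is a minimal sketch of the estimator for the linear triangular model (1)-(2). It is not Lewbel's code or the ivreg2h implementation, only a plain 2SLS with the generated instrument $(Z - \bar{Z})\hat{\epsilon}_2$ and $Z = X_1$. The data-generating process is my own assumption (standard normal $X_1$, $U$, $V_1$, and a $V_2$ whose scale is $\exp(0.5 X_1)$, so that (A1) and (A2) hold), and the function names `ols`, `lewbel_2sls` and `simulate_linear` are hypothetical. Only point estimates are computed; standard errors from the manual second stage would not be valid.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def lewbel_2sls(y1, y2, x1, z):
    """Point estimates (b01, b11, b21) of equation (1) via generated instruments."""
    n = len(y1)
    X = np.column_stack([np.ones(n), x1])      # constant and exogenous regressor
    e2_hat = y2 - X @ ols(X, y2)               # residual of equation (2)
    w = (z - z.mean()) * e2_hat                # generated instrument
    inst = np.column_stack([X, w])
    y2_fit = inst @ ols(inst, y2)              # first stage
    return ols(np.column_stack([X, y2_fit]), y1)  # second stage


def simulate_linear(n, seed):
    """Assumed design: all true coefficients are one, alpha_1 = alpha_2 = 1."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    u = rng.normal(size=n)                     # common factor
    eps1 = u + rng.normal(size=n)
    eps2 = u + np.exp(0.5 * x1) * rng.normal(size=n)  # heteroskedastic in X1
    y2 = 1 + x1 + eps2
    y1 = 1 + x1 + y2 + eps1
    return y1, y2, x1


if __name__ == "__main__":
    y1, y2, x1 = simulate_linear(n=20000, seed=0)
    b_ols = ols(np.column_stack([np.ones(len(y1)), x1, y2]), y1)
    b_iv = lewbel_2sls(y1, y2, x1, z=x1)
    print("OLS estimate of beta_21:   ", round(float(b_ols[2]), 3))
    print("Lewbel estimate of beta_21:", round(float(b_iv[2]), 3))
```

In this assumed design OLS should overstate $\beta_{21}$ because of the common factor, while the generated-instrument estimate should be close to one.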

Lewbel’s method works like a charm in simulation studies. However, it was developed for linear models (footnote 1). But what happens if you have a binary endogenous variable? Let’s consider the case where $Y_2$ follows a Probit model

$(3) \quad Y_1 = \beta_{01} + \beta_{11}X_1 + \beta_{21}Y_2 + \epsilon_1$

$(4) \quad Y_2 =$ 1[$\beta_{02} + \beta_{12}X_1 + \nu_2 > 0$]

with 1[…] being the indicator function and $\epsilon_1$ as before. $\nu_2 = \alpha_2 U + V_2$ has to be standard normal, so for independent $U$ and $V_2$ it has to hold that $\alpha_2^2 Var(U) + Var(V_2) = 1$. Note that

$Pr(Y_2=1|X) = E(Y_2|X) = \Phi(\beta_{02} + \beta_{12}X_1) = \Phi(X'\beta)$

and we can rewrite equation (4) with additive error

$\Rightarrow (4) \quad Y_2 = \Phi(X'\beta) + \epsilon_2$

with $\epsilon_2 = Y_2 - \Phi(X'\beta)$, which is a function of $X$! Intuitively, the additive $\epsilon_2$ cannot vary freely for binary $Y_2$. Its variance has to be smaller when the index $X'\beta$ is either very small or very large, otherwise $Y_2$ would not stay within the supposed bounds of zero and one. This means that there is heteroskedasticity in (4) by construction since

$Var(Y_2|X) = \Phi(X'\beta) (1 - \Phi(X'\beta))$

is clearly not constant.
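A quick numerical check of this variance formula, as a small sketch using only the Python standard library. The helper name `binary_error_variance` and the chosen index values are mine, purely for illustration.

```python
from statistics import NormalDist


def binary_error_variance(index):
    """Var(Y2 | X) = Phi(index) * (1 - Phi(index)) for a Probit index X'beta."""
    p = NormalDist().cdf(index)
    return p * (1.0 - p)


if __name__ == "__main__":
    # The variance peaks at index 0 and shrinks as the index moves into the tails.
    for index in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"index = {index:+.1f}  variance = {binary_error_variance(index):.4f}")
```

The variance is largest where $\Phi(X'\beta) = 0.5$ and falls toward zero in both tails, which is exactly the heteroskedasticity that (A1) asks for.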

Initially, I thought this was great because it should mean that Lewbel’s method is always applicable with a binary endogenous regressor. But what about the second assumption? Inserting $\epsilon_2$ in (A2) gives

$Cov(Z, \epsilon_1 \cdot \epsilon_2) = Cov(Z, \epsilon_1 \cdot (Y_2 - \Phi(X'\beta)))$

When $Z$ (footnote 2) is a subset of $X$ this covariance is not zero. (A2) is violated!
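One way to see this is the following sketch, under the additional assumption (mine, not spelled out above) that $U$, $V_1$ and $V_2$ are jointly normal and independent of $X$. Write $\sigma_{12} = Cov(\epsilon_1, \nu_2) = \alpha_1 \alpha_2 Var(U)$. Because $E(\epsilon_1|X) = 0$, the term involving $\Phi(X'\beta)$ drops out and

$E(\epsilon_1 \epsilon_2 | X) = E(\epsilon_1 Y_2 | X) = E(\epsilon_1 \cdot 1[\nu_2 > -X'\beta] \, | X)$

With $Var(\nu_2) = 1$, joint normality gives $E(\epsilon_1 | \nu_2) = \sigma_{12} \nu_2$, and for a standard normal $\nu_2$ we have $E(\nu_2 \cdot 1[\nu_2 > -c]) = \phi(c)$, so

$E(\epsilon_1 \epsilon_2 | X) = \sigma_{12} \, \phi(X'\beta)$

$\Rightarrow Cov(Z, \epsilon_1 \cdot \epsilon_2) = \sigma_{12} \, Cov(Z, \phi(X'\beta))$

Whenever there is endogeneity ($\sigma_{12} \neq 0$) and $Z$ is part of $X$, the right-hand side is nonzero apart from knife-edge cases in which $Z$ and $\phi(X'\beta)$ happen to be uncorrelated.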

To get a feeling for the problem I created simulated data with 2,000 observations and the following parametrization (notation is a bit sloppy due to the limited LaTeX capabilities of WordPress)

$U = N(0,\sqrt{0.5})$, $\epsilon_1 = N(0,\sqrt{0.5}) + U$, $\nu_2 = N(0,\sqrt{0.5}) + U$, and $Z = X_1$

$(5) \quad Y_1 = 1 + X_1 + Y_2 + \epsilon_1$

$(6) \quad Y_2 =$ 1[$1 + X_1 + \nu_2 > 0$]

The user-written Stata command ivreg2h gave the following output

Estimates are far off the true coefficients (which are all equal to one). And this wasn’t just an unlucky draw. The average estimate of $\beta_{21}$ in a small Monte-Carlo study with 200 repetitions was equal to 1.83.

You might object that in order to construct the instruments Lewbel suggests, $(Z-E(Z)) \epsilon_2$, you have to estimate the exact $\epsilon_2$. By contrast, ivreg2h assumes a linear equation for $Y_2$. But things don’t improve much if you estimate equation (6) by Probit and construct the instruments manually.
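For readers without Stata, the experiment can be sketched in Python. This is not a replication of the ivreg2h output: it is a plain 2SLS with one generated instrument, and several pieces are my own assumptions. The distribution of $X_1$ is taken to be standard normal, the function names (`generated_iv_2sls`, `monte_carlo`, `norm_cdf`) are hypothetical, and instead of estimating the Probit the second variant uses the true index $1 + X_1$ to form $\epsilon_2 = Y_2 - \Phi(1 + X_1)$, which is the most favorable case for the manual construction.

```python
import math

import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def norm_cdf(x):
    """Standard normal CDF, applied elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))


def generated_iv_2sls(y1, y2, x1, e2):
    """Estimate of beta_21 using the instrument (X1 - mean(X1)) * e2, i.e. Z = X1."""
    n = len(y1)
    X = np.column_stack([np.ones(n), x1])
    inst = np.column_stack([X, (x1 - x1.mean()) * e2])
    y2_fit = inst @ ols(inst, y2)                      # first stage
    return ols(np.column_stack([X, y2_fit]), y1)[2]    # second stage


def monte_carlo(reps=200, n=2000, seed=0):
    """Mean estimate of beta_21 for (linear residual, true Probit residual)."""
    rng = np.random.default_rng(seed)
    sd = math.sqrt(0.5)
    estimates = np.empty((reps, 2))
    for r in range(reps):
        x1 = rng.normal(size=n)                        # assumed distribution of X1
        u = rng.normal(scale=sd, size=n)               # common factor
        eps1 = rng.normal(scale=sd, size=n) + u
        nu2 = rng.normal(scale=sd, size=n) + u         # standard normal overall
        y2 = (1 + x1 + nu2 > 0).astype(float)          # equation (6)
        y1 = 1 + x1 + y2 + eps1                        # equation (5)
        X = np.column_stack([np.ones(n), x1])
        e2_linear = y2 - X @ ols(X, y2)                # linear first-stage residual
        e2_probit = y2 - norm_cdf(1 + x1)              # residual at the true index
        estimates[r] = (
            generated_iv_2sls(y1, y2, x1, e2_linear),
            generated_iv_2sls(y1, y2, x1, e2_probit),
        )
    return estimates.mean(axis=0)


if __name__ == "__main__":
    mean_linear, mean_probit = monte_carlo()
    print("mean beta_21, linear residual:", round(float(mean_linear), 3))
    print("mean beta_21, Probit residual:", round(float(mean_probit), 3))
```

If the argument in this post is right, both averages should sit well above the true value of one, regardless of which residual is used to build the instrument.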

To conclude: Be careful with applying the method in a situation with a binary endogenous regressor. There is at least one case ($Y_2$ being Probit) where the estimator is inconsistent. It might still work for other structural specifications. And it would be great if somebody worked out the conditions under which it does. Until then, however, I would refrain from using Lewbel’s method in the binary case. It’s not robust to misspecification of the $Y_2$-equation and we don’t know yet when it works and when it doesn’t.

Footnotes:

(1) He also presents an extension to partly linear systems which, however, does not capture the limited dependent data case.

(2) If, on the other hand, $Z$ is restricted to be an outside variable, not contained in $X$, then I don’t see how you can satisfy the requirement of heteroskedasticity (A1). Maybe with some sort of heteroskedastic Probit specification. But I haven’t worked that out. Especially introducing the common factor—which leads to endogeneity in the triangular model—seems to be non-trivial.

Update: Fixed an error and added some clarifying remarks. Thanks to Arthur Lewbel for the pointer!