Why Tobit models are overused

In my field of research we’re often running regressions with innovation expenditures or sales with new products on the left-hand side. Usually we observe many zeros for these variables because firms do not invest at all in R&D and therefore also do not come up with new products. Many researchers then feel inclined to use Tobit models. But frankly, I never understood why. In an earlier version of one of my papers I’ve commented on this:

Authors in previous studies (Laursen and Salter, 2006; Leiponen and Helfat, 2010, 2011; Klingebiel and Rammer, 2014) often rely on limited dependent variable models, namely a Tobit type I regression (Tobin, 1958; Amemiya, 1985), because they recognize the non-negativity of sales with new products. In agreement with Angrist and Pischke (2009), we break with this tradition as we do not make sense of a latent variable interpretation with a separate censoring mechanism that forces negative sales to be zero. Rather we think that zeros occur naturally in this setting. Another justification for the Tobit model is sometimes provided by a hurdle model interpretation (Cragg, 1971). Here, the censoring point is thought as a threshold of “participation” which is modeled by a separate probabilistic process. Excess zeros (e.g., relative to the likelihood of a normal distribution) occur because a part of the sample is simply reluctant to engage in any innovation activities. We think that such a two-part approach is not appropriate for our application either as some form of innovation activity is a necessary condition to appear in our sample. In addition, we do not require fitted values to satisfy boundary conditions at the lower ends of the distribution, since we are not interested in effects that appear in certain distributional ranges of the dependent variable. Estimation by ordinary least squares, in contrast, conveniently allows to incorporate cluster-robust standard errors (clustered at the firm level) which is advisable when analyzing survey data considering that some firms appear in both survey waves.
A couple of (reiterating) points:
  • Cases of firms with no innovation expenditures (or sales with new products) are natural zeros. There is no censoring or truncation mechanism that forces negative expenditures to appear as zeros in the data. The actual value is zero, period.
  • Some people worry that with lots of zeros in the data the distribution of the outcome variable, Y, becomes very skewed. First of all, OLS can handle that as it doesn’t require normal errors for consistency. And secondly, if you worry about skewness there are other models you could use, such as Poisson, which are more robust to distributional misspecifications than Tobit.
  • Most importantly, if you introduce a latent variable (as in Tobit) you’d better have a good structural interpretation for it (like Heckman in his female labor supply example). If you, e.g., argue that zero innovation expenditures are the result of a firm’s profit maximization problem (in which expected future cash flows are traded off against project costs), then you should model this decision explicitly and tell me why you’re specifically interested in the effects on the latent rather than the observed variable. Everything else is too handwavy for my taste. In other words, if you’re doing reduced-form econometrics, do it properly (or switch to a fully fledged structural model otherwise)!
  • A Heckman selection model “is equivalent to a Tobit model with stochastic threshold” (Cameron & Trivedi 2005, ch. 16.5.2) and therefore relies on a similar set of strong distributional assumptions. So if you’re worried about endogeneity as a result of sample selection I would usually advise you to go with two-stage least squares instead.
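
To make the points about OLS and Poisson a bit more concrete, here is a minimal simulation sketch. It is not taken from the paper: the data-generating process, the parameter values, and the helper names `ols` and `poisson_qml` are assumptions made up for illustration, and the Poisson quasi-maximum-likelihood estimator is coded by hand with Newton steps rather than taken from a statistics package. The outcome has about 60% natural zeros and is clearly neither normal nor Poisson distributed, yet both estimators target features of the conditional mean E[y | x].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (an assumption for illustration):
# E[y | x] = exp(0.5 + 0.3 * x), with roughly 60% exact (natural) zeros.
x = rng.normal(size=n)
true_beta = np.array([0.5, 0.3])
X = np.column_stack([np.ones(n), x])
mu = np.exp(X @ true_beta)

active = rng.random(n) < 0.4                      # 40% have a positive outcome
shock = rng.gamma(shape=2.0, scale=0.5, size=n)   # positive noise with mean one
y = mu * active * shock / 0.4                     # keeps E[y | x] equal to mu


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def poisson_qml(X, y, n_iter=50, tol=1e-10):
    """Poisson quasi-ML via Newton's method; y does not have to be a count."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                    # start at the constant-only fit
    for _ in range(n_iter):
        fitted = np.exp(X @ beta)
        score = X.T @ (y - fitted)                # gradient of the log-likelihood
        hessian = X.T @ (X * fitted[:, None])     # negative second derivative
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


share_zeros = np.mean(y == 0)
beta_ols = ols(X, y)
beta_pqml = poisson_qml(X, y)

# Average partial effect of x on E[y | x] implied by the assumed model
true_ape = true_beta[1] * mu.mean()

print(f"share of zeros:        {share_zeros:.2f}")
print(f"OLS slope:             {beta_ols[1]:.3f}")
print(f"true average effect:   {true_ape:.3f}")
print(f"Poisson QML estimates: {beta_pqml.round(3)}")
```

Two caveats on reading the output. The OLS slope lines up with the average partial effect here because the simulated regressor is normally distributed (Stein's lemma); with other regressor distributions OLS still gives the best linear approximation to the conditional mean, but not necessarily the average derivative. And the Poisson quasi-ML estimates are only expected to be close to the true coefficients because the exponential conditional mean is correctly specified in this made-up example.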



Follow-up on “IV regressions without instruments” (technical)

Some time ago I wrote about a paper by Arthur Lewbel in the Journal of Business & Economic Statistics in which he develops a method to do two-stage least squares regressions without actually having an exclusion restriction in the model. The approach relies on higher moment restrictions in the error matrix and works well for linear or partly linear models. Back then, I expressed concerns that the estimator does not seem to work when an endogenous regressor is binary though; at least not in the simulations I have carried out.

After a bit of email back-and-forth we were able to settle the debate now.

IV regressions without instruments (technical)

Arthur Lewbel published a very interesting paper back in 2012 in the Journal of Business & Economic Statistics (ungated version here). The paper attracted quite some attention because it lays out a method to do two-stage least squares regressions (in order to identify causal effects) without the need for an outside instrumental variable.

Econometrics: When Everybody is Different

Nowadays everybody is talking about heterogeneous treatment effects. That is, responses to an economic stimulus that vary across individuals in a population. However, so far the discussion has concentrated on the instrumental variable setting where a randomized (natural or administered) experiment affects the treatment status of a so-called complier population. An average of the individual treatment effects can only be estimated for this group of compliers. For the always-takers and never-takers, in contrast, we cannot say anything. But if individual treatment responses are different for everybody in the population, how can we be sure that what we’re estimating for the compliers is representative of the whole population?

Successfully Mastering Econometrics

Because I’m currently sitting in the same lecture room in Strasbourg as Steve Pischke and yet another paper on labor markets is presented, I feel inspired to comment on the newest Angrist and Pischke piece on econometrics education. Furthermore, my own graduation doesn’t lie too much in the past, so I might still be part of the target group for an improved coursework in quantitative methods.