How to (and not to) log transform zero
Image courtesy of the authors: Survey results of the papers with log zero in the American Economic Review Academic's take Log transformation is widely used in linear models for several reasons: Making data "behave" or conform to parametric assumptions, calculating elasticity, etc. The figure above shows that nearly 40% of the empirical papers in a selected journal used a log specification and 36% had the problem of the log of zero. When an outcome variable naturally has zeros, however, log transformation is tricky. In most cases, the solution is to instinctively add a positive constant to each value in the outcome variable. One popular idea is to add 1 so that raw zeros remain as log-transformed zeros. Another is to add a very small constant, especially if the scale of the variable is small. Well, the bad news is that these are all arbitrary choices that bias the resulting estimates. If a model is correlational, a small bias due to the transformation may not be a big conc