OK, so, the title of this article is actually Do not log-transform count data, but, as mentioned, you just can’t resist adding the “bitches” to the end.

If you’re like me, when you learned experimental stats, you were taught to worship at the throne of the Normal Distribution. Always check your data and make sure it is normally distributed! Or, make sure that whatever lines you fit to it have normally distributed error around them! Normal! Normal normal normal!

And if you violate normality, say, you have count data with no negative values, and a normal linear regression would create situations where negative values are possible (e.g., what does it mean if you predict negative kelp? Ah, the old dreaded nega-kelp), then no worries. Just transform your data: take the log, or SOMETHING, to linearize it before fitting a line and ensure the sacrament of normality is preserved.

This has led to decades of thoughtless transformation of count data by in-the-field ecologists, with little regard for the consequences.

But statistics has had a better answer for decades: generalized linear models (glm for R nerds, gzlm for SAS goombas who use proc genmod. What? I’m biased!), whereby one specifies a nonlinear function with a corresponding non-normal error distribution. The canonical book on this was first published ’round 1983. Sure, one has to think more about the particular model and error distribution they specify, but, if you’re not thinking about these things in the first place, why are you doing science?

“But, hey!” you might say, “Glms and transformed count data should produce the same results, no?”

From first principles, Jensen’s inequality says no: consider the consequences for error of the transformation approach of log(y) = ax + b + error versus the glm approach y = e^(ax + b) + error. More importantly, the error distributions from generalized linear models may often be far far faaar more appropriate to the data you have at hand. For example, count data is discrete, and hence, a normal distribution will never be quite right. Better to use a Poisson or a negative binomial.

But, “Sheesh!”, one might say, “Come on! How different can these models be? I mean, I’m going to get roughly the same answer, right?”

O’Hara and Kotze’s paper takes this question and runs with it. They simulate count data from negative binomial distributions and look at the results from generalized linear models with negative binomial or quasi-Poisson error terms (see here for the difference) versus a slew of transformations.

Intriguingly, they find that glms (with either distribution) always perform well, while each transformation performs poorly at some or all values.

Estimated mean biases from six different models, applied to data simulated from a negative binomial distribution. A low bias means that the method will, on average, return the ‘true’ value. Note that the bias for transformed fits is all over the place. But with a glm, bias is always minimal.

For count data, our results suggest that transformations perform poorly. An additional problem with regression of transformed variables is that it can lead to impossible predictions, such as negative numbers of individuals. Instead statistical procedures designed to deal with counts should be used, i.e. methods for fitting Poisson or negative binomial models to data.
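If the Jensen’s inequality point feels abstract, here is a minimal sketch of it in Python, standard library only. To be clear about the assumptions: this is not the simulation from the paper, it uses Poisson rather than negative binomial counts to keep things short, and `poisson_draw` and `compare_means` are names made up for this illustration. It takes the simplest possible “regression” (an intercept and nothing else), averages log(y + 1), back-transforms that average, and compares it with the plain mean of the counts.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare_means(lam=5.0, n=10_000, seed=1):
    """Return (mean of y, back-transformed mean of log(y + 1)).

    Hypothetical helper for illustration only: the second value is what an
    intercept-only fit on log(y + 1) would report after back-transforming.
    """
    rng = random.Random(seed)
    counts = [poisson_draw(rng, lam) for _ in range(n)]
    arithmetic_mean = sum(counts) / n
    mean_of_logs = sum(math.log(y + 1) for y in counts) / n
    back_transformed = math.exp(mean_of_logs) - 1
    return arithmetic_mean, back_transformed


if __name__ == "__main__":
    for lam in (0.5, 2.0, 5.0, 20.0):
        mean_y, back = compare_means(lam)
        print(f"lambda={lam:5.1f}  mean(y)={mean_y:7.3f}  "
              f"exp(mean(log(y+1)))-1={back:7.3f}")
```

Because the log is concave, the back-transformed value is a geometric mean of y + 1 (minus one), so it can never exceed the arithmetic mean and will sit below it whenever the counts vary at all. The size of that gap shifts with the mean, which at least hints at why the bias of transformed fits need not be constant.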