What is autocorrelation?
Autocorrelation, like more vanilla kinds of correlation, characterizes the way one thing relates to another. Most of the time when people report correlation, they are talking about the way that one variable changes with another. Autocorrelation, however, describes the way one variable relates to itself (for example, how a measurement at one time or place relates to the same measurement at a nearby time or place), which can be tricky to visualize. Ecologists and evolutionary biologists frequently discuss three kinds of autocorrelation:
- Phylogenetic autocorrelation: species (and other evolvable entities) with a more recent common ancestor (i.e. that share more evolutionary history) tend to be similar in many traits (e.g. body size). Felsenstein (1985) is usually credited as the first evolutionary biologist to articulate succinctly why this should be accounted for, and he proposed a comparative method for doing so (phylogenetically independent contrasts). (The acknowledgements admit, however, that an unnamed female graduate student came up with the initial idea for the comparative method.)
- Temporal autocorrelation: observations made in quick succession tend to be more similar than observations separated by a long interval. This pattern matters in many different fields (e.g. quarterly profit data), so mathematicians and statisticians have developed a large toolkit for time series analysis.
- Spatial autocorrelation: sites that are closer together tend to be more similar than sites that are far apart. Because it usually involves two or more dimensions, this form of autocorrelation is the most complicated to quantify and address. Unfortunately, spatial autocorrelation is pervasive in ecology and may call for sophisticated study design or analysis. The next post in this blog will walk through this in detail.
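To make "a variable relating to itself" concrete, here is a minimal Python sketch of the temporal case, which is the easiest of the three to compute. It assumes a hypothetical first-order autoregressive (AR(1)) series, in which each value is a fraction `phi` of the previous value plus random noise, and computes the sample autocorrelation at each lag: the correlation between the series and a copy of itself shifted by that many steps. The function names `ar1` and `acf` and all parameter values are illustrative choices, not taken from any particular package.

```python
import random

def ar1(n, phi, rng):
    """Simulate x_t = phi * x_{t-1} + e_t with standard-normal noise e_t."""
    # Draw the first value from the stationary distribution, so no burn-in is needed.
    x = [rng.gauss(0.0, (1.0 / (1.0 - phi ** 2)) ** 0.5)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag.

    r_k sums the products of each deviation from the mean with the deviation
    k steps later, scaled by the total sum of squares (so r_0 = 1).
    """
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    total_ss = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / total_ss
            for k in range(max_lag + 1)]

if __name__ == "__main__":
    rng = random.Random(1)
    persistent = ar1(500, phi=0.8, rng=rng)  # hypothetical strongly autocorrelated series
    noise = ar1(500, phi=0.0, rng=rng)       # phi = 0 gives independent values
    print([round(r, 2) for r in acf(persistent, 5)])
    print([round(r, 2) for r in acf(noise, 5)])
```

For the persistent series, the lag-1 autocorrelation should sit near `phi` and shrink at longer lags; for the independent series, everything beyond lag 0 should hover near zero. Phylogenetic and spatial autocorrelation rest on the same idea, with "lag" replaced by distance along a tree or across a landscape.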
Why should I worry about autocorrelation?
Autocorrelation in the variables you analyze can violate key assumptions of many statistical models. In particular, most models assume that observations are independent, but autocorrelation means that an observation depends on surrounding values. The number of truly independent observations is the basis for calculating the degrees of freedom (DF) in an analysis. So, if an observation is counted towards the degrees of freedom when it is actually a “pseudo-replicate” of other observations, then one will overestimate DF. This in turn biases every statistic that uses DF in its calculation (with positive autocorrelation, for instance, p-values tend to come out too small). Autocorrelation means it’s easy to be too confident in a result. Worse, it is also possible to be confident in a wrong result. The flip side, however, is that accounting for autocorrelation can reveal a true pattern that would otherwise be hidden. Regardless of these specific concerns, it is good practice to account for autocorrelation wherever it might occur, or to discuss why it is unlikely to occur or cause bias. Doing this makes you look credible and sets an example for others in your field.
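The degrees-of-freedom problem can be demonstrated with a small simulation. The sketch below, in plain Python under assumed settings (50 time steps, 2000 replicates, a 0.05 threshold), generates pairs of unrelated AR(1) series and tests each pair for correlation with a standard test that treats all 50 points as independent (Fisher's z transformation with a normal approximation). Because the two series in each pair are independent by construction, every "significant" result is a false positive. The helper names are hypothetical.

```python
import math
import random

def ar1(n, phi, rng):
    """Simulate x_t = phi * x_{t-1} + e_t (standard-normal e_t), started at stationarity."""
    x = [rng.gauss(0.0, math.sqrt(1.0 / (1.0 - phi ** 2)))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def pearson_r(x, y):
    """Ordinary Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def naive_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z, treating all n pairs as independent."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

def false_positive_rate(phi, n=50, reps=2000, alpha=0.05, seed=42):
    """Fraction of replicates in which two independent AR(1) series test as correlated."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x, y = ar1(n, phi, rng), ar1(n, phi, rng)
        if naive_p_value(pearson_r(x, y), n) < alpha:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    print(false_positive_rate(0.0))  # no autocorrelation
    print(false_positive_rate(0.9))  # strong autocorrelation, same nominal alpha
```

With no autocorrelation (`phi = 0`), the false-positive rate should land close to the nominal 0.05; with strong autocorrelation (`phi = 0.9`), it should be well above 0.05, because the test credits the data with far more independent observations than they actually contain.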
In pedantic statistical terms, the crucial circumstance where autocorrelation causes problems is when the residuals in a regression are autocorrelated. This is sometimes called “residual autocorrelation”, and it led to some fuss among ecologists a few years ago. Florian Hartig discusses the issue in this blog post: https://theoreticalecology.wordpress.com/2012/05/12/spatial-autocorrelation-in-statistical-models-friend-or-foe/.
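Residual autocorrelation can be checked directly. The sketch below, again in plain Python, fits a simple least-squares regression and computes Moran's I, a standard index of spatial autocorrelation, on its residuals. The data are hypothetical: 100 sites on a 10-by-10 grid, a random predictor, and a response that also follows a north-south trend that the model leaves out. Neighbours are defined by shared grid edges (binary "rook" weights), which is only one of several common weighting choices; the helper names are illustrative.

```python
import random

def ols_residuals(x, y):
    """Residuals from an ordinary least-squares fit of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def rook_weights(coords):
    """Binary weights: w_ij = 1 if grid cells i and j share an edge, else 0."""
    return [[1.0 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0.0
             for (r2, c2) in coords] for (r1, c1) in coords]

def morans_i(values, w):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    total_w = sum(map(sum, w))
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(zi * zi for zi in z)

if __name__ == "__main__":
    rng = random.Random(7)
    coords = [(r, c) for r in range(10) for c in range(10)]  # 10-by-10 grid of sites
    w = rook_weights(coords)
    x = [rng.gauss(0.0, 1.0) for _ in coords]  # hypothetical predictor, spatially random
    # Hypothetical response with a row (north-south) trend that the model y ~ x ignores.
    y = [2.0 * xi + 1.0 * r + rng.gauss(0.0, 1.0) for xi, (r, _) in zip(x, coords)]
    resid = ols_residuals(x, y)
    print(round(morans_i(resid, w), 2))      # strongly positive for these residuals
    print(round(-1 / (len(coords) - 1), 3))  # rough reference value for uncorrelated data
```

Because the unmodelled trend ends up in the residuals, Moran's I comes out strongly positive, far from the value of roughly -1/(n - 1) expected for uncorrelated values. In practice you would also assess significance (for example, by permuting residuals among sites), a step omitted from this sketch.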
Suggestions for learning more
I’ve found it well worth the time investment to read up on each type of autocorrelation.
- Temporal: A good library will have a section of statistics books about time series analysis, and I recommend browsing these for something that fits your applications and level of background knowledge. If you want a concise explanation of the field and aren’t afraid of equations, Diggle’s (1990) Time Series: A Biostatistical Introduction hits the spot. If you like to pick and choose examples to work through, I recommend Shumway and Stoffer’s (2006) comprehensive Time Series Analysis and Its Applications: With R Examples.
- Phylogenetic: Felsenstein’s “Phylogenies and the comparative method” (in The American Naturalist, 1985) is essential reading. From there, it’s best to read recent papers to find out about the newest approaches and software.
- Spatial: To start, check out later posts in this blog! For a rigorous review, I recommend the spatial analysis chapters in Legendre and Legendre’s Numerical ecology (3rd ed. 2012).