Including random effects in statistical models in ecology: fewer than five levels?
Abstract
As generalized linear mixed-effects models (GLMMs) have become a widespread tool in ecology, guidance on how to use them appropriately has become increasingly important. One common guideline is that a random effect needs at least five levels. Having fewer levels makes it difficult to estimate the variance of random effects terms (such as ecological sites, individuals, or populations), but it need not muddy one's ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate ecological datasets, fit simple models, and show that having too few levels of a random effects term does not influence the parameter estimates, or the uncertainty around those estimates, for fixed effects terms. Thus, it should be acceptable to use fewer levels of random effects if one is not interested in making inference about the random effects terms (i.e., they are 'nuisance' parameters used to group non-independent data). I also use simulations to assess the potential for pseudoreplication in (generalized) linear models (LMs) when random effects are explicitly ignored, and find that LMs do not show increased type-I error rates compared with their mixed-effects counterparts. Instead, in an analysis of a real ecological dataset presented here, LM uncertainty (and p values) appears to be more conservative. These results challenge the view that it is never appropriate to model random effects terms with fewer than five levels, specifically when inference is not being made about the random effects. They also suggest that in simple cases LMs might be robust to ignored random effects terms. Given the widespread accessibility of GLMMs in ecology and evolution, further simulation studies and assessments of these statistical methods are necessary to understand the consequences of both violating and blindly following simple guidelines.
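As a rough illustration of the kind of simulation summarized above, the Python sketch below generates data with only three levels of a grouping factor (e.g. sites). It fits a plain linear model that ignores the grouping and records two things: how often a true-null slope is declared significant (the type-I error rate) and the average slope estimate. This is not the author's simulation code. The number of groups, the variances, the within-group covariate, and the normal approximation (1.96) to the t critical value are all illustrative assumptions. Only the LM side is shown; fitting the matching mixed-effects model would require a dedicated GLMM library.

```python
import math
import random


def simulate_lm_ignoring_groups(n_groups=3, n_per_group=20, sigma_group=1.0,
                                sigma_resid=1.0, true_slope=0.0,
                                n_sims=2000, seed=1):
    """Hypothetical sketch: simulate data with a random intercept per group
    (e.g. site), fit y ~ x by ordinary least squares while ignoring the
    grouping, and test H0: slope = 0.

    Returns (proportion of simulations rejecting H0, mean slope estimate).
    With true_slope = 0 the proportion is the type-I error rate.
    """
    rng = random.Random(seed)
    z_crit = 1.96  # normal approximation to the two-sided 5% t critical value
    rejections = 0
    slope_total = 0.0
    for _ in range(n_sims):
        xs, ys = [], []
        for _ in range(n_groups):
            site_effect = rng.gauss(0.0, sigma_group)  # random intercept for this level
            for _ in range(n_per_group):
                x = rng.gauss(0.0, 1.0)  # covariate varies within each group
                xs.append(x)
                ys.append(site_effect + true_slope * x + rng.gauss(0.0, sigma_resid))
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        se_slope = math.sqrt(rss / (n - 2) / sxx)  # OLS standard error, grouping ignored
        if abs(slope / se_slope) > z_crit:
            rejections += 1
        slope_total += slope
    return rejections / n_sims, slope_total / n_sims


if __name__ == "__main__":
    type1, mean_slope = simulate_lm_ignoring_groups()
    print(f"type-I error rate: {type1:.3f}, mean slope: {mean_slope:.3f}")
```

In this setup the covariate varies within groups and is independent of the site effects. As a result, the residual variance absorbs the site-to-site variation, and the rejection rate should stay close to the nominal 5%. A covariate that varies only between groups would be a different, and riskier, scenario for an LM that ignores the grouping.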