library(tidyverse) # for data wrangling
library(gridExtra) # for multi-panel plots
library(lme4) # for fitting random-effects models
library(nlme) # for fitting random-effects models
library(glmmTMB) # for fitting random-effects models
library(sjPlot) # for visualizing fitted models
library(modelsummary) # for creating output tables
library(kableExtra) # for tables
options(kableExtra.html.bsTable = T)
library(ggplot2)# for plotting
library(performance) # for model diagnostics
18 Linear Mixed Effects Models
Learning objectives
Be able to identify situations when it is appropriate to use a mixed model.
Learn how to implement mixed models in R/JAGs for Normally distributed response variables.
Be able to describe models and their assumptions using equations and text and match parameters in these equations to estimates in computer output.
18.1 R packages
We begin by loading a few packages upfront:
In addition, we will use data and functions from the following packages:
Data4Ecologists
for theSelake
andHRData
data setslmerTest
for testing hypotheses using linear mixed effect modelstidy
in thebroom.mixed
package for accessing model outputR2jags
for fitting models using JAGSMCMCvis
andmcmcplots
for summarizing MCMC samplescAIC
in thecAIC4
package for calculating conditional AICs
18.2 Revisiting the independence assumption
A common assumption of all of the models we have fit so far is that our observations are independent. This assumption is unrealistic if we repeatedly record observations on the same sample units (i.e. we have repeated measures); this assumption may also be violated if our observations are measured close in space or close in time. Let’s consider the implications of having non-independent data and also various strategies for accommodating non-independence using a simple simulated data set from Schwarz (2014). These data are contained in the Data4Ecologists
package:
library(Data4Ecologists) # for data
data(Selake)
head(Selake)
Lake Log_Water_Se Log_fish_Se
1 a -0.30103 0.9665180
2 a -0.30103 1.1007171
3 a -0.30103 1.3606978
4 a -0.30103 1.0993405
5 a -0.30103 1.2822712
6 b 0.83600 0.9713466
The data set contains selenium (Se) concentrations (on the log scale) measured in 83 fish sampled from 9 different lakes, along with log(Se) concentrations in those lakes. Selenium can leach from coal into surface waters when coal is mined, and therefore, this data set could conceivably have been collected during an environmental review to determine whether selenium in nearby lakes is bioaccumulating in fish. We begin by fitting a simple linear regression model to explore the relationship between logged concentrations in the water (Log_Water_Se
) and log concentrations in fish (Log_fish_SE
), assuming all locations are independent. We then plot the regression line overlaid on the observations, and inspect a residual versus fitted value plot; in both cases, we associate a unique color with the observations by lake (Figure 18.1.
Importantly, the number of fish sampled varies from lake to lake. Further, we see that the residuals for different lakes tend to be clustered with most values being either negative or positive for any given lake. Thus, two randomly chosen residuals from the same lake are likely to be more alike than two randomly chosen residuals from different lakes. The data are clearly not independent!
Think-pair-share: What are the consequences of ignoring the fact that we have multiple observations from each lake?
Often when I pose this question to ecologists, they express concern that the estimates of the regression parameters will be biased. Here, it is important to recognize that bias has a specific definition in statistics. In particular, bias is quantified by the difference between the expected (or average) estimate across repeated random trials and the true parameter value. Again, by random trial, we mean the process of collecting and analyzing data in the same way. In this case, that means collecting data in a way that leads to an unbalanced data set (with unequal numbers of fish per lake) and then analyzing that data set using simple linear regression. When our data are not balanced and we treat observations as independent, it is clear that some lakes will contribute more information than others. But, does that imply we have a biased estimator? Well, it depends on the process that leads to the imbalance.
18.2.1 What if sample sizes are random?
There are clearly other factors that influence selenium concentrations in the water and in fish, but if the sample size in each lake is completely random (i.e., the number of fish sampled per lake does not depend on these other factors that are associated with our response variable), then our regression parameter estimators may be unbiased even if the data set is not balanced. On the other hand, we might expect naive estimates of uncertainty to be too small.
Let’s explore the properties of the regression parameter estimators using simulations from the following data-generation model for log Se concentrations in fish:
\[ log(SeFish)_{i,j} = \beta_0 + \beta_1log(SeWater)_i + \tau_i + \epsilon_{ij} \nonumber \\ \] \[ \tau_i \sim N(0, \sigma_{\tau}^2) \nonumber \\ \] \[ \epsilon_{ij} \sim N(0, \sigma_{\epsilon}^2) \nonumber \tag{18.1}\]
where \(log(SeFish)_{i,j}\) is the log Se concentration of the \(j^{th}\) fish in the \(i^{th}\) lake, \(log(SeWater)_i\) is the log Se concentration in \(i^{th}\) lake’s water, and \(\beta_0\) and \(\beta_1\) are the regression parameters we are interested in estimating. Deviations about the regression line are due to two independent random factors represented by \(\tau_i\) and \(\epsilon_{ij}\); \(\tau_i\) is constant for all observations in a lake whereas \(\epsilon_{ij}\) is unique to each observation. We can think of the \(\tau_i\) as representing ecolgoical variables (other than the Se concentration in a lake) that cause all SeFish concentrations to be higher or lower, on average, for particular lakes1, whereas the \(\epsilon_{ij}\) represent other random factors that create variability among fish found in the same lake. Whenever writing down models for repeated measures data, it is critical that we use multiple subscripts (e.g., we use \(i\) to index each lake and \(j\) to index the different observations within each lake).
We will repeatedly generate a random number of observations from each of 9 randomly selected lakes. We will then fit a linear regression model to the data assuming all observations are independent and ignoring the lake-level random effects, \(\tau_i\). We will store the results to look at the sampling distribution of \(\hat{\beta_0}\), \(\hat{\beta_1}\), \(\widehat{SE}(\hat{\beta}_0)\) and \(\widehat{SE}(\hat{\beta}_1)\):
# Set.seed
set.seed(1024)
# Number of lakes and number of simulations
<- 10000 # Go Minnesota! :)
nlakes <- 10000 # to approximate the sampling distribution
nsims
# Parameters used to simulate the data:
= 1.16 # intercept
beta0 = 0.26 # slope
beta1 = 0.45 # sd tau
sigma_tau = 0.20 #sd epsilon
sigma_epsilon
# Matrix to hold results
<- matrix(NA, nsims, 2)
beta.hat <- matrix(NA, nsims, 2)
standard.error.hat
# Generate lake Se values for each of 10,000 lakes.
# We will assume these are ~ N(1,1)
<-rnorm(nlakes, 1, 1)
log_se_water
# Generate tau_i for 10,000 lakes
<- rnorm(nlakes, 0, sigma_tau)
tau
# Simulation loop
for(i in 1:nsims){
# pick a sample of 9 lakes
<- sample(1:nlakes, 9)
lakeids
# determine a random sample size for each lake
# using a negative binomial random number generator
# with mean = mu and overdispersion parameter = theta
<- 7 # mean number of observations per lake
mu <- 1.15
theta <- rnbinom(9, mu = mu, size = theta) # number of obs/lake
ni <- beta0 + beta1*rep(log_se_water[lakeids], ni) +
log_se_fish rep(tau[lakeids], ni) + rnorm(sum(ni), 0, sigma_epsilon)
<- data.frame(id = rep(lakeids, ni),
dat log_se_water = rep(log_se_water[lakeids], ni),
log_se_fish)<- lm(log_se_fish~log_se_water, data=dat)
lmtemp <- coef(lmtemp)
beta.hat[i,] <-sqrt(diag(vcov(lmtemp)))
standard.error.hat[i,] }
Let’s begin by looking at the sampling distributions of our regression parameter estimators with the true parameters used to generate the data highlighted in red:
# plot
par(mfrow=c(1,2))
hist(beta.hat[,1], main="Intercept", xlab="Estimate", ylab="Sampling Distribution")
abline(v=beta0, col="red")
hist(beta.hat[,2], main="Slope", xlab="Estimate", ylab="Sampling Distribution")
abline(v=beta1, col="red")
There appears to be little bias when estimating \(\beta_0\) and \(\beta_1\) despite having incorrectly assumed the observations were independent, which is also confirmed, below:
# Estimate of Bias: Intercept
mean(beta.hat[,1])- beta0
[1] -0.00434286
# Estimate of Bias: Slope
mean(beta.hat[,2])- beta1
[1] 0.00175792
Next, we consider the impact that repeated measures have on our estimates of uncertainty when assuming observations are independent. If our estimators of the standard errors of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are unbiased, then we should expect the average estimated standard errors, \(E[\widehat{SE}(\hat{\beta}_0)]\) and \(E[\widehat{SE}(\hat{\beta}_1)]\), to be close to the standard deviation of \(\hat{\beta}\) (remember, these SEs estimate the standard deviation of our sampling distribution!).
# Estimate of Bias: SE(intercept)
mean(standard.error.hat[,1]) - sd(beta.hat[,1])
[1] -0.231249
# Estimate of Bias: SE(slope)
mean(standard.error.hat[,2]) - sd(beta.hat[,2])
[1] -0.1746312
We see in both cases that the \(E[\widehat{SE}(\hat{\beta}_i)] < sd(\hat{\beta}_i)\), suggesting our estimates of uncertainty are too small if we ignore the repeated measures and assume our observations are independent.
18.2.2 What if sample sizes are not completely random?
What if the sample sizes associated with each lake are not random? Maybe we are more likely to sample lakes where fish tend to have high levels of Se (more so than would be predicted by the lake’s Se concentration). I.e., what if the sample sizes were correlated with the \(\tau_i\)? Let’s simulate data under this assumption (but for a sample of 100 lakes) and again fit the simple linear regression model. In addition, we will fit the data-generating model using the lmer
function in the lme4
package (Bates et al. 2015) (this is a mixed-effects model, which will be the subject of the remainder of this chapter):
# Set.seed
set.seed(1024)
# Matrix to hold results
<- matrix(NA, nsims, 2)
beta.hat2 <- matrix(NA, nsims, 2)
beta.hat2.lme4
# Simulation loop
for(i in 1:nsims){
# pick a sample of 100 lakes
<- sample(1:nlakes, 100)
lakeids
# determine random sample size for each lake
# generate using negative binomial with mean = mu*exp(tau[i])
# overdispersion parameter = theta = 1.15
<- rnbinom(100, mu = exp(log(mu)+tau[lakeids]), size = theta)
ni <- beta0 + beta1*rep(log_se_water[lakeids], ni) +
log_se_fish rep(tau[lakeids], ni) + rnorm(sum(ni), 0, sigma_epsilon)
<- data.frame(id = as.factor(rep(lakeids, ni)),
dat log_se_water = rep(log_se_water[lakeids], ni),
log_se_fish)<- coef(lm(log_se_fish~log_se_water, data=dat))
beta.hat2[i,] <- fixef(lmer(log_se_fish~log_se_water + (1|id), data=dat))
beta.hat2.lme4[i,] }
We again look at the sampling distributions with the true parameters used to generate the data highlighted in red (the top row of panels correspond to the simple linear regression model and the bottom panels to the mixed-effects model used to generate the data):
We see that when using simple linear regression, our estimator of the intercept is biased. Nonetheless, the estimator of the slope appears to be unbiased. The estimators of both parameters appear to be unbiased when fitting the mixed-effects (i.e., data-generating) model. Also, note that the estimates are less variable when using the mixed-effects model.
<-data.frame(
bias.betasMethod = rep(c("Simple Linear Regression", "Data-Generating (Mixed) Model")),
Intercept = round(c(mean(beta.hat2[,1])- beta0,
mean(beta.hat2.lme4[,1])- beta0), 3),
Slope = round(c(mean(beta.hat2[,2])- beta1,
mean(beta.hat2.lme4[,2])- beta1),3))
%>% kable(., booktabs = TRUE) bias.betas
Method | Intercept | Slope |
---|---|---|
Simple Linear Regression | 0.193 | -0.003 |
Data-Generating (Mixed) Model | 0.029 | -0.003 |
18.3 Optional: Mixed-effect models versus simple alternatives
In the last section, we saw that if sample sizes are random, then our estimates of regression parameters will be unbiased, but estimates of their uncertainty will be too small. Thus, one simple strategy for analyzing repeated measures data in cases where sample sizes can be argued to be completely random or the data are balanced (i.e., contain equal numbers of observations for each cluster) is to assume independence when estimating regression parameters, but then use a method for estimating uncertainty that considers the cluster as the independent unit of replication. For example, in Chapter 2, we explored a cluster-level bootstrap when analyzing the Rikzdat
data (resampling beaches with replacement). Similarly, in Chapter 20 will learn about Generalized Estimating Equations (GEE) that use robust standard errors calculated by considering variation among independent clusters.
Alternatively, we could consider aggregating our data to the lake level before fitting our regression model. Each lake has a single recorded value of log(Se) in the water. We could calculate the average log(Se) among fish for each lake and then fit a regression model with this average as our response variable. This strategy is often reasonable when the predictors of interest do not vary within a cluster (Murtaugh 2007). The code below shows how to accomplish this in R.
<- Selake %>% group_by(Lake) %>%
fishse.avg ::summarize(Log_Water_Se=mean(Log_Water_Se), fish.avg.se=mean(Log_fish_Se))
dplyr<- lm(fish.avg.se ~ Log_Water_Se, data=fishse.avg) fit.avg
Lastly, although we have not yet introduced mixed-effect models, we can fit the exact model that was used to generate the data using the following code:
<- lmer(Log_fish_Se ~ Log_Water_Se + (1|Lake), data=Selake) fit.mixed
Let’s compare the estimated intercept and slope from these three approaches: a naive linear regression model, a regression model fit to lake averages, and the mixed-effect model fit using lmer
, which was the model used to generate the data.
modelsummary(list("Naive linear model" = fit.naive,
"Linear model fit to averages" = fit.avg,
"Mixed-effects model" = fit.mixed),
gof_omit = ".*", estimate = "{estimate} ({std.error})", statistic=NULL,
coef_omit = "SD")
Naive linear model | Linear model fit to averages | Mixed-effects model | |
---|---|---|---|
(Intercept) | 1.164 (0.064) | 1.131 (0.209) | 1.132 (0.209) |
Log_Water_Se | 0.265 (0.059) | 0.367 (0.181) | 0.366 (0.179) |
We see that the estimated standard errors for \(\beta_0\) and \(\beta_1\) are much narrower in the naive linear model, which is not a surprise since this approach treats the data as though they were independent (Table 18.2). On the other hand, the estimates of \(\beta_0\), \(\beta_1\), and their standard errors are nearly identical when fitting a linear model to the aggregated data or when fitting an appropriate mixed-effects model. An advantage of the former approach is its simplicity – it requires only methods you would likely have seen in an introductory statistics course.
Murtaugh (2007) provided a similar example in which he compared two analyses, both aiming to test for differences in mean zooplankton size between ponds that either contained or did not contain fish. In each pond, he measured the body length of thousands of zooplankters. Clearly, these observations will not be independent (zooplankton from the same lake are likely to be more similar to each other than zooplankton from different lakes, even after accounting for the effects of fish presence-absence). The first analysis, reported in Murtaugh (1989), required a full paragraph to describe:
I ran a nested analysis of variance (ANOVA) on the lengths of zooplankton in the six ponds—the effect of pond on body size is nested within the effect of predation treatment. Note that, since the study ponds can be thought of as a sample from a larger population of ponds that might have been used, ‘pond’ is a random effect nested within the fixed effect of fish . . . Because of the complications involved in nested ANOVA’s with unequal sample sizes (Sokal and Rohlf 1981), I fixed the number of replicates per pond at the value for the pond with the smallest number of length measurements available . . .; appropriately-sized samples were obtained from the larger sets of measurements by random sampling without replacement.’’
In the second analysis, he calculated pond-level means and then conducted a two-sample t-test. He points out that the two hypotheses tests relied on the exact same set of assumptions and the p-values were identical. The nested analysis of variance will sound more impressive, but the latter approach has the advantage of simplicity – again, requiring only methods learned in an introductory statistics class. Murtaugh (2007) makes a strong argument for the simpler analyses, highlighting that:
Simpler analyses are easier to explain and understand; they clarify what the key units in a study are; they reduce the chances for computational mistakes; and they are more likely to lead to the same conclusions when applied by different analysts to the same data.
On the other hand, there are potential advantages of fitting a mixed-effects model (e.g., we will see that we can study variability in Log_Se_fish
measurements both among and within lakes). In addition, mixed-effects models will be more powerful when studying predictor-response relationships when the predictor variable varies within a lake. For example, if we also wanted to study how Se concentrations change as fish age, we would be better off studying this using a model that included the age of each fish rather than modeling at the lake level using the average age in each lake as a predictor. Thus, it is important to be able to identify when using a mixed-effects model will be beneficial and when it may be viewed as just statistical machismo.2
The rest of this chapter will be devoted to mixed-effect models that allow observational units or clusters to have their own parameters. We will learn how to write down a set of equations for mixed models (e.g., Equation 18.1), match R output to the parameters in these equations, explore the assumptions of the model and how to evaluate them, and discover how these models may or may not properly account for non-independence resulting from having multiple observations on the same sampling unit or cluster.
18.4 What are random effects, mixed-effect models, and when should they be considered?
The model fit using the lmer
function allowed each lake to have its own intercept. More generally, we can consider models that allow each cluster to have its own intercept and slope parameters. Actually, we already saw how we could allow for cluster-specific parameters in Chapter 3.
Think-pair-share: Using what we learned in Chapter 3, how could we allow each lake to have its own intercept? And, more generally, how could we allow each lake to have its own slope?
We could have included Lake
in the model using lm(Log_fish_Se ~ Log_Water_Se + Lake, data=Selake)
3. This would have allowed each lake to have its own intercept. Furthermore, if we had a predictor that varied within a lake (e.g., fish age), we could allow for lake-specific slopes by interacting Lake
with age
, e.g., using lm(Log_fish_Se ~ Log_Water_Se + age + Lake + age:Lake, data=Selake)
. These approaches model among-cluster variability using fixed effects, parameters that we directly estimate when fitting the model. How do these parameters differ from random effects?
The defining characteristic of random effects is that they are assumed to come from a common statistical distribution, usually a Normal distribution. Thus, models that contain random effects add another assumption, namely that the cluster-specific parameters come from a common distribution. Specifically, if we examine Equation 18.1, we see the assumption that \(\tau_i \sim N(0, \sigma^2_{\tau})\).
The assumption that the cluster-specific parameters follow a common distribution makes the model hierarchical. Note that most models with random effects also include fixed effects, and are thus referred to as mixed-effects models or mixed models. Thus, we may refer to the model we fit using lmer
as any of the following:
- a mixed-effects (or mixed model)
- a random-effects model
- a hierarchical or multi-level model
- a random-intercept model.
Mixed-effect models are useful for analyzing data where:
- there are repeated measures on the same sample units (same site, same animal, same lake).
- the data are naturally clustered or hierarchical in nature (e.g., you are collecting data on individuals that live in packs within multiple populations).
- the experimental or sampling design involves replication at multiple levels of hierarchy (e.g. you have a split-plot design with treatments applied to plots and subplots, you have sampled multiple individuals within each of several households).
- you are interested in quantifying variability of a response across different levels of replication (e.g., among and within lakes, among and within sites, etc).
- you want to generalize to a larger population of sample units than just those that you have observed.
These latter two points will hopefully become clearer as we proceed through this Chapter.
18.5 Two-step approach to building a mixed-effects model
It can be useful to consider building hierarchical models in multiple steps. In the first step, we can build models for individual observations within a cluster (referred to as the level-1 model). In the second step, we can model how the cluster-specific parameters from stage 1 vary across clusters (referred to as the level-2 model). We will demonstrate this approach using data from a study investigating tradeoffs between maximum age of trees and their size in mountain pines (Pinus montana) in Switzerland (Bigler 2016). The data, originally downloaded from dryad here, record the (maximum) age
and diameter at breast height (dbh
) of 160 trees measured at 20 different sites (site
); I have also appended a variable that records the predominate aspect (i.e., direction of the site’s facing slope), Aspect
. Although a two-step approach to model fitting has some drawbacks, it is a useful pedagogical tool that reinforces the key feature of mixed-effects models; namely, they include cluster-specific parameters that are assumed to arise from a common statistical distribution. A two-step approach can also serve as a useful tool for exploratory data analysis, particularly for data sets that have few clusters but lots of observations per cluster; a good example is wildlife telemetry studies that collect lots of location data on few individuals.
An important consideration when fitting mixed-effects models is whether available predictors vary within a cluster (referred to as level-1 predictors) or they are constant for each cluster (level-2 predictors). In the pines
data set, age
is a level-1 predictor, whereas Aspect
is a level-2 predictor. We can see this by exploring the relationship between age
and dbh
, which varies by site (Figure 18.4). We also see that Aspect
is constant for all observations from the same site.
data(pines)
ggplot(pines, aes(age, dbh, col = Aspect)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
facet_wrap(~ site)
18.5.1 Stage 1 (level-1 model)
In the first stage (or step), we build a separate model for each cluster (i.e., site
). When building this model, we can only consider variables that are not constant within a cluster. Thus, we can include age
but not Aspect
.
Level 1 model:
\[ dbh_{ij} = \beta_{0i} + \beta_{1i}age_{ij} + \epsilon_{ij} \tag{18.2}\]
In this model, each site has its own intercept \(\beta_{0i}\) and slope \(\beta_{1i}\). We can use a loop to fit this model to each site, and then pull off the coefficients, or we can accomplish the task in a few lines of code using functions in the purrr
(Henry and Wickham 2020), tidyr
(Wickham 2021), and broom
packages (Robinson, Hayes, and Couch 2021). Before fitting models to each site, we will create a centered and scaled version of age
, which will make it easier to fit mixed-effects models down the road (see Section 18.13.2). The scale
function creates a new variable, which we name agec
, by subtracting the mean age
and diving by the standard deviation of age
from each observation.
<- pines %>%
pines mutate(agec = scale(age))
Below, we provide code for fitting a model to each site using a for
loop:
<- unique(pines$site)
usite <- length(usite)
nsites <- matrix(NA, nsites, 2) # to hold slope and intercept
Beta <- matrix(NA, nsites, 1) # to hold aspect (level- covariate)
Aspect for(i in 1:nsites){
<- subset(pines, site == usite[i])
dataseti <- lm(dbh ~ agec, data = dataseti)
LMi <- coef(LMi)
Beta[i,] <- dataseti$Aspect[1]
Aspect[i]
}<- data.frame(site = usite,
betadat intercept = Beta[,1],
slope = Beta[,2],
Aspect = Aspect)
betadat
site intercept slope Aspect
1 SNP.South.18 16.097300 2.849878 South
2 SNP.North.03 20.309811 5.823510 North
3 SNP.East.27 18.953448 1.000244 East
4 SNP.North.01 19.069013 3.497746 North
5 SNP.North.02 14.697916 3.469895 North
6 SNP.East.25 19.580211 1.604887 East
7 SNP.North.05 15.136417 1.158623 North
8 SNP.South.11 22.318424 5.666733 South
9 SNP.West.31 19.140467 5.061144 West
10 SNP.West.33 14.366430 1.821074 West
11 SNP.South.13 14.189394 1.189321 South
12 SNP.South.19 19.034423 5.336379 South
13 SNP.East.24 15.108255 3.547825 East
14 SNP.South.14 13.727017 3.160221 South
15 SNP.East.28 26.010214 11.332107 East
16 SNP.West.39 13.760994 1.759141 West
17 SNP.West.38 11.105753 11.127715 West
18 SNP.West.32 12.578978 4.060440 West
19 SNP.East.22 9.500404 -8.309131 East
20 SNP.North.08 12.899520 7.010497 North
And, here is an option using functions in the tidyverse
set of packages:
# Nest the data by site, then fit a model and pull off
# the coefficients and associated information
<- pines %>% nest_by(site) %>%
betadata mutate(mod = list(lm(dbh ~ agec, data = data))) %>%
::reframe(tidy(mod))
dplyr
# Now, merge this information with Aspect using a left join
<- left_join(betadata, unique(pines[,c("site", "Aspect")])) betadata
Joining with `by = join_by(site)`
betadata
# A tibble: 40 × 7
site term estimate std.error statistic p.value Aspect
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 SNP.East.22 (Intercept) 9.50 5.49 1.73 0.127 East
2 SNP.East.22 agec -8.31 6.20 -1.34 0.222 East
3 SNP.East.24 (Intercept) 15.1 0.953 15.9 0.000000960 East
4 SNP.East.24 agec 3.55 0.747 4.75 0.00209 East
5 SNP.East.25 (Intercept) 19.6 4.81 4.07 0.00962 East
6 SNP.East.25 agec 1.60 6.25 0.257 0.807 East
7 SNP.East.27 (Intercept) 19.0 4.65 4.08 0.00954 East
8 SNP.East.27 agec 1.00 3.50 0.286 0.786 East
9 SNP.East.28 (Intercept) 26.0 3.25 8.00 0.000203 East
10 SNP.East.28 agec 11.3 3.00 3.78 0.00916 East
# ℹ 30 more rows
18.5.2 Stage 2 (level-2 model)
In the second step, we treat the coefficients from stage 1 as “data”, and we model the coefficients as a function of variables that are constant within a cluster. Let’s begin by exploring, graphically, how much the coefficients vary across the different sites. We can also see if the coefficients co-vary with our level-2 predictor, Aspect
.
ggplot(betadata, aes(Aspect, estimate)) + geom_point() + facet_wrap(~term)
age
to diameter at breast height.
There appears to be quite a bit of site-to-site variability but no strong effect of Aspect
on the coefficient estimates. We could then fit regression models relating estimates of cluster-specific parameters to our level-2 predictors (e.g., Aspect
). To facilitate further modeling, we will create a “wide” data set containing separate columns for the estimated intercepts and slopes:
<- betadata %>%
betacoefswide ::select(site, estimate, Aspect, term) %>%
dplyrpivot_wider(names_from = "term", values_from = "estimate") %>%
rename(intercept = `(Intercept)`, slope = "agec")
head(betacoefswide)
# A tibble: 6 × 4
site Aspect intercept slope
<chr> <chr> <dbl> <dbl>
1 SNP.East.22 East 9.50 -8.31
2 SNP.East.24 East 15.1 3.55
3 SNP.East.25 East 19.6 1.60
4 SNP.East.27 East 19.0 1.00
5 SNP.East.28 East 26.0 11.3
6 SNP.North.01 North 19.1 3.50
We then use this data set to explore how our cluster-specific intercepts and slopes vary and co-vary with our level-2 predictor, Aspect
:
summary(lm(intercept ~ Aspect, data = betacoefswide))
Call:
lm(formula = intercept ~ Aspect, data = betacoefswide)
Residuals:
Min 1Q Median 3Q Max
-8.3301 -2.7627 -0.7028 2.1325 8.1797
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.8305 1.8569 9.602 4.82e-08 ***
AspectNorth -1.4080 2.6260 -0.536 0.599
AspectSouth -0.7572 2.6260 -0.288 0.777
AspectWest -3.6400 2.6260 -1.386 0.185
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.152 on 16 degrees of freedom
Multiple R-squared: 0.118, Adjusted R-squared: -0.04738
F-statistic: 0.7135 on 3 and 16 DF, p-value: 0.5581
summary(lm(slope ~ Aspect, data = betacoefswide))
Call:
lm(formula = slope ~ Aspect, data = betacoefswide)
Residuals:
Min 1Q Median 3Q Max
-10.1443 -1.2390 -0.5873 1.7001 9.4969
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.835 1.905 0.964 0.350
AspectNorth 2.357 2.694 0.875 0.395
AspectSouth 1.805 2.694 0.670 0.512
AspectWest 2.931 2.694 1.088 0.293
Residual standard error: 4.259 on 16 degrees of freedom
Multiple R-squared: 0.07676, Adjusted R-squared: -0.09635
F-statistic: 0.4434 on 3 and 16 DF, p-value: 0.7252
Taken together with Figure 18.5, we might conclude that the coefficients do not vary significantly with respect to Aspect
. Thus, we could consider a level-2 model for the site-specific intercepts and slopes that allow these parameters to vary about their overall mean but not depend on Aspect
.
Level-2 model:
\[ \beta_{0i} = \beta_0 + b_{0i}\\ \] \[ \beta_{1i} = \beta_1 + b_{1i} \tag{18.3}\]
In Equation 18.3, the \(\beta_0\) and \(\beta_1\) represent the mean intercept and slope in the population of sites, respectively, and \(b_{0i}\) and \(b_{1i}\) represent deviations from these average parameters for site
\(i\).
18.5.3 Putting things together: Composite equation
We can combine the level-1 and level-2 models, substituting in the equation for the level-2 model (Equation 18.3) into the equation for the level-1 model (Equation 18.2) to form what is referred to as the composite equation:
\[ dbh_{ij} = (\beta_0 +b_{0i}) + (\beta_1+b_{1i})age_{ij} + \epsilon_{ij} \tag{18.4}\]
This leads us to what is referred to as a random coefficient or random slopes model in which both the intercepts and slopes are site-specific. Typically, we also assume that the \((b_{0i}, b_{1i}) \sim N(0,\Sigma_b)\), where \(\Sigma_b\) is a 2 x 2 matrix.
Think-Pair-Share: Why is \(\Sigma_b\) a 2 x 2 matrix? What do the diagonal and non-diagonal entries represent? We will discuss the answer to this question when we explore output from fitting this model in R in Section 18.7.
18.6 Random-intercept versus random intercept and slope model
The 2-step approach of the last section led us to consider a model that included site-specific intercepts and slope parameters associated with our level-1 covariate, agec
.4 However, in a great many cases, users default to including only random intercepts without consideration of random slopes5; this choice often leads to p-values that are too small, confidence intervals that are too narrow, and overconfident conclusions when evaluating the importance of level-1 covariates (Schielzeth and Forstmeier 2009; Muff, Signer, and Fieberg 2020). As noted by Schielzeth and Forstmeier (2009), random slopes should generally be considered for predictors that vary within a cluster, especially when cluster-specific parameters exhibit a high degree of variation (Figure 18.6 a), when within cluster variability is low (Figure 18.6 b), and when there are lots of observations per cluster (Figure 18.6 c). In addition to Schielzeth and Forstmeier (2009), Arnqvist (2020) and Silk, Harrison, and Hodgson (2020) also identify this issue as being one that is neglected by all too many users of mixed-effect models.
18.7 Fitting mixed-effects models in R
There are several R packages that can be used to fit mixed-effects models. In this book we will consider 4 of them:
nlme
(which only fits linear mixed effects models) (Jose Pinheiro et al. 2021)lme4
(Bates et al. 2015)glmmTMB
(Brooks et al. 2017)GLMMadaptive
(Rizopoulos 2021)
We have already seen the nlme
package in Chapter 5, which we used to fit linear models that relaxed the constant variance assumption using generalized least squares. The nlme
package is older and less commonly used to fit mixed-effect models than other packages (e.g., lme4
and glmmTMB
). The primary advantage of the nlme
package is that it can be used to fit mixed models that relax the assumption that within-cluster errors, \(\epsilon_{ij}\), are independent and have constant variance (i.e., one can account for temporal or spatial autocorrelation and non-homogeneous variance). Yet, the nlme
package does not include methods for fitting generalized linear mixed effects models (Chapter 19), and it is slowly becoming outdated. The lme4
package is arguably the most popular package for fitting mixed-effects models, and it will be the focus of this chapter. The glmmTMB
package is newer on the scene and can be quicker when fitting generalized linear mixed effects models to response data that are not Normally distributed (Brooks et al. 2017). It uses syntax that is similar to lme4
but allows for a wider range of models (e.g., one can fit models that account for zero-inflation, and glmmTMB
can also fit mixed-effects models using a negative binomial distribution). Thus, we will explore glmmTMB
when we get to Chapter 19. Lastly, the GLMMadaptive
package has certain advantages when fitting generalized linear mixed effects models, particularly when it comes to estimating population-level response patterns. We will explore this package in Chapter 19. For now, we will focus on implementations using the lmer
function in the lme4
package.
18.7.1 Random intercept model
Let’s first fit a model containing a random intercept for each site
by adding (1 | site)
to the model formula (recall, 1
is used to refer to the intercept).
<- lmer(dbh ~ agec + (1 | site), data = pines)
lmer.ri summary(lmer.ri)
Linear mixed model fit by REML ['lmerMod']
Formula: dbh ~ agec + (1 | site)
Data: pines
REML criterion at convergence: 918.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.9218 -0.7456 -0.0585 0.6020 2.8824
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.216 2.053
Residual 16.147 4.018
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 16.4743 0.5608 29.375
agec 2.4305 0.4100 5.928
Correlation of Fixed Effects:
(Intr)
agec 0.012
Think-Pair-Share: try to describe the model using a set of equations and link the parameters in the equations to the output. We will leave this exercise to the reader but provide this information for the random intercept and slope model below.
18.7.2 Random intercepts and slopes model
Let’s next fit a model with random intercpets and random slopes for agec
. To do so, we add (1 + agec | site)
to the model formula, letting lmer
know that we want each site
to have its own intercept and slope for agec
.
<- lmer(dbh ~ agec + (1 + agec | site), data = pines)
lmer.rc summary(lmer.rc)
Linear mixed model fit by REML ['lmerMod']
Formula: dbh ~ agec + (1 + agec | site)
Data: pines
REML criterion at convergence: 917.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.90307 -0.72242 -0.08194 0.58100 2.93686
Random effects:
Groups Name Variance Std.Dev. Corr
site (Intercept) 3.740 1.934
agec 1.122 1.059 0.25
Residual 15.631 3.954
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 16.5258 0.5601 29.506
agec 2.4502 0.4934 4.966
Correlation of Fixed Effects:
(Intr)
agec 0.157
Note, we could have omitted the 1
, as lmer
by default includes a random intercept whenever a random slope is included. Thus, lmer(dbh ~ agec + (age | site), data = pines)
fits the same model as the one above.
Let’s write down this model using a set of equations and find the estimated parameters in the R output:
\[ dbh_{ij} = (\beta_0 +b_{0i}) + (\beta_1+b_{1i})age_{ij} + \epsilon_{ij} \nonumber \\ \] \[ \epsilon_{ij} \sim N(0, \sigma^2_{\epsilon}) \nonumber \\ \] \[ \begin{bmatrix} b_{0i}\\ b_{1i} \end{bmatrix} \sim N(\mu, \Sigma_b), \text { with } \mu = \begin{bmatrix} 0\\ 0 \end{bmatrix}, \Sigma_b = \begin{bmatrix} \sigma^2_{b_0} & Cov(b_0, b_1)\\ Cov(b_0, b_1) & \sigma^2_{b1} \end{bmatrix}\\ \tag{18.5}\]
\(\sigma^2_{b_0}\) is the variance of \(b_{0i}\), \(\sigma^2_{b_1}\) is the variance of \(b_{1i}\), and \(Cov(b_0, b_1)= E[b_{0i} b_{1i}] - E[b_{0i}]E[b_{1i}]\)is the covariance of \(b_{0i}\) and \(b_{1i}\)6. If you are unfamiliar with covariances, they quantify the degree to which two random variables co-vary (i.e., whether high values of \(b_{0i}\) are associated with low/high values of \(b_{1i}\)). We can also calculate the covariance from the correlation of two random variables and their standard deviations:
\[ Cor(b_0, b_1) = \frac{Cov(b_0, b_1)}{\sigma_{b_0}\sigma_{b_1}} \\ \] \[ \implies Cov(b_0, b_1) = Cor(b_0, b_1)\sigma_{b_0}\sigma_{b_1} \tag{18.6}\]
The estimates of \(\beta_0\) and \(\beta_1\) are under the label Fixed effects
and are equal to 16.52 and 2.45, respectively. The estimates of \(\sigma^2_{b_0}\), \(\sigma^2_{b_1}\), and \(\sigma^2_{\epsilon}\) are under the label Random effects
and are equal to 3.74, 1.12, and 15.63, respectively. Note that the Std.dev
column next to the Variance
column just translates these variances to standard deviations (i.e., \(sd(b_1) = \sqrt{\sigma^2_{b_1}} = 1.059\)). The R output from the summary
function provides \(Cor(b_0, b_1)\) rather than \(Cov(b_0, b_1)\). We can calculate \(Cov(b_0, b_1)\) using Equation 18.6, \(Cov(b_0, b_1) = (0.25)(1.934)(1.059) = 0.512\).
This is a good opportunity to highlight the difference between variability and uncertainty. We describe uncertainty using SEs and confidence intervals. For example, the SEs for the fixed-effects parameters (Std. Error
column) quantify the extent to which our estimates of \(\beta_0\) and \(\beta_1\) would vary across repeated random trials7. Our random intercept and slope model also contains 4 variance/covariance parameters that describe variability in the response variable (within and among sites): \(\sigma^2_{b_0}\), \(\sigma^2_{b_1}\), \(Cov(b_0, b_1)\), and \(\sigma^2_{\epsilon}\). It is important to understand that these parameters do not describe uncertainty in the estimated parameters like the SEs do for \(\beta_0\) and \(\beta_1\)!
Think-Pair-Share: do \(\hat{\sigma}^2_{b_0}\), \(\hat{\sigma}^2_{b_0}\), \(\widehat{cov}(b_0, b_1)\), and \(\hat{\sigma}^2_{\epsilon}\) have sampling distributions?
You bet they do! We estimate these parameters, and therefore, they will have a sampling distribution. We are not given any information here about the uncertainty associated with these parameters, and it turns out that their sampling distribution is usually right skewed. Thus, Normal-based confidence intervals usually do not work well, and SEs do not provide an adequate measure of uncertainty. We can, however, calculate confidence intervals using the profile-likelihood approach introduced in Section 10.10, which is what the confint
function does by default (see below):
confint(lmer.rc)
2.5 % 97.5 %
.sig01 0.4441978 3.247535
.sig02 -1.0000000 1.000000
.sig03 0.0000000 2.293833
.sigma 3.5168693 4.498968
(Intercept) 15.3979835 17.754390
agec 1.2692533 3.529877
R returns confidence intervals for \(\sigma_{b0}\) (.sigma01
), \(Cor(b_{0i}, b_{1i})\) (.sig02
), \(\sigma_{b1}\) (.sig03
), and \(\sigma_{epsilon}\) (.sigma
). We see that the confidence intervals for the standard deviations are asymmetric and the confidence interval for \(Cor(b_{0i}, b_{1i})\) spans the entire range from -1 to 1 (i.e., we do not estimate this parameter well). We can verify by adding the argument oldNames = FALSE)
to the confint
function.
confint(lmer.rc, oldNames=FALSE)
2.5 % 97.5 %
sd_(Intercept)|site 0.4441978 3.247535
cor_agec.(Intercept)|site -1.0000000 1.000000
sd_agec|site 0.0000000 2.293833
sigma 3.5168693 4.498968
(Intercept) 15.3979835 17.754390
agec 1.2692533 3.529877
Lastly, note that \(Var(\beta_0 + b_{0i}) = Var(b_{0i}) = \sigma^2_{b_0}\) because \(\beta_0\) is a constant. Similarly \(Var(\beta_1 + b_{1i}) = Var(b_{1i}) = \sigma^2_{b_1}\). Thus, we can think of these variance parameters as describing the variance of site-specific deviations from the mean intercept and slope or variability of the site-specific intercepts and slopes themselves.
18.8 Site-specific parameters (BLUPs)
When looking at the output in the last section, you might have wondered why R does not provide estimates of the site-specific parameters – i.e., the intercepts \((\beta_0 +b_{0i})\) and slopes \((\beta_1 +b_{1i})\) for each site. The deviations from the average intercept and slope, \(b_{0i}\) and \(b_{1i}\), are also missing in action. What gives?
In this section, we will learn how to generate “estimates” of site-specific parameters from the mixed-effects model. However, in a frequentist framework, we will actually refer to the estimated quantities as “predictions”, or more specifically, Best Linear Unbiased Predictions (BLUPs) in the context of linear mixed effects models. This distinction arises in frequentist inference because the \(b_{0i}\)s and \(b_{1i}\)s are considered to be random variables rather than parameters with fixed but unknown values. If this seems confusing, you are not alone. This is another situation where life just seems easier if you are a Bayesian, since all parameters are treated as random variables and assigned probability distributions. There is nothing philosophically different about random effects if you are a Bayesian, other than these parameters will all be assigned a common distribution. We will then need to specify priors for parameters in that common distribution. We will see how to implement linear mixed effects models in JAGS in Section 18.16.
We can generate the BLUPs for \(b_{0i}\) and \(b_{1i}\) in R using the ranef
function in the lme4
package
ranef(lmer.rc)
$site
(Intercept) agec
SNP.East.22 1.0581009 -0.539278703
SNP.East.24 -0.9713254 0.511836750
SNP.East.25 2.0985073 -0.195316866
SNP.East.27 1.9471416 -0.659731780
SNP.East.28 0.4036588 0.341754637
SNP.North.01 1.5173623 0.510452398
SNP.North.02 -0.5543028 0.087613361
SNP.North.03 1.8144629 0.262796638
SNP.North.05 -0.8777090 -0.282413026
SNP.North.08 -0.5255991 0.754697359
SNP.South.11 2.2017430 0.493258625
SNP.South.13 -1.9043070 -0.960482225
SNP.South.14 -1.7227241 -0.131886984
SNP.South.18 -0.3500402 0.005026152
SNP.South.19 -0.1677626 0.242216046
SNP.West.31 1.6388877 0.882281861
SNP.West.32 -1.9592144 -0.291839131
SNP.West.33 -1.6116716 -0.556945196
SNP.West.38 -0.1456621 0.214895844
SNP.West.39 -1.8895462 -0.688935759
with conditional variances for "site"
If we want an estimate of uncertainty to accompany the BLUPs, we can use the tidy
function in the broom.mixed
package (Bolker and Robinson 2021):
::tidy(lmer.rc, effects="ran_vals") broom.mixed
# A tibble: 40 × 6
effect group level term estimate std.error
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 ran_vals site SNP.East.22 (Intercept) 1.06 1.31
2 ran_vals site SNP.East.24 (Intercept) -0.971 1.10
3 ran_vals site SNP.East.25 (Intercept) 2.10 1.32
4 ran_vals site SNP.East.27 (Intercept) 1.95 1.45
5 ran_vals site SNP.East.28 (Intercept) 0.404 1.40
6 ran_vals site SNP.North.01 (Intercept) 1.52 1.36
7 ran_vals site SNP.North.02 (Intercept) -0.554 1.14
8 ran_vals site SNP.North.03 (Intercept) 1.81 1.15
9 ran_vals site SNP.North.05 (Intercept) -0.878 1.18
10 ran_vals site SNP.North.08 (Intercept) -0.526 1.11
# ℹ 30 more rows
We can also easily plot the BLUPs for \(b_{0i}\) and \(b_{1i}\) using the plot_model
function in the sjPlot
library (Lüdecke 2021) (Figure 18.7)
plot_model(lmer.rc, type="re")
plot_model
function in the sjPlot
package.
Lastly, we can calculate predictions of the cluster-specific parameters by adding \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to the BLUPs for \(b_{0i}\) and \(b_{1i}\). These estimates can also be extracted directly using the coef
function:
coef(lmer.rc)
$site
(Intercept) agec
SNP.East.22 17.58393 1.910931
SNP.East.24 15.55450 2.962047
SNP.East.25 18.62434 2.254893
SNP.East.27 18.47297 1.790478
SNP.East.28 16.92949 2.791965
SNP.North.01 18.04319 2.960662
SNP.North.02 15.97153 2.537823
SNP.North.03 18.34029 2.713007
SNP.North.05 15.64812 2.167797
SNP.North.08 16.00023 3.204907
SNP.South.11 18.72757 2.943469
SNP.South.13 14.62152 1.489728
SNP.South.14 14.80310 2.318323
SNP.South.18 16.17579 2.455236
SNP.South.19 16.35807 2.692426
SNP.West.31 18.16472 3.332492
SNP.West.32 14.56661 2.158371
SNP.West.33 14.91416 1.893265
SNP.West.38 16.38017 2.665106
SNP.West.39 14.63628 1.761274
attr(,"class")
[1] "coef.mer"
Note, however, that it is not straightforward to calculate uncertainty associated with the cluster-specific intercepts \((\beta_0 +b_{0i})\) and slopes \((\beta_1 +b_{1i})\) as this would require consideration of uncertainty in both \(\hat{\beta}_0, \hat{\beta}_1\) and \(\hat{b}_{0i}, \hat{b}_{1i}\), including how these estimates and predictions, respectively, co-vary (see this link on Ben Bolker’s GLMM FAQ page).
18.8.1 Digging deeper, how are the BLUPS calculated?
To understand how how BLUPS are calculated from a fitted mixed-effects model, it is helpful to write the model down in terms of the conditional distribution of \(Y\) given the random effects (i.e., \(f_{Y | b}(y)\)) and the distribution of the random effects, \(f_b(b)\). This way of writing down the model also makes it easy to generalize to other types of probability distributions (i.e., generalized linear mixed effect models), which we will see in Chapter 19.
We can describe our random intercept and slope model from the last section as:
\[ dbh_{ij} | b_{0i}, b_{1i} \sim N(\mu_i, \sigma^2_{\epsilon}) \tag{18.7}\] \[ \mu_{ij} = (\beta_0 +b_{0i}) + (\beta_1+b_{1i})age_{ij} \nonumber \\ \begin{bmatrix} b_{0i}\\ b_{1i} \end{bmatrix} \sim N(0, \Sigma_b), \text { with } \Sigma_b = \begin{bmatrix} \sigma^2_{b_0} & Cov(b_0, b_1)\\ Cov(b_0, b_1) & \sigma^2_{b1} \end{bmatrix} \tag{18.8}\]
And, more generally, we can write down any linear mixed effects model using:
\[\begin{gather} Y_{ij} \mid b \sim N(\mu_i, \sigma^2_{\epsilon}) \nonumber \\ \mu_{ij} = X\beta + Zb \\ b \sim N(0, \Sigma_b) \nonumber \end{gather}\]
Here, \(X\) and \(Z\) are design matricies associated with the fixed and random effects, respectively. For a random-intercept model, Z would contain a column of 1s. For our random intercept and slope model, \(Z\) would also contain a column with the values of agec
.
The BLUPs, which are returned by the ranef
function, are estimated by maximizing:
\[\begin{gather} f_{b|Y}(b) = \frac{f_{Y|b}(y)f_b(b)}{f_Y(y)} \end{gather}\]
where \(f_{b|Y}(b)\) is the conditional distribution of the random effects given the observed data, \(f_b(b)\) is the distribution of the random effects (given by Equation 18.8), and \(f_{Y|b}(y)\) is the conditional distribution of \(Y |b\) (given by Equation 18.7). The BLUPs are often referred to as conditional modes and have a Bayesian flavor to them. To calculate the BLUPs, we have to substitute in the estimated fixed effects and variance parameters into our equations for \(f_{Y|b}(y)\) and \(f_b(b)\), leading to what is sometimes referred to as an Empirical Bayes estimator.
18.9 Fixed effects versus random effects and shrinkage
Earlier, I hinted at the fact that we could estimate site-specific intercepts and slopes using a linear model without random effects. Let’s go ahead and fit a fixed-effects model with site-specific intercepts and slopes and compare our results to those from the mixed model:
<- lm(dbh ~ site + agec + site : agec, data=pines)
lm.fc summary(lm.fc)
Call:
lm(formula = dbh ~ site + agec + site:agec, data = pines)
Residuals:
Min 1Q Median 3Q Max
-8.1059 -2.3715 -0.0893 2.0420 9.9533
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.500 4.879 1.947 0.05383 .
siteSNP.East.24 5.608 5.059 1.109 0.26985
siteSNP.East.25 10.080 5.791 1.741 0.08430 .
siteSNP.East.27 9.453 6.058 1.561 0.12127
siteSNP.East.28 16.510 7.246 2.279 0.02446 *
siteSNP.North.01 9.569 7.653 1.250 0.21361
siteSNP.North.02 5.198 5.340 0.973 0.33233
siteSNP.North.03 10.809 5.262 2.054 0.04212 *
siteSNP.North.05 5.636 5.100 1.105 0.27137
siteSNP.North.08 3.399 5.238 0.649 0.51762
siteSNP.South.11 12.818 5.386 2.380 0.01890 *
siteSNP.South.13 4.689 5.159 0.909 0.36524
siteSNP.South.14 4.227 5.100 0.829 0.40889
siteSNP.South.18 6.597 5.472 1.206 0.23038
siteSNP.South.19 9.534 6.542 1.457 0.14760
siteSNP.West.31 9.640 5.103 1.889 0.06128 .
siteSNP.West.32 3.079 5.265 0.585 0.55980
siteSNP.West.33 4.866 5.412 0.899 0.37040
siteSNP.West.38 1.605 6.096 0.263 0.79273
siteSNP.West.39 4.261 5.354 0.796 0.42775
agec -8.309 5.500 -1.511 0.13351
siteSNP.East.24:agec 11.857 5.600 2.117 0.03629 *
siteSNP.East.25:agec 9.914 6.832 1.451 0.14933
siteSNP.East.27:agec 9.309 6.129 1.519 0.13142
siteSNP.East.28:agec 19.641 7.393 2.657 0.00897 **
siteSNP.North.01:agec 11.807 9.197 1.284 0.20170
siteSNP.North.02:agec 11.779 5.741 2.052 0.04238 *
siteSNP.North.03:agec 14.133 6.835 2.068 0.04081 *
siteSNP.North.05:agec 9.468 6.039 1.568 0.11955
siteSNP.North.08:agec 15.320 5.845 2.621 0.00990 **
siteSNP.South.11:agec 13.976 5.882 2.376 0.01909 *
siteSNP.South.13:agec 9.498 5.605 1.695 0.09272 .
siteSNP.South.14:agec 11.469 6.147 1.866 0.06450 .
siteSNP.South.18:agec 11.159 7.677 1.454 0.14866
siteSNP.South.19:agec 13.646 6.767 2.016 0.04600 *
siteSNP.West.31:agec 13.370 5.730 2.333 0.02129 *
siteSNP.West.32:agec 12.370 6.156 2.009 0.04674 *
siteSNP.West.33:agec 10.130 6.046 1.676 0.09641 .
siteSNP.West.38:agec 19.437 7.852 2.475 0.01470 *
siteSNP.West.39:agec 10.068 5.817 1.731 0.08607 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.93 on 120 degrees of freedom
Multiple R-squared: 0.4727, Adjusted R-squared: 0.3013
F-statistic: 2.758 on 39 and 120 DF, p-value: 1.301e-05
This model gives us direct estimates of intercepts and slopes, and estimates of their uncertainties, for each site. If we are predominately interested in these particular sites, then this model is much more straightforward to work with (e.g., we do not have to try to understand what BLUPs are!). On the other hand, this model requires fitting a lot of parameters (40 in total) and it does not allow us to generalize to other sites that we may not have sampled. In addition, we are unable to consider level-2 predictors in this model since they will be confounded with the site-specific intercepts (to see this, we can try adding Aspect
to the model, which will result in NA
for several of the coefficient estimates):
<- lm(dbh ~ site + agec + site:agec + Aspect, data=pines)
lm.fc2 coef(lm.fc2)
(Intercept) siteSNP.East.24 siteSNP.East.25
9.500404 5.607851 10.079807
siteSNP.East.27 siteSNP.East.28 siteSNP.North.01
9.453044 16.509810 9.568609
siteSNP.North.02 siteSNP.North.03 siteSNP.North.05
5.197512 10.809407 5.636013
siteSNP.North.08 siteSNP.South.11 siteSNP.South.13
3.399116 12.818020 4.688990
siteSNP.South.14 siteSNP.South.18 siteSNP.South.19
4.226613 6.596896 9.534019
siteSNP.West.31 siteSNP.West.32 siteSNP.West.33
9.640063 3.078574 4.866026
siteSNP.West.38 siteSNP.West.39 agec
1.605349 4.260590 -8.309131
AspectNorth AspectSouth AspectWest
NA NA NA
siteSNP.East.24:agec siteSNP.East.25:agec siteSNP.East.27:agec
11.856956 9.914018 9.309375
siteSNP.East.28:agec siteSNP.North.01:agec siteSNP.North.02:agec
19.641238 11.806878 11.779026
siteSNP.North.03:agec siteSNP.North.05:agec siteSNP.North.08:agec
14.132642 9.467754 15.319628
siteSNP.South.11:agec siteSNP.South.13:agec siteSNP.South.14:agec
13.975865 9.498452 11.469352
siteSNP.South.18:agec siteSNP.South.19:agec siteSNP.West.31:agec
11.159010 13.645510 13.370275
siteSNP.West.32:agec siteSNP.West.33:agec siteSNP.West.38:agec
12.369572 10.130205 19.436846
siteSNP.West.39:agec
10.068272
Otherwise, how do our estimates of site-specific parameters from the fixed-effect model compare to the BLUPs from the mixed-effect model? Let’s plot them together to see! To facilitate comparison, we will refit our fixed effects model using means coding. Then, we create a data set with the coefficients from both models for plotting.
<- lm(dbh ~ site + agec + site : agec - 1 - agec, data=pines)
lmmeans.fc <- coef(lmmeans.fc)
coef.fixed <- coef(lmer.rc)$site
coef.random <- data.frame(intercepts = c(coef.fixed[1:20], coef.random[,1]),
coefdat slopes = c(coef.fixed[21:40], coef.random[,2]),
method = rep(c("Fixed", "Random"), each=20),
site = rep(names(coef.fixed)[1:20],2))
Figure 18.8 compares the estimates of site-specific intercept and slope parameters from the fixed-effects and mixed-effects models. The estimated mean intercept (\(\hat{\beta}_0\)) and slope (\(\hat{\beta}_1\)) in the population of sites is also depicted by a black horizontal line.
<- ggplot(coefdat, aes(site, intercepts, col=method)) +
p1 geom_point(size=2)+ ggtitle("Intercepts")+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(yintercept=summary(lmer.rc)$coef[1,1])
<- ggplot(coefdat, aes(site, slopes, col=method)) +
p2 geom_point(size=2)+ ggtitle("Slopes")+
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(yintercept=summary(lmer.rc)$coef[2,1])
::ggarrange(p1, p2, ncol=2, common.legend=TRUE, legend="bottom") ggpubr
Alternatively, we could compare the two sets of coefficients by plotting the (intercept, slope) pairs and connecting them for the two methods (Figure 18.9).
ggplot(coefdat, aes(intercepts, slopes, group=site, col=method)) +
geom_point(size=2) + geom_path()
We see that the coefficients for the random effects are shrunk back towards the estimated mean coefficients in the population (\(\hat{\beta}_0\) and \(\hat{\beta}_1\)). The degree of shrinkage depends on the amount of information available for each site and also on the relative magnitude of the variance among sites (\(\hat{\sigma}^2_{b_0}, \hat{\sigma}^2_{b_1}\)) relative to the variance within sites (\(\hat{\sigma}^2_{\epsilon}\)). Coefficients for sites that have few observations and that lie far from the population mean will exhibit more shrinkage than those with more observations and that lie closer to the population mean. Coefficients will also exhibit more shrinkage in applications where the within-site variability is large relative to the total (within- plus among-site) variability.
Recall that what makes random effects different from fixed effects is that we assume our random-effect parameters all come from a common distribution. If parameters come from a common distribution, then it makes sense to “borrow information” from other sites when estimating site-specific parameters. The extent to which information from other sites is relevant depends on how similar sites are to each other (i.e., the among-site variance). And the value of using other information from other sites increases in cases where the information content from within-site observations is low. Thus, mixed-effect models provide a sensible way to weight information coming from within and among sites.
18.10 Fitted/predicted values from mixed-effects models
Now that we have explored how to estimate (or predict) site-specific parameters, we may also consider two types of predictions when fitting mixed-effect models. Specifically, we may be interested in exploring predictor-response relationships within specific sites or within the population (averaging across different sites). These two types of predictions are often referred to as subject-specific and population-averaged response patterns (e.g., Fieberg et al. 2009) or conditional and marginal response patterns, respectively (Muff, Held, and Keller 2016).
Subject-specific mean response patterns (associated with each sampled site):
- \(E[dbh | age, b_{0i}, b_{1i}] = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})age_{ij}\)
Population-averaged mean response patterns (associated with the population of sites):
- \(E[dbh | age] = E(E[dbh | X, b_{0i}, b_{1i}])) = \beta_0 + \beta_1age\)
We can generate these two types of predictions using the predict
function. By default, this will return subject-specific responses: \(\widehat{dbh}_{ij} = (\hat{\beta}_0 + \hat{b}_{0i}) + (\hat{\beta}_1 + \hat{b}_{1i})age_{ij}\). We compare site-specific response patterns from the mixed-effects model (red) to those obtained from the fixed-effects model (blue) in Figure 18.10).
# Subject-specific regression lines from the mixed-effects model
$sspred.rc<-predict(lmer.rc)
pines
# Subject-specific regression lines from the fixed-effects model
$fepred<-predict(lmmeans.fc)
pines
# Plot
ggplot(pines, aes(age, dbh)) +
geom_point(size = 2) +
geom_line(aes(age, sspred.rc), lwd = 1.3, col = "red") +
geom_line(aes(age, fepred), lwd = 1.3, col = "blue") +
facet_wrap( ~ site)
We see that the most of the fitted lines are similar between the two approaches (fixed-effects only and mixed-effects model). However, the estimated slope for SNP.East.22
(upper left corner) was negative when estimated from the fixed-effects model and positive when estimated from the mixed-effects model. This result again demonstrates how random effects “borrow information” across sites. The mixed-effects model assumes that all of the slope parameters come from a common distribution. In the fixed-effects model, all of the other slope estimates were positive, and site SNP.East.22
had a limited range of observed ages. Thus, the mixed-effect model infers that the slope for this site should look like the others and should be positive.
What if we had only included random intercepts, as many ecologists tend to do, or fit a fixed-effect model without the interaction between site
and agec
? Let’s create a similar plot, but this time comparing the random intercept (only) model to the model containing both random intercepts and random slopes.
# Subject-specific lines from random intercept mixed-effects model
$sspred.ri <- predict(lmer.ri)
pines
# Subject-specific regression lines from the fixed-effects model
$fpred2<-predict(lm(dbh ~ site + agec, data = pines))
pines
# Plot
ggplot(pines, aes(age, dbh)) +
geom_point(size = 2) +
geom_line(aes(age, sspred.ri), lwd = 1.3, col = "red") +
geom_line(aes(age, fpred2), lwd = 1.3, col = "blue") +
facet_wrap( ~ site)
dbh
to tree age using a fixed-effects (only) model (blue) and a mixed-effects model with random intercepts (red). In both cases, the effect of age
is assumed to be constant across sites.
We see that both methods assume the effect of age
is constant for all sites, but that each site has its own intercept. The slope parameters are similar for the two methods, but not identical. In addition, the intercepts differ in many cases due to the shrinkage noted previously.
We can also use the predict
function to generate population-averaged response curves by supplying an extra argument, re.form = ~ 0
, or equivalently, re.form = ~ NA
.8
$papred <- predict(lmer.rc, re.form = ~0)
pines
ggplot(pines, aes(age, dbh, col = site)) +
geom_point(size = 2) +
geom_line(aes(age, papred), lwd = 1.3, col = "red")
dbh
to age
using a mixed model containing random intercepts and slopes.
The predict
function can also be used to generate predictions for new data. Note, however, that when generating subject-specific predictions, the data associated with the newdata
argument must include columns corresponding to all of the grouping variables used to specify the random effects in the model (in this case, site
).
If you look at the help page for predict.merMod
9, you will see that it states:
There is no option for computing standard errors of predictions because it is difficult to define an efficient method that incorporates uncertainty in the variance parameters; we recommend
bootMer
for this task.
We illustrate the use of the bootMer
function in the lme4
package to calculate uncertainty in our subject-specific predictions, below10:
set.seed(101)
<- bootMer(lmer.rc,
boot.pred FUN=function(x){predict(x)},
nsim=500)
The bootMer
function will store the bootstrap statistics (i.e., our predictions for each observation) in a matrix named t
. The number of rows will equal the number of bootstrap replicates and the number of columns is equal to the number of observations in our data set.
dim(boot.pred$t)
[1] 500 160
nrow(pines)
[1] 160
We can use the apply
function to calculate quantiles of the bootstrap distribution for each observation, allowing us to construct 95% confidence intervals for the site-specific response patterns11:
<-apply(boot.pred$t, 2, quantile, c(0.025, 0.975)) predboot.CI
We then append these values to our pines data set.
$lowCI <-predboot.CI[1,]
pines$upCI <-predboot.CI[2,] pines
Then, we re-create our plot with the site-specific predictions (Figure 18.13).
# Plot
ggplot(pines, aes(age, dbh)) +
geom_point(size=2)+
geom_line(aes(age, sspred.rc), lwd=1.3, col="red") +
geom_line(aes(age, lowCI), lwd=1.3, col="red", lty=2) +
geom_line(aes(age, upCI), lwd=1.3, col="red", lty=2) +
facet_wrap(~site)
dbh
to age
using a mixed model containing random intercepts and slopes. A bootstrap was used to calculate pointwise 95-percent confidence intervals for site-specific lines.
One thing that sticks out in Figure 18.13 is that the point estimates do not always fall in the middle of the confidence interval for many of the sites. Recall that the mixed-effects models borrow information across sites. Like other statistical methods that borrow information by smoothing (e.g., generalized additive models, kernel density estimators), or induce shrinkage (e.g., the LASSO estimator from Section 8.7), the subject-specific predictions from the mixed-effects model trade off some level of bias to increase precision. This property makes it challenging to calculate proper confidence intervals for quantities that are a function of \(\hat{b}_{0i}, \hat{b}_{1i}\). If you inspect the boot.pred
object we created, you will see that bootMer
returns several non-zero estimates of bias associated with the subject-specific predictions.
18.11 Model assumptions
The assumptions of our mixed-effects model are similar to those of linear regression, except we have added another assumption that our random-effects come from a common Normal distribution. Below, we write these assumptions for the random intercept and slopes model; assumptions for the random intercept model can be inferred by setting \(b_{1i}\) to 0.
Constant within-cluster variance: the within-cluster errors, \(\epsilon_{ij}\), have constant variance (i.e., there is constant scatter about the regression line for each cluster); \(Var(\epsilon_{ij}) = \sigma^2_{\epsilon}\).
Independence: within-cluster errors, \(\epsilon_{ij}\), are independent (both within clusters and across clusters). We also assume the within-cluster errors, \(\epsilon_{ij}\), are independent of the random effects, \((b_{0i}, b_{1i})\).
Linearity: we may state this assumption in terms of the mean response pattern for a particular site, \(E[Y_i \mid X_i, b_{0i}, b_{1i}] = (\beta_0+ b_{0,i})+ (\beta_1+b_{1i})X_i\), or for the mean response pattern averaged across sites, \(E[Y_i \mid X_i] = \beta_0 + \beta_1X_i\) (see Section 18.10). We can derive the latter expression by noting that \(E[b_{0i}] = E[b_{1i}] = 0\).
Normality: the \(\epsilon_{ij}\) follow a Normal (Gaussian) distribution; in addition, the random effects, \((b_{0i}, b_{1i})\) are normally distributed: \((b_{0i}, b_{1i}) \sim N(0, \Sigma_b)\)
We can evaluate these assumptions using similar methods to those used to evaluate the assumptions of linear regression (Section 1.5):
- residual versus fitted values plots to evaluate the linearity and constant variance assumptions
- qqplots or histograms/density plots to evaluate assumptions of normality
18.11.1 Evaluating assumptions
There is a built-in plot
function that works with mixed-effect models fit using lme
or lmer
. This function plots residuals versus subject-specific fitted values, which we explore below for the random-intercept and random intercept and slope models.
<-plot(lmer.ri, main = "Random intercept") # residual versus fitted
ri<-plot(lmer.rc, main = "Random intercept and slope")
rcgrid.arrange(ri, rc, ncol=2)
As with linear regression, we are looking to see whether the residuals have constant scatter as we move from left to right and that there are no trends that would indicate a missing predictor or the need to allow for a non-linear relationship with one of the predictors. These plots generally look OK to me.
We can evaluate the Normality assumption for the \(\epsilon_{ij}\) and for the random effects using qqplots. The check_model
function in the performance
package (Lüdecke et al. 2021) will produce these plots, along with standard residual plots used to evaluate linearity and constant variance assumptions of linear regression (Figure 18.15 and Figure 18.16).
check_model(lmer.ri)
check_model
function in the performance
package (Lüdecke et al. 2021) for the random intercept model fit to the pines
data set.
The top-right plot is our standard residual versus fitted value plot (the same one that we saw with the default plot
method applied to our linear model object). Except for a few outlying points near the extreme left and right of the plot which cause the smooth (green) line to deviate from a horizontal line at 0, the residuals appear to have constant scatter and very little trend. Thus, the linearity assumption seems reasonable. The middle-left plot is the scale-location plot that we encountered previously for evaluating the constant variance assumption of linear regression models in Section 1.5. Here, we are hoping to see points that have constant scatter (and a flat green trend line), indicating the residuals have constant variability. There may be a slight increase in variance for larger fitted values, but overall, this plot doesn’t look too bad to me. The middle-right plot shows standardized residuals plotted against leverage; observations with high values for leverage have predictor values that fall far from the sample mean of the predictor values and are potentially influential observations. We have largely ignored this plot when discussing linear models, but we might consider looking into the points that fall outside the green dotted lines to see if they are in error or if dropping them substantially changes our estimates. Of particular concern would be observations that have large residuals and high leverage (i.e., points in the upper or lower right corners of the plot). The next two plots are used to examine the Normality assumption of the within-group errors, \(\epsilon_{ij}\). There are some larger-than-expected residuals that fall far from the line in the QQ-plot (lower-left plot), but these observations largely fall with in the simulation bounds. Lastly, the lower-right plot can be used to evaluate whether the random intercepts are Normally distributed. We see that there are a few sites with intercepts that fall off the QQ-line, but the individual estimates have a lot of uncertainty, with confidence intervals that overlap the QQ-line. Lastly, the top left panel provides a posterior predictive check, which is similar in flavor to our Bayesian goodness-of-fit test that we have seen in other chapters. To create this plot, several data sets are simulated the fitted model and their distribution is plotted (light blue lines) along with the distribution of the original data (thicker green line). The distributions of simulated data appear similar to the distribution of the observed data, giving us further confidence that the model assumptions are met and the model is appropriate for the data.
We can create the same set of plots for the random intercept and slope model (Figure 18.16). In this case, we get one more QQ-plot for the random slopes (lower-right plot). Otherwise, the plots look similar to those from the random intercept model.
check_model(lmer.rc)
check_model
function in the performance
package (Lüdecke et al. 2021) for the random intercept and slope model fit to the pines
data set.
Somewhat subjectively, I would conclude that there are no major issues with the assumptions of either model.
18.12 Model comparisons, hypothesis tests, and confidence intervals for fixed-effects and variance parameters
In this section, we will explore methods available for constructing hypothesis tests and confidence intervals for model parameters, and for selecting an appropriate model.
18.12.1 Estimation using restricted maximum likelihood (REML) or maximum likelihood (ML)
One topic we have avoided so far, but will come up when discussing hypothesis tests for fixed and random-effects parameters, is the difference between restricted maximum likelihood (REML) and maximum likelihood (ML) methods for estimating parameters. We briefly mentioned in Section 10.6 that the maximum likelihood estimator of \(\sigma^2\) for Normally distributed data is biased, and that we tend to use an unbiased estimator that divides by \(n-1\) instead of \(n\). It turns out that maximum-likelihood estimators of variance components in mixed-effect models are also biased. REML provides a similar correction by maximizing a modified form of the likelihood that depends on the fixed effects included in the model. Thus, REML is the default method used when fitting models using lmer
. To estimate parameters using maximum likelihood, we need to add the argument, method = "ML"
.
Why is this important? Comparisons of models with different fixed effects (e.g., using AIC or likelihood ratio tests) are not valid when using REML. Thus, a common recommendation is to use method = ML
when comparing models (e.g., using likelihood ratio tests) that differ only in their fixed effects. However, it is generally accepted, and in fact recommended, to compare models that differ only in terms of their random effects using models fit using REML.
18.12.2 Hypothesis tests for fixed-effects parameters and degrees of freedom
To begin our journey to understand tests of fixed-effect parameters, we will explore output when fitting random intercept and random slope models using the lme
function in the nlme
package. The syntax will be a little different than what we saw for lmer
. Instead of specifying the random intercepts associated with each site using (1 | site)
, we will have a separate argument, random = ~ 1 | site
. In both cases, we use 1 | site
to tell R that we want the intercepts (i.e., the column of 1s in the design matrix) to vary randomly by site
.
<- lme(dbh ~ agec, random=~1 | site, data = pines) # package nlme
lme.ri summary(lme.ri)
Linear mixed-effects model fit by REML
Data: pines
AIC BIC logLik
926.7807 939.0311 -459.3904
Random effects:
Formula: ~1 | site
(Intercept) Residual
StdDev: 2.053276 4.018271
Fixed effects: dbh ~ agec
Value Std.Error DF t-value p-value
(Intercept) 16.474305 0.5608232 139 29.375223 0
agec 2.430471 0.4100212 139 5.927671 0
Correlation:
(Intr)
agec 0.012
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.92182870 -0.74556286 -0.05850249 0.60196033 2.88237382
Number of Observations: 160
Number of Groups: 20
As with lmer
, we end up with estimates of the mean intercept and slope in the population, \(\hat{\beta}_0 = 16.47\) and \(\hat{\beta_1} = 2.43\), respectively; we also see that lme
provides estimates of the standard deviations of the site-level intercepts (\(\hat{\sigma}_{site} = 2.05\)) and within-site errors (\(\hat{\sigma}_{\epsilon} = 4.018\)). Unlike with the output from lmer
, however, we also get a test statistic and p-value for null hypothesis tests that \(\beta_0 = 0\) and \(\beta_1 = 0\) based on a t-distribution with \(df = 139\). This test is only approximate and there are better ways to conduct hypothesis tests (e.g., using simulation based approaches as discussed in Section 18.12.5). Because of the approximate nature of the tests, the developers of the lme4
package decided it would be best not to provide p-values when summarizing the output of a model fit using lmer
. Nonetheless, the default method that lme
uses to determine degrees of freedom is instructive, particularly if we fit models that include level-1 and level-2 covariates (i.e., variables that do and do not vary within each site):
<- lme(dbh ~ agec + Aspect, random=~1 | site, data = pines)
lme.ri2 summary(lme.ri2)
Linear mixed-effects model fit by REML
Data: pines
AIC BIC logLik
920.0064 941.3103 -453.0032
Random effects:
Formula: ~1 | site
(Intercept) Residual
StdDev: 1.994415 3.99883
Fixed effects: dbh ~ agec + Aspect
Value Std.Error DF t-value p-value
(Intercept) 18.337679 1.1513720 139 15.926807 0.0000
agec 2.679396 0.4435988 139 6.040134 0.0000
AspectNorth -1.362490 1.6508010 16 -0.825351 0.4213
AspectSouth -2.681490 1.5738823 16 -1.703742 0.1078
AspectWest -3.376375 1.6551813 16 -2.039883 0.0582
Correlation:
(Intr) agec AspctN AspctS
agec 0.310
AspectNorth -0.732 -0.329
AspectSouth -0.711 -0.160 0.514
AspectWest -0.741 -0.363 0.558 0.518
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.85312474 -0.71672171 -0.07700017 0.55987900 2.77709336
Number of Observations: 160
Number of Groups: 20
In this case, we see that the p-values associated with the coefficients for Aspect
are based on a t-distribution with only 16 degrees of freedom. The exact formulas used to calculate these default degrees of freedom (provided below) are not important (again, there are now better ways to conduct these tests) – what is important is that we have much less information about predictors that do not vary within a cluster than we do about predictors that do vary within a cluster. For level-1 predictors, lme
calculates the within-subjects degrees of freedom as the number of observations minus the number of clusters minus the number of level-1 predictors in the model. For level-2 predictors, lme
calculates degrees of freedom using the number of clusters minus the number of level-2 predictors the model - 1 for the intercept.
nrow(pines) - length(unique(pines$site))- 1
[1] 139
length(unique(pines$site))- 3 -1
[1] 16
Again, these formula are not that important. What is important is that lme
attempts to account for the data structure when carrying out statistical tests, recognizing that the site
is the correct unit of replication for tests involving Aspect
. The default degrees of freedom calculated by lme
are essentially correct for balanced data (where you have equal numbers of observations within each cluster), assuming the model assumptions hold. For unbalanced data, the tests (and degrees-of-freedom) are only approximate.
18.12.3 Alternative methods for testing hypotheses about fixed effects
Various other methods for approximating degrees of freedom have been suggested, and these methods are implemented in the lmerTest
and pbkrtest
packages (Kuznetsova, Brockhoff, and Christensen 2017; Halekoh and Højsgaard 2014). If the lmerTest
function has been loaded, you will automatically see tests using a t-distribution with Satterthwaite’s degrees of freedom, a method you may have seen when learning about a t-test for a difference in means with unequal variances (Giesbrecht and Burns 1985; Hrong-Tai Fai and Cornelius 1996).
library(lmerTest)
<- lmer(dbh ~ agec + (1 | site), data = pines)
lmer.ri summary(lmer.ri)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dbh ~ agec + (1 | site)
Data: pines
REML criterion at convergence: 918.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.9218 -0.7456 -0.0585 0.6020 2.8824
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.216 2.053
Residual 16.147 4.018
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 16.4743 0.5608 13.8574 29.375 7.02e-14 ***
agec 2.4305 0.4100 108.5317 5.928 3.70e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
agec 0.012
We see that the degrees of freedom are smaller (and p-values are larger) than those reported by default by lme
, particularly for the intercept. Another popular method for calculating degrees of freedom is the method proposed by Kenward and Rodger (Kenward and Roger 1997). This approximation can be used to compare two nested models using the KRmodcomp
function in the pbkrtest
package (Halekoh and Højsgaard 2014). Alternatively, tests using Kenward-Rodger degrees of freedom can be calculated by adding ddf = "Kenward-Roger"
when using the summary
function with a model fitted using lmer
:
summary(lmer.ri, ddf = "Kenward-Roger")
Linear mixed model fit by REML. t-tests use Kenward-Roger's method [
lmerModLmerTest]
Formula: dbh ~ agec + (1 | site)
Data: pines
REML criterion at convergence: 918.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.9218 -0.7456 -0.0585 0.6020 2.8824
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.216 2.053
Residual 16.147 4.018
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 16.4743 0.5611 18.3642 29.358 < 2e-16 ***
agec 2.4305 0.4208 118.5132 5.776 6.29e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
agec 0.012
These alternative degrees of freedom have also been implemented in conjunction with the anova
function (lmerTest
also switches the default from sequential, added-in-order tests to backwards elimination tests – see Section 3.10). For example, we now have non-integer degrees of freedom if we consider testing for an effect of Aspect
in a model that also contains age
:
<- lmer(dbh ~ agec + Aspect + (1 | site), data = pines)
lmer.ri2 anova(lmer.ri2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
agec 583.39 583.39 1 144.061 36.4832 1.249e-08 ***
Aspect 82.44 27.48 3 13.972 1.7186 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Further, the order of entry no longer matters:
<- lmer(dbh ~ Aspect + agec + (1 | site), data = pines)
lmer.ri2 anova(lmer.ri2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Aspect 82.44 27.48 3 13.972 1.7186 0.209
agec 583.39 583.39 1 144.061 36.4832 1.249e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Luke (2017) compared type I error rates for likelihood ratio tests and tests using Kenward-Roger or Satterthwaite approximations and found the latter two approaches performed well.
Lastly, to reinforce the need to consider random slopes for predictors that vary within a cluster, let’s compare the degrees of freedom and the p-values for the fixed effects parameters with and without the random slope:
<- lmer(dbh ~ agec + Aspect + (1 | site), data = pines)
lmer.ri2 anova(lmer.ri2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
agec 583.39 583.39 1 144.061 36.4832 1.249e-08 ***
Aspect 82.44 27.48 3 13.972 1.7186 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
<- lmer(dbh ~ agec + Aspect + (1 + agec | site), data = pines)
lmer.rc2 anova(lmer.rc2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
agec 488.34 488.34 1 15.462 31.5029 4.418e-05 ***
Aspect 74.26 24.75 3 13.777 1.5968 0.2355
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that including the random slope does little to change the degrees of freedom (or p-value) for the test associated with Aspect
, but it has a big effect on the test for age
, resulting in a much more conservative test with larger p-value!
18.12.4 Confidence intervals for fixed effects and variance parameters
We can use the confint
function along with models fit using lmer
to generate confidence intervals using the profile-likelihood method (the default; see Section 10.10), using bootstrapping (by specifying method = "boot"
), or using a large-sample approximation based on assuming the sampling distribution is Normal (method = "Wald"
); the latter approach is only available for fixed-effects coefficients since the sampling distribution of variance parameters tends to be right-skewed and thus, far from Normal. The profile-likelihood and bootstrap confidence intervals are to be preferred, but they can be computationally intensive to compute. By contrast, the Wald method is simple and fast; these intervals are calculated simply as \(\hat{\beta} \pm 1.96 SE(\hat{\beta})\) When calculating profile-likelihood and bootstrap confidence intervals, you will often see warnings related to computational issues (e.g., non-convergence associated with one or more bootstrapped data sets). In those cases, it is useful to also compute Wald intervals to see how they compare.
We compare these three approaches for the random intercepts and slopes model, below. We have suppressed warnings and messages, but if you run the code on your own, you will be able to see them.
<- confint(lmer.rc, method = "profile")
conf.pl <- confint(lmer.rc, method = "boot")
conf.boot <- confint(lmer.rc, method = "Wald")
conf.norm <- data.frame(rbind(conf.pl, conf.boot, conf.norm))
confs names(confs) <- c("LCL", "UCL")
$term <- as.factor(rep(c("sigma[b0]","sigma[b1]", "Cov(b0, b1)",
confs"sigma[epsilon]", "beta[0]", "beta[1]"), 3))
$method <- rep(c("Profile likelihood", "conf.boot", "conf.norm"),
confseach = nrow(conf.pl))
# Note the estimates are given in a slightly different order than the confidence intervals
$estimate <- rep(c(as.data.frame(VarCorr(lmer.rc))$sdcor[c(1,3, 2, 4)],
confsfixef(lmer.rc)), 3)
ggplot(confs, aes(method, estimate)) + geom_point() +
geom_errorbar(aes(ymin=LCL, ymax=UCL), width=0.2) +
facet_wrap(~term, scales="free")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Examining Figure 18.17, we see that for this particular data set, the three intervals are all fairly similar.
18.12.5 Simulation-based testing: Comparing models with different random-effect structures
As mentioned in the previous section, the sampling distributions for variance parameters tend to be right skewed. Therefore, tests based on a Normal or t-distribution are not likely to be appropriate. We could consider using a likelihood ratio test to compare nested models (e.g., models with and without a random effects), thereby testing whether one or more variance parameters are 0. For example, we could compare our random intercept model with a random intercept and slope model to test:
\[H_0: \sigma^2_{b1} = 0 \text{ versus } H_A: \sigma^2_{b1} \ne 0\]
However, this test will be conservative since, under the null hypothesis, \(\sigma^2_{b1}\) is “on the boundary” of the parameter space – i.e., \(H_0: \sigma^2_{b_1} = 0\), sets the parameter \(\sigma^2_{b_1}\) to the boundary of what is possible for this parameter (variance parameters cannot go negative). When this is the case, p-values and comparisons using AIC need adjustment (see e.g., Self and Liang 1987). José Pinheiro and Bates (2006) note that the p-value for a likelihood ratio test involving a single variance parameter will be approximately twice as large as it should be. They suggest that the sampling distribution associated with the likelihood-ratio \(\chi^2\) statistic comparing models that differ by 1 parameter (set to the boundary of its parameter space) can be approximated as a 50:50 mixture of a \(\chi^2_1\) and a \(\chi^2_2\) distribution. In this case, we could calculate an approximate p-value using: 0.5 + 0.5qchisq(LR, 1)
, where LR is the likelihood-ratio statistic calculated by comparing our two models.
Alternatively, we can calculate a p-value using a simulation-based test. Let’s consider comparing the random intercept and random intercept and slope models. Our null hypothesis would be that the simpler model containing only random intercepts is appropriate for the data. The alternative hypothesis is that the more complicated random intercepts and slope model fits the data better. Recall that with any null hypothesis test, we need a test statistic and a distribution of that test statistic when the null hypothesis is true. We can use the likelihood-ratio statistic as our test statistic, but rather than rely on an assumed \(\chi^2_1\) distribution, we can generate the distribution of the test statistic when the null hypothesis is true using simulations from the null model (Crainiceanu and Ruppert 2004). This will require that we:
- Simulate data under the null (i.e., smaller) model; a parametric bootstrap can be used here to simulate data from the fitted random-intercept model.
- Fit the two nested models (random intercept only model, random intercept and slope model) to the simulated data.
- Calculate the likelihood ratio statistic comparing the two models using the simulated data.
Steps 1-3 will be repeated several times to generate the null distribution of the likelihood ratio statistic. We can then calculate a p-value for the test by comparing the observed likelihood ratio statistic using the actual data to the above null distribution.
The RLRsim
(Scheipl, Greven, and Kuechenhoff 2008) and pbkrtest
(Halekoh and Højsgaard 2014) packages provide functions for conducting simulation-based tests. Below, we demonstrate the PBmodcomp
function in the pbkrtest
package for conducting the hypothesis test.
<- pbkrtest::PBmodcomp(lmer.rc, lmer.ri, nsim = 500) lrsimtest
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00227731 (tol = 0.002, component 1)
summary(lrsimtest)
Bootstrap test; time: 10.09 sec; samples: 500; extremes: 178;
Requested samples: 500 Used samples: 497 Extremes: 178
large : dbh ~ agec + (1 + agec | site)
dbh ~ agec + (1 | site)
stat df ddf p.value
LRT 1.1054 2.00000 0.5754
PBtest 1.1054 0.3594
Gamma 1.1054 0.3619
Bartlett 1.7840 2.00000 0.4098
F 0.5527 2.00000 10.36 0.5915
PBmodcomp
provides several tests using different test statistics, including the likelihood ratio test statistic (given by the LRT
line). The p-values for all of these tests are large, suggesting there is not much evidence to support the more complicated model that also includes random slopes. Nonetheless, it is often best to choose the structure of the random effects to match the experimental design or structure of your data rather than base this choice on a null hypothesis test (Arnqvist 2020). We will further discuss methods for choosing an appropriate model in Section 18.13.
18.12.6 Comparisons using AIC
It is also tempting to use AIC to compare models – e.g., the random intercept model and the model that also includes a random slope:
AIC(lmer.ri, lmer.rc)
df AIC
lmer.ri 4 926.7807
lmer.rc 6 929.3578
The use of AIC for mixed models is, however, not straightforward. Consider counting the number of parameters in the random-intercept model. We might choose to count the 2 fixed effects parameters, the variance of the random intercepts, and the variance of the within-site errors, so 4 parameters. Sometimes referred to as the marginal AIC, this could be appropriate if we are mainly interested in making inference to the population of sites. On the other hand, we might be interested in estimating site-specific response patterns (and a conditional AIC). In that case, our parameters might be closer to 3 + the number of sites -1. These issues, related to the “level of focus”, are discussed in Vaida and Blanchard (2005) and Greven and Kneib (2010). Conditional AICs can be calculated using the cAIC
function in the cAIC4
package (Säfken et al. 2021), which again suggests the random intercept model is preferred:
library(cAIC4)
cAIC(lmer.ri)
Conditional log-likelihood: -442.30
Degrees of freedom: 16.55
Conditional Akaike information criterion: 917.70
cAIC(lmer.rc)
Conditional log-likelihood: -438.36
Degrees of freedom: 21.18
Conditional Akaike information criterion: 919.08
Another complication is that, like likelihood ratio tests, the comparison between the random intercept and random intercept and slope model also involves setting a variance parameter to a value “on the boundary” of the parameter space (as discussed in Section 18.12.5). AIC comparisons can be conservative in this case, leading to simpler models (Greven and Kneib 2010). For more discussion of these topics, I highly recommend visiting Ben Bolker’s GLMM FAQ page.
18.13 Modeling strategies revisited
We saw in Chapter 8 how challenging it can be to select an appropriate model when allowed freedom to choose among several predictors. These challenges are even more pronounced when fitting mixed-effects models, and again, we will find that there are different viewpoints on how best to go about choosing an appropriate model.
18.13.1 Fixed versus random effects
As a first step, analysts must choose whether to model the effect of categorical variables (and their interactions with other continuous variables) as fixed or random effects. Guidance here is often based on:
how many levels (i.e., clusters or groups) are associated with the categorical variable. If there are few clusters (e.g., < 5 or 10), then it may be best to model the categorical variable using fixed effects since it can be really difficult to estimate variance parameters with so few replicates. On the other hand, it may be OK model the variable using random effects if the main reason to do so is to account for a feature of the data (e.g., an experimental design that results in hierarchical data or repeated measures). Although the variance component associated with the categorical variable may be imprecisely estimated, its inclusion as a random effect may still have the desired property of properly controlling the type I error rate when conducting hypothesis tests for other fixed effects. In fact, either approach (including a categorical variable as random or fixed) is likely going to control the type I error rate and is preferable to ignoring clustering in the data and treating the observations as though they were independent (see e.g., Oberpriller, Souza Leite, and Pichler 2021).
whether the focus is on the specific clusters (e.g., sites) that were observed or one wants to make inference to a larger population. In the latter case, mixed-effects models are preferable since they estimate coefficients describing mean effects and their variance across different clusters.
Assuming one chooses to model using random effects, then one is faced with questions about the structure of the random effects as well as which variables to include in the fixed effects part of the model.
18.13.2 Maximal model
As we mentioned previously, too often ecologists default to fitting random-intercept only models. Doing so often results in models that fail to control type I error rates in the face of non-independent observations (Schielzeth and Forstmeier 2009). At the other extreme, is the recommendation to fit a maximal model with as many random coefficients as possible (i.e., for all predictors that vary within a cluster) (Barr et al. 2013). Although this sounds appealing from a type I error perspective, in reality, it is often difficult to fit a model with multiple random coefficients. When faced with convergence problems, this often leads to a haphazard model-selection process in which users drop various random coefficients in an attempt to find a simpler model that converges.
One simple thing you can do to improve the chances that the optimizer used to estimate parameters converges is to center and scale your continuous variables. We did this with age
when fitting mixed models to the pines
data. We would have encountered issues with fitting the random slope model if we had not done this (see below):
lmer(dbh ~ age + (age | site), data = pines)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 1.16634 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Linear mixed model fit by REML ['lmerModLmerTest']
Formula: dbh ~ age + (age | site)
Data: pines
REML criterion at convergence: 925.3739
Random effects:
Groups Name Std.Dev. Corr
site (Intercept) 2.67944
age 0.01718 -0.73
Residual 3.96886
Number of obs: 160, groups: site, 20
Fixed Effects:
(Intercept) age
9.45435 0.04629
optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 2 lme4 warnings
18.13.3 Top-down model-selection
Largely following Diggle, Liang, and Zeger (1994), Zuur et al. (2009) suggests a “top-down” strategy for choosing an appropriate model. They note that tests for fixed effects will be wrong if the structure of the random effects is too simple. In addition, they note that random effects can soak up variation that could be due to the omission of important fixed effects. Thus, they suggest an iterative approach involving the following 5 steps:
Start with a “full model” that contains as many covariates of interest as possible.
Compare different random effects structures (e.g., models with random intercepts versus random slopes) using AIC or likelihood-ratio tests (see Section 18.12.5). During this step, they suggest using
method = "REML"
since it will give better estimates of variance parameters.Once an appropriate random-effects structure is chosen, they then compare models with different fixed effects, again using AIC and likelihood-ratio tests but with
method = "ML"
. During this stage, they keep the random effects structure constant based on the results from step 2.Once the fixed effects are chosen, they recommend refitting the final model but with
method = "REML"
. At this point, they also recommend evaluating the assumptions of the model (see Section 18.11)
18.13.4 A multi-level bottom-up approach to model selection
Others, including Jack Weiss and Singer et al. (2003) recommend a bottom up approach that begins by fitting the following models (Figure 18.18):
- Pooled model (assuming independence), include level-1 predictors (predictors that vary within clusters),
lm(y ~ x1)
- Unconditional means model or variance components model (no predictors, just random intercepts),
lmer(y ~ 1 +(1 | site))
- Random intercepts (with level 1 predictors ),
lmer(y ~ x1 + (1 | site))
- Random intercepts and slopes (with level 1 predictors),
lmer(y ~ x1 + (1 + x1 | site))
After picking the best of these models, they then suggest adding level-2 predictors (predictors that are constant within clusters).
Let’s apply this approach to our pines
data. Here, we use AIC, not withstanding the concerns raised earlier (see Section 18.12.6).
<- lm(dbh ~ agec, data = pines)
pooled.model <- lmer(dbh ~ 1 + (1 | site), data = pines)
uncond.means.model <- lmer(dbh ~ 1 + agec + (1 | site), data = pines)
randint.model <- lmer(dbh ~ 1 + agec + (agec | site), data = pines)
randcoef.model
# Marginal AIC
AIC(pooled.model, uncond.means.model, randint.model, randcoef.model)
df AIC
pooled.model 3 932.4945
uncond.means.model 3 953.2307
randint.model 4 926.7807
randcoef.model 6 929.3578
# Conditional AIC
cAIC(pooled.model); cAIC(uncond.means.model); cAIC(randint.model); cAIC(randcoef.model)
Conditional log-likelihood: -463.25
Degrees of freedom: 3.00
Conditional Akaike information criterion: 932.49
Conditional log-likelihood: -466.87
Degrees of freedom: 9.53
Conditional Akaike information criterion: 952.80
Conditional log-likelihood: -442.30
Degrees of freedom: 16.55
Conditional Akaike information criterion: 917.70
Conditional log-likelihood: -438.36
Degrees of freedom: 21.18
Conditional Akaike information criterion: 919.08
Both sets of AIC comparisons lead to choosing the random intercept model. We also failed to find evidence for the random intercept and slope model using the simulation-based likelihood ratio test in Section 18.12.5. Thus, we might consider adding level-2 covariates (e.g., Apect
) to the random intercept model.
<- lmer(dbh ~ 1 + agec + Aspect + (1 | site), data = pines) lmer.ri2
We can then compare this model to the model without Aspect
using either hypothesis tests or AIC.
anova(lmer.ri2)
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
agec 583.39 583.39 1 144.061 36.4832 1.249e-08 ***
Aspect 82.44 27.48 3 13.972 1.7186 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(lmer.ri, lmer.ri2)
df AIC
lmer.ri 4 926.7807
lmer.ri2 7 920.0064
cAIC(lmer.ri); cAIC(lmer.ri2)
Conditional log-likelihood: -442.30
Degrees of freedom: 16.55
Conditional Akaike information criterion: 917.70
Conditional log-likelihood: -441.08
Degrees of freedom: 17.37
Conditional Akaike information criterion: 916.90
The evidence for the importance of Aspect
is fairly week, despite resulting in a model with lower AIC.
18.14 Induced correlation, marginal model, generalized least squares
As we noted earlier, many ecologists add random intercepts mainly as a way to avoid issues related to “pseudeoreplication” or non-independence (even though random slopes may be needed to guard against inflated type I error rates, Schielzeth and Forstmeier 2009). In this section, we briefly highlight how adding random effects induces correlation among observations that share the same values of the random effect. We will also show an equivalence between the random-intercept model and a marginal model with compound symmetry that can be fit using the gls
function in the nlme
package.
Recall from Section 9.7, that the variance of two random variables, \(X\) and \(Y\) is given by:
\(Var(X + Y) = Var(X)+ Var(Y) + 2Cov(X, Y)\)
In addition, the covariance of \(X\) and \(Y\) is given by: \(Cov(X,Y) = E[XY] - E[X]E[Y]\). If we consider the random intercepts model, we have 2 random terms, \(b_{0i}\) and \(\epsilon_{ij}\):
\[\begin{equation} dbh_{ij} = \beta_0 + b_{0i}+ \beta_1age_{ij} + \epsilon_{ij} \end{equation}\]
Thus, it is easy to derive the following:
\(Var(dbh_{ij}) = Var(b_{0i}+\epsilon_{ij}) = Var(b_{0i}) +Var(\epsilon_{ij}) = \sigma_{b_0}^2 + \sigma_{\epsilon}^2\)
\(Cov(dbh_{ij}, dbh_{ij^\prime}) = \sigma^2_{b_0}\) (for any 2 observations collected at the same site since they will share \(b_{0i}\)); in addition, the correlation between any two observations from the same site = \(Cor(dbh_{ij}, dbh_{ij^\prime})\) = \(\frac{\sigma_{b_0}^2}{\sigma_{b_0}^2+\sigma^2_{\epsilon}}\).
\(Cov(dbh_{ij}, dbh_{i^\prime j}) = 0\) (for any 2 observations taken from 2 different sites since they will not share \(b_{0i}\) or \(\epsilon_{ij}\))
We can also derive the marginal distribution of our response variable by averaging over the distribution of the random effects (see Section 9.8 for a discussion of conditional versus marginal distributions).
Let’s start by writing down equations for the random intercept model in terms of the conditional distribution of \(Y | b\) and the distribution of \(b\) as in Section 18.8.
\[ dbh_{ij} | b_{0i} \sim N(\mu_i, \sigma^2_{\epsilon}) \\ \] \[ \mu_{ij} = (\beta_0 +b_{0i}) + \beta_1age_{ij} \] \[ b_{0i} \sim N(0, \sigma^2_{b_0}) \tag{18.9}\]
If we average over (i.e., integrate out) the random effects, \(b_{0i}\), we get the marginal distribution of \(Y\). This distribution can be derived analytically in the case of linear mixed-effects models, and allows us to write down a likelihood for the data that does not involve the random effects. The marginal likelihood is what R uses when it estimates \(\beta_0\), \(\beta_1\), and the variance parameters, \(\sigma^2_{b0}\) and \(\sigma^2_{\epsilon}\). For the random-intercept model, the marginal distribution for the vector of observations from site \(i\), \(Y_i\), is given by:
\[ Y_{i}\sim N(X_{i}\beta, V_i) \\ \] \[ V_i = \begin{bmatrix} \sigma^2_{\epsilon} + \sigma^2_{b_0} & \sigma^2_{b_0} & \cdots & \sigma^2_{b_0} \\ \sigma^2_{b_0} & \sigma^2_{\epsilon} + \sigma^2_{b_0} & \sigma^2_{b_0} & \sigma^2_{b_0} \\ \vdots & \ddots & \ddots & \vdots \\ \sigma^2_{b_0} & \cdots & \sigma^2_{b_0} & \sigma^2_{\epsilon} + \sigma^2_{b_0} \end{bmatrix} \tag{18.10}\]
Here, \(V_i\) gives the variance covariance matrix for the observations from site \(i\) and has dimension \(n_i \times n_i\). The diagonal elements hold the variances and the off-diagonal elements hold the covariances that we derived above. If we consider all of the data, not just the observations from site \(i\), then we have:
\[\begin{gather} Y \sim N(X\beta, \Sigma) \\ \Sigma = \begin{bmatrix} V_1 & 0 & 0 & 0 \\ 0 & V_2 & 0 & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & V_{40} \end{bmatrix} \end{gather}\]
Here, \(\Sigma\) is the variance/covariance matrix for all of the observations. This matrix is what is referred to as block diagonal. The 0s represent covariances for observations from different sites – these are 0, since we assume observations from different sites are independent. The \(V_i\) contain the variance/covariances for cluster \(i\) (Equation 18.10).
Thus, the random intercept model is equivalent to assuming that all observations from the same site (or cluster) are equally correlated, and that observations from different sites (or clusters) are independent. This particular form of correlation is referred to as compound symmetry. Rather than fitting a random intercepts model, we could have posited this model as one that would be reasonable for our data a priori. Specifically, we could consider the vector of response data and directly model their multivariate mean and variance covariance matrix. The gls
function in the nlme
package will let us do just that.
<-gls(dbh ~ agec, method="REML",
gls.fitcorrelation=corCompSymm(form= ~ 1 | site),
data=pines)
summary(gls.fit)
Generalized least squares fit by REML
Model: dbh ~ agec
Data: pines
AIC BIC logLik
926.7807 939.0311 -459.3904
Correlation Structure: Compound symmetry
Formula: ~1 | site
Parameter estimate(s):
Rho
0.2070437
Coefficients:
Value Std.Error t-value p-value
(Intercept) 16.474305 0.5608219 29.375289 0
agec 2.430468 0.4100210 5.927666 0
Correlation:
(Intr)
agec 0.012
Standardized residuals:
Min Q1 Med Q3 Max
-2.1264342 -0.7593302 -0.2326614 0.4739741 3.0916412
Residual standard error: 4.512474
Degrees of freedom: 160 total; 158 residual
fixef(lmer.ri)
(Intercept) agec
16.474305 2.430471
Note that we get the exact same fixed effects coefficients as when fitting the random intercept model. In addition, the estimate of the intraclass correlation that is returned by gls
= 0.207 is equivalent to \(\frac{\hat{\sigma}^2_{b_0}}{\hat{\sigma}^2_{\epsilon} + \hat{\sigma}^2_{b_0}}\) from the random intercept model, and the residual standard error from the gls
model is equal to \(\sqrt{\hat{\sigma}^2_{\epsilon} + \hat{\sigma}^2_{b_0}}\). We show these equivalences, below.
# Extract variance parameters from the random intercept model
<-as.data.frame(VarCorr(lmer.ri))
variancepars
# rho calculated from random intercept model
1,4]/(variancepars[1,4]+variancepars[2,4]) variancepars[
[1] 0.207045
# residual standard error in marginal model
sqrt(variancepars[1,4]+variancepars[2,4])
[1] 4.512476
As was mentioned in Section 9.11.1, the Normal distribution is unique in that it has separate parameters for the mean and for the variance (and covariance terms). There is no multivariate analogue of other commonly used statistical models (e.g., the Binomial or Poisson distribution) that allow one to directly model the mean and variance/covariance. We will follow up on this point in Chapter 19.
18.15 Random effects specified using multiple grouping variables
You may find that you have multiple categorical variables that could be considered as grouping variables for structuring random effects. For example, if you follow multiple individuals over several years, you might consider adding a random intercept for each individual to model correlation among observations from the same individual or a random intercept for each year to allow for correlation among observations in the same year. In this section, we will also see that it is possible to include random intercepts (and, potentially, random slopes) using more than one grouping variable (e.g., individual and year).
We will consider data from Dickie et al. (2022), who explored associations between wolf home-range size and measures of primary productivity and the extent of linear features (e.g., roads) within individuals’ home ranges. Data from this study are included in the Data4Ecologists
package in the HRData
data set and can be accessed using:
library(Data4Ecologists)
data(HRData)
In addition to having repeated observations on individuals over time (home ranges were estimated seasonally and annually), more than one wolf was observed from some wolf packs (Figure 18.19). Thus, we might consider including AnimalID
, PackID
, and Year
as grouping variables for random effects (i.e., random intercepts or random coefficients).
<- HRData %>% group_by(PackID, Year) %>%
HRsummary count()
ggplot(HRsummary, aes(Year, PackID, size = n)) + geom_point() +
ylab("Pack ID")
HRData
data set from Dickie et al. (2022).
Dickie et al. (2022) considered models with the following fixed effects:
Season
= snow (Nov-April) or snow-free (May-October)StudyArea
(SA
in their original data set) = BC for northeastern British Columbia, WHEC for northern Alberta, RICC for northeastern Alberta, and SK for SaskatchewanDiffDTScaled
= the number of days of monitoring data associated with the home-range estimate, scaled between 0 and 1EVIScaled
= a scaled measure of primary productivity within the estimated home ranges (assumed to influence the abundance of wolf prey)LFD
= density of linear features (LFD
) within the estimated home ranges.
They also considered an interaction between EVIScaled
and LFD
to determine if the effect of linear features depended on resource density. Dickie et al. (2022) compared three different models, each with a different random-effects specification. Although there paper is quite interesting, I find their choice of models a bit strange (see Section 18.15.1). Instead, let’s begin by considering the following model:
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model1 1 | AnimalId) + (1 | Year), REML=TRUE, data = HRData)
(summary(model1)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD * EVIScaled +
(1 | AnimalId) + (1 | Year)
Data: HRData
REML criterion at convergence: 718.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.1974 -0.5314 -0.0044 0.5087 2.6537
Random effects:
Groups Name Variance Std.Dev.
AnimalId (Intercept) 0.20867 0.4568
Year (Intercept) 0.03643 0.1909
Residual 0.28067 0.5298
Number of obs: 359, groups: AnimalId, 142; Year, 7
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 9.50774 0.41954 232.87652 22.662 < 2e-16 ***
SeasonSnowFree -0.48662 0.07464 242.09915 -6.519 4.07e-10 ***
StudyAreaRICC -0.94537 0.15550 127.37453 -6.079 1.30e-08 ***
StudyAreaSK -1.27522 0.27456 165.51587 -4.645 6.91e-06 ***
StudyAreaWHEC -1.30407 0.18430 151.99775 -7.076 5.10e-11 ***
DiffDTScaled 0.40872 0.10629 293.98415 3.845 0.000148 ***
LFD -4.87658 1.80615 328.18141 -2.700 0.007294 **
EVIScaled -2.47882 0.57135 269.65610 -4.339 2.03e-05 ***
LFD:EVIScaled 5.09616 2.83954 334.21396 1.795 0.073603 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SsnSnF SARICC StdASK SAWHEC DffDTS LFD EVIScl
SeasonSnwFr -0.015
StudyArRICC -0.352 0.004
StudyAreaSK -0.842 0.036 0.542
StudyArWHEC -0.388 -0.050 0.647 0.542
DiffDTScald -0.068 -0.507 -0.042 -0.006 -0.051
LFD -0.698 0.008 0.102 0.525 0.098 -0.017
EVIScaled -0.849 -0.005 -0.005 0.585 0.036 -0.002 0.713
LFD:EVIScld 0.601 0.002 0.025 -0.370 0.040 0.005 -0.963 -0.729
Writing down a set of equations for the model is challenging due to the incorporation of multiple categorical variables, including our grouping variables for the random effects (Year
and AnimalId
) and other categorical predictors modeled using fixed effects (Season
, StudyArea
). We will use indicator variables as in Chapter 3 to represent fixed effects predictors associated with the categorical variables Season
and StudyArea
. Let \(Y_{ij}\) be the home-range size for \(j^{th}\) observation associated with individual \(i\). Further, let \(b_{i}\) and \(\gamma_{k(ij)}\) represent random intercepts associated with individual \(i\) and year \(k\) (where the \(k(ij)\) subscript indicates the \(j^{th}\) observation of individual \(i\) is from year \(k\)). One option for describing the model is as follows:
\[\begin{gather} Y_{ij} | b_{i}, \gamma_{k(ij)} \sim N(\mu_{ij}, \sigma^2_{\epsilon}) \\ \mu_{ij} = \beta_0 + b_{i} + \gamma_{k(ij)} + \beta_1I(Season = SnowFree)_{ij} + \beta_2I(StudyArea = RICC)_{ij} + \nonumber \\ \beta_3I(StudyArea = SK)_{ij} + \beta_4I(StudyArea = WHEC)_{ij} + \beta_5DiffDTScaled_{ijkl} + \nonumber \\ \beta_6 LFD_{ijkl} + \beta_7EVIScaled_{ij} + \beta_8LFD_{ij}\cdot EVIScaled_{ij} \nonumber \\ b_{i} \sim N(0, \sigma^2_{b}) \nonumber \\ \gamma_{k(ij)} \sim N(0, \sigma^2_{\gamma}), \nonumber \end{gather}\]
If we look at the Random effects
section near the top of the output returned by the summary
function, we see that we have 3 variance components, one for AnimalId
, one for Year
, and one for Residual
. The first two variance components correspond to \(\sigma^2_b\) an \(\sigma^2_{\gamma}\) and quantify variation among animals and years that is not attributable to the other (fixed effect) variables included in the model. The last variance component is our standard residual variance, \(\sigma^2_{\epsilon}\).
One way to understand this model is to consider the induced correlation implied by the random effects. As discussed in Section 18.14, observations that share one or more random effects will be correlated whereas observations that do not share any random effects will be independent of one another. Thus, we can infer the following:
- Observations on the same animal will be correlated due to sharing a random intercept, \(b_i\), associated with the
(1 | AnimalId)
term. - Observations from the same year will be correlated due to sharing a random intercept, \(\gamma_k\) associated with the
(1 | Year)
. - Observations on the same animal and in the same year will share random intercepts from both terms. Thus, these observations will be more correlated than observations on the same animal but in different years or observations in the same year but on different animals.
- Observations will only be independent if they are on different animals and taken in different years.
Alternatively, we could consider a model with random intercepts for PackID
and Year
.
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model2 1 | PackID) + (1 | Year), REML=TRUE, data = HRData) (
Think-Pair-Share: What are the induced correlations that arise from this model specification?
In principle, it would also be possible to consider a model that includes random intercepts for AnimalId
, PackID
, and Year
, but lmer
provides a warning that one or more parameters are on the boundary, causing a singular fit.
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model3 1 | AnimalId) + (1 | PackID ) + (1 | Year), REML=TRUE, data = HRData) (
boundary (singular) fit: see help('isSingular')
If we inspect the output of model 3
, we see that it returns an estimate of the variance attributable to AnimalId
that is essentially 0.
summary(model3)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD * EVIScaled +
(1 | AnimalId) + (1 | PackID) + (1 | Year)
Data: HRData
REML criterion at convergence: 672.5
Scaled residuals:
Min 1Q Median 3Q Max
-3.12454 -0.49717 -0.01873 0.51040 2.81691
Random effects:
Groups Name Variance Std.Dev.
AnimalId (Intercept) 0.00000 0.0000
PackID (Intercept) 0.19393 0.4404
Year (Intercept) 0.03662 0.1914
Residual 0.28908 0.5377
Number of obs: 359, groups: AnimalId, 142; PackID, 62; Year, 7
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 9.42538 0.45339 185.58848 20.789 < 2e-16 ***
SeasonSnowFree -0.47648 0.07333 301.02719 -6.498 3.37e-10 ***
StudyAreaRICC -1.02732 0.20353 62.31087 -5.048 4.15e-06 ***
StudyAreaSK -1.27402 0.31023 119.41966 -4.107 7.38e-05 ***
StudyAreaWHEC -1.32111 0.23965 68.60989 -5.513 5.79e-07 ***
DiffDTScaled 0.41984 0.09964 317.57043 4.213 3.28e-05 ***
LFD -4.43188 1.86342 302.34969 -2.378 0.018011 *
EVIScaled -2.27967 0.60234 229.18118 -3.785 0.000197 ***
LFD:EVIScaled 4.13155 2.90450 315.60163 1.422 0.155879
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) SsnSnF SARICC StdASK SAWHEC DffDTS LFD EVIScl
SeasonSnwFr -0.022
StudyArRICC -0.355 0.011
StudyAreaSK -0.845 0.050 0.521
StudyArWHEC -0.391 -0.023 0.576 0.527
DiffDTScald -0.046 -0.485 -0.059 -0.042 -0.070
LFD -0.679 -0.005 0.111 0.515 0.122 -0.015
EVIScaled -0.832 -0.018 -0.007 0.556 0.039 0.009 0.675
LFD:EVIScld 0.574 0.027 -0.005 -0.352 0.002 -0.013 -0.956 -0.695
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
This likely occurs because there were very few animals observed from each pack, so the AnimalId
and PackID
variables are largely accounting for similar dependencies in the data.
18.15.1 Aside: Digging deeper into models fit by Dickie et al. (2022)
Rather than use AnimalId
, WolfID
, or Year
directly as grouping variables to structure random effects, Dickie et al. (2022) created new variables, PackYear
and AnimalYear
, by pasting together PackID
and Year
and AnimalId
and Year
, respectively.
<- HRData %>% mutate(AnimalYear = paste(HRData$AnimalId, HRData$Year, sep=""),
HRData PackYear = paste(HRData$PackID, HRData$Year, sep=""))
They then compared the following three models using AIC to select an appropriate random-effects structure for capturing the correlation in the data (see this github repository here and a zenendo repository here)12:
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model4 1 | AnimalYear), REML=TRUE, data = HRData)
(<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model5 1 | PackYear), REML=TRUE, data = HRData)
(<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model6 1 | AnimalYear/PackID), REML=TRUE, data = HRData) (
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues
Warning: Model failed to converge with 1 negative eigenvalue: -4.5e-04
The first model includes random intercepts for each unique combination of AnimalId
and Year
and the second model includes random intercepts for each unique combination of PackID
and Year
. The last model is equivalent to a specification that includes two sets of random intercepts, one for each unique animal and year combination (i.e., AnimalYear
) and one for each unique combination of AnimalYear
and PackID
. We demonstrate this equivalence below by fitting an alternative specification of this model and then showing that the AIC, variance components, and all fixed effects parameters are equal.
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled + (1|AnimalYear)
model6b + (1 | AnimalYear:PackID), REML=TRUE, data = HRData)
AIC(model6, model6b)
df AIC
model6 12 799.056
model6b 12 799.056
all.equal(fixef(model6), fixef(model6b))
[1] "Mean relative difference: 4.029839e-07"
print(VarCorr(model6), comp = "Variance")
Groups Name Variance
PackID:AnimalYear (Intercept) 0.072782
AnimalYear (Intercept) 0.152422
Residual 0.294432
print(VarCorr(model6b), comp = "Variance")
Groups Name Variance
AnimalYear (Intercept) 0.072767
AnimalYear:PackID (Intercept) 0.152439
Residual 0.294430
Again, it is helpful to consider the implied correlation structure in these additional models (model4
, model5
, and model6b
).
In model 4, any two observations from the same animal and year will be correlated, whereas observations from different animals or from the same individual but in different years will be assumed to be independent.
In model 5, any two observations from the same wolf pack and year will be correlated, whereas observations from different wolf packs or from the same pack but in different years will be assumed to be independent.
In model 6, any two observations from the same animal and same year will be correlated due to sharing a random intercept from the
(1 | AnimalYear)
term. In addition, observations from the same animal, year, and pack will share a random intercept from the(1 | AnimalYear:PackID)
term. Observations from the same individual, year, and pack will have the highest degree of correlation because these observations will share two random intercepts, followed by observations from the same individual and year (but different pack). Lastly, observations from different individuals or from the same individual but in different years will be assumed to be independent.
After considering this last model and the implied correlation structure, you may wonder whether individuals actually switch packs? If not, it seems likely that the second random intercept term in model6
, i.e., (1 | AnimalYear:PackID)
, would be unnecessary. Interestingly, it turns out that there are no observations where an individual wolf was associated with two different packs in the same year.
<- HRData %>% group_by(AnimalYear) %>% dplyr::summarize(npacks = length(unique(PackID)))
Npacks table(Npacks$npacks)
1
289
Thus, the two random intercept terms in the third model turn out to be redundant. We can show this by demonstrating the equivalence between models with (1 | AnimalYear:PackID)
and (1 | AnimalYear)
.
<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model4 1 | AnimalYear), REML=TRUE, data = HRData)
(<- lmer(log(HRsize) ~ Season + StudyArea + DiffDTScaled + LFD*EVIScaled +
model4b 1 | AnimalYear:PackID), REML=TRUE, data = HRData)
(all.equal(fixef(model4), fixef(model4b))
[1] TRUE
AIC(model4, model4b)
df AIC
model4 11 797.056
model4b 11 797.056
print(VarCorr(model4), comp = "Variance")
Groups Name Variance
AnimalYear (Intercept) 0.22521
Residual 0.29443
print(VarCorr(model4b), comp = "Variance")
Groups Name Variance
AnimalYear:PackID (Intercept) 0.22521
Residual 0.29443
Given this equivalence, the model convergence warning that is given when fitting model6b
makes sense, yet it seems really odd that lmer
ends up converging when specifying the model using (1 | AnimalYear/PackID)
as in model6
. It is also not clear how lmer
determines how much of the variance to associate with AnimalYear
and how much to associate with AnimalYear:PackID
since these terms provide equivalent grouping structures. We do see that the estimated variance components, 0.07276 and 0.15245, add up to the variance component in our model that includes only one of the two terms.
print(VarCorr(model4), comp = "Variance")
Groups Name Variance
AnimalYear (Intercept) 0.22521
Residual 0.29443
print(VarCorr(model6), comp = "Variance")
Groups Name Variance
PackID:AnimalYear (Intercept) 0.072782
AnimalYear (Intercept) 0.152422
Residual 0.294432
18.16 Implementing mixed-effects models in JAGS
18.16.1 Random intercept model
Let’s begin by loading packages for communicating with JAGS and summarizing JAGS output:
library(R2jags)
library(MCMCvis)
library(mcmcplots)
In this section, we will go back to considering the pines
data set. We begin by implementing the random intercept model, which is fairly straightforward in JAGS. We have one additional variance parameter, \(\sigma_{b0}\), for which we need to specify a prior; we use a uniform distribution similar to what we have used when specifying a prior for \(\sigma_{\epsilon}\). In the JAGS code, below, we write the model in terms of an overall mean intercept and deviations from this mean for each site, \(b_{0i}\).
<-function(){
jags.lme
# Priors for the site-specific deviations from the mean intercept
for (i in 1:n.groups){
~ dnorm(0, tau.b0) # Random intercepts
b0[i]
}
# Priors for parameters describing the intercepts and
# their variability
~ dnorm(0, 0.001) # Mean intercept
beta0 ~ dunif(0, 100) # SD of the random intercepts
sigma.b0 <- 1 / (sigma.b0 * sigma.b0) # precision of intercepts
tau.b0
# Priors for fixed effect regression parameters
~ dnorm(0, 0.001) # Common slope agec
beta1 <- 1 / ( sigma.eps * sigma.eps) # Residual precision
tau.eps ~ dunif(0, 100) # Residual standard deviation
sigma.eps
# Calculate site-specific intercepts as derived parameters
for (i in 1:n.groups){
<- beta0 + b0[i]
beta0i[i]
}
# Likelihood
for (i in 1:nobs) {
~ dnorm(mu[i], tau.eps) # The random variable
dbh[i] <- beta0 + b0[site[i]] + beta1*agec[i] # Expectation
mu[i]
} }
Alternatively, we could have written down the model in terms of the site-specific intercepts, \(\beta_{0i} = \beta_0 + b_{oi}\), noting that \(\beta_{0i} \sim N(\beta_0, \sigma^2_{b0})\) as shown below. With this specification, we do not have to estimate the \(b_{0i}\), but we could add a line of code to calculate them as derived parameters (i.e., \(b_{0i} = \beta_{0i} - \beta_0\)). We illustrate this alternative way to fit the model, below. Although these are equivalent ways to specify the same model, the way in which a model is specified can impact how well the MCMC algorithms perform, and in this case, the earlier specification is expected to lead to better mixing (i.e., the chains do a better job of efficiently exploring the posterior distribution) - see this blog post by Richard McElreath.
<-function(){
jags.lme.alt
# Priors for the intercepts
for (i in 1:n.groups){
~ dnorm(beta0, tau.b0) # Random intercepts
alpha[i]
}
# Priors for parameters describing the intercepts and
# their variability
~ dnorm(0, 0.001) # Mean intercept
beta0 ~ dunif(0, 100) # SD of the random intercepts
sigma.b0 <- 1 / (sigma.b0 * sigma.b0)
tau.b0
# Priors for fixed effect regression parameters
~ dnorm(0, 0.001) # Common slope agec
beta1 <- 1 / ( sigma.eps * sigma.eps) # Residual precision
tau.eps ~ dunif(0, 100) # Residual standard deviation
sigma.eps
# Likelihood
for (i in 1:nobs) {
~ dnorm(mu[i], tau.eps) # The random variable
dbh[i] <- alpha[site[i]] + beta1*agec[i] # Expectation
mu[i]
} }
We then gather our data for model fitting, creating the site indicator variables, MCMC settings, and call JAGS.
# Bundle data
<- list(dbh = pines$dbh, agec = as.numeric(pines$agec),
jags.data site = as.numeric(as.factor(pines$site)),
n.groups=length(unique(pines$site)),
nobs = nrow(pines))
# Parameters to estimate
<- c("beta0i", "beta0", "beta1", "sigma.b0", "sigma.eps")
parameters
# Start Gibbs sampling
<- jags.parallel(data=jags.data,
out.ri parameters.to.save=parameters,
model=jags.lme,
n.thin=1, n.chains=3, n.burnin=1000, n.iter=10000)
Next, we inspect the output and compare it to the results we obtained from lmer
MCMCsummary(out.ri, params=c("beta0", "beta1", "sigma.b0", "sigma.eps"), round=3)
mean sd 2.5% 50% 97.5% Rhat n.eff
beta0 16.468 0.604 15.275 16.465 17.687 1.00 27904
beta1 2.438 0.483 1.503 2.431 3.377 1.00 7369
sigma.b0 2.173 0.680 0.946 2.132 3.652 1.01 2580
sigma.eps 4.067 0.254 3.604 4.054 4.599 1.00 8944
summary(lmer.ri)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dbh ~ agec + (1 | site)
Data: pines
REML criterion at convergence: 918.8
Scaled residuals:
Min 1Q Median 3Q Max
-1.9218 -0.7456 -0.0585 0.6020 2.8824
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 4.216 2.053
Residual 16.147 4.018
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 16.4743 0.5608 13.8574 29.375 7.02e-14 ***
agec 2.4305 0.4100 108.5317 5.928 3.70e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
agec 0.012
We see that the two sets of results are quite similar. An advantage of the Bayesian implementation, however, is that we can easily estimate uncertainty associated with both site-specific parameters \((\beta_{0i} = \beta_0 + b_{0i}\)) as well as the population-level parameters (\(\beta_0\)).
MCMCsummary(out.ri, params="beta0i", round=3)
mean sd 2.5% 50% 97.5% Rhat n.eff
beta0i[1] 17.987 1.243 15.637 17.965 20.504 1 11592
beta0i[2] 15.344 1.152 13.075 15.362 17.551 1 21458
beta0i[3] 18.806 1.409 16.153 18.772 21.618 1 7549
beta0i[4] 19.179 1.524 16.326 19.144 22.248 1 5820
beta0i[5] 16.661 1.239 14.268 16.636 19.189 1 22841
beta0i[6] 18.176 1.554 15.293 18.116 21.350 1 13205
beta0i[7] 15.960 1.159 13.619 15.979 18.205 1 19942
beta0i[8] 18.265 1.222 15.931 18.243 20.711 1 9480
beta0i[9] 15.652 1.238 13.165 15.676 18.034 1 25608
beta0i[10] 16.194 1.162 13.877 16.205 18.446 1 23245
beta0i[11] 18.496 1.456 15.798 18.452 21.447 1 9248
beta0i[12] 14.034 1.323 11.415 14.057 16.522 1 7445
beta0i[13] 14.738 1.297 12.141 14.774 17.172 1 12513
beta0i[14] 16.151 1.238 13.741 16.157 18.580 1 27000
beta0i[15] 16.146 1.143 13.910 16.145 18.418 1 25592
beta0i[16] 17.988 1.297 15.555 17.960 20.596 1 12135
beta0i[17] 14.421 1.296 11.809 14.445 16.866 1 9533
beta0i[18] 14.606 1.207 12.166 14.626 16.878 1 10863
beta0i[19] 16.393 1.144 14.135 16.394 18.651 1 27536
beta0i[20] 14.234 1.352 11.515 14.264 16.769 1 7685
We can also easily inspect and quantify uncertainty in our variance parameters using their estimated posterior distributions (Figure 18.20).
denplot(out.ri, parms=c("sigma.b0", "sigma.eps"))
18.16.2 Random intercept and slope model
When specifying mixed models that include multiple random effects (e.g., site-specific intercepts and slopes), we have to consider how these parameters vary and co-vary. In the simple random intercept and slope model from Section 18.7.2, we had 4 variance parameters: \(\sigma_{b_0}, \sigma_{b_1}, Cov(b_0, b_1)\) (or \(cor(b_0,b_1))\), and \(\sigma_{\epsilon}\). We will need priors for each of these parameters. One option is to specify a uniform distribution for \(cor(b_0, b_1)\) and priors for each of \(\sigma_{b0}, \sigma_{b1}\) and \(\sigma_{\epsilon}\). Together, these priors specify a prior distribution for the \(2 \times 2\) variance/covariance matrix of the random effects, \(\Sigma_b\):
\(\Sigma_b = \begin{bmatrix} \sigma^2_{b_0} & Cov(b_0, b_1)\\ Cov(b_0, b_1) & \sigma^2_{b1} \end{bmatrix}\)
We can then specify the distribution of the random effects using dmnorm
in JAGS, which refers to the multivariate normal distribution. We specify the mean of this distribution using:
B
= \(\begin{bmatrix}
\beta_{0i} = \beta_0 + b_{0i}\\
\beta_{1i} = \beta_1 + b_{1i}
\end{bmatrix}\)
We also need to specify the precision matrix:
Tau.B
= \(\Sigma_b^{-1}\).
We demonstrate how to implement this model using JAGS, below:
<-function(){
jags.lme.rc
# Site-specific parameters
for (i in 1:n.groups){
# To allow for correlation between alpha[i] and beta1[i], we need to model
# their joint (multivariate) distribution
1:2]~ dmnorm(B.hat[i,], Tau.B[,]) # distribution of the vector (beta0[i], beta1[i])
B[i,1]<-beta0 # mean for intercepts
B.hat[i,2]<-beta1 # mean for slopes
B.hat[i,<- B[i,1] # Random intercepts
beta0i[i] <- B[i,2] # Random slopes for age
beta1i[i]
}
# Hyperpriors for mean intercept and slope
~ dnorm(0, 0.001)
beta0 ~ dnorm(0, 0.001)
beta1
# Prior for parameters in Sigma = var/cov matrix
# of the slope/intercept parameters
~ dunif(0,100) # sd intercepts
sigma.b0 ~ dunif(0,100) # sd of slopes
sigma.b1 ~ dunif(-1,1) # correlation among intercepts and slopes
cor.b0.b1 1,1]<-pow(sigma.b0,2)
Sigma.B[2,2]<-pow(sigma.b1,2)
Sigma.B[1,2]<-cor.b0.b1*sigma.b0*sigma.b1
Sigma.B[2,1]<-Sigma.B[1,2]
Sigma.B[
# Tau = inverse of Sigma (analogous to precision for univariate normal distribution)
1:2,1:2]<-inverse(Sigma.B[,])
Tau.B[
# Prior for within-site errors
<- 1 / ( sigma.eps * sigma.eps) # Residual precision
tau.eps ~ dunif(0, 100) # Residual standard deviation
sigma.eps
# Likelihood
for (i in 1:nobs) {
~ dnorm(mu[i], tau.eps)
dbh[i] <- beta0i[site[i]] + beta1i[site[i]]*agec[i]
mu[i]
}
}
# Parameters to estimate
<- c("beta0", "beta1", "betaoi", "beta1i", "sigma.b0",
parameters "sigma.b1", "sigma.eps", "cor.b0.b1")
# Start Gibbs sampling
<- jags.parallel(data = jags.data,
out.rc parameters.to.save=parameters,
model=jags.lme.rc,
n.thin=1, n.chains=3, n.burnin=100, n.iter=10000)
Below, we compare the output from JAGS and lmer
for the random intercept and slope model:
MCMCsummary(out.rc, params=c("beta0", "beta1", "sigma.b0",
"sigma.b1", "cor.b0.b1", "sigma.eps"), round=3)
mean sd 2.5% 50% 97.5% Rhat n.eff
beta0 16.455 0.610 15.287 16.441 17.712 1.00 6696
beta1 2.385 0.599 1.224 2.369 3.571 1.00 1131
sigma.b0 1.990 0.771 0.455 1.977 3.606 1.00 1100
sigma.b1 1.123 0.697 0.074 1.063 2.673 1.01 624
cor.b0.b1 0.151 0.477 -0.847 0.196 0.913 1.00 836
sigma.eps 4.032 0.260 3.562 4.019 4.580 1.00 5515
summary(lmer.rc)
Linear mixed model fit by REML ['lmerMod']
Formula: dbh ~ agec + (1 + agec | site)
Data: pines
REML criterion at convergence: 917.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.90307 -0.72242 -0.08194 0.58100 2.93686
Random effects:
Groups Name Variance Std.Dev. Corr
site (Intercept) 3.740 1.934
agec 1.122 1.059 0.25
Residual 15.631 3.954
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 16.5258 0.5601 29.506
agec 2.4502 0.4934 4.966
Correlation of Fixed Effects:
(Intr)
agec 0.157
We see that posterior means for the variance parameters tend to be larger than the estimates from lmer
. The posterior medians are closer to the frequentist estimates. The estimated correlation between \(\beta_{0i}\) and \(\beta_{1i}\) also differs considerably between the two methods, but the 95% credible interval for this parameter is quite wide (-0.847, 0.913). Therefore, the data are not very informative with respect to this parameter.
This brings up an important point – if we include additional random effects (e.g., random intercepts associated with additional grouping variables or random slopes associated with additional level-1 covariates), then our covariance matrix describing the random effects will quickly grow in size, requiring the estimation of additional variance and covariance parameters. If, for example, we had a second level-1 covariate and wanted to allow each site to have its own coefficient, we would need to estimate 3 variance parameters (1 for the intercept and 2 for the level-1 coefficients) and 3 covariance parameters describing how the intercept co-varies with the level-1 coefficients and how the 2 level-1 coefficients co-vary. Estimation of these variance parameters can be challenging, unless we have a large number of groups (in this case, sites). Bayesian applications also get more complicated as they require specification of a valid prior for the variance-covariance matrix (see e.g., Chung et al. 2015).
One way to simplify the model is to assume the random intercepts vary independently from the random slopes. Using lmer
, we can fit this model using the following notation:
<- lmer(dbh ~ agec + (1 | site) + (0 + agec | site), data=pines)
lmer.rc.ind summary(lmer.rc.ind)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: dbh ~ agec + (1 | site) + (0 + agec | site)
Data: pines
REML criterion at convergence: 917.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.92349 -0.73821 -0.08194 0.57261 2.87679
Random effects:
Groups Name Variance Std.Dev.
site (Intercept) 3.539 1.881
site.1 agec 1.129 1.063
Residual 15.678 3.960
Number of obs: 160, groups: site, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 16.4512 0.5506 14.0053 29.878 4.38e-14 ***
agec 2.4094 0.4898 15.2072 4.919 0.000178 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
agec 0.037
Here, the (0 + agec | site)
tells R that we want each site to have its own coefficient for agec
, but we want the coefficients to vary independently of the intercept. Fitting this model in JAGS is rather simple and only requires priors for the three variance parameters, \(\sigma_{b_0}\), \(\sigma_{b_1}\), and \(\sigma_{\epsilon}\).
18.17 References
We will later refer to these as random effects, or more specifically, random intercepts.↩︎
For some interesting viewpoints on this larger topic, see this dynamic ecology blog and counterpoints here and here.↩︎
Because
Log_Water_Se
is constant within each lake, we won’t be able to estimate both the coefficient forLog_Water_Se
and separate coefficients for all lakes. If you try to fit this model, you will find that one of the estimated lake-level coefficients isNA
↩︎In Section 18.13, we will discuss other approaches that are often recommended for choosing an appropriate mixed model.↩︎
It is quite common for users to just say that they fit a “random effect” for variable \(x\); almost always, this means they included a random intercept associated with a categorical variable, \(x\)↩︎
In general, \(Cov(X,Y) = E[XY]- E[X]E[Y]\).↩︎
Again, a random trial consists of collecting a data set of the same size as this one, generated in the same way, assuming all assumptions of the model are met, and then analyzed the resulting data using the same mixed-effects model↩︎
Note, the
re.form
argument can also be used to include some, but not all, random effects when models are specified to have multiple random effects (see e.g., Section 18.15.↩︎Note,
predict
is a generic function for generating predictions from different models; behind the scenes, when we usepredict
with a model fit usinglmer
orglmer
, it actually calls thepredict.merMod
function↩︎Another option that might be worth exploring is the
predictInterval
function in themerTools
package (Knowles and Frederick 2023).↩︎Note, the
apply
function applies a function to the margins of an array or matrix. Here, we apply thequantile
function to the columns ofboot.pred$t
by specifying2
for the second argument (using1
instead would have applied thequantile
function to the rows ofboot.pred$t
.↩︎Note that their
est
variable has been renamed asHRest
and theirSA
variable has been renamedStudyArea
in theData4Ecologists
package.↩︎