Run and Interpret a Mixed Model in R

A mixed model (also called a mixed-effects model) is used when the observations (i.e. rows) in the data are not independent. This can occur, for example, when measurements are taken from the same individuals multiple times or when observations can be grouped into clusters, such as families, co-workers, etc.

In this tutorial, we will run and interpret a mixed model, using the “mammals sleep” dataset (msleep) available from the ggplot2 package (included in tidyverse), to explore the relationship between brain weight in kilograms (brainwt) and the amount of time spent awake (awake), taking into consideration the animal’s classification (order).

1. Loading the data

First, we select the variables of interest and remove observations with missing values:

library(tidyverse)

msleep <- 
  msleep |> 
  select(brainwt, order, awake) |> 
  drop_na()

msleep
## A tibble: 56 × 3
#   brainwt order        awake
#     <dbl> <chr>        <dbl>
# 1 0.0155  Primates       7  
# 2 0.00029 Soricomorpha   9.1
# 3 0.423   Artiodactyla  20  
# 4 0.07    Carnivora     13.9
# 5 0.0982  Artiodactyla  21  
# 6 0.115   Artiodactyla  18.7
# 7 0.0055  Rodentia      14.6
# 8 0.0064  Rodentia      11.5
# 9 0.001   Soricomorpha  13.7
#10 0.0066  Rodentia      15.7
## ℹ 46 more rows
## ℹ Use `print(n = ...)` to see more rows

2. Why an ordinary linear regression will not work in this case

In theory, studying the effect of brain weight on the amount of time spent awake, while taking into consideration the order of the animal can be done with the following linear regression model:

awake = β0 + β1 brainwt + β2 order + ε

In this model, the coefficient β1 will reflect the relationship between brain weight and time spent awake, while controlling for the animal’s order. Except that in this case, the variable order has 17 categories with most categories containing just a few observations:

msleep |> count(order)
## A tibble: 17 × 2
#   order               n
#   <chr>           <int>
# 1 Afrosoricida        1
# 2 Artiodactyla        5
# 3 Carnivora           7
# 4 Chiroptera          2
# 5 Cingulata           2
# 6 Didelphimorphia     1
# 7 Diprotodontia       1
# 8 Erinaceomorpha      2
# 9 Hyracoidea          3
#10 Lagomorpha          1
#11 Monotremata         1
#12 Perissodactyla      3
#13 Primates            9
#14 Proboscidea         2
#15 Rodentia           10
#16 Scandentia          1
#17 Soricomorpha        5

A simple solution would be to remove order from the model. But then we would miss out on correlations inside each order that may not be detectable in the overall data. In this case, a mixed model can help us take into account the animal’s order.

3. Considerations before running a mixed model

Before we run the mixed model, let’s visualize the relationship between brain weight and hours spent awake:

msleep |> 
  ggplot(aes(x = brainwt, y = awake)) + 
  geom_point()

Output:

scatterplot showing the relationship between brain weight and hours spent awake

Most animals have a small brain weight, so let’s take the log of brain weight and visualize this relationship for different animal orders:

msleep |> 
  ggplot(aes(x = brainwt, y = awake)) + 
  geom_point(aes(color = order)) +
  geom_smooth(aes(color = order), se = FALSE, method = "lm") +
  geom_smooth(se = FALSE, method = "lm", color = "black") +
  scale_x_log10()

Output:

scatterplot showing the relationship between brain weight and time spent awake for different animal orders

The black line shows the relationship between brain weight and hours awake for all animals in the sample, and the colored lines represent this same relationship inside each animal order.

Looking at this graph, we will suppose that these regression lines have (almost) the same slopes but different intercepts, so we will use a mixed model with a random intercept. In other words, we are assuming that order only changes the intercept of the regression line but does not affect its steepness.

4. Running a mixed model

To add a random intercept according to the animal order, we use the lmer() function instead of lm() and add the term (1|order) to the formula as follows:

(Note that for a categorical outcome, we use the same syntax but with a glmer() function instead of glm())

# remember that we decided to use log(brainwt), see the plot above
model <- lmer(awake ~ log(brainwt) + (1|order), data = msleep)

summary(model)
#Linear mixed model fit by REML ['lmerMod']
#Formula: awake ~ log(brainwt) + (1 | order)
#   Data: msleep
#
#REML criterion at convergence: 279.4
#
#Scaled residuals: 
#    Min      1Q  Median      3Q     Max 
#-2.3161 -0.4295  0.1024  0.4773  2.0138 
#
#Random effects:
# Groups   Name        Variance Std.Dev.
# order    (Intercept) 12.737   3.569   
# Residual              4.919   2.218   
#Number of obs: 56, groups:  order, 17
#
#Fixed effects:
#             Estimate Std. Error t value
#(Intercept)   17.0206     1.3922  12.226
#log(brainwt)   0.8649     0.2448   3.534
#
#Correlation of Fixed Effects:
#            (Intr)
#log(branwt) 0.732 

Interpreting the output of the mixed model

As you can see, this summary is divided into 2 parts: Random effects and Fixed effects.

The random effects part shows how much the intercept changes with order. In this case, the standard deviation is 3.569 which is a considerable change in the intercept (which is equal to 17.0206) between different orders of animals.

The fixed effects part shows the coefficients of the model. These coefficients have the same interpretation as an ordinary linear regression model (see Interpret Log Transformations in Linear Regression).

Note that the model did not calculate p-values since the groups within order have different sample sizes and for some categories we have only 1 observation. So, to interpret the significance of the effect of log(brainwt), we will calculate the confidence interval for this variable:

confint(model)
#Computing profile confidence intervals ...
#                  2.5 %    97.5 %
#.sig01        2.1484675  5.310200
#.sigma        1.7854560  2.818833
#(Intercept)  14.1711288 19.781425
#log(brainwt)  0.3736985  1.370997

The 95% confidence interval of log(brainwt), [0.37, 1.37], does not contain zero, therefore the effect of brain weight on the number of hours spent awake is statistically significant.

Further reading