Hierarchical Models in R

“Everything is connected… but some things are connected in a special way.”
This quote from a systems theory philosopher can apply perfectly to how data is structured in the real world. You see, in many situations, the data you’re working with doesn’t exist in isolation—there’s a hierarchy at play. This is where hierarchical models step in.

So, what exactly are hierarchical models?
Think of them as the go-to statistical approach for dealing with multilevel or nested data. Imagine you’re studying students’ test scores across different schools. Naturally, some schools will perform better than others due to factors like funding or teacher experience. But within each school, individual students also vary in performance. This is the kind of complexity that hierarchical models handle. They allow you to capture the variations both between groups (schools) and within groups (students).

You might be wondering: Why not just use a standard regression model?
Here’s the deal: traditional models tend to assume that all your data points are independent of each other. But when your data is nested, this assumption is shattered. Ignoring this hierarchy can lead to misleading results, which is why hierarchical models are essential.
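To make the problem concrete, here is a small illustrative simulation (all names are made up): a school-level predictor, strong school-to-school noise, and a flat lm() fit that treats the 100 correlated students as independent. The mixed model typically reports a larger, more honest standard error for the group-level effect:

```r
library(lme4)

set.seed(123)
n_schools <- 10
n_students <- 10

# One row per student, with a school identifier
data <- data.frame(school = factor(rep(1:n_schools, each = n_students)))

# A school-level predictor (constant within each school) and strong school noise
funding <- rnorm(n_schools)
data$funding <- funding[as.integer(data$school)]
school_effect <- rnorm(n_schools, sd = 2)
data$score <- 50 + 1 * data$funding +
  school_effect[as.integer(data$school)] + rnorm(nrow(data))

flat   <- lm(score ~ funding, data = data)                  # ignores clustering
nested <- lmer(score ~ funding + (1 | school), data = data) # models it

# Compare the standard errors on the funding coefficient
summary(flat)$coefficients["funding", "Std. Error"]
coef(summary(nested))["funding", "Std. Error"]
```

With only 10 truly independent schools behind those 100 rows, the flat model's standard error is overly optimistic; the hierarchical model accounts for the clustering.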

Now, let’s get practical for a moment. Where do hierarchical models shine?
You’ll find them in fields like healthcare (patients nested within hospitals), education (students nested within classrooms), and social sciences (individuals nested within communities). Whether it’s comparing patient outcomes across hospitals or tracking student progress within different schools, hierarchical models provide more nuanced insights than flat, one-level models ever could.

Understanding Hierarchical Structures

Here’s something I’m sure you’ll agree with: life itself is hierarchical. From cells forming tissues to people forming societies, everything tends to be part of something bigger. And guess what? Your data behaves the same way!

Types of Hierarchical Structures

You’ll often encounter two main types of hierarchical models: two-level models and multilevel models. Let me break it down for you:

  • Two-Level Models:
    This is the simplest form of a hierarchical structure. You have one level of grouping. For instance, if you’re analyzing students nested within schools, the students are the first level, and the schools are the second level.
    A typical example in R might look like this:
model <- lmer(score ~ student_characteristics + (1 | school), data = data)

In this example, we’re saying that students’ test scores depend on individual characteristics, but we’re also allowing the school effect (which may include factors like quality of education) to vary across schools.

  • Multilevel Models:
    Now, let’s crank things up a notch. What if you have data that’s nested even further? Imagine students nested within classes, which are nested within schools. You’ve now got three levels to deal with! This is where multilevel models come into play.
    For example, you might want to analyze how both classroom-specific and school-specific factors influence students’ performance. R can handle this too, with models like:

model <- lmer(score ~ student_characteristics + (1 | school/class), data = data)

    Here, you’re allowing scores to be influenced by both school-level and class-within-school differences. Note that in lme4’s nesting syntax the outer group comes first: (1 | school/class) expands to (1 | school) + (1 | school:class).

Random vs Fixed Effects

Here’s something that might seem tricky at first, but I promise, it’s simpler than it sounds: random effects and fixed effects.

  • Fixed Effects:
    These are predictors whose effects you estimate as single, population-level values that apply across all groups. Think of it like this: you want to know how much gender, for instance, influences test scores across all students. Gender is a fixed effect because you expect it to affect everyone in a similar way.
  • Random Effects:
    Now, suppose you believe that the effect of being in a particular school varies randomly from school to school. That’s where random effects come in. You’re not trying to estimate a specific value for each school, but rather allowing for the possibility that schools differ in random ways. For example, if you think that classroom size has a different impact on test scores in different schools, you’d treat school as a random effect.

Here’s an example to clarify:
Let’s say you’re studying how teaching methods (fixed effect) influence student outcomes, but you also believe that the effectiveness of those methods might vary across different schools (random effect). You’d model it like this in R:

model <- lmer(score ~ teaching_method + (1 | school), data = data)

And there you have it—a clear breakdown of how hierarchical structures work. Whether you’re working with two levels of data or several, understanding these concepts will allow you to better capture the complex relationships within your dataset.

Likelihood Estimation vs Bayesian Estimation

Alright, now that you’ve got a handle on the structure of these models, let’s talk about how we actually fit them. There are two main methods: Likelihood Estimation and Bayesian Estimation.

  1. Likelihood Estimation:
    In a nutshell, this method works by maximizing the likelihood that the model’s predictions match the observed data. It’s the more traditional, frequentist approach, and the popular R package lme4 uses this method. The output you get from lmer() in R is based on likelihood estimation.You might use likelihood estimation when you’re looking for a straightforward approach to hierarchical modeling, particularly if you’re comfortable with frequentist methods. It’s computationally efficient and works well for many standard use cases.
  2. Bayesian Estimation:
    If you’re the kind of person who likes to bake prior beliefs into your models, Bayesian estimation is the way to go. Rather than just maximizing likelihood, Bayesian methods incorporate prior distributions for your parameters and update those priors with the data you observe. This gives you a posterior distribution, which provides richer information about the possible values of your model’s parameters.The brms or rstanarm packages in R are popular choices for Bayesian hierarchical models. Bayesian methods can be particularly useful if you’re dealing with small datasets or want to incorporate prior knowledge into your model. However, they do tend to be more computationally demanding than likelihood-based methods.

So, when should you use which?

  • If you’re after speed and you’re comfortable with classical statistical methods, likelihood estimation is a solid choice.
  • If you prefer a probabilistic approach and want to quantify uncertainty with more detail, Bayesian methods might be worth the extra computational effort.

Fitting Hierarchical Models in R

Now that we’ve covered the theoretical side, let’s roll up our sleeves and get to the practical part—fitting hierarchical models in R. Trust me, with the right tools, this is going to be easier than you might think.

Required Packages

Here are the key packages you’ll need in your toolkit:

  • lme4: This is the workhorse for frequentist hierarchical models. You’ll use the lmer() function to fit linear mixed-effects models.
  • brms or rstanarm: If you’re leaning toward Bayesian models, these two are your go-to packages. They wrap around Stan, a powerful probabilistic programming language, making it easier to fit Bayesian models in R.
  • Helper packages:
    • broom.mixed: Useful for tidying up model outputs.
    • tidybayes: A great package for working with Bayesian model outputs and visualizing results.
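As a quick taste of the helper packages, here is a hedged sketch of tidying an lmer fit with broom.mixed (simulated data, illustrative names):

```r
library(lme4)
library(broom.mixed)

# Simulated nested data: 10 schools, 10 students each (illustrative)
set.seed(1)
data <- data.frame(
  school = factor(rep(1:10, each = 10)),
  study_time = runif(100, 0, 5)
)
school_effect <- rnorm(10)
data$score <- 50 + 2 * data$study_time +
  school_effect[as.integer(data$school)] + rnorm(100)

fit <- lmer(score ~ study_time + (1 | school), data = data)

# tidy() flattens the fixed and random effects into one data frame,
# which is much easier to filter, plot, or export than raw summary() output
tidy(fit)
```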

Step-by-Step Implementation

Let’s walk through how you’d fit a basic two-level hierarchical model.

1. Data Preparation

Before you even start fitting a model, it’s crucial to ensure your data is structured correctly. Typically, you’ll organize your data in long format—this means every row corresponds to an individual observation, with group-level identifiers included as a separate column.

Example:

# Long format example: each row is a student, with 'school' as a group identifier
data <- data.frame(student_id = 1:100, school = rep(1:10, each = 10), score = rnorm(100))
2. Fitting a Basic Two-Level Model

Now, let’s fit the model! Here’s how you’d do it using the lme4 package:

library(lme4)
# Fitting a basic two-level hierarchical model
model <- lmer(score ~ study_time + (1 | school), data = data)
summary(model)

In this example, study_time is the predictor, and school is the group-level variable. The (1 | school) syntax tells R that the model should include a random intercept for each school.

3. Interpreting the Results

Once you run the model, you’ll get output like this:

  • Fixed effects: These are your coefficients for predictors like study_time. They tell you how much the response variable (test score) changes for each unit increase in the predictor, holding everything else constant.
  • Random effects: These represent the variability between groups (in this case, schools). Specifically, the random intercept shows how much each school’s average test score deviates from the overall average.
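You don’t have to read these off summary() alone; lme4 has accessors for each piece. A minimal sketch, using illustrative simulated data (the variable names mirror the running example):

```r
library(lme4)

# Illustrative simulated data: 10 schools, 10 students each
set.seed(42)
data <- data.frame(
  school = factor(rep(1:10, each = 10)),
  study_time = runif(100, 0, 5)
)
data$score <- 50 + 2 * data$study_time +
  rnorm(10)[as.integer(data$school)] + rnorm(100)

model <- lmer(score ~ study_time + (1 | school), data = data)

fixef(model)    # fixed-effect coefficients (intercept and study_time slope)
ranef(model)    # each school's estimated deviation from the overall intercept
VarCorr(model)  # variance components: between-school vs residual
confint(model, method = "Wald")  # quick Wald intervals for the fixed effects
```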
4. Extending to More Complex Models

Once you’ve mastered the basics, you can start adding more complexity to your models.

  • Random slopes: If you think the relationship between study_time and score varies across schools (e.g., study time is more effective in some schools than others), you can model that by adding a random slope:
model <- lmer(score ~ study_time + (study_time | school), data = data)

  • Three-level models: What if you have students nested within classrooms nested within schools? No problem. You can add another level (outer group first in lme4’s nesting syntax):

model <- lmer(score ~ study_time + (1 | school/class), data = data)

By now, you should have a solid grasp on how to fit and interpret hierarchical models in R. Once you’ve nailed the two-level model, the sky’s the limit—you can keep adding complexity based on your data’s structure.

Bayesian Hierarchical Models in R

“In the face of uncertainty, Bayesian statistics are your best friend.” That’s what many statisticians will tell you, and it’s especially true when working with hierarchical models.

Why the Bayesian Approach?

Let’s cut right to the chase: Bayesian methods give you a lot more flexibility than frequentist approaches, especially when dealing with small datasets or when you want to incorporate prior knowledge into your model.

Here’s an example: imagine you’re analyzing patient recovery times across hospitals, but you already have strong evidence from previous studies that smaller hospitals tend to have shorter recovery times. With a Bayesian approach, you can explicitly include that prior knowledge into your model, making it more robust. Not only does this make your model more reliable, but it also allows you to quantify uncertainty in a more detailed way—using posterior distributions instead of simple point estimates.

But you might be wondering: When exactly should I use Bayesian methods?
Here’s the deal: if you have a large dataset and aren’t particularly concerned with priors, the frequentist approach works just fine. But when your data is limited or you have strong prior information that you’d like to bake into the analysis, Bayesian methods really shine.

Fitting a Bayesian Model

Now let’s roll up our sleeves and fit a Bayesian hierarchical model using the brms package in R. If you’ve already used lme4 for frequentist models, you’ll find brms syntax quite familiar, with the added bonus of Bayesian functionality under the hood.

Here’s a simple example:

library(brms)

# Fitting a Bayesian hierarchical model
model <- brm(y ~ x + (1 | group), data = dataset)
summary(model)

In this model:

  • y is the response variable (just like before).
  • x is the predictor variable.
  • (1 | group) tells the model to include a random intercept for each group (in this case, similar to schools or hospitals).

What makes this Bayesian?
Under the hood, brms uses Stan, a probabilistic programming language, to estimate the posterior distributions of your parameters. Instead of just giving you a single estimate for the coefficients, Bayesian models give you a full distribution of possible values.

Interpreting Bayesian Output

Once you run the summary() function, you’ll notice a few things that look different from the frequentist output:

  • Posterior distributions: For each parameter (like the intercept or slope), you’ll get a distribution that reflects the range of possible values given the data and prior information.
  • Credible intervals: In frequentist models, you get confidence intervals, but in Bayesian models, you get credible intervals. This might seem like a small distinction, but it’s quite important. A credible interval gives you the range within which the parameter is likely to fall with a certain probability. For instance, a 95% credible interval means there’s a 95% probability that the parameter lies within that range.

Here’s something cool: Bayesian intervals are often more intuitive to interpret. When you say, “There’s a 95% probability that the true effect of x lies between 0.3 and 0.7,” most people get that right away!
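If you want to see the mechanics, a credible interval is just a quantile summary of posterior draws. A toy sketch with simulated draws standing in for real MCMC output:

```r
# Simulated posterior draws for a slope parameter (stand-in for what
# Stan/brms would return after sampling)
set.seed(99)
draws <- rnorm(4000, mean = 0.5, sd = 0.1)

# A 95% credible interval is the central 95% of the posterior draws
ci <- quantile(draws, probs = c(0.025, 0.975))
ci

# Posterior probability that the effect is positive
mean(draws > 0)
```

You can read the interval directly as "95% probability the parameter lies in this range", which is exactly the interpretation people naturally reach for.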

Bayesian models also allow you to visualize posterior predictive checks, where you can see how well your model’s predictions match the actual data. This gives you an immediate visual sense of whether your model fits well or not.

Model Diagnostics and Validation

Now that you’ve fitted your model, it’s time for the all-important step of checking assumptions and making sure your model is up to scratch. I always tell people, “A model unchecked is a model misunderstood.”

Checking Assumptions

Let’s start with the basics. Even though hierarchical models can account for more complexity, they still rely on a few fundamental assumptions, such as:

  • Linearity: The relationship between the predictor and response should be linear. This assumption is generally straightforward, but it’s important to visualize your data to make sure.
  • Normality of residuals: Just like in simple regression, you want your residuals (the differences between your observed and predicted values) to be normally distributed.
  • Homoscedasticity: This fancy word just means that the variability of residuals should be constant across levels of the predictor. If your residuals fan out like a cone, you’ve got a problem!

How do you check these? Simple! Plot your residuals and run diagnostic checks.
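For instance, here is a minimal sketch of those two residual checks, using lme4’s built-in sleepstudy data as a stand-in for your own model:

```r
library(lme4)
data("sleepstudy", package = "lme4")

model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
r <- resid(model)

# Homoscedasticity: residuals vs fitted values should show no funnel shape
plot(fitted(model), r, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: residuals should track the reference line in a Q-Q plot
qqnorm(r)
qqline(r)
```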

Goodness-of-Fit

You might be wondering: How do I know if my hierarchical model fits well?

Here’s where AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion) come into play. Both of these metrics give you a way to compare models, with lower values indicating better fit. You can think of them as penalizing your model for complexity—so, they strike a balance between fit and simplicity.

If you’re working in a Bayesian framework, you can also use the WAIC (Watanabe-Akaike Information Criterion), which is tailored for Bayesian models.
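To illustrate, here is a sketch comparing two nested models on lme4’s sleepstudy data (a random-intercept model versus one that adds random slopes):

```r
library(lme4)
data("sleepstudy", package = "lme4")

m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)    # intercepts only
m2 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy) # + random slopes

AIC(m1, m2)  # lower is better; complexity is penalized
BIC(m1, m2)  # like AIC, but with a stronger complexity penalty

# anova() refits both models with ML and adds a likelihood-ratio test
anova(m1, m2)
```

Here the random-slope model comes out ahead on both criteria, which matches the idea that subjects really do deteriorate at different rates.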

Cross-validation for Hierarchical Models

When it comes to evaluating your model, cross-validation is an absolute must. For hierarchical models, you should consider group-level cross-validation. This means you can leave entire groups out during training and test the model’s ability to generalize to new groups (e.g., leave out one school or hospital at a time).

Here’s how you can implement cross-validation using brms for Bayesian models:

# Approximate leave-one-out cross-validation (one observation at a time)
loo(model)

# To hold out entire groups instead, use grouped K-fold cross-validation,
# which keeps all observations from a group in the same fold:
kfold(model, group = "school")

loo() estimates how well your model predicts new observations, while the grouped kfold() variant tells you how well it generalizes to groups that were never seen during fitting.

Residual Plots and Variance Decomposition

You’ve run your model, and it’s time to check the residuals. But, unlike in simple regression, you’ll need to look at both individual-level and group-level residuals.

  • Individual-level residuals: These tell you how well your model fits the observations within each group.
  • Group-level residuals: These reveal how much variation is left between groups after accounting for your predictors.

Here’s a quick example of how to check residuals in R:

# Residual plot for individual-level residuals
plot(resid(model))

# Variance decomposition to see how much variance is explained by group
VarCorr(model)
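A common one-number summary of this decomposition is the intraclass correlation (ICC): the share of total variance attributable to the grouping factor. A sketch, assuming a random-intercept lmer fit (here on the built-in sleepstudy data):

```r
library(lme4)
data("sleepstudy", package = "lme4")
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Pull the variance components out of VarCorr() as a data frame
vc <- as.data.frame(VarCorr(model))
between <- vc$vcov[vc$grp == "Subject"]   # between-subject variance
within  <- vc$vcov[vc$grp == "Residual"]  # within-subject (residual) variance

# ICC: proportion of total variance explained by subject-level differences
icc <- between / (between + within)
icc
```

A high ICC means observations within the same group are strongly correlated, which is exactly the situation where a flat model would mislead you.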

Posterior Predictive Checks (For Bayesian Models)

Now, if you’re working with a Bayesian model, you get an extra, super-powerful tool: posterior predictive checks. This lets you simulate data from the model and compare it directly with your observed data. It’s a great way to visualize how well your model is capturing the underlying patterns.

Here’s an example:

# Posterior predictive checks in brms
pp_check(model)

This plot will show you the distribution of simulated data versus actual data, giving you a visual sense of whether the model’s predictions align well with reality.

In summary, the diagnostics and validation phase is where you make sure your model isn’t just looking good on the surface but is actually capturing the complexity and structure of the data appropriately. Whether you’re using a frequentist or Bayesian approach, it’s critical to check assumptions, evaluate goodness-of-fit, and visualize residuals.

Dealing with Complex Hierarchical Structures

Here’s something that might surprise you: not all data fits neatly into simple hierarchies. In the real world, you often run into overlapping groups or non-normal data distributions, and hierarchical models can help handle these more complex scenarios.

Crossed Random Effects

Let’s say you’re studying students, but now things get complicated: instead of just students nested in schools, these students are also nested in classrooms, which could overlap across different schools. For example, some students might attend special classes that include kids from multiple schools. In this case, students aren’t purely nested within a single school or classroom—they belong to both.

This type of structure is called crossed random effects. It’s like having multiple grouping factors, and individuals are members of multiple groups simultaneously. You need a model that can account for variability across both schools and classrooms.

Here’s how you can model crossed random effects in R using lme4:

library(lme4)
# Crossed random effects: random intercepts for both school and classroom
model <- lmer(score ~ study_time + (1 | school) + (1 | classroom), data = dataset)

In this example, we’re allowing both schools and classrooms to contribute their own random effects, capturing the fact that students are influenced by both.

Non-Normal Distributions

Now, let’s tackle another wrinkle: non-normal data. Imagine you’re working with binary data (like a student either passing or failing a test) or count data (such as the number of times a student raises their hand in class). In these cases, the typical assumption of normality for the outcome variable just doesn’t hold.

This might make you wonder: Can hierarchical models handle this?

Absolutely! Enter the generalized linear mixed model (GLMM), which extends hierarchical models to deal with non-normal outcomes. Instead of assuming that the residuals are normally distributed, GLMMs use other distributions, such as the binomial distribution for binary data or the Poisson distribution for count data.

You can use the glmer() function from the lme4 package to fit GLMMs. Here’s how you’d fit a model for binary outcomes:

library(lme4)
# Fitting a GLMM for binary data (e.g., pass/fail)
model <- glmer(pass_fail ~ study_time + (1 | school), data = dataset, family = binomial)

Here, the family = binomial argument tells R that we’re dealing with binary outcomes. Similarly, if you were dealing with count data, you’d use family = poisson.
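For the count-data case, here is a hedged sketch with simulated data (the variable names are made up for illustration):

```r
library(lme4)

# Simulated count outcome: number of questions asked in class, by school
set.seed(7)
d <- data.frame(
  school = factor(rep(1:10, each = 20)),
  study_time = runif(200, 0, 5)
)
d$n_questions <- rpois(200, lambda = exp(0.2 + 0.1 * d$study_time))

# Poisson GLMM with a random intercept per school
model <- glmer(n_questions ~ study_time + (1 | school),
               data = d, family = poisson)
summary(model)
```

Note that the coefficients are on the log scale, so exponentiating a slope gives a multiplicative effect on the expected count.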

Hierarchical Models with Interaction Terms

You might be wondering: Can hierarchical models handle more complex relationships, like interactions between predictors?

The short answer is: Yes, absolutely.

In hierarchical models, you can explore how different predictors interact with each other, both at the individual level and the group level. For instance, you might want to know if the effect of study time on test scores depends on whether the student attends a high-performing or low-performing school. This is where interaction terms come into play.

Here’s a simple interaction model in R:

library(lme4)
# Interaction between individual-level predictor (study_time) and group-level predictor (school_performance)
model <- lmer(score ~ study_time * school_performance + (1 | school), data = dataset)

In this model, the study_time * school_performance term tells R to estimate how the effect of study time changes depending on the school’s performance. Maybe students in high-performing schools benefit more from studying, or maybe it’s the opposite!

By modeling interactions, you can uncover more nuanced relationships in your data and account for how group-level characteristics (like school performance) influence individual-level outcomes (like test scores).

Hierarchical Models for Longitudinal Data

Let’s talk about time. Longitudinal data involves repeated measurements over time—for example, tracking a patient’s recovery process across multiple weeks or measuring a student’s performance over the school year.

Why use hierarchical models for this?
Here’s the deal: repeated measurements from the same individual (or group) are inherently nested within that individual. Ignoring this structure would lead to problems because the data points aren’t independent of each other—they’re correlated over time. Hierarchical models allow you to account for these correlations, capturing both individual growth patterns and group-level trends.

Fitting Longitudinal Models

Let’s say you’re analyzing students’ test scores across four different time points. You can model this longitudinal structure using hierarchical models by treating each student as a random effect, which captures their individual trajectory over time.

Here’s how you might do it in R:

library(lme4)
# Longitudinal model: test scores measured at multiple time points for each student
model <- lmer(score ~ time + (1 | student), data = dataset)

In this model:

  • time is your predictor variable (the different time points when the test scores were measured).
  • (1 | student) tells R to include a random intercept for each student, allowing each student to have their own baseline score.

You can also get more complex by modeling random slopes—that is, letting the effect of time vary from student to student. Maybe some students improve quickly, while others improve more slowly. Here’s how you can do that:

model <- lmer(score ~ time + (time | student), data = dataset)

By allowing both random intercepts and random slopes, you’re acknowledging that not only do students start off at different levels, but they also progress at different rates.

Longitudinal models are powerful because they let you account for both the overall trend and the individual differences in how that trend unfolds over time.
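One convenient way to inspect those individual trajectories is coef(), which combines the fixed effects with each group’s random effects. A sketch using lme4’s built-in sleepstudy data (repeated reaction-time measurements per subject):

```r
library(lme4)
data("sleepstudy", package = "lme4")

# Random intercept and random slope for time (Days)
model <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# coef() returns one intercept and one slope per subject:
# each subject's baseline and rate of change
head(coef(model)$Subject)
```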

By now, you should have a solid understanding of how to deal with complex hierarchical structures, interactions, and longitudinal data. These concepts open up a world of possibilities, allowing you to explore relationships that go beyond simple groupings and account for overlapping groups, non-normal data, and changes over time.

Practical Example with the sleepstudy Dataset

“Data is just the beginning—what you do with it is what counts.”
To give you a hands-on feel of hierarchical models, we’ll use a dataset that is well known in the mixed models world: the sleepstudy dataset from the lme4 package. This dataset explores the effects of sleep deprivation on reaction time for participants over a series of days.

Step 1: Load the Data

First, let’s get the dataset into your R environment. If you haven’t worked with lme4 before, don’t worry—this will be straightforward:

# Load necessary packages
library(lme4)
data("sleepstudy")

# Inspect the dataset
head(sleepstudy)

Here’s what you’re looking at:

  • Reaction: The dependent variable, which represents reaction times in milliseconds.
  • Days: The number of days of sleep deprivation. This is our predictor.
  • Subject: Each row is nested within a subject, meaning multiple observations are taken for each subject across days.

You might be thinking: Why not just run a regular regression on this data?
Well, here’s the deal: the measurements are repeated for each subject, which means the data points are not independent. We need a hierarchical model to account for this.

Step 2: Fit a Basic Two-Level Hierarchical Model

Let’s start with a basic two-level model where we allow each subject to have their own random intercept (baseline reaction time), but assume that the effect of days of sleep deprivation is the same across all subjects:

# Fit a basic mixed-effects model
model <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Summarize the results
summary(model)

Here’s what’s happening in this model:

  • Reaction ~ Days: We’re modeling how reaction times change as the number of days of sleep deprivation increases.
  • (1 | Subject): We’re allowing each subject to have a different baseline reaction time by including a random intercept for the subject.

Step 3: Interpreting the Results

You’ll get a table with both fixed effects and random effects. Let’s break it down:

  1. Fixed effects:
    The output will show the estimated effect of Days on Reaction. For instance, you might see something like:

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      9.747   25.79
Days          10.467      0.804   13.02

  • The intercept (251.41) is the estimated reaction time at baseline (Days = 0, before sleep deprivation begins).
  • The slope for Days (10.47) tells you that for each additional day of sleep deprivation, reaction time increases by about 10 milliseconds.

  2. Random effects:
    The random effects table will show how much variation there is between subjects. For instance, you’ll see something like:

Random effects:
 Groups   Name        Variance Std.Dev.
 Subject  (Intercept) 1378.2   37.12
 Residual              960.5   31.00

    • The variance for the random intercept (1378.2) shows how much subjects differ from each other in their baseline reaction times.
    • The residual variance (960.5) tells you how much individual reaction times vary within subjects, after accounting for the number of days.

Step 4: Extending the Model with Random Slopes

Now, you might be wondering: What if the effect of sleep deprivation varies from subject to subject? Some people might get really slow after just one day, while others could hold up for several days before their reaction times degrade.

We can model this by allowing each subject to have a random slope for the effect of Days:

# Fit a model with random slopes
model_random_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Summarize the results
summary(model_random_slope)

Here, the (Days | Subject) term tells R to estimate a separate slope for each subject, meaning each subject’s reaction time changes at a different rate depending on the number of days of sleep deprivation.

Step 5: Interpreting the Random Slopes

Now, you’ll see additional information in the random effects section. It will tell you how much variability there is between subjects in both their intercepts (baseline reaction times) and their slopes (how much their reaction times change with each day of sleep deprivation):

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.10   24.741
          Days         35.07    5.922   0.07
 Residual             654.94   25.592

  • Intercept variance (612.10): This is how much subjects’ baseline reaction times differ from each other, once each subject is allowed their own slope.
  • Slope variance (35.07): This tells you that there’s variation between subjects in how much their reaction times change per day of sleep deprivation.
  • Correlation between intercepts and slopes (0.07): This is essentially zero, suggesting that a subject’s baseline reaction time tells you little about how quickly their reaction time deteriorates with sleep deprivation.

Step 6: Visualizing the Results

Now, let’s make things visual. One of the best ways to understand a hierarchical model is by plotting the individual trajectories and seeing how they vary.

You can use ggplot2 to plot the fitted lines for each subject:

library(ggplot2)
# Plotting individual trajectories
ggplot(sleepstudy, aes(x = Days, y = Reaction, group = Subject)) +
  geom_point() +
  geom_line(aes(y = predict(model_random_slope)), color = 'blue') +
  facet_wrap(~ Subject)

This plot will show you each subject’s individual reaction times across days, as well as the fitted lines from your hierarchical model. You’ll see how the slopes and intercepts vary from subject to subject, giving you a visual sense of how well the model captures the data.

Step 7: Presenting the Findings

When presenting your findings, make sure to highlight the key takeaways from both the fixed effects and random effects:

  • Fixed Effects: The overall relationship between sleep deprivation and reaction time is significant, with reaction times increasing by about 10 milliseconds for each day of sleep deprivation.
  • Random Effects: There is considerable variability between subjects, both in their baseline reaction times and in how much their reaction times change across days.

By visualizing the results, you give your audience a clear understanding of how the model captures individual differences—an essential part of understanding hierarchical data.

Conclusion

“The beauty of statistics is that it turns complexity into clarity.”
And that’s exactly what hierarchical models do—they help you navigate the complexity of nested data and draw meaningful insights from it. Whether you’re dealing with students nested in schools, patients nested in hospitals, or repeated measures over time, hierarchical models allow you to capture both the individual-level and group-level variations that matter most.

In this blog, we’ve explored hierarchical models from the ground up, diving into the mathematical formulation, fitting models in R, and interpreting the results using both frequentist and Bayesian approaches. Along the way, we tackled complex structures, such as crossed random effects and non-normal distributions, and even ventured into longitudinal data analysis. Through a practical example with the sleepstudy dataset, you’ve seen firsthand how these models can be applied to real-world data and provide rich insights.

If there’s one thing I hope you take away from this, it’s that hierarchical models are not just an academic exercise—they are a powerful tool for uncovering patterns in the messy, structured world around us. Armed with the knowledge from this blog, you’re now equipped to tackle more sophisticated analyses, whether you’re handling simple two-level data or digging into complex multilevel structures.

As you continue your journey with hierarchical models, remember that the true strength of these models lies in their flexibility and ability to adapt to your data’s structure. And when applied thoughtfully, they can unlock deeper insights, guiding your decisions with a clearer, more nuanced understanding of your data.
