GLMM Maternal Mortality Federally- Summer 2025

This is a Report Template

Author

Carolyn Herrera & Catherine Funte (Advisor: Dr. Cohen)

Published

July 21, 2025

Slides: slides.html


Introduction

Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models (Agresti 2015). Like GLMs, GLMMs can model non-normal outcome variables, such as binary, count, or proportion data. They go a step further by incorporating random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion.

In practical terms, GLMMs are especially useful when data points are not independent, such as when students are nested within schools, patients are treated within hospitals, or repeated measures are taken from the same subject over time. For example, Thall (1988) noted that count data from longitudinal clinical trials, where measurements are repeated on the same subject over time, make between-subject comparisons difficult because it can be hard to tell whether outcomes depend on time or on treatment group. A generalized linear mixed model can represent the dependence within each patient, incorporate covariate data, model time as a function, account for variability between patients, and remain flexible and tractable (Thall 1988). The random effects help model the correlation within clusters and allow for unobserved heterogeneity, that is, differences that are not captured by the measured covariates.

GLMMs are good for:

  • Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics) (Lee and Nelder 1996)

  • Modeling non-normal outcomes, such as binary, count, or proportion data

  • Improving inference by accounting for both fixed effects (predictors of interest) and random effects (random variation across groups)

  • Reducing bias and inflated Type I error rates that can result from ignoring data structure (Thompson et al. 2022)

GLMMs are ideal when your data is both complex in structure and involves non-Gaussian response variables, making them indispensable in fields like medicine, ecology, education, and social sciences. Tawiah et al. describe zero-inflated Poisson GLMMs, an extension of the Poisson GLMM that allows for overdispersion due to a prevalence of zeros in the data, which is common in health sector data (Tawiah, Iddi, and Lotsi 2020). The paper compares a Poisson GLM, a zero-inflated Poisson GLM, a Poisson GLMM, and a zero-inflated Poisson GLMM, applied to clustered maternal mortality data. Another paper by Owili et al. uses a GLMM to investigate the impact of particulate matter on maternal and infant mortality globally (Owili et al. 2020). They use a Poisson response distribution and take year and country as random effects to account for differences in global data.
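To make the idea of zero inflation concrete, the standard zero-inflated Poisson probability mass function can be written as below. This is the textbook form, not taken from Tawiah et al.; the symbols \(\pi\) (probability of a structural zero) and \(\mu\) (Poisson mean) are notation assumed here for illustration.

```latex
\[
P(Y = k) =
\begin{cases}
\pi + (1-\pi)\,e^{-\mu}, & k = 0,\\[4pt]
(1-\pi)\,\dfrac{e^{-\mu}\mu^{k}}{k!}, & k = 1, 2, \dots
\end{cases}
\]
```

When \(\pi = 0\) this reduces to the ordinary Poisson distribution, so the zero-inflated model can be viewed as a Poisson model with an extra source of zeros.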

We analyze federal maternal mortality data from the VSRR Provisional Maternal Death Counts and Rates dataset using a generalized linear mixed model with a log link, since the outcome is count data. We wish to see whether ethnicity (a fixed effect) has any influence on maternal death counts, with year as a random effect. Like other public health or clinical data, this dataset may exhibit correlated observations and overdispersion, and the GLMM is used to parse through the noise and determine whether there are patterns of maternal mortality among mothers of differing ethnicities.

Methods

Math Background

GLMMs can be considered either an extension of GLMs, in which random effects are added to a GLM, or an extension of Linear Mixed Models (LMMs), in which a linear model with fixed and random effects is extended to non-normal distributions. Let

  • \(\mathbf{y}\) be an \(N \times 1\) column vector of the outcome variable

  • \(\mathbf{X}\) be an \(N \times p\) design matrix for the \(p\) predictor variables

  • \(\boldsymbol{\beta}\) be a \(p \times 1\) column vector of the fixed-effects coefficients

  • \(\mathbf{Z}\) be an \(N \times q\) design matrix for the \(q\) random effects

  • \(\mathbf{u}\) be a \(q \times 1\) column vector of the random effects, and

  • \(\boldsymbol{\epsilon}\) be an \(N \times 1\) column vector of the residuals

Then the general equation for the linear mixed model is given by:

\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}+\boldsymbol{\epsilon}\]

(Salinas Ruíz et al. 2023). GLMMs add a link function that relates the expected value of the response variable \(\mathbf{y}\) to a linear predictor, \(\boldsymbol{\eta}\), which excludes the residuals:

\[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}\]

The link function is \(g(\cdot)\), where

\[g(E(\mathbf{y}))=\boldsymbol{\eta}\]

and \(E(\mathbf{y})\) is the expectation of \(\mathbf{y}\) (conditional on the random effects \(\mathbf{u}\)). The choice of link function depends on the outcome distribution. In this paper our count data follow a Negative Binomial distribution, so we use the log link function

\[g(\cdot)=\log_e(\cdot)\]
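As a rough illustration of the notation above, the following base-R sketch builds \(\mathbf{X}\) and \(\mathbf{Z}\) for a toy design and evaluates the linear predictor and the inverse log link, writing the random-effects vector as \(\mathbf{u}\). The data frame `toy`, the group labels, and the coefficient values `beta` and `u` are all made up for illustration; they are not estimates from the maternal mortality data.

```r
# Toy design: 2 ethnicity groups (fixed effect) observed in 3 years (random intercept).
# All values are hypothetical and chosen only to illustrate the matrix notation.
toy <- expand.grid(
  ethnicity = factor(c("A", "B")),
  year      = factor(c(2019, 2020, 2021))
)

# X: N x p fixed-effects design matrix (intercept + indicator for group B)
X <- model.matrix(~ ethnicity, data = toy)

# Z: N x q random-effects design matrix (one indicator column per year)
Z <- model.matrix(~ 0 + year, data = toy)

beta <- c(2.0, 0.5)        # hypothetical fixed-effect coefficients (log scale)
u    <- c(-0.1, 0.0, 0.1)  # hypothetical year-level random intercepts

eta <- drop(X %*% beta + Z %*% u)  # linear predictor: eta = X beta + Z u
mu  <- exp(eta)                    # inverse log link gives the expected counts

cbind(toy, eta = eta, mu = mu)
```

Because the link is the log, coefficients act multiplicatively on the expected count: in this toy setup, group B has an expected count \(e^{0.5} \approx 1.65\) times that of group A within any given year.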

Assumptions

Before deciding to use a GLMM for our data, we had to check some assumptions (specific to our negative binomial distributed data).

  • On the scale of the link function (here, the log of the expected count), the response has a linear relationship with the predictors within the levels of the random effects.
  • The response variable is assumed to follow a negative binomial distribution, with variance greater than the mean (\(\sigma^2>\mu\)).
  • The residuals and random effects are independent.
  • The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma_u^2\).
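The overdispersion condition in the second assumption can be made concrete with a variance function. The source does not state which negative binomial parameterization is used, so the common quadratic ("NB2") form is assumed here, with a dispersion parameter \(\theta\) that is notation introduced for this sketch.

```latex
\[
\operatorname{Var}(y) = \mu + \frac{\mu^{2}}{\theta}, \qquad \theta > 0
\]
\[
\text{Example: } \mu = 10,\ \theta = 2 \;\Rightarrow\; \operatorname{Var}(y) = 10 + \frac{100}{2} = 60 > 10
\]
```

Under this form the variance always exceeds the mean, and as \(\theta \to \infty\) the extra term vanishes and the Poisson case (\(\sigma^2 = \mu\)) is recovered.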

Analysis and Results

Data Exploration and Visualization

  • Describe your data sources and collection process.

  • Present initial findings and insights through visualizations.

  • Highlight unexpected patterns or anomalies.

A study was conducted to determine how…

Code
# loading packages 
library(tidyverse)
library(knitr)
library(ggthemes)
library(ggrepel)
library(dslabs)
Code
# Load Data
kable(head(murders))
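| state      | abb | region | population | total |
|:-----------|:----|:-------|-----------:|------:|
| Alabama    | AL  | South  |    4779736 |   135 |
| Alaska     | AK  | West   |     710231 |    19 |
| Arizona    | AZ  | West   |    6392017 |   232 |
| Arkansas   | AR  | South  |    2915918 |    93 |
| California | CA  | West   |   37253956 |  1257 |
| Colorado   | CO  | West   |    5029196 |    65 |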
Code
# Base plot: population (in millions) on x, total murders on y
ggplot1 <- murders %>%
  ggplot(mapping = aes(x = population / 10^6, y = total))

# Add points, state labels, log10 axes, and a linear trend line
ggplot1 +
  geom_point(aes(color = region), size = 4) +
  geom_text_repel(aes(label = abb)) +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
  xlab("Population in millions (log10 scale)") +
  ylab("Total number of murders (log10 scale)") +
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name = "Region") +
  theme_bw()

Modeling and Results

  • Explain your data preprocessing and cleaning steps.

  • Present your key findings in a clear and concise manner.

  • Use visuals to support your claims.

  • Tell a story about what the data reveals.

Conclusion

  • Summarize your key findings.

  • Discuss the implications of your results.

References

Agresti, A. 2015. Foundations of Linear and Generalized Linear Models. Wiley Series in Probability and Statistics. Wiley. https://books.google.com/books?id=jlIqBgAAQBAJ.
Candy, Steven G. 2000. “The Application of Generalized Linear Mixed Models to Multi-Level Sampling for Insect Population Monitoring.” Environmental and Ecological Statistics 7 (3): 217–38.
Lee, Youngjo, and John A Nelder. 1996. “Hierarchical Generalized Linear Models.” Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (4): 619–56.
Owili, Patrick Opiyo, Tang-Huang Lin, Miriam Adoyo Muga, and Wei-Hung Lien. 2020. “Impacts of Discriminated PM2.5 on Global Under-Five and Maternal Mortality.” Scientific Reports 10 (1): 17654.
Salinas Ruíz, Josafhat, Osval Antonio Montesinos López, Gabriela Hernández Ramírez, and Jose Crossa Hiriart. 2023. Generalized Linear Mixed Models with Applications in Agriculture and Biology. Springer Nature.
Tawiah, Kassim, Samuel Iddi, and Anani Lotsi. 2020. “On Zero-Inflated Hierarchical Poisson Models with Application to Maternal Mortality Data.” International Journal of Mathematics and Mathematical Sciences 2020 (1): 1407320.
Thall, Peter F. 1988. “Mixed Poisson Likelihood Regression Models for Longitudinal Interval Count Data.” Biometrics, 197–209.
Thompson, Jennifer A, Clemence Leyrat, Katherine L Fielding, and Richard J Hayes. 2022. “Cluster Randomised Trials with a Binary Outcome and a Small Number of Clusters: Comparison of Individual and Cluster Level Analysis Method.” BMC Medical Research Methodology 22 (1): 222.
Wang, Ke-Sheng, Xuefeng Liu, Muyiwa Ategbole, Xin Xie, Ying Liu, Chun Xu, Changchun Xie, and Zhanxin Sha. 2017. “Generalized Linear Mixed Model Analysis of Urban-Rural Differences in Social and Behavioral Factors for Colorectal Cancer Screening.” Asian Pacific Journal of Cancer Prevention: APJCP 18 (9): 2581.