Section 5.5 — Hierarchical models¶
This notebook contains the code examples from Section 5.5 Hierarchical models from the No Bullshit Guide to Statistics.
See also:
- 03_hierarchical_model.ipynb
- cs109b_lect13_bayes_2_2021.ipynb
- https://github.com/fonnesbeck/pymc_sdss_2024/blob/main/notebooks/Section4-Hierarchical_Models.ipynb
- https://mc-stan.org/users/documentation/case-studies/radon_cmdstanpy_plotnine.html#data-prep
- https://github.com/mitzimorris/brms_feb_28_2023/blob/main/brms_notebook.Rmd
- https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/multilevel_modeling.html
Notebook setup¶
# load Python modules
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Figures setup
plt.clf()  # needed, otherwise `sns.set_theme` doesn't work
from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (5, 3)}) # good for screen
# RCPARAMS.update({"figure.figsize": (5, 1.6)}) # good for print
sns.set_theme(
context="paper",
style="whitegrid",
palette="colorblind",
rc=RCPARAMS,
)
# High-resolution please
%config InlineBackend.figure_format = "retina"
# Where to store figures
from ministats.utils import savefigure
DESTDIR = "figures/bayes/hierarchical"
# set random seed for repeatability
np.random.seed(42)
#######################################################
# silence statsmodels kurtosistest warning when using n < 20
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)
Radon dataset¶
https://bambinos.github.io/bambi/notebooks/radon_example.html
- Description: Contains measurements of radon levels in homes across various counties.
- Source: Featured in Andrew Gelman and Jennifer Hill's book Data Analysis Using Regression and Multilevel/Hierarchical Models.
- Application: Demonstrates partial pooling and varying intercepts/slopes in hierarchical modeling.
Loading the data¶
radon = pd.read_csv("../datasets/radon.csv")
radon.shape
(919, 6)
radon.head()
|  | idnum | state | county | floor | log_radon | log_uranium |
|---|---|---|---|---|---|---|
| 0 | 5081 | MN | AITKIN | ground | 0.788457 | -0.689048 |
| 1 | 5082 | MN | AITKIN | basement | 0.788457 | -0.689048 |
| 2 | 5083 | MN | AITKIN | basement | 1.064711 | -0.689048 |
| 3 | 5084 | MN | AITKIN | basement | 0.000000 | -0.689048 |
| 4 | 5085 | MN | ANOKA | basement | 1.131402 | -0.847313 |
len(radon["county"].unique())
85
radon[["log_radon", "log_uranium"]].describe().T.round(2)
|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| log_radon | 919.0 | 1.22 | 0.85 | -2.30 | 0.64 | 1.28 | 1.79 | 3.88 |
| log_uranium | 919.0 | -0.13 | 0.37 | -0.88 | -0.47 | -0.10 | 0.18 | 0.53 |
#######################################################
from ministats.book.figures import plot_counties
sel_counties = [
"LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
"HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
]
plot_counties(radon, counties=sel_counties);
# # FIGURES ONLY
# sel_counties = [
# "LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
# "HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
# ]
# fig = plot_counties(radon, counties=sel_counties, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_scatter_only.pdf")
# savefigure(fig, filename)
Example 1: complete pooling model¶
= common linear regression model for all counties
Bayesian model¶
We can pool all the data and estimate one big regression to assess the influence of the floor variable on radon levels across all counties.
$$ Y = \beta_0 + \beta_x \cdot x. $$
The variable $x$ corresponds to the floor column, with $0$ representing basement and $1$ representing ground floor. By ignoring the county feature, we do not differentiate between counties.
Bambi model¶
import bambi as bmb
priors1 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"floor": bmb.Prior("Normal", mu=0, sigma=5),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod1 = bmb.Model(formula="log_radon ~ 1 + floor",
family="gaussian",
link="identity",
priors=priors1,
data=radon)
mod1
Formula: log_radon ~ 1 + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
      floor ~ Normal(mu: 0.0, sigma: 5.0)
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod1.build()
mod1.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_complete_pooling_mod1_graph")
# mod1.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata1 = mod1.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
import arviz as az
az.summary(idata1, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| Intercept | 1.326 | 0.029 | 1.270 | 1.380 |
| floor[ground] | -0.614 | 0.071 | -0.748 | -0.485 |
| sigma | 0.823 | 0.019 | 0.789 | 0.860 |
az.plot_posterior(idata1);
# # FIGURES ONLY
# az.plot_posterior(idata1, round_to=2, figsize=(6,1.8));
# filename = os.path.join(DESTDIR, "example1_posterior.pdf")
# savefigure(plt.gcf(), filename)
fig, axs = bmb.interpret.plot_predictions(mod1, idata1, conditional="floor")
means1 = az.summary(idata1, kind="stats")["mean"]
y0 = means1["Intercept"]
y1 = means1["Intercept"] + means1["floor[ground]"]
sns.lineplot(x=[0,1], y=[y0,y1], ax=axs[0]);
midpoint = [0.5, (y0+y1)/2 + 0.03]
slope = means1["floor[ground]"].round(3)
axs[0].annotate("$\\mu_{B_{f}}=%.3f$" % slope, midpoint);
Default computed for conditional variable: floor
# # FIGURES ONLY
# fig, axs = bmb.interpret.plot_predictions(mod1, idata1, conditional="floor",
# fig_kwargs={"figsize":(3,2)})
# means1 = az.summary(idata1, kind="stats")["mean"]
# y0 = means1["Intercept"]
# y1 = means1["Intercept"] + means1["floor[ground]"]
# sns.lineplot(x=[0,1], y=[y0,y1], ax=axs[0]);
# midpoint = [0.5, (y0+y1)/2 + 0.03]
# slope = means1["floor[ground]"].round(3)
# axs[0].annotate("$\\mu_{B_{f}}=%.3f$" % slope, midpoint);
# filename = os.path.join(DESTDIR, "example1_basement_ground_slope.pdf")
# savefigure(plt.gcf(), filename)
Conclusion¶
We are not using group membership, so the model has lots of bias: it ignores real differences between the counties.
Example 2: no pooling model¶
= separate intercept for each county
Bayesian model¶
Alternatively, we can treat the counties as independent, so that each county $j$ gets its own intercept term:
$$ Y_j = \beta_{0j} + \beta_{x} x. $$
Bambi model¶
priors2 = {
"county": bmb.Prior("Normal", mu=0, sigma=10),
"floor": bmb.Prior("Normal", mu=0, sigma=30),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod2 = bmb.Model("log_radon ~ 0 + county + floor",
family="gaussian",
link="identity",
priors=priors2,
data=radon)
mod2
Formula: log_radon ~ 0 + county + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      county ~ Normal(mu: 0.0, sigma: 10.0)
      floor ~ Normal(mu: 0.0, sigma: 30.0)
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod2.build()
mod2.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example2_no_pooling_mod2_graph")
# mod2.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata2 = mod2.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, county, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
idata2sel = idata2.sel(county_dim=sel_counties)
az.summary(idata2sel, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| county[LAC QUI PARLE] | 2.952 | 0.536 | 1.968 | 3.946 |
| county[AITKIN] | 0.843 | 0.381 | 0.098 | 1.530 |
| county[KOOCHICHING] | 0.822 | 0.288 | 0.263 | 1.339 |
| county[DOUGLAS] | 1.731 | 0.256 | 1.217 | 2.173 |
| county[HENNEPIN] | 1.360 | 0.076 | 1.219 | 1.504 |
| county[STEARNS] | 1.493 | 0.146 | 1.216 | 1.760 |
| county[RAMSEY] | 1.156 | 0.132 | 0.906 | 1.400 |
| county[ST LOUIS] | 0.867 | 0.072 | 0.723 | 0.999 |
| floor[ground] | -0.718 | 0.073 | -0.854 | -0.579 |
| sigma | 0.757 | 0.019 | 0.722 | 0.793 |
axs = az.plot_forest(idata2sel, combined=True, figsize=(6,2.6))
axs[0].set_xticks(np.arange(-0.5,4.6,0.5))
axs[0].set_title(None);
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod2_sel_counties.pdf")
# savefigure(axs[0], filename)
plot_counties(radon, idata_cp=idata1, idata_np=idata2);
# # FIGURES ONLY
# sel_counties = [
# "LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
# "HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
# ]
# fig = plot_counties(radon, idata_cp=idata1, idata_np=idata2,
# counties=sel_counties, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_complete_and_no_pooling.pdf")
# savefigure(fig, filename)
# fig, axs = bmb.interpret.plot_predictions(mod2, idata2, ["floor", "county"]);
# axs[0].get_legend().remove()
# post2 = idata2["posterior"]
# unpooled_means = post2.mean(dim=("chain", "draw"))
# unpooled_hdi = az.hdi(idata2)
# unpooled_means_iter = unpooled_means.sortby("county")
# unpooled_hdi_iter = unpooled_hdi.sortby(unpooled_means_iter.county)
# _, ax = plt.subplots(figsize=(12, 5))
# unpooled_means_iter.plot.scatter(x="county_dim", y="county", ax=ax, alpha=0.9)
# ax.vlines(
# np.arange(len(radon["county"].unique())),
# unpooled_hdi_iter.county.sel(hdi="lower"),
# unpooled_hdi_iter.county.sel(hdi="higher"),
# color="orange",
# alpha=0.6,
# )
# ax.set(ylabel="Radon estimate", ylim=(-2, 4.5))
# ax.tick_params(rotation=90);
Conclusion¶
We are treating each group independently, so the estimates have lots of variance: counties with few observations get noisy, extreme intercepts.
Example 3: hierarchical model¶
= partial pooling model = varying intercepts model
Bayesian hierarchical model¶
\begin{align*}
Y_j \;\; &\sim \;\; \mathcal{N}(B_0 + B_j + B_x x, \, \Sigma_Y), \\
B_0 \;\; &\sim \;\; \mathcal{N}(\mu_{B_0},\sigma_{B_0}), \\
B_x \;\; &\sim \;\; \mathcal{N}(\mu_{B_x},\sigma_{B_x}), \\
B_j \;\; &\sim \;\; \mathcal{N}(0,\Sigma_{B_j}), \\
\Sigma_Y \;\; &\sim \;\; \mathcal{T}^+(\nu=4, \sigma=?), \\
\Sigma_{B_j} \;\; &\sim \;\; \mathcal{T}^+(\nu=4, \sigma=?).
\end{align*}
The partial pooling formula estimates per-county intercepts that are drawn from a common distribution, which is estimated jointly with the rest of the model parameters. The 1 in the formula is the intercept coefficient. The estimates across counties all share the same slope.
log_radon ~ 1 + (1|county) + floor
There is a middle ground between these two extremes. Specifically, we may assume that the intercepts are different for each county as in the unpooled case, but that they are drawn from the same distribution. The different counties effectively share information through the common prior.
NOTE: some counties have very few samples; the hierarchical model will provide "shrinkage" for these groups, using global information learned from all counties.
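To see how unbalanced the county groups are, we can count the number of observations in each (a quick check; value_counts sorts in descending order):
# number of radon measurements per county (most and least sampled)
counts = radon["county"].value_counts()
counts.head(3), counts.tail(3)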
radon["log_radon"].describe()
count    919.000000
mean       1.224623
std        0.853327
min       -2.302585
25%        0.641854
50%        1.280934
75%        1.791759
max        3.875359
Name: log_radon, dtype: float64
radon.groupby("floor")["log_radon"].describe()
| floor | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| basement | 766.0 | 1.326744 | 0.782709 | -2.302585 | 0.788457 | 1.360977 | 1.883253 | 3.875359 |
| ground | 153.0 | 0.713349 | 0.999376 | -2.302585 | 0.095310 | 0.741937 | 1.308333 | 3.234749 |
Bambi model¶
priors3 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"floor": bmb.Prior("Normal", mu=0, sigma=5),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
# "1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=2)),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
# "sigma": bmb.Prior("Exponential", lam=1),
# "sigma": bmb.Prior("HalfNormal", sigma=10), # from PyStan tutorial
# "sigma": bmb.Prior("Uniform", lower=0, upper=100), # from PyMC example
}
formula3 = "log_radon ~ 1 + (1|county) + floor"
#######################################################
mod3 = bmb.Model(formula=formula3,
family="gaussian",
link="identity",
priors=priors3,
data=radon,
noncentered=False)
mod3
Formula: log_radon ~ 1 + (1|county) + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
      floor ~ Normal(mu: 0.0, sigma: 5.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod3.build()
mod3.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example3_partial_pooling_mod3_graph")
# mod3.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata3 = mod3.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor, 1|county_sigma, 1|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Let's look at the group-level parameters for the selected counties.
idata3sel = idata3.sel(county__factor_dim=sel_counties)
az.summary(idata3sel, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| 1\|county[LAC QUI PARLE] | 0.415 | 0.297 | -0.141 | 0.974 |
| 1\|county[AITKIN] | -0.274 | 0.255 | -0.754 | 0.194 |
| 1\|county[KOOCHICHING] | -0.372 | 0.220 | -0.798 | 0.025 |
| 1\|county[DOUGLAS] | 0.166 | 0.207 | -0.223 | 0.554 |
| 1\|county[HENNEPIN] | -0.098 | 0.086 | -0.263 | 0.061 |
| 1\|county[STEARNS] | 0.020 | 0.145 | -0.257 | 0.287 |
| 1\|county[RAMSEY] | -0.258 | 0.137 | -0.499 | 0.012 |
| 1\|county[ST LOUIS] | -0.572 | 0.088 | -0.728 | -0.402 |
| 1\|county_sigma | 0.332 | 0.045 | 0.249 | 0.414 |
| Intercept | 1.462 | 0.054 | 1.361 | 1.565 |
| floor[ground] | -0.692 | 0.071 | -0.821 | -0.556 |
| sigma | 0.757 | 0.018 | 0.723 | 0.792 |
The intercept offsets for each county are:
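Here is a minimal sketch (assuming idata3 from above; offsets3 is a name introduced here) for extracting the posterior-mean intercept offsets:
# posterior means of the per-county intercept offsets B_j
offsets3 = idata3["posterior"]["1|county"].mean(dim=("chain", "draw"))
offsets3.to_series().head()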
# sum( idata3["posterior"]["1|county"].stack(sample=("chain","draw")).values.mean(axis=1) )
# az.plot_forest(idata3, combined=True, figsize=(7,2),
# var_names=["Intercept", "floor", "1|county_sigma", "sigma"]);
#######################################################
axs = az.plot_forest(idata3sel, var_names=["Intercept", "1|county", "floor", "1|county_sigma", "sigma"], combined=True, figsize=(6,3))
# axs = az.plot_forest(idata3sel, combined=True, figsize=(6,3))
axs[0].set_xticks(np.arange(-0.8,1.6,0.2))
axs[0].set_title(None);
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod3_sel_counties.pdf")
# savefigure(plt.gcf(), filename)
# az.plot_forest(idata3, var_names=["1|county"], combined=True);
Compare models¶
Compare complete pooling, no pooling, and partial pooling models
plot_counties(radon, idata_cp=idata1, idata_np=idata2, idata_pp=idata3);
# # FIGURES ONLY
# fig = plot_counties(radon, idata_cp=idata1, idata_np=idata2, idata_pp=idata3, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_all_models.pdf")
# savefigure(fig, filename)
Conclusions¶
Explanations¶
Prior selection for hierarchical models¶
?
Varying intercepts and slopes model¶
= group-specific slopes: we can also make $\beta_x$ group-specific
The varying-slope, varying-intercept model adds floor to the group-level coefficients, so the estimates across counties now have varying slopes.
log_radon ~ (1 + floor | county)
#######################################################
formula4 = "log_radon ~ (1 + floor | county)"
priors4 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"floor|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod4 = bmb.Model(formula=formula4,
family="gaussian",
link="identity",
priors=priors4,
data=radon,
noncentered=False)
mod4
Formula: log_radon ~ (1 + floor | county)
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
      floor|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod4.build()
mod4.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "varying_int_and_slopes_mod4_graph")
# mod4.graph(name=filename, fmt="png", dpi=300)
idata4 = mod4.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|county_sigma, 1|county, floor|county_sigma, floor|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
az.autocorr(idata4["posterior"]["sigma"].values.flatten())[0:10]
array([ 1. , -0.29695825, 0.09938457, 0.00268883, -0.00236847, -0.02257113, -0.001118 , -0.00457042, 0.00551639, -0.00293763])
az.summary(idata4)
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 1\|county[AITKIN] | -0.307 | 0.262 | -0.770 | 0.206 | 0.003 | 0.003 | 6503.0 | 3037.0 | 1.00 |
| 1\|county[ANOKA] | -0.470 | 0.112 | -0.690 | -0.269 | 0.002 | 0.001 | 5375.0 | 3360.0 | 1.00 |
| 1\|county[BECKER] | -0.049 | 0.291 | -0.594 | 0.495 | 0.003 | 0.005 | 7426.0 | 3003.0 | 1.00 |
| 1\|county[BELTRAMI] | 0.035 | 0.250 | -0.448 | 0.486 | 0.003 | 0.004 | 7061.0 | 3354.0 | 1.00 |
| 1\|county[BENTON] | -0.054 | 0.253 | -0.551 | 0.403 | 0.003 | 0.004 | 8452.0 | 3073.0 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| floor\|county[ground, WINONA] | -1.329 | 0.416 | -2.124 | -0.580 | 0.006 | 0.004 | 4827.0 | 3193.0 | 1.00 |
| floor\|county[ground, WRIGHT] | -0.347 | 0.533 | -1.337 | 0.635 | 0.006 | 0.007 | 6974.0 | 3217.0 | 1.00 |
| floor\|county[ground, YELLOW MEDICINE] | 0.012 | 0.721 | -1.381 | 1.315 | 0.008 | 0.013 | 8499.0 | 2808.0 | 1.00 |
| floor\|county_sigma[ground] | 0.727 | 0.108 | 0.523 | 0.923 | 0.003 | 0.002 | 1109.0 | 2154.0 | 1.01 |
| sigma | 0.748 | 0.019 | 0.712 | 0.783 | 0.000 | 0.000 | 6623.0 | 3058.0 | 1.00 |

174 rows × 9 columns
# idata4["posterior"]
idata4sel = idata4.sel(county__factor_dim=sel_counties)
var_names = ["Intercept",
"1|county",
"floor|county",
"1|county_sigma",
"floor|county_sigma",
"sigma"]
axs = az.plot_forest(idata4sel, combined=True, var_names=var_names, figsize=(6,4))
axs[0].set_xticks(np.arange(-1.6,1.6,0.2))
axs[0].set_title(None);
# FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod4_sel_counties.pdf")
# savefigure(plt.gcf(), filename)
Comparing models¶
idata1 = mod1.fit(idata_kwargs={"log_likelihood": True})
idata2 = mod2.fit(idata_kwargs={"log_likelihood": True})
idata3 = mod3.fit(idata_kwargs={"log_likelihood": True})
idata4 = mod4.fit(idata_kwargs={"log_likelihood": True})
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, county, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor, 1|county_sigma, 1|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|county_sigma, 1|county, floor|county_sigma, floor|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Compare models based on their expected log pointwise predictive density (ELPD).
The ELPD is estimated either by Pareto-smoothed importance sampling leave-one-out cross-validation (LOO) or by the widely applicable information criterion (WAIC). We recommend LOO. For the theory, see dx.doi.org/10.1111/1467-9868.00353, a paper by some of the leading authorities on model comparison.
radon_models = {
"mod1 (complete pooling)": idata1,
"mod2 (no pooling)": idata2,
"mod3 (varying intercepts)": idata3,
"mod4 (varying int. and slopes)": idata4,
}
compare_results = az.compare(radon_models)
compare_results
|  | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| mod3 (varying intercepts) | 0 | -1074.381588 | 49.901357 | 0.000000 | 0.717711 | 28.177042 | 0.000000 | False | log |
| mod4 (varying int. and slopes) | 1 | -1088.868131 | 88.212135 | 14.486543 | 0.204327 | 30.029017 | 7.289633 | True | log |
| mod2 (no pooling) | 2 | -1099.123664 | 86.173617 | 24.742076 | 0.000000 | 28.134669 | 6.330524 | True | log |
| mod1 (complete pooling) | 3 | -1126.880651 | 3.742634 | 52.499063 | 0.077962 | 25.562509 | 10.647410 | False | log |
az.plot_compare(compare_results);
# # FIGURES ONLY
# ax = az.plot_compare(compare_results, figsize=(9,3))
# ax.set_title(None)
# filename = os.path.join(DESTDIR, "model_comparison_mod1_mod2_mod3_mod4.pdf")
# savefigure(ax, filename)
ELPD and elpd_loo¶
The ELPD is the theoretical expected log pointwise predictive density for a new dataset (Eq. 1 in VGG2017), which can be estimated, e.g., using cross-validation. elpd_loo is the Bayesian LOO estimate of the expected log pointwise predictive density (Eq. 4 in VGG2017), computed as a sum of N individual pointwise log predictive densities. Probability densities can be smaller or larger than 1, so log predictive densities can be negative or positive. For simplicity, the ELPD acronym is also used for expected log pointwise predictive probabilities in discrete models. Probabilities are always less than or equal to 1, so log predictive probabilities are 0 or negative.
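To inspect the LOO estimate for a single model (a sketch, assuming idata3 was refit with log_likelihood=True as above):
# ELPD-LOO estimate and Pareto-k diagnostic for the varying intercepts model
az.loo(idata3)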
Frequentist multilevel models¶
We can use statsmodels to fit multilevel models too.
Varying intercepts model using statsmodels¶
import statsmodels.formula.api as smf
sm3 = smf.mixedlm("log_radon ~ 0 + floor",  # fixed effects: no common intercept, floor as a fixed effect
                  groups="county",          # grouping variable for random effects
                  re_formula="1",           # random effects: intercept only
                  data=radon)
res3 = sm3.fit()
res3.summary().tables[1]
|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| floor[basement] | 1.462 | 0.052 | 28.164 | 0.000 | 1.360 | 1.563 |
| floor[ground] | 0.769 | 0.074 | 10.339 | 0.000 | 0.623 | 0.914 |
| county Var | 0.108 | 0.041 | | | | |
# slope
res3.params["floor[ground]"] - res3.params["floor[basement]"]
-0.6929937406558043
# sigma-hat
np.sqrt(res3.scale)
0.7555891484188184
# standard deviation of the variability among county-specific Intercepts
np.sqrt(res3.cov_re.values)
array([[0.32822242]])
# intercept offset for the first county (AITKIN) in res3
res3.random_effects["AITKIN"].values
array([-0.27009756])
This is very close to the posterior mean of the random-effects coefficient for AITKIN in the Bayesian model mod3, which was $-0.274$.
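As a sketch (freq_offsets and bayes_offsets are names introduced here), we can extend this comparison to all counties at once:
# maximum absolute difference between statsmodels and Bambi county offsets
freq_offsets = pd.Series({name: eff.values[0] for name, eff in res3.random_effects.items()})
bayes_offsets = idata3["posterior"]["1|county"].mean(dim=("chain", "draw")).to_series()
(freq_offsets - bayes_offsets).abs().max()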
Varying intercepts and slopes model using statsmodels¶
sm4 = smf.mixedlm("log_radon ~ 0 + floor",  # fixed effects: no common intercept, floor as a fixed effect
                  groups="county",          # grouping variable for random effects
                  re_formula="1 + floor",   # random effects: intercept and floor slope
                  data=radon)
res4 = sm4.fit()
res4.summary().tables[1]
|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| floor[basement] | 1.463 | 0.054 | 26.977 | 0.000 | 1.356 | 1.569 |
| floor[ground] | 0.782 | 0.085 | 9.208 | 0.000 | 0.615 | 0.948 |
| county Var | 0.122 | 0.049 | | | | |
| county x floor[T.ground] Cov | -0.040 | 0.057 | | | | |
| floor[T.ground] Var | 0.118 | 0.120 | | | | |
# slope estimate
res4.params["floor[ground]"] - res4.params["floor[basement]"]
-0.6810977572101944
# sigma-hat
np.sqrt(res4.scale)
0.7461559982563548
# standard deviation of the variability among county-specific Intercepts
county_var_int = res4.cov_re.loc["county", "county"]
np.sqrt(county_var_int)
0.3486722110725776
# standard deviation of the variability among county-specific slopes
county_var_slopes = res4.cov_re.loc["floor[T.ground]", "floor[T.ground]"]
np.sqrt(county_var_slopes)
0.3435539748320278
# correlation between Intercept and slope group-level coefficients
county_floor_cov = res4.cov_re.loc["county", "floor[T.ground]"]
county_floor_cov / (np.sqrt(county_var_int)*np.sqrt(county_var_slopes))
-0.3372303607779023
Discussion¶
Alternative notations for hierarchical models¶
- IMPORT FROM Gelman & Hill Section 12.5 printout
- watch the subscripts!
Computational challenges associated with hierarchical models¶
- centered vs. noncentered representations (see the sketch below)
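A minimal PyMC sketch of the two parameterizations (a hypothetical standalone example; the Bambi models above select between them via the noncentered keyword):
import pymc as pm

n_counties = 85  # number of groups in the radon dataset

# centered: group offsets are drawn directly from the population distribution;
# NUTS can struggle with the funnel-shaped geometry when sigma_b is small
with pm.Model() as centered:
    sigma_b = pm.Exponential("sigma_b", 1)
    b = pm.Normal("b", mu=0, sigma=sigma_b, shape=n_counties)

# noncentered: sample standardized offsets, then rescale by sigma_b;
# same model, but with geometry that is usually easier to sample
with pm.Model() as noncentered:
    sigma_b = pm.Exponential("sigma_b", 1)
    z = pm.Normal("z", mu=0, sigma=1, shape=n_counties)
    b = pm.Deterministic("b", z * sigma_b)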
Benefits of multilevel models¶
- TODO LIST
- Better than repeated-measures ANOVA because:
- tells you the direction and magnitude of the effect
- can handle multiple grouping scenarios (e.g., by-item and by-student)
- works for categorical predictors
Applications¶
- The need for hierarchical models often arises in the social sciences (where they are a better choice than ANOVA)
- Hierarchical models are often used for Bayesian meta-analysis
Exercises¶
Exercise: mod1u¶
Same model as Example 3, but also include the predictor log_uranium.
import bambi as bmb
covariate_priors = {
"floor": bmb.Prior("Normal", mu=0, sigma=10),
"log_uranium": bmb.Prior("Normal", mu=0, sigma=10),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"sigma": bmb.Prior("Exponential", lam=1),
}
mod1u = bmb.Model(formula="log_radon ~ 1 + floor + (1|county) + log_uranium",
priors=covariate_priors,
data=radon)
mod1u
Formula: log_radon ~ 1 + floor + (1|county) + log_uranium
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.2246, sigma: 2.1322)
      floor ~ Normal(mu: 0.0, sigma: 10.0)
      log_uranium ~ Normal(mu: 0.0, sigma: 10.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ Exponential(lam: 1.0)
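To complete the exercise, fit the model and inspect the estimates (a sketch; idata1u is a name introduced here):
idata1u = mod1u.fit()
az.summary(idata1u, var_names=["Intercept", "floor", "log_uranium", "1|county_sigma", "sigma"])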
Exercise: pigs dataset¶
https://bambinos.github.io/bambi/notebooks/multi-level_regression.html
import statsmodels.api as sm
dietox = sm.datasets.get_rdataset("dietox", "geepack").data
dietox
|  | Pig | Evit | Cu | Litter | Start | Weight | Feed | Time |
|---|---|---|---|---|---|---|---|---|
| 0 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 26.50000 | NaN | 1 |
| 1 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 27.59999 | 5.200005 | 2 |
| 2 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 36.50000 | 17.600000 | 3 |
| 3 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 40.29999 | 28.500000 | 4 |
| 4 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 49.09998 | 45.200001 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 856 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 73.19995 | 83.800003 | 8 |
| 857 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 81.69995 | 99.800003 | 9 |
| 858 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 90.29999 | 115.200001 | 10 |
| 859 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 96.00000 | 133.200001 | 11 |
| 860 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 103.50000 | 151.400002 | 12 |

861 rows × 8 columns
pigsmodel = bmb.Model("Weight ~ Time + (Time|Pig)", dietox)
pigsidata = pigsmodel.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, Time, 1|Pig_sigma, 1|Pig_offset, Time|Pig_sigma, Time|Pig_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 8 seconds.
az.summary(pigsidata, var_names=["Intercept", "Time", "1|Pig_sigma", "Time|Pig_sigma", "sigma"])
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 15.728 | 0.567 | 14.697 | 16.821 | 0.021 | 0.015 | 720.0 | 1307.0 | 1.00 |
| Time | 6.943 | 0.083 | 6.779 | 7.085 | 0.003 | 0.002 | 613.0 | 1150.0 | 1.01 |
| 1\|Pig_sigma | 4.547 | 0.435 | 3.755 | 5.344 | 0.012 | 0.009 | 1237.0 | 2082.0 | 1.00 |
| Time\|Pig_sigma | 0.663 | 0.061 | 0.552 | 0.776 | 0.002 | 0.001 | 1025.0 | 1834.0 | 1.00 |
| sigma | 2.458 | 0.064 | 2.332 | 2.571 | 0.001 | 0.001 | 5082.0 | 3282.0 | 1.00 |
Exercise: educational data¶
cf. https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html
We will be analyzing the Gcsemv dataset (Rasbash et al. 2000) from the mlmRev package in R. The data include the General Certificate of Secondary Education (GCSE) exam scores of 1,905 students from 73 schools in England on a science subject. The Gcsemv dataset consists of the following five variables:
- school: school identifier
- student: student identifier
- gender: gender of a student (M: Male, F: Female)
- written: total score on written paper
- course: total score on coursework paper
# the dataset was converted from the R package mlmRev using pyreadr:
# import pyreadr
# Gcsemv_r = pyreadr.read_r('/Users/ivan/Downloads/mlmRev/data/Gcsemv.rda')
# Gcsemv_r["Gcsemv"].dropna().to_csv("../datasets/gcsemv.csv", index=False)
gcsemv = pd.read_csv("../datasets/gcsemv.csv")
gcsemv.head()
|  | school | student | gender | written | course |
|---|---|---|---|---|---|
| 0 | 20920 | 27 | F | 39.0 | 76.8 |
| 1 | 20920 | 31 | F | 36.0 | 87.9 |
| 2 | 20920 | 42 | M | 16.0 | 44.4 |
| 3 | 20920 | 101 | F | 49.0 | 89.8 |
| 4 | 20920 | 113 | M | 25.0 | 17.5 |
import bambi as bmb
m1 = bmb.Model(formula="course ~ 1 + (1 | school)", data=gcsemv)
m1
Formula: course ~ 1 + (1 | school)
Family: gaussian
Link: mu = identity
Observations: 1523
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 73.3814, sigma: 41.0781)
    Group-level effects
      1|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 41.0781))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 16.4312)
idata_m1 = m1.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|school_sigma, 1|school_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
az.summary(idata_m1)
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 1\|school[20920] | -6.920 | 5.088 | -16.830 | 2.427 | 0.067 | 0.059 | 5673.0 | 3212.0 | 1.00 |
| 1\|school[22520] | -15.667 | 2.147 | -19.559 | -11.496 | 0.045 | 0.032 | 2267.0 | 2796.0 | 1.00 |
| 1\|school[22710] | 6.356 | 3.648 | -0.614 | 13.049 | 0.053 | 0.042 | 4791.0 | 3064.0 | 1.00 |
| 1\|school[22738] | -0.821 | 4.430 | -9.283 | 7.546 | 0.061 | 0.081 | 5328.0 | 2723.0 | 1.00 |
| 1\|school[22908] | -0.756 | 5.915 | -11.248 | 10.665 | 0.077 | 0.096 | 5847.0 | 2911.0 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1\|school[84707] | 3.857 | 7.661 | -10.353 | 17.952 | 0.109 | 0.118 | 4942.0 | 2715.0 | 1.00 |
| 1\|school[84772] | 8.591 | 3.623 | 1.976 | 15.587 | 0.056 | 0.043 | 4238.0 | 2806.0 | 1.00 |
| 1\|school_sigma | 8.750 | 0.908 | 7.075 | 10.443 | 0.031 | 0.022 | 888.0 | 1132.0 | 1.01 |
| Intercept | 73.819 | 1.134 | 71.736 | 76.002 | 0.039 | 0.028 | 834.0 | 1452.0 | 1.01 |
| sigma | 13.985 | 0.262 | 13.499 | 14.444 | 0.004 | 0.003 | 5221.0 | 2428.0 | 1.00 |

76 rows × 9 columns
m3 = bmb.Model(formula="course ~ gender + (1 + gender|school)", data=gcsemv)
m3
Formula: course ~ gender + (1 + gender|school)
Family: gaussian
Link: mu = identity
Observations: 1523
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 73.3814, sigma: 53.4663)
      gender ~ Normal(mu: 0.0, sigma: 83.5292)
    Group-level effects
      1|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 53.4663))
      gender|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 83.5292))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 16.4312)
idata_m3 = m3.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, gender, 1|school_sigma, 1|school_offset, gender|school_sigma, gender|school_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.
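To inspect the fit of m3 (a sketch, following the same summary pattern as above):
az.summary(idata_m3, var_names=["Intercept", "gender", "1|school_sigma", "gender|school_sigma", "sigma"])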
Exercise: sleepstudy dataset¶
- Description: Contains reaction times of subjects under sleep deprivation conditions.
- Source: Featured in the R package lme4.
- Application: Demonstrates linear mixed-effects modeling with random slopes and intercepts.
Links:
sleepstudy = bmb.load_data("sleepstudy")
sleepstudy
|  | Reaction | Days | Subject |
|---|---|---|---|
| 0 | 249.5600 | 0 | 308 |
| 1 | 258.7047 | 1 | 308 |
| 2 | 250.8006 | 2 | 308 |
| 3 | 321.4398 | 3 | 308 |
| 4 | 356.8519 | 4 | 308 |
| ... | ... | ... | ... |
| 175 | 329.6076 | 5 | 372 |
| 176 | 334.4818 | 6 | 372 |
| 177 | 343.2199 | 7 | 372 |
| 178 | 369.1417 | 8 | 372 |
| 179 | 364.1236 | 9 | 372 |

180 rows × 3 columns
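As a starting point for the exercise, here is a sketch of the classic varying intercepts and slopes specification (sleepmodel and sleepidata are names introduced here):
# varying intercepts and varying slopes for each subject
sleepmodel = bmb.Model("Reaction ~ 1 + Days + (1 + Days | Subject)", sleepstudy)
sleepidata = sleepmodel.fit()
az.summary(sleepidata, var_names=["Intercept", "Days", "1|Subject_sigma", "Days|Subject_sigma", "sigma"])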
Links¶
EXTRA MATERIAL¶