Section 5.5 — Hierarchical models¶
This notebook contains the code examples from Section 5.5 Hierarchical models from the No Bullshit Guide to Statistics.
See also:
- 03_hierarchical_model.ipynb
- cs109b_lect13_bayes_2_2021.ipynb
- https://github.com/fonnesbeck/pymc_sdss_2024/blob/main/notebooks/Section4-Hierarchical_Models.ipynb
- https://mc-stan.org/users/documentation/case-studies/radon_cmdstanpy_plotnine.html#data-prep
- https://github.com/mitzimorris/brms_feb_28_2023/blob/main/brms_notebook.Rmd
- https://www.pymc.io/projects/examples/en/latest/generalized_linear_models/multilevel_modeling.html
Notebook setup¶
# load Python modules
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Figures setup
plt.clf()  # needed, otherwise `sns.set_theme` doesn't work
from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (5, 3)}) # good for screen
# RCPARAMS.update({"figure.figsize": (5, 1.6)}) # good for print
sns.set_theme(
context="paper",
style="whitegrid",
palette="colorblind",
rc=RCPARAMS,
)
# High-resolution please
%config InlineBackend.figure_format = "retina"
# Where to store figures
from ministats.utils import savefigure
DESTDIR = "figures/bayes/hierarchical"
# set random seed for repeatability
np.random.seed(42)
#######################################################
# silence statsmodels kurtosistest warning when using n < 20
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)
Radon dataset¶
https://bambinos.github.io/bambi/notebooks/radon_example.html
- Description: Contains measurements of radon levels in homes across various counties.
- Source: Featured in Andrew Gelman and Jennifer Hill's book Data Analysis Using Regression and Multilevel/Hierarchical Models.
- Application: Demonstrates partial pooling and varying intercepts/slopes in hierarchical modeling.
Loading the data¶
radon = pd.read_csv("../datasets/radon.csv")
radon.shape
(919, 6)
radon.head()
|  | idnum | state | county | floor | log_radon | log_uranium |
|---|---|---|---|---|---|---|
| 0 | 5081 | MN | AITKIN | ground | 0.788457 | -0.689048 |
| 1 | 5082 | MN | AITKIN | basement | 0.788457 | -0.689048 |
| 2 | 5083 | MN | AITKIN | basement | 1.064711 | -0.689048 |
| 3 | 5084 | MN | AITKIN | basement | 0.000000 | -0.689048 |
| 4 | 5085 | MN | ANOKA | basement | 1.131402 | -0.847313 |
len(radon["county"].unique())
85
radon[["log_radon", "log_uranium"]].describe().T.round(2)
|  | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| log_radon | 919.0 | 1.22 | 0.85 | -2.30 | 0.64 | 1.28 | 1.79 | 3.88 |
| log_uranium | 919.0 | -0.13 | 0.37 | -0.88 | -0.47 | -0.10 | 0.18 | 0.53 |
#######################################################
from ministats.book.figures import plot_counties
sel_counties = [
"LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
"HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
]
plot_counties(radon, counties=sel_counties);
# # FIGURES ONLY
# sel_counties = [
# "LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
# "HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
# ]
# fig = plot_counties(radon, counties=sel_counties, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_scatter_only.pdf")
# savefigure(fig, filename)
Example 1: complete pooling model¶
= common linear regression model for all counties
Bayesian model¶
We can pool all the data and estimate one big regression to assess the influence of the floor variable on radon levels across all counties.
$$ Y = \beta_0 + \beta_x \cdot x. $$
The variable $x$ corresponds to the floor column, with $0$ representing basement and $1$ representing ground floor. By ignoring the county feature, we do not differentiate between counties.
Bambi model¶
import bambi as bmb
priors1 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"floor": bmb.Prior("Normal", mu=0, sigma=5),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod1 = bmb.Model(formula="log_radon ~ 1 + floor",
family="gaussian",
link="identity",
priors=priors1,
data=radon)
mod1
Formula: log_radon ~ 1 + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
      floor ~ Normal(mu: 0.0, sigma: 5.0)
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod1.build()
mod1.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_complete_pooling_mod1_graph")
# mod1.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata1 = mod1.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
import arviz as az
az.summary(idata1, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| Intercept | 1.326 | 0.029 | 1.270 | 1.380 |
| floor[ground] | -0.614 | 0.071 | -0.748 | -0.485 |
| sigma | 0.823 | 0.019 | 0.789 | 0.860 |
az.plot_posterior(idata1);
# # FIGURES ONLY
# az.plot_posterior(idata1, round_to=2, figsize=(6,1.8));
# filename = os.path.join(DESTDIR, "example1_posterior.pdf")
# savefigure(plt.gcf(), filename)
fig, axs = bmb.interpret.plot_predictions(mod1, idata1, conditional="floor")
means1 = az.summary(idata1, kind="stats")["mean"]
y0 = means1["Intercept"]
y1 = means1["Intercept"] + means1["floor[ground]"]
sns.lineplot(x=[0,1], y=[y0,y1], ax=axs[0]);
midpoint = [0.5, (y0+y1)/2 + 0.03]
slope = means1["floor[ground]"].round(3)
axs[0].annotate("$\\mu_{B_{f}}=%.3f$" % slope, midpoint);
Default computed for conditional variable: floor
# # FIGURES ONLY
# fig, axs = bmb.interpret.plot_predictions(mod1, idata1, conditional="floor",
# fig_kwargs={"figsize":(3,2)})
# means1 = az.summary(idata1, kind="stats")["mean"]
# y0 = means1["Intercept"]
# y1 = means1["Intercept"] + means1["floor[ground]"]
# sns.lineplot(x=[0,1], y=[y0,y1], ax=axs[0]);
# midpoint = [0.5, (y0+y1)/2 + 0.03]
# slope = means1["floor[ground]"].round(3)
# axs[0].annotate("$\\mu_{B_{f}}=%.3f$" % slope, midpoint);
# filename = os.path.join(DESTDIR, "example1_basement_ground_slope.pdf")
# savefigure(plt.gcf(), filename)
Conclusion¶
We are not using group membership, so the model has lots of bias: it ignores real differences between the counties.
Example 2: no pooling model¶
= separate intercept for each county
Bayesian model¶
Alternatively, we can treat the counties as independent, so that each county $j$ gets its own intercept term:
$$ Y_j = \beta_{0j} + \beta_{x} x. $$
Bambi model¶
priors2 = {
"county": bmb.Prior("Normal", mu=0, sigma=10),
"floor": bmb.Prior("Normal", mu=0, sigma=30),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod2 = bmb.Model("log_radon ~ 0 + county + floor",
family="gaussian",
link="identity",
priors=priors2,
data=radon)
mod2
Formula: log_radon ~ 0 + county + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      county ~ Normal(mu: 0.0, sigma: 10.0)
      floor ~ Normal(mu: 0.0, sigma: 30.0)
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod2.build()
mod2.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example2_no_pooling_mod2_graph")
# mod2.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata2 = mod2.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, county, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
idata2sel = idata2.sel(county_dim=sel_counties)
az.summary(idata2sel, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| county[LAC QUI PARLE] | 2.952 | 0.536 | 1.968 | 3.946 |
| county[AITKIN] | 0.843 | 0.381 | 0.098 | 1.530 |
| county[KOOCHICHING] | 0.822 | 0.288 | 0.263 | 1.339 |
| county[DOUGLAS] | 1.731 | 0.256 | 1.217 | 2.173 |
| county[HENNEPIN] | 1.360 | 0.076 | 1.219 | 1.504 |
| county[STEARNS] | 1.493 | 0.146 | 1.216 | 1.760 |
| county[RAMSEY] | 1.156 | 0.132 | 0.906 | 1.400 |
| county[ST LOUIS] | 0.867 | 0.072 | 0.723 | 0.999 |
| floor[ground] | -0.718 | 0.073 | -0.854 | -0.579 |
| sigma | 0.757 | 0.019 | 0.722 | 0.793 |
axs = az.plot_forest(idata2sel, combined=True, figsize=(6,2.6))
axs[0].set_xticks(np.arange(-0.5,4.6,0.5))
axs[0].set_title(None);
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod2_sel_counties.pdf")
# savefigure(axs[0], filename)
plot_counties(radon, idata_cp=idata1, idata_np=idata2);
# # FIGURES ONLY
# sel_counties = [
# "LAC QUI PARLE", "AITKIN", "KOOCHICHING", "DOUGLAS",
# "HENNEPIN", "STEARNS", "RAMSEY", "ST LOUIS",
# ]
# fig = plot_counties(radon, idata_cp=idata1, idata_np=idata2,
# counties=sel_counties, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_complete_and_no_pooling.pdf")
# savefigure(fig, filename)
# fig, axs = bmb.interpret.plot_predictions(mod2, idata2, ["floor", "county"]);
# axs[0].get_legend().remove()
# post2 = idata2["posterior"]
# unpooled_means = post2.mean(dim=("chain", "draw"))
# unpooled_hdi = az.hdi(idata2)
# unpooled_means_iter = unpooled_means.sortby("county")
# unpooled_hdi_iter = unpooled_hdi.sortby(unpooled_means_iter.county)
# _, ax = plt.subplots(figsize=(12, 5))
# unpooled_means_iter.plot.scatter(x="county_dim", y="county", ax=ax, alpha=0.9)
# ax.vlines(
# np.arange(len(radon["county"].unique())),
# unpooled_hdi_iter.county.sel(hdi="lower"),
# unpooled_hdi_iter.county.sel(hdi="higher"),
# color="orange",
# alpha=0.6,
# )
# ax.set(ylabel="Radon estimate", ylim=(-2, 4.5))
# ax.tick_params(rotation=90);
Conclusion¶
We are treating each group independently, so the estimates have lots of variance: counties with few observations get noisy, extreme intercepts.
Example 3: hierarchical model¶
= partial pooling model = varying intercepts model
Bayesian hierarchical model¶
\begin{align*}
Y_j \;\; &\sim \;\; \mathcal{N}(B_0 + B_j + B_x x, \, \Sigma_Y), \\
B_0 \;\; &\sim \;\; \mathcal{N}(\mu_{B_0},\sigma_{B_0}), \\
B_x \;\; &\sim \;\; \mathcal{N}(\mu_{B_x},\sigma_{B_x}), \\
B_j \;\; &\sim \;\; \mathcal{N}(0,\Sigma_{B_j}), \\
\Sigma_Y \;\; &\sim \;\; \mathcal{T}^+(\nu=4, \sigma=?), \\
\Sigma_{B_j} \;\; &\sim \;\; \mathcal{T}^+(\nu=4, \sigma=?).
\end{align*}
The partial pooling formula estimates per-county intercepts that are drawn from a common distribution, which is estimated jointly with the rest of the model parameters. The 1 in the formula is the intercept coefficient. The estimates across counties all share the same slope.
log_radon ~ 1 + (1|county) + floor
There is a middle ground between these two extremes. Specifically, we may assume that the intercepts are different for each county as in the unpooled case, but that they are drawn from the same distribution. The different counties effectively share information through the common prior.
NOTE: some counties have very few samples; the hierarchical model will provide "shrinkage" for these groups, using global information learned from all counties.
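To see how unbalanced the county groups are, we can count the number of observations in each (a quick check; value_counts sorts in descending order):
# number of radon measurements per county (most and least sampled)
counts = radon["county"].value_counts()
counts.head(3), counts.tail(3)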
radon["log_radon"].describe()
count    919.000000
mean       1.224623
std        0.853327
min       -2.302585
25%        0.641854
50%        1.280934
75%        1.791759
max        3.875359
Name: log_radon, dtype: float64
radon.groupby("floor")["log_radon"].describe()
| floor | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| basement | 766.0 | 1.326744 | 0.782709 | -2.302585 | 0.788457 | 1.360977 | 1.883253 | 3.875359 |
| ground | 153.0 | 0.713349 | 0.999376 | -2.302585 | 0.095310 | 0.741937 | 1.308333 | 3.234749 |
Bambi model¶
priors3 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"floor": bmb.Prior("Normal", mu=0, sigma=5),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
# "1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("HalfNormal", sigma=2)),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
# "sigma": bmb.Prior("Exponential", lam=1),
# "sigma": bmb.Prior("HalfNormal", sigma=10), # from PyStan tutorial
# "sigma": bmb.Prior("Uniform", lower=0, upper=100), # from PyMC example
}
formula3 = "log_radon ~ 1 + (1|county) + floor"
#######################################################
mod3 = bmb.Model(formula=formula3,
family="gaussian",
link="identity",
priors=priors3,
data=radon,
noncentered=False)
mod3
Formula: log_radon ~ 1 + (1|county) + floor
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
      floor ~ Normal(mu: 0.0, sigma: 5.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod3.build()
mod3.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example3_partial_pooling_mod3_graph")
# mod3.graph(name=filename, fmt="png", dpi=300)
Model fitting and analysis¶
idata3 = mod3.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor, 1|county_sigma, 1|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Let's look at the group-level parameters for the selected counties.
idata3sel = idata3.sel(county__factor_dim=sel_counties)
az.summary(idata3sel, kind="stats")
|  | mean | sd | hdi_3% | hdi_97% |
|---|---|---|---|---|
| 1\|county[LAC QUI PARLE] | 0.415 | 0.297 | -0.141 | 0.974 |
| 1\|county[AITKIN] | -0.274 | 0.255 | -0.754 | 0.194 |
| 1\|county[KOOCHICHING] | -0.372 | 0.220 | -0.798 | 0.025 |
| 1\|county[DOUGLAS] | 0.166 | 0.207 | -0.223 | 0.554 |
| 1\|county[HENNEPIN] | -0.098 | 0.086 | -0.263 | 0.061 |
| 1\|county[STEARNS] | 0.020 | 0.145 | -0.257 | 0.287 |
| 1\|county[RAMSEY] | -0.258 | 0.137 | -0.499 | 0.012 |
| 1\|county[ST LOUIS] | -0.572 | 0.088 | -0.728 | -0.402 |
| 1\|county_sigma | 0.332 | 0.045 | 0.249 | 0.414 |
| Intercept | 1.462 | 0.054 | 1.361 | 1.565 |
| floor[ground] | -0.692 | 0.071 | -0.821 | -0.556 |
| sigma | 0.757 | 0.018 | 0.723 | 0.792 |
The intercept offsets for each county are:
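Here is a minimal sketch (assuming idata3 from above; offsets3 is a name introduced here) for extracting the posterior-mean intercept offsets:
# posterior means of the per-county intercept offsets B_j
offsets3 = idata3["posterior"]["1|county"].mean(dim=("chain", "draw"))
offsets3.to_series().head()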
# sum( idata3["posterior"]["1|county"].stack(sample=("chain","draw")).values.mean(axis=1) )
# az.plot_forest(idata3, combined=True, figsize=(7,2),
# var_names=["Intercept", "floor", "1|county_sigma", "sigma"]);
#######################################################
axs = az.plot_forest(idata3sel, var_names=["Intercept", "1|county", "floor", "1|county_sigma", "sigma"], combined=True, figsize=(6,3))
# axs = az.plot_forest(idata3sel, combined=True, figsize=(6,3))
axs[0].set_xticks(np.arange(-0.8,1.6,0.2))
axs[0].set_title(None);
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod3_sel_counties.pdf")
# savefigure(plt.gcf(), filename)
# az.plot_forest(idata3, var_names=["1|county"], combined=True);
Compare models¶
Compare complete pooling, no pooling, and partial pooling models
plot_counties(radon, idata_cp=idata1, idata_np=idata2, idata_pp=idata3);
# # FIGURES ONLY
# fig = plot_counties(radon, idata_cp=idata1, idata_np=idata2, idata_pp=idata3, figsize=(7,3.2))
# filename = os.path.join(DESTDIR, "sel_counties_all_models.pdf")
# savefigure(fig, filename)
Conclusions¶
Explanations¶
Prior selection for hierarchical models¶
?
Varying intercepts and slopes model¶
= group-specific slopes: we can also make $\beta_x$ group-specific
The varying-slope, varying-intercept model adds floor to the group-level coefficients, so the estimates across counties now have varying slopes.
log_radon ~ (1 + floor | county)
#######################################################
formula4 = "log_radon ~ (1 + floor | county)"
priors4 = {
"Intercept": bmb.Prior("Normal", mu=1, sigma=2),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"floor|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"sigma": bmb.Prior("HalfStudentT", nu=4, sigma=1),
}
mod4 = bmb.Model(formula=formula4,
family="gaussian",
link="identity",
priors=priors4,
data=radon,
noncentered=False)
mod4
Formula: log_radon ~ (1 + floor | county)
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.0, sigma: 2.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
      floor|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 1.0)
mod4.build()
mod4.graph()
# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "varying_int_and_slopes_mod4_graph")
# mod4.graph(name=filename, fmt="png", dpi=300)
idata4 = mod4.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|county_sigma, 1|county, floor|county_sigma, floor|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.
az.autocorr(idata4["posterior"]["sigma"].values.flatten())[0:10]
array([ 1. , -0.29695825, 0.09938457, 0.00268883, -0.00236847, -0.02257113, -0.001118 , -0.00457042, 0.00551639, -0.00293763])
az.summary(idata4)
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 1\|county[AITKIN] | -0.307 | 0.262 | -0.770 | 0.206 | 0.003 | 0.003 | 6503.0 | 3037.0 | 1.00 |
| 1\|county[ANOKA] | -0.470 | 0.112 | -0.690 | -0.269 | 0.002 | 0.001 | 5375.0 | 3360.0 | 1.00 |
| 1\|county[BECKER] | -0.049 | 0.291 | -0.594 | 0.495 | 0.003 | 0.005 | 7426.0 | 3003.0 | 1.00 |
| 1\|county[BELTRAMI] | 0.035 | 0.250 | -0.448 | 0.486 | 0.003 | 0.004 | 7061.0 | 3354.0 | 1.00 |
| 1\|county[BENTON] | -0.054 | 0.253 | -0.551 | 0.403 | 0.003 | 0.004 | 8452.0 | 3073.0 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| floor\|county[ground, WINONA] | -1.329 | 0.416 | -2.124 | -0.580 | 0.006 | 0.004 | 4827.0 | 3193.0 | 1.00 |
| floor\|county[ground, WRIGHT] | -0.347 | 0.533 | -1.337 | 0.635 | 0.006 | 0.007 | 6974.0 | 3217.0 | 1.00 |
| floor\|county[ground, YELLOW MEDICINE] | 0.012 | 0.721 | -1.381 | 1.315 | 0.008 | 0.013 | 8499.0 | 2808.0 | 1.00 |
| floor\|county_sigma[ground] | 0.727 | 0.108 | 0.523 | 0.923 | 0.003 | 0.002 | 1109.0 | 2154.0 | 1.01 |
| sigma | 0.748 | 0.019 | 0.712 | 0.783 | 0.000 | 0.000 | 6623.0 | 3058.0 | 1.00 |

174 rows × 9 columns
# idata4["posterior"]
idata4sel = idata4.sel(county__factor_dim=sel_counties)
var_names = ["Intercept",
"1|county",
"floor|county",
"1|county_sigma",
"floor|county_sigma",
"sigma"]
axs = az.plot_forest(idata4sel, combined=True, var_names=var_names, figsize=(6,4))
axs[0].set_xticks(np.arange(-1.6,1.6,0.2))
axs[0].set_title(None);
# FIGURES ONLY
# filename = os.path.join(DESTDIR, "forest_plot_mod4_sel_counties.pdf")
# savefigure(plt.gcf(), filename)
Comparing models¶
idata1 = mod1.fit(idata_kwargs={"log_likelihood": True})
idata2 = mod2.fit(idata_kwargs={"log_likelihood": True})
idata3 = mod3.fit(idata_kwargs={"log_likelihood": True})
idata4 = mod4.fit(idata_kwargs={"log_likelihood": True})
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, county, floor]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, floor, 1|county_sigma, 1|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|county_sigma, 1|county, floor|county_sigma, floor|county]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4 seconds.
Compare models based on their expected log pointwise predictive density (ELPD).
The ELPD is estimated either by Pareto-smoothed importance sampling leave-one-out cross-validation (LOO) or by the widely applicable information criterion (WAIC). We recommend LOO. For the theory, see dx.doi.org/10.1111/1467-9868.00353, a paper by some of the leading authorities on model comparison.
radon_models = {
"mod1 (complete pooling)": idata1,
"mod2 (no pooling)": idata2,
"mod3 (varying intercepts)": idata3,
"mod4 (varying int. and slopes)": idata4,
}
compare_results = az.compare(radon_models)
compare_results
|  | rank | elpd_loo | p_loo | elpd_diff | weight | se | dse | warning | scale |
|---|---|---|---|---|---|---|---|---|---|
| mod3 (varying intercepts) | 0 | -1074.381588 | 49.901357 | 0.000000 | 0.717711 | 28.177042 | 0.000000 | False | log |
| mod4 (varying int. and slopes) | 1 | -1088.868131 | 88.212135 | 14.486543 | 0.204327 | 30.029017 | 7.289633 | True | log |
| mod2 (no pooling) | 2 | -1099.123664 | 86.173617 | 24.742076 | 0.000000 | 28.134669 | 6.330524 | True | log |
| mod1 (complete pooling) | 3 | -1126.880651 | 3.742634 | 52.499063 | 0.077962 | 25.562509 | 10.647410 | False | log |
az.plot_compare(compare_results);
# # FIGURES ONLY
# ax = az.plot_compare(compare_results, figsize=(9,3))
# ax.set_title(None)
# filename = os.path.join(DESTDIR, "model_comparison_mod1_mod2_mod3_mod4.pdf")
# savefigure(ax, filename)
ELPD and elpd_loo¶
The ELPD is the theoretical expected log pointwise predictive density for a new dataset (Eq. 1 in VGG2017), which can be estimated, e.g., using cross-validation. elpd_loo is the Bayesian LOO estimate of the expected log pointwise predictive density (Eq. 4 in VGG2017), computed as a sum of N individual pointwise log predictive densities. Probability densities can be smaller or larger than 1, so log predictive densities can be negative or positive. For simplicity, the ELPD acronym is also used for expected log pointwise predictive probabilities in discrete models. Probabilities are always less than or equal to 1, so log predictive probabilities are 0 or negative.
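To inspect the LOO estimate for a single model (a sketch, assuming idata3 was refit with log_likelihood=True as above):
# ELPD-LOO estimate and Pareto-k diagnostic for the varying intercepts model
az.loo(idata3)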
Frequentist multilevel models¶
We can use statsmodels to fit multilevel models too.
Varying intercepts model using statsmodels¶
import statsmodels.formula.api as smf
sm3 = smf.mixedlm("log_radon ~ 0 + floor",  # fixed effects: no common intercept, floor as a fixed effect
                  groups="county",          # grouping variable for random effects
                  re_formula="1",           # random effects: intercept only
                  data=radon)
res3 = sm3.fit()
res3.summary().tables[1]
|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| floor[basement] | 1.462 | 0.052 | 28.164 | 0.000 | 1.360 | 1.563 |
| floor[ground] | 0.769 | 0.074 | 10.339 | 0.000 | 0.623 | 0.914 |
| county Var | 0.108 | 0.041 | | | | |
# slope
res3.params["floor[ground]"] - res3.params["floor[basement]"]
-0.6929937406558043
# sigma-hat
np.sqrt(res3.scale)
0.7555891484188184
# standard deviation of the variability among county-specific Intercepts
np.sqrt(res3.cov_re.values)
array([[0.32822242]])
# intercept offset for the first county (AITKIN) in res3
res3.random_effects["AITKIN"].values
array([-0.27009756])
This is very close to the posterior mean of the random-effects coefficient for AITKIN in the Bayesian model mod3, which was $-0.274$.
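As a sketch (freq_offsets and bayes_offsets are names introduced here), we can extend this comparison to all counties at once:
# maximum absolute difference between statsmodels and Bambi county offsets
freq_offsets = pd.Series({name: eff.values[0] for name, eff in res3.random_effects.items()})
bayes_offsets = idata3["posterior"]["1|county"].mean(dim=("chain", "draw")).to_series()
(freq_offsets - bayes_offsets).abs().max()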
Varying intercepts and slopes model using statsmodels¶
sm4 = smf.mixedlm("log_radon ~ 0 + floor",  # fixed effects: no common intercept, floor as a fixed effect
                  groups="county",          # grouping variable for random effects
                  re_formula="1 + floor",   # random effects: intercept and floor slope
                  data=radon)
res4 = sm4.fit()
res4.summary().tables[1]
|  | Coef. | Std.Err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| floor[basement] | 1.463 | 0.054 | 26.977 | 0.000 | 1.356 | 1.569 |
| floor[ground] | 0.782 | 0.085 | 9.208 | 0.000 | 0.615 | 0.948 |
| county Var | 0.122 | 0.049 | | | | |
| county x floor[T.ground] Cov | -0.040 | 0.057 | | | | |
| floor[T.ground] Var | 0.118 | 0.120 | | | | |
# slope estimate
res4.params["floor[ground]"] - res4.params["floor[basement]"]
-0.6810977572101944
# sigma-hat
np.sqrt(res4.scale)
0.7461559982563548
# standard deviation of the variability among county-specific Intercepts
county_var_int = res4.cov_re.loc["county", "county"]
np.sqrt(county_var_int)
0.3486722110725776
# standard deviation of the variability among county-specific slopes
county_var_slopes = res4.cov_re.loc["floor[T.ground]", "floor[T.ground]"]
np.sqrt(county_var_slopes)
0.3435539748320278
# correlation between Intercept and slope group-level coefficients
county_floor_cov = res4.cov_re.loc["county", "floor[T.ground]"]
county_floor_cov / (np.sqrt(county_var_int)*np.sqrt(county_var_slopes))
-0.3372303607779023
Discussion¶
Alternative notations for hierarchical models¶
- IMPORT FROM Gelman & Hill Section 12.5 printout
- watch the subscripts!
Computational challenges associated with hierarchical models¶
- centered vs. noncentered representations (see the sketch below)
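A minimal PyMC sketch of the two parameterizations (a hypothetical standalone example; the Bambi models above select between them via the noncentered keyword):
import pymc as pm

n_counties = 85  # number of groups in the radon dataset

# centered: group offsets are drawn directly from the population distribution;
# NUTS can struggle with the funnel-shaped geometry when sigma_b is small
with pm.Model() as centered:
    sigma_b = pm.Exponential("sigma_b", 1)
    b = pm.Normal("b", mu=0, sigma=sigma_b, shape=n_counties)

# noncentered: sample standardized offsets, then rescale by sigma_b;
# same model, but with geometry that is usually easier to sample
with pm.Model() as noncentered:
    sigma_b = pm.Exponential("sigma_b", 1)
    z = pm.Normal("z", mu=0, sigma=1, shape=n_counties)
    b = pm.Deterministic("b", z * sigma_b)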
Benefits of multilevel models¶
- TODO LIST
- Better than repeated-measures ANOVA because:
- tells you the direction and magnitude of the effect
- can handle multiple grouping scenarios (e.g., by-item and by-student)
- works for categorical predictors
Applications¶
- The need for hierarchical models often arises in the social sciences (where they are a better choice than ANOVA)
- Hierarchical models are often used for Bayesian meta-analysis
Exercises¶
Exercise: mod1u¶
Same model as Example 3, but also include the predictor log_uranium.
import bambi as bmb
covariate_priors = {
"floor": bmb.Prior("Normal", mu=0, sigma=10),
"log_uranium": bmb.Prior("Normal", mu=0, sigma=10),
"1|county": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Exponential", lam=1)),
"sigma": bmb.Prior("Exponential", lam=1),
}
mod1u = bmb.Model(formula="log_radon ~ 1 + floor + (1|county) + log_uranium",
priors=covariate_priors,
data=radon)
mod1u
Formula: log_radon ~ 1 + floor + (1|county) + log_uranium
Family: gaussian
Link: mu = identity
Observations: 919
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 1.2246, sigma: 2.1322)
      floor ~ Normal(mu: 0.0, sigma: 10.0)
      log_uranium ~ Normal(mu: 0.0, sigma: 10.0)
    Group-level effects
      1|county ~ Normal(mu: 0.0, sigma: Exponential(lam: 1.0))
    Auxiliary parameters
      sigma ~ Exponential(lam: 1.0)
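To complete the exercise, fit the model and inspect the estimates (a sketch; idata1u is a name introduced here):
idata1u = mod1u.fit()
az.summary(idata1u, var_names=["Intercept", "floor", "log_uranium", "1|county_sigma", "sigma"])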
Exercise: pigs dataset¶
https://bambinos.github.io/bambi/notebooks/multi-level_regression.html
import statsmodels.api as sm
dietox = sm.datasets.get_rdataset("dietox", "geepack").data
dietox
|  | Pig | Evit | Cu | Litter | Start | Weight | Feed | Time |
|---|---|---|---|---|---|---|---|---|
| 0 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 26.50000 | NaN | 1 |
| 1 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 27.59999 | 5.200005 | 2 |
| 2 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 36.50000 | 17.600000 | 3 |
| 3 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 40.29999 | 28.500000 | 4 |
| 4 | 4601 | Evit000 | Cu000 | 1 | 26.5 | 49.09998 | 45.200001 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 856 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 73.19995 | 83.800003 | 8 |
| 857 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 81.69995 | 99.800003 | 9 |
| 858 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 90.29999 | 115.200001 | 10 |
| 859 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 96.00000 | 133.200001 | 11 |
| 860 | 8442 | Evit000 | Cu175 | 24 | 25.7 | 103.50000 | 151.400002 | 12 |

861 rows × 8 columns
pigsmodel = bmb.Model("Weight ~ Time + (Time|Pig)", dietox)
pigsidata = pigsmodel.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, Time, 1|Pig_sigma, 1|Pig_offset, Time|Pig_sigma, Time|Pig_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 8 seconds.
az.summary(pigsidata, var_names=["Intercept", "Time", "1|Pig_sigma", "Time|Pig_sigma", "sigma"])
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| Intercept | 15.728 | 0.567 | 14.697 | 16.821 | 0.021 | 0.015 | 720.0 | 1307.0 | 1.00 |
| Time | 6.943 | 0.083 | 6.779 | 7.085 | 0.003 | 0.002 | 613.0 | 1150.0 | 1.01 |
| 1\|Pig_sigma | 4.547 | 0.435 | 3.755 | 5.344 | 0.012 | 0.009 | 1237.0 | 2082.0 | 1.00 |
| Time\|Pig_sigma | 0.663 | 0.061 | 0.552 | 0.776 | 0.002 | 0.001 | 1025.0 | 1834.0 | 1.00 |
| sigma | 2.458 | 0.064 | 2.332 | 2.571 | 0.001 | 0.001 | 5082.0 | 3282.0 | 1.00 |
Exercise: educational data¶
cf. https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html
We will be analyzing the Gcsemv dataset (Rasbash et al. 2000) from the mlmRev package in R. The data include the General Certificate of Secondary Education (GCSE) exam scores of 1,905 students from 73 schools in England on a science subject. The Gcsemv dataset consists of the following five variables:
- school: school identifier
- student: student identifier
- gender: gender of a student (M: Male, F: Female)
- written: total score on written paper
- course: total score on coursework paper
# the dataset was converted from the R package mlmRev using pyreadr:
# import pyreadr
# Gcsemv_r = pyreadr.read_r('/Users/ivan/Downloads/mlmRev/data/Gcsemv.rda')
# Gcsemv_r["Gcsemv"].dropna().to_csv("../datasets/gcsemv.csv", index=False)
gcsemv = pd.read_csv("../datasets/gcsemv.csv")
gcsemv.head()
|  | school | student | gender | written | course |
|---|---|---|---|---|---|
| 0 | 20920 | 27 | F | 39.0 | 76.8 |
| 1 | 20920 | 31 | F | 36.0 | 87.9 |
| 2 | 20920 | 42 | M | 16.0 | 44.4 |
| 3 | 20920 | 101 | F | 49.0 | 89.8 |
| 4 | 20920 | 113 | M | 25.0 | 17.5 |
import bambi as bmb
m1 = bmb.Model(formula="course ~ 1 + (1 | school)", data=gcsemv)
m1
Formula: course ~ 1 + (1 | school)
Family: gaussian
Link: mu = identity
Observations: 1523
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 73.3814, sigma: 41.0781)
    Group-level effects
      1|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 41.0781))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 16.4312)
idata_m1 = m1.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, 1|school_sigma, 1|school_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.
az.summary(idata_m1)
|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| 1\|school[20920] | -6.920 | 5.088 | -16.830 | 2.427 | 0.067 | 0.059 | 5673.0 | 3212.0 | 1.00 |
| 1\|school[22520] | -15.667 | 2.147 | -19.559 | -11.496 | 0.045 | 0.032 | 2267.0 | 2796.0 | 1.00 |
| 1\|school[22710] | 6.356 | 3.648 | -0.614 | 13.049 | 0.053 | 0.042 | 4791.0 | 3064.0 | 1.00 |
| 1\|school[22738] | -0.821 | 4.430 | -9.283 | 7.546 | 0.061 | 0.081 | 5328.0 | 2723.0 | 1.00 |
| 1\|school[22908] | -0.756 | 5.915 | -11.248 | 10.665 | 0.077 | 0.096 | 5847.0 | 2911.0 | 1.00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1\|school[84707] | 3.857 | 7.661 | -10.353 | 17.952 | 0.109 | 0.118 | 4942.0 | 2715.0 | 1.00 |
| 1\|school[84772] | 8.591 | 3.623 | 1.976 | 15.587 | 0.056 | 0.043 | 4238.0 | 2806.0 | 1.00 |
| 1\|school_sigma | 8.750 | 0.908 | 7.075 | 10.443 | 0.031 | 0.022 | 888.0 | 1132.0 | 1.01 |
| Intercept | 73.819 | 1.134 | 71.736 | 76.002 | 0.039 | 0.028 | 834.0 | 1452.0 | 1.01 |
| sigma | 13.985 | 0.262 | 13.499 | 14.444 | 0.004 | 0.003 | 5221.0 | 2428.0 | 1.00 |

76 rows × 9 columns
m3 = bmb.Model(formula="course ~ gender + (1 + gender|school)", data=gcsemv)
m3
Formula: course ~ gender + (1 + gender|school)
Family: gaussian
Link: mu = identity
Observations: 1523
Priors:
  target = mu
    Common-level effects
      Intercept ~ Normal(mu: 73.3814, sigma: 53.4663)
      gender ~ Normal(mu: 0.0, sigma: 83.5292)
    Group-level effects
      1|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 53.4663))
      gender|school ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 83.5292))
    Auxiliary parameters
      sigma ~ HalfStudentT(nu: 4.0, sigma: 16.4312)
idata_m3 = m3.fit()
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, gender, 1|school_sigma, 1|school_offset, gender|school_sigma, gender|school_offset]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.
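To inspect the fit of m3 (a sketch, following the same summary pattern as above):
az.summary(idata_m3, var_names=["Intercept", "gender", "1|school_sigma", "gender|school_sigma", "sigma"])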
Exercise: sleepstudy dataset¶
- Description: Contains reaction times of subjects under sleep deprivation conditions.
- Source: Featured in the R package lme4.
- Application: Demonstrates linear mixed-effects modeling with random slopes and intercepts.
Links:
sleepstudy = bmb.load_data("sleepstudy")
sleepstudy
|  | Reaction | Days | Subject |
|---|---|---|---|
| 0 | 249.5600 | 0 | 308 |
| 1 | 258.7047 | 1 | 308 |
| 2 | 250.8006 | 2 | 308 |
| 3 | 321.4398 | 3 | 308 |
| 4 | 356.8519 | 4 | 308 |
| ... | ... | ... | ... |
| 175 | 329.6076 | 5 | 372 |
| 176 | 334.4818 | 6 | 372 |
| 177 | 343.2199 | 7 | 372 |
| 178 | 369.1417 | 8 | 372 |
| 179 | 364.1236 | 9 | 372 |

180 rows × 3 columns
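As a starting point for the exercise, here is a sketch of the classic varying intercepts and slopes specification (sleepmodel and sleepidata are names introduced here):
# varying intercepts and varying slopes for each subject
sleepmodel = bmb.Model("Reaction ~ 1 + Days + (1 + Days | Subject)", sleepstudy)
sleepidata = sleepmodel.fit()
az.summary(sleepidata, var_names=["Intercept", "Days", "1|Subject_sigma", "Days|Subject_sigma", "sigma"])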
Links¶
EXTRA MATERIAL¶