Section 5.3 — Bayesian linear models¶

This notebook contains the code examples from Section 5.3 Bayesian linear models from the No Bullshit Guide to Statistics.

Notebook setup¶

In [1]:

# load Python modules
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import bambi as bmb
import arviz as az

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

In [2]:

# Figures setup
plt.clf()  # needed otherwise `sns.set_theme` doesn't work
from plot_helpers import RCPARAMS
# RCPARAMS.update({"figure.figsize": (9, 5)})   # good for screen
RCPARAMS.update({"figure.figsize": (6, 3)})  # good for print
sns.set_theme(
    context="paper",
    style="whitegrid",
    palette="colorblind",
    rc=RCPARAMS,
)

# High-resolution please
%config InlineBackend.figure_format = "retina"

# Where to store figures
from ministats.utils import savefigure
DESTDIR = "figures/bayes/linear"
#######################################################

<Figure size 640x480 with 0 Axes>

In [3]:

# set random seed for repeatability
np.random.seed(42)

In [4]:

# silence statsmodels kurtosistest warning when using n < 20
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Bayesian model¶

TODO: formula

TODO: graphical model diagram

Example 1: students' scores as a function of effort¶

Students dataset¶

In [5]:

students = pd.read_csv("../datasets/students.csv")
students.shape

Out[5]:

(15, 5)

In [6]:

students.head(3)

Out[6]:

	student_ID	background	curriculum	effort	score
0	1	arts	debate	10.96	75.0
1	2	science	lecture	8.69	75.0
2	3	arts	debate	8.60	67.0

In [7]:

students[["effort","score"]].describe().T

Out[7]:

	count	mean	std	min	25%	50%	75%	max
effort	15.0	8.904667	1.948156	5.21	7.76	8.69	10.35	12.0
score	15.0	72.580000	9.979279	57.00	68.00	72.70	75.75	96.2

In [8]:

sns.scatterplot(x="effort", y="score", data=students);

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_posterior.pdf")
# savefigure(plt.gcf(), filename)


Bayesian model¶

TODO: add formulas

Bambi model¶

In [9]:

#######################################################
priors1 = {
    "Intercept": bmb.Prior("Normal", mu=70, sigma=20),
    "effort": bmb.Prior("Normal", mu=0, sigma=10),
    "sigma": bmb.Prior("HalfStudentT", nu=4, sigma=10),
}

mod1 = bmb.Model("score ~ 1 + effort",
                 family="gaussian",
                 link="identity",
                 priors=priors1,
                 data=students)
mod1

Out[9]:

       Formula: score ~ 1 + effort
        Family: gaussian
          Link: mu = identity
  Observations: 15
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 70.0, sigma: 20.0)
            effort ~ Normal(mu: 0.0, sigma: 10.0)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 10.0)
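
Reading the printed summary back into math gives the following sketch of the model. The symbols $\beta_0$, $\beta_1$ and the notation $\mathcal{T}^+$ for the half-Student's $t$ distribution are our own labels, not Bambi's, and the second argument of $\mathcal{N}$ is a standard deviation, matching Bambi's `sigma`:

$$\begin{aligned} \text{score}_i &\sim \mathcal{N}(\mu_i, \sigma) \\ \mu_i &= \beta_0 + \beta_1 \, \text{effort}_i \\ \beta_0 &\sim \mathcal{N}(70, 20) \\ \beta_1 &\sim \mathcal{N}(0, 10) \\ \sigma &\sim \mathcal{T}^+(\nu = 4, \text{scale} = 10) \end{aligned}$$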

In [10]:

mod1.build()
mod1.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_students_mod1_graph")
# mod1.graph(name=filename, fmt="png", dpi=300)

Out[10]:

Model fitting and analysis¶

In [11]:

idata1 = mod1.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, effort]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 3 seconds.

In [12]:

az.summary(idata1, kind="stats")

Out[12]:

	mean	sd	hdi_3%	hdi_97%
Intercept	32.585	6.867	19.962	45.901
effort	4.491	0.753	3.066	5.943
sigma	5.386	1.191	3.474	7.622
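
The `hdi_3%` and `hdi_97%` columns bound a 94% highest density interval (HDI): the narrowest interval that contains 94% of the posterior draws. The sketch below shows one way such an interval can be computed from raw samples with NumPy. The helper name `narrowest_interval` and the synthetic draws are our own stand-ins, and ArviZ's internal implementation may differ in its details.

```python
import numpy as np

def narrowest_interval(samples, prob=0.94):
    """Return the shortest interval containing a fraction `prob` of the samples."""
    xs = np.sort(np.asarray(samples, dtype=float))
    n = len(xs)
    k = int(np.ceil(prob * n))             # number of draws the interval must contain
    widths = xs[k - 1:] - xs[:n - k + 1]   # width of every window of k consecutive draws
    i = int(np.argmin(widths))             # start index of the narrowest window
    return xs[i], xs[i + k - 1]

# Synthetic stand-in for posterior draws of a slope (not the actual idata1 samples)
rng = np.random.default_rng(0)
draws = rng.normal(loc=4.5, scale=0.75, size=4000)
lo, hi = narrowest_interval(draws, prob=0.94)
print(f"94% HDI: [{lo:.2f}, {hi:.2f}]")
```

For a symmetric, unimodal posterior the result is close to the equal-tailed interval; the two differ when the posterior is skewed.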

In [13]:

az.plot_posterior(idata1);

In [14]:

# # FIGURES ONLY
# az.plot_posterior(idata1, round_to=2, figsize=(7,2));
# filename = os.path.join(DESTDIR, "example1_students_posterior.pdf")
# savefigure(plt.gcf(), filename)

In [15]:

# az.plot_ppc(idata1_rep, group="posterior")

In [16]:

import xarray as xr

# Generate samples from the posterior predictive distribution
idata1_rep = mod1.predict(idata1, inplace=False, kind="response")

# Calculate the model mean
post1 = idata1_rep["posterior"]
efforts = students["effort"]
post1["score_model"] = post1["Intercept"] + post1["effort"] * xr.DataArray(efforts)

# Plot
# az.plot_lm(y="score", idata=idata1, x=efforts);
# az.plot_lm(y="score", idata=idata1, y_model="score_pred", x=efforts);
az.plot_lm(y="score", idata=idata1_rep, y_model="score_model",
           x=efforts, kind_pp="hdi", kind_model="hdi");

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_plot_predictions.pdf")
# savefigure(plt.gcf(), filename)

Compare to previous results¶

In [17]:

# compare with statsmodels results
import statsmodels.formula.api as smf
lm1 = smf.ols("score ~ 1 + effort", data=students).fit()
lm1.summary().tables[1]

Out[17]:

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	32.4658	6.155	5.275	0.000	19.169	45.763
effort	4.5049	0.676	6.661	0.000	3.044	5.966

In [18]:

np.sqrt(lm1.scale)

Out[18]:

4.929598282660259

Conclusions¶

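The posterior for the effort slope is centered on 4.491 with a 94% HDI of [3.066, 5.943], so higher effort is associated with higher student scores. The results of the Bayesian analysis are largely consistent with the frequentist results from Section 4.1 (OLS slope 4.5049, 95% confidence interval [3.044, 5.966]). The Bayesian results are simpler to interpret, however, because the HDI is a direct probability statement about the parameter.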
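
One concrete sense in which the Bayesian output is simpler to interpret: probability statements about a parameter can be obtained by counting posterior draws. The sketch below uses synthetic normal draws as a stand-in for the effort posterior (mean 4.491, sd 0.753, taken from the summary table above); with the real fit one would use the draws stored in `idata1` instead. The threshold of 3 is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in draws: normal approximation to the effort posterior (an assumption)
slope_draws = rng.normal(loc=4.491, scale=0.753, size=4000)

# Posterior probability that the slope exceeds a threshold
# = fraction of draws above that threshold
prob_gt_3 = np.mean(slope_draws > 3.0)
prob_gt_0 = np.mean(slope_draws > 0.0)
print(f"P(slope > 3) = {prob_gt_3:.3f}")
print(f"P(slope > 0) = {prob_gt_0:.3f}")
```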

Example 2: doctors' sleep scores¶

Doctors dataset¶

In [19]:

doctors = pd.read_csv("../datasets/doctors.csv")
doctors.shape

Out[19]:

(156, 9)

In [20]:

doctors.head(3)

Out[20]:

	permit	loc	work	hours	caf	alc	weed	exrc	score
0	93273	rur	hos	21	2	0	5.0	0.0	63
1	90852	urb	cli	74	26	20	0.0	4.5	16
2	92744	urb	hos	63	25	1	0.0	7.0	58

In [21]:

doctors[["alc","weed","exrc","score"]].describe().T

Out[21]:

	count	mean	std	min	25%	50%	75%	max
alc	156.0	11.839744	9.428506	0.0	3.750	11.0	19.0	44.0
weed	156.0	0.628205	1.391068	0.0	0.000	0.0	0.5	10.5
exrc	156.0	5.387821	4.796361	0.0	0.875	4.5	8.0	19.0
score	156.0	48.025641	20.446294	4.0	33.000	49.5	62.0	97.0

Bayesian model¶

TODO: add formulas

Bambi model¶

In [22]:

#######################################################
priors2 = {
    "Intercept": bmb.Prior("Normal", mu=50, sigma=40),
    # we'll set the priors for the slopes below
    "sigma": bmb.Prior("HalfStudentT", nu=4, sigma=20),
}

mod2 = bmb.Model("score ~ 1 + alc + weed + exrc",
                 family="gaussian",
                 link="identity",
                 priors=priors2,
                 data=doctors)

# set the same prior for all slopes using `set_priors`
slope_prior = bmb.Prior("Normal", mu=0, sigma=10)
mod2.set_priors(common=slope_prior)

mod2

Out[22]:

       Formula: score ~ 1 + alc + weed + exrc
        Family: gaussian
          Link: mu = identity
  Observations: 156
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 50.0, sigma: 40.0)
            alc ~ Normal(mu: 0.0, sigma: 10.0)
            weed ~ Normal(mu: 0.0, sigma: 10.0)
            exrc ~ Normal(mu: 0.0, sigma: 10.0)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 20.0)

In [23]:

mod2.build()
mod2.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example2_doctors_mod2_graph")
# mod2.graph(name=filename, fmt="png", dpi=300)

Out[23]:

Model fitting and analysis¶

In [24]:

idata2 = mod2.fit(draws=5000)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, alc, weed, exrc]

Output()

Sampling 4 chains for 1_000 tune and 5_000 draw iterations (4_000 + 20_000 draws total) took 6 seconds.

In [25]:

print(az.summary(idata2, kind="stats"))

             mean     sd  hdi_3%  hdi_97%
Intercept  60.461  1.294  57.986   62.826
alc        -1.800  0.071  -1.939   -1.668
exrc        1.767  0.139   1.507    2.027
sigma       8.266  0.480   7.382    9.170
weed       -1.026  0.479  -1.924   -0.118

In [26]:

az.plot_posterior(idata2, var_names=["alc", "weed", "exrc"], ref_val=0);

In [27]:

# # FIGURES ONLY
# ref_vals = {"alc":[{"ref_val":0}],
#             "weed":[{"ref_val":0}],
#             "exrc":[{"ref_val":0}]}
# az.plot_posterior(idata2, round_to=2, ref_val=ref_vals, figsize=(7,3));
# filename = os.path.join(DESTDIR, "example2_posterior.pdf")
# savefigure(plt.gcf(), filename)

Partial correlation scale?¶

cf. https://bambinos.github.io/bambi/notebooks/ESCS_multiple_regression.html#summarize-effects-on-partial-correlation-scale

Compare to previous results¶

In [28]:

# compare with statsmodels results
import statsmodels.formula.api as smf
formula = "score ~ 1 + alc + weed + exrc"
lm2 = smf.ols(formula, data=doctors).fit()
lm2.summary().tables[1]

Out[28]:

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	60.4529	1.289	46.885	0.000	57.905	63.000
alc	-1.8001	0.070	-25.726	0.000	-1.938	-1.662
weed	-1.0216	0.476	-2.145	0.034	-1.962	-0.081
exrc	1.7683	0.138	12.809	0.000	1.496	2.041

In [29]:

np.sqrt(lm2.scale)

Out[29]:

8.202768119825624

Conclusions¶

In [ ]:

Example 3: Bayesian logistic regression¶

Interns data¶

In [30]:

interns = pd.read_csv("../datasets/interns.csv")
print(interns.head(3))

   work  hired
0  42.5      1
1  39.3      0
2  43.2      1

Bayesian logistic regression model¶

Bambi model¶

In [31]:

priors3 = {
    "Intercept": bmb.Prior("Normal", mu=0, sigma=20),
    "work": bmb.Prior("Normal", mu=0, sigma=2),
}

mod3 = bmb.Model("hired ~ 1 + work",
                 family="bernoulli",
                 link="logit",
                 priors=priors3,
                 data=interns)
mod3

Out[31]:

       Formula: hired ~ 1 + work
        Family: bernoulli
          Link: p = logit
  Observations: 100
        Priors: 
    target = p
        Common-level effects
            Intercept ~ Normal(mu: 0.0, sigma: 20.0)
            work ~ Normal(mu: 0.0, sigma: 2.0)

In [32]:

mod3.build()
mod3.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example3_interns_mod3_graph")
# mod3.graph(name=filename, fmt="png", dpi=300)

Out[32]:

Model fitting and analysis¶

In [33]:

idata3 = mod3.fit()

Modeling the probability that hired==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, work]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.

In [34]:

az.summary(idata3, kind="stats")

Out[34]:

	mean	sd	hdi_3%	hdi_97%
Intercept	-82.051	19.306	-118.112	-46.927
work	2.066	0.486	1.188	2.977
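
To interpret these coefficients, recall that the model is linear on the log-odds scale, so a predicted probability is obtained by applying the inverse logit function. The sketch below plugs the posterior means from the table into that formula. This ignores posterior uncertainty, and the helper name `expit` and the chosen `work` values are ours.

```python
import math

def expit(z):
    """Inverse of the logit function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Posterior means copied from the summary table (point estimates only)
b0, b1 = -82.051, 2.066

for work in [38, 40, 42]:
    p = expit(b0 + b1 * work)
    print(f"work = {work}: P(hired) = {p:.3f}")

# Value of work at which the predicted probability of being hired is 50%
work_50 = -b0 / b1
print(f"50% point: work = {work_50:.1f}")
```

A fuller analysis would apply the same transformation to every posterior draw, which yields a distribution for each predicted probability rather than a single number.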

In [35]:

az.plot_posterior(idata3);

In [36]:

# # FIGURES ONLY
# az.plot_posterior(idata3, round_to=2, figsize=(5,2));
# filename = os.path.join(DESTDIR, "example3_posterior.pdf")
# savefigure(plt.gcf(), filename)

Predictions¶

In [37]:

import pandas as pd
new_interns = pd.DataFrame({"work": range(20, 60)})
idata3_pred = mod3.predict(idata3, data=new_interns, kind="response", inplace=False)
# idata3
# TODO: try to reproduce
# https://github.com/tomicapretto/talks/blob/main/pydataglobal21/index.Rmd#L181-L193

Compare with previous results¶

In [38]:

import statsmodels.formula.api as smf
lr1 = smf.logit("hired ~ 1 + work", data=interns).fit()
lr1.params

Optimization terminated successfully.
         Current function value: 0.138101
         Iterations 10

Out[38]:

Intercept   -78.693205
work          1.981458
dtype: float64

Conclusions¶

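We end up with similar results: the posterior means (Intercept -82.051, work 2.066) are close to the maximum likelihood estimates from statsmodels (Intercept -78.693, work 1.981).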

Explanations¶

Shrinkage priors¶

Shrinkage priors = Prior distributions for a parameter that shrink its posterior estimate towards a particular value.

Sparsity = A situation where most parameter values are zero and only a few are non-zero.

Laplace priors: correspond to L1 regularization = lasso regression https://en.wikipedia.org/wiki/Lasso_(statistics)
Gaussian priors: correspond to L2 regularization = ridge regression https://en.wikipedia.org/wiki/Ridge_regression
Reference priors: the reference prior $p(\alpha, \beta, \sigma) \propto 1/\sigma$ produces the same results as frequentist linear regression.
Spike-and-slab priors: specialized for sparsity. A spike-and-slab prior is a mixture of two distributions: one peaked around zero (the spike) and the other a diffuse distribution (the slab). The spike component identifies the zero elements whereas the slab component captures the non-zero coefficients.
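
To see why a Gaussian prior corresponds to ridge regression, consider as a simplifying assumption a linear model with known noise scale $\sigma$ and independent priors $\beta_j \sim \mathcal{N}(0, \tau)$, where $\tau$ is a prior standard deviation introduced here for illustration. Up to an additive constant, the negative log-posterior is

$$ -\log p(\beta \mid y) = \frac{1}{2\sigma^2} \sum_i \left(y_i - \mathbf{x}_i^\top \beta\right)^2 + \frac{1}{2\tau^2} \sum_j \beta_j^2 + \text{const}. $$

Multiplying by $2\sigma^2$ shows that the posterior mode minimizes $\sum_i (y_i - \mathbf{x}_i^\top \beta)^2 + \lambda \sum_j \beta_j^2$ with $\lambda = \sigma^2/\tau^2$, which is the ridge objective. Repeating the argument with a Laplace prior replaces the squared penalty with one proportional to $\sum_j |\beta_j|$, which is the lasso penalty.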

Standardizing predictors¶

Standardizing the predictors makes choosing priors easier and makes inference more efficient; cf. 04_lm/cut_material/standardized_predictors.tex
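
As a minimal sketch of what standardization means in practice, the snippet below converts a predictor to z-scores with NumPy. The array holds a handful of illustrative values and the variable names are ours. After this transformation a slope describes the change in the outcome per one-standard-deviation change in the predictor, which makes a common prior scale across predictors easier to justify.

```python
import numpy as np

# A handful of illustrative predictor values on their original scale
effort = np.array([10.96, 8.69, 8.60, 5.21, 12.0])

# z-score: subtract the mean, then divide by the sample standard deviation
effort_z = (effort - effort.mean()) / effort.std(ddof=1)

print(effort_z.round(2))
```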

Robust linear regression¶

We swap out the Normal distribution for Student's $t$-distribution, whose heavier tails handle outliers better. This is very useful when the data has outliers; see EXX
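
The effect of the heavier tails can be checked numerically. The sketch below compares the standard normal density with the standard Student's $t$ density at a point four scale units from the center; the function names and the choice $\nu = 4$ are ours, picked only for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def student_t_pdf(x, nu):
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    coef = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coef * (1.0 + x * x / nu) ** (-(nu + 1) / 2)

x = 4.0  # an observation four scale units away from the center
ratio = student_t_pdf(x, nu=4) / normal_pdf(x)
print(f"normal density at {x}: {normal_pdf(x):.6f}")
print(f"t(nu=4) density at {x}: {student_t_pdf(x, nu=4):.6f}")
print(f"ratio: {ratio:.1f}")
```

Because the $t$ model assigns far more density to such a point, a single extreme observation pulls the fitted line less than it does under the normal model.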

Links:

In [39]:

#######################################################
priors1r = {
    "Intercept": bmb.Prior("Normal", mu=70, sigma=20),
    "effort": bmb.Prior("Normal", mu=0, sigma=10),
    "sigma": bmb.Prior("HalfStudentT", nu=4, sigma=10),
    "nu": bmb.Prior("Gamma", alpha=2, beta=0.1),
}

mod1r = bmb.Model("score ~ 1 + effort",
                 family="t",
                 link="identity",
                 priors=priors1r,
                 data=students)
mod1r

Out[39]:

       Formula: score ~ 1 + effort
        Family: t
          Link: mu = identity
  Observations: 15
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 70.0, sigma: 20.0)
            effort ~ Normal(mu: 0.0, sigma: 10.0)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 10.0)
            nu ~ Gamma(alpha: 2.0, beta: 0.1)

In [40]:

idata1r = mod1r.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, nu, Intercept, effort]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.

In [41]:

az.summary(idata1r, kind="stats")

Out[41]:

	mean	sd	hdi_3%	hdi_97%
Intercept	33.881	6.516	21.612	46.093
effort	4.328	0.727	2.895	5.655
nu	19.620	13.899	1.280	45.122
sigma	4.896	1.188	2.779	7.122

The posterior mean of the slope ($4.328$) is slightly less than the mean slope we found in mod1 ($4.491$), which suggests the robust model is less influenced by the one outlier.

We also found a slightly smaller sigma ($4.896$ versus $5.386$), since the heavier tails of the $t$-distribution can account for extreme observations without inflating the scale.

Discussion¶

Comparison to frequentist linear models¶

We can obtain similar results with both approaches.
Bayesian models naturally apply regularization through their priors (no need to add it manually).

Causal graphs¶

Causal graphs can also be used with Bayesian LMs (recall Section 4.5).

Next steps¶

LMs work with categorical predictors too, which is what we'll discuss in Section 5.4.
LMs can be extended to hierarchical models, which is what we'll discuss in Section 5.5.

Exercises¶

Exercise 1: redo some of the exercises/problems from Ch4 using Bayesian methods¶

Exercise 2: redo examples of causal inference¶

Exercise 3: fit model with different priors¶

Exercise 4: redo logistic regression exercises from Sec 4.6¶

Exercise 5: bioassay logistic regression¶

Gelman et al. (2003) present an example of an acute toxicity test, commonly performed on animals to estimate the toxicity of various compounds.

In this dataset, log_dose contains 4 dosage levels on the log scale, each administered to 5 rats during the experiment. The response variable is deaths, the number of positive responses to the dosage.

The number of deaths can be modeled as a binomial response, with the log-odds of death being a linear function of the log dose:

$$\begin{aligned} y_i &\sim \text{Binom}(n_i, p_i) \\ \text{logit}(p_i) &= a + b x_i \end{aligned}$$

The common statistic of interest in such experiments is the LD50, the dosage at which the probability of death is 50%.
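
The LD50 follows directly from the model equations. Setting $p = 0.5$ gives $\text{logit}(0.5) = \log(0.5/0.5) = 0$, so

$$ a + b \, x_{50} = 0 \quad \Longrightarrow \quad x_{50} = -\frac{a}{b}, $$

where $x_{50}$ (our notation) is the LD50 on the log-dose scale. Applying this formula to each posterior draw of $(a, b)$ yields posterior draws of the LD50.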

via https://github.com/fonnesbeck/pymc_sdss_2024/blob/main/notebooks/Section2-PyMC_Intro.ipynb

In [42]:

# Sample size in each group
n = 5

# Log dose in each group
log_dose = [-.86, -.3, -.05, .73]

# Outcomes
deaths = [0, 1, 3, 5]

df_bio = pd.DataFrame({"log_dose":log_dose, "deaths":deaths, "n":n})

In [43]:

# SOLUTION
priors_bio = {
    "Intercept": bmb.Prior("Normal", mu=0, sigma=5),
    "log_dose": bmb.Prior("Normal", mu=0, sigma=5),
}

mod_bio = bmb.Model(formula="p(deaths,n) ~ 1 + log_dose",
                    family="binomial",
                    link="logit",
                    priors=priors_bio,
                    data=df_bio)

idata_bio = mod_bio.fit()

post_bio = idata_bio["posterior"]
post_bio["LD50"] = -post_bio["Intercept"] / post_bio["log_dose"]

az.summary(idata_bio)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, log_dose]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

Out[43]:

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
Intercept	0.602	0.786	-0.805	2.124	0.015	0.012	2626.0	2465.0	1.0
log_dose	6.357	2.437	2.057	10.917	0.050	0.038	2695.0	2711.0	1.0
LD50	-0.085	0.135	-0.310	0.195	0.003	0.002	2846.0	2584.0	1.0

Exercise 6: redo Poisson regression exercises from Sec 4.6¶

Exercise 7: fit normal and robust to the dataset ??TODO?? which has outliers¶

Exercise 8: PhD delays¶

cf. https://www.rensvandeschoot.com/tutorials/advanced-bayesian-regression-in-jasp/
https://zenodo.org/records/3999424
https://sci-hub.se/https://www.nature.com/articles/s43586-020-00001-2

Links¶

In [ ]:

BONUS MATERIAL¶

Simple linear regression on synthetic data¶

In [44]:

# Simulated data
np.random.seed(42)
x = np.random.normal(0, 1, 100)
y = 3 + 2 * x + np.random.normal(0, 1, 100)

In [45]:

df1 = pd.DataFrame({"x":x, "y":y})

priors1 = {
    "Intercept": bmb.Prior("Normal", mu=0, sigma=10),
    "x": bmb.Prior("Normal", mu=0, sigma=10),
    "sigma": bmb.Prior("HalfNormal", sigma=1),
}

model1 = bmb.Model("y ~ 1 + x",
                   priors=priors1,
                   data=df1)
print(model1)

idata = model1.fit()

       Formula: y ~ 1 + x
        Family: gaussian
          Link: mu = identity
  Observations: 100
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 0.0, sigma: 10.0)
            x ~ Normal(mu: 0.0, sigma: 10.0)
        
        Auxiliary parameters
            sigma ~ HalfNormal(sigma: 1.0)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, Intercept, x]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

In [46]:

model1.plot_priors();

Sampling: [Intercept, sigma, x]

Summary using mean¶

In [47]:

# Posterior Summary
summary = az.summary(idata, kind="stats")
summary

Out[47]:

	mean	sd	hdi_3%	hdi_97%
Intercept	3.006	0.096	2.827	3.181
sigma	0.956	0.069	0.831	1.086
x	1.856	0.108	1.667	2.062

Summary using median as focus statistic¶

ETI = Equal-Tailed Interval
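
An equal-tailed interval cuts the same probability mass from each tail, so the 94% ETI runs from the 3rd to the 97th percentile of the draws. A minimal sketch with NumPy follows; the right-skewed gamma draws are a synthetic stand-in for posterior samples, not output from the model above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic right-skewed draws standing in for posterior samples
draws = rng.gamma(shape=4.0, scale=0.25, size=4000)

# 94% equal-tailed interval: 3% of the draws lie below it and 3% above it
eti_lo, eti_hi = np.quantile(draws, [0.03, 0.97])
print(f"median  = {np.median(draws):.3f}")
print(f"94% ETI = [{eti_lo:.3f}, {eti_hi:.3f}]")
```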

In [48]:

az.summary(idata, stat_focus="median", kind="stats")

Out[48]:

	median	mad	eti_3%	eti_97%
Intercept	3.005	0.066	2.832	3.189
sigma	0.951	0.045	0.840	1.095
x	1.855	0.072	1.655	2.052

In [49]:

# Plotting posterior
az.plot_posterior(idata, point_estimate="mean", round_to=3);

Investigate further

https://python.arviz.org/en/latest/api/generated/arviz.plot_lm.html

In [50]:

# az.plot_lm(idata)

Simple linear regression using PyMC¶

In [51]:

import pymc as pm

In [52]:

# Simulated data
np.random.seed(42)
x = np.random.normal(0, 1, 100)
y = 3 + 2 * x + np.random.normal(0, 1, 100)

In [53]:

# Bayesian Linear Regression Model
with pm.Model() as pmmodel:
    # Priors
    beta0 = pm.Normal("beta0", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)
    
    # Likelihood
    mu = beta0 + beta1 * x
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)
    
    # Sampling
    idata = pm.sample()

az.summary(idata)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta0, beta1, sigma]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.

Out[53]:

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
beta0	3.007	0.097	2.828	3.190	0.001	0.001	5648.0	3261.0	1.0
beta1	1.857	0.107	1.664	2.065	0.001	0.001	6105.0	3549.0	1.0
sigma	0.958	0.070	0.821	1.078	0.001	0.001	6151.0	3006.0	1.0

In [ ]:

Bonus Bayesian logistic regression example¶


via https://github.com/tomicapretto/talks/blob/main/pydataglobal21/index.Rmd#L123

In [54]:

import bambi as bmb
data = bmb.load_data("ANES")
data.head()

Out[54]:

	vote	age	party_id
0	clinton	56	democrat
1	trump	65	republican
2	clinton	80	democrat
3	trump	38	republican
4	trump	60	republican

In [55]:

model = bmb.Model("vote[clinton] ~ 0 + party_id + party_id:age", data, family="bernoulli")
print(model)
idata = model.fit()

Modeling the probability that vote==clinton

       Formula: vote[clinton] ~ 0 + party_id + party_id:age
        Family: bernoulli
          Link: p = logit
  Observations: 421
        Priors: 
    target = p
        Common-level effects
            party_id ~ Normal(mu: [0. 0. 0.], sigma: [1. 1. 1.])
            party_id:age ~ Normal(mu: [0. 0. 0.], sigma: [0.0586 0.0586 0.0586])

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [party_id, party_id:age]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 12 seconds.

In [56]:

import pandas as pd
new_subjects = pd.DataFrame({"age": [20, 60], "party_id": ["independent"] * 2})
model.predict(idata, data=new_subjects)

# TODO: try to reproduce
# https://github.com/tomicapretto/talks/blob/main/pydataglobal21/index.Rmd#L181-L193

In [ ]:

Bayesian Linear Regression (BONUS)¶

from cs109b_lect13_bayes_2_2021.ipynb

We will artificially create the data to predict on. We will then see if our model predicts them correctly.

In [57]:

np.random.seed(123)

######## True parameter values 
##### our model does not see these
sigma = 1
beta0 = 1
beta = [1, 2.5]   
###############################
# Size of dataset
size = 100

# Feature variables
x1 = np.linspace(0, 1., size)
x2 = np.linspace(0,2., size)

# Create outcome variable with random noise
Y = beta0 + beta[0]*x1 + beta[1]*x2 + np.random.randn(size)*sigma

In [58]:

Copied!





from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
fontsize=14
labelsize=8
title='Observed Data (created artificially by ' + r'$Y(x_1,x_2)$)'
ax = fig.add_subplot(111, projection='3d')

ax.scatter(x1, x2, Y)
ax.set_xlabel(r'$x_1$', fontsize=fontsize)
ax.set_ylabel(r'$x_2$', fontsize=fontsize)
ax.set_zlabel(r'$Y$', fontsize=fontsize)

ax.tick_params(labelsize=labelsize)

fig.suptitle(title, fontsize=fontsize)
fig.tight_layout(pad=.1, w_pad=10.1, h_pad=2.)
plt.show()

Now let's see if our model will correctly recover the values of our unknown parameters, namely $\beta_0$, $\beta_1$, $\beta_2$ and $\sigma$.

Defining the Problem¶

Our problem is the following: we want to perform multiple linear regression to predict an outcome variable $Y$ which depends on the predictor variables $\mathbf{x}_1$ and $\mathbf{x}_2$.

We will model $Y$ as normally distributed observations with an expected value $\mu$ that is a linear function of the two predictor variables, $\mathbf{x}_1$ and $\mathbf{x}_2$.

\begin{equation} Y \sim \mathcal{N}(\mu,\,\sigma^{2}) \end{equation}

\begin{equation} \mu = \beta_0 + \beta_1 \mathbf{x}_1 + \beta_2 \mathbf{x}_2 \end{equation}

where $\sigma^2$ represents the variance of the measurement error (in this example, the data were generated with $\sigma = 1$).

We place normal priors on the regression coefficients and a half-normal prior on $\sigma$, with hyperparameters chosen by us. The second argument of $\mathcal{N}$ is written as a variance, so, matching the model code below, the coefficients have prior standard deviation 10 and the half-normal has scale 1.

\begin{eqnarray} \beta_i \sim \mathcal{N}(0,\,10^2) \\ \sigma \sim |\mathcal{N}(0,\,1^2)| \end{eqnarray}
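To make the priors concrete, here is a minimal NumPy sketch (not part of the original notebook) that draws one set of parameter values from the priors and simulates a dataset from them. It assumes the hyperparameters used in the model code of the next section (standard deviation 10 for the coefficients, half-normal with scale 1 for the noise). The names `beta0_draw`, `betas_draw`, `sigma_draw` and `Y_sim` are made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same feature grid as in the data-generation cell
size = 100
x1 = np.linspace(0, 1., size)
x2 = np.linspace(0, 2., size)

# One draw from each prior (assumed hyperparameters, see lead-in)
beta0_draw = rng.normal(0, 10)            # intercept
betas_draw = rng.normal(0, 10, size=2)    # two slopes
sigma_draw = abs(rng.normal(0, 1))        # half-normal: absolute value of a normal draw

# Linear predictor and one simulated dataset under these parameter values
mu_draw = beta0_draw + betas_draw[0]*x1 + betas_draw[1]*x2
Y_sim = rng.normal(mu_draw, sigma_draw)

print(Y_sim.shape)  # (100,)
```

Repeating this many times gives a rough picture of which datasets the priors consider plausible before any data are seen.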

Defining a Model in PyMC3¶

In [59]:

with pm.Model() as my_linear_model:

    # Priors for unknown model parameters, specifically created stochastic random variables 
    # with Normal prior distributions for the regression coefficients,
    # and a half-normal distribution for the standard deviation of the observations.
    # These are our parameters. P(theta)

    beta0 = pm.Normal('beta0', mu=0, sigma=10)
    # Note: betas is a vector of two variables, b1 and b2, (denoted by shape=2)
    # so, in array notation, our beta1 = betas[0], and beta2=betas[1]
    betas = pm.Normal('betas', mu=0, sigma=10, shape=2) 
    sigma = pm.HalfNormal('sigma', sigma=1)
    
    # mu is what is called a deterministic random variable, which implies that its value is completely
    # determined by its parents’ values (beta0 and betas in our case). 
    # There is no uncertainty in the variable beyond that which is inherent in the parents’ values
    
    mu = beta0 + betas[0]*x1 + betas[1]*x2
    
    # Likelihood function = how probable is my observed data?
    # This is a special case of a stochastic variable that we call an observed stochastic.
    # It is identical to a standard stochastic, except that its observed argument, 
    # which passes the data to the variable, indicates that the values for this variable were observed, 
    # and should not be changed by any fitting algorithm applied to the model. 
    # The data can be passed in the form of either a numpy.ndarray or pandas.DataFrame object.
    
    Y_obs = pm.Normal('Y_obs', mu=mu, sigma=sigma, observed=Y)

Note: If our problem were a classification problem, we would use logistic regression instead.

Python Note: pm.Model is designed as a simple API that abstracts away the details of the inference. For the use of with, see Compound statements in Python.
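As a rough illustration of why variables created inside a with block end up attached to the model, here is a toy context manager in plain Python. It is only a sketch of the general pattern. `ToyModel` and `toy_normal` are hypothetical names, and this is not how PyMC is actually implemented.

```python
class ToyModel:
    """Hypothetical stand-in for a model container (not PyMC's implementation)."""
    _stack = []  # models currently opened by a `with` block

    def __init__(self):
        self.variables = []

    def __enter__(self):
        # entering the `with` block makes this model the "current" one
        ToyModel._stack.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # leaving the block restores the previous state
        ToyModel._stack.pop()
        return False  # do not swallow exceptions


def toy_normal(name):
    """Register a named variable with the innermost open ToyModel."""
    model = ToyModel._stack[-1]
    model.variables.append(name)
    return name


with ToyModel() as toy_model:
    toy_normal("beta0")
    toy_normal("sigma")

print(toy_model.variables)  # ['beta0', 'sigma']
```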

In [60]:

## do not worry about this, it's just a nice graph to have
## you need to install python-graphviz first
# conda install -c conda-forge python-graphviz
pm.model_to_graphviz(my_linear_model)

Out[60]:

Fitting the Model with Sampling - Doing Inference¶

The signature of PyMC3's sampling function is shown below. It has quite a few parameters. Most of them can be left at the defaults set by the package, but for some it's useful to set your own values.

pymc3.sampling.sample(draws=500, step=None, n_init=200000, chains=None, 
                      cores=None, tune=500, random_seed=None)

Parameters to set:

draws (int): Number of samples to keep per chain, defaults to 500. Counting starts after the tuning has ended.
tune (int): Number of iterations per chain to use for tuning the sampler, often loosely called the burn-in period, defaults to 500. Samples from the tuning period will be discarded.
target_accept (float in $[0, 1]$): The step size is tuned such that we approximate this acceptance rate. Higher values like 0.9 or 0.95 often work better for problematic posteriors.
chains (int, optional): Number of chains to sample. If not given, it is set to cores or 2, whichever is larger, where cores (the number of chains to run in parallel) defaults to the number of CPUs in the system, but at most 4.

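In recent PyMC versions, pm.sample returns an arviz.InferenceData object by default (older PyMC3 releases returned a pymc3.backends.base.MultiTrace object). We usually name it a variation of the word trace. All the information about the posterior is in trace (the draws are stored under trace["posterior"]), which also provides statistics about the sampler.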
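To see what draws, tune and chains mean in practice, here is a toy random-walk Metropolis sampler written with NumPy. It is a deliberately simplified sketch: it is not the NUTS sampler PyMC uses, the function `toy_metropolis` and its `step` argument are invented for this example, and its tuning phase only discards early samples (a real sampler also adapts the step size during tuning).

```python
import numpy as np

def toy_metropolis(log_post, start, draws=500, tune=500, step=0.5, rng=None):
    """Random-walk Metropolis for a 1-D target (illustration only, not NUTS)."""
    rng = np.random.default_rng() if rng is None else rng
    x = start
    kept = []
    for i in range(tune + draws):
        proposal = x + step * rng.normal()
        # accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_post(proposal) - log_post(x):
            x = proposal
        if i >= tune:  # samples from the tuning phase are discarded
            kept.append(x)
    return np.array(kept)

def log_post(x):
    """Log-density of a standard normal target, up to a constant."""
    return -0.5 * x**2

rng = np.random.default_rng(42)
# four chains started from different initial values
chains = np.stack([
    toy_metropolis(log_post, start=s, draws=1000, tune=1000, rng=rng)
    for s in (-5.0, -1.0, 1.0, 5.0)
])
print(chains.shape)  # (4, 1000): chains x draws
```

The resulting array has one row per chain and one column per kept draw, which mirrors the chain and draw dimensions found in trace["posterior"].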

In [61]:

## uncomment this to see more about pm.sample
#help(pm.sample)

In [62]:

with my_linear_model:
    print('Starting MCMC process')
    # draw nsamples posterior samples per chain, using the default number of chains (4 here)
    nsamples = 1000 # number of samples to keep
    burnin = 1000 # burnin period
    trace = pm.sample(nsamples, tune=burnin, target_accept=0.8) 
    print('DONE')

Starting MCMC process

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [beta0, betas, sigma]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.

DONE

In [63]:

# all sampled variables are listed in trace["posterior"].data_vars;
# here we pick a subset by name
var_names = ["beta0", "sigma"]

Model Plotting¶

PyMC3 provides a variety of visualizations via plots: https://docs.pymc.io/api/plots.html. ArviZ (imported here as az) is another library that you can use, and it is the one we use in the cells below.

In [64]:

az.plot_trace(trace);

In [65]:

# generate results table from trace samples
# remember our true hidden values sigma = 1, beta0 = 1, beta = [1, 2.5] 
# We want R_hat < 1.1
az.summary(trace)

Out[65]:

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
beta0	1.021	0.231	0.614	1.480	0.004	0.003	2829.0	2117.0	1.0
betas[0]	1.489	8.637	-14.397	17.003	0.240	0.169	1296.0	1815.0	1.0
betas[1]	2.265	4.320	-5.500	10.261	0.120	0.085	1288.0	1807.0	1.0
sigma	1.147	0.084	1.007	1.309	0.002	0.001	2273.0	1875.0	1.0
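One thing worth checking against the table: the two features were generated so that x2 is exactly twice x1, so the data can only pin down the combination $\beta_1 + 2\beta_2$ (true value $1 + 2 \cdot 2.5 = 6$), not the two slopes separately. This would account for the very wide intervals on betas[0] and betas[1] compared with beta0 and sigma. The NumPy sketch below (an added check, not part of the original notebook) regenerates the data with the same seed, confirms the collinearity, and compares a least-squares estimate of the combined slope with the value implied by the posterior means in the table. The names `c_hat` and `c_post` are made up for this illustration.

```python
import numpy as np

# Regenerate the data exactly as in the data-generation cell
np.random.seed(123)
size = 100
x1 = np.linspace(0, 1., size)
x2 = np.linspace(0, 2., size)
Y = 1 + 1*x1 + 2.5*x2 + np.random.randn(size)*1

# x2 is a multiple of x1, so the design matrix [1, x1, x2] is rank-deficient
X = np.column_stack([np.ones(size), x1, x2])
rank = np.linalg.matrix_rank(X)
print("collinear:", np.allclose(x2, 2*x1), "| rank:", rank)

# Least squares on the identifiable model Y = b0 + c*x1, with c = beta1 + 2*beta2
X_reduced = np.column_stack([np.ones(size), x1])
(b0_hat, c_hat), *_ = np.linalg.lstsq(X_reduced, Y, rcond=None)
print("least-squares intercept and combined slope:", b0_hat, c_hat)

# The same combination computed from the posterior means in the summary table
c_post = 1.489 + 2 * 2.265
print("combined slope from posterior means:", c_post)
```

Even though each slope is poorly determined on its own, the combination implied by the posterior means is close to the true value of 6.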

In [66]:

#help(pm.Normal)

$\hat{R}$ is a metric for assessing how well a chain has converged to the equilibrium distribution by comparing its behavior to other randomly initialized Markov chains. Multiple chains initialized from different initial conditions should give similar results. If all chains converge to the same equilibrium, $\hat{R}$ will be close to 1. If the chains have not converged to a common distribution, $\hat{R}$ will be noticeably larger than 1 (common rules of thumb flag values above 1.01, or above 1.1 in older references). An $\hat{R}$ close to 1 is a necessary but not sufficient condition for convergence.

For details on $\hat{R}$, see Gelman and Rubin (1992).
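The idea behind $\hat{R}$ can be sketched in a few lines of NumPy by comparing the between-chain and within-chain variances. This is the classic Gelman-Rubin form, written here as an illustration. The function name `classic_rhat` is made up, and the r_hat column reported by az.summary uses a more refined variant, so the numbers will not match exactly.

```python
import numpy as np

def classic_rhat(chains):
    """Classic Gelman-Rubin statistic for an array of shape (n_chains, n_draws)."""
    n_chains, n_draws = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()     # average within-chain variance
    B = n_draws * chain_means.var(ddof=1)     # between-chain variance
    var_hat = (n_draws - 1) / n_draws * W + B / n_draws
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)

# Four simulated chains that all sample the same distribution
mixed = rng.normal(size=(4, 1000))

# Same draws, but two of the chains are shifted to a different location
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]

rhat_mixed = classic_rhat(mixed)
rhat_stuck = classic_rhat(stuck)
print(rhat_mixed, rhat_stuck)
```

The chains that agree give a value very close to 1, while the chains that disagree about the location give a value well above it.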

This linear regression example is from the original paper on PyMC3: Salvatier J, Wiecki TV, Fonnesbeck C. 2016. Probabilistic programming in Python using PyMC3. PeerJ Computer Science 2:e55 https://doi.org/10.7717/peerj-cs.55

In [ ]: