Section 5.2 — Bayesian inference computations¶

This notebook contains the code examples from Section 5.2 Bayesian inference computations from the No Bullshit Guide to Statistics.

Notebook setup¶

In [1]:

Copied!





# load Python modules
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# load Python modules
import os
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:

Copied!





# Figures setup
plt.clf()  # needed otherwise `sns.set_theme` doesn"t work
from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (5, 3)})   # good for screen
# RCPARAMS.update({"figure.figsize": (5, 2)})  # good for print
sns.set_theme(
    context="paper",
    style="whitegrid",
    palette="colorblind",
    rc=RCPARAMS,
)

# High-resolution please
%config InlineBackend.figure_format = "retina"

# Where to store figures
from ministats.utils import savefigure
DESTDIR = "figures/bayes/computations"

#######################################################
# Figures setup
plt.clf()  # needed otherwise `sns.set_theme` doesn"t work
from plot_helpers import RCPARAMS
RCPARAMS.update({"figure.figsize": (5, 3)})   # good for screen
# RCPARAMS.update({"figure.figsize": (5, 2)})  # good for print
sns.set_theme(
    context="paper",
    style="whitegrid",
    palette="colorblind",
    rc=RCPARAMS,
)

# High-resolution please
%config InlineBackend.figure_format = "retina"

# Where to store figures
from ministats.utils import savefigure
DESTDIR = "figures/bayes/computations"

#######################################################

WARNING (pytensor.tensor.blas): Using NumPy C-API based implementation for BLAS functions.

<Figure size 640x480 with 0 Axes>

Definitions¶

TODO

In [ ]:

Posterior inference using MCMC estimation¶

TODO: point-form definitions

Bayesian inference using Bambi¶

TODO: mention Bambi classes

In [3]:

Copied!

import bambi as bmb
import bambi as bmb

In [4]:

Copied!

print(bmb.Prior.__doc__)
print(bmb.Prior.__doc__)

Abstract specification of a term prior

    Parameters
    ----------
    name : str
        Name of prior distribution. Must be the name of a PyMC distribution
        (e.g., `"Normal"`, `"Bernoulli"`, etc.)
    auto_scale: bool
        Whether to adjust the parameters of the prior or use them as passed. Default to `True`.
    kwargs : dict
        Optional keywords specifying the parameters of the named distribution.
    dist : pymc.distributions.distribution.DistributionMeta or callable
        A callable that returns a valid PyMC distribution. The signature must contain `name`,
        `dims`, and `shape`, as well as its own keyworded arguments.

In [5]:

Copied!

print("\n".join(bmb.Model.__doc__.splitlines()[0:27]))
print("\n".join(bmb.Model.__doc__.splitlines()[0:27]))

Specification of model class

    Parameters
    ----------
    formula : str or bambi.formula.Formula
        A model description written using the formula syntax from the `formulae` library.
    data : pandas.DataFrame
        A pandas dataframe containing the data on which the model will be fit, with column
        names matching variables defined in the formula.
    family : str or bambi.families.Family
        A specification of the model family (analogous to the family object in R). Either
        a string, or an instance of class `bambi.families.Family`. If a string is passed, a
        family with the corresponding name must be defined in the defaults loaded at `Model`
        initialization. Valid pre-defined families are `"bernoulli"`, `"beta"`,
        `"binomial"`, `"categorical"`, `"gamma"`, `"gaussian"`, `"negativebinomial"`,
        `"poisson"`, `"t"`, and `"wald"`. Defaults to `"gaussian"`.
    priors : dict
        Optional specification of priors for one or more terms. A dictionary where the keys are
        the names of terms in the model, "common," or "group_specific" and the values are
        instances of class `Prior`. If priors are unset, uses automatic priors inspired by
        the R rstanarm library.
    link : str or Dict[str, str]
        The name of the link function to use. Valid names are `"cloglog"`, `"identity"`,
        `"inverse_squared"`, `"inverse"`, `"log"`, `"logit"`, `"probit"`, and
        `"softmax"`. Not all the link functions can be used with all the families.
        If a dictionary, keys are the names of the target parameters and the values are the names
        of the link functions.

Example 1: estimating the probability of a bias of a coin¶

Let's revisit the biased coin example from Section 5.1, where we want to estimate the probability the coin turns up heads.

Data¶

We have a sample that contains $n=50$ observations (coin tosses) from the coin. The outcome 1 corresponds to heads, while the outcome 0 corresponds to tails. We package the data as the ctoss column of a new data frame ctosses.

In [6]:

Copied!





ctosses = [1,1,0,0,1,0,1,1,1,1,1,0,1,1,0,0,0,1,1,1,
           1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,0,1,0,
           0,1,0,1,0,1,0,1,1,0]
ctosses = pd.DataFrame({"ctoss":ctosses})
ctosses.head(3)
ctosses = [1,1,0,0,1,0,1,1,1,1,1,0,1,1,0,0,0,1,1,1,
           1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,0,0,1,0,
           0,1,0,1,0,1,0,1,1,0]
ctosses = pd.DataFrame({"ctoss":ctosses})
ctosses.head(3)

Out[6]:

	ctoss
0	1
1	1
2	0

The proportion of heads outcomes is:

In [7]:

Copied!

ctosses["ctoss"].mean()
ctosses["ctoss"].mean()

Out[7]:

0.68

Model¶

The model is

$$ C \sim \textrm{Bernoulli}(P), \qquad P \sim \mathcal{U}(0,1). $$

In [8]:

Copied!

import bambi as bmb
bmb.Prior("Uniform", lower=0, upper=1)
import bambi as bmb
bmb.Prior("Uniform", lower=0, upper=1)

Out[8]:

Uniform(lower: 0.0, upper: 1.0)

In [9]:

Copied!





priors1 = {
    "Intercept": bmb.Prior("Uniform", lower=0, upper=1)
    # "Intercept": bmb.Prior("Beta", alpha=1, beta=1),
}

mod1 = bmb.Model(formula="ctoss ~ 1",
                 family="bernoulli",
                 link="identity",
                 priors=priors1,
                 data=ctosses)
mod1.set_alias({"Intercept": "P"})
mod1
priors1 = {
    "Intercept": bmb.Prior("Uniform", lower=0, upper=1)
    # "Intercept": bmb.Prior("Beta", alpha=1, beta=1),
}

mod1 = bmb.Model(formula="ctoss ~ 1",
                 family="bernoulli",
                 link="identity",
                 priors=priors1,
                 data=ctosses)
mod1.set_alias({"Intercept": "P"})
mod1

Out[9]:

       Formula: ctoss ~ 1
        Family: bernoulli
          Link: p = identity
  Observations: 50
        Priors: 
    target = p
        Common-level effects
            Intercept ~ Uniform(lower: 0.0, upper: 1.0)

In [10]:

Copied!





mod1.build()
mod1.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_mod1_graph")
# mod1.graph(name=filename, fmt="png", dpi=300)
mod1.build()
mod1.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example1_mod1_graph")
# mod1.graph(name=filename, fmt="png", dpi=300)

Out[10]:

No description has been provided for this image

In [120]:

Copied!

# mod1.backend.model
# mod1.backend.model

Fitting the model¶

In [12]:

Copied!

idata1 = mod1.fit(random_seed=[42,42,44,45])
idata1 = mod1.fit(random_seed=[42,42,44,45])

Modeling the probability that ctoss==1
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [P]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

Exploring `InferenceData` objects¶

In [14]:

Copied!

type(idata1)
type(idata1)

Out[14]:

arviz.data.inference_data.InferenceData

In [15]:

Copied!

idata1.groups()
idata1.groups()

Out[15]:

['posterior', 'sample_stats', 'observed_data']

In [16]:

Copied!

type(idata1["posterior"])
type(idata1["posterior"])

Out[16]:

xarray.core.dataset.Dataset

In [17]:

Copied!

list(idata1["posterior"].coords)
list(idata1["posterior"].coords)

Out[17]:

['chain', 'draw']

In [18]:

Copied!

list(idata1["posterior"].data_vars)
list(idata1["posterior"].data_vars)

Out[18]:

['P']

In [19]:

Copied!

type(idata1["posterior"]["P"])
type(idata1["posterior"]["P"])

Out[19]:

xarray.core.dataarray.DataArray

In [21]:

Copied!

type(idata1["posterior"]["P"].values)
type(idata1["posterior"]["P"].values)

Out[21]:

numpy.ndarray

In [22]:

Copied!

idata1["posterior"]["P"].values.round(3)
idata1["posterior"]["P"].values.round(3)

Out[22]:

array([[0.763, 0.751, 0.765, ..., 0.634, 0.619, 0.626],
       [0.683, 0.734, 0.743, ..., 0.63 , 0.658, 0.641],
       [0.561, 0.772, 0.725, ..., 0.615, 0.644, 0.697],
       [0.627, 0.691, 0.637, ..., 0.618, 0.629, 0.67 ]])

In [23]:

Copied!

idata1["posterior"]["P"].values.shape
idata1["posterior"]["P"].values.shape

Out[23]:

(4, 1000)

Extracting the samples from the posterior¶

In [24]:

Copied!

postP = idata1["posterior"]["P"].values.flatten()
postP
postP = idata1["posterior"]["P"].values.flatten()
postP

Out[24]:

array([0.7633857 , 0.75063231, 0.76484667, ..., 0.61816546, 0.62925079,
       0.67031565])

Visualize the posterior¶

Histogram¶

In [25]:

Copied!

ax = sns.histplot(x=postP, stat="density")
ax.set_xlabel("$p$")
ax.set_ylabel("$f_{P|\\mathbf{c}}$");
ax = sns.histplot(x=postP, stat="density")
ax.set_xlabel("$p$")
ax.set_ylabel("$f_{P|\\mathbf{c}}$");

Kernel density plot¶

In [26]:

Copied!

ax = sns.kdeplot(x=postP)
ax.set_xlabel("$p$")
ax.set_ylabel("$f_{P|\\mathbf{c}}$");
ax = sns.kdeplot(x=postP)
ax.set_xlabel("$p$")
ax.set_ylabel("$f_{P|\\mathbf{c}}$");

In [27]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1, 2, sharey=True)

    sns.histplot(x=postP, stat="density", ax=axs[0])
    axs[0].set_xlabel("$p$")
    axs[0].set_ylabel("$f_{P|\\mathbf{c}}$");
    axs[0].set_title("(a) Histogram of the posterior density")

    sns.kdeplot(x=postP, ax=axs[1])
    axs[1].set_xlabel("$p$")
    axs[1].set_title("(b) Kernel density plot of the posterior")
    
    filename = os.path.join(DESTDIR, "example1_histplot_and_kdeplot_postP.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1, 2, sharey=True)

    sns.histplot(x=postP, stat="density", ax=axs[0])
    axs[0].set_xlabel("$p$")
    axs[0].set_ylabel("$f_{P|\\mathbf{c}}$");
    axs[0].set_title("(a) Histogram of the posterior density")

    sns.kdeplot(x=postP, ax=axs[1])
    axs[1].set_xlabel("$p$")
    axs[1].set_title("(b) Kernel density plot of the posterior")
    
    filename = os.path.join(DESTDIR, "example1_histplot_and_kdeplot_postP.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example1_histplot_and_kdeplot_postP.pdf
Saved figure to figures/bayes/computations/example1_histplot_and_kdeplot_postP.png

Summarize the posterior¶

In [28]:

Copied!

len(postP)
len(postP)

Out[28]:

Posterior mean¶

In [29]:

Copied!

np.mean(postP)
np.mean(postP)

Out[29]:

0.6711720441074204

Posterior standard deviation¶

In [30]:

Copied!

np.std(postP)
np.std(postP)

Out[30]:

0.06381346718829903

Posterior median¶

In [31]:

Copied!

np.median(postP)
np.median(postP)

Out[31]:

0.6725321643738474

Posterior quartiles¶

In [32]:

Copied!

np.quantile(postP, [0.25, 0.5, 0.75])
np.quantile(postP, [0.25, 0.5, 0.75])

Out[32]:

array([0.62863239, 0.67253216, 0.71606159])

Posterior percentiles¶

In [33]:

Copied!

np.percentile(postP, [3, 97])
np.percentile(postP, [3, 97])

Out[33]:

array([0.54596545, 0.78705472])

Posterior mode¶

In [34]:

Copied!





# 1. Fit a Gaussian KDE curve approx. to the posterior samples `postP`
from scipy.stats import gaussian_kde
postP_kde = gaussian_kde(postP)

# 2. Find the max of the KDE curve
ps = np.linspace(0, 1, 10001)
densityP = postP_kde(ps)
ps[np.argmax(densityP)]
# 1. Fit a Gaussian KDE curve approx. to the posterior samples `postP`
from scipy.stats import gaussian_kde
postP_kde = gaussian_kde(postP)

# 2. Find the max of the KDE curve
ps = np.linspace(0, 1, 10001)
densityP = postP_kde(ps)
ps[np.argmax(densityP)]

Out[34]:

0.6867000000000001

Credible interval¶

In [35]:

Copied!

from ministats import hdi_from_samples
hdi_from_samples(postP, hdi_prob=0.9)
from ministats import hdi_from_samples
hdi_from_samples(postP, hdi_prob=0.9)

Out[35]:

[0.5706123254669888, 0.7792698739095439]

Example 2: estimating the IQ score¶

Let's return to the investigation of the smart drug effects on IQ scores.

Data¶

We have collected data from $30$ individuals who took the smart drug, which we have packaged into the iq column of the data frame iqs.

In [36]:

Copied!





iqs = [ 82.6, 105.5,  96.7,  84.0, 127.2,  98.8,  94.3,
       122.1,  86.8,  86.1, 107.9, 118.9, 116.5, 101.0,
        91.0, 130.0, 155.7, 120.8, 107.9, 117.1, 100.1,
       108.2,  99.8, 103.6, 108.1, 110.3, 101.8, 131.7,
       103.8, 116.4]
iqs = pd.DataFrame({"iq":iqs})
iqs["iq"].mean()
iqs = [ 82.6, 105.5,  96.7,  84.0, 127.2,  98.8,  94.3,
       122.1,  86.8,  86.1, 107.9, 118.9, 116.5, 101.0,
        91.0, 130.0, 155.7, 120.8, 107.9, 117.1, 100.1,
       108.2,  99.8, 103.6, 108.1, 110.3, 101.8, 131.7,
       103.8, 116.4]
iqs = pd.DataFrame({"iq":iqs})
iqs["iq"].mean()

Out[36]:

107.82333333333334

Model¶

We will use the following Bayesian model:

$$ X \sim \mathcal{N}(M, \sigma=15), \qquad M \sim \mathcal{N}(\mu_M=100,\sigma_M=40). $$

We place a broad prior on the mean parameter $M$, by choosing a large value for the standard deviation $\sigma_M$. We assume the IQ scores come from a population with standard deviation $\sigma = 15$.

In [37]:

Copied!





import bambi as bmb

priors2 = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    # "sigma": bmb.Prior("Data", value=15),
    "sigma": 15,  # CHANGE WHEN https://github.com/bambinos/bambi/pull/851/files LANDS
}

mod2 = bmb.Model(formula="iq ~ 1",
                 family="gaussian",
                 link="identity",
                 priors=priors2,
                 data=iqs)

mod2.set_alias({"Intercept": "M"})
mod2
import bambi as bmb

priors2 = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    # "sigma": bmb.Prior("Data", value=15),
    "sigma": 15,  # CHANGE WHEN https://github.com/bambinos/bambi/pull/851/files LANDS
}

mod2 = bmb.Model(formula="iq ~ 1",
                 family="gaussian",
                 link="identity",
                 priors=priors2,
                 data=iqs)

mod2.set_alias({"Intercept": "M"})
mod2

Out[37]:

       Formula: iq ~ 1
        Family: gaussian
          Link: mu = identity
  Observations: 30
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 100.0, sigma: 40.0)
        
        Auxiliary parameters
            sigma ~ 15

In [38]:

Copied!





mod2.build()
mod2.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example2_mod2_graph")
# mod2.graph(name=filename, fmt="png", dpi=300)
mod2.build()
mod2.graph()

# # FIGURES ONLY
# filename = os.path.join(DESTDIR, "example2_mod2_graph")
# mod2.graph(name=filename, fmt="png", dpi=300)

Out[38]:

Fitting the model¶

This time we'll generate $N=2000$ samples from each chain.

In [39]:

Copied!

idata2 = mod2.fit(draws=2000, random_seed=[42,43,44,45])
idata2 = mod2.fit(draws=2000, random_seed=[42,43,44,45])

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M]

Output()

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 1 seconds.

Extract the samples from the posterior¶

In [40]:

Copied!

postM = idata2["posterior"]["M"].values.flatten()
len(postM)
postM = idata2["posterior"]["M"].values.flatten()
len(postM)

Out[40]:

In [41]:

Copied!

postM
postM

Out[41]:

array([109.72320604, 108.77814301, 108.41518482, ..., 113.22965959,
       109.71272604, 102.9694779 ])

Visualize the posterior¶

Histogram¶

In [42]:

Copied!

ax = sns.histplot(x=postM, stat="density")
ax.set_xlabel("$\\mu$")
ax.set_ylabel("$f_{M|\\mathbf{x}}$");
ax = sns.histplot(x=postM, stat="density")
ax.set_xlabel("$\\mu$")
ax.set_ylabel("$f_{M|\\mathbf{x}}$");

Kernel density plot¶

In [43]:

Copied!

ax = sns.kdeplot(x=postM)
ax.set_xlabel("$\\mu$")
ax.set_ylabel("$f_{M|\\mathbf{x}}$");
ax = sns.kdeplot(x=postM)
ax.set_xlabel("$\\mu$")
ax.set_ylabel("$f_{M|\\mathbf{x}}$");

In [44]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1, 2, sharey=True)

    sns.histplot(x=postM, stat="density", ax=axs[0])
    axs[0].set_xlabel("$\\mu$")
    axs[0].set_ylabel("$f_{M|\\mathbf{x}}$");
    axs[0].set_title("(a) Histogram of the posterior density")

    sns.kdeplot(x=postM, ax=axs[1])
    axs[1].set_xlabel("$\\mu$")
    axs[1].set_title("(b) Kernel density plot of the posterior")
    
    filename = os.path.join(DESTDIR, "example2_histplot_and_kdeplot_postM.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1, 2, sharey=True)

    sns.histplot(x=postM, stat="density", ax=axs[0])
    axs[0].set_xlabel("$\\mu$")
    axs[0].set_ylabel("$f_{M|\\mathbf{x}}$");
    axs[0].set_title("(a) Histogram of the posterior density")

    sns.kdeplot(x=postM, ax=axs[1])
    axs[1].set_xlabel("$\\mu$")
    axs[1].set_title("(b) Kernel density plot of the posterior")
    
    filename = os.path.join(DESTDIR, "example2_histplot_and_kdeplot_postM.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_histplot_and_kdeplot_postM.pdf
Saved figure to figures/bayes/computations/example2_histplot_and_kdeplot_postM.png

Summarize the posterior¶

Posterior mean¶

In [45]:

Copied!

np.mean(postM)
np.mean(postM)

Out[45]:

107.70792049114215

Posterior standard deviation¶

In [46]:

Copied!

np.std(postM)
np.std(postM)

Out[46]:

2.747782246509056

Posterior median¶

In [47]:

Copied!

np.median(postM)
np.median(postM)

Out[47]:

107.71896436263302

Posterior quartiles¶

In [48]:

Copied!

np.quantile(postM, [0.25, 0.5, 0.75])
np.quantile(postM, [0.25, 0.5, 0.75])

Out[48]:

array([105.8404028 , 107.71896436, 109.58680961])

Posterior percentiles¶

In [49]:

Copied!

np.percentile(postM, [3, 97])
np.percentile(postM, [3, 97])

Out[49]:

array([102.53098902, 112.85640869])

Posterior mode¶

In [50]:

Copied!





# 1. Fit a Gaussian KDE curve approx. to the posterior samples `postM`
from scipy.stats import gaussian_kde
postM_kde = gaussian_kde(postM)

# 2. Find the max of the KDE curve
mus = np.linspace(0, 200, 10001)
densityM = postM_kde(mus)
mus[np.argmax(densityM)]
# 1. Fit a Gaussian KDE curve approx. to the posterior samples `postM`
from scipy.stats import gaussian_kde
postM_kde = gaussian_kde(postM)

# 2. Find the max of the KDE curve
mus = np.linspace(0, 200, 10001)
densityM = postM_kde(mus)
mus[np.argmax(densityM)]

Out[50]:

107.9

Credible interval¶

In [51]:

Copied!

from ministats import hdi_from_samples
hdi_from_samples(postM, hdi_prob=0.9)
from ministats import hdi_from_samples
hdi_from_samples(postM, hdi_prob=0.9)

Out[51]:

[103.26580730069273, 112.284577803755]

Visualizing and interpreting posteriors¶

In [52]:

Copied!

import arviz as az
import arviz as az

Example 1 continued: inferences about the biased coin¶

Extracting samples (CUTTABLE)¶

For the purpose of our data analysis, we don't care about the chain and draw information and are only interested in in the variable P, which are the actual samples from the posterior. The easiest way to extract the samples from the posterior is to use the ArviZ helper method az.extract shown below.

In [53]:

Copied!

postP = az.extract(idata1, var_names="P").values
postP.round(3)
postP = az.extract(idata1, var_names="P").values
postP.round(3)

Out[53]:

array([0.763, 0.751, 0.765, ..., 0.618, 0.629, 0.67 ])

The call to the function az.extract selected the posterior group, then selected the variable P within the posterior group.

/CUTTABLE

Summarizing the posterior¶

In [54]:

Copied!

print(az.summary(idata1, kind="stats", hdi_prob=0.9))
print(az.summary(idata1, kind="stats", hdi_prob=0.9))

    mean     sd  hdi_5%  hdi_95%
P  0.671  0.064   0.571    0.779

In [55]:

Copied!

print(az.summary(idata1, kind="stats", stat_focus="median"))
print(az.summary(idata1, kind="stats", stat_focus="median"))

   median    mad  eti_3%  eti_97%
P   0.673  0.044   0.546    0.787

Plotting the posterior¶

In [56]:

Copied!

az.plot_posterior(idata1, var_names="P", hdi_prob=0.9);
az.plot_posterior(idata1, var_names="P", hdi_prob=0.9);

In [57]:

Copied!





# FIGURES ONLY
az.plot_posterior(idata1, var_names="P", hdi_prob=0.9, round_to=3);
filename = os.path.join(DESTDIR, "example1_arviz_plot_posterior.pdf")
savefigure(plt.gcf(), filename)
# FIGURES ONLY
az.plot_posterior(idata1, var_names="P", hdi_prob=0.9, round_to=3);
filename = os.path.join(DESTDIR, "example1_arviz_plot_posterior.pdf")
savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example1_arviz_plot_posterior.pdf
Saved figure to figures/bayes/computations/example1_arviz_plot_posterior.png

Options:

If multiple variables, you specify a list to var_names to select only certain variables to plot
Set the option point_estimate to "mode" or "median"
Add the option rope to draw region of practical equivalence, e.g., rope=[97,103]

In [58]:

Copied!

az.plot_forest(idata1, hdi_prob=0.9);
az.plot_forest(idata1, hdi_prob=0.9);

In [59]:

Copied!





# FIGURES ONLY
az.plot_forest(idata1, hdi_prob=0.9, figsize=(6,2.2));
filename = os.path.join(DESTDIR, "example1_arviz_plot_forest.pdf")
savefigure(plt.gcf(), filename)
# FIGURES ONLY
az.plot_forest(idata1, hdi_prob=0.9, figsize=(6,2.2));
filename = os.path.join(DESTDIR, "example1_arviz_plot_forest.pdf")
savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example1_arviz_plot_forest.pdf
Saved figure to figures/bayes/computations/example1_arviz_plot_forest.png

Example 2 continued: inferences about the population mean¶

Summarizing the posterior¶

In [60]:

Copied!

az.summary(idata2, kind="stats", hdi_prob=0.9)
az.summary(idata2, kind="stats", hdi_prob=0.9)

Out[60]:

	mean	sd	hdi_5%	hdi_95%
M	107.708	2.748	103.266	112.285

In [61]:

Copied!

print(az.summary(idata2, kind="stats", stat_focus="median"))
print(az.summary(idata2, kind="stats", stat_focus="median"))

    median    mad   eti_3%  eti_97%
M  107.719  1.871  102.531  112.856

Plotting the posterior¶

In [62]:

Copied!

az.plot_posterior(idata2, var_names="M", hdi_prob=0.9);
az.plot_posterior(idata2, var_names="M", hdi_prob=0.9);

In [63]:

Copied!

az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, kind="hist", bins=70);
az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, kind="hist", bins=70);

In [64]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1,2)
    az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, ax=axs[0]);

    axs[0].set_title("(a) Kernel density plot of the posterior")
    axs[0].set_xlabel("$\\mu$")
    axs[0].xaxis.set_label_coords(0.5, -0.13)

    az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, kind="hist", bins=70, ax=axs[1]);
    axs[1].set_title("(b) Histogram of the posterior density")
    axs[1].set_xlabel("$\\mu$")
    axs[1].xaxis.set_label_coords(0.5, -0.13)

    filename = os.path.join(DESTDIR, "example2_arviz_plot_posterior.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5.7,2)}):
    fig, axs = plt.subplots(1,2)
    az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, ax=axs[0]);

    axs[0].set_title("(a) Kernel density plot of the posterior")
    axs[0].set_xlabel("$\\mu$")
    axs[0].xaxis.set_label_coords(0.5, -0.13)

    az.plot_posterior(idata2, var_names="M", hdi_prob=0.9, kind="hist", bins=70, ax=axs[1]);
    axs[1].set_title("(b) Histogram of the posterior density")
    axs[1].set_xlabel("$\\mu$")
    axs[1].xaxis.set_label_coords(0.5, -0.13)

    filename = os.path.join(DESTDIR, "example2_arviz_plot_posterior.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_arviz_plot_posterior.pdf
Saved figure to figures/bayes/computations/example2_arviz_plot_posterior.png

In [65]:

Copied!

az.plot_forest(idata2, hdi_prob=0.9, combined=True, figsize=(6,1.5));
az.plot_forest(idata2, hdi_prob=0.9, combined=True, figsize=(6,1.5));

In [66]:

Copied!





# FIGURES ONLY
az.plot_forest(idata2, hdi_prob=0.9, combined=True, figsize=(6,1.3))
filename = os.path.join(DESTDIR, "example2_arviz_plot_forest.pdf")
savefigure(plt.gcf(), filename)
# FIGURES ONLY
az.plot_forest(idata2, hdi_prob=0.9, combined=True, figsize=(6,1.3))
filename = os.path.join(DESTDIR, "example2_arviz_plot_forest.pdf")
savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_arviz_plot_forest.pdf
Saved figure to figures/bayes/computations/example2_arviz_plot_forest.png

In [67]:

Copied!

az.plot_forest(idata2, combined=True, hdi_prob=0.9, figsize=(6,0.6));
az.plot_forest(idata2, combined=True, hdi_prob=0.9, figsize=(6,0.6));

In [68]:

Copied!

# TODO: investigate further once predict bug fixed
# az.plot_ppc(idata2_pred)
# TODO: investigate further once predict bug fixed
# az.plot_ppc(idata2_pred)

Explanations¶

Visualizing prior distributions¶

In [69]:

Copied!

mod2.plot_priors(random_seed=43, color="C1", linestyle="dashed");
mod2.plot_priors(random_seed=43, color="C1", linestyle="dashed");

Sampling: [M]

In [70]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5,1.5)}):
    ax = mod2.plot_priors(random_seed=43, color="C1", linestyle="dashed")
    ax.set_title(None)
    # ax.set_xlabel("$\\mu$")
    ax.set_xlabel(None)
    ax.set_ylabel("$f_{M}$")
    
    filename = os.path.join(DESTDIR, "example2_mod2_plot_priors.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5,1.5)}):
    ax = mod2.plot_priors(random_seed=43, color="C1", linestyle="dashed")
    ax.set_title(None)
    # ax.set_xlabel("$\\mu$")
    ax.set_xlabel(None)
    ax.set_ylabel("$f_{M}$")
    
    filename = os.path.join(DESTDIR, "example2_mod2_plot_priors.pdf")
    savefigure(plt.gcf(), filename)

Sampling: [M]

Saved figure to figures/bayes/computations/example2_mod2_plot_priors.pdf
Saved figure to figures/bayes/computations/example2_mod2_plot_priors.png

Bambi default priors¶

In [71]:

Copied!

iqs["iq"].mean(), iqs["iq"].std(ddof=0), 2.5*iqs["iq"].std(ddof=0)
iqs["iq"].mean(), iqs["iq"].std(ddof=0), 2.5*iqs["iq"].std(ddof=0)

Out[71]:

(107.82333333333334, 15.812119472803834, 39.53029868200959)

In [72]:

Copied!

mod2d = bmb.Model("iq ~ 1", family="gaussian", data=iqs)
mod2d
mod2d = bmb.Model("iq ~ 1", family="gaussian", data=iqs)
mod2d

Out[72]:

       Formula: iq ~ 1
        Family: gaussian
          Link: mu = identity
  Observations: 30
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 107.8233, sigma: 39.5303)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 15.8121)

In [73]:

Copied!

mod2d.build()
mod2d.plot_priors();
mod2d.build()
mod2d.plot_priors();

Sampling: [Intercept, sigma]

More about inference data objects¶

In [75]:

Copied!

type(idata1)
type(idata1)

Out[75]:

arviz.data.inference_data.InferenceData

In [76]:

Copied!

idata1.groups()
idata1.groups()

Out[76]:

['posterior', 'sample_stats', 'observed_data']

Groups are `Dataset` objects¶

In [77]:

Copied!

post1 = idata1["posterior"]
type(post1)
post1 = idata1["posterior"]
type(post1)

Out[77]:

xarray.core.dataset.Dataset

In [78]:

Copied!

# post1
# post1

In [79]:

Copied!

post1.coords
post1.coords

Out[79]:

Coordinates:
  * chain    (chain) int64 32B 0 1 2 3
  * draw     (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999

In [80]:

Copied!

post1.data_vars
post1.data_vars

Out[80]:

Data variables:
    P        (chain, draw) float64 32kB 0.7634 0.7506 0.7648 ... 0.6293 0.6703

Variables are `DataArray` objects¶

In [81]:

Copied!

Ps = post1["P"]  # == idata1["posterior"]["P"]
type(Ps)
Ps = post1["P"]  # == idata1["posterior"]["P"]
type(Ps)

Out[81]:

xarray.core.dataarray.DataArray

In [82]:

Copied!

# Ps
# Ps

In [83]:

Copied!

Ps.shape
Ps.shape

Out[83]:

(4, 1000)

In [84]:

Copied!

Ps.name
Ps.name

Out[84]:

'P'

In [85]:

Copied!

Ps.to_dataframe()
Ps.to_dataframe()

Out[85]:

		P
chain	draw
0	0	0.763386
	1	0.750632
	2	0.764847
	3	0.693797
	4	0.737926
...	...	...
3	995	0.647555
	996	0.631131
	997	0.618165
	998	0.629251
	999	0.670316

4000 rows × 1 columns

In [86]:

Copied!

Ps.values
Ps.values

Out[86]:

array([[0.7633857 , 0.75063231, 0.76484667, ..., 0.63406509, 0.61862731,
        0.62594321],
       [0.68252483, 0.7336341 , 0.74339606, ..., 0.63036481, 0.65844829,
        0.64060878],
       [0.56138272, 0.77242737, 0.72531302, ..., 0.61454399, 0.64448805,
        0.69652911],
       [0.62745546, 0.69083019, 0.63706012, ..., 0.61816546, 0.62925079,
        0.67031565]])

In [87]:

Copied!

Ps.values.flatten()
Ps.values.flatten()

Out[87]:

array([0.7633857 , 0.75063231, 0.76484667, ..., 0.61816546, 0.62925079,
       0.67031565])

MCMC diagnostics¶

Trace plots¶

There are several Arviz plots we can use to check if the Markov Chain Monte Carlo chains were sampling from the posterior as expected, or ...

In [88]:

Copied!

az.plot_trace(idata2);
az.plot_trace(idata2);

In [89]:

Copied!

az.plot_rank(idata2);
az.plot_rank(idata2);

In [90]:

Copied!

az.plot_trace(idata2);
az.plot_trace(idata2);

In [91]:

Copied!





# FIGURES ONLY
axs = az.plot_trace(idata2, figsize=(8,2))
ax1, ax2 = axs[0,0], axs[0,1]
#
ax1.set_title("Posterior distribution")
ax1.set_xlabel(None)
ax1.set_xticks(np.arange(95,120,2.5))
# 
ax2.set_title("Trace plot")
ax2.set_xticks(np.arange(0,2001,250))
#
filename = os.path.join(DESTDIR, "example2_diagnostics_trace.pdf")
savefigure(plt.gcf(), filename)
# FIGURES ONLY
axs = az.plot_trace(idata2, figsize=(8,2))
ax1, ax2 = axs[0,0], axs[0,1]
#
ax1.set_title("Posterior distribution")
ax1.set_xlabel(None)
ax1.set_xticks(np.arange(95,120,2.5))
# 
ax2.set_title("Trace plot")
ax2.set_xticks(np.arange(0,2001,250))
#
filename = os.path.join(DESTDIR, "example2_diagnostics_trace.pdf")
savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_diagnostics_trace.pdf
Saved figure to figures/bayes/computations/example2_diagnostics_trace.png

Rank plots¶

In [92]:

Copied!

az.plot_rank(idata2);
az.plot_rank(idata2);

In [93]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5,2.7)}):
    ax = az.plot_rank(idata2)
    ax.set_title(None)
    filename = os.path.join(DESTDIR, "example2_diagnostics_rank.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(5,2.7)}):
    ax = az.plot_rank(idata2)
    ax.set_title(None)
    filename = os.path.join(DESTDIR, "example2_diagnostics_rank.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_diagnostics_rank.pdf
Saved figure to figures/bayes/computations/example2_diagnostics_rank.png

? Autocorrelation plot ?¶

https://python.arviz.org/en/stable/api/generated/arviz.plot_autocorr.html

In [94]:

Copied!

az.plot_autocorr(idata2, combined=True, max_lag=20);
az.plot_autocorr(idata2, combined=True, max_lag=20);

Diagnostic summary¶

In [95]:

Copied!

az.summary(idata2, kind="stats")
az.summary(idata2, kind="stats")

Out[95]:

	mean	sd	hdi_3%	hdi_97%
M	107.708	2.748	102.911	113.174

In [96]:

Copied!

az.summary(idata2, kind="diagnostics")
az.summary(idata2, kind="diagnostics")

Out[96]:

	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
M	0.044	0.031	3854.0	5649.0	1.0

Bayesian workflow¶

Prior predictive checks¶

In [97]:

Copied!

idata2_pri = mod2.prior_predictive(draws=50, random_seed=45)
az.plot_ppc(idata2_pri, group="prior");
idata2_pri = mod2.prior_predictive(draws=50, random_seed=45)
az.plot_ppc(idata2_pri, group="prior");

Sampling: [M, iq]

In [98]:

Copied!

idata2_pri['prior']["mu"].shape
idata2_pri['prior']["mu"].shape

Out[98]:

(1, 50, 30)

In [99]:

Copied!

idata2_pri['prior_predictive']["iq"].shape
idata2_pri['prior_predictive']["iq"].shape

Out[99]:

(1, 50, 30)

In [100]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(4.5,2.3)}):
    ax = az.plot_ppc(idata2_pri, group="prior", alpha=0.3);
    ax.set_xlabel(None)
    filename = os.path.join(DESTDIR, "example2_prior_predicive_checks.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(4.5,2.3)}):
    ax = az.plot_ppc(idata2_pri, group="prior", alpha=0.3);
    ax.set_xlabel(None)
    filename = os.path.join(DESTDIR, "example2_prior_predicive_checks.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_prior_predicive_checks.pdf
Saved figure to figures/bayes/computations/example2_prior_predicive_checks.png

Posterior predictive checks¶

In [101]:

Copied!





# Randomly sample 200 from all available posterior mean
# draws_subset = np.random.choice(idata2["posterior"]["draw"].values, 50, replace=False)
draws_subset = np.random.choice(2000, 50, replace=False)
idata2_rep = idata2.sel(draw=draws_subset)
mod2.predict(idata2_rep, kind="response")
az.plot_ppc(idata2_rep, group="posterior")
# Randomly sample 200 from all available posterior mean
# draws_subset = np.random.choice(idata2["posterior"]["draw"].values, 50, replace=False)
draws_subset = np.random.choice(2000, 50, replace=False)
idata2_rep = idata2.sel(draw=draws_subset)
mod2.predict(idata2_rep, kind="response")
az.plot_ppc(idata2_rep, group="posterior")

Out[101]:

<Axes: xlabel='iq'>

In [102]:

Copied!





# FIGURES ONLY
import pymc as pm

np.random.seed(42)
selected_draws = np.random.choice(idata2["posterior"]["draw"].values, 50, replace=False)
idata2_rep = idata2.sel(draw=selected_draws)


# Sample from posterior predictive using a fixed seed to ensure repeatability 
with mod2.backend.model:
    idata2_rep = pm.sample_posterior_predictive(idata2_rep, random_seed=57)

# plot
with plt.rc_context({"figure.figsize":(4.5,2.3)}):
    ax = az.plot_ppc(idata2_rep, group="posterior", random_seed=45);
    ax.set_xlabel(None)
    filename = os.path.join(DESTDIR, "example2_posterior_predicive_checks.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
import pymc as pm

np.random.seed(42)
selected_draws = np.random.choice(idata2["posterior"]["draw"].values, 50, replace=False)
idata2_rep = idata2.sel(draw=selected_draws)


# Sample from posterior predictive using a fixed seed to ensure repeatability 
with mod2.backend.model:
    idata2_rep = pm.sample_posterior_predictive(idata2_rep, random_seed=57)

# plot
with plt.rc_context({"figure.figsize":(4.5,2.3)}):
    ax = az.plot_ppc(idata2_rep, group="posterior", random_seed=45);
    ax.set_xlabel(None)
    filename = os.path.join(DESTDIR, "example2_posterior_predicive_checks.pdf")
    savefigure(plt.gcf(), filename)

Sampling: [iq]

Output()

Saved figure to figures/bayes/computations/example2_posterior_predicive_checks.pdf
Saved figure to figures/bayes/computations/example2_posterior_predicive_checks.png

Fitting model to synthetic data¶

In [103]:

Copied!





np.random.seed(46)
from scipy.stats import norm
fakeiqs = norm(loc=110, scale=15).rvs(30)
fakeiqs = pd.DataFrame({"iq":fakeiqs})
fakeiqs["iq"].mean()
np.random.seed(46)
from scipy.stats import norm
fakeiqs = norm(loc=110, scale=15).rvs(30)
fakeiqs = pd.DataFrame({"iq":fakeiqs})
fakeiqs["iq"].mean()

Out[103]:

109.7132680142492

In [104]:

Copied!





#######################################################
mod2f = bmb.Model(formula="iq ~ 1", family="gaussian",
                  priors=priors2, data=fakeiqs)
mod2f.set_alias({"Intercept": "M"})
idata2f = mod2f.fit()
#######################################################
mod2f = bmb.Model(formula="iq ~ 1", family="gaussian",
                  priors=priors2, data=fakeiqs)
mod2f.set_alias({"Intercept": "M"})
idata2f = mod2f.fit()

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [M]

Output()

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1 seconds.

In [105]:

Copied!

print(az.summary(idata2f, kind="stats"))
print(az.summary(idata2f, kind="stats"))

      mean     sd   hdi_3%  hdi_97%
M  109.584  2.691  104.812   114.97

In [106]:

Copied!

az.plot_posterior(idata2f, ref_val=110);
az.plot_posterior(idata2f, ref_val=110);

In [107]:

Copied!





# FIGURES ONLY
with plt.rc_context({"figure.figsize":(4.5,1.6)}):
    ax = az.plot_posterior(idata2f, ref_val=110)
    ax.set_title(None)
    filename = os.path.join(DESTDIR, "example2_mod2f_synthetic_data_posterior.pdf")
    savefigure(plt.gcf(), filename)
# FIGURES ONLY
with plt.rc_context({"figure.figsize":(4.5,1.6)}):
    ax = az.plot_posterior(idata2f, ref_val=110)
    ax.set_title(None)
    filename = os.path.join(DESTDIR, "example2_mod2f_synthetic_data_posterior.pdf")
    savefigure(plt.gcf(), filename)

Saved figure to figures/bayes/computations/example2_mod2f_synthetic_data_posterior.pdf
Saved figure to figures/bayes/computations/example2_mod2f_synthetic_data_posterior.png

Sensitivity analysis¶

In [108]:

Copied!

# TODO: reproduce sensitivity analysis from Secion 5.1
# TODO: reproduce sensitivity analysis from Secion 5.1

Iterative model building¶

Discussion¶

Other types of priors¶

Conjugate priors¶

Uninformative priors¶

Software for Bayesian inference¶

Computational cost¶

Reporting Bayesian results¶

Model predictions (BONUS TOPIC)¶

Coin toss predictions¶

Let's generate some observations form the posterior predictive distribution.

In [109]:

Copied!





idata1_pred = idata1.copy()
# MAYBE: alt notation _rep instead of _pred ?
mod1.predict(idata1_pred, kind="response")
ctosses_pred = az.extract(idata1_pred,
                          group="posterior_predictive",
                          var_names=["ctoss"]).values
numheads_pred = ctosses_pred.sum(axis=0)

sns.histplot(x=numheads_pred, bins=range(15,50));
idata1_pred = idata1.copy()
# MAYBE: alt notation _rep instead of _pred ?
mod1.predict(idata1_pred, kind="response")
ctosses_pred = az.extract(idata1_pred,
                          group="posterior_predictive",
                          var_names=["ctoss"]).values
numheads_pred = ctosses_pred.sum(axis=0)

sns.histplot(x=numheads_pred, bins=range(15,50));

IQ predictions¶

In [110]:

Copied!





## Can't use Bambi predict because of issue
## https://github.com/bambinos/bambi/issues/850
# idata2_pred = mod2.predict(idata2, kind="response", inplace=False)
# preds2 = az.extract(idata2_pred, group="posterior", var_names=["mu"]).values.flatten()
## Can't use Bambi predict because of issue
## https://github.com/bambinos/bambi/issues/850
# idata2_pred = mod2.predict(idata2, kind="response", inplace=False)
# preds2 = az.extract(idata2_pred, group="posterior", var_names=["mu"]).values.flatten()

In [111]:

Copied!





from scipy.stats import norm

sigma = 15  # known population standard deviation

iq_preds = []
np.random.seed(42)
for i in range(1000):
    mu_post = np.random.choice(postM)
    iq_pred = norm(loc=mu_post, scale=sigma).rvs(1)[0]
    iq_preds.append(iq_pred)

sns.histplot(x=iq_preds);
from scipy.stats import norm

sigma = 15  # known population standard deviation

iq_preds = []
np.random.seed(42)
for i in range(1000):
    mu_post = np.random.choice(postM)
    iq_pred = norm(loc=mu_post, scale=sigma).rvs(1)[0]
    iq_preds.append(iq_pred)

sns.histplot(x=iq_preds);

Exercises¶

TODO: repeat exercises form Sec 5.1 using Bambi and ArviZ

Exercise A¶

Reproduce the analysis of the IQ scores data from Example 2, but this time use a model with Bambi default priors.

a) Plot the priors using plot_priors.
b) Compute the posterior mean, std, and a 90% hdi.
c) Compare your result in (b) to the results we obtained from Example 2.

In [112]:

Copied!





mod2d = bmb.Model("iq ~ 1", family="gaussian", data=iqs)
mod2d.set_alias({"Intercept": "M"})
mod2d.build()
mod2d.plot_priors();
mod2d = bmb.Model("iq ~ 1", family="gaussian", data=iqs)
mod2d.set_alias({"Intercept": "M"})
mod2d.build()
mod2d.plot_priors();

Sampling: [M, sigma]

In [113]:

Copied!

idata2d = mod2d.fit(draws=2000)
az.summary(idata2d, var_names="M", kind="stats", hdi_prob=0.9)
idata2d = mod2d.fit(draws=2000)
az.summary(idata2d, var_names="M", kind="stats", hdi_prob=0.9)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sigma, M]

Output()

Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 1 seconds.

Out[113]:

	mean	sd	hdi_5%	hdi_95%
M	107.802	3.017	102.891	112.816

In [114]:

Copied!

az.summary(idata2, kind="stats", hdi_prob=0.9)
az.summary(idata2, kind="stats", hdi_prob=0.9)

Out[114]:

	mean	sd	hdi_5%	hdi_95%
M	107.708	2.748	103.266	112.285

In [115]:

Copied!

az.plot_forest(data=[idata2, idata2d],
               model_names=["Example 2 prior", "Default priors"],
               var_names="M", combined=True, figsize=(7,2));
az.plot_forest(data=[idata2, idata2d],
               model_names=["Example 2 prior", "Default priors"],
               var_names="M", combined=True, figsize=(7,2));

Exercise B¶

Repeat the analysis in Example 2, but this time use the exponential prior $\text{Expon}(\lambda=0.1)$ on the parameter sigma. Calculate:

a) Plot the priors using plot_priors.
b) Compute posterior mean of M and sigma.
c) Compute the posterior median of M and sigma.
d) Compute 90% credible interval for M and sigma.

In [116]:

Copied!





import bambi as bmb
priors2e = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    "sigma": bmb.Prior("Exponential", lam=0.1),
}

mod2e = bmb.Model(formula="iq ~ 1",
                  family="gaussian",
                  link="identity",
                  priors=priors2e,
                  data=iqs)
mod2e.set_alias({"Intercept": "M"})
import bambi as bmb
priors2e = {
    "Intercept": bmb.Prior("Normal", mu=100, sigma=40),
    "sigma": bmb.Prior("Exponential", lam=0.1),
}

mod2e = bmb.Model(formula="iq ~ 1",
                  family="gaussian",
                  link="identity",
                  priors=priors2e,
                  data=iqs)
mod2e.set_alias({"Intercept": "M"})

In [117]:

Copied!

mod2e.build()
mod2e.plot_priors();
mod2e.build()
mod2e.plot_priors();

Sampling: [M, sigma]

In [118]:

Copied!

# idata2e = mod2e.fit()
# az.plot_posterior(idata2e)
# idata2e = mod2e.fit()
# az.plot_posterior(idata2e)

In [119]:

Copied!





# JOINT PLOT
# M_samples = idata2e.posterior['M'].values.flatten()
# sigma_samples = idata2e.posterior['sigma'].values.flatten()
# sns.kdeplot(x=M_samples, y=sigma_samples, fill=True)
# plt.xlabel('M')
# plt.ylabel('Sigma')
# plt.title('Density Plot of Mean vs Sigma')
# JOINT PLOT
# M_samples = idata2e.posterior['M'].values.flatten()
# sigma_samples = idata2e.posterior['sigma'].values.flatten()
# sns.kdeplot(x=M_samples, y=sigma_samples, fill=True)
# plt.xlabel('M')
# plt.ylabel('Sigma')
# plt.title('Density Plot of Mean vs Sigma')

Exercise C¶

In [ ]:

Links¶

TODO

In [ ]:

Section 5.2 — Bayesian inference computations¶

Notebook setup¶

Definitions¶

Posterior inference using MCMC estimation¶

Bayesian inference using Bambi¶

Example 1: estimating the probability of a bias of a coin¶

Data¶

Model¶

Fitting the model¶

Exploring InferenceData objects¶

Extracting the samples from the posterior¶

Visualize the posterior¶

Histogram¶

Kernel density plot¶

Summarize the posterior¶

Posterior mean¶

Posterior standard deviation¶

Posterior median¶

Posterior quartiles¶

Posterior percentiles¶

Posterior mode¶

Credible interval¶

Example 2: estimating the IQ score¶

Data¶

Model¶

Fitting the model¶

Extract the samples from the posterior¶

Visualize the posterior¶

Histogram¶

Kernel density plot¶

Summarize the posterior¶

Posterior mean¶

Posterior standard deviation¶

Posterior median¶

Posterior quartiles¶

Posterior percentiles¶

Posterior mode¶

Credible interval¶

Visualizing and interpreting posteriors¶

Example 1 continued: inferences about the biased coin¶

Extracting samples (CUTTABLE)¶

Summarizing the posterior¶

Plotting the posterior¶

Example 2 continued: inferences about the population mean¶

Summarizing the posterior¶

Plotting the posterior¶

Explanations¶

Visualizing prior distributions¶

Bambi default priors¶

More about inference data objects¶

Groups are Dataset objects¶

Variables are DataArray objects¶

MCMC diagnostics¶

Trace plots¶

Rank plots¶

? Autocorrelation plot ?¶

Diagnostic summary¶

Bayesian workflow¶

Prior predictive checks¶

Posterior predictive checks¶

Fitting model to synthetic data¶

Sensitivity analysis¶

Iterative model building¶

Discussion¶

Other types of priors¶

Conjugate priors¶

Uninformative priors¶

Software for Bayesian inference¶

Computational cost¶

Reporting Bayesian results¶

Model predictions (BONUS TOPIC)¶

Coin toss predictions¶

IQ predictions¶

Exercises¶

Exercise A¶

Exercise B¶

Exercise C¶

Links¶

Exploring `InferenceData` objects¶

Groups are `Dataset` objects¶

Variables are `DataArray` objects¶