mars


mars (model averaging by resampling stochastically) is a module for creating a bootstrap distribution of a model-averaged quantity, taking resampled (i.e., bootstrapped or jackknifed) data from individual models as input.

But why?

Why bootstrap?

The bootstrap (as well as its linear approximation, the jackknife) is a robust procedure for estimating the distribution of a statistic of interest. In particular, the bootstrap performs well in many circumstances even when the target distribution is unknown, and as a bonus, it is relatively simple to implement.

To motivate the bootstrap, suppose we would like to understand the distribution of $\theta = s(X)$, where $s(\cdot)$ is a function or procedure of a random variable $X$. Normally we do not have access to the full distribution of $X$ but instead a set of sample data, which forms the empirical distribution $\hat X = (x_1, \dots, x_N)$. The plug-in principle tells us that the best estimate for $s(X)$ is $s(\hat X)$. However, the plug-in estimate yields only the expected value of $s(X)$, not its uncertainty.

Of course, we could sidestep this issue by collecting more samples (or partitioning our sample into sets), thereby estimating the distribution of $\theta$ from multiple plug-in estimates $s(\hat X_1), \dots, s(\hat X_N)$. But this is expensive (or, in the case of partitioning, comes at the cost of precision). Instead, we create empirical distributions of the empirical distribution by bootstrapping. A bootstrap resample is formed by drawing $N$ values from $\hat X$ with replacement, $x^\ast = (x_1^\ast, \dots, x_N^\ast)$, which allows us to form a bootstrap replication $\theta^\ast = s(x^\ast)$.

By forming $M$ bootstrap resamples, the distribution of $\theta$ can then be estimated using the $M$ bootstrap replications $\theta^\ast_1, \dots, \theta^\ast_M$.
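
For concreteness, here is a minimal sketch of this procedure in plain numpy (not mars); the statistic $s$ (here the median), the sample size, and the number of resamples $M$ are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

# sample data forming the empirical distribution x_hat
x_hat = rng.normal(loc=0.0, scale=1.0, size=100)

def s(x):
    # statistic of interest; the median, as an example
    return np.median(x)

# form M bootstrap replications theta*_1, ..., theta*_M
M = 1000
replications = np.array([
    s(rng.choice(x_hat, size=len(x_hat), replace=True))  # draw N values with replacement
    for _ in range(M)
])

# the spread of the replications estimates the uncertainty of s(X)
print(replications.mean(), replications.std())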

Why model average?

Suppose our random variable $X$ has multiple valid descriptions by models $M_1, \dots, M_K$. If one model is clearly superior to the others (e.g., as determined by a reduced $\chi^2$), then we might choose a single model to represent our knowledge of $X$, a procedure known as model selection. However, if it is difficult to distinguish between the validity of multiple models, then choosing a single model introduces a source of bias, as any single chosen model will not capture the features of the others.

The usual procedure for model averaging proceeds as follows: given a set of models, our best guess for $X$ is just the weighted average

\[\mathbb E[X] = \sum_M p(M) \, \mathbb E[X|M]\]

where $X|M$ is the distribution for $X$ conditioned on model $M$. From the law of total variance, we can then compute

\[\text{Var }X = \sum_M p(M) \text{Var } [X | M] + \sum_M p(M) \, \mathbb E[X | M]^2 - \left(\sum_M p(M) \, \mathbb E[X | M]\right)^2\]

The first term is a weighted average of the variances of the individual models, hence measuring the statistical variance, while the remaining terms describe the variance of the hypothetical means of each model, thereby measuring the spread of the models and hence the systematic error from the choice of model. This formula naturally generalizes to the case of computing the covariance between two variables, albeit with some subtlety pertaining to what constitutes a model.
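
Since the decomposition only involves per-model summaries, it is straightforward to evaluate directly. Here is a small numpy check, with weights, means, and variances chosen to match the two-Gaussian example below:

import numpy as np

# per-model weights p(M), means E[X|M], and variances Var[X|M]
p = np.array([0.5, 0.5])
mean = np.array([-4.0, 4.0])
var = np.array([3.0**2, 3.0**2])

# model-averaged mean: E[X] = sum_M p(M) E[X|M]
avg_mean = np.sum(p * mean)

# law of total variance: statistical piece plus the spread of the model means
stat_var = np.sum(p * var)
syst_var = np.sum(p * mean**2) - avg_mean**2

print(avg_mean, stat_var + syst_var)  # 0.0 and 9.0 + 16.0 = 25.0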

In some sense, “model averaging” is a bit of a misnomer – what we are really doing is marginalizing over the conditional distributions $X|M_1, \dots, X|M_K$, i.e., the choice of model. That is, there is more to model averaging than just computing the first and second moments of the distribution; in principle, one could also compute the higher-order moments via the law of total cumulance.

Explicit decompositions for the higher-order moments are tedious and luckily unnecessary; we are bootstrapping. Using mars, we can visualize the marginal distribution given two models $X|M_1 \sim N(-4, 3^2)$ and $X|M_2 \sim N(4, 3^2)$.

import mars

rng = mars.utils.RNG('1234')

# generate some fake bootstrap resamples
x_given_m1 = rng.normal(-4, 3, 1000)
x_given_m2 = rng.normal(4, 3, 1000)

# create individual models for each set of data,
# use icvals = 1 for both to give models equal weight
mdl_m1 = mars.Model(dstat=x_given_m1, mu=-4, icvals=1)
mdl_m2 = mars.Model(dstat=x_given_m2, mu=+4, icvals=1)

# marginalize stochastically
bma = mars.BMA(models=[mdl_m1, mdl_m2], seed='example')

# plot result
bma.plot_histogram(show_gaussian=True)
[Figure: the mixture of two Gaussians, $X|M_1 \sim N(-4, 3^2)$ and $X|M_2 \sim N(4, 3^2)$]

Before proceeding, let us remark on some of the features in this plot:

  1. The light-blue, filled-in histogram is the weighted sum of the individual histograms for $X|M_1$ and $X|M_2$.
  2. The black, overlaid histogram is an approximation of the marginal distribution of $X$, formed by stochastically sampling from $X|M_1$ and $X|M_2$ per their weights (as will be further explained later; a sketch of the idea follows this list).
  3. The teal curve is the approximating Gaussian, formed from the expectation value and standard deviation of the stochastically-estimated $X$. Note that this prediction (as shown in the legend) agrees with the prediction from the equations above.
  4. The vertical bands show the full uncertainty as well as the statistical uncertainty.
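
The two-step draw behind point 2 is easy to sketch in plain numpy. The following is an illustration of the idea with equal weights assumed, not mars's actual implementation:

import numpy as np

rng = np.random.default_rng(0)

# fake bootstrap replications from each model, as above
x_given_m1 = rng.normal(-4, 3, 1000)
x_given_m2 = rng.normal(4, 3, 1000)
draws = [x_given_m1, x_given_m2]
weights = np.array([0.5, 0.5])  # model weights p(M); equal here

# for each marginal sample, first pick a model per its weight,
# then pick one of that model's bootstrap replications
which = rng.choice(len(draws), size=1000, p=weights)
marginal = np.array([rng.choice(draws[m]) for m in which])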

While the predictions for the variance and mean match the formulae given above, it is clear that the marginal distribution is not Gaussian. Indeed, when model averaging, it is very rarely the case that the marginal distribution will be!

If desired, we can now compute, e.g., the kurtosis of the marginal distribution:

import scipy.stats
scipy.stats.kurtosis(bma.posterior[1:]) # exclude 0th entry, corresponding to mu
# = -0.77..., with some error from stochastic resampling
