Nolan Miller

Hello! I’m Nolan, a practitioner of lattice quantum chromodynamics. I obtained my doctorate from the University of North Carolina and currently conduct research as a postdoc at the Helmholtz Institute.

MixMatch

Estimate an approximating Gaussian mixture for observed data that maximizes the expected log pointwise predictive density.

Why?

Given data $X$ drawn from an arbitrary distribution $\mathcal D = p(X|\theta)$, we are often interested in the inverse problem of parameterizing $\mathcal D$. If we already know a priori what the underlying distribution should be, it is relatively straightforward to apply techniques like maximum likelihood estimation to determine the best parameters of $\mathcal D$. Often, however, we are unsure what the data-generating distribution is. In that case we must make some assumptions about what the candidate models could be and then evaluate the quality of these models using an information criterion.
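As a minimal illustration of the known-family case, suppose (our assumption; the text names no specific family) that $\mathcal D$ is a single Gaussian $\mathcal N(\mu, \sigma)$. Maximizing the log-likelihood then has a closed-form solution: the sample mean and the $1/n$ (not $1/(n-1)$) standard deviation. The helpers `gaussian_mle` and `gaussian_log_lik` below are hypothetical names used only for this sketch, not part of MixMatch.

```python
import numpy as np

def gaussian_mle(x):
    """Closed-form MLE for N(mu, sigma): sample mean and the 1/n (ddof=0) standard deviation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = np.sqrt(np.mean((x - mu) ** 2))
    return mu, sigma

def gaussian_log_lik(x, mu, sigma):
    """Total log-likelihood of the data under N(mu, sigma)."""
    x = np.asarray(x, dtype=float)
    return np.sum(-0.5 * ((x - mu) / sigma) ** 2 - np.log(np.sqrt(2 * np.pi) * sigma))

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, 10_000)  # synthetic data with known parameters
mu_hat, sigma_hat = gaussian_mle(x)
print(mu_hat, sigma_hat)
```

Nudging either estimate away from its MLE value can only lower `gaussian_log_lik`, which is what "best parameters" means here.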

In this package, we use a Gaussian mixture model (that is, a weighted sum of Gaussians) to estimate the posterior predictive distribution $\widetilde{\mathcal D}(x | X) = \int_\theta d\theta \, p(x|\theta) p(\theta | X)$. The optimal number of components is estimated using leave-one-out cross-validation.
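In practice the integral over $\theta$ is replaced by an average over posterior draws. With $S$ MCMC samples $\theta^{(s)} = \{w_k^{(s)}, \mu_k^{(s)}, \sigma_k^{(s)}\}$ and $p(x|\theta) = \sum_k w_k \mathcal N(x | \mu_k, \sigma_k)$, we have $\widetilde{\mathcal D}(x | X) \approx \frac 1S \sum_{s=1}^S p(x|\theta^{(s)})$. The NumPy sketch below computes that average; the function names, the `(S, K)` array layout, and the posterior draws are all our own assumptions for illustration, not MixMatch internals or output.

```python
import numpy as np

def mixture_pdf(x, weights, means, sigmas):
    """Density of sum_k w_k N(x | mu_k, sigma_k) for one set of K component parameters."""
    x = np.asarray(x, dtype=float)[..., None]  # trailing axis broadcasts over components
    comps = np.exp(-0.5 * ((x - means) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas)
    return np.sum(weights * comps, axis=-1)

def posterior_predictive_pdf(x, weight_draws, mean_draws, sigma_draws):
    """Monte Carlo estimate of the posterior predictive density.

    Each *_draws array has shape (S, K): S posterior draws of K component parameters.
    The integral over theta becomes an average of the mixture density over the draws.
    """
    densities = [
        mixture_pdf(x, w, m, s)
        for w, m, s in zip(weight_draws, mean_draws, sigma_draws)
    ]
    return np.mean(densities, axis=0)

# Hypothetical posterior draws for a 2-component fit (made up, not MixMatch output)
rng = np.random.default_rng(0)
S = 200
weight_draws = rng.dirichlet([50.0, 50.0], size=S)
mean_draws = rng.normal([-5.0, 5.0], 0.1, size=(S, 2))
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=(S, 2)))

grid = np.linspace(-15, 15, 3001)
density = posterior_predictive_pdf(grid, weight_draws, mean_draws, sigma_draws)
print("integral over grid:", density.sum() * (grid[1] - grid[0]))
```

Because each draw's mixture is a normalized density, so is their average; the printed integral is a quick sanity check of that.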

This work is implemented in PyMC. Thus, rather than relying on the expectation-maximization algorithm, we estimate $\widetilde{\mathcal D}(x | X)$ through a fully Bayesian, MCMC-based approach.

Examples

Data-generating distribution is a Gaussian mixture

Suppose we have data generated from the Gaussian mixture $\mathcal D = \frac 12 \mathcal N(-5, 1) + \frac 12 \mathcal N(5, 1)$.

```python
import numpy as np

# 100 draws from each component, N(-5, 1) and N(5, 1)
data = np.hstack([np.random.normal(-5, 1, 100), np.random.normal(5, 1, 100)])
```

To estimate the posterior predictive distribution, we need only specify the minimum and maximum number of components in our Gaussian mixture. The library fits a mixture with each number of components $N_\text{min}, \dots, N_\text{max}$ and returns the model that maximizes the expected log pointwise predictive density.

```python
from mixmatch import MixMatch

mix = MixMatch(data=data, min_components=1, max_components=4)
print(mix)

# Returns:
# Best approximation [2 components]:
#  0.50 [-4.90(0.96)] + 0.50 [4.8(1.0)]
```

This result agrees with our expectations.

Finally, we can also compare the performance of the different models, and compare the highest-weight model against the data.
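The number of components is chosen by leave-one-out cross-validation on the expected log pointwise predictive density (ELPD). For PyMC models this is commonly computed with ArviZ (`az.loo`, `az.compare`), which use Pareto-smoothed importance sampling (PSIS); we have not confirmed that MixMatch does exactly this. The sketch below uses the simpler, unsmoothed importance-sampling estimator $\log p(x_i | X_{-i}) \approx -\log \frac 1S \sum_{s=1}^S \frac{1}{p(x_i|\theta^{(s)})}$, which can be unstable when a few draws dominate (the problem PSIS addresses). All function names are hypothetical and the posterior draws are made up rather than taken from an MCMC run.

```python
import numpy as np

def logsumexp(a, axis=0):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def elpd_is_loo(log_lik):
    """Unsmoothed importance-sampling LOO estimate of the ELPD.

    log_lik[s, i] = log p(x_i | theta_s) for S posterior draws and n data points.
    p(x_i | X_{-i}) is approximated by the harmonic mean of p(x_i | theta_s) over draws.
    """
    S = log_lik.shape[0]
    loo_i = np.log(S) - logsumexp(-log_lik, axis=0)  # one LOO log density per point
    return loo_i.sum()

def mixture_log_lik(x, weights, means, sigmas):
    """(S, n) matrix of log p(x_i | theta_s); weights/means/sigmas have shape (S, K)."""
    z = (x[None, :, None] - means[:, None, :]) / sigmas[:, None, :]
    log_comp = -0.5 * z**2 - np.log(np.sqrt(2 * np.pi) * sigmas[:, None, :])
    return logsumexp(np.log(weights)[:, None, :] + log_comp, axis=2)

def select_n_components(x, draws_by_n):
    """Return the component count with the highest estimated ELPD, plus all scores."""
    scores = {n: elpd_is_loo(mixture_log_lik(x, *d)) for n, d in draws_by_n.items()}
    return max(scores, key=scores.get), scores

rng = np.random.default_rng(0)
x = np.hstack([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])

# Made-up posterior draws (S = 100 per model) standing in for MCMC output
S = 100
draws_by_n = {
    1: (np.ones((S, 1)), rng.normal(0.0, 0.3, (S, 1)), rng.normal(5.1, 0.2, (S, 1))),
    2: (np.full((S, 2), 0.5), rng.normal([-5.0, 5.0], 0.1, (S, 2)), rng.normal(1.0, 0.05, (S, 2))),
}
best, scores = select_n_components(x, draws_by_n)
print(best, scores)
```

By the harmonic-mean/arithmetic-mean inequality, this LOO estimate can never exceed the in-sample log pointwise predictive density, which is the optimism that cross-validation is meant to remove.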

We note that although there appears to be some discrepancy between the observed data and the posterior predictive distribution, this apparent discrepancy is a consequence of using kernel density estimation to plot the data; when the true data-generating distribution is plotted instead, the agreement is visually much better.
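To see how a kernel density estimate can distort this data set, compare it with the true density $\frac 12 \mathcal N(-5, 1) + \frac 12 \mathcal N(5, 1)$. The text does not say which KDE settings were used for the plot; we assume a Gaussian kernel with Scott's-rule bandwidth $h = \hat\sigma\, n^{-1/5}$, the default in `scipy.stats.gaussian_kde`. For well-separated bimodal data, $\hat\sigma$ is dominated by the gap between the modes ($\hat\sigma \approx 5$ here), so $h$ comes out near 1.8, wider than either component. The KDE therefore flattens both peaks and puts spurious mass in the gap. `gaussian_kde_scott` is a hypothetical helper written for this sketch.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def gaussian_kde_scott(data, x):
    """Gaussian KDE evaluated at x, with Scott's-rule bandwidth h = std(data) * n**(-1/5).

    Returns (density, h).
    """
    data = np.asarray(data, dtype=float)
    h = data.std(ddof=1) * data.size ** (-1 / 5)
    x = np.asarray(x, dtype=float)[..., None]  # broadcast evaluation points against data
    return normal_pdf(x, data, h).mean(axis=-1), h

rng = np.random.default_rng(1)
data = np.hstack([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)])

x = np.array([-5.0, 0.0, 5.0])  # the two modes and the gap between them
true_density = 0.5 * normal_pdf(x, -5, 1) + 0.5 * normal_pdf(x, 5, 1)
kde_density, h = gaussian_kde_scott(data, x)

print(f"bandwidth h = {h:.2f}")
for xi, t, k in zip(x, true_density, kde_density):
    print(f"x = {xi:+.0f}: true {t:.4f}, KDE {k:.4f}")
```

Passing a smaller bandwidth (for example `bw_method` in SciPy) reduces the effect, at the cost of a noisier curve.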

Data-generating distribution is a uniform distribution

Next we consider data drawn from the uniform distribution $\mathcal U(0, 1)$. Unlike in the previous example, the data-generating and posterior predictive distributions have different supports: $\mathcal U(0, 1)$ is supported only on $[0, 1]$, whereas any Gaussian mixture has support on the entire real line. Nevertheless, as we add components, we should generally expect our mixture model to improve.

```python
import numpy as np
data = np.random.uniform(0, 1, 1000)
```

Considering up to 6 components, we find

```python
from mixmatch import MixMatch

mix = MixMatch(data=data, min_components=1, max_components=6)
print(mix)

# Returns:
# Best approximation [6 components]:
#   0.08 [0.053(0.034)] + 0.18 [0.189(0.079)] + 0.33 [0.42(0.13)] + 0.25 [0.68(0.11)] + 0.11 [0.868(0.055)] + 0.05 [0.968(0.022)]
```

As expected, adding more components causes the model to improve (albeit with diminishing returns).
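One way to quantify the support mismatch is to compute how much probability a mixture places outside $[0, 1]$, where $\mathcal U(0, 1)$ has none. The sketch below does this with the Gaussian CDF (via `math.erfc`) for two cases: a single Gaussian moment-matched to $\mathcal U(0, 1)$ (mean $1/2$, variance $1/12$; a reference point we chose, not MixMatch's 1-component fit) and the 6-component result printed above. Reading each term `w [m(s)]` as weight, mean, and standard deviation is our assumption, although it is consistent with the 2-component example, where the parenthesized values are close to the true width of 1. The single Gaussian leaks roughly 8% of its mass outside the unit interval; the 6-component fit leaks only about 1%.

```python
import math

def gaussian_mass_outside(mu, sigma, lo=0.0, hi=1.0):
    """P(X < lo) + P(X > hi) for X ~ N(mu, sigma^2); erfc keeps small tails accurate."""
    below = 0.5 * math.erfc((mu - lo) / (sigma * math.sqrt(2)))
    above = 0.5 * math.erfc((hi - mu) / (sigma * math.sqrt(2)))
    return below + above

def mixture_mass_outside(components, lo=0.0, hi=1.0):
    """Weighted sum of the outside mass over (weight, mean, sigma) triples."""
    return sum(w * gaussian_mass_outside(mu, s, lo, hi) for w, mu, s in components)

# Reference point: one Gaussian with the mean and variance of U(0, 1)
single = [(1.0, 0.5, math.sqrt(1 / 12))]

# The 6-component result above, read as (weight, mean, sigma); this reading is an assumption
six = [
    (0.08, 0.053, 0.034),
    (0.18, 0.189, 0.079),
    (0.33, 0.42, 0.13),
    (0.25, 0.68, 0.11),
    (0.11, 0.868, 0.055),
    (0.05, 0.968, 0.022),
]

print(f"single Gaussian: {mixture_mass_outside(single):.4f}")
print(f"6 components:    {mixture_mass_outside(six):.4f}")
```

Most of the remaining leakage comes from the narrow components nearest the edges of the interval, which is where a sum of Gaussians has the hardest time imitating a sharp cutoff.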

2025

MixMatch

2 minute read

Estimate an approximating Gaussian mixture for observed data that maximizes the expected log pointwise predictive density.

mars

4 minute read

mars (model averaging by resampling stochastically) is a module for creating a bootstrap distribution of a model-averaged quantity, taking resampled (i.e....

Scale setting the Möbius domain wall fermion on HISQ action with $M_\Omega$ and the gradient-flow scales

less than 1 minute read

Scale Setting Workshop at ECT* [PDF of slides]

2023

Reconstructing the late-time isovector correlation function through spectroscopy

less than 1 minute read

Sixth Plenary Workshop of the Muon g-2 Theory Initiative [PDF of slides]

2022

Extracting the pion-nucleon sigma term from the lattice [seminar]

less than 1 minute read

Lawrence Berkeley National Lab [seminar] [PDF of slides]

The hyperon spectrum from lattice QCD

less than 1 minute read

PoS(Lattice2021) Volume 396 (2022) [arXiv:2201.01343]

2020

Scale setting with $m_\Omega$

2 minute read

Python code for our scale setting analysis.

Scale setting the Möbius Domain Wall Fermion on gradient-flowed HISQ action using the Omega baryon mass and the gradient-flow scales $t_0$ and $w_0$

1 minute read

Phys. Rev. D 103, 054511 (2021) [arXiv:2011.12166]

2024

The hadronic vacuum polarization contribution to the muon $g - 2$ at long distances

The timelike pion form factor and other applications of $I=1$ $\pi\pi$ scattering

lsqfitics

2021

$V_{us}$ from the lattice [talk]

The nucleon mass and sigma term from lattice QCD [talk]

lsqfit-gui

Determining properties of hyperons [talk]

$V_{us}$ from semileptonic hyperon decays [poster]

2020

Lattice calculation of $F_K/F_\pi$ from a mixed domain-wall on HISQ action [talk]

spacetime-plots

$F_K/F_\pi$ from Möbius domain-wall fermions solved on gradient-flowed HISQ ensembles

$F_K/F_\pi$ from MDWF on HISQ