Evidence

Additional functions for computing the evidence of a pyross compartment model.

This is an extension of pyross.inference. Evidence computation via nested sampling is already directly implemented in the inference module. However, for large-scale (high-dimensional) inference problems, nested sampling can become very slow. In this module, we implement two additional ways to compute the evidence that work whenever the MCMC simulation of the posterior distribution is feasible. See the ex-evidence.ipynb notebook for a code example of all ways to compute the evidence.

pyross.evidence.get_parameters(estimator, x, Tf, prior_dict, contactMatrix=None, generator=None, intervention_fun=None, tangent=False)

Process an estimator from pyross.inference to generate input arguments for the evidence computations pyross.evidence.evidence_smc and pyross.evidence.evidence_path_sampling for estimation problems without latent variables. The input has the same structure as the input of pyross.inference.infer, see there for a detailed documentation of the arguments.

Parameters:
  • estimator (pyross.inference.SIR_type) – The estimator object of the underlying (non-latent) inference problem.
  • x (2d numpy.array) –
  • Tf (float) –
  • prior_dict (dict) –
  • contactMatrix (callable, optional) –
  • generator (pyross.contactMatrix, optional) –
  • intervention_fun (callable, optional) –
  • tangent (bool, optional) –
Returns:

  • logl – The log-likelihood of the inference problem.
  • prior – The prior distribution of the parameters.
  • ndim – The number of (flat) parameters.

pyross.evidence.latent_get_parameters(estimator, obs, fltr, Tf, param_priors, init_priors, contactMatrix=None, generator=None, intervention_fun=None, tangent=False, smooth_penalty=False, disable_bounds=False)

Process an estimator from pyross.inference to generate input arguments for the evidence computations pyross.evidence.evidence_smc and pyross.evidence.evidence_path_sampling for estimation problems with latent variables. The input has the same structure as the input of pyross.inference.latent_infer, see there for a detailed documentation of the arguments.

Parameters:
  • estimator (pyross.inference.SIR_type) – The estimator object of the underlying (non-latent) inference problem.
  • obs (np.array) –
  • fltr (2d np.array) –
  • Tf (float) –
  • param_priors (dict) –
  • init_priors (dict) –
  • contactMatrix (callable, optional) –
  • generator (pyross.contactMatrix, optional) –
  • intervention_fun (callable, optional) –
  • tangent (bool, optional) –
Returns:

  • logl – The log-likelihood of the inference problem.
  • prior – The prior distribution of the parameters.
  • ndim – The number of (flat) parameters.

pyross.evidence.compute_ess(weights)

Compute the effective sample size of a weighted set of samples.

pyross.evidence.compute_cess(old_weights, weights)

Compute the conditional effective sample size as decribed in [Zhou, Johansen, Aston 2016].

pyross.evidence.resample(N, particles, logl, probs)

Implements the residual resampling scheme, see for example [Doucet, Johansen 2008], https://www.stats.ox.ac.uk/~doucet/doucet_johansen_tutorialPF2011.pdf

pyross.evidence.evidence_smc(logl, prior, ndim, npopulation=200, target_cess=0.9, min_ess=0.6, mcmc_iter=50, nprocesses=0, save_samples=True, verbose=True)

Compute the evidence using an adaptive sequential Monte Carlo method.

This function computes the model evidence of the inference problem using a sequential Monte Carlo particle method starting at the prior distribution. We implement the method SMC2 described in [Zhou, Johansen, Aston 2016], https://doi.org/10.1080/10618600.2015.1060885

We start by sampling npopulation particles from the prior distribution with uniform weights. The target distribution of the weighted set of particles gets transformed to the posterior distribution by a geometric annealing schedule. The step size is chosen adaptively based on the target decay rate of the effective samples size target_cess in each step. Once the effective sample size of the weighted particles goes below min_cess * npopulation, we replace the weighted set of samples by a resampled, unweighted set. Between each step, the particles are decorrelated and equilibreated on the current level distribution by running an MCMC chain.

Parameters:
  • logl – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • prior – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • ndim – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • npopulation (int) – The number of particles used for the SMC iteration. Higher number of particles increases the accuracy of the result.
  • target_cess (float) – The target rate for the decay of the effective sample size (ess reduces by 1-target_cess each step). Smaller values result in more iterations and a higher accuracy result.
  • min_ess (float) – The minimal effective sample size of the system. Low effective sample size imply many particles in low probability regions. However, resampling adds variance so it should not be done in every step.
  • mcmc_iter (int) – The number of MCMC iterations in each step of the algorithm. The number of iterations should be large enough to equilibrate the particles with the current distribution, higher iteration numbers will typically not result in more accurate results. Oftentimes, it makes more sense to increase the number of steps (via target_cess) instead of increasing the number of iterations. This decreases the difference in distribution between consecutive steps and reduced the error of the final result. This number should however be large enough to allow equal-position particles (that occur via resampling) to diverge from each other.
  • nprocesses (int) – The number of processes passed to the emcee MCMC sampler. By default, the number of physical cores is used.
  • save_samples (bool) – If true, this function returns the interal state of each MCMC iteration.
  • verbose (bool) – If true, this function displays the progress of each MCMC iteration in addition to basic progress information.
Returns:

  • log_evidence (float) – The estimate of the log evidence.

  • if save_samples=True

    result_samples: list of (float, emcee.EnsembleSampler)

    The list of samplers and their corresponding step alpha.

pyross.evidence.evidence_path_sampling(logl, prior, ndim, steps, npopulation=100, mcmc_iter=1000, nprocesses=0, initial_samples=10, verbose=True, extend_step_list=None, extend_sampler_list=None)

Compute the evidence using path sampling (thermodynamic integration).

This function computes posterior samples for the distributions

p_s propto prior * likelihood^s

for 0<s≤1, s ∈ steps, using ensemble MCMC. The samples can be used to estimate the evidence via

log_evidence = int_0^1 E_{p_s}[log_likelihood] ds

which is know as path sampling or thermodynamic integration.

This function starts with sampling initial_samples * npopulation samples from the (truncated log-normal) prior. Afterwards, it runs an ensemble MCMC chain with npopulation ensemble members for mcmc_iter iterations. To minimise burn-in, the iteration is started with the last sample of the chain with the closest step s that has already been computed. To extend the results of this function with additional steps, provide the to-be-extended result via the optional arguments extend_step_list and extend_sampler_list.

This function only returns the step list and the corresponding samplers. To compute the evidence estimate, use pyross.evidence.evidence_path_sampling_process_result.

Parameters:
  • logl – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • prior – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • ndim – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • steps (list of float) – List of steps s for which the distribution p_s is explored using MCMC. Should be in ascending order and not include 0.
  • npopulation (int) – The population size of the MCMC ensemble sampler (see documentation of emcee for details).
  • mcmc_iters (int) – The number of iterations of the MCMC chain for each s ∈ steps.
  • nprocesses (int) – The number of processes passed to the emcee MCMC sampler. By default, the number of physical cores is used.
  • initial_samples (int) – Compute initial_samples * npopulation independent samples as the result for s = 0.
  • verbose (bool) – If true, this function displays the progress of each MCMC iteration in addition to basic progress information.
  • extend_step_list (list of float) – Extends the result of an earlier run of this function if this argument and extend_sampler_list are provided.
  • extend_sampler_list (list of emcee.EnsembleSampler) – Extends the result of an earlier run of this function if this argument and extend_step_list are provided.
Returns:

  • step_list (list of float) – The steps s for which p_s has been sampled from (including 0). Always in ascending order.
  • sampler_list (list) – The list of emcee.EnsembleSamplers (and an array of prior samples at 0).

pyross.evidence.evidence_path_sampling_process_result(logl, prior, ndim, step_list, sampler_list, burn_in=0, nprocesses=0)

Compute the evidence estimate for the result of pyross.evidence.evidence_path_sampling.

Parameters:
  • logl – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • prior – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • ndim – Input from pyross.evidence.get_parameters or pyross.evidence.latent_get_parameters.
  • step_list (list of float) – Output of pyross.evidence.evidence_path_sampling. The steps s for which p_s has been sampled from (including 0). Always in ascending order.
  • sampler_list (list) – Output of pyross.evidence.evidence_path_sampling. The list of emcee.EnsembleSamplers (and an array of prior samples at 0).
  • burn_in (float or np.array) – The number of initial samples that are discarded before computing the Monte Carlo average.
  • nprocesses (int) – The number of processes used to compute the prior likelihood. By default, the number of physical cores is used.
Returns:

  • log_evidence (float) – The estimate of the log evidence.
  • vals (list of float) – The Monte Carlo average of the log-likelihood for each s s ∈ step_list.

pyross.evidence.generate_traceplot(sampler, dims=None)

Generate a traceplot for an emcee.EnsembleSampler.

Parameters:
  • sampler (emcee.EnsembleSampler) – The sampler to plot the traceplot for.
  • dims (list of int, optional) – Select the dimensions that are plotted. By default, all dimensions are selected.