3. User Customization

Implementing a new Model

One of the standard use cases of ABCpy is to do inference on a probabilistic model that is not part of ABCpy. We now go through the details of such a scenario using the (already implemented) Gaussian generative model to explain how to implement it from scratch.

There are two scenarios to use a model: First, we want to use our probabilistic model to explain a relationship between parameters (considered random variables for inference). In the second case, we use them to explain a relationship between parameters and observed data, as example when we want to do inference on mechanistic models that do not have a PDF. In both these case, our model implementation has to derive from ProbabilisticModel class of ABCpy and a few abstract methods have to be defined, as for example forward_simulate().

In the first scenario, we want to use the model to build a relationship between different parameters (between different random variables). Then our model is restricted to either output continuous or discrete parameters in form of a vector. Consequently, the model must derive from either from Continuous or Discrete and implement the required abstract methods. These two classes in turn derive from from ProbabilisticModel, such that the second scenario essentially extends the first.

Let us go through the implementation of a the Gaussian generative model. The model has to conform to the API specified by the base class ProbabilisticModels, and thus must implement at least the following methods:

If we want our model to work in both the described scenarios, so our model also has to conform to the API of Continuous since the model output, which is the resulting data from a forward simulation, is from a continuous domain. For completeness, here are the abstract methods defined by Continuous and Discrete correspondingly:

If we want our model to work only for the second scenario (typically the case for mechanistic or simulator models) and not to be used to build priors on parameters, we do not need to write the above two abstract methods.

Initializing a New Model

Since a Gaussian model generates continous numbers, the newly implemented class derives from Continuous and the header look as follows:

from abcpy.probabilisticmodels import ProbabilisticModel, Continuous, InputConnector
class Gaussian(ProbabilisticModel, Continuous):

A good way to start implementing a new model is to define a convenient way to initialize it with its input parameters. In ABCpy all input parameters are either independent ProbabilisticModels or Hyperparameters. Thus, they should not be stored within but rather referenced in the model we implement. This reference is handled by the InputConnector class and must be used in our model implementation. The required procedure is to call the init function of ProbabilisticModels and pass an InputConnector object to it.

ProbabilisticModel.__init__(input_connector, name='')[source]

This initializer must be called from any derived class to properly connect it to its input models.

It accepts as input an InputConnector object that fully specifies how to connect all parent models to the current model.

Parameters:
  • input_connector (list) – A list of input parameters.
  • name (string) – A human readable name for the model. Can be the variable name for example.

However, it would be very inconvenient to initialize our Gaussian model with an InputConnector object. We rather like the init function to accept a list of parameters [mu, sigma], where mu is the mean and sigma is the standard deviation which are the sole two parameters of our generative Gaussian model. So the idea is to take a convenient input and transform it to an InputConnection object that in turn can be passed to the initializer of the super class. This leads to the following implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def __init__(self, parameters, name='Gaussian'):
    # We expect input of type parameters = [mu, sigma]
    if not isinstance(parameters, list):
        raise TypeError('Input of Normal model is of type list')

    if len(parameters) != 2:
        raise RuntimeError('Input list must be of length 2, containing [mu, sigma].')

    input_connector = InputConnector.from_list(parameters)
    super().__init__(input_connector, name)

First, we do some basic syntactic checks on the input that throw exceptions if unreasonable input is provided. Line 9 is the interesting part: the InputConnector comes with a convenient set of factory methods that create InputConnector objects:

We use the factory method from_list. The resulting InputConnector creates links between our Gaussian model and the models (or hyperparameters) that are used for mu and sigma at initialization time. For example, if mu and sigma are initialized as hyperparameters like

model = Gaussian([0, 1])

the from_list() method will automatically create two HyperParameter objects HyperParameter(0) and HyperParameter(1) and will link our current Gaussian model inputs to them. If we initialize mu and sigma with existing models like

uniform1 = Uniform([-1, 1])
uniform2 = Uniform([10,20])
model = Gaussian([uniform1, uniform2])

the from_list() method will link our inputs to the uniform models.

Additionally, every model instance should have a unique name, which should also be passed to the init function of the super class.

Checking the Input

The next function we implement is _check_input which should behave as described in the documentation:

ProbabilisticModel._check_input(input_values)[source]

Check whether the input parameters are compatible with the underlying model.

The following behavior is expected:

1. If the input is of wrong type or has the wrong format, this method should raise an exception. For example, if the number of parameters does not match what the model expects.

2. If the values of the input models are not compatible, this method should return False. For example, if an input value is not from the expected domain.

Background information: Many inference schemes modify the input slightly by applying a small perturbation during sampling. This method is called to check whether the perturbation yields a reasonable input to the current model. In case this function returns False, the inference schemes re-perturb the input and try again. If the check is not done properly, the inference computation might crash or not terminate.

Parameters:input_values (list) – A list of numbers that are the concatenation of all parent model outputs in the order specified by the InputConnector object that was passed during initialization.
Returns:True if the fixed value of the parameters can be used as input for the current model. False otherwise.
Return type:boolean

This leads to the following implementation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def _check_input(self, input_values):
    # Check whether input has correct type or format
    if len(input_values) != 2:
        raise ValueError('Number of parameters of Normal model must be 2.')

    # Check whether input is from correct domain
    mu = input_values[0]
    sigma = input_values[1]
    if sigma < 0:
        return False

    return True

Forward Simulation

At the core of our model lies the capability to forward simulate and create pseudo observations. To expose this functionality the following method has to be implemented:

ProbabilisticModel.forward_simulate(input_values, k, rng, mpi_comm=None)[source]

Provides the output (pseudo data) from a forward simulation of the current model.

In case the model is intended to be used as input for another model, a forward simulation must return a list of k numpy arrays with shape (get_output_dimension(),).

In case the model is directly used for inference, and not as input for another model, a forward simulation also must return a list, but the elements can be arbitrarily defined. In this case it is only important that the used statistics and distance functions can read the input.

Parameters:
  • input_values (list) – A list of numbers that are the concatenation of all parent model outputs in the order specified by the InputConnector object that was passed during initialization.
  • k (integer) – The number of forward simulations that should be run
  • rng (Random number generator) – Defines the random number generator to be used. The default value uses a random seed to initialize the generator.
  • mpi_comm (MPI communicator object) – Defines the MPI communicator object for MPI parallelization. The default value is None, meaning the forward simulation is not MPI-parallelized.
Returns:

A list of k elements, where each element is of type numpy arary and represents the result of a single forward simulation.

Return type:

list

A proper implementation look as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def forward_simulate(self, input_values, k, rng=np.random.RandomState()):
    # Extract the input parameters
    mu = input_values[0]
    sigma = input_values[1]

    # Do the actual forward simulation
    vector_of_k_samples = np.array(rng.normal(mu, sigma, k))

    # Format the output to obey API
    result = [np.array([x]) for x in vector_of_k_samples]
    return result

Note that both mu and sigma are stored in the list input values in the same order as we provided them to the InputConnector object in the init function. Futher note that the output is a list of vectors, each of dimension one, though the Gaussian generative model only produces real numbers.

Checking the Output

We also need to check the output of the model. This method is commonly used in case our model is used as an input for other models. When using an inference scheme that utilizes perturbation, the output of our model is slightly perturbed. We have to make sure that the perturbed output is still valid for our model. The details of implementing the method _check_output() can be found in the documentation:

ProbabilisticModel._check_output(values)[source]

Checks whether values contains a reasonable output of the current model.

Parameters:values (numpy array) – Array of shape (get_output_dimension(),) that contains the model output.
Returns:Return false if values cannot possibly be generated from the model and true otherwise.
Return type:boolean

Since the output of a Gaussian generative model is a single number from the full real domain, we can restrict ourselves to syntactic checks. However, one could easily imagine models for which the output it restricted to a certain domain. Then, this function should return False as soon as values are out of the desired domain.

1
2
3
4
5
6
def _check_output(self, values):
    if not isinstance(values, Number):
        raise ValueError('Output of the normal distribution is always a number.')

    # At this point values is a number (int, float); full domain for Normal is allowed
    return True

Note that implementing this method is particularly important when using the current model as input for other models, hence in the first scenario described in Implementing a new Model. In case our model should only be used for the second scenario, it is safe to omit the check and return true.

Getting the Output Dimension

We have expose the dimension of the produced output of our model using the following method:

ProbabilisticModel.get_output_dimension()[source]

Provides the output dimension of the current model.

This function is in particular important if the current model is used as an input for other models. In such a case it is assumed that the output is always a vector of int or float. The length of the vector is the dimension that should be returned here.

Returns:The dimension of the output vector of a single forward simulation.
Return type:int

Since our model generates a single float number in one forward simulation, the implementation looks is straight forward:

1
2
def get_output_dimension(self):
    return 1

Note that implementing this method is particularly important when using the current model as input for other models, hence in the first scenario described in Implementing a new Model. In case our model should only be used for the second scenario, it is safe to return 1.

Calculating the Probability Density Function

Since our model also derives from Continuous we also have to implement the following function that calculates the probability density function at specific point.

Continuous.pdf(input_values, x)[source]

Calculates the probability density function of the model.

Parameters:
  • input_values (list) – A list of numbers that are the concatenation of all parent model outputs in the order specified by the InputConnector object that was passed during initialization.
  • x (float) – The location at which the probability density function should be evaluated.

As mentioned above, this is only required if one wants to use our model as input for other models. An implementation looks as follows:

1
2
3
4
5
def pdf(self, input_values, x):
    mu = input_values[0]
    sigma = input_values[1]
    pdf = np.norm(mu, sigma).pdf(x)
    return pdf

Our model now conforms to ABCpy and we can start inferring parameters in the same way (see Getting Started) as we would do with shipped models.

Wrap a Model Written in C++

There are several frameworks that help you integrating your C++/C code into Python. We showcase examples for

Using Swig

Swig is a tool that creates a Python wrapper for our C++/C code using an interface (file) that we have to specify. We can then import the wrapper and in turn use your C++ code with ABCpy as if it was written in Python.

We go through a complete example to illustrate how to use a simple Gaussian model written in C++ with ABCpy. First, have a look at our C++ model:

1
2
3
4
5
6
7
8
9
void gaussian_model(double* result, unsigned int k, double mu, double sigma, int seed) {
  boost::mt19937 rng(seed);
  boost::normal_distribution<> nd(mu, sigma);
  boost::variate_generator<boost::mt19937, boost::normal_distribution<> > sampler(rng, nd);
  
  for (int i=0; i<k; ++i) {
    result[i] = sampler();
  }
}

To use this code in Python, we need to specify exactly how to expose the C++ function to Python. Therefore, we write a Swig interface file that look as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
%module gaussian_model_simple
%{
  #define SWIG_FILE_WITH_INIT
  
  #include <iostream>
  #include <boost/random.hpp>
  #include <boost/random/normal_distribution.hpp>
  
  extern void gaussian_model(double* result, unsigned int k, double mu, double sigma, int seed);
%}

%include "numpy.i"

%init %{
  import_array();
%}

%apply (double* ARGOUT_ARRAY1, int DIM1 ) {(double* result, unsigned int k)};

extern void gaussian_model(double* result, unsigned int k, double mu, double sigma, int seed);

In the first line we define the module name we later have to import in your ABCpy Python code. Then, in curly brackets, we specify which libraries we want to include and which function we want to expose through the wrapper.

Now comes the tricky part. The model class expects a method forward_simulate that forward-simulates our model and which returns an array of synthetic observations. However, C++/C does not know the concept of returning an array, instead in C++/C we would provide a memory position (pointer) where to write the results. Swig has to translate between the two concepts. We use actually an Swig interface definition from numpy called import_array. The line

1
%apply (double* ARGOUT_ARRAY1, int DIM1 ) {(double* result, unsigned int k)};

states that we want the two parameters result and k of the gaussian_model C++ function be interpreted as an array of length k that is returned. Have a look at the Python code below and observe how the wrapped Python function takes only two instead of four parameters and returns a numpy array.

The first stop to get everything running is to translate the Swig interface file to wrapper code in C++ and Python.

swig -python -c++ -o gaussian_model_simple_wrap.cpp gaussian_model_simple.i

This creates two wrapper files gaussian_model_simple_wrap.cpp and gaussian_model_simple.py. Now the C++ files can be compiled:

g++ -fPIC -I /usr/include/python3.5m -c gaussian_model_simple.cpp -o gaussian_model_simple.o
g++ -fPIC -I /usr/include/python3.5m -c gaussian_model_simple_wrap.cpp -o gaussian_model_simple_wrap.o
g++ -shared gaussian_model_simple.o gaussian_model_simple_wrap.o -o _gaussian_model_simple.so

Note that the include paths might need to be adapted to your system. Finally, we can write a Python model which uses our C++ code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
import logging
import numpy as np
from examples.extensions.models.gaussian_cpp.gaussian_model_simple import \
    gaussian_model  # this is the file produced upon compiling
from abcpy.probabilisticmodels import ProbabilisticModel, Continuous, InputConnector


class Gaussian(ProbabilisticModel, Continuous):

    def __init__(self, parameters, name='Gaussian'):
        # We expect input of type parameters = [mu, sigma]
        if not isinstance(parameters, list):
            raise TypeError('Input of Normal model is of type list')

        if len(parameters) != 2:
            raise RuntimeError('Input list must be of length 2, containing [mu, sigma].')

        input_connector = InputConnector.from_list(parameters)
        super().__init__(input_connector, name)

    def _check_input(self, input_values):
        # Check whether input has correct type or format
        if len(input_values) != 2:
            raise ValueError('Number of parameters of Normal model must be 2.')

        # Check whether input is from correct domain
        mu = input_values[0]
        sigma = input_values[1]
        if sigma < 0:
            return False

        return True

    def _check_output(self, values):
        if not isinstance(values, Number):
            raise ValueError('Output of the normal distribution is always a number.')

        # At this point values is a number (int, float); full domain for Normal is allowed
        return True

    def get_output_dimension(self):
        return 1

    def forward_simulate(self, input_values, k, rng=np.random.RandomState()):
        # Extract the input parameters
        mu = input_values[0]
        sigma = input_values[1]
        seed = rng.randint(np.iinfo(np.int32).max)

        # Do the actual forward simulation
        vector_of_k_samples = gaussian_model(k, mu, sigma, seed)  # call the C++ code

        # Format the output to obey API
        result = [np.array([x]) for x in vector_of_k_samples]
        return result

    def pdf(self, input_values, x):
        mu = input_values[0]
        sigma = input_values[1]
        pdf = np.norm(mu, sigma).pdf(x)
        return pdf

The important lines are where we import the wrapper code as a module (line 3) and call the respective model function (line 48).

The full code is available in examples/extensions/models/gaussion_cpp/. To simplify compilation of SWIG and C++ code we created a Makefile. Note that you might need to adapt some paths in the Makefile.

Wrap a Model Written in R

Statisticians often use the R language to build statistical models. R models can be incorporated within the ABCpy language with the rpy2 Python package. We show how to use the rpy2 package to connect with a model written in R.

Continuing from the previous sections we use a simple Gaussian model as an example. The following R code is the contents of the R file gaussian_model.R:

1
2
3
4
5
simple_gaussian <- function(mu, sigma, k = 1, seed = seed) {
  set.seed(seed)
  output <- rnorm(k, mu, sigma)
  return(output)
}

More complex R models are incorporated in the same way. To include this function within ABCpy we include the following code at the beginning of our Python file:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
try:
    robjects.r('''
       source('examples/extensions/models/gaussian_R/gaussian_model.R')
    ''')
except:
    robjects.r('''
           source('gaussian_model.R')
    ''')
r_simple_gaussian = robjects.globalenv['simple_gaussian']

This imports the R function simple_gaussian into the Python environment. We need to build our own model to incorporate this R function as in the previous section. The only difference is in the forward_simulate method of the class Gaussian.

1
vector_of_k_samples = list(r_simple_gaussian(mu, sigma, k, seed=seed))

The default output for R functions in Python is a float vector. This must be converted into a Python numpy array for the purposes of ABCpy.

Wrap a Model Written in FORTRAN

FORTRAN is still a widely used language in some specific application domains. We show here how to wrap a FORTRAN model in ABCpy by exploiting the F2PY tool, which is part of Numpy.

Using this tool is quite simple; first, the FORTRAN code defining the model has to be defined:

module gaussian_model
contains
    subroutine gaussian(output, mu, sigma, k, seed)

specifically, that needs to define a subroutine (here gaussian) in a module (here gaussian_model):

Then, the FORTRAN code needs to be compiled in a way which can be linked to the Python one; by using F2PY, this is as simple as:

python -m numpy.f2py -c -m gaussian_model_simple gaussian_model_simple.f90

which produces an executable (with .so extension on Linux, for instance) with the same name as the FORTRAN file. Finally, an ABCpy model in Python needs to be defined which calls the FORTRAN binary similarly to what done before. Specifically, we import the FORTRAN model in the following way:

from examples.extensions.models.gaussian_f90.gaussian_model_simple import gaussian_model

Note that the name of the object to import is the same as the module name in the original FORTRAN code. Then, in the forward_simulate method of the ABCpy model, you can run the FORTRAN model and obtain its output with the following line:

vector_of_k_samples = np.array(gaussian_model(mu, sigma, k, seed))

A full reproducible example is available in examples/extensions/models/gaussion_f90/; a Makefile with the right compilation commands is also provided.

Implementing a new Distance

We will now explain how you can implement your own distance measure. A new distance is implemented as a new class that derives from Distance and for which the following three methods have to be implemented:

Let us first look at the initializer documentation:

Distance.__init__(statistics_calc)[source]

The constructor of a sub-class must accept a non-optional statistics calculator as a parameter; then, it must call the __init__ method of the parent class. This ensures that the object is initialized correctly so that the _calculate_summary_stat private method can be called when computing the distances.

Parameters:statistics_calc (abcpy.statistics.Statistics) – Statistics extractor object that conforms to the Statistics class.

Distances in ABCpy should act on summary statistics. Therefore, at initialization of a distance calculator, a statistics calculator should be provided. The following header conforms to this idea:

def __init__(self, statistics_calc):
    self.statistics_calc = statistics_calc

    # Since the observations do always stay the same, we can save the
    #  summary statistics of them and not recalculate it each time
    self.s1 = None
    self.data_set = None
    self.dataSame = False

Then, we need to define how the distance is calculated. We need first to compute the summary statistics from the datasets and after compute the distance between the summary statistics. Notice that we use the private method Distance._calculate_summary_stat to compute the statistics from the dataset; internally, this saves the first dataset and the corresponding summary statistics while computing the summary statistics. In fact, we always pass the observed dataset first to the distance function during inference and ,as this does not change, it is efficient to compute it once and store it internally. At each call of the distance method, the first input is compared to the stored one and, only if they differ, the stored statistics is updated.

def distance(self, d1, d2):
    """Calculates the distance between two datasets, by computing Euclidean distance between each element of d1 and
    d2 and taking their average.

    Parameters
    ----------
    d1: Python list
        Contains n1 data points.
    d2: Python list
        Contains n2 data points.

    Returns
    -------
    numpy.float
        The distance between the two input data sets.
    """
    s1, s2 = self._calculate_summary_stat(d1, d2)

    # compute distance between the statistics
    dist = np.zeros(shape=(s1.shape[0], s2.shape[0]))
    for ind1 in range(0, s1.shape[0]):
        for ind2 in range(0, s2.shape[0]):
            dist[ind1, ind2] = np.sqrt(np.sum(pow(s1[ind1, :] - s2[ind2, :], 2)))

    return dist.mean()

Finally, we need to define the maximal distance that can be obtained from this distance measure.

def dist_max(self):
    """
    Returns
    -------
    numpy.float
        The maximal possible value of the desired distance function.
    """
    return np.inf

The newly defined distance class can be used in the same way as the already existing once. The complete example for this tutorial can be found in examples/extensions/distances/default_distance.py.

Implementing a new Perturbation Kernel

To implement a new kernel, we need to implement a new class that derives from abcpy.perturbationkernel.PerturbationKernel and that implements the following abstract methods:

Kernels in ABCpy can be of two types: they can either be derived from the class ContinuousKernel or from DiscreteKernel. In case a continuous kernel is required, the following method must be implemented:

    def pdf(self, accepted_parameters_manager, kernel_index, index, x):

On the other hand, if the kernel is a discrete kernel, we would need the following method:

    def pmf(self, accepted_parameters_manager, kernel_index, index, x):

As an example, we will implement a kernel which perturbs continuous parameters using a multivariate normal distribution (which is already implemented within ABCpy). First, we need to define a constructor.

PerturbationKernel.__init__(models)[source]
Parameters:models (list) – The list of abcpy.probabilisticmodel objects that should be perturbed by this kernel.

Thus, ABCpy expects that the arguments passed to the initializer is of type ProbabilisticModel, which can be seen as the random variables that should be perturbed by this kernel. All these models should be saved on the kernel for future reference.

class MultivariateNormalKernel(PerturbationKernel, ContinuousKernel):
    def __init__(self, models):
        self.models = models

Next, we need the following method:

PerturbationKernel.calculate_cov(accepted_parameters_manager, kernel_index)[source]

Calculates the covariance matrix for the kernel.

Parameters:
  • accepted_parameters_manager (abcpy.acceptedparametersmanager object) – The accepted parameters manager that manages all bds objects.
  • kernel_index (integer) – The index of the kernel in the list of kernels of the joint perturbation kernel.
Returns:

The covariance matrix for the kernel.

Return type:

numpy.ndarray

This method calculates the covariance matrix for your kernel. Of course, not all kernels will have covariance matrices. However, since some kernels do, it is necessary to implement this method for all kernels. If your kernel does not have a covariance matrix, simply return an empty list.

The two arguments passed to this method are the accepted parameters manager and the kernel index. An object of type AcceptedParameterManager is always initialized when an inference method object is instantiated. On this object, the accepted parameters, accepted weights, accepted covariance matrices for all kernels and other information is stored. This is such that various objects can access this information without much hassle centrally. To access any of the quantities mentioned above, you will have to call the .value() method of the corresponding quantity.

The second parameter, the kernel index, specifies the index of the kernel in the list of kernels that the inference method will in the end obtain. Since the user is expected to collect all his kernels in one object, this index will automatically be provided. You do not need any knowledge of what the index actually is. However, it is used to access the values relevant to your kernel, for example the current calculated covariance matrix for a kernel.

Let us now look at the implementation of the method:

def calculate_cov(self, accepted_parameters_manager, kernel_index):
    """
    Calculates the covariance matrix relevant to this kernel.

    Parameters
    ----------
    accepted_parameters_manager: abcpy.AcceptedParametersManager object
        AcceptedParametersManager to be used.
    kernel_index: integer
        The index of the kernel in the list of kernels of the joint kernel.

    Returns
    -------
    list
        The covariance matrix corresponding to this kernel.
    """
    continuous_model = [[] for i in
                        range(len(accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index]))]
    for i in range(len(accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index])):
        if isinstance(accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index][i][0],
                      (np.float, np.float32, np.float64, np.int, np.int32, np.int64)):
            continuous_model[i] = accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index][i]
        else:
            continuous_model[i] = np.concatenate(
                accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index][i])
    continuous_model = np.array(continuous_model).astype(float)

    if accepted_parameters_manager.accepted_weights_bds is not None:
        weights = accepted_parameters_manager.accepted_weights_bds.value()
        cov = np.cov(continuous_model, aweights=weights.reshape(-1).astype(float), rowvar=False)
    else:
        cov = np.cov(continuous_model, rowvar=False)
    return cov

Some of the implemented inference algorithms weigh different sets of parameters differently. Therefore, if such weights are provided, we would like to weight the covariance matrix accordingly. We, therefore, check whether the accepted parameters manager contains any weights. If it does, we retrieve these weights, and calculate the covariance matrix using numpy, the parameters relevant to this kernel and the weights. If there are no weights, we simply calculate an unweighted covariance matrix.

Next, we need the method:

PerturbationKernel.update(accepted_parameters_manager, row_index, rng)[source]

Perturbs the parameters for this kernel.

Parameters:
  • accepted_parameters_manager (abcpy.acceptedparametersmanager object) – The accepted parameters manager that manages all bds objects.
  • row_index (integer) – The index of the accepted parameters bds that should be perturbed.
  • rng (random number generator) – The random number generator to be used.
Returns:

The perturbed parameters.

Return type:

numpy.ndarray

This method perturbs the parameters that are associated with the random variables the kernel should perturb. The method again requires an accepted parameters manager and a kernel index. These have the same meaning as in the last method. In addition to this, a row index is required, as well as a random number generator. The row index specifies which set of parameters should be perturbed. There are usually multiple sets, which should be perturbed by different workers during parallelization. We, again, need not to worry about the actual value of this index.

The random number generator should be a random number generator compatible with numpy. This is due to the fact that other methods will pass their random number generator to this method, and all random number generators used within ABCpy are provided by numpy. Also, note that even if your kernel does not require a random number generator, you still need to pass this argument.

Here the implementation for our kernel:

def update(self, accepted_parameters_manager, kernel_index, row_index, rng=np.random.RandomState()):
    """
    Updates the parameter values contained in the accepted_paramters_manager using a multivariate normal distribution.

    Parameters
    ----------
    accepted_parameters_manager: abcpy.AcceptedParametersManager object
        Defines the AcceptedParametersManager to be used.
    kernel_index: integer
        The index of the kernel in the list of kernels in the joint kernel.
    row_index: integer
        The index of the row that should be considered from the accepted_parameters_bds matrix.
    rng: random number generator
        The random number generator to be used.

    Returns
    -------
    np.ndarray
        The perturbed parameter values.
    """

    # Get all current parameter values relevant for this model and the structure
    continuous_model_values = accepted_parameters_manager.kernel_parameters_bds.value()[kernel_index]

    if isinstance(continuous_model_values[row_index][0],
                  (np.float, np.float32, np.float64, np.int, np.int32, np.int64)):
        # Perturb
        cov = np.array(accepted_parameters_manager.accepted_cov_mats_bds.value()[kernel_index]).astype(float)
        continuous_model_values = np.array(continuous_model_values).astype(float)

        # Perturbed values anc split according to the structure
        perturbed_continuous_values = rng.multivariate_normal(continuous_model_values[row_index], cov)
    else:
        # print('Hello')
        # Learn the structure
        struct = [[] for i in range(len(continuous_model_values[row_index]))]
        for i in range(len(continuous_model_values[row_index])):
            struct[i] = continuous_model_values[row_index][i].shape[0]
        struct = np.array(struct).cumsum()
        continuous_model_values = np.concatenate(continuous_model_values[row_index])

        # Perturb
        cov = np.array(accepted_parameters_manager.accepted_cov_mats_bds.value()[kernel_index]).astype(float)
        continuous_model_values = np.array(continuous_model_values).astype(float)

        # Perturbed values anc split according to the structure
        perturbed_continuous_values = np.split(rng.multivariate_normal(continuous_model_values, cov), struct)[:-1]

    return perturbed_continuous_values

The first line shows how you obtain the values of the parameters that your kernel should perturb. These values are converted to a numpy array. Then, the covariance matrix is retrieved from the accepted parameters manager using a similar function call. Finally, the parameters are perturbed and returned.

Last but not least, each kernel requires a probability density or probability mass function depending on whether it is a Continuous Kernel or a Discrete Kernel:

PerturbationKernel.pdf(accepted_parameters_manager, kernel_index, row_index, x)[source]

Calculates the pdf of the kernel at point x.

Parameters:
  • accepted_parameters_manager (abcpy.acceptedparametersmanager object) – The accepted parameters manager that manages all bds objects.
  • kernel_index (integer) – The index of the kernel in the list of kernels of the joint perturbation kernel.
  • row_index (integer) – The index of the accepted parameters bds for which the pdf should be evaluated.
  • x (list or float) – The point at which the pdf should be evaluated.
Returns:

The pdf evaluated at point x.

Return type:

float

This method is implemented as follows for the multivariate normal:

def pdf(self, accepted_parameters_manager, kernel_index, mean, x):
    """Calculates the pdf of the kernel.
    Commonly used to calculate weights.

    Parameters
    ----------
    accepted_parameters_manager: abcpy.AcceptedParametersManager object
        The AcceptedParametersManager to be used.
    kernel_index: integer
        The index of the kernel in the list of kernels in the joint kernel.
    index: integer
        The row to be considered in the accepted_parameters_bds matrix.
    x: The point at which the pdf should be evaluated.

    Returns
    -------
    float
        The pdf evaluated at point x.
    """

    if isinstance(mean[0], (np.float, np.float32, np.float64, np.int, np.int32, np.int64)):
        mean = np.array(mean).astype(float)
        cov = np.array(accepted_parameters_manager.accepted_cov_mats_bds.value()[kernel_index]).astype(float)
        return multivariate_normal(mean, cov, allow_singular=True).pdf(np.array(x).astype(float))
    else:
        mean = np.array(np.concatenate(mean)).astype(float)
        cov = np.array(accepted_parameters_manager.accepted_cov_mats_bds.value()[kernel_index]).astype(float)
        return multivariate_normal(mean, cov, allow_singular=True).pdf(np.concatenate(x))

We simply obtain the parameter values and covariance matrix for this kernel and calculate the probability density function using SciPy.

Note that after defining your own kernel, you will need to collect all your kernels in a JointPerturbationKernel object in order for inference to work. For an example on how to do this, check the Using perturbation kernels section.

The complete example used in this tutorial can be found in the file examples/extensions/perturbationkernels/multivariate_normal_kernel.py.