Solving PDE-based BIP using core CUQIpy#
Here we build a Bayesian problem in which the forward model is a partial differential equation (PDE) model, the 1D heat problem in particular.
Try to at least run through parts 1 to 3 before working on the optional exercises.
Learning objectives of this notebook:#
Solve a PDE-based Bayesian problem using CUQIpy.
Use different parametrizations of the Bayesian parameters (e.g. KL expansion, non-linear maps).
Table of contents:#
★ Indicates optional section.
This notebook was run on a local machine rather than via GitHub Actions for this book, due to its long execution time.
This notebook uses MCMC samplers from the new
cuqi.experimental.mcmc
module, which are expected to become the default soon. Check out the documentation for more details.
1. Loading the PDE test problem #
We first import the required python standard packages that we need:
import numpy as np
import matplotlib.pyplot as plt
from math import floor
From CUQIpy we import the classes that we use in this exercise:
from cuqi.testproblem import Heat1D
from cuqi.distribution import Gaussian, JointDistribution
from cuqi.experimental.mcmc import PCN, MH, CWMH
We load the test problem Heat1D
which provides a one dimensional (1D) time-dependent heat model with zero boundary conditions. The model is discretized using finite differences.
The PDE is given by:

$$\frac{\partial u(\xi,\tau)}{\partial \tau} - c^2 \frac{\partial^2 u(\xi,\tau)}{\partial \xi^2} = f(\xi,\tau) \quad \text{in } [0,L]\times(0,\tau^\mathrm{max}], \qquad u(0,\tau)=u(L,\tau)=0,$$

where \(u(\xi,\tau)\) is the temperature and \(c^2\) is the thermal diffusivity (assumed to be 1 here). We assume the source term \(f\) is zero. The unknown parameter (random variable) for this test problem is the initial heat profile \(g(\xi):=u(\xi,0)\).
The data \(y\) is a random variable containing the temperature measurements everywhere in the domain at the final time \(\tau^\mathrm{max}\), corrupted by Gaussian noise:

$$y = \mathcal{G}(g) + \eta,$$

where \(\mathcal{G}(g)\) is the forward model that maps the initial condition \(g\) to the final-time solution by solving the 1D time-dependent heat problem, and \(\eta\) is the measurement noise.
Given observed data \(y^\text{obs}\) the task is to infer the initial heat profile \(g\).
Before we load the Heat1D
problem, let us set the parameters: final time \(\tau^\mathrm{max}\), number of finite difference nodes \(N\), and the length of the domain \(L\)
N = 30 # Number of finite difference nodes
L = 1 # Length of the domain
tau_max = 0.02 # Final time
We assume that the exact initial condition that we want to infer is a step function with three pieces. Here we use regular Python and NumPy functions to define the initial condition.
n_steps = 3
n_steps_values = [0,1,2]
myExactSolution = np.zeros(N)
start_idx=0
for i in range(n_steps):
end_idx = floor((i+1)*N/n_steps)
myExactSolution[start_idx:end_idx] = n_steps_values[i]
start_idx = end_idx
We plot the exact solution for each node \(\xi_i\):
plt.plot(myExactSolution)
plt.xlabel("i")
While it is possible to create a PDE model from scratch, for this notebook we use the Heat1D
test problem that is already implemented in CUQIpy.
We may extract components of a Heat1D
instance by calling the get_components
method.
model, data, problemInfo = Heat1D(
dim=N,
endpoint=L,
max_time=tau_max,
exactSolution=myExactSolution
).get_components()
Let us take a look at what we obtain from the test problem. We view the model:
model
CUQI PDEModel: Continuous1D(30,) -> Continuous1D(30,).
Forward parameters: ['x'].
PDE: TimeDependentLinearPDE.
Note that the forward model parameter, named x
here, is the unknown parameter which we want to infer. It represents the initial condition \(g\) above, or a parameterization of it.
We can look at the returned data:
data
CUQIarray: NumPy array wrapped with geometry.
---------------------------------------------
Geometry:
Continuous1D(30,)
Parameters:
True
Array:
CUQIarray([0.02155588, 0.04193481, 0.08725646, 0.12451742, 0.1836122 ,
0.24767928, 0.28258524, 0.34038648, 0.40991494, 0.48751798,
0.57255476, 0.69108276, 0.75281419, 0.86716848, 0.92012267,
1.02898355, 1.08892438, 1.17718424, 1.21036726, 1.24497108,
1.24995716, 1.2938437 , 1.24399608, 1.20878174, 1.10657437,
0.98088088, 0.82656687, 0.62407869, 0.44378147, 0.21086027])
And the problemInfo:
problemInfo
ProblemInfo with the following set attributes:
['infoString', 'exactData', 'exactSolution']
infoString: Noise type: Additive i.i.d. noise with mean zero and signal to noise ratio: 200
Now let us plot the exact solution (exact initial condition) of this inverse problem and the exact and noisy data (the final time solution before and after adding observation noise):
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);
Note that the values of the initial solution and the data at 0 and \(L\) are not included in this plot.
Try yourself (optional)#
The data plotted above was generated from the model. Confirm that the model actually generates this data (the exact data) by applying
model.forward
on the exact solution (the initial heat profile).
# Your code here
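One possible sketch of this exercise (assuming model.forward returns a plottable CUQIarray when given a CUQIarray, as the CUQIpy test problems do):

# Push the exact initial condition through the forward model and compare with the exact data
y_exact = model.forward(problemInfo.exactSolution)
y_exact.plot()
problemInfo.exactData.plot()
plt.legend(['model(exactSolution)', 'exact data']);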
Can you view the heat profile at time \(\tau= 0.001\)? At \(0.002, 0.003, ..., 0.02\)? What do you notice? (Hint: you can do that by choosing a different final time max_time when loading Heat1D.)
# Your code here
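One possible sketch, re-creating the test problem with different final times (the times below are illustrative choices):

# Plot the noise-free heat profile (exact data) at a few different final times
for t in [0.001, 0.005, 0.01, 0.02]:
    _, _, info_t = Heat1D(dim=N, endpoint=L, max_time=t,
                          exactSolution=myExactSolution).get_components()
    info_t.exactData.plot()
plt.legend([f"tau = {t}" for t in [0.001, 0.005, 0.01, 0.02]]);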
In case the parameters were changed in the exercise, we reset them here
N = 30 # Number of finite difference nodes
L = 1 # Length of the domain
tau_max = 0.02 # Final time
model, data, problemInfo = Heat1D(
dim=N,
endpoint=L,
max_time=tau_max,
exactSolution=myExactSolution
).get_components()
2. Building and solving the Bayesian inverse problem #
The joint distribution of the data \(y\) and the parameter \(x\) (where \(x\) represents the unknown values of the initial condition \(g\) at the grid nodes in this section) is given by

$$p(x,y) = p(y|x)\,p(x),$$

where \(p(x)\) is the prior pdf and \(p(y|x)\) is the data distribution pdf (the likelihood). We start by defining the prior distribution \(p(x)\):
mean = 0
std = 1.2
x = Gaussian(mean, cov=std**2, geometry=model.domain_geometry) # The prior distribution
Try yourself (optional)#
Create prior samples.
Plot the \(95\%\) credibility interval of the prior samples.
Look at the \(95\%\) credibility interval of the PDE model solution to quantify the forward uncertainty.
# Your code here
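One possible sketch (the number of samples and the explicit loop for the push-forward are illustrative choices, not the only way to do this in CUQIpy):

from cuqi.samples import Samples

# Prior samples and their 95% credibility interval
prior_samples = x.sample(500)
prior_samples.plot_ci(95)

# Push each prior sample through the PDE model to quantify the forward uncertainty
plt.figure()
pushforward = np.stack([model.forward(s) for s in prior_samples.samples.T], axis=-1)
Samples(pushforward, geometry=model.range_geometry).plot_ci(95)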
To define the data distribution \(p(y|x)\), we first estimate the noise level. Because we know the exact data here, we can estimate the noise level as follows:
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
And then define the data distribution \(p(y|x)\):
y = Gaussian(mean=model(x), cov=sigma_noise**2, geometry=model.range_geometry)
Now that we have all the components we need, we can create the joint distribution \(p(x,y)\), from which the posterior distribution can be created by setting \(y=y^\text{obs}\) (we named the observed data as data
in our code):
First, we define the joint distribution \(p(x,y)\):
joint = JointDistribution(y, x)
print(joint)
JointDistribution(
Equation:
p(y,x) = p(y|x)p(x)
Densities:
y ~ CUQI Gaussian. Conditioning variables ['x'].
x ~ CUQI Gaussian.
)
The posterior distribution pdf is given by Bayes' rule:

$$p(x|y=y^\text{obs}) \propto p(y=y^\text{obs}|x)\,p(x).$$

By setting \(y=\texttt{data}\) in the joint distribution we obtain the posterior distribution:
posterior = joint(y=data)
print(posterior)
Posterior(
Equation:
p(x|y) ∝ L(x|y)p(x)
Densities:
y ~ CUQI Gaussian Likelihood function. Parameters ['x'].
x ~ CUQI Gaussian.
)
We can now sample the posterior. Let’s try the preconditioned Crank-Nicolson (pCN) sampler (~30 seconds):
MySampler = PCN(posterior)
posterior_samples = MySampler.warmup(30000, tune_freq=0.01).get_samples()
100%|██████████| 30000/30000 [00:32<00:00, 933.43it/s]
Note that here we use the warmup sampling method for the entire sampling run and set the tuning frequency to 0.01, which means that the scale of the proposal distribution is adapted after every 1% of the samples.
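For reference (not run in this notebook), a common alternative is a dedicated warmup phase followed by a sampling phase with the tuned proposal scale, discarding the warmup samples afterwards; a sketch, with arbitrary chain lengths:

# Not run: warmup (with tuning) followed by fixed-scale sampling
# sampler = PCN(posterior)
# sampler.warmup(5000, tune_freq=0.01)
# sampler.sample(25000)
# posterior_samples = sampler.get_samples().burnthin(5000)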
Let’s look at the \(95\%\) credible interval:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)
We can see that the mean reconstruction of the initial solution matches the general trend of the exact solution to some extent but it does not capture the piece-wise constant nature of the exact solution.
We also note that, since the heat problem has zero boundary conditions, the reconstruction of the initial condition tends to go to zero at the right boundary.
3. Parametrizing the unknown parameters via step function expansion #
One way to improve the solution of this Bayesian problem is to use better prior information. Here we assume the prior is a step function with three pieces (which is exactly how we created the exact solution). This also makes the Bayesian inverse problem simpler because now we only have three unknown parameters to infer.
To test this case we pass field_type='Step'
to the constructor of Heat1D
, which creates a StepExpansion
domain geometry for the model when initializing the Heat1D
test problem. It is also possible to change the domain geometry manually after initializing the test problem, but we will not do that here.
The parameter field_params
is a dictionary that is used to pass keyword arguments that the underlying domain field geometry accepts. For example StepExpansion
has a keyword argument n_steps
and thus we can set field_params={'n_steps': 3}
.
n_steps = 3 # number of steps in the step expansion domain geometry
N = 30
model, data, problemInfo = Heat1D(
dim=N,
endpoint=L,
max_time=tau_max,
field_type="Step",
field_params={"n_steps": n_steps},
exactSolution=myExactSolution,
).get_components()
Let’s look at the model
in this case:
model
CUQI PDEModel: StepExpansion(3,) -> Continuous1D(30,).
Forward parameters: ['x'].
PDE: TimeDependentLinearPDE.
We then continue to create the Bayesian inverse problem (prior, data distribution and then posterior) with a prior of dimension = n_steps.
# Prior
x = Gaussian(mean, std**2, geometry=model.domain_geometry)
# Data distribution
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model(x), cov=sigma_noise**2, geometry=model.range_geometry)
And the posterior:
joint = JointDistribution(y, x)
posterior = joint(y=data)
We then sample the posterior using the Metropolis-Hastings sampler MH (~30 seconds):
MySampler = MH(posterior)
posterior_samples = MySampler.warmup(30000, tune_freq=0.01).get_samples()
100%|██████████| 30000/30000 [00:27<00:00, 1071.97it/s]
Let’s take a look at the posterior:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)
posterior_samples.shape
(3, 30000)
We show the trace plot: a plot of the kernel density estimator (left) and chains (right) of the n_steps
variables:
posterior_samples.plot_trace()
We show a pair plot of the 2D marginal posterior distributions:
posterior_samples.plot_pair()
We notice that there seem to be some burn-in samples before the chain reaches the high density region. We show the pair plot after removing 1000 burn-in samples:
posterior_samples.burnthin(1000).plot_pair()
We can see that, visually, the burn-in is indeed removed. Another observation here is the clear correlation (or inverse correlation) between each pair of the variables.
We can also see the effect of removing the burn-in by looking at the \(100\%\) credible interval before and after. First, before removing the burn-in:
posterior_samples.plot_ci(100, exact=problemInfo.exactSolution)
And after removing the burn-in:
posterior_samples.burnthin(1000).plot_ci(100, exact=problemInfo.exactSolution)
We compute the effective sample size (ESS) which approximately gives the number of independent samples in the chain:
posterior_samples.compute_ess()
array([ 73.16167664, 153.2512918 , 186.62845361])
Try it yourself (optional):#
For this step function parametrization, try to enforce positivity of the prior and posterior samples via a log parametrization, which can be done by initializing Heat1D with map = lambda x : np.exp(x). Then run the Metropolis-Hastings sampler again (similar to part 3).
# Your code here
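One possible sketch (the map keyword comes from the exercise hint; the prior choice and sampler settings are illustrative, and we let the test problem generate its own exact solution):

# Re-create the test problem with an exponential map to enforce positivity
model_exp, data_exp, problemInfo_exp = Heat1D(
    dim=N,
    endpoint=L,
    max_time=tau_max,
    field_type="Step",
    field_params={"n_steps": n_steps},
    map=lambda x: np.exp(x),
).get_components()

# Prior, data distribution and posterior on the mapped parameters
x_exp = Gaussian(0, 1, geometry=model_exp.domain_geometry)
sigma_exp = np.std(problemInfo_exp.exactData - data_exp)*np.ones(model_exp.range_dim)
y_exp = Gaussian(mean=model_exp(x_exp), cov=sigma_exp**2, geometry=model_exp.range_geometry)
posterior_exp = JointDistribution(y_exp, x_exp)(y_exp=data_exp)

# Sample with Metropolis-Hastings and inspect the result
samples_exp = MH(posterior_exp).warmup(10000, tune_freq=0.01).get_samples()
samples_exp.plot_ci(95, exact=problemInfo_exp.exactSolution)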
4. ★ Observe on part of the domain #
Here we solve the same problem as in section 3, but observing the data only on the right half of the domain.
We choose the number of steps to be 4:
N = 30
n_steps = 4 # Number of steps in the StepExpansion geometry.
Then we write the observation_nodes map, which can be passed to Heat1D. It is a lambda function that takes the forward model range grid (range_grid) as input and generates a subgrid of the nodes where we have observations (data).
# observe in the right half of the domain
observation_nodes = lambda range_grid: range_grid[np.where(range_grid>L/2)]
We load the Heat1D
problem. Note in this case we do not pass an exactSolution
. If no exactSolution
is passed, the Heat1D
test problem will create an exact solution.
model, data, problemInfo = Heat1D(
dim=N,
endpoint=L,
max_time=tau_max,
field_type="Step",
field_params={"n_steps": n_steps},
observation_grid_map=observation_nodes,
).get_components()
Now let us plot the exact solution of this inverse problem and the exact and noisy data:
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);
We then continue to create the Bayesian inverse problem (prior, data distribution and then posterior) with a prior of dimension = 4.
# Prior
x = Gaussian(1, std**2, geometry=model.domain_geometry)
# Data distribution
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model(x), cov=sigma_noise**2, geometry=model.range_geometry)
And the posterior:
joint = JointDistribution(y, x)
posterior = joint(y=data)
We then sample the posterior using the Metropolis-Hastings sampler (~40 seconds):
MySampler = MH(posterior, initial_point=np.ones(posterior.dim))
posterior_samples = MySampler.warmup(20000, tune_freq=0.01).get_samples()
100%|██████████| 20000/20000 [00:18<00:00, 1058.62it/s]
Let’s take a look at the posterior:
posterior_samples.burnthin(1000).plot_ci(95, exact=problemInfo.exactSolution)
We see that the credible interval is wider on the side of the domain where data is not available (the left side) and narrower as we get to the right side of the domain.
5. ★ Parametrizing the unknown parameters via KL expansion #
Here we explore the Bayesian inversion for a more general exact solution. We parametrize the Bayesian parameters using a Karhunen–Loève (KL) expansion, which represents the inferred initial heat profile as a linear combination of sine functions:

$$g(\xi_j) = \sum_{i=0}^{N-1} \frac{x_i}{(i+1)^{\gamma}\,\tau_\mathrm{KL}} \sin\!\left(\frac{(i+1)\pi\,\xi_j}{L}\right), \qquad j=0, 1, 2, ..., N-1,$$

where \(\xi_j\) is the \(j^\text{th}\) grid point (in a regular grid), \(N\) is the number of nodes in the grid, \(\gamma\) is a decay rate, \(\tau_\mathrm{KL}\) is a normalization constant, and \(x_i\) are the expansion coefficients (the Bayesian parameters here).
Let’s load the Heat1D
test case and pass field_type = 'KL'
, which behind the scenes will set the domain geometry of the model to be a KL expansion geometry (See KLExpansion
documentation for details):
N = 35
model, data, problemInfo = Heat1D(
dim=N,
endpoint=L,
max_time=tau_max,
field_type="KL"
).get_components()
Now we inspect the model:
model
CUQI PDEModel: KLExpansion(35,) -> Continuous1D(35,).
Forward parameters: ['x'].
PDE: TimeDependentLinearPDE.
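To get a feel for this parametrization, we can (optionally) visualize a few of the basis functions by passing unit coefficient vectors to the domain geometry's plot method (an illustrative sketch):

# Each unit vector activates a single KL mode; the geometry converts
# the coefficients to function values on the grid before plotting
for i in range(3):
    coeffs = np.zeros(model.domain_dim)
    coeffs[i] = 1.0
    model.domain_geometry.plot(coeffs)
plt.legend(['mode 0', 'mode 1', 'mode 2']);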
And the exact solution and the data:
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);
Note that the exact solution here is a general signal that is not constructed from the basis functions. We define the prior \(p(x)\) as an i.i.d. multivariate Gaussian with variance \(3^2\). We determined this variance value through a trial-and-error approach to get a reasonable Bayesian reconstruction of the exact solution:
sigma_prior = 3*np.ones(model.domain_dim)
x = Gaussian(mean, sigma_prior**2, geometry=model.domain_geometry)
We define the data distribution:
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model(x), cov=sigma_noise**2, geometry=model.range_geometry)
And the posterior distribution:
joint = JointDistribution(y, x)
posterior = joint(y=data)
We sample the posterior; here we use the component-wise Metropolis-Hastings sampler CWMH (~90 seconds):
MySampler = CWMH(posterior, initial_point=np.ones(N))
posterior_samples = MySampler.warmup(2000, tune_freq=0.01).get_samples()
100%|██████████| 2000/2000 [01:13<00:00, 27.20it/s]
And plot the \(95\%\) credibility interval (you can try plotting different credibility intervals, e.g. \(80\%\))
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)
The credibility interval can have zero width at some locations, where the upper and lower limits seem to intersect and switch order (the upper becomes the lower and vice versa). To look into what actually happens here, we plot some samples:
posterior_samples_burnthin = posterior_samples.burnthin(0,10)
for i, s in enumerate(posterior_samples_burnthin):
model.domain_geometry.plot(s)
The samples seem to paint a different picture than what the credibility interval plot shows. Note that the credibility interval above is computed in the domain geometry parameter space and then converted to the function space for plotting. We can alternatively convert the samples to function values first, and then compute and plot the credibility interval.
Convert samples to function values:
funvals_samples = posterior_samples.funvals
Then plot the credibility interval computed from the function values:
funvals_samples.plot_ci(95, exact=problemInfo.exactSolution)
We can see that the credibility interval now reflects what the samples plot shows and does not have these locations where the upper and lower bounds intersect.
Let’s look at the effective sample size (ESS):
posterior_samples.compute_ess()
array([ 48.80059703, 171.82597925, 12.21091786, 46.54004929,
22.52401927, 67.68602087, 74.08079598, 58.37290908,
41.83281616, 54.56967814, 52.70384653, 40.01953536,
331.1514258 , 32.0284844 , 65.05234881, 60.14208066,
61.04166084, 9.08819505, 48.02420851, 25.61172436,
93.76350538, 30.75696809, 44.13958884, 9.24421007,
37.46170191, 44.30450236, 62.1372466 , 48.60346056,
15.13719105, 34.13800695, 56.22208743, 71.07707933,
32.12711357, 43.70962222, 31.33962729])
We note that the ESS varies considerably among the variables. We can view the trace plot for, let’s say, the first and the second variables:
posterior_samples.plot_trace([0,1])
We note that (as expected from the ESS values) the chain quality of \(x1\), which corresponds to the second coefficient in the KL expansion, is much better than that of the first variable \(x0\). A low-quality chain (where the samples are highly correlated) indicates difficulty in exploring the corresponding parameter and possibly high sensitivity of the model to that particular parameter. Sampling methods that incorporate gradient information (which we do not explore in this notebook) are expected to work better in this situation.
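For reference (not run here), CUQIpy's experimental module also provides gradient-based samplers such as NUTS; a sketch of how one might try it, assuming the posterior can supply the gradient information such samplers require (which may not be available for this PDE model):

# Not run: gradient-based sampling requires posterior gradient information
# from cuqi.experimental.mcmc import NUTS
# nuts_sampler = NUTS(posterior)
# posterior_samples_nuts = nuts_sampler.warmup(500).sample(1000).get_samples()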
A third way of looking at the credibility intervals is to look at the credibility intervals of the expansion coefficients \(x_i\). We plot these intervals for both prior and posterior samples by passing the flag plot_par=True to the plot_ci function:
The prior:
plt.figure()
x.sample(1000).plot_ci(95, plot_par=True)
plt.xticks(np.arange(x.dim)[::5]);
The posterior:
posterior_samples.plot_ci(95, plot_par=True)
plt.xticks(np.arange(x.dim)[::5]);
By comparing the two plots above, we see that the first few coefficients are inferred with higher certainty than the remaining coefficients. In part, this is due to the nature of the heat problem: highly oscillatory features in the initial heat profile (corresponding to the higher modes in the expansion) are smoothed out (lost) faster and are thus harder to retrieve from measurements of the final solution.