Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

3. Prior information and Bayesian inverse problems

How do we supply prior information about a solution to an inverse problem, in order to make it well defined and stable with respect to data errors? The key idea is that we must identify important properties of the desired solution, and then impose these properties when solving the problem.

One way to impose such prior information is to use a method called regularization Hansen, 2010Neumaier, 1998, where we minimize a linear combination of a data-fitting term and a regularization term that penalizes solutions that do not adhere to the prior information. As an example we mention Tikhonov regularization, which some readers may already be familiar with, and which takes the form

minxAxb22+λ2x22 .\min_x \| A\, x - b \|_2^2 + \lambda^2 \, \| x \|_2^2 \ .

Here, λ\lambda is a parameter that balances the weight given to the two objectives of fitting the data and penalizing a large norm. Assuming again iid Gaussian noise in the data vector bb, the regularized solution follows a Gaussian distribution with covariance matrix Σ=σ2(ATA+λ2I)1\Sigma = \sigma^2 \bigl( A^TA + \lambda^2 I\bigr)^{-1}. The flexibility of regularization methods lies in the fact that we can often find regularization terms (such as x22\| x \|_2^2) that correspond to our prior information.

A different approach to solving inverse problem - which is the one that underlies uncertainty quantification and CUQIpy - is to formulate the inverse problem in a statistical framework in the form of a Bayesian inverse problem. Here, both the data bb and the solution xx are considered to be a random variables. Unlike traditional regularization methods that seek a single solution, the Bayesian approach treats the inverse problem as a statistical question, producing a full probability distribution for the solution xx. This is called the posterior and it accounts for observational noise, uses prior information about the solution, and provides a complete quantification of uncertainty in the solution. See, e.g., Calvetti & Somersalo (2023) for details.

To understand Bayesian inverse problems and the way they are formulated and used in CUQIpy, we need to introduce four important statistical tools such that they are readily available in the rest of the book. See, e.g., Calvetti & Somersalo (2023, Ch. 1) for more details. For simplicity we consider the linear forward model b=Axb = A\, x where AA is a matrix, bb is the measured data, and xx is the solution.

Data distribution π(bx)\pi(b | x). This is a distribution that specifies how likely it is to observe the data bb given the parameter or parameter-vector xx. A key point is that we assume the noisy data can be explained by a forward model (here b=Axb = A\, x), and therefore the data distribution is conditioned in the variable xx. The second key point is that we assume a statistical description of the noise, and this assumption informs the specific expression for π(bx)\pi(b | x).

Likelihood function L(xb=bobs)L(x | b=b_{\mathrm{obs}}). The likelihood function measures how well the forward model explains the observed data bdatab_\mathrm{data}, and it is a function of the parameters xx. We obtain the likelihood function from the data distribution by fixing the variable bb to be the observed data bobsb_{\mathrm{obs}}. As an example, for our linear forward model with iid Gaussian noise of variance σ2\sigma^2, where m=dim(b)m=\dim(b) is the number of observed data points, the likelihood is

L(xb=bobs)=(2πσ2)m/2exp(Axbobs222σ2) .L(x \mid b=b_{\mathrm{obs}}) = (2\pi\sigma^2)^{-m/2}\,\exp\biggl( - \frac{\|A\, x-b_{\mathrm{obs}}\|_2^2}{2\,\sigma^2} \biggr)\ .

We often write the following proportional form (omitting the factor (2πσ2)m/2(2\pi\sigma^2)^{-m/2}) because that multiplicative constant does not depend on xx:

L(xb=bobs)exp(Axbobs222σ2) .L(x \mid b=b_{\mathrm{obs}}) \propto \exp\biggl( - \frac{\|A\, x-b_{\mathrm{obs}}\|_2^2}{2\,\sigma^2} \biggr)\ .

Prior π(x).\pi(x). This is a distribution that expresses our knowledge, or belief, about the solution xx to the inverse problem. In Bayesian inverse problems it is used to enforce that we prefer solutions that adhere to our prior knowledge, and it plays the same role as the regularization term in classical regularization methods. The prior may be determined from past information or previous experiments, or it can express a subjective assessment of a domain expert Robert, 2007, Ch. 3.

Posterior π(xbobs).\pi(x | b_{\mathrm{obs}}). This distribution expresses the probability of xx according to the measured data as well as our prior information. Hence, it is conditioned on bobsb_{\mathrm{obs}} and it expresses what is known about the solution after the data has been collected and the prior is imposed. The aim of a Bayesian inverse problem is to characterize and compute this probability.

We note that some presentations of Bayesian inverse problems skip the introduction of the data distribution π(bx)\pi(b | x). We include it in CUQIpy and in this book, because we believe this makes it easier and more intuitive to formulate and understand how the forward model and the noise model enter the likelihood function.

Given these important tools, Bayes’ theorem from statistics (see, e.g., Calvetti & Somersalo (2023, sec. 1.2.2)) gives the following expression for the posterior:

π(xbobs)=L(xb=bobs)π(x)π(b) .\pi(x|b_{\mathrm{obs}}) = \frac{L(x | b=b_{\mathrm{obs}}) \, \pi(x)}{\pi(b)} \ .

Here, the denominator (called the marginal likelihood) is a constant with respect to xx; it can always be determined such that the fraction integrates to 1. Throughout, we will therefore write

π(xbobs)L(xb=bobs)π(x) .\pi(x|b_{\mathrm{obs}}) \propto L(x | b=b_{\mathrm{obs}}) \, \pi(x) \ .

The important lesson here is that in order to quantify the uncertainties in xx in a Bayesian inverse problem, we need to specify the data distribution (i.e., the forward model and the distribution of the noise) as well as the prior for the solution. A key point here is that the prior, when properly chosen, helps take care of the violation of the Hadamard conditions in an ill-posed problem. In the following chapters we have much more to say about all this.

In Bayesian inverse problems and uncertainty quantification the goal is thus to determine the posterior for xx. We need to know its “shape” and characteristics because it tells us everything we want to know about the random variable xx. For example, we can produce point estimates of xx, and we can compute variances, covariances, and higher-order moments.

References
  1. Hansen, P. C. (2010). Discrete Inverse Problems: Insight and Algorithms. SIAM.
  2. Neumaier, A. (1998). Solving ill-conditioned and singular linear systems: A tutorial on regularization. SIAM Review, 40(3), 636–666.
  3. Calvetti, D., & Somersalo, E. (2023). Bayesian Scientific Computing (Vol. 215). Springer.
  4. Robert, C. P. (2007). The Bayesian choice: From Decision-Theoretic Foundations to Computational Implementation. Springer.