Optimization¶
In this chapter, we explain how elsa solves different problems, the general approach it takes, and some algorithms for certain specific and common problem formulations.
X-ray CT as an Inverse Problem¶
Before we dive in, we need to explain how we arrive at the optimization problem. This will only be a sketch; if you are unsure, we kindly refer you to one of the references given in the previous chapter. If you already know this part, feel free to skip ahead.
Inverse problems come in many forms. Intuitively, you can think of solving an inverse problem as finding the cause of an effect while only knowing the observations. For X-ray CT, the cause is the object of interest (more precisely, its attenuation), the effect is the X-rays traveling through the object from different positions around it, and the observations are the X-ray projections.
Mathematically, we describe the effect as some linear operator (in finite dimensions, you can think of it as a matrix). So what we have is a simple system of linear equations:

$$\mathcal{A}x = m$$
where $\mathcal{A}$ is the effect, which acts on some quantity $x$, and the observation of the effect on the quantity is $m$. Again, for X-ray CT, the quantities are attenuation coefficients of an object of interest, the observations are X-ray projections, and the effect is the so-called X-ray transform, which models the interaction of X-rays with the object of interest.
Now, this is simply a linear system of equations, and one could use classical methods, such as the QR decomposition. However, for X-ray CT, the size of the system is typically very large. Each row of the matrix represents the interaction of a single ray with the object of interest, so the number of rows grows quickly with the number of projections and the resolution of each projection. Storing the matrix densely would quickly cost many terabytes of memory, and even a sparse representation is not sufficient. For practical implementations, the X-ray transform is therefore computed on the fly, which makes classical methods impossible to use. Here, we will present one way to solve the above equation.
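To make the matrix-free idea concrete, here is a minimal sketch (plain NumPy, with made-up names such as `MatrixFreeOperator`; this is not elsa's actual interface) of an operator that only exposes its action and the action of its adjoint, which is all the iterative algorithms below require:

```python
import numpy as np

class MatrixFreeOperator:
    """Toy stand-in for an on-the-fly forward model (illustrative only, not elsa's API).

    Iterative solvers only ever need the action of A and of its adjoint A^T,
    so the system matrix itself never has to be stored.
    """

    def __init__(self, A):
        # For this sketch we cheat and keep a small dense matrix around;
        # a real X-ray transform would instead trace rays on the fly.
        self._A = np.asarray(A)

    def apply(self, x):
        return self._A @ x       # forward projection: x -> A x

    def apply_adjoint(self, y):
        return self._A.T @ y     # backprojection: y -> A^T y
```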
On the way to a simple optimization problem¶
A mathematically intuitive approach is to find a solution for which the error is minimal. This could look like this:

$$\min_x \frac{1}{2} \| \mathcal{A}x - m \|_2^2$$
In this formulation, one solves the least squares problem: we try to find some $x$ which is as close as possible to the measurements once we transform it into the measurement space using the operator $\mathcal{A}$. This formulation of least squares is highly important to many imaging modalities. In essence, the least squares functional measures how far away our $x$ is from the measurements we took.
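As a toy illustration (plain NumPy, not tied to elsa), the least squares objective is simply half the squared norm of the residual $\mathcal{A}x - m$:

```python
import numpy as np

def least_squares(A, x, m):
    """Evaluate 0.5 * ||A x - m||_2^2 for a small dense toy operator A."""
    residual = A @ x - m
    return 0.5 * residual @ residual

# Toy setup: a random "scanner" taking 20 measurements of a 10-pixel object.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))
x_true = rng.random(10)
m = A @ x_true
print(least_squares(A, x_true, m))  # ~0: the true object explains the data exactly
```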
This formulation has some other benefits. If the measurements contain some kind of noise, we can still expect some robustness (if the noise isn't too strong). This wasn't the case for the above system of linear equations.
Please note that this is only one way to arrive at the least squares formulation. There are many different explanations, most of which are equally valid.
Generalization¶
Now, assume we arrived at the above optimization problem with least squares. In elsa, we generalize this further to the following formulation:

$$\min_x D(\mathcal{A}x, m) + \sum_k \lambda_k \mathcal{R}_k(x)$$
where $x$ is the quantity one wishes to reconstruct (e.g. the attenuation coefficients), $D(\mathcal{A}x, m)$ represents the data fidelity term, and $\mathcal{R}_k$ are regularization terms with the corresponding regularization parameters $\lambda_k$. $\mathcal{A}$ is the forward model (i.e. the X-ray transform).
The data fidelity term represents how closely our desired quantity models the measurements. In X-ray CT, if you assume Gaussian noise, the least squares functional $\frac{1}{2} || \mathcal{A}x - m ||_2^2$ is most often used. If the dose level is low, and hence the assumption of Gaussian noise is wrong, one often uses the weighted least squares formulation or the transmission log-likelihood instead.
Regularization enforces some assumed prior information. Common types of regularization are the $\ell^2$, $\ell^1$ and Total Variation (TV) regularization, which all assume different prior information.
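The generalized formulation above can be read quite literally as code. The following sketch (illustrative NumPy; the names `data_fidelity` and `regularizers` are made up for this example) evaluates $D(\mathcal{A}x, m) + \sum_k \lambda_k \mathcal{R}_k(x)$:

```python
import numpy as np

def objective(x, A, m, data_fidelity, regularizers):
    """Evaluate D(A x, m) + sum_k lambda_k * R_k(x).

    `data_fidelity` is a callable D(Ax, m); `regularizers` is a list of
    (lambda_k, R_k) pairs, where each R_k is a callable acting on x.
    """
    value = data_fidelity(A @ x, m)
    for lam, reg in regularizers:
        value += lam * reg(x)
    return value

# Example: least squares data term with an l2 regularizer (Tikhonov).
lsq = lambda Ax, m: 0.5 * np.sum((Ax - m) ** 2)
l2_squared = lambda x: np.sum(x ** 2)
# objective(x, A, m, lsq, [(0.1, l2_squared)])
```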
Next, we will cover some common types of problems and what kind of algorithms can be used to solve each of them.
Least Squares¶
Let us first assume no regularization terms and the least squares functional

$$\frac{1}{2} \| \mathcal{A}x - m \|_2^2$$

as the data fidelity term. A number of algorithms have been developed to solve this problem. First, as the least squares functional is differentiable, all first-order methods (e.g. FGM or OGM) are possible options. Gradient descent for the least squares functional is also known as the Landweber iteration, for which some properties are well studied (such as step lengths for which the iteration is guaranteed to converge).
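A minimal Landweber/gradient descent sketch (toy NumPy code with a dense matrix standing in for the matrix-free X-ray transform; the safe step length uses the largest singular value of $\mathcal{A}$, which in practice would be estimated, e.g. by power iteration):

```python
import numpy as np

def landweber(A, m, step, iterations):
    """Gradient descent on 0.5 * ||A x - m||_2^2; converges for 0 < step < 2 / sigma_max(A)^2."""
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        gradient = A.T @ (A @ x - m)   # gradient of the least squares functional
        x -= step * gradient
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
m = A @ rng.random(20)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # step length inside the convergence bound
x_rec = landweber(A, m, step, iterations=500)
```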
Another popular algorithm is Conjugate Gradient (CG). CG differs from the usual first-order methods by not following the direct negative gradient (as with vanilla gradient descent), but by always choosing conjugate directions. This drastically improves performance. Importantly, CG usually provides pleasing results in few iterations, whereas other gradient-based methods usually need on the order of hundreds of iterations.
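For comparison, a compact sketch of CG applied to the normal equations $\mathcal{A}^T\mathcal{A}x = \mathcal{A}^T m$, often called CGLS (toy NumPy code, again with a dense matrix in place of the matrix-free operator):

```python
import numpy as np

def cgls(A, m, iterations):
    """Conjugate Gradient on the normal equations A^T A x = A^T m (CGLS)."""
    x = np.zeros(A.shape[1])
    r = m - A @ x            # residual in measurement space
    s = A.T @ r              # residual of the normal equations
    p = s.copy()             # first search direction
    gamma = s @ s
    for _ in range(iterations):
        q = A @ p
        alpha = gamma / (q @ q)
        x += alpha * p       # step along the current conjugate direction
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```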
However, one of the key problems with the least squares approach is noise. If the noise is too severe, the reconstruction will quickly overfit the noise.
$\ell^2$ Regularization¶
$\ell^2$ regularization is an often used and well studied form of regularization. It assumes that the solution's $\ell^2$-norm is small. As a result, images reconstructed with the $\ell^2$-norm are usually blurry and edges are washed out. However, artifacts which appear as sharp edges are suppressed as well. The problem is also often known as Tikhonov regularization.
Most of the algorithms discussed for the least squares problem are also applicable to $\ell^2$ regularization. Specifically, the first-order methods and CG can be used for it.
Interestingly, the regularization term can incorporate further operators, resulting in general Tikhonov regularization. A specifically interesting choice is the gradient operator. The problem is then given as:

$$\min_x \frac{1}{2} \| \mathcal{A}x - m \|_2^2 + \lambda \| \mathcal{L} x \|_2^2$$
To a certain degree this reduces the blurriness of the reconstructed images. However, the assumption of e.g. a small gradient (if $\mathcal{L} = \nabla$) is often not practical.
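To illustrate, the gradient of the general Tikhonov objective is $\mathcal{A}^T(\mathcal{A}x - m) + 2\lambda \mathcal{L}^T\mathcal{L}x$, so any of the gradient-based methods above only needs one extra term. A sketch for a 1D signal with a forward-difference operator in place of $\mathcal{L} = \nabla$ (toy NumPy code):

```python
import numpy as np

def forward_difference(n):
    """Forward-difference matrix approximating the gradient of a 1D signal of length n."""
    L = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    L[idx, idx] = -1.0
    L[idx, idx + 1] = 1.0
    return L

def general_tikhonov_gradient(x, A, m, L, lam):
    """Gradient of 0.5 * ||A x - m||_2^2 + lam * ||L x||_2^2."""
    return A.T @ (A @ x - m) + 2.0 * lam * (L.T @ (L @ x))
```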
Edge preserving Regularization¶
Edge preserving regularization improves upon the general Tikhonov problem in some sense. The $\ell^2$-norm is replaced by an edge preserving functional. A quite well known option is the Huber functional, which is a differentiable approximation of the $\ell^1$-norm. Hence, first-order methods are still usable to solve the problem.
This problem formulation creates reasonably good images. However, it must be tuned carefully: the closer the approximation gets to the $\ell^1$-norm, the larger the Lipschitz constant of the problem becomes, which in turn decreases the possible step size for the first-order methods. On the other hand, if the approximation is not close enough to the $\ell^1$-norm, it will create images which are very similar to plain $\ell^2$ regularization.
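For concreteness, a sketch of the Huber functional as an approximation of $|t|$ and its derivative (toy NumPy code; `delta` controls how closely it approximates the $\ell^1$-norm, and the curvature near zero grows like $1/\delta$, which is exactly the Lipschitz constant issue mentioned above):

```python
import numpy as np

def huber(t, delta):
    """Huber approximation of |t|: quadratic for |t| <= delta, linear beyond."""
    small = np.abs(t) <= delta
    return np.where(small, t**2 / (2.0 * delta), np.abs(t) - 0.5 * delta)

def huber_derivative(t, delta):
    """Derivative of the Huber functional; its slope near zero is 1/delta."""
    small = np.abs(t) <= delta
    return np.where(small, t / delta, np.sign(t))
```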
$\ell^1$ Regularization¶
Optimization problems with $\ell^1$ regularization are of the form

$$\min_x \frac{1}{2} \| \mathcal{A}x - m \|_2^2 + \lambda \| x \|_1$$
This problem is also often known as LASSO (least absolute shrinkage and selection operator). The problem favours sparse solutions, i.e. many components will be zero. For many applications this is beneficial. However, it turns out that even though X-ray CT images are not necessarily sparse, LASSO problems still handle noise quite well.
However, the LASSO problem comes at a cost: all of the above discussed algorithms no longer work. As the problem is not continuously differentiable anymore, plain gradient-based first-order methods cannot be applied directly. Other algorithms have been developed for it. Among the most important ones are ISTA and FISTA, which are special cases of proximal gradient descent and accelerated proximal gradient descent, respectively.
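A minimal ISTA sketch for the LASSO problem (toy NumPy code with a dense matrix; the proximal operator of the $\ell^1$-norm is the soft-thresholding function):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, m, lam, iterations):
    """ISTA for min_x 0.5 * ||A x - m||_2^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        gradient_step = x - step * (A.T @ (A @ x - m))
        x = soft_threshold(gradient_step, step * lam)   # proximal step
    return x
```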
Constrained optimization¶
So far, all problems introduced here are unconstrained. For X-ray CT the non-negativity constraint, i.e. restricting the solution to the closed convex set $\mathcal{C} = \mathbb{R}_+^n$, is a sensible and powerful constraint. The problem is then defined as:

$$\min_{x \in \mathcal{C}} \frac{1}{2} \| \mathcal{A}x - m \|_2^2$$
It can also be formulated using the indicator function:

$$\min_x \frac{1}{2} \| \mathcal{A}x - m \|_2^2 + \iota_{\mathcal{C}}(x)$$
This problem can be solved using proximal gradient descent and accelerated proximal gradient descent, as the proximal operator of the indicator function is the projection onto the set.
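In this case proximal gradient descent reduces to projected gradient descent, since the projection onto $\mathcal{C} = \mathbb{R}_+^n$ is simply clipping negative values to zero. A sketch of the plain (non-accelerated) variant in toy NumPy code:

```python
import numpy as np

def projected_gradient(A, m, iterations):
    """Projected gradient descent for min_{x >= 0} 0.5 * ||A x - m||_2^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(iterations):
        x = x - step * (A.T @ (A @ x - m))   # gradient step on the least squares term
        x = np.maximum(x, 0.0)               # proximal step = projection onto R_+^n
    return x
```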
This problem is a very nice default for X-ray CT. Assuming sufficient and good data, reconstructions with a non-negativity constraint are usually excellent.
Total Variation Regularization¶
The final form of regularization discussed here is total variation (TV). The problem is defined as:

$$\min_x \frac{1}{2} \| \mathcal{A}x - m \|_2^2 + \lambda \| \nabla x \|_1$$
The prior information included in this regularization is a sparse gradient, i.e. the problem penalizes frequent changes in the image values. This results in piecewise smooth images. For many imaging applications in X-ray CT this is a good default assumption.
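To make the prior concrete, here is a sketch of evaluating the (isotropic) total variation of a 2D image with forward differences (toy NumPy code; a piecewise constant image has small TV, a noisy one has large TV):

```python
import numpy as np

def total_variation(image):
    """Isotropic TV: sum over pixels of the gradient magnitude (forward differences)."""
    dx = np.diff(image, axis=1, append=image[:, -1:])   # horizontal differences
    dy = np.diff(image, axis=0, append=image[-1:, :])   # vertical differences
    return np.sum(np.sqrt(dx**2 + dy**2))

flat = np.zeros((64, 64))
flat[16:48, 16:48] = 1.0
noisy = flat + 0.1 * np.random.default_rng(0).normal(size=flat.shape)
print(total_variation(flat), total_variation(noisy))    # TV(flat) << TV(noisy)
```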
This problem is the hardest problem to solve in this collection. None of the previously mentioned algorithms are able to solve it. Two popular algorithms to solve problems with TV regularization are the alternating direction method of multipliers (ADMM) and the Primal-Dual Hybrid Gradient (PDHG) method. In elsa, we currently implement two forms of ADMM: one with an explicit least squares term (ADMML2) and a special version called linearized ADMM (see its documentation for details).
Though these algorithms can solve a variety of complex and important problems, they also need tuning of multiple parameters, for which no real default values exist. This makes them cumbersome to work with.