This tutorial introduces the cyclic proximal point algorithm, again using the example of the Riemannian median, which can be introduced analogously to the Riemannian center of mass.
Similar to the Riemannian center of mass and its variational formulation introduced by Karcher (1977), the Riemannian median of given data points $x_1,\ldots,x_N \in \mathcal{M}$ can be defined by

$$\operatorname*{arg\,min}_{p \in \mathcal{M}}\ F(p), \qquad F(p) := \frac{1}{N}\sum_{i=1}^{N} d_{\mathcal{M}}(p, x_i).$$
Note that while even the median of real numbers is usually defined to be unique by convention, the minimizer of the variational formulation is in general not.
Furthermore, the function $F$ is continuous but not differentiable at any of the data points. Away from these points the gradient can be derived, see e.g. [1], for any $p \notin \{x_1,\ldots,x_N\}$ as

$$\nabla F(p) = -\frac{1}{N}\sum_{i=1}^{N} \frac{\log_p x_i}{d_{\mathcal{M}}(p, x_i)},$$

while at the points $x_i$ we only obtain a subdifferential. Hence a gradient descent algorithm can not be applied; we have to use a subgradient descent algorithm instead, which was generalized to manifolds in [2].
We will use another approach that was introduced by Bačák [3] for Hadamard spaces. It is based on the proximal point algorithm introduced on Riemannian manifolds by Ferreira and Oliveira [4] as well as on the real-valued analogue of the cyclic proximal point algorithm by Bertsekas, see for example [5].
The approach is based on two ideas: the proximal map and the cyclic proximal point algorithm.
The proximal map
The proximal map of a function $f\colon \mathcal{M} \to \mathbb{R}$ and a positive real value $\lambda > 0$ is defined in [4] as

$$\operatorname{prox}_{\lambda f}(x) := \operatorname*{arg\,min}_{y \in \mathcal{M}}\ \tfrac{1}{2} d_{\mathcal{M}}(x, y)^2 + \lambda f(y).$$
The intuition is that the proximal map at $x$ is a trade-off (weighted by the parameter $\lambda$) between minimizing $f$ and not moving too far away from $x$. Furthermore, if $x^\star$ is a minimizer of $f$, then $x^\star$ is a fixed point of the proximal map, i.e. $\operatorname{prox}_{\lambda f}(x^\star) = x^\star$, since then the first term is zero and the second one is minimal.
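To build some intuition, consider the real-valued case $\mathcal{M} = \mathbb{R}$ with $f(y) = \lvert y - a \rvert$ for some fixed $a \in \mathbb{R}$ (a small worked example added here for illustration, using the definition above). Minimizing $\tfrac{1}{2}(x-y)^2 + \lambda \lvert y - a \rvert$ over $y$ yields

$$\operatorname{prox}_{\lambda f}(x) = x + \operatorname{sign}(a - x)\,\min\{\lambda,\ \lvert a - x \rvert\},$$

i.e. $x$ is moved towards $a$ by at most $\lambda$; we will re-encounter exactly this structure for the distance function on a manifold below.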
Furthermore, on Hadamard manifolds the proximal map is known to be nonexpansive, and hence the iterates

$$x_{k+1} = \operatorname{prox}_{\lambda f}(x_k)$$

converge to a minimizer of a convex, lower semicontinuous (lsc.) function $f$.
Since we would like to minimize our objective $F$ and its minimizer is not known in closed form, it does not help to set $f = F$, since solving the minimization problem within the proximal map then involves minimizing $F$ itself.
The cyclic proximal point algorithm
For a function

$$F(x) = \sum_{i=1}^{N} f_i(x)$$

that consists of summands $f_i$ whose proximal maps can be evaluated "easily" or efficiently, the cyclic proximal point algorithm consists of cycling through the proximal maps. To emphasize the cyclic order we use fractional iterate indices, i.e., we write

$$x_{k + \frac{i}{N}} = \operatorname{prox}_{\lambda_k f_i}\bigl(x_{k + \frac{i-1}{N}}\bigr), \qquad i = 1,\ldots,N,$$

where the inner cycle over $i$ is performed before increasing $k$.
This algorithm converges to a minimizer of $F$ in Euclidean space [5] and on Hadamard manifolds [3] if the sequence $(\lambda_k)_k$ of parameters is square summable but not summable, i.e. we have

$$\sum_{k=1}^{\infty} \lambda_k = \infty \qquad\text{and}\qquad \sum_{k=1}^{\infty} \lambda_k^2 < \infty.$$
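To make the cyclic structure concrete, the following minimal sketch implements the scheme in plain MATLAB for Euclidean data; the helper name cppSketch is hypothetical, and the toolbox function cyclicProximalPoint used later is the actual implementation.

function x = cppSketch(x0, proxMaps, lambda0, maxCycles)
% Minimal sketch of the cyclic proximal point algorithm (Euclidean data).
% proxMaps is a cell array of handles @(x,lambda) evaluating prox of lambda*f_i.
x = x0;
N = length(proxMaps);
for k = 1:maxCycles
    lambdak = lambda0/k;       % square summable, but not summable
    for i = 1:N                % inner cycle through all proximal maps
        x = proxMaps{i}(x, lambdak);
    end
end
end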
Back to our problem of finding a minimizer of $F$, the Riemannian median: each summand is of the form $f_i(p) = d_{\mathcal{M}}(x_i, p)$, for some fixed (given) data point $x_i$, $i = 1,\ldots,N$. The corresponding proximal maps can be computed in closed form as

$$\operatorname{prox}_{\lambda f_i}(p) = \gamma_{p, x_i}(t), \qquad t = \min\{\lambda,\ d_{\mathcal{M}}(p, x_i)\},$$

where $\gamma_{p, x_i}$ denotes the unit-speed geodesic from $p$ to $x_i$; in words, $p$ is moved towards $x_i$ by at most $\lambda$. We write equality here though the minimizer might not be unique: when the geodesic connecting $p$ and $x_i$ is not unique, the proximal map consists of the corresponding points on all such geodesics. For the following implementation, however, the evaluation of the geodesic depends on the single-valued choice of the logarithmic map M.log of the manifold, and hence one of these geodesics is chosen in those cases, which occur only rarely if the data is local enough.
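Assuming, as is the convention in MVIRT, that M.exp, M.log, and M.dist denote the exponential map, the logarithmic map, and the distance of a manifold M, this closed form can be sketched in a single anonymous function; the toolbox function proxDistance used below is the actual implementation.

% Sketch of the closed-form proximal map of f(p) = d(p, xi):
% rescale the tangent vector M.log(y,xi) (of length d(y,xi)) to length
% min(lambda, d(y,xi)) and follow the geodesic via M.exp.
proxDistSketch = @(M, xi, y, lambda) M.exp(y, ...
    (min(lambda, M.dist(y,xi)) ./ max(M.dist(y,xi), eps)) .* M.log(y,xi));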
Implementation
With these preparations we can return to our initial problem and its implementation. We start by initializing the MVIRT toolbox in the main directory. We further create a manifold, the two-dimensional sphere $\mathbb{S}^2$, using the code
initMVIRT();                                 % initialize the toolbox
M = Sn(2);                                   % the two-dimensional sphere
pts = 30;                                    % number of data points
x = ArtificialS2SignalLemniscate(pts,'a',1); % data along a lemniscate
figure(1);
plotS2(x);
title('Given data on S2');
The last two lines produce a plot of the data, which looks like the following figure.
The cyclic proximal point algorithm requires a set of anonymous functions representing the proximal maps. Since anonymous functions can not be stored in an array, we require a cell array. While this could be done using cellfun, we use a simple for-loop for readability. Similar to the gradient descent, even with the same interface/parameters in the anonymous function, we need a stopping criterion. We stop after at most 2000 iterations or as soon as two consecutive iterates are closer than $10^{-9}$. The lines read
Last, we define the objective function $F$, which we use for debugging and later on in the reference implementation.
F = @(p) 1/pts*sum( M.dist(repmat(p,[ones(size(M.ItemSize)),pts]),x) ); % median objective
proximalMaps = cell(pts,1);
for i=1:pts
proximalMaps{i} = @(y,lambda) proxDistance(M,x(M.allDims{:},i),y,lambda);
end
stoppingCriterion = stopCritMaxIterEpsilonCreator(M,2000,10^(-9));
[p1,recData1] = cyclicProximalPoint(M,x(M.allDims{:},1),proximalMaps,...
stoppingCriterion,'lambda',pi/64,...
'Record',@(x,xold,iter) [F(x);M.dist(x,xold)]);
The last three lines call the iteration of the cyclic proximal point algorithm, where further in every iteration we record the functional value and the distance between consecutive iterates for reference.
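The recorded values can afterwards be inspected, for example as follows (a small sketch added here; we assume recData1 contains one column per iteration, with the functional value in the first row and the distance in the second):

figure;
semilogy(recData1(2,:));  % distance between consecutive iterates per iteration
title('Distance between consecutive iterates');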
By specifying just an initial value lambda, here $\pi/64$, the standard parameter sequence is set to $\lambda_k = \lambda/k$. One could also define a nonstandard sequence using lambdaIterate, see cyclicProximalPoint.m.
The number of iterations (reaching the maximal 2000 in the above example) can be reduced by using a pseudorandom order
[p2,recData2] = cyclicProximalPoint(M,x(M.allDims{:},1),proximalMaps,...
stoppingCriterion,'Order','permute','lambda',pi/64,...
'Record',@(x,xold,iter) [F(x);M.dist(x,xold)]);
where in each cycle the order of the proximal maps is randomly permuted. Furthermore, choosing only one proximal map per iteration (instead of the 30 used here) can be done by setting Order to Random. However, this flexibility comes at the disadvantage that 30 anonymous function calls are performed per cycle.
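For illustration, such a call might read as follows; this is a sketch assuming that the Order parameter accepts the value 'random' analogously to 'permute' above.

[p2r,recData2r] = cyclicProximalPoint(M,x(M.allDims{:},1),proximalMaps,...
    stoppingCriterion,'Order','random','lambda',pi/64,...
    'Record',@(x,xold,iter) [F(x);M.dist(x,xold)]);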
If we would instead like to use a subgradient algorithm similar to the last tutorial, the (sub)gradient is a little involved, since at any data point $x_i$ we only have a subdifferential. Since $0$ is within this subdifferential, we set
mD = length(M.ItemSize);
gradF = @(p) sum(...
    1./(shiftdim(... %the following line avoids division by zero
    pts*M.dist(repmat(p,[ones(1,mD),pts]),x) + ...
    double(M.dist(repmat(p,[ones(1,mD),pts]),x)==0),mD)).*...
    M.log(repmat(p,[ones(1,mD),pts]),x),mD+1);
Note that the term double(M.dist(...)==0) added in the denominator is an inline way of writing $\tfrac{1}{N\, d_{\mathcal{M}}(p, x_i)}$ where this value is defined and $0$ else (since in that case the following M.log yields a zero vector anyway).
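Spelled out for the $i$-th summand, this safeguard corresponds to the following explicit form (a fragment shown only for clarity; p denotes the current point):

xi = x(M.allDims{:},i);          % the i-th data point
if M.dist(p,xi) == 0
    w = 0;                       % at a data point, choose the subgradient 0
else
    w = 1/(pts*M.dist(p,xi));    % reciprocal distance weight
end

Calling the subgradient descent then reads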
[p3,recData3] = subGradientDescent(M,x(M.allDims{:},1),F,gradF,...
@(x,descentDir,iter,s) 1/iter,...
stoppingCriterion,...
'Record',@(x,xold,iter) [F(x);M.dist(x,xold)]);
All three algorithms yield the following median; the CPP algorithms are slower in runtime, but the second one needs fewer iterations than the subgradient descent while obtaining roughly the same accuracy.
The images are again created using exportSpherePCT2Asy({x,p1},{},{},{c1,c2}, 'File', 'InitialPoints.asy', 'Camera', [.3,.3,1]);
and Asymptote for rendering.
Switching to another manifold is again quite straightforward. With
M = SymPosDef(3);                   % 3x3 symmetric positive definite matrices
data = ArtificialSPDImage(pts,1.5);
x = data(:,:,:,18);                 % pick one signal from the artificial image
figure(2);
plotSPD(x,'EllipsoidPoints',30);    % fine ellipsoids
we define another manifold, the symmetric positive definite matrices with the affine invariant metric, and generate a set of data, shown in the following figure.
This image is created by calling exportSPD2Asymptote(x,'File','myExport.asy'); and using Asymptote for rendering.
Similar to the definitions above, we mainly have to store the new manifold within the anonymous functions, but the definitions read the same
F = @(p) 1/pts*sum( M.dist(repmat(p,[ones(size(M.ItemSize)),pts]),x) );
proximalMaps = cell(pts,1);
for i=1:pts
proximalMaps{i} = @(y,lambda) proxDistance(M,y,x(M.allDims{:},i),lambda);
end
stoppingCriterion = stopCritMaxIterEpsilonCreator(M,2000,10^(-8));
Also, computing the median reads similarly; the call
[p4,recData4] = cyclicProximalPoint(M,x(M.allDims{:},1),proximalMaps,...
stoppingCriterion,'lambdaIterate',@(iter) pi/8/iter,...
'Order','permute',...
'Record',@(x,xold,iter) [F(x);M.dist(x,xold)]);
yields the median shown in the following figure.
The method presented in this tutorial is also directly available by calling M.median on your favorite manifold M that implements at least the abstract functions of the manifold base class. Note that this function can also be applied to whole arrays of data items, computing the corresponding medians in parallel.
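For example, for the sphere-valued data from above, a direct call might read as follows (a sketch; the exact signature and options of median are documented in the manifold base class):

m = M.median(x); % compute the Riemannian median of the data points in x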
References

[1] Fletcher, P. T., Venkatasubramanian, S. and Joshi, S. (2009). The geometric median on Riemannian manifolds with application to robust atlas estimation. NeuroImage 45(1, Suppl. 1), S143–S152. doi:10.1016/j.neuroimage.2008.10.052
[2] Ferreira, O. P. and Oliveira, P. R. (1998). Subgradient algorithm on Riemannian manifolds. Journal of Optimization Theory and Applications 97(1), 93–104. doi:10.1023/A:1022675100677
[3] Bačák, M. (2014). Computing medians and means in Hadamard spaces. SIAM Journal on Optimization 24(3), 1542–1566. doi:10.1137/140953393
[4] Ferreira, O. P. and Oliveira, P. R. (2002). Proximal point algorithm on Riemannian manifolds. Optimization 51(2), 257–270. doi:10.1080/02331930290019413
[5] Bertsekas, D. P. (2010). Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Technical Report LIDS-P-2848, Laboratory for Information and Decision Systems, MIT, Cambridge, MA. arXiv:1507.01030