$\DeclareMathOperator{\arccosh}{arccosh} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator{\Exp}{Exp} \newcommand{\geo}{\gamma_{\overset{\frown}{#1,#2}}} \newcommand{\geoS}{\gamma} \newcommand{\geoD}{\gamma_} \newcommand{\geoL}{\gamma(#2; #1)} \newcommand{\gradM}{\nabla_{\M}} \newcommand{\gradMComp}{\nabla_{\M,#1}} \newcommand{\Grid}{\mathcal G} \DeclareMathOperator{\Log}{Log} \newcommand{\M}{\mathcal M} \newcommand{\N}{\mathcal N} \newcommand{\mat}{\mathbf{#1}} \DeclareMathOperator{\prox}{prox} \newcommand{\PT}{\mathrm{PT}_{#1\to#2}#3} \newcommand{\R}{\mathbb R} \newcommand{\SPD}{\mathcal{P}(#1)} \DeclareMathOperator{\Tr}{Tr} \newcommand{\tT}{\mathrm{T}} \newcommand{\vect}{\mathbf{#1}}$

# Computing the mean with gradient descent

This tutorial introduces the Riemannian center of mass briefly and how to compute this generalization of a mean on an arbitrary manifold using a gradient descent algorithm.

For real-valued points $y_1,\ldots,y_m\in\mathbb R$ the mean is given by $x^* = \displaystyle\frac{1}{m}\sum_{i=1}^m y_i$.

For the more general setting $x_1,\ldots,x_n\in\M$ we do not have the possibility to perform an addition. But we can generalize the mean following the work of Hermann Karcher  by looking at the optimization problem

Since the optimality conditions read

we directly obtain for the real valued case the formula for the mean. Furthermore, we can define the same optimization problem on a manifold, since all involved terms are squared distances $(y-y_i)^{2} = d^{2}_{\M} (y,y_i)$ on $\M=\mathbb R$. This leads to the following minimization problem

where the resulting $x^*$ is called Riemannian center of mass or sometimes Karcher mean. The optimality conditions read

see, for example Afsari  for the derivation of the involved gradient. With this gradient at hand we can start a gradient descent algorithm.

In MVIRT we start by initializing the toolbox in the main directory. We further create a manifold, the two-dimensional sphere $\mathbb S^2$ using the code

1
2
3
4
5
6
7
initMVIRT();
M = Sn(2);
pts = 30;
x = ArtificialS2SignalLemniscate(pts,'a',1);
figure(1);
plotS2(x);
title('Given data on S2');


The last two lines perform a plot, which looks like the following figure. A set of points on the sphere $\mathbb S^2$.

The gradient descent works now as follows. For some start point $x^{(0)}$ we perform

for some step size rule (values) $s_k$.

Besides defining the objective function itself for checks, we have to define the gradient, a step size rule – first let’s stick so setting $s_k=\frac{1}{2}$ and a stopping criterion. For the last we choose either 100 iterations or if two iterates are nearer than $% $. The lines read

1
2
3
4
5
6
7
F = @(p) 1/pts*sum( M.dist(repmat(p,[ones(size(M.ItemSize)),pts]),x).^2 );
gradF = @(p) sum( -2/pts*M.log( ...
repmat(p,[ones(size(M.ItemSize)),pts]),x),length(M.ItemSize)+1);
stepSizeRule = @(x,eta,iter,initial) 1/2;
stoppingCriterion = stopCritMaxIterEpsilonCreator(M,100,10^(-9));
stoppingCriterion,'Record',@(x,xold,iter) [F(x);iter]);


The last two lines call the iteration of the gradient descent, where further in every iteration we record first the functional value $F(x^{(k)})$ and the iteration for reference. Notice, that we reshape using the size of M.ItemSize where we could also just say [1,pts], since the first dimension is the only one of the manifold since on $\mathbb S^n$ we are dealing with vectors. We will see later what the advantage of this form yields.

This algorithm stops after 8 Iterations and the result is drawn in orange in the following image. A set of points on the sphere $\mathbb S^2$ and their mean.

The image is created using exportSpherePCT2Asy({x,p1},{},{},{c1,c2}, 'File', 'InitialPoints.asy', 'Camera', [.3,.3,1]); and Asymptote for rendering. The mentioned export accepts a cell array of points, curves (points drawn with a connecting line) and tuples indicating tangent vectors (with their base points prepended) as cells and a cell array of colors, one for each cell in the three preceeding variables.

We can further improve the result by performing an Armijo step size rule

1
2
3
4
stepSizeRuleA = @(x,eta,iter,initial) ...
stepSizeArmijo(M,F,x,eta,'InitialStepSize',initial,'rho',0.99,'c',0.5);
stoppingCriterion,'Record',@(x,xold,iter) [F(x);iter]);


which finishes with the same point, i.e. M.dist(p1,p2) yields 0, but only requires 5 steps.

Switching to another manifold is quite straight forward. If we want to compute the mean of a set of symmettric positive definite matrices, we just introduce another manifold (overwriting M) and creating new data

1
2
3
4
M = SymPosDef(3);
data = ArtificialSPDImage(pts,1.5); x = data(:,:,:,18);
figure(2);
plotSPD(x,'EllipsoidPoints',30); %fine ellipses


The data is a set of 30 s.p.d matrices which look like A set of points on the manifold $\mathcal P(3)$ of symmetric positive definite matrices visualized as ellipsoids.

This image is created by calling exportSPD2Asymptote(x,'File','myExport.asy']); and using Asymptote for rendering. 

and we again define our objective function F, its gradient gradF, and a stoppingCriterion

1
2
3
F = @(p) 1/pts*sum( M.dist(repmat(p,[ones(size(M.ItemSize)),pts]),x).^2 );
gradF = @(p) sum( -2/pts*M.log(repmat(p,[ones(size(M.ItemSize)),pts]),x),length(M.ItemSize)+1);
stoppingCriterion = stopCritMaxIterEpsilonCreator(M,1000,10^(-9));


They look quite similar, since we again use M.ItemSize and its length. Now M.ItemSize is [3,3] and its length is 2, i.e. the first two dimensions of p or x are the manifold-dimensions of the array. The only reason we have to redefine these three, is, that the manifold M is saved in the local scope of the anonymous function. The code however indicates, that this way many functions can be written without dealing with the array dimensions directly. This further axplains why we don not have to redefine stepSizeRule, it only returns a value. We would have to redefine an stepSizeArmijo basded rule, but we stick for this example to the simple rule.

Then calling

1
2
stoppingCriterion,'Record',@(x,xold,iter) [F(x);iter]);


yields the mean shown in the following figure. The corresponding mean of the $\mathcal P(3)$-valued vector above.

The method presented in this tutorial is also directly available calling M.mean on your favorite manifold M` that implements at least the abstract functions.

2. Afsari, B (2011). Riemannian $L^p$center of mass: Existence, uniqueness, and convexity. Proceedings of the American Mathematical Society. 139 655–73