A Simple Approximation Method for the Fisher-Rao Distance between Multivariate Normal Distributions

Frank Nielsen

2023

Video (best viewed full screen): the Fisher-Rao geodesics (black), which are computationally time-consuming to obtain by geodesic shooting, compared with our proposed projected SPD geodesic curves (green), which can be calculated fast. https://www.mdpi.com/1099-4300/25/4/654
Video: geodesic shooting from the standard normal distribution with initial value conditions.

1 The Fisher-Rao distance

Let ℙ(d) denote the set of symmetric positive-definite (SPD) d×d matrices and 𝒩(d) denote the set of multivariate normal distributions:

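$\mathcal{N}(d) := \left\{ p_{\mu,\Sigma}(x) = (2\pi)^{-\frac{d}{2}}\,|\Sigma|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right) : (\mu,\Sigma) \in \Lambda(d) := \mathbb{R}^d \times \mathbb{P}(d) \right\},$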
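As a minimal sketch (assuming NumPy; the function name `normal_pdf` is hypothetical), the density p_{μ,Σ}(x) above can be evaluated as follows, using a linear solve instead of an explicit inverse and `slogdet` for the determinant:

```python
import numpy as np

def normal_pdf(x, mu, Sigma):
    """Density p_{mu,Sigma}(x) = (2 pi)^{-d/2} |Sigma|^{-1/2} exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))."""
    x, mu = np.asarray(x, dtype=float), np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = mu.size
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    _, logdet = np.linalg.slogdet(Sigma)         # log|Sigma|, numerically stable for SPD Sigma
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)))
```

For a diagonal Σ the value factorizes into a product of univariate densities, which gives an easy sanity check.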

The Fisher-Rao distance between two normals N(μ₁,Σ₁) and N(μ₂,Σ₂) is the Riemannian geodesic distance on the manifold (𝒩(d), g^Fisher) induced by the Fisher information metric:

$\rho_{\mathcal{N}}(N(\lambda_1),N(\lambda_2)) := \inf_{c:\ c(0)=p_{\lambda_1},\ c(1)=p_{\lambda_2}} \{\mathrm{Length}(c)\},$

where

$\mathrm{Length}(c) := \int_0^1 ds^{\mathrm{Fisher}}(c(t))\,dt,$

and ds^Fisher(t) := √⟨ċ(t), ċ(t)⟩_c(t) is the Fisher-Rao length element. The inner product ⟨v₁,v₂⟩_N for v₁,v₂ ∈ T_N at a normal N is given by the Fisher information metric, and ‖v‖_N = √⟨v,v⟩_N is the Fisher-Rao norm (the tangent plane T_N is identified with ℝ^d × Sym(d), where Sym(d) denotes the set of d × d symmetric matrices). The statistical model 𝒩(d) has dimension m = dim(Λ(d)) = d + d(d+1)/2 = d(d+3)/2 and is identifiable: there is a one-to-one correspondence λ ↔ p_λ(x) between λ ∈ Λ(d) and N(μ,Σ) ∈ 𝒩(d).
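The text does not spell out ⟨·,·⟩_N. For reference, here is a sketch assuming the standard Fisher metric of the normal family, ‖(dμ,dΣ)‖²_N = dμᵀΣ⁻¹dμ + ½ tr(Σ⁻¹dΣ Σ⁻¹dΣ) (a known formula, added here and not taken from the text above); the helper name `fisher_norm` is hypothetical.

```python
import numpy as np

def fisher_norm(Sigma, dmu, dSigma):
    """Fisher-Rao norm of the tangent vector v = (dmu, dSigma) at N(mu, Sigma).

    Assumes the standard Fisher metric of the multivariate normal family:
        ||v||^2 = dmu^T Sigma^{-1} dmu + 1/2 tr(Sigma^{-1} dSigma Sigma^{-1} dSigma).
    The metric does not depend on mu, so mu is not an argument.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    dmu = np.asarray(dmu, dtype=float)
    dSigma = np.asarray(dSigma, dtype=float)
    A = np.linalg.solve(Sigma, dSigma)              # Sigma^{-1} dSigma
    mean_part = dmu @ np.linalg.solve(Sigma, dmu)   # dmu^T Sigma^{-1} dmu
    cov_part = 0.5 * np.trace(A @ A)
    return float(np.sqrt(mean_part + cov_part))
```

In dimension d = 1, with Σ = σ² and dΣ = 2σ dσ, the squared norm reduces to (dμ² + 2dσ²)/σ², the metric behind the univariate closed form.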

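When d = 1, the Fisher-Rao distance is known in closed form:
$\rho_{\mathcal{N}}(N_1,N_2) = 2\sqrt{2}\,\mathrm{arctanh}(\Delta(\mu_1,\sigma_1;\mu_2,\sigma_2)),$
where $\Delta(a,b;c,d) = \sqrt{\frac{(c-a)^2 + 2(d-b)^2}{(c-a)^2 + 2(d+b)^2}}$ is a Möbius distance and $\mathrm{arctanh}(u) := \frac{1}{2}\log\frac{1+u}{1-u}$ for 0 ≤ u < 1. The Fisher-Rao geodesics are semi-ellipses with centers located on the x-axis (σ = 0) of the (μ,σ) upper half-plane.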
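A direct transcription of the univariate closed form, as a small sketch (standard library only; function names are hypothetical, and σ denotes the standard deviation, not the variance):

```python
import math

def mobius_delta(a, b, c, d):
    """Möbius distance Delta(a, b; c, d) between (mu1, sigma1) = (a, b) and (mu2, sigma2) = (c, d)."""
    num = (c - a) ** 2 + 2.0 * (d - b) ** 2
    den = (c - a) ** 2 + 2.0 * (d + b) ** 2
    return math.sqrt(num / den)

def fisher_rao_univariate(mu1, sigma1, mu2, sigma2):
    """Closed-form Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    return 2.0 * math.sqrt(2.0) * math.atanh(mobius_delta(mu1, sigma1, mu2, sigma2))
```

For instance, N(0,1) and N(0,4) (σ doubling) are at distance √2 log 2, which matches the same-mean formula for d = 1.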
When the normal distributions belong to the same submodel 𝒩_μ = {N(μ,Σ) : Σ ∈ ℙ(d)} ⊂ 𝒩(d) of normal distributions sharing the same mean μ, we have:
$\rho_{\mathcal{N}_\mu}(N_1,N_2) = \sqrt{\frac{1}{2}\sum_{i=1}^{d}\log^2\lambda_i(\Sigma_1^{-1}\Sigma_2)},$
where λ_i(M) denotes the i-th largest eigenvalue of the matrix M; the eigenvalues of Σ₁⁻¹Σ₂ are the generalized eigenvalues solving |Σ₂ - λΣ₁| = 0. The submanifold (𝒩_μ, g^Fisher) is totally geodesic in (𝒩(d), g^Fisher), and ρ_𝒩μ coincides with the Fisher-Rao distance ρ_𝒩 on 𝒩_μ.
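A minimal sketch of the same-mean formula, assuming SciPy: `scipy.linalg.eigh(a, b)` solves a v = λ b v, so passing (Σ₂, Σ₁) returns the eigenvalues of Σ₁⁻¹Σ₂ without forming the inverse. The function name is hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_rao_same_mean(Sigma1, Sigma2):
    """Fisher-Rao distance between N(mu, Sigma1) and N(mu, Sigma2) (same mean)."""
    # Generalized eigenvalues of Sigma2 v = lam Sigma1 v, i.e. eigenvalues of Sigma1^{-1} Sigma2.
    lam = eigh(Sigma2, Sigma1, eigvals_only=True)
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))
```

Since log² λ = log² (1/λ), swapping Σ₁ and Σ₂ leaves the result unchanged.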
When the normal distributions belong to the same submodel 𝒩_Σ = {N(μ,Σ) : μ ∈ ℝ^d} ⊂ 𝒩(d) of normal distributions sharing the same covariance matrix Σ, we have
$\rho_{\mathcal{N}}(N_1,N_2) = \sqrt{2}\,\mathrm{arccosh}\left(1 + \frac{1}{4}\Delta_\Sigma^2(\mu_1,\mu_2)\right),$
where Δ_Σ is the Mahalanobis distance:
$\Delta_\Sigma(\mu_1,\mu_2) := \sqrt{(\mu_2-\mu_1)^\top\Sigma^{-1}(\mu_2-\mu_1)}.$
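A small NumPy sketch of the same-covariance case (function names are hypothetical): the Mahalanobis distance is computed with a linear solve, then plugged into the arccosh formula.

```python
import numpy as np

def mahalanobis(mu1, mu2, Sigma):
    """Mahalanobis distance Delta_Sigma(mu1, mu2)."""
    diff = np.asarray(mu2, dtype=float) - np.asarray(mu1, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(Sigma, diff)))

def fisher_rao_same_cov(mu1, mu2, Sigma):
    """Fisher-Rao distance between N(mu1, Sigma) and N(mu2, Sigma) (same covariance)."""
    delta = mahalanobis(mu1, mu2, Sigma)
    return float(np.sqrt(2.0) * np.arccosh(1.0 + 0.25 * delta ** 2))
```

With Σ = diag(4, 1) and μ₂ - μ₁ = (2, 0), the Mahalanobis distance is 1 and the Fisher-Rao distance is √2 arccosh(5/4) = √2 log 2.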

However, in the general case, the Fisher-Rao distance between normals is not known in closed form.

2 Isometric embedding into the higher-dimensional SPD cone

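Calvo and Oller show how to embed N(μ,Σ) ∈ 𝒩(d) into the submanifold
$\overline{\mathcal{N}}(d) = \{\bar{P} = f(\mu,\Sigma) : (\mu,\Sigma) \in \Lambda(d) = \mathbb{R}^d \times \mathbb{P}(d)\}$
of the SPD cone ℙ(d+1), using the map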

$\bar{P}(N) = f(N) = \begin{bmatrix} \Sigma + \mu\mu^\top & \mu \\ \mu^\top & 1 \end{bmatrix},$

so that the manifold (𝒩(d), g^Fisher) is isometrically embedded into the submanifold ($\overline{\mathcal{N}}(d)$, g^trace) of the SPD cone ℙ(d+1) equipped with the trace metric

$g^{\mathrm{trace}}_P(P_1,P_2) := \frac{1}{2}\mathrm{tr}(P^{-1}P_1P^{-1}P_2).$
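The embedding can be sketched in NumPy as follows (all function names are hypothetical). Besides f and f⁻¹, the sketch includes the differential of f and the trace metric, so that one can check numerically that the trace metric of the image of a tangent vector (dμ, dΣ) equals its squared Fisher norm dμᵀΣ⁻¹dμ + ½tr((Σ⁻¹dΣ)²), which is the isometry claimed above (that Fisher metric formula is an assumption added here, not stated in the text).

```python
import numpy as np

def embed(mu, Sigma):
    """Calvo-Oller embedding f(mu, Sigma) = [[Sigma + mu mu^T, mu], [mu^T, 1]]."""
    mu = np.asarray(mu, dtype=float).reshape(-1)
    Sigma = np.asarray(Sigma, dtype=float)
    d = mu.size
    P = np.empty((d + 1, d + 1))
    P[:d, :d] = Sigma + np.outer(mu, mu)
    P[:d, d] = mu
    P[d, :d] = mu
    P[d, d] = 1.0
    return P

def unembed(P):
    """Inverse map f^{-1}: mu is the last column, Sigma = top-left block - mu mu^T."""
    P = np.asarray(P, dtype=float)
    d = P.shape[0] - 1
    mu = P[:d, d].copy()
    return mu, P[:d, :d] - np.outer(mu, mu)

def d_embed(mu, dmu, dSigma):
    """Differential of f at (mu, Sigma) applied to (dmu, dSigma); it does not depend on Sigma."""
    mu = np.asarray(mu, dtype=float).reshape(-1)
    dmu = np.asarray(dmu, dtype=float).reshape(-1)
    d = mu.size
    V = np.zeros((d + 1, d + 1))
    V[:d, :d] = np.asarray(dSigma, dtype=float) + np.outer(dmu, mu) + np.outer(mu, dmu)
    V[:d, d] = dmu
    V[d, :d] = dmu
    return V

def trace_metric(P, V1, V2):
    """Trace metric g_P(V1, V2) = 1/2 tr(P^{-1} V1 P^{-1} V2) on the SPD cone."""
    return 0.5 * float(np.trace(np.linalg.solve(P, V1) @ np.linalg.solve(P, V2)))
```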

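However, the submanifold $\overline{\mathcal{N}}(d)$ ⊂ ℙ(d+1) is not totally geodesic: the SPD geodesic between two embedded normals generally leaves $\overline{\mathcal{N}}(d)$ and can be shorter than every curve lying inside it. Thus the SPD distance between P̄₁ = f(N₁) and P̄₂ = f(N₂) yields the Calvo-Oller lower bound on the Fisher-Rao distance: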

$\rho_{\mathrm{CO}}(N_1,N_2) = \sqrt{\frac{1}{2}\sum_{i=1}^{d+1}\log^2\lambda_i(\bar{P}_1^{-1}\bar{P}_2)} \le \rho_{\mathcal{N}}(N_1,N_2),$

which is also a metric distance.
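A minimal sketch of the Calvo-Oller lower bound, assuming NumPy and SciPy (function names are hypothetical). The d+1 eigenvalues of P̄₁⁻¹P̄₂ are obtained as generalized eigenvalues of the pair (P̄₂, P̄₁).

```python
import numpy as np
from scipy.linalg import eigh

def embed(mu, Sigma):
    """Calvo-Oller embedding f(mu, Sigma) into P(d+1)."""
    mu = np.asarray(mu, dtype=float).reshape(-1)
    d = mu.size
    P = np.empty((d + 1, d + 1))
    P[:d, :d] = np.asarray(Sigma, dtype=float) + np.outer(mu, mu)
    P[:d, d] = mu
    P[d, :d] = mu
    P[d, d] = 1.0
    return P

def spd_distance(P1, P2):
    """Trace-metric distance sqrt(1/2 sum_i log^2 lambda_i(P1^{-1} P2)) on the SPD cone."""
    lam = eigh(P2, P1, eigvals_only=True)   # generalized eigenvalues: P2 v = lam P1 v
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))

def calvo_oller_lower_bound(mu1, Sigma1, mu2, Sigma2):
    """rho_CO(N1, N2) = SPD distance between the embedded matrices f(N1) and f(N2)."""
    return spd_distance(embed(mu1, Sigma1), embed(mu2, Sigma2))
```

For N(0,1) and N(2,1), the bound equals 2 arcsinh(1) ≈ 1.763, below the exact univariate value √2 arccosh(2) ≈ 1.862; for normals with equal means the bound is tight.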

3 A simple approximation method

Our method consists of projecting the SPD geodesic γ_ℙ(d+1)(P̄₁, P̄₂; t) onto $\overline{\mathcal{N}}(d)$ and then mapping the projected SPD curve back into 𝒩(d) using f⁻¹:

$c_{\mathrm{CO}}(N_1,N_2;t) = f^{-1}\left(\mathrm{proj}_{\overline{\mathcal{N}}(d)}\left(\gamma_{\mathbb{P}(d+1)}(\bar{P}_1,\bar{P}_2;t)\right)\right).$

The SPD geodesic γ_ℙ(d+1)(P̄₁, P̄₂; t) has the closed-form expression

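$\gamma_{\mathbb{P}(d+1)}(\bar{P}_1,\bar{P}_2;t) = \bar{P}_1^{\frac{1}{2}}\left(\bar{P}_1^{-\frac{1}{2}}\bar{P}_2\bar{P}_1^{-\frac{1}{2}}\right)^t\bar{P}_1^{\frac{1}{2}}.$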
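The closed-form geodesic only needs real powers of SPD matrices, which can be taken through an eigendecomposition. A minimal NumPy sketch (function names are hypothetical):

```python
import numpy as np

def spd_power(P, a):
    """P^a for a symmetric positive-definite P, via its eigendecomposition."""
    w, V = np.linalg.eigh(P)
    return (V * w ** a) @ V.T

def spd_geodesic(P1, P2, t):
    """gamma(P1, P2; t) = P1^{1/2} (P1^{-1/2} P2 P1^{-1/2})^t P1^{1/2}."""
    R = spd_power(P1, 0.5)
    R_inv = spd_power(P1, -0.5)
    M = R_inv @ P2 @ R_inv
    M = 0.5 * (M + M.T)          # remove round-off asymmetry before eigh
    return R @ spd_power(M, t) @ R
```

At t = 1/2 this gives the matrix geometric mean of P1 and P2, which is symmetric in its two arguments.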

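Next, we estimate the Fisher-Rao length of the curve c(t) := c_CO(N₁,N₂;t) by discretizing it at the positions t_i = i/T for i = 0, …, T and summing the Fisher-Rao distances between consecutive points. Since c(0) = N₁ and c(1) = N₂, the triangle inequality gives:

$\rho_{\mathcal{N}}(N_1,N_2) \le \tilde{\rho}_{\mathrm{CO}}(N_1,N_2) := \sum_{i=0}^{T-1}\rho_{\mathcal{N}}\left(c\left(\tfrac{i}{T}\right), c\left(\tfrac{i+1}{T}\right)\right),$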

and, for nearby normals, we approximate their Fisher-Rao distances by the square root of their Jeffreys divergence:

$\rho_{\mathcal{N}}\left(c\left(\tfrac{i}{T}\right), c\left(\tfrac{i+1}{T}\right)\right) \approx \sqrt{D_J\left[c\left(\tfrac{i}{T}\right) : c\left(\tfrac{i+1}{T}\right)\right]},$

where

$D_J[p_{(\mu_1,\Sigma_1)} : p_{(\mu_2,\Sigma_2)}] = \mathrm{tr}\left(\frac{\Sigma_2^{-1}\Sigma_1 + \Sigma_1^{-1}\Sigma_2}{2} - I\right) + (\mu_2-\mu_1)^\top\frac{\Sigma_1^{-1}+\Sigma_2^{-1}}{2}(\mu_2-\mu_1).$
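A direct NumPy transcription of D_J (the function name `jeffreys_normal` is hypothetical). The test below compares √D_J with the closed-form univariate Fisher-Rao distance for two nearby normals, N(0,1) and N(0.01, 1.01²).

```python
import numpy as np

def jeffreys_normal(mu1, Sigma1, mu2, Sigma2):
    """Jeffreys divergence D_J[p_(mu1,Sigma1) : p_(mu2,Sigma2)] between two normals."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    Sigma1, Sigma2 = np.asarray(Sigma1, dtype=float), np.asarray(Sigma2, dtype=float)
    d = mu1.size
    A = np.linalg.solve(Sigma2, Sigma1)   # Sigma2^{-1} Sigma1
    B = np.linalg.solve(Sigma1, Sigma2)   # Sigma1^{-1} Sigma2
    diff = mu2 - mu1
    cov_term = np.trace(0.5 * (A + B) - np.eye(d))
    mean_term = 0.5 * diff @ (np.linalg.solve(Sigma1, diff) + np.linalg.solve(Sigma2, diff))
    return float(cov_term + mean_term)
```

D_J is symmetric and vanishes only for identical normals, and √D_J agrees with ρ_𝒩 to first order as the two normals get closer, which is what the discretized sum relies on.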

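The projection of an SPD matrix P ∈ ℙ(d+1) onto $\overline{\mathcal{N}}(d)$ is done as follows. Let β = P_{d+1,d+1} and write
$P = \begin{bmatrix} \Sigma + \beta\mu\mu^\top & \beta\mu \\ \beta\mu^\top & \beta \end{bmatrix}$
(since P is SPD, β > 0 and the Schur complement Σ is SPD). Then the orthogonal projection of P onto $\overline{\mathcal{N}}(d)$ is: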

$\bar{P}_\perp := \mathrm{proj}_{\overline{\mathcal{N}}(d)}(P) = \begin{bmatrix} \Sigma + \mu\mu^\top & \mu \\ \mu^\top & 1 \end{bmatrix},$

and the SPD distance between P and P̄_⊥ is

$\rho_{\mathbb{P}}(P, \bar{P}_\perp) = \frac{1}{\sqrt{2}}\,|\log\beta|.$
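A minimal sketch of the projection (NumPy/SciPy; function names are hypothetical): read β, μ and Σ off P, rebuild the matrix with bottom-right entry 1, and check the distance formula |log β|/√2 numerically.

```python
import numpy as np
from scipy.linalg import eigh

def project_onto_normals(P):
    """Orthogonal projection of P in P(d+1) onto the embedded normals.

    Returns the projected matrix P_perp and beta = P[d, d].
    """
    P = np.asarray(P, dtype=float)
    d = P.shape[0] - 1
    beta = P[d, d]
    mu = P[:d, d] / beta                          # last column is beta * mu
    Sigma = P[:d, :d] - beta * np.outer(mu, mu)   # top-left block is Sigma + beta mu mu^T
    P_perp = np.empty_like(P)
    P_perp[:d, :d] = Sigma + np.outer(mu, mu)
    P_perp[:d, d] = mu
    P_perp[d, :d] = mu
    P_perp[d, d] = 1.0
    return P_perp, beta

def spd_distance(P1, P2):
    """Trace-metric distance sqrt(1/2 sum_i log^2 lambda_i(P1^{-1} P2))."""
    lam = eigh(P2, P1, eigvals_only=True)
    return float(np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))
```

The distance formula follows because P̄_⊥⁻¹P has eigenvalue 1 with multiplicity d and eigenvalue β once; projecting an already embedded matrix leaves it unchanged.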

Figure: examples of the curves c_CO (green) compared to the Fisher-Rao geodesics (black).
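Putting the steps together, here is a compact NumPy sketch of c_CO and ρ̃_CO. It is an illustration under stated assumptions, not the reference implementation: function names are hypothetical, the curve is sampled uniformly at t_i = i/T, and T = 100 is an arbitrary default. A convenient sanity check: when μ₁ = μ₂, the SPD geodesic already stays inside $\overline{\mathcal{N}}(d)$ (by affine invariance of SPD geodesics), so ρ̃_CO should reproduce the closed-form same-mean distance up to discretization error.

```python
import numpy as np

def spd_power(P, a):
    """P^a for a symmetric positive-definite matrix P."""
    w, V = np.linalg.eigh(P)
    return (V * w ** a) @ V.T

def embed(mu, Sigma):
    """Calvo-Oller embedding f(mu, Sigma) into P(d+1)."""
    mu = np.asarray(mu, dtype=float).reshape(-1)
    d = mu.size
    P = np.empty((d + 1, d + 1))
    P[:d, :d] = np.asarray(Sigma, dtype=float) + np.outer(mu, mu)
    P[:d, d] = mu
    P[d, :d] = mu
    P[d, d] = 1.0
    return P

def project_and_unembed(P):
    """f^{-1}(proj(P)) for P = [[Sigma + beta mu mu^T, beta mu], [beta mu^T, beta]]."""
    d = P.shape[0] - 1
    beta = P[d, d]
    mu = P[:d, d] / beta
    Sigma = P[:d, :d] - beta * np.outer(mu, mu)
    return mu, 0.5 * (Sigma + Sigma.T)

def jeffreys_normal(mu1, Sigma1, mu2, Sigma2):
    """Jeffreys divergence between N(mu1, Sigma1) and N(mu2, Sigma2)."""
    d = mu1.size
    A = np.linalg.solve(Sigma2, Sigma1)
    B = np.linalg.solve(Sigma1, Sigma2)
    diff = mu2 - mu1
    mean_term = 0.5 * diff @ (np.linalg.solve(Sigma1, diff) + np.linalg.solve(Sigma2, diff))
    return float(np.trace(0.5 * (A + B) - np.eye(d)) + mean_term)

def cco_curve(mu1, Sigma1, mu2, Sigma2, ts):
    """Points c_CO(N1, N2; t) = f^{-1}(proj(gamma(P1, P2; t))) for t in ts."""
    P1, P2 = embed(mu1, Sigma1), embed(mu2, Sigma2)
    R, R_inv = spd_power(P1, 0.5), spd_power(P1, -0.5)
    M = R_inv @ P2 @ R_inv
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    return [project_and_unembed(R @ ((V * w ** t) @ V.T) @ R) for t in ts]

def approx_fisher_rao(mu1, Sigma1, mu2, Sigma2, T=100):
    """rho~_CO: sum of sqrt(Jeffreys) over the T segments of c_CO sampled at t_i = i / T."""
    pts = cco_curve(mu1, Sigma1, mu2, Sigma2, np.linspace(0.0, 1.0, T + 1))
    return float(sum(np.sqrt(max(jeffreys_normal(*pts[i], *pts[i + 1]), 0.0)) for i in range(T)))
```

Precomputing the eigendecomposition of P̄₁^{-1/2}P̄₂P̄₁^{-1/2} once makes each curve point cost only a few matrix products.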

More details and quantitative analysis: https://www.mdpi.com/1099-4300/25/4/654