Information geometry and divergences: Classical and Quantum
Foundations, Applications, and Software APIs
Historically, Information Geometry (IG; see the tutorials,
textbooks and monographs below) aims at unravelling the geometric structures
of families of probability distributions, called statistical models.
A statistical model can either be
 parametric (e.g., the family of normal distributions),
 semiparametric
(e.g., the family of Gaussian mixture models), or
 nonparametric (e.g., the family of mutually absolutely continuous smooth densities).
A parametric statistical model is said to be regular when its Fisher information matrix is well-defined and positive-definite.
Otherwise, the statistical model is irregular (e.g., infinite Fisher information, or a positive semi-definite Fisher information matrix when the model is not identifiable).
The Fisher-Rao manifold of a parametric statistical model
is a Riemannian manifold equipped
with the Fisher information metric.
The geodesic length on a Fisher-Rao manifold is called Rao's distance [Hotelling 1930]
[Rao 1945].
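As a concrete illustration, Rao's distance between univariate normal distributions admits a well-known closed form obtained from the hyperbolic (Poincaré upper half-plane) structure of the normal Fisher-Rao manifold. The sketch below (function name is illustrative, not a library API) implements that formula:

```python
import math

def rao_distance_normal(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    The map (mu, sigma) -> (mu / sqrt(2), sigma) is an isometric embedding
    into the hyperbolic upper half-plane up to a global scale sqrt(2),
    which yields this closed form.
    """
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * sigma1 * sigma2))
```

For instance, the distance between N(0, 1) and N(1, 1) evaluates to sqrt(2) * log(2).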
More generally, Amari proposed the dualistic structure of IG which consists
of a pair of torsion-free affine connections coupled to the Fisher metric [Amari 1980s].
Given a dualistic structure, we can generically build a one-parameter family of
dualistic information-geometric structures,
called the α-geometry.
When both connections are flat, the information-geometric space is said to be dually flat:
for example, Amari's ±1-structures of exponential families and mixture families
are famous examples of dually flat spaces in information geometry.
In differential geometry, geodesics are defined as autoparallel curves with respect to a connection.
When using the default Levi-Civita metric connection derived from the Fisher metric on Fisher-Rao manifolds,
the geodesics are locally length-minimizing curves whose lengths yield Rao's distance.
Eguchi showed how to build a dualistic structure from any smooth distortion (originally called a contrast function):
the information geometry of divergences [Eguchi 1982].
The information geometry of Bregman divergences yields dually flat spaces:
it is a special case of Hessian manifolds, which are differentiable manifolds
equipped with a Hessian metric tensor and a flat connection [Shima 2007].
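A minimal sketch of the Bregman divergence construction (the generator names `half_sq` and `neg_ent` below are illustrative choices, not a library API): a strictly convex generator F induces the divergence B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>.

```python
import numpy as np

def bregman_divergence(F, gradF, x, y):
    """Bregman divergence B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    induced by a strictly convex and differentiable generator F."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(F(x) - F(y) - np.dot(x - y, gradF(y)))

# Generator F(x) = (1/2)||x||^2 yields half the squared Euclidean distance.
half_sq = lambda v: 0.5 * np.dot(v, v)
half_sq_grad = lambda v: v

# Generator F(x) = sum(x log x - x) yields the extended Kullback-Leibler divergence.
neg_ent = lambda v: np.sum(v * np.log(v) - v)
neg_ent_grad = lambda v: np.log(v)
```

Choosing the generator thus interpolates between familiar distortions: the quadratic generator recovers the (self-dual) Euclidean geometry, while the negative-entropy generator recovers the KL divergence underlying the dually flat geometry of exponential families.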
Since geometric structures scaffold spaces independently of any application,
these pure information-geometric Fisher-Rao structures and α-structures of statistical models
can also be used in non-statistical contexts:
for example, for analyzing interior point methods with barrier functions in optimization, or for studying time-series models, etc.
Statistical divergences between parametric statistical models amount to
parameter divergences, on which we can use Eguchi's divergence information geometry
to get a dualistic structure.
A projective divergence is a divergence which is invariant under independent rescaling of its arguments.
A statistical projective divergence is thus useful for estimating computationally
intractable statistical models (e.g., gamma divergences, the Cauchy-Schwarz divergence and Hölder divergences, or the single-sided projective Hyvärinen divergence).
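A small sketch of the projective (scale-invariance) property, using the Cauchy-Schwarz divergence on discrete measures (function name illustrative):

```python
import numpy as np

def cauchy_schwarz_divergence(p, q):
    """Cauchy-Schwarz divergence between two nonnegative (possibly
    unnormalized) discrete measures:
    D_CS(p, q) = -log( <p, q> / (||p||_2 ||q||_2) ) >= 0.
    It is projective: rescaling p or q independently leaves it unchanged,
    so normalization constants cancel out.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(-np.log(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))))
```

Because the rescaling factors cancel in the ratio, such a divergence can be evaluated on unnormalized densities, which is precisely what makes it attractive for models with intractable normalizers.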
A conformal divergence is a divergence scaled by a conformal factor which may depend on one or both of its arguments.
The metric tensor obtained from Eguchi's divergence information geometry of a conformal divergence
is a conformal metric of the metric obtained from the underlying divergence, hence its name.
By analogy to total least squares vs. least squares, a
total divergence is a divergence which is invariant with respect to rotations (e.g., total Bregman divergences).
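A sketch of this rotation invariance for the total Bregman divergence of the quadratic generator (following the standard total Bregman construction, which divides by the conformal factor sqrt(1 + ||grad F(y)||^2); function name illustrative):

```python
import numpy as np

def total_squared_divergence(x, y):
    """Total Bregman divergence for the generator F(y) = (1/2)||y||^2:
    the squared-Euclidean Bregman divergence divided by the conformal
    factor sqrt(1 + ||grad F(y)||^2) = sqrt(1 + ||y||^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(0.5 * np.dot(x - y, x - y) / np.sqrt(1.0 + np.dot(y, y)))
```

Since a rotation preserves both ||x - y|| and ||y||, applying the same rotation to both arguments leaves the value unchanged.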
An important property of divergences on the probability simplex is monotonicity under coarse-graining.
That is, merging bins and considering the reduced histograms should yield a distance less than or equal to the distance between the full-resolution histograms.
This information monotonicity property holds for f-divergences (called invariant divergences in information geometry),
the Hilbert log cross-ratio distance, or the Aitchison distance, for example.
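The coarse-graining monotonicity can be checked numerically for the Kullback-Leibler divergence (an f-divergence); the helper names below are illustrative:

```python
import numpy as np

def kl_divergence(p, q):
    """Kullback-Leibler divergence between discrete distributions (in nats)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def coarse_grain(p, groups):
    """Merge histogram bins: groups is a partition of the bin indices."""
    return np.array([sum(p[i] for i in g) for g in groups])

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.25, 0.25, 0.25, 0.25])
groups = [[0, 1], [2, 3]]
# Information monotonicity: KL on the merged bins <= KL on the full histograms.
```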
Some statistical divergences are upper bounded
(e.g., the Jensen-Shannon divergence) while others are not (e.g., Jeffreys' divergence).
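A quick numerical contrast between the bounded Jensen-Shannon divergence and the unbounded Jeffreys divergence (function names illustrative):

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence with the convention 0 * log(0/q) = 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (in nats): always upper bounded by log 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Jeffreys divergence (symmetrized KL): unbounded."""
    return kl(p, q) + kl(q, p)
```

On two nearly disjoint distributions such as (1 - eps, eps) and (eps, 1 - eps), the Jensen-Shannon divergence saturates near log 2 while the Jeffreys divergence grows without bound as eps shrinks.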
Optimal transport distances require a ground distance on the sample space.
A diversity index generalizes a two-point distance to a family of parameters/distributions.
It usually measures the dispersion around a center point (e.g., as the variance measures
the dispersion around the centroid).
Browsing geometric structures:
[tutorials]
[Software/API]
[Fisher-Rao manifolds]
[Cones]
[Finsler manifolds]
[Hessian manifolds]
[Exponential families and mixture families]
[Categorical distributions/probability simplex]
[Time series]
[Hilbert geometry]
[Hyperbolic geometry and Siegel spaces]
[Applications]
[Natural gradient]
[centroids and clustering]
[Miscellaneous applications]
Browsing [Dissimilarities]:
[Jensen-Shannon divergence]
[f-divergences]
[Bregman divergences]
[Jensen divergences]
[conformal divergences]
[projective divergences]
[optimal transport]
[entropies]
[Chernoff information]
Information geometry: Tutorials and surveys
 The Many Faces of Information Geometry, AMS Notices (9 pages), 2022.
A gentle short introduction to information geometry
 An Elementary Introduction to Information Geometry, Entropy (61 pages), 2020.
A selfcontained introduction to classic parametric information geometry with applications and basics of differential geometry
 What is an information projection?, AMS Notices, (65) 3 (4 pages), 2018.
Information projections are the workhorses of algorithms using the framework of information geometry.
A projection is defined according to geodesics (wrt a connection) and orthogonality (wrt a metric tensor).
In dually flat spaces, information projections can be interpreted as minimum Bregman divergences (Bregman projections).
Unicity theorems for exponential families and mixture families.
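As a small illustration of a Bregman projection (a sketch under the extended Kullback-Leibler generator, not taken from the referenced paper): projecting a positive vector onto the probability simplex in this geometry amounts to normalizing it, which we can check numerically.

```python
import numpy as np

def extended_kl(y, x):
    """Bregman divergence of the generator F(y) = sum(y log y - y),
    i.e. the extended Kullback-Leibler divergence on positive vectors."""
    y, x = np.asarray(y, dtype=float), np.asarray(x, dtype=float)
    return float(np.sum(y * np.log(y / x) - y + x))

x = np.array([0.2, 1.5, 0.8])   # a positive vector off the probability simplex
proj = x / x.sum()              # candidate Bregman projection onto the simplex

# Sanity check: the normalized vector attains a smaller divergence to x
# than randomly drawn points of the simplex.
rng = np.random.default_rng(0)
for _ in range(200):
    z = rng.dirichlet(np.ones(x.size))
    assert extended_kl(proj, x) <= extended_kl(z, x) + 1e-12
```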
 On Geodesic Triangles with Right Angles in a Dually Flat Space, Chapter in edited book "Progress in Information Geometry", Springer, 2021.
A self-contained introduction to dually flat spaces, which we call Bregman manifolds. The generalized Pythagorean theorem is derived from the 3-parameter Bregman identity.
The 4-parameter Bregman identity is also explained.
 Cramér-Rao Lower Bound and Information Geometry, Connected at Infinity II: On the work of Indian mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and Readings In Mathematics (TRIM), Hindustan Book Agency, 2013
A description of the path-breaking 1945 paper of Calyampudi Radhakrishna Rao, "Information and the accuracy attainable in the estimation of statistical parameters".

Statistical exponential families: A digest with flash cards, 2009

Pattern Learning and Recognition on Statistical Manifolds: An Information-Geometric Review, SIMBAD 2013
 Legendre transformation and information geometry, memo, 2010
Fisher-Rao manifolds (Riemannian manifolds and Hadamard manifolds)
Cones
Finsler manifolds
Finsler manifolds have been proposed to model irregular parametric statistical models (where the Fisher information can be infinite)
Bregman manifolds/Hessian manifolds
Exponential families and Mixture families
Continuous or discrete exponential families
 Online k-MLE for Mixture Modeling with Exponential Families, GSI 2015
k-MLE: A fast algorithm for learning statistical mixture models, IEEE ICASSP 2012
Fast Learning of Gamma Mixture Models with k-MLE, SIMBAD 2013: 235-249
k-MLE for mixtures of generalized Gaussians, ICPR 2012: 2825-2828
Simplification and hierarchical representations of mixtures of exponential families,
Signal Process. 90(12): 3197-3212 (2010)
The analytic dually flat space of the statistical mixture family of two prescribed distinct Cauchy components
On the Geometry of Mixtures of Prescribed Distributions, IEEE ICASSP 2018
Information geometry of deformed exponential families
q-deformed exponential families, q-Gaussians, etc.
Information geometry of the probability simplex
Clustering in Hilbert simplex geometry
[project page]
Geometry of the probability simplex and
its connection to the maximum entropy method,
Journal of Applied Mathematics, Statistics and Informatics 16(1):25-35, 2020
Bruhat-Tits space
open access (publisher)
Information geometry of singular statistical models
A Geometric Modeling of Occam's Razor in Deep Learning
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds, NeurIPS OPT workshop 2021
Geometry of time series and correlations/dependences
Hilbert geometry
Hilbert geometries are induced by bounded open convex domains.
Hilbert geometry generalizes the Klein model of hyperbolic geometry and Cayley-Klein geometries.
Beware that Hilbert geometries are in general not Hilbert spaces!
Hyperbolic geometry and geometry of Siegel domains
Some applications of information geometry
Natural gradient
Centers and clustering
Miscellaneous applications
Dissimilarities, distances, divergences and diversities
Jensen-Shannon divergence
Probability simplex
f-divergences
Bregman divergences and some generalizations

Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman
and Duo Jensen Divergences, Entropy 2022, 24(3)
Duo Fenchel-Young divergence, arXiv:2202.10726.
We define duo Bregman divergences, duo Fenchel-Young divergences, and duo Jensen divergences.
We show how those divergences occur naturally when calculating the Kullback-Leibler divergence and
skew Bhattacharyya distances between densities belonging to nested exponential families.
We report the KLD between truncated normal distributions as a duo Bregman divergence.
Demo code:
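A hedged numerical sketch of the nested-support setting (a midpoint-quadrature evaluation of the KL divergence between a truncated and an untruncated normal; this is not the paper's closed-form duo Bregman formula, and all function names are illustrative):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kl_truncated_vs_full(mu1, sigma1, a, b, mu2, sigma2, n=20000):
    """KL divergence between N(mu1, sigma1^2) truncated to [a, b] and the
    untruncated N(mu2, sigma2^2), evaluated by midpoint quadrature.
    The supports are nested ([a, b] inside R): the duo Bregman setting."""
    Z = normal_cdf(b, mu1, sigma1) - normal_cdf(a, mu1, sigma1)
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        p = normal_pdf(x, mu1, sigma1) / Z   # truncated-normal density
        q = normal_pdf(x, mu2, sigma2)       # full-support density
        total += p * math.log(p / q) * h
    return total
```

As a sanity check, truncating N(0, 1) to [-1, 1] and comparing against the same untruncated N(0, 1) gives KL = -log Z, where Z is the truncation mass.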
Jensen divergences and some generalizations
Conformal divergences
Projective divergences
Optimal transport/Wasserstein distances/Sinkhorn distances
Earth mover distances (EMD), Wasserstein distances

Sinkhorn AutoEncoders, UAI 2019

On The Chain Rule Optimal Transport Distance, Progress in Information Geometry, 2021

Tsallis Regularized Optimal Transport and Ecological Inference, AAAI 2017

Clustering Patterns Connecting COVID19 Dynamics and Human Mobility Using Optimal Transport,
March 2021, Sankhya B 83(S1)

Optimal copula transport for clustering multivariate time series, IEEE ICASSP 2016

Exploring and measuring nonlinear correlations: Copulas, Lightspeed Transportation and Clustering,
Time Series Workshop at NeurIPS 2016

Earth Mover Distance on superpixels, IEEE ICIP 2010
Shannon, Rényi, Tsallis, and Sharma-Mittal entropies, cross-entropies and divergences
Chernoff information
Other dissimilarities
Loss functions and proper scoring rules
Divergences between statistical mixtures

Fast Approximations of the Jeffreys Divergence between Univariate Gaussian Mixtures
via Mixture Conversions to Exponential-Polynomial Distributions,
Entropy 2021, 23(11), 1417

The Statistical Minkowski Distances: Closed-Form Formula for Gaussian Mixture Models, GSI 2019

Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures, IEEE MLSP 2018

Comix: Joint estimation and lightspeed comparison of mixture models, IEEE ICASSP 2016

Bag-of-Components: An Online Algorithm for Batch Learning of Mixture Models, GSI 2015

Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities, Entropy 2016, 18(12), 442

Combinatorial bounds on the α-divergence of univariate mixture models, ICASSP 2017

Closed-form information-theoretic divergences for statistical mixtures, ICPR 2012
Edited books and proceedings
Monographs and textbooks on information geometry
 Geometry and Statistics (Handbook of Statistics, Volume 46),
Frank Nielsen, Arni Srinivasa Rao, C.R. Rao (2022)

https://link.springer.com/book/10.1007/9789813369917
A Tribute to the Legend of Professor C. R. Rao:
The Centenary Volume. Editors: Arijit Chaudhuri, Sat N. Gupta, Rajkumar Roychoudhury

Methodology and Applications of Statistics
A Volume in Honor of C.R. Rao on the Occasion of his 100th Birthday.
Editors: Barry C. Arnold, Narayanaswamy Balakrishnan, Carlos A. Coelho

Entropy, Divergence, and Majorization in Classical and Quantum Thermodynamics,
Takahiro Sagawa (2022)
 Minimum Divergence Methods in Statistical
Machine Learning: From an Information Geometric Viewpoint,
Shinto Eguchi, Osamu Komori (2022)
 Information geometry, Arni S.R. Srinivasa Rao, C.R. Rao, Angelo Plastino (2020)
 Information geometry, Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer (2017)
Advanced and rigorous foundations of information geometry
 Information geometry and its applications, Shun-ichi Amari (2016)
A first reading, well-balanced between concepts and applications, by the founder of the field
 Riemannian Geometry and Statistical Machine Learning, Guy Lebanon, (2015)
 Geometric Modeling in Probability and Statistics, Ovidiu Calin, Constantin Udrişte (2014)

Finance at Fields, Matheus R. Grasselli and Lane Palmer Hughston (Eds.), World Scientific, 2013, https://doi.org/10.1142/8507. Mentions information geometry (p. 248)

Mathematical foundations of infinitedimensional statistical models,
Richard Nickl and Evarist Giné,
Cambridge University Press (2016).
book web page (including book PDF)
A nice intermediate textbook which also provides proofs using the calculus of connections of differential geometry
 Methods of Information Geometry, Shun-ichi Amari, Hiroshi Nagaoka (2000)
Advanced book with an emphasis on statistical inference,
English translation of the Japanese textbook of 1993
 Differential Geometry and Statistics, Michael K. Murray, John W. Rice (1993)
A classic textbook
 Geometrical Foundations of Asymptotic Inference,
Robert E. Kass, Paul W. Vos (1997)
A classic textbook
 Statistical decision rules and optimal inference,
N. N. Chentsov (1982)
The first textbook on information geometry, originally published in Russian in 1972, and later translated into English by the AMS
 Information geometry: Near randomness and near independence,
Khadiga A. Arwini and Christopher T. J. Dodson (2008)
Books in Japanese
Hirohiko Shima (志摩裕彦), Hessian Geometry (ヘッセ幾何学), Shokabo (裳華房), 2001
入門 情報幾何: 統計的モデルをひもとく微分幾何学
Introduction to Information Geometry: Differential Geometry of Statistical Models
Atsushi Fujioka (藤岡 敦)
Kyoritsu publisher, 2021
情報幾何学の基礎: 情報の内的構造を捉える新たな地平
Fundamentals of Information Geometry: A New Horizon for Capturing the Intrinsic Structure of Information
Akio Fujiwara (藤原 彰夫)
Kyoritsu publisher, 2021
情報幾何学の基礎 (数理情報科学シリーズ)
Fundamentals of Information Geometry (Mathematical Information Science Series)
Akio Fujiwara (藤原 彰夫)
牧野書店 Makino bookstore.
情報幾何学の新展開 (SGCライブラリ)
New Developments in Information Geometry (SGC Library)
Shun-ichi Amari (甘利 俊一)
Saiensu publisher, 2019.
A book compiling the columns which appeared in the magazine
数理科学 (Mathematical Sciences, published by Saiensu)

情報幾何の方法 (Methods of Information Geometry), Shun-ichi Amari (甘利 俊一) and Hiroshi Nagaoka (長岡 浩司)
Iwanami press (1993, reprinted 2017)

情報理論, Shun-ichi Amari (甘利 俊一)
Information Theory
Chikuma Shobō (筑摩書房), 2011
Quantum information theory (QIT)/quantum information geometry (QIG)
Quantum Computation and Quantum Information
(10th Anniversary Edition) by Michael A. Nielsen, Isaac L. Chuang, Cambridge University Press, 2010
Quantum Information Theory: Mathematical Foundation
by Masahito Hayashi, Springer, 2017

Quantum Riemannian Geometry
by Edwin J. Beggs, Shahn Majid, Springer, 2020

Geometry of Quantum States: An Introduction to Quantum Entanglement
by Ingemar Bengtsson and Karol Życzkowski, Cambridge University Press, 2009
Second edition

Quantum Information Theory and Quantum Statistics
by Dénes Petz, Springer, 2008
Other related books
Software/APIs
Category theory and information geometry
Since the seminal work of Chentsov who introduced the category of Markov kernels,
category theory plays an essential role in the very foundations of information geometry.
Below are some papers and links to explore this topic.
Home page of Evan Patterson
Home page of Geometric Science of Information
April 2022.