Historically, Information Geometry (IG) aimed at unravelling the geometric structures of families of probability distributions, called statistical models ([tutorials] [textbooks and monographs] [how to get started]).
A statistical model can either be parametric or non-parametric.
The Fisher-Rao manifold of a parametric statistical model is a Riemannian manifold equipped with the Fisher information metric. The geodesic length on a Fisher-Rao manifold is called Rao's distance [Hotelling 1930] [Rao 1945]. More generally, Amari proposed the dualistic structure of IG, which consists of a pair of torsion-free affine connections coupled to the Fisher metric [Amari 1980's]. Given a dualistic structure, we can generically build a one-parameter family of dualistic information-geometric structures, called the α-geometry. When both connections are flat, the information-geometric space is said to be dually flat: for example, Amari's ±1-structures of exponential families and mixture families are famous examples of dually flat spaces in information geometry. In differential geometry, geodesics are defined as autoparallel curves with respect to a connection. When using the default Levi-Civita metric connection derived from the Fisher metric on Fisher-Rao manifolds, geodesics are locally length-minimizing curves, and their lengths yield Rao's distance. Eguchi showed how to build a dualistic structure from any smooth distortion (originally called a contrast function): the information geometry of divergences [Eguchi 1982]. The information geometry of Bregman divergences yields dually flat spaces: it is a special case of Hessian manifolds, which are differentiable manifolds equipped with a Hessian metric tensor and a flat connection [Shima 2007]. Since geometric structures scaffold spaces independently of any application, these pure information-geometric Fisher-Rao structures and α-structures of statistical models can also be used in non-statistical contexts: for example, for analyzing interior-point methods with barrier functions in optimization, or for studying time-series models.
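As a concrete illustration of Rao's distance, here is a minimal Python sketch (independent of the references above) for the univariate normal family, whose Fisher-Rao geometry is hyperbolic and admits the well-known closed form used below:

import math

def rao_distance_normal(mu1, sigma1, mu2, sigma2):
    # Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2):
    # map to the Poincare upper half-plane via x = mu/sqrt(2), y = sigma,
    # then rescale the hyperbolic distance by sqrt(2).
    num = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (2.0 * sigma1 * sigma2))

print(rao_distance_normal(0.0, 1.0, 1.0, 1.0))  # translation in the mean
print(rao_distance_normal(0.0, 1.0, 0.0, 2.0))  # sqrt(2)*log(2): pure scaling

For a common mean, the formula reduces to sqrt(2)*|log(sigma2/sigma1)|, the distance along the scale axis.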
Statistical divergences between parametric statistical models amount to parameter divergences on which we can use Eguchi's divergence information geometry to get a dualistic structure. A projective divergence is a divergence which is invariant under independent rescaling of its parameters. A statistical projective divergence is thus useful for estimating computationally intractable statistical models (e.g., gamma divergences, the Cauchy-Schwarz divergence and Hölder divergences, or the one-sided projective Hyvärinen divergence). A conformal divergence is a divergence scaled by a conformal factor which may depend on one or both of its arguments. The metric tensor obtained from Eguchi's divergence information geometry of a conformal divergence is a conformal metric of the metric obtained from the base divergence, hence its name. By analogy to total least squares versus least squares, a total divergence is a divergence which is invariant with respect to rotations (e.g., total Bregman divergences). An important property of divergences on the probability simplex is to be monotone under coarse-graining: merging bins and considering the reduced histograms should give a distance less than or equal to the distance on the full-resolution histograms. This information monotonicity property holds, for example, for f-divergences (called invariant divergences in information geometry), the Hilbert log cross-ratio distance, and the Aitchison distance. Some statistical divergences are upper bounded (e.g., the Jensen-Shannon divergence) while others are not (e.g., Jeffreys' divergence). Optimal transport distances require a ground base distance on the sample space. A diversity index generalizes a two-point distance to a family of parameters/distributions: it usually measures the dispersion around a center point (e.g., as the variance measures the dispersion around the centroid).
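As a small numerical illustration of information monotonicity, the following self-contained Python snippet checks that coarse-graining (merging histogram bins) can only decrease the Kullback-Leibler divergence, a prototypical f-divergence:

import math

def kl(p, q):
    # Kullback-Leibler divergence between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def coarse_grain(p, groups):
    # Merge bins according to a partition given as a list of index tuples.
    return [sum(p[i] for i in g) for g in groups]

p = [0.1, 0.4, 0.2, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
groups = [(0, 1), (2, 3)]  # merge bins pairwise

print(kl(p, q))                                              # full-resolution divergence
print(kl(coarse_grain(p, groups), coarse_grain(q, groups)))  # never larger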
A selected list of topics:
Browsing geometric structures: [tutorials] [Software/API] [Fisher-Rao manifolds] [Cones] [Finsler manifolds] [Hessian manifolds] [Exponential families and mixture families] [Categorical distributions/probability simplex] [Time series] [Hilbert geometry] [Hyperbolic geometry and Siegel spaces] [Applications] [Natural gradient] [centroids and clustering] [Miscellaneous applications]
Browsing dissimilarities: [Jensen-Shannon divergence] [f-divergences] [Bregman divergences] [Jensen divergences] [conformal divergences] [projective divergences] [optimal transport] [entropies] [Chernoff information]
The Many Faces of Information Geometry, AMS Notices (9 pages), 2022.
A gentle short introduction to information geometry
A self-contained introduction to classic parametric information geometry with applications and basics of differential geometry

Information projections are the workhorses of algorithms using the framework of information geometry. A projection is defined according to geodesics (with respect to a connection) and orthogonality (with respect to a metric tensor). In dually flat spaces, information projections can be interpreted as minimum Bregman divergences (Bregman projections). Uniqueness theorems hold for projections onto exponential families and mixture families.
A self-contained introduction to dually flat spaces, which we call Bregman manifolds. The generalized Pythagorean theorem is derived from the 3-parameter Bregman identity; the 4-parameter Bregman identity is also explained.
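As a quick illustration (a sketch, not the text's derivation), the snippet below verifies the 3-parameter Bregman identity B_F(p,q) + B_F(q,r) - B_F(p,r) = <grad F(r) - grad F(q), p - q> numerically for the Shannon negentropy generator, whose Bregman divergence is the Kullback-Leibler divergence; the generalized Pythagorean equality corresponds to the vanishing of the right-hand side:

import math

def F(x):
    # Shannon negentropy generator F(x) = sum_i x_i log x_i.
    return sum(xi * math.log(xi) for xi in x)

def gradF(x):
    return [math.log(xi) + 1.0 for xi in x]

def bregman(p, q):
    return F(p) - F(q) - sum(gq * (pi - qi) for gq, pi, qi in zip(gradF(q), p, q))

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
r = [0.5, 0.25, 0.25]

lhs = bregman(p, q) + bregman(q, r) - bregman(p, r)
rhs = sum((gr - gq) * (pi - qi)
          for gr, gq, pi, qi in zip(gradF(r), gradF(q), p, q))
print(lhs, rhs)  # equal up to floating-point error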
A description of the pathbreaking 1945 paper of Calyampudi Radhakrishna Rao: "Information and the accuracy attainable in the estimation of statistical parameters".
The Fisher-Rao manifold of the categorical distributions can be isometrically embedded onto the positive orthant of the sphere of radius 2 in Euclidean space. Rao's distance thus corresponds to the length of a great-circle arc, and relaxing the embedded distributions to positive measures, we get twice the Hellinger distance. (explanation)
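A short Python sketch of this spherical embedding, using the standard closed forms (note that some references include an extra 1/sqrt(2) factor in the definition of the Hellinger distance):

import math

def rao_distance_categorical(p, q):
    # Map p to 2*sqrt(p) on the radius-2 sphere; Rao's distance is the
    # great-circle arc length 2*arccos(sum_i sqrt(p_i q_i)).
    bhattacharyya = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, bhattacharyya))  # clamp for round-off

def hellinger(p, q):
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
print(rao_distance_categorical(p, q))  # geodesic (arc) distance on the sphere
print(2.0 * hellinger(p, q))           # chordal distance: twice the Hellinger distance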
Dynamic geometry with relative Fisher information metric
A Geometric Modeling of Occam's Razor in Deep Learning
Degenerate metrics and lightlike Fisher-Rao manifolds
Approximating the smallest enclosing Riemannian ball by a simple iterative algorithm which proves the existence of core-sets. Applications to Hadamard manifolds (hyperbolic geometry and the Riemannian manifold of symmetric positive-definite matrices equipped with the trace metric).
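Below is a minimal Euclidean sketch in the style of the Badoiu-Clarkson core-set algorithm; on a Riemannian manifold, the straight-line update would be replaced by a geodesic step towards the farthest point (the exact update rule of the referenced work may differ):

import math

def approx_smallest_enclosing_ball(points, iterations=1000):
    center = list(points[0])
    for t in range(1, iterations + 1):
        # Find the point farthest from the current center.
        far = max(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, center)))
        # Walk towards it with step size 1/(t+1); the visited farthest
        # points form a core-set certifying the approximation quality.
        step = 1.0 / (t + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    radius = max(math.dist(center, p) for p in points)
    return center, radius

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5), (0.5, -1.0)]
print(approx_smallest_enclosing_ball(pts))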
Bregman Voronoi diagrams (VDs in dually flat spaces) are affine diagrams which can be built from equivalent power diagrams (Laguerre geometry). They generalize the paraboloid lifting transform of Euclidean geometry to the potential functions induced by the convex Bregman generators.
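For illustration, here is a tiny Python sketch (assuming the Kullback-Leibler divergence, i.e., the Bregman divergence of the Shannon negentropy generator) that locates the first-type Bregman Voronoi cell containing a query point by nearest-site search; the cells of this first-argument diagram are affine:

import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

sites = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
query = [0.3, 0.5, 0.2]
# First-type diagram: divergence from the query (first argument) to each site.
cell = min(range(len(sites)), key=lambda i: kl(query, sites[i]))
print(cell)  # index of the site whose Voronoi cell contains the query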
The analytic dually flat space of the statistical mixture family of two prescribed distinct Cauchy components
q-deformed exponential families, q-Gaussians, etc.
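A small Python sketch of the standard Tsallis q-exponential and q-logarithm underlying these deformed families; both recover exp and log in the limit q -> 1:

from math import exp, log

def exp_q(x, q):
    # Tsallis q-exponential: [1 + (1-q)x]_+^(1/(1-q)).
    if q == 1.0:
        return exp(x)
    base = 1.0 + (1.0 - q) * x
    return max(base, 0.0) ** (1.0 / (1.0 - q))

def log_q(x, q):
    # Tsallis q-logarithm, inverse of exp_q on its support.
    if q == 1.0:
        return log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

print(exp_q(1.0, 0.999))            # close to e for q near 1
print(log_q(exp_q(0.5, 1.5), 1.5))  # inverse pair: returns 0.5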
Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds, NeurIPS OPT workshop 2021
Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman
and Duo Jensen Divergences, Entropy 2022, 24(3)
We define duo Bregman divergences, duo Fenchel-Young divergences, and duo Jensen divergences.
We show how those divergences occur naturally when calculating the Kullback-Leibler divergence and
skew Bhattacharyya distances between densities belonging to nested exponential families.
We report the KLD between truncated normal distributions as a duo Bregman divergence.
Demo code:
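The original page links to demo code for this paper; as an independent minimal sketch (not the paper's code), the snippet below evaluates the KLD between two truncated normal densities with nested supports by numerical quadrature, to be compared against the closed-form duo Bregman expression reported in the paper:

import numpy as np
from scipy.stats import truncnorm
from scipy.integrate import quad

def truncated_normal(mu, sigma, lo, hi):
    # scipy's truncnorm takes standardized truncation bounds.
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return truncnorm(a, b, loc=mu, scale=sigma)

# Support of p ([-1, 1]) is nested inside the support of q ([-2, 2]).
p = truncated_normal(0.0, 1.0, -1.0, 1.0)
q = truncated_normal(0.5, 1.2, -2.0, 2.0)

# KLD(p:q) = integral over supp(p) of p(x) log(p(x)/q(x)) dx.
kld, _ = quad(lambda x: p.pdf(x) * np.log(p.pdf(x) / q.pdf(x)), -1.0, 1.0)
print(kld)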
Handbook of Geometry and Statistics, Elsevier 2022
Progress in Information Geometry: Theory and Applications, Springer 2021
Geometric Structures of Information, Springer 2019
Computational Information Geometry: For Image and Signal Processing, Springer 2017
Differential Geometrical Theory of Statistics, MDPI Entropy, Special issue, 2017
Geometric Theory of Information, Springer 2014
Matrix Information Geometry, Springer 2013
Geometry and Statistics (Handbook of Statistics, Volume 46), Frank Nielsen, Arni Srinivasa Rao, C.R. Rao (2022)
A Tribute to the Legend of Professor C. R. Rao: The Centenary Volume. Editors: Arijit Chaudhuri, Sat N. Gupta, Rajkumar Roychoudhury, https://link.springer.com/book/10.1007/978-981-33-6991-7
Methodology and Applications of Statistics: A Volume in Honor of C.R. Rao on the Occasion of his 100th Birthday. Editors: Barry C. Arnold, Narayanaswamy Balakrishnan, Carlos A. Coelho (2021)
Entropy, Divergence, and Majorization in Classical and Quantum Thermodynamics,
Takahiro Sagawa (2022)
Minimum Divergence Methods in Statistical Machine Learning: From an Information Geometric Viewpoint, Shinto Eguchi, Osamu Komori (2022)
Information geometry, Arni S.R. Srinivasa Rao, C.R. Rao, Angelo Plastino (2020)
Information geometry, Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer (2017)
Advanced and rigorous foundations of information geometry
Information geometry and its applications, Shun-ichi Amari (2016)
A first reading, well-balanced between concepts and applications by the founder of the field
Riemannian Geometry and Statistical Machine Learning, Guy Lebanon (2015)
Geometric Modeling in Probability and Statistics, Ovidiu Calin, Constantin Udrişte (2014)
Finance at Fields, Matheus R. Grasselli, Lane Palmer Hughston (Eds.), World Scientific, 2013, https://doi.org/10.1142/8507. Mentions information geometry (p. 248)
Mathematical foundations of infinite-dimensional statistical models,
Richard Nickl and Evarist Giné,
Cambridge University Press (2016).
book web page (including book PDF)
A nice intermediate textbook which also provides proofs using the calculus of connections from differential geometry
Methods of Information Geometry, Shun-ichi Amari, Hiroshi Nagaoka (2000)
Advanced book with an emphasis on statistical inference, English translation of the Japanese textbook of 1993
Differential Geometry and Statistics, Michael K. Murray, John W. Rice (1993)
A classic textbook
Geometrical Foundations of Asymptotic Inference,
Robert E. Kass, Paul W. Vos (1997)
A classic textbook
Statistical decision rules and optimal inference,
N. N. Chentsov (1982)
The first textbook on information geometry, originally published in 1972 in Russian, and later translated into English by the AMS
Information geometry: Near randomness and near independence,
Khadiga A. Arwini and Christopher T. J. Dodson (2008)
Deep Learning Architectures: A Mathematical Approach, Ovidiu Calin (2020)
Information Geometry and Population Genetics: The Mathematical Structure of the Wright-Fisher Model, Julian Hofrichter, Jürgen Jost, Tat Dat Tran (2017)
Emphasizes Kolmogorov's interest in exploring statistical divergences