Information geometry and divergences: Classical and Quantum

Foundations, Applications, and Software APIs

Historically, Information Geometry (IG; see the tutorials, textbooks and monographs listed below for how to get started) aimed at unravelling the geometric structures of families of probability distributions called statistical models.
A statistical model can either be

parametric (e.g., the family of normal distributions),
semi-parametric (e.g., the family of Gaussian mixture models) or
non-parametric (e.g., the family of mutually absolutely continuous smooth densities).

A parametric statistical model is said to be regular when the Fisher information matrix is well-defined and positive-definite. Otherwise, the statistical model is irregular (e.g., when the Fisher information is infinite, or when it is only positive semi-definite because the model is not identifiable).

The Fisher-Rao manifold of a parametric statistical model is a Riemannian manifold equipped with the Fisher information metric. The geodesic length on a Fisher-Rao manifold is called Rao's distance [Hotelling 1930] [Rao 1945]. More generally, Amari proposed the dualistic structure of IG, which consists of a pair of torsion-free affine connections coupled to the Fisher metric [Amari 1980's]. Given a dualistic structure, we can generically build a one-parameter family of dualistic information-geometric structures, called the α-geometry. When both connections are flat, the information-geometric space is said to be dually flat: for example, Amari's ±1-structures of exponential families and mixture families are famous examples of dually flat spaces in information geometry.

In differential geometry, geodesics are defined as autoparallel curves with respect to a connection. When using the default Levi-Civita metric connection derived from the Fisher metric on Fisher-Rao manifolds, the geodesics are locally length-minimizing curves, and their length gives Rao's distance. Eguchi showed how to build a dualistic structure from any smooth distortion (originally called a contrast function): this is the information geometry of divergences [Eguchi 1982]. The information geometry of Bregman divergences yields dually flat spaces: it is a special case of Hessian manifolds, which are differentiable manifolds equipped with a Hessian metric tensor and a flat connection [Shima 2007].

Since geometric structures scaffold spaces independently of any application, the pure information-geometric Fisher-Rao structure and α-structures of statistical models can also be used in non-statistical contexts: for example, for analyzing interior point methods with barrier functions in optimization, or for studying time-series models.
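
As a rough illustration of the Bregman divergences mentioned above, here is a minimal sketch. It assumes the standard definition B_F(p:q) = F(p) - F(q) - <p - q, ∇F(q)> for a strictly convex generator F, which is not spelled out in the text; the function and variable names are hypothetical and chosen only for this example. It checks numerically that the negative Shannon entropy generator recovers the Kullback-Leibler divergence on normalized histograms.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """B_F(p:q) = F(p) - F(q) - <p - q, grad F(q)> (assumed standard definition)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - np.dot(p - q, grad_F(q)))

# Generator 1: half squared Euclidean norm, giving half the squared Euclidean distance.
def half_sq_norm(x):
    return 0.5 * np.dot(x, x)

def grad_half_sq_norm(x):
    return x

# Generator 2: negative Shannon entropy, giving the (extended) Kullback-Leibler divergence.
def neg_entropy(x):
    return np.sum(x * np.log(x))

def grad_neg_entropy(x):
    return np.log(x) + 1.0

# Two illustrative positive histograms that sum to one.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

b_euclid = bregman_divergence(half_sq_norm, grad_half_sq_norm, p, q)
b_kl = bregman_divergence(neg_entropy, grad_neg_entropy, p, q)
kl_direct = float(np.sum(p * np.log(p / q)))

print("half squared Euclidean distance:", b_euclid)
print("Bregman (negative entropy):", b_kl, "direct KL:", kl_direct)
```

Passing the generator and its gradient explicitly keeps the sketch free of automatic differentiation; other convex generators can be swapped in the same way.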

Statistical divergences between parametric statistical models amount to parameter divergences, to which we can apply Eguchi's divergence information geometry to get a dualistic structure. A projective divergence is a divergence which is invariant under independent rescaling of its arguments. A statistical projective divergence is thus useful for estimating computationally intractable statistical models, since unnormalized densities can be used (e.g., gamma divergences, Cauchy-Schwarz divergence and Hölder divergences, or the singly-sided projective Hyvärinen divergence). A conformal divergence is a divergence scaled by a conformal factor which may depend on one or two of its arguments. The metric tensor obtained by applying Eguchi's divergence information geometry to a conformal divergence is conformal to the metric obtained from the base divergence, hence its name. By analogy to total least squares vs. least squares, a total divergence is a divergence which is invariant with respect to rotations (e.g., total Bregman divergences).

An important property of divergences on the probability simplex is monotonicity under coarse-graining. That is, merging bins and considering reduced histograms should give a distance less than or equal to the distance on the full-resolution histograms. This information monotonicity property holds for f-divergences (called invariant divergences in information geometry), the Hilbert log cross-ratio distance, or the Aitchison distance, for example. Some statistical divergences are upper bounded (e.g., the Jensen-Shannon divergence) while others are not (e.g., Jeffreys' divergence). Optimal transport distances require a ground distance on the sample space. A diversity index generalizes a two-point distance to a family of parameters/distributions. It usually measures the dispersion around a center point (e.g., like the variance measures the dispersion around the centroid).
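
The coarse-graining and boundedness properties above can be checked numerically on a small example. The sketch below is only illustrative: the histograms, the bin grouping, and all function names are assumptions made for this example, and natural logarithms are used, so the Jensen-Shannon bound is log 2.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence between positive normalized histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: average KL to the mid-point mixture."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jeffreys(p, q):
    """Jeffreys' divergence: symmetrized KL."""
    return kl(p, q) + kl(q, p)

def coarse_grain(p, groups):
    """Merge the bins listed in each group into a single bin."""
    p = np.asarray(p, dtype=float)
    return np.array([p[list(g)].sum() for g in groups])

p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.4, 0.3, 0.2, 0.1])
groups = [(0, 1), (2, 3)]  # hypothetical merging of 4 bins into 2

kl_full = kl(p, q)
kl_coarse = kl(coarse_grain(p, groups), coarse_grain(q, groups))
print("KL full:", kl_full, "KL coarse:", kl_coarse)

# Jensen-Shannon stays below log 2, while Jeffreys grows without bound
# as one histogram approaches the boundary of the simplex.
u = np.array([0.5, 0.5])
for eps in (1e-3, 1e-6, 1e-12):
    v = np.array([1.0 - eps, eps])
    print(eps, "JS:", jensen_shannon(u, v), "Jeffreys:", jeffreys(u, v))
```

The KL divergence is used here as a representative f-divergence; the same experiment can be repeated with any other f-divergence.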

A selected list of ten great articles in information geometry introducing various concepts like statistical invariance, divergences, dual geometric structures, information projections, dual foliations, information decomposition, mixed coordinates, etc.

Browsing geometric structures: [tutorials] [Software/API] [Fisher-Rao manifolds] [Cones] [Finsler manifolds] [Hessian manifolds] [Exponential families and mixture families] [Categorical distributions/probability simplex] [Time series] [Hilbert geometry] [Hyperbolic geometry and Siegel spaces] [Applications] [Natural gradient] [centroids and clustering] [Miscellaneous applications]

Information geometry: Tutorials and surveys

The Many Faces of Information Geometry, AMS Notices (9 pages), 2022.
A gentle short introduction to information geometry
An Elementary Introduction to Information Geometry, Entropy (61 pages), 2020.
A self-contained introduction to classic parametric information geometry with applications and basics of differential geometry
What is an information projection?, AMS Notices, (65) 3 (4 pages), 2018.
Information projections are the workhorses of algorithms using the framework of information geometry. A projection is defined according to geodesics (with respect to a connection) and orthogonality (with respect to a metric tensor). In dually flat spaces, information projections can be interpreted as minimum Bregman divergences (Bregman projections). Uniqueness theorems are given for exponential families and mixture families.
On Geodesic Triangles with Right Angles in a Dually Flat Space, Chapter in edited book "Progress in Information Geometry", Springer, 2021.
A self-contained introduction to dually flat spaces which we call Bregman manifolds. The generalized Pythagorean theorem is derived from the 3-parameter Bregman identity. The 4-parameter Bregman identity is also explained
Cramér-Rao Lower Bound and Information Geometry, Connected at Infinity II: On the work of Indian mathematicians (R. Bhatia and C.S. Rajan, Eds.), special volume of Texts and Readings In Mathematics (TRIM), Hindustan Book Agency, 2013
A description of the pathbreaking paper of Calyampudi Radhakrishna Rao: "Information and the accuracy attainable in the estimation of statistical parameters", 1945.
Statistical exponential families: A digest with flash cards, 2009
Pattern Learning and Recognition on Statistical Manifolds: An Information-Geometric Review, SIMBAD 2013
Legendre transformation and information geometry, memo, 2010

Fisher-Rao manifolds (Riemannian manifolds and Hadamard manifolds)

Fisher-Rao manifold of the categorical distributions
The Fisher-Rao manifold of the categorical distributions can be isometrically embedded into the positive orthant of the sphere of radius 2 in Euclidean space. Rao's distance thus corresponds to the length of a great circle arc; relaxing the embedded distributions to positive measures, we get twice the Hellinger divergence. (explanation)
Relative Fisher Information and Natural Gradient for Learning Large Modular Models, ICML 2017
Dynamic geometry with relative Fisher information metric
- talk video (17 min.)
- [Project]
- Voronoi diagrams on Fisher-Rao manifolds of location-scale families amount to hyperbolic Voronoi diagrams. Hyperbolic Voronoi diagrams made easy (ICCSA 2010)
A Geometric Modeling of Occam's Razor in Deep Learning
Degenerate metrics and lightlike Fisher-Rao manifolds
On Approximating the Riemannian 1-Center , Computational Geometry, 2013.
Approximating the smallest enclosing Riemannian ball by a simple iterative algorithm which proves the existence of coresets. Applications to Hadamard manifolds (hyperbolic geometry and the Riemannian manifold of symmetric positive-definite matrices equipped with the trace metric).
GLSR-VAE: Geodesic latent space regularization for variational autoencoder architectures , IEEE SSCI 2017.
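
The spherical embedding of categorical distributions described in this list can be sketched numerically. The code below assumes the embedding p ↦ 2√p onto the sphere of radius 2, so that Rao's distance is the great-circle arc length 2 arccos(Σ √(p_i q_i)); the function names are hypothetical. It also computes the Euclidean chord between the embedded points, which equals twice the Hellinger distance under the unnormalized convention √(Σ (√p_i − √q_i)²) (other texts include an extra 1/√2 factor).

```python
import numpy as np

def embed_on_sphere(p):
    """Map a categorical distribution to the positive orthant of the radius-2 sphere."""
    return 2.0 * np.sqrt(np.asarray(p, dtype=float))

def rao_distance_categorical(p, q):
    """Great-circle arc length between the embedded points (radius 2)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient = cosine of the angle
    return float(2.0 * np.arccos(np.clip(bc, 0.0, 1.0)))

def chord_distance(p, q):
    """Euclidean distance between embedded points: 2 * sqrt(sum (sqrt p - sqrt q)^2)."""
    return float(np.linalg.norm(embed_on_sphere(p) - embed_on_sphere(q)))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

print("embedded norm:", np.linalg.norm(embed_on_sphere(p)))
print("Rao distance:", rao_distance_categorical(p, q))
print("chord (never longer than the arc):", chord_distance(p, q))

# Distributions with disjoint supports are at the maximal distance pi.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print("max distance:", rao_distance_categorical(e1, e2))
```

The clipping guards against floating-point round-off pushing the Bhattacharyya coefficient slightly above 1 for nearly identical inputs.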

Cones

Fast (1+ε)-approximation of the Löwner extremal matrices of high-dimensional symmetric matrices

Finsler manifolds

Finsler manifolds are proposed to model irregular parametric statistical models (where Fisher information can be infinite)

Medians and means in Finsler geometry, LMS Journal of Computation and Mathematics 15 (2012): 23-37.

Bregman manifolds/Hessian manifolds

Bregman Voronoi diagrams
Bregman Voronoi diagrams (or Voronoi diagrams in dually flat spaces) are affine diagrams which can be built from equivalent power diagrams (Laguerre geometry). They generalize the paraboloid lifting transform of Euclidean geometry to potential functions induced by the convex Bregman generators.
Visualizing Bregman Voronoi diagrams, SoCG 2007: 121-122
Geometry and Fixed-Rate Quantization in Riemannian Metric Spaces Induced by Separable Bregman Divergences , GSI 2019
Fitting the Smallest Enclosing Bregman Ball, ECML 2005 (https://www.researchgate.net/publication/221112384_Fitting_the_Smallest_Enclosing_Bregman_Ball)
Bregman proximity data-structures and queries [web page]:
- Bregman Vantage Point Trees for Efficient Nearest Neighbor Queries, ICME 2009.
- Tailored Bregman Ball Trees for Effective Nearest Neighbors, Euro CG 2010.
Mining Matrix Data with Bregman Matrix Divergences for Portfolio Selection
On the smallest enclosing information disk, Inf. Process. Lett. 105(3): 93-97 (2008)

Exponential families and Mixture families

Continuous or discrete exponential families

The Kullback–Leibler Divergence Between Lattice Gaussian Distributions, Journal of the Indian Institute of Science (2022)
arXiv:2109.14920 [project page]
Likelihood ratio exponential families, NeurIPS Workshop on Deep Learning through Information Geometry, 2020
q-Paths: Generalizing the Geometric Annealing Path using Power Means, UAI 2021
[ paper UAI 2021]
Statistical exponential families: A digest with flash cards, 2009
A note on some information-theoretic divergences between Zeta distributions
Monte Carlo Information-Geometric Structures, Geometric Structures of Information, Springer 2019
k-MLE:

Online k-MLE for Mixture Modeling with Exponential Families, GSI 2015

k-MLE: A fast algorithm for learning statistical mixture models, IEEE ICASSP 2012

Fast Learning of Gamma Mixture Models with k-MLE, SIMBAD 2013: 235-249

k-MLE for mixtures of generalized Gaussians, ICPR 2012: 2825-2828

Simplification and hierarchical representations of mixtures of exponential families, Signal Process. 90(12): 3197-3212 (2010)

The analytic dually flat space of the statistical mixture family of two prescribed distinct Cauchy components

On the Geometry of Mixtures of Prescribed Distributions, IEEE ICASSP 2018

Information geometry of deformed exponential families

q-deformed exponential families, q-Gaussians, etc.

On Voronoi Diagrams on the Information-Geometric Cauchy Manifolds, Entropy 2020, 22(7), 713

Information geometry of the probability simplex

Clustering in Hilbert simplex geometry [project page]

Geometry of the probability simplex and its connection to the maximum entropy method, Journal of Applied Mathematics, Statistics and Informatics 16(1):25-35, 2020
Bruhat-Tits space
open access (publisher)

Information geometry of singular statistical models

A Geometric Modeling of Occam's Razor in Deep Learning

Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds, NeurIPS OPT workshop 2021

Geometry of time series and correlations/dependences

Hilbert geometry

A Hilbert geometry is induced by a bounded convex open domain. Hilbert geometries generalize the Klein model of hyperbolic geometry and the Cayley-Klein geometries. Beware that Hilbert geometries are never Hilbert spaces!
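
For the special case where the bounded convex domain is the open probability simplex, the Hilbert log cross-ratio distance admits a simple closed form, log(max_i(p_i/q_i) / min_i(p_i/q_i)). The sketch below assumes this closed form (it is not stated in the text above) and uses hypothetical function names; it checks a few metric properties on an example.

```python
import numpy as np

def hilbert_simplex_distance(p, q):
    """Hilbert distance in the open simplex (assumed closed form):
    log of the ratio between the largest and smallest coordinate ratios p_i / q_i."""
    ratios = np.asarray(p, dtype=float) / np.asarray(q, dtype=float)
    return float(np.log(ratios.max() / ratios.min()))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
r = np.array([0.1, 0.6, 0.3])

d_pq = hilbert_simplex_distance(p, q)
d_qp = hilbert_simplex_distance(q, p)
d_pr = hilbert_simplex_distance(p, r)
d_rq = hilbert_simplex_distance(r, q)

print("d(p,q):", d_pq, "d(q,p):", d_qp)
print("triangle inequality holds:", d_pq <= d_pr + d_rq)

# The formula is projective: rescaling an argument by a positive factor
# leaves the distance unchanged.
print("d(3p,q):", hilbert_simplex_distance(3.0 * p, q))
```

All coordinates must be strictly positive, since the distance diverges as a point approaches the boundary of the domain.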

Clustering in Hilbert simplex geometry [project page]
Classification with mixtures of curved Mahalanobis metrics, IEEE ICIP 2016
On Balls in a Hilbert Polygonal Geometry, SoCG 2017.

Hyperbolic geometry and geometry of Siegel domains

The Siegel–Klein Disk: Hilbert Geometry of the Siegel Disk Domain, Entropy 2020
Classification in the Siegel Space for Vectorial Autoregressive Data, GSI 2021.
Hyperbolic Voronoi Diagrams Made Easy, ICCSA 2010
Model centroids for the simplification of Kernel Density estimators, ICASSP 2012: 737-740
Visualizing hyperbolic Voronoi diagrams, SoCG 2014: 90
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry

Some applications of information geometry

Natural gradient

Centers and clustering

On conformal divergences and their population minimizers, IEEE Transactions on Information Theory, 62.1 (2015): 527-538
The Burbea-Rao and Bhattacharyya centroids, IEEE Transactions on Information Theory, 57(8), 5455-5466 (2011)
Sided and Symmetrized Bregman Centroids, IEEE Transactions on Information Theory 55.6 (2009): 2882-2904. See also the corrigendum and addendum on "Sided and Symmetrized Bregman Centroids".
Optimal Interval Clustering: Application to Bregman Clustering and Statistical Mixture Learning, IEEE SPL 2014
Jeffreys Centroids: A Closed-Form Expression for Positive Histograms and a Guaranteed Tight Approximation for Frequency Histograms , IEEE Signal Process. Lett. 20(7): 657-660 (2013)
On Clustering Histograms with k-Means by Using Mixed α-Divergences, Entropy 2014

Miscellaneous applications

q-Paths: Generalizing the geometric annealing path using power means, UAI 2021
An Information-Geometric Characterization of Chernoff Information , IEEE SPL 2013
Computational Information Geometry for Binary Classification of High-Dimensional Random Tensors, Entropy 20(3): 203 (2018)
Information geometry metric for random signal detection in large random sensing systems , ICASSP 2017: 4471-4475
Information-geometric lenses for multiple foci+contexts interfaces, SIGGRAPH ASIA Technical Briefs 2013: 18:1-18:4

Dissimilarities, distances, divergences and diversities

Jensen-Shannon divergence

On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius, Entropy 2021.
On a Generalization of the Jensen-Shannon Divergence and the Jensen–Shannon Centroid, Entropy 2020
On the Jensen-Shannon Symmetrization of Distances Relying on Abstract Means, Entropy 2019
A family of statistical symmetric divergences based on Jensen's inequality, arXiv:1009.4004 2010.

Probability simplex

alpha geodesics
On Clustering Histograms with k-Means by Using Mixed α-Divergences, Entropy 2014.

f-divergences

On f-divergences between Cauchy distributions, arXiv:2101.12459
On the Chi square and higher-order Chi distances for approximating f-divergences, IEEE Signal Processing Letters, 2013
α-divergences:
- A generalization of the α-divergences based on comparable and distinct weighted means, arXiv:2001.09660
- Non-flat clustering with alpha-divergences, ICASSP 2011: 2100-2103

Bregman divergences and some generalizations

Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences, Entropy 2022, 24(3)
Duo Fenchel-Young divergence , arXiv:2202.10726.
We define duo Bregman divergences, duo Fenchel-Young divergences, duo Jensen divergences. We show how those divergences occur naturally when calculating the Kullback-Leibler divergence and skew Bhattacharyya distances between densities belonging to nested exponential families. We report the KLD between truncated normal distributions as a duo Bregman divergence.
Demo code:
- KLDTruncatedNormalDistributions.java
- KLDTruncatedExponentialDistributions.java
- Computing Statistical Divergences with Sigma Points, GSI 2021
- Quasiconvex Jensen Divergences and Quasiconvex Bregman Divergences, SPIGL 2021
- Generalizing Skew Jensen Divergences and Bregman Divergences With Comparative Convexity, IEEE Signal Process. Lett. 24(8): 1123-1127 (2017)
- Reranking with Contextual Dissimilarity Measures from Representational Bregman k-Means, VISAPP (1) 2010: 118-123
- Bregman Divergences and Surrogates for Learning, IEEE Trans. Pattern Anal. Mach. Intell. 31(11): 2048-2059 (2009)
- The Dual Voronoi Diagrams with Respect to Representational Bregman Divergences, ISVD 2009: 71-78
Jensen divergences and some generalizations
- The Chord Gap Divergence and a Generalization of the Bhattacharyya Distance, IEEE ICASSP 2018
- Skew Jensen-Bregman Voronoi Diagrams, Trans. Comput. Sci. 14: 102-128 (2011)
- Jensen-Bregman Voronoi Diagrams and Centroidal Tessellations, ISVD 2010: 56-65
Conformal divergences
- On conformal divergences and their population minimizers, IEEE Transactions on Information Theory, 62.1 (2015): 527-538
- Total Jensen divergences: Definition, Properties and k-Means++ Clustering, IEEE ICASSP 2015
- Total Bregman Divergence and its Applications to Shape Retrieval, IEEE CVPR 2010
- Total Bregman divergence and its applications to DTI analysis, IEEE Transactions on Medical Imaging 30(2):475-83, 2011
- Shape Retrieval Using Hierarchical Total Bregman Soft Clustering, IEEE Trans. Pattern Anal. Mach. Intell. 34(12): 2407-2419 (2012)
Projective divergences
- On Hölder Projective Divergences, Entropy 2017, 19(3), 122
- Patch Matching with Polynomial Exponential Families and Projective Divergences, SISAP 2016 (projective gamma divergence)
Optimal transport/Wasserstein distances/Sinkhorn distances
Earth mover distances (EMD), Wasserstein distances
- Sinkhorn AutoEncoders, UAI 2019
- On The Chain Rule Optimal Transport Distance, Progress in Information Geometry, 2021
- Tsallis Regularized Optimal Transport and Ecological Inference, AAAI 2017
- Clustering Patterns Connecting COVID-19 Dynamics and Human Mobility Using Optimal Transport, March 2021, Sankhya B 83(S1)
- Optimal copula transport for clustering multivariate time series, IEEE ICASSP 2016
- Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering, Time Series Workshop at NeurIPS 2016
- Earth Mover Distance on superpixels, IEEE ICIP 2010
Shannon, Rényi, Tsallis, Sharma-Mittal entropies, cross-entropies and divergences
- MaxEnt Upper Bounds for the Differential Entropy of Univariate Continuous Distributions, IEEE Signal Process. Lett. 24(4): 402-406 (2017)
- A closed-form expression for the Sharma-Mittal entropy of exponential families, Journal of Physics A: Mathematical and Theoretical 45.3 (2011).
- On Rényi and Tsallis entropies and divergences for exponential families, arXiv:1105.3259
- Entropies and cross-entropies of exponential families, IEEE ICIP 2010
- Texture Regimes for Entropy-Based Multiscale Image Analysis, ECCV (3) 2010: 692-705
Chernoff information
- Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means , Pattern Recognition Letters 42(1), 2014
- An Information-Geometric Characterization of Chernoff Information , IEEE Signal Processing Letters, 2013
- Hypothesis Testing, Information Divergence and Computational Geometry , GSI 2013
Other dissimilarities
- A note on Onicescu's informational energy and correlation coefficient in exponential families , arXiv:2003.13199
- Deep rank-based transposition-invariant distances on musical sequences , arXiv:1709.00740
- Quantifying the Invariance and Robustness of Permutation-Based Indexing Schemes , SISAP 2016: 79-92
Loss functions and proper scoring rules
- Loss factorization, weakly supervised learning and label noise robustness, ICML 2016: 708-717
- Gentle Nearest Neighbors Boosting over Proper Scoring Rules, IEEE Trans. Pattern Anal. Mach. Intell. 37(1): 80-93 (2015)
Divergences between statistical mixtures
- Fast Approximations of the Jeffreys Divergence between Univariate Gaussian Mixtures via Mixture Conversions to Exponential-Polynomial Distributions, Entropy 2021, 23(11), 1417
- The Statistical Minkowski Distances: Closed-Form Formula for Gaussian Mixture Models, GSI 2019
- Guaranteed Deterministic Bounds on the total variation Distance between univariate mixtures, IEEE MLSP 2018
- Comix: Joint estimation and lightspeed comparison of mixture models , IEEE ICASSP 2016
- Bag-of-Components: An Online Algorithm for Batch Learning of Mixture Models, GSI 2015
- Guaranteed Bounds on Information-Theoretic Measures of Univariate Mixtures Using Piecewise Log-Sum-Exp Inequalities , Entropy 2016, 18(12), 442
- Combinatorial bounds on the α-divergence of univariate mixture models, ICASSP 2017
- Closed-form information-theoretic divergences for statistical mixtures, ICPR 2012
Edited books and proceedings
- Handbook of Geometry and Statistics, Elsevier 2022
- Progress in Information Geometry: Theory and Applications , Springer 2021
- Geometric Structures of Information, Springer 2019
- Computational Information Geometry For Image and Signal Processing, Springer 2017
- Differential Geometrical Theory of Statistics, MDPI Entropy, Special issue, 2017
- Geometric Theory of Information, Springer 2014
- Matrix Information Geometry, Springer 2013
- Proceedings: ETVC 2008 GSI'13 GSI'15 GSI'17 GSI'19
Monographs and textbooks on information geometry
- Geometry and Statistics (Handbook of Statistics, Volume 46), Frank Nielsen, Arni Srinivasa Rao, C.R. Rao (2022)
- A Tribute to the Legend of Professor C. R. Rao: The Centenary Volume. Editors: Arijit Chaudhuri, Sat N. Gupta, Rajkumar Roychoudhury. https://link.springer.com/book/10.1007/978-981-33-6991-7
- Entropy, Divergence, and Majorization in Classical and Quantum Thermodynamics, Takahiro Sagawa (2022)
- Minimum Divergence Methods in Statistical Machine Learning: From an Information Geometric Viewpoint, Shinto Eguchi, Osamu Komori (2022)
- Methodology and Applications of Statistics: A Volume in Honor of C.R. Rao on the Occasion of his 100th Birthday, Barry C. Arnold, Narayanaswamy Balakrishnan, Carlos A. Coelho (Eds) (2021)
- Information geometry, Arni S.R. Srinivasa Rao, C.R. Rao, Angelo Plastino (2020)
- Information geometry, Nihat Ay, Jürgen Jost, Hông Vân Lê, Lorenz Schwachhöfer (2017)
  Advanced and rigorous foundations of information geometry
- Information geometry and its applications, Shun-ichi Amari (2016)
  A first reading, well-balanced between concepts and applications by the founder of the field
- Riemannian Geometry and Statistical Machine Learning, Guy Lebanon, (2015)
- Geometric Modeling in Probability and Statistics, Ovidiu Calin, Constantin Udrişte (2014)
- Finance at Fields, Matheus R. Grasselli and Lane Palmer Hughston (Eds), World Scientific, 2013, https://doi.org/10.1142/8507. Mentions information geometry (p. 248)
- Mathematical foundations of infinite-dimensional statistical models, Richard Nickl and Evarist Giné, Cambridge University Press (2016). book web page (including book PDF)
  A nice intermediate textbook which also provides proofs using calculus on connections of differential geometry
- Methods of Information Geometry, Shun-ichi Amari, Hiroshi Nagaoka (2000)
  Advanced book with an emphasis on statistical inference, English translation of the Japanese textbook of 1993
- Differential Geometry and Statistics, Michael K. Murray, John W. Rice (1993)
  A classic textbook
- Geometrical Foundations of Asymptotic Inference, Robert E. Kass, Paul W. Vos (1997)
  A classic textbook
- Statistical decision rules and optimal inference, N. N. Chentsov (1982)
  The first textbook on information geometry, originally published in 1972 in Russian, and later translated into English by the AMS
- Information geometry: Near randomness and near independence, Khadiga A. Arwini and Christopher T. J. Dodson (2008)
Books in Japanese
志摩裕彦, へッセ幾何学，裳華房， 2001
入門情報幾何: 統計的モデルをひもとく微分幾何学
 Atsushi Fujioka (藤岡敦著)
Kyoritsu publisher, 2021
Introduction to Information Geometry: Differential Geometry of Statistical Models
情報幾何学の基礎: 情報の内的構造を捉える新たな地平
 Akio Fujiwara (藤原彰夫)
Fundamentals of Information Geometry: A New Horizon for Capturing the Intrinsic Structure of Information
Kyoritsu publisher, 2021
情報幾何学の基礎 (数理情報科学シリーズ)
Akio Fujiwara (藤原彰夫)
牧野書店 Makino bookstore.
情報幾何学の新展開 (SGCライブラリ)
Shun-ichi Amari (甘利俊一)
Saiensu publisher, 2019. A book compiling the columns which appeared in the 数理科学 (Mathematical sciences of Saiensu magazine)
情報幾何の方法, Shun-ichi Amari (甘利俊一) and Hiroshi Nagaoka (長岡浩司)
Iwanami press (1993, reprinted 2017)
情報理論, Shun-ichi Amari (甘利俊一)
Information Theory
Chikuma Shobō (筑摩書房), 2011
Quantum information theory (QIT)/quantum information geometry (QIG)
- Quantum Computation and Quantum Information (10th Anniversary Edition) by Michael A. Nielsen, Isaac L. Chuang, Cambridge University Press, 2010
- Quantum Information Theory: Mathematical Foundation by Masahito Hayashi, Springer, 2017
- Quantum Riemannian Geometry by Edwin J. Beggs , Shahn Majid, Springer, 2020
- Geometry of Quantum States: An Introduction to Quantum Entanglement by Ingemar Bengtsson and Karol Życzkowski, Cambridge University Press, 2009
  Second edition
- Quantum Information Theory and Quantum Statistics by Dénes Petz, Springer, 2008
Other related books
- Deep Learning Architectures A Mathematical Approach, Ovidiu Calin (2020)
- Information Geometry and Population Genetics The Mathematical Structure of the Wright-Fisher Model, Julian Hofrichter, Jürgen Jost, Tat Dat Tran (2017)
Software/APIs
- Geomstats: open-source Python package for computations and statistics on nonlinear manifolds. Geomstats has a tutorial on information geometry.
- pyRiemann: Biosignals classification with Riemannian Geometry
- ITE (information theoretical estimators): free and open source, multi-platform, Matlab/Octave toolbox that is capable of estimating various entropy, mutual information, divergence, association measures and cross quantities.
- Manopt: Toolboxes for optimization on manifolds and matrices
- jMEF: A Java library to create, process and manage mixtures of exponential families
- SVG export of several 2D space partitioning structures including Bregman vantage point trees.
- euclid: Exact Computation Geometry Framework Based on 'CGAL', basic computational geometry in R

Category theory and information geometry

Since the seminal work of Chentsov, who introduced the category of Markov kernels, category theory has played an essential role in the very foundations of information geometry. Below are some papers and links to explore this topic.

A note on statistical equivalence by Richard Sacksteder, The Annals of Mathematical Statistics 38.3 (1967): 787-794
The categories of mathematical statistics by N. N. Chentsov, Doklady Akademii Nauk SSSR, 1965, Volume 164, Number 3, Pages 511–514
The unfathomable influence of Kolmogorov, by Nikolai N. Chentsov, The Annals of Statistics (1990): 987-998.

Emphasizes the interest of Kolmogorov in exploring statistical divergences.

Home page of Evan Patterson

Finsler geometry

Home page of Geometric Science of Information

April 2022.
