Historically, Information Geometry (IG) aimed at unravelling the geometric structures
of families of probability distributions called statistical models.
A statistical model can either be
The Fisher-Rao manifold of a parametric statistical model is a Riemannian manifold equipped with the Fisher information metric. The geodesic length on a Fisher-Rao manifold is called Rao's distance [Hotelling 1930][Rao 1945]. In differential geometry, geodesics are defined as autoparallel curves with respect to a connection. With the default Levi-Civita metric connection derived from the Fisher metric, the geodesics of a Fisher-Rao manifold are locally length-minimizing curves, and their length is Rao's distance.
More generally, Amari proposed the dualistic structure of IG, which consists of a pair of torsion-free affine connections coupled to the Fisher metric [Amari 1980's]. Given a dualistic structure, we can generically build a one-parameter family of dualistic information-geometric structures, called the α-geometry. When both connections are flat, the information-geometric space is said to be dually flat: Amari's ±1-structures of exponential families and mixture families are famous examples of dually flat spaces in information geometry. Eguchi showed how to build a dualistic structure from any smooth distortion (originally called a contrast function): the information geometry of divergences [Eguchi 1982]. The information geometry of Bregman divergences yields dually flat spaces, which are special cases of Hessian manifolds: differentiable manifolds equipped with a Hessian metric tensor and a flat connection [Shima 2007].
Since geometric structures scaffold spaces independently of any application, the pure information-geometric Fisher-Rao structure and α-structures of statistical models can also be used in non-statistical contexts: for example, for analyzing interior point methods with barrier functions in optimization, or for studying time-series models.
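To make the Bregman divergences mentioned above concrete, here is a minimal sketch in Python with NumPy. It assumes the generator F(x) = Σ (x_i log x_i − x_i) on positive vectors, for which the Bregman divergence coincides with the extended Kullback-Leibler divergence. The function names (`bregman_divergence`, `neg_entropy`, `extended_kl`) are illustrative choices, not taken from any library.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """Bregman divergence B_F(p : q) = F(p) - F(q) - <p - q, grad F(q)>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - np.dot(p - q, grad_F(q)))

def neg_entropy(x):
    """Illustrative convex generator: F(x) = sum_i (x_i log x_i - x_i), x > 0."""
    return float(np.sum(x * np.log(x) - x))

def grad_neg_entropy(x):
    """Gradient of the generator above: (grad F)(x)_i = log x_i."""
    return np.log(x)

def extended_kl(p, q):
    """Extended Kullback-Leibler divergence between positive vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q) + q - p))

# Two strictly positive histograms (hypothetical example data).
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

b_pq = bregman_divergence(neg_entropy, grad_neg_entropy, p, q)
print(b_pq, extended_kl(p, q))  # the two values agree up to rounding
```

For this generator the Hessian is diag(1/x_i), which is the Hessian metric of the corresponding dually flat space in the coordinates x.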
Statistical divergences between distributions of a parametric statistical model amount to parameter divergences, to which we can apply Eguchi's divergence information geometry to get a dualistic structure. A projective divergence is a divergence which is invariant under independent rescaling of its arguments. A statistical projective divergence is thus useful for estimating computationally intractable statistical models (e.g., gamma divergences, Cauchy-Schwarz and Hölder divergences, or the singly-sided projective Hyvärinen divergence). A conformal divergence is a divergence scaled by a conformal factor which may depend on one or two of its arguments. The metric tensor obtained by Eguchi's construction from a conformal divergence is a conformal metric of the metric obtained from the underlying divergence, hence its name. By analogy with total least squares vs. least squares, a total divergence is a divergence which is invariant with respect to rotations (e.g., total Bregman divergences).
An important property of divergences on the probability simplex is monotonicity under coarse-graining. That is, merging bins and considering the reduced histograms should give a distance less than or equal to the distance between the full-resolution histograms. This information monotonicity property holds, for example, for f-divergences (called invariant divergences in information geometry), the Hilbert log cross-ratio distance, and the Aitchison distance. Some statistical divergences are upper bounded (e.g., the Jensen-Shannon divergence) while others are not (e.g., Jeffreys' divergence). Optimal transport distances require a ground distance on the sample space. A diversity index generalizes a two-point distance to a family of parameters/distributions. It usually measures the dispersion around a center point (e.g., as the variance measures the dispersion around the centroid).
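The coarse-graining and boundedness properties described above can be checked numerically. The following is a minimal sketch in Python with NumPy, using the Kullback-Leibler divergence as a representative f-divergence; the helper names (`kl`, `jeffreys`, `jensen_shannon`, `coarse_grain`) and the example histograms are illustrative assumptions.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence (an f-divergence); assumes q > 0 where p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p, q):
    """Jeffreys' divergence: the symmetrized Kullback-Leibler divergence."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural logarithm), bounded above by log 2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def coarse_grain(p, groups):
    """Merge histogram bins: each group of indices becomes one reduced bin."""
    p = np.asarray(p, dtype=float)
    return np.array([p[list(g)].sum() for g in groups])

# Hypothetical 4-bin histograms, reduced to 2 bins by merging neighbours.
p = np.array([0.1, 0.2, 0.3, 0.4])
q = np.array([0.4, 0.3, 0.2, 0.1])
groups = [(0, 1), (2, 3)]

kl_full = kl(p, q)
kl_reduced = kl(coarse_grain(p, groups), coarse_grain(q, groups))
print(kl_reduced <= kl_full)  # information monotonicity

# Jeffreys grows without bound as q approaches the simplex boundary,
# while Jensen-Shannon stays below log 2.
u = np.array([0.5, 0.5])
for eps in (1e-3, 1e-6, 1e-9):
    q_eps = np.array([1.0 - eps, eps])
    print(eps, jeffreys(u, q_eps), jensen_shannon(u, q_eps))
```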
A gentle short introduction to information geometry
A self-contained introduction to classic parametric information geometry with applications and basics of differential geometry
Information projections are the workhorses of algorithms using the framework of information geometry. A projection is defined according to geodesics (with respect to a connection) and orthogonality (with respect to a metric tensor). In dually flat spaces, information projections can be interpreted as minimum Bregman divergences (Bregman projections). Uniqueness theorems are given for exponential families and mixture families.
A self-contained introduction to dually flat spaces, which we call Bregman manifolds. The generalized Pythagorean theorem is derived from the 3-parameter Bregman identity. The 4-parameter Bregman identity is also explained.
A description of the pathbreaking paper of Calyampudi Radhakrishna Rao: "Information and the accuracy attainable in the estimation of statistical parameters", 1945.
The Fisher-Rao manifold of the categorical distributions can be isometrically embedded onto the positive orthant of the sphere of radius 2 in Euclidean space. Rao's distance thus corresponds to the length of a great-circle arc; relaxing the embedded distributions to positive measures, we get twice the Hellinger divergence.
Dynamic geometry with relative Fisher information metric
Approximating the smallest enclosing Riemannian ball by a simple iterative algorithm which proves the existence of coresets. Applications to Hadamard manifolds (hyperbolic geometry and the Riemannian manifold of symmetric positive-definite matrices equipped with the trace metric).
Bregman Voronoi diagrams (i.e., Voronoi diagrams in dually flat spaces) are affine diagrams which can be built from equivalent power diagrams (Laguerre geometry). The paraboloid lifting transform of Euclidean geometry generalizes to potential functions induced by the convex Bregman generators.
q-deformed exponential families, q-Gaussians, etc.
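As an illustration of the sphere embedding of categorical distributions described above, here is a minimal sketch in Python with NumPy. It maps a distribution p to the point 2√p on the sphere of radius 2 and computes Rao's distance as the great-circle arc length, 2 arccos(Σ √(p_i q_i)). The function names (`embed`, `rao_distance_categorical`, `chord_length`) are illustrative. The chord length equals twice the Hellinger distance under the convention H(p, q) = ‖√p − √q‖₂, which is an assumption since other normalizations are in use.

```python
import numpy as np

def embed(p):
    """Map a categorical distribution to the sphere of radius 2: p -> 2*sqrt(p)."""
    return 2.0 * np.sqrt(np.asarray(p, dtype=float))

def rao_distance_categorical(p, q):
    """Great-circle arc length between embed(p) and embed(q) on the radius-2 sphere."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))  # Bhattacharyya coefficient = cosine of the angle
    return float(2.0 * np.arccos(np.clip(bc, -1.0, 1.0)))  # clip guards rounding

def chord_length(p, q):
    """Euclidean distance between embedded points: 2 * ||sqrt(p) - sqrt(q)||_2."""
    return float(np.linalg.norm(embed(p) - embed(q)))

# Hypothetical example distributions on three categories.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])

arc = rao_distance_categorical(p, q)
chord = chord_length(p, q)
print(np.linalg.norm(embed(p)))  # embedded points lie on the sphere of radius 2
print(arc, chord)                # the arc is never shorter than the chord
```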