Ten great articles in information geometry

Disclaimer: The list below is a personal and highly subjective selection of historical articles, since so many good papers have been published in information geometry. Yet, I think reading or browsing these 10 papers could be useful for researchers and engineers starting out in information geometry. The papers are not listed in chronological order.
  1. Algebraic foundation of mathematical statistics, Nikolai Nikolaevich Čencov (Chentsov), Statistics: A Journal of Theoretical and Applied Statistics 9.2 (1978): 267-276.

    A very readable synthesis of the results of Chentsov's monograph, defining statistical invariance and the total variation, Kullback-Leibler, and Chernoff divergences, among others.

  2. Defining the Curvature of a Statistical Problem (with Applications to Second Order Efficiency), Bradley Efron, The Annals of Statistics (1975): 1189-1242.

    Statistical inference in curved exponential families, linked to a novel notion of statistical curvature, information loss, etc. The paper comes with great discussions by many statisticians.

  3. Differential geometry of smooth families of probability distributions, Hiroshi Nagaoka and Shun-ichi Amari, METR (1982): 82-7.

    A new theory of dual connections with its key theorems (Pythagorean theorem, projection theorems, etc.).

  4. Natural gradient works efficiently in learning, Shun-ichi Amari, Neural Computation 10.2 (1998): 251-276.

    Introduces the natural gradient on Riemannian manifolds and shows theoretically that it is efficient: natural gradient online learning yields an asymptotically Fisher-efficient estimator.

  5. Second order efficiency of minimum contrast estimators in a curved exponential family, Shinto Eguchi, The Annals of Statistics (1983): 793-803.

    Introduces the information geometry of contrast functions (divergences).

  6. A characterization of monotone and regular divergences, José Manuel Corcuera and Federica Giummolè, Annals of the Institute of Statistical Mathematics 50 (1998): 433-450.

    Characterizes monotone and regular divergences, with an analysis of their Taylor expansions.

  7. Dependence, correlation and gaussianity in independent component analysis, Jean-François Cardoso, The Journal of Machine Learning Research 4 (2003): 1177-1203.

    Information geometry in action for ICA.

  8. Information geometry on hierarchy of probability distributions, Shun-ichi Amari, IEEE Transactions on Information Theory 47.5 (2001): 1701-1711.

    Mixed primal/dual coordinate systems, dual foliations, and divergence decomposition.

  9. Information geometry of Boltzmann machines, Shun-ichi Amari, Koji Kurata and Hiroshi Nagaoka, IEEE Transactions on Neural Networks 3.2 (1992): 260-271.

    Information geometry in action for neural networks.

  10. Exponentially concave functions and a new information geometry, Soumik Pal and Ting-Kam Leonard Wong, The Annals of Probability 46.2 (2018): 1070-1113.

    Logarithmic divergences extend Bregman divergences and are canonical divergences on manifolds of constant sectional curvature.
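For item 1, a small sketch can make Chentsov's notion of invariance concrete. The toy example below (not taken from the paper) computes the total variation, Kullback-Leibler, and Chernoff alpha-divergence between two discrete distributions and checks that all three are unchanged under a congruent Markov embedding, i.e. splitting each outcome into sub-outcomes with fixed proportions that do not depend on the distribution. The helper names, the distributions, the splitting weights, and alpha = 0.5 are illustrative assumptions.

```python
import math

def total_variation(p, q):
    # TV(p, q) = 1/2 * sum_i |p_i - q_i|
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def kullback_leibler(p, q):
    # KL(p : q) = sum_i p_i log(p_i / q_i), assuming q_i > 0 wherever p_i > 0
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chernoff_alpha(p, q, alpha=0.5):
    # Chernoff alpha-divergence: -log sum_i p_i^alpha q_i^(1 - alpha)
    return -math.log(sum(a ** alpha * b ** (1 - alpha) for a, b in zip(p, q)))

def congruent_embedding(p, splits):
    # Hypothetical helper: split outcome i into sub-outcomes with weights splits[i]
    # (each row sums to 1 and is the same for every distribution).
    return [p_i * w for p_i, row in zip(p, splits) for w in row]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
splits = [[0.25, 0.75], [1.0], [0.1, 0.6, 0.3]]  # illustrative splitting weights

for name, div in [("TV", total_variation), ("KL", kullback_leibler), ("Chernoff", chernoff_alpha)]:
    before = div(p, q)
    after = div(congruent_embedding(p, splits), congruent_embedding(q, splits))
    print(f"{name}: {before:.6f} -> {after:.6f}")
```

Refining the sample space in a way that carries no information about which distribution generated the data leaves each divergence unchanged, which is the kind of invariance item 1 formalizes.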
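For item 2, Efron's statistical curvature of a one-parameter curved exponential family with natural parameter eta(theta) can be computed from the first two derivatives of the log-likelihood. Writing eta' and eta'' for the derivatives of eta and Sigma for the covariance of the sufficient statistic, set nu20 = eta'^T Sigma eta', nu11 = eta'^T Sigma eta'', nu02 = eta''^T Sigma eta''; then gamma^2 = (nu20 nu02 - nu11^2) / nu20^3. The sketch below assumes a toy model, a bivariate normal x ~ N(eta(theta), I) whose mean moves on a circle of radius r, where gamma should reduce to the ordinary curvature 1/r of the circle.

```python
import math

def statistical_curvature(eta_dot, eta_ddot, cov):
    # Efron's statistical curvature gamma at one value of theta, for a curved
    # exponential family with natural parameter eta(theta).
    # eta_dot, eta_ddot: first and second derivatives of eta at theta.
    # cov: covariance matrix of the sufficient statistic at theta.
    def quad(u, v):
        return sum(u[i] * cov[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))
    nu20 = quad(eta_dot, eta_dot)    # variance of the score
    nu11 = quad(eta_dot, eta_ddot)   # covariance of the score and the second derivative
    nu02 = quad(eta_ddot, eta_ddot)  # variance of the second derivative of the log-likelihood
    # max(0, .) guards against tiny negative values caused by rounding
    return math.sqrt(max(0.0, nu20 * nu02 - nu11 ** 2) / nu20 ** 3)

# Toy model: x ~ N(eta(theta), I) with eta(theta) = r (cos theta, sin theta)
r, theta = 2.0, 0.7
eta_dot = [-r * math.sin(theta), r * math.cos(theta)]
eta_ddot = [-r * math.cos(theta), -r * math.sin(theta)]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(statistical_curvature(eta_dot, eta_ddot, identity))  # ≈ 0.5, i.e. 1 / r
```

A straight line in the natural parameter space (eta'' = 0) gives gamma = 0, which matches the intuition that a linear subfamily of an exponential family is itself an exponential family.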
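The Pythagorean theorem of item 3 can be checked numerically in a simple dually flat setting. In the sketch below (an illustrative choice, not taken from the paper), p is an arbitrary joint distribution of two binary variables and the model is the exponential family of independent (product) distributions. The m-projection of p onto that family is the product of its marginals p_hat, and for every product distribution q the Kullback-Leibler divergence should split as KL(p : q) = KL(p : p_hat) + KL(p_hat : q). The random seed and the choice of q are assumptions.

```python
import math
import random

def kl(p, q):
    # KL(p : q) for distributions stored as dicts over the same finite support
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

def product(m1, m2):
    # Independent joint distribution built from two marginals
    return {(i, j): m1[i] * m2[j] for i in (0, 1) for j in (0, 1)}

rng = random.Random(0)
weights = [rng.random() + 0.1 for _ in range(4)]
total = sum(weights)
p = {(i, j): weights[2 * i + j] / total for i in (0, 1) for j in (0, 1)}

# m-projection of p onto the independence family: product of its marginals
m1 = [p[(i, 0)] + p[(i, 1)] for i in (0, 1)]
m2 = [p[(0, j)] + p[(1, j)] for j in (0, 1)]
p_hat = product(m1, m2)

# An arbitrary member q of the independence family
q = product([0.3, 0.7], [0.8, 0.2])

lhs = kl(p, q)
rhs = kl(p, p_hat) + kl(p_hat, q)
print(lhs, rhs)  # equal up to floating-point rounding
```

The identity holds for every q in the family, which is what makes the m-projection the closest point in the Kullback-Leibler sense.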
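The natural gradient of item 4 preconditions the ordinary gradient by the inverse of the Fisher information matrix. The sketch below is a toy example, not taken from Amari's paper: it fits a univariate Gaussian N(mu, sigma^2), parameterized by (mu, s = log sigma), for which the Fisher matrix is diag(1/sigma^2, 2). With the same learning rate and number of steps, the plain gradient on mu is slowed down by the 1/sigma^2 factor, whereas the natural gradient update for mu does not depend on the scale sigma. The function name fit_gaussian, the learning rate 0.1, the 200 steps, and the synthetic data are arbitrary assumptions.

```python
import math
import random

def fit_gaussian(data, natural, lr=0.1, steps=200):
    # Fit N(mu, sigma^2) by gradient descent on the average negative log-likelihood,
    # using the parameterization (mu, s) with s = log(sigma).
    n = len(data)
    mean = sum(data) / n
    spread = sum((x - mean) ** 2 for x in data) / n  # variance around the sample mean
    mu, s = 0.0, 0.0
    for _ in range(steps):
        var = math.exp(2.0 * s)
        v = spread + (mean - mu) ** 2  # mean squared deviation of the data from mu
        grad_mu = -(mean - mu) / var
        grad_s = 1.0 - v / var
        if natural:
            # Fisher information in (mu, s) is diag(1 / sigma^2, 2); multiply by its inverse.
            step_mu, step_s = var * grad_mu, grad_s / 2.0
        else:
            step_mu, step_s = grad_mu, grad_s
        mu, s = mu - lr * step_mu, s - lr * step_s
    return mu, s

rng = random.Random(1)
data = [rng.gauss(5.0, 3.0) for _ in range(1000)]
mean = sum(data) / len(data)
for natural in (False, True):
    mu, s = fit_gaussian(data, natural)
    label = "natural" if natural else "plain"
    print(f"{label:>7}: |mu - mean| = {abs(mu - mean):.2e}, sigma = {math.exp(s):.3f}")
```

Because the Fisher matrix is diagonal and known in closed form here, no matrix inversion is needed; in larger models one would typically solve a linear system with an estimated or approximated Fisher matrix instead.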
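Item 5's construction derives a Riemannian metric from a contrast function (divergence) D via g_ij(theta) = -d_i d'_j D(theta : theta') evaluated at theta' = theta, where d' differentiates with respect to the second argument. As a rough numerical check (a toy example, not from the paper), the mixed second derivative of the Kullback-Leibler divergence between Bernoulli distributions, parameterized by the success probability p, can be approximated with a central finite-difference stencil; it should recover the Fisher information 1/(p(1 - p)). The helper names and the step size h are assumptions.

```python
import math

def kl_bernoulli(a, b):
    # KL(Ber(a) : Ber(b)) = a log(a / b) + (1 - a) log((1 - a) / (1 - b))
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def metric_from_divergence(div, p, h=1e-3):
    # g(p) = -d^2 D(x : y) / dx dy evaluated at x = y = p,
    # approximated with a central finite-difference stencil.
    mixed = (div(p + h, p + h) - div(p + h, p - h)
             - div(p - h, p + h) + div(p - h, p - h)) / (4 * h * h)
    return -mixed

p = 0.3
print(metric_from_divergence(kl_bernoulli, p), 1 / (p * (1 - p)))  # both close to 4.762
```

Replacing kl_bernoulli by another smooth divergence yields the metric that divergence induces; third-order derivatives of D give the pair of dual connections in the same way.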
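Monotonicity, one of the properties characterized in item 6, means that a divergence cannot increase when both distributions are pushed through the same Markov kernel (a coarse-graining or noisy channel): D(Kp : Kq) <= D(p : q). The sketch below is an illustration, not the paper's characterization: it checks the inequality for three f-divergences, D_f(p : q) = sum_i q_i f(p_i / q_i), on randomly drawn distributions and a random row-stochastic kernel. The generator functions chosen, the helper names, and the random setup are assumptions.

```python
import math
import random

# Convex generators f with f(1) = 0, each defining an f-divergence
F_GENERATORS = {
    "KL": lambda u: u * math.log(u),
    "TV": lambda u: 0.5 * abs(u - 1),
    "squared Hellinger": lambda u: (math.sqrt(u) - 1) ** 2,
}

def f_divergence(f, p, q):
    # D_f(p : q) = sum_i q_i f(p_i / q_i), assuming all p_i, q_i > 0
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def random_distribution(n, rng):
    w = [rng.random() + 0.01 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def push_forward(p, kernel):
    # (Kp)_j = sum_i p_i K[i][j], where each row K[i] is a probability vector
    return [sum(p[i] * kernel[i][j] for i in range(len(p))) for j in range(len(kernel[0]))]

rng = random.Random(42)
p, q = random_distribution(6, rng), random_distribution(6, rng)
kernel = [random_distribution(3, rng) for _ in range(6)]  # 6 inputs -> 3 outputs
for name, f in F_GENERATORS.items():
    before = f_divergence(f, p, q)
    after = f_divergence(f, push_forward(p, kernel), push_forward(q, kernel))
    print(f"{name}: {before:.4f} >= {after:.4f}")
```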
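The divergence decomposition of item 8 can be illustrated on three binary variables. In the sketch below (a toy construction, not code from the paper), p1 is the product of the single-variable marginals of p, and p2 is the maximum-entropy distribution with the same pairwise marginals as p, computed here by iterative proportional fitting (IPF) started from the uniform distribution; the number of sweeps is an arbitrary assumption. Since the independent family is nested inside the pairwise-interaction family, the divergence should split as KL(p : p1) = KL(p : p2) + KL(p2 : p1), separating the third-order interaction from the pairwise one.

```python
import itertools
import math
import random

STATES = list(itertools.product((0, 1), repeat=3))
PAIRS = [(0, 1), (0, 2), (1, 2)]

def kl(p, q):
    return sum(p[x] * math.log(p[x] / q[x]) for x in STATES if p[x] > 0)

def project(x, axes):
    # Restrict a joint state to the variables listed in `axes`
    return tuple(x[a] for a in axes)

def marginal(p, axes):
    m = {}
    for x in STATES:
        key = project(x, axes)
        m[key] = m.get(key, 0.0) + p[x]
    return m

def independent_fit(p):
    # m-projection onto the independence family: product of 1-D marginals
    ms = [marginal(p, (k,)) for k in range(3)]
    return {x: ms[0][(x[0],)] * ms[1][(x[1],)] * ms[2][(x[2],)] for x in STATES}

def pairwise_fit(p, sweeps=500):
    # IPF from the uniform distribution: rescale q to match each pairwise marginal of p
    q = {x: 1.0 / len(STATES) for x in STATES}
    for _ in range(sweeps):
        for axes in PAIRS:
            target, current = marginal(p, axes), marginal(q, axes)
            q = {x: q[x] * target[project(x, axes)] / current[project(x, axes)] for x in STATES}
    return q

rng = random.Random(7)
w = {x: rng.random() + 0.05 for x in STATES}
total = sum(w.values())
p = {x: w[x] / total for x in STATES}

p1, p2 = independent_fit(p), pairwise_fit(p)
print(kl(p, p1), kl(p, p2) + kl(p2, p1))  # agree up to IPF convergence error
```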
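For item 10, given a function phi such that exp(alpha * phi) is concave, a logarithmic divergence can be written L(x : y) = (1/alpha) log(1 + alpha <grad phi(y), x - y>) - (phi(x) - phi(y)); alpha = 1 is the exponentially concave case of the paper's title. The alpha-parameterized form used here and the choice phi(x) = sum_i pi_i log x_i on the positive orthant (for which exp(alpha * phi) is a concave Cobb-Douglas function when alpha * sum_i pi_i <= 1) are assumptions made for illustration. The sketch checks that the divergence is nonnegative and that, as alpha goes to 0, it approaches the Bregman divergence of the convex function -phi.

```python
import math
import random

def log_divergence(phi, grad_phi, x, y, alpha):
    # L(x : y) = (1/alpha) log(1 + alpha <grad phi(y), x - y>) - (phi(x) - phi(y))
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return math.log1p(alpha * inner) / alpha - (phi(x) - phi(y))

def bregman_of_minus_phi(phi, grad_phi, x, y):
    # Bregman divergence of the convex function -phi (the alpha -> 0 limit)
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return inner - (phi(x) - phi(y))

weights = [0.2, 0.3, 0.5]  # hypothetical weights pi_i, summing to 1

def phi(x):
    # phi(x) = sum_i pi_i log x_i, so exp(alpha * phi) is concave for 0 < alpha <= 1
    return sum(w * math.log(xi) for w, xi in zip(weights, x))

def grad_phi(x):
    return [w / xi for w, xi in zip(weights, x)]

rng = random.Random(3)
x = [rng.uniform(0.1, 2.0) for _ in weights]
y = [rng.uniform(0.1, 2.0) for _ in weights]
for alpha in (1.0, 0.5, 1e-8):
    print(alpha, log_divergence(phi, grad_phi, x, y, alpha))
print("Bregman limit:", bregman_of_minus_phi(phi, grad_phi, x, y))
```

With alpha = 1 and weights summing to 1, the divergence reduces to log(sum_i pi_i x_i / y_i) - sum_i pi_i log(x_i / y_i), which is nonnegative by Jensen's inequality.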