Natural gradient

The natural gradient descent (NGD) is the steepest descent on a statistical manifold (Fisher-Rao manifold or more generally any Riemannian manifold). NGD is connected with ordinary GD, mirror descent (MD) and Riemannian gradient descent (RGD).

Natural gradient descent as a first-order Riemannian gradient descent (RGD)
Bregman Mirror descent and natural gradient descent
Natural gradient descent as ordinary gradient descent on dually parameterized function in a dually flat space

The Fisher information Riemannian metric is defined from the Fisher information matrix (FIM). FIM is positive-definite for regular statistical models but can be singular (= rank deficient) or not well-defined (when the covariance of the score is unbounded). NGD is a second-order optimization method which uses the inverse FIM. The main merits of NGD is to be invariant to reparameterization (but FIM is covariant to reparameterization) and provably avoiding plateau in online learning. Thus many works consider ways to bypass the computational of the inverse FIM (FIM), or to efficiently compute products of IFIM with vectors. Moreover NGD optimization on submanifolds is important in applications (e.g., NGD on sparse or rank-deficient positive-definite matrices of the SPD cone)

Some works and notes on natural gradient methods

Simplifying Momentum-based Riemannian Submanifold Optimization , arXiv: 2302.09738
Practical Structured Riemannian Optimization with Momentum by using Generalized Normal Coordinates, NeurReps Workshop of NeurIPS 2022.
Tractable structured natural-gradient descent using local parameterizations, ICML 2021: 6680-6691
arXiv:2102.07405
Relative Fisher Information and Natural Gradient for Learning Large Modular Models, ICML 2017: 3289-3298
arXiv:1606.06069 (Relative Natural Gradient for Learning Large Complex Models)
On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models, arXiv:2206.08598
A note on the natural gradient and its connections with the Riemannian gradient, the mirror descent, and the ordinary gradient
Natural gradient in machine learning, a collaborative blog by Wu Lin (with feedback from Emtiyaz Khan, Mark Schmidt and Frank Nielsen)

Frank Nielsen, February 2023