Natural gradient
Natural gradient descent (NGD) is the steepest descent method on a statistical manifold (the Fisher-Rao manifold, or more generally any Riemannian manifold).
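In coordinates, with loss L(θ), step size η, and Fisher information matrix F(θ) (defined below), the NGD update is usually written as follows (a standard formulation, shown here only as a reference sketch):

```latex
% Natural gradient = ordinary gradient preconditioned by the inverse FIM
\tilde{\nabla} L(\theta) = F(\theta)^{-1} \, \nabla_\theta L(\theta),
\qquad
\theta_{t+1} = \theta_t - \eta \, \tilde{\nabla} L(\theta_t).
```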
NGD is connected to ordinary gradient descent (GD), mirror descent (MD), and Riemannian gradient descent (RGD):
- Natural gradient descent as a first-order Riemannian gradient descent (RGD)
- Bregman mirror descent and natural gradient descent
- Natural gradient descent as ordinary gradient descent on a dually parameterized function in a dually flat space (see the identity sketched after this list)
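For the third connection, here is a minimal sketch of the standard identity (assuming a dually flat space with strictly convex potential F, metric ∇²F, and dual coordinates η = ∇F(θ)): the natural gradient in the θ-coordinates coincides with the ordinary gradient in the dual η-coordinates.

```latex
% Dually flat space: metric g(\theta) = \nabla^2 F(\theta), dual coordinates \eta = \nabla F(\theta).
% Since \partial\eta/\partial\theta = \nabla^2 F(\theta) is symmetric, the chain rule gives
\nabla_\eta L
  = \Bigl(\tfrac{\partial \eta}{\partial \theta}\Bigr)^{-\top} \nabla_\theta L
  = \bigl(\nabla^2 F(\theta)\bigr)^{-1} \nabla_\theta L
  = \tilde{\nabla}_\theta L .
```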
The Fisher information Riemannian metric is defined from the Fisher information matrix (FIM).
The FIM is positive-definite for regular statistical models, but it can be singular (i.e., rank-deficient) or not well-defined (when the covariance of the score is unbounded).
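For reference (standard textbook definitions, not specific to this note), the FIM is the covariance of the score, it induces the Fisher-Rao metric, and it has a simple closed form for the univariate normal family:

```latex
% Fisher information matrix (covariance of the score) and induced metric
I(\theta) = \mathbb{E}_{x \sim p_\theta}\!\bigl[\nabla_\theta \log p(x;\theta)\, \nabla_\theta \log p(x;\theta)^\top\bigr],
\qquad
ds^2 = d\theta^\top I(\theta)\, d\theta .
% Example: univariate normal N(\mu,\sigma^2) with \theta = (\mu,\sigma)
I(\mu,\sigma) = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 2/\sigma^2 \end{pmatrix}.
```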
NGD is a second-order optimization method that uses the inverse FIM.
The main merits of NGD are that it is invariant under reparameterization (whereas the FIM itself is covariant under reparameterization) and that it provably avoids plateaus in online learning.
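A one-line sketch of the invariance claim (standard argument): under a smooth invertible reparameterization θ = h(λ) with Jacobian J = ∂θ/∂λ, the FIM transforms covariantly and the natural-gradient direction transforms as a coordinate change, so the NGD update is the same to first order in any parameterization.

```latex
% Reparameterization \theta = h(\lambda) with Jacobian J = \partial\theta/\partial\lambda
I_\lambda(\lambda) = J^\top I_\theta(\theta)\, J,
\qquad
\nabla_\lambda L = J^\top \nabla_\theta L
% hence the natural gradients differ only by the Jacobian of the coordinate change:
\;\Longrightarrow\;
I_\lambda^{-1} \nabla_\lambda L = J^{-1}\, I_\theta^{-1} \nabla_\theta L .
```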
Thus many works consider ways to bypass the computation of the inverse FIM (IFIM),
or to efficiently compute products of the IFIM with vectors.
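As an illustration of the second strategy, here is a minimal NumPy/SciPy sketch (not taken from the works above; the score matrix S and the gradient grad are placeholder data): solve F x = ∇L by conjugate gradient using only matrix-free Fisher-vector products, without ever forming or inverting the FIM.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, d = 1000, 50
S = rng.standard_normal((n, d))   # placeholder per-sample score vectors (one row per sample)
grad = rng.standard_normal(d)     # placeholder gradient of the loss

def fisher_vector_product(v, damping=1e-3):
    """Empirical Fisher-vector product F v = S^T (S v) / n, computed matrix-free, with damping."""
    return S.T @ (S @ v) / n + damping * v

F_op = LinearOperator((d, d), matvec=fisher_vector_product)
natural_grad, info = cg(F_op, grad)   # info == 0 signals successful convergence
```

The same pattern extends to larger models, where the Fisher-vector product is obtained from Jacobian-vector products rather than explicit per-sample scores.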
Moreover, NGD optimization on submanifolds is important in applications
(e.g., NGD on sparse or rank-deficient positive-definite matrices of the SPD cone).
Some works and notes on natural gradient methods
Frank Nielsen, February 2023