This note illustrates how to apply the generic formula for the Kullback-Leibler divergence between two densities of two different exponential families [2].
This column is also available as the file KLPoissonGeometricDistributions.pdf.
It is well known that the Kullback-Leibler divergence (KLD) between two densities $P_{\theta_1}$ and $P_{\theta_2}$ of the same exponential family amounts to a reverse Bregman divergence between the corresponding natural parameters, for the Bregman generator set to the cumulant function $F(\theta)$ [1]:
\[
\mathrm{KL}(P_{\theta_1} : P_{\theta_2}) = B_F(\theta_2 : \theta_1) := F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1).
\]
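As a quick sanity check of this identity, here is a minimal Maxima sketch of our own (the names Poi, Fpoi, BregmanPoi, KLPoiSeries are ours) comparing a series evaluation of the KLD between two Poisson pmfs with the reverse Bregman divergence for the Poisson cumulant function $F(\theta) = e^{\theta}$ (see Table 1 below):

    /* KLD between two Poisson pmfs versus the reverse Bregman divergence
       B_F(theta2 : theta1) for F(theta) = exp(theta); all names are ours */
    Poi(x,l) := (l**x)*exp(-l)/x!;
    Fpoi(t) := exp(t);
    /* B_F(t2 : t1) = F(t2) - F(t1) - (t2 - t1) F'(t1), with F'(t) = exp(t) here */
    BregmanPoi(t2,t1) := Fpoi(t2) - Fpoi(t1) - (t2-t1)*exp(t1);
    KLPoiSeries(l1,l2) := sum(Poi(x,l1)*log(Poi(x,l1)/Poi(x,l2)), x, 0, 100);
    l1: 2.5; l2: 4.0;
    float(KLPoiSeries(l1,l2));              /* truncated-series estimate */
    float(BregmanPoi(log(l2), log(l1)));    /* reverse Bregman divergence */

Both evaluations should agree (about 0.325 here), since $\mathrm{KL}(P_{\lambda_1} : P_{\lambda_2}) = \lambda_1 \log\frac{\lambda_1}{\lambda_2} + \lambda_2 - \lambda_1$.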
The following formula for the KLD between two densities $p_\theta$ and $q_{\theta'}$ of two different exponential families $\mathcal{E}_1$ (with cumulant function $F_1$, sufficient statistic $t_1$, and auxiliary measure term $k_1$) and $\mathcal{E}_2$ (with cumulant function $F_2$, sufficient statistic $t_2$, and auxiliary measure term $k_2$) was reported in [2] (Proposition 5):
\[
\mathrm{KL}(p_\theta : q_{\theta'}) = F_2(\theta') - F_1(\theta) + \theta^\top \nabla F_1(\theta) - \theta'^\top E_{p_\theta}[t_2(x)] + E_{p_\theta}[k_1(x) - k_2(x)]. \tag{1}
\]
When $\mathcal{E}_1 = \mathcal{E}_2$ (and $F = F_1 = F_2$, $t = t_1 = t_2$, $k = k_1 = k_2$), we recover the reverse Fenchel-Young divergence, which corresponds to the reverse Bregman divergence:
\[
\mathrm{KL}(p_\theta : p_{\theta'}) = F(\theta') + F^*(\eta) - \theta'^\top \eta = B_F(\theta' : \theta), \qquad \eta = \nabla F(\theta).
\]
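Formula (1) follows in one line from the canonical decompositions $p_\theta(x) = \exp(\theta^\top t_1(x) - F_1(\theta) + k_1(x))$ and $q_{\theta'}(x) = \exp(\theta'^\top t_2(x) - F_2(\theta') + k_2(x))$, together with the identity $E_{p_\theta}[t_1(x)] = \nabla F_1(\theta)$ (our rewriting of the step):
\begin{align*}
\mathrm{KL}(p_\theta : q_{\theta'}) &= E_{p_\theta}\left[\log \frac{p_\theta(x)}{q_{\theta'}(x)}\right]\\
&= E_{p_\theta}\left[\theta^\top t_1(x) - F_1(\theta) + k_1(x) - \theta'^\top t_2(x) + F_2(\theta') - k_2(x)\right]\\
&= F_2(\theta') - F_1(\theta) + \theta^\top \nabla F_1(\theta) - \theta'^\top E_{p_\theta}[t_2(x)] + E_{p_\theta}[k_1(x) - k_2(x)].
\end{align*}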
Consider the KLD between a Poisson probability mass function (pmf) and a geometric pmf. The canonical decompositions of the Poisson and geometric pmfs are summarized in Table 1.
Table 1: Canonical decompositions of the Poisson and geometric families.

| | Poisson family $\{P_\lambda\}$ | Geometric family $\{Q_p\}$ |
|---|---|---|
| support | $\mathbb{N}\cup\{0\}$ | $\mathbb{N}\cup\{0\}$ |
| base measure | counting measure | counting measure |
| ordinary parameter | rate $\lambda > 0$ | success probability $p \in (0,1)$ |
| pmf | $\frac{\lambda^x e^{-\lambda}}{x!}$ | $(1-p)^x p$ |
| sufficient statistic | $t(x) = x$ | $t(x) = x$ |
| natural parameter | $\theta(\lambda) = \log\lambda$ | $\theta(p) = \log(1-p)$ |
| cumulant function | $F(\theta) = e^{\theta}$, so $F(\theta(\lambda)) = \lambda$ | $F(\theta) = -\log(1-e^{\theta})$, so $F(\theta(p)) = -\log p$ |
| auxiliary measure term | $k(x) = -\log x!$ | $k(x) = 0$ |
| moment parameter $\eta = E[t(x)]$ | $\eta = \lambda$ | $\eta = \frac{1-p}{p}$ |
| negentropy (convex conjugate) $F^*(\eta) = \theta\cdot\eta - F(\theta)$ | $F^*(\eta) = \eta\log\eta - \eta$ | $F^*(\eta) = \eta\log\eta - (1+\eta)\log(1+\eta)$ |
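The less obvious entries of Table 1 can be checked mechanically. Here is a small Maxima sketch of our own (the names Fgeo, thgeo, etageo are ours) verifying, for the geometric column, that $F(\theta(p)) = -\log p$, that $\eta = \nabla F(\theta) = \frac{1-p}{p}$, and that $\theta\cdot\eta - F(\theta)$ agrees with the stated conjugate $\eta\log\eta - (1+\eta)\log(1+\eta)$:

    /* symbolic checks of the geometric column of Table 1; all names are ours */
    Fgeo(t) := -log(1 - exp(t));    /* cumulant function */
    thgeo(p) := log(1 - p);         /* natural parameter */
    ratsimp(Fgeo(thgeo(p)));        /* should simplify to -log(p) */
    /* moment parameter eta = F'(theta(p)); should simplify to (1-p)/p */
    etageo: ratsimp(at(diff(Fgeo(s), s), s = thgeo(p)));
    /* conjugate check: the difference should simplify to 0 */
    radcan(thgeo(p)*etageo - Fgeo(thgeo(p)) - (etageo*log(etageo) - (1+etageo)*log(1+etageo)));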
Thus we can calculate the KLD between two geometric distributions $Q_{p_1}$ and $Q_{p_2}$ as a reverse Bregman divergence:
\[
\mathrm{KL}(Q_{p_1} : Q_{p_2}) = B_F(\theta(p_2) : \theta(p_1)) = F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)\,\nabla F(\theta_1).
\]
That is, we have
\[
\mathrm{KL}(Q_{p_1} : Q_{p_2}) = \log\frac{p_1}{p_2} - \left(\frac{1}{p_1} - 1\right)\log\frac{1-p_2}{1-p_1}.
\]
The following code in Maxima (https://maxima.sourceforge.io/) checks the above formula against a truncated series evaluation of the KLD.
    Geometric(x,p) := ((1-p)**x)*p;
    nbterms: 50;
    /* KLD evaluated by truncating its series to nbterms terms */
    KLGeometricSeries(p1,p2) := sum(Geometric(x,p1)*log(Geometric(x,p1)/Geometric(x,p2)), x, 0, nbterms);
    /* closed-form KLD between two geometric pmfs */
    KLGeometricFormula(p1,p2) := log(p1/p2) - log((1-p2)/(1-p1))*((1/p1)-1);
    p1: 0.2;
    p2: 0.6;
    float(KLGeometricSeries(p1,p2));
    float(KLGeometricFormula(p1,p2));
Evaluating the above code, we get:

    (%o7) 1.673553688712277
    (%o8) 1.673976433571672

(The small discrepancy is due to truncating the series to nbterms = 50 terms; the geometric tail with p1 = 0.2 decays slowly.)
For the KLD between a Poisson pmf $p_\lambda$ and a geometric pmf $q_p$, formula (1) yields
\[
\mathrm{KL}(p_\lambda : q_p) = -\log p - \lambda + \lambda\log\lambda - \lambda\log(1-p) + E_{p_\lambda}[-\log x!].
\]
Since $E_{p_\lambda}[-\log x!] = -\sum_{k=0}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!}\log k!$, we have
\[
\mathrm{KL}(p_\lambda : q_p) = -\log p + \lambda\log\lambda - \lambda - \lambda\log(1-p) - \sum_{k=0}^{\infty} e^{-\lambda}\frac{\lambda^k}{k!}\log k!.
\]
We check the above formula in Maxima:
    /* reuses Geometric(x,p) and nbterms defined above */
    Poisson(x,lambda) := (lambda**x)*exp(-lambda)/x!;
    KLseries(lambda,p) := sum(Poisson(x,lambda)*log(Poisson(x,lambda)/Geometric(x,p)), x, 0, nbterms);
    KLformula(lambda,p) := -log(p) + lambda*log(lambda) - lambda - lambda*log(1-p)
        - sum(exp(-lambda)*(lambda**x)*log(x!)/x!, x, 0, nbterms);
    lambda: 5.6;
    p: 0.3;
    float(KLseries(lambda,p));
    float(KLformula(lambda,p));
Evaluating the above code, we get

    (%o14) 0.9378529269681795
    (%o15) 0.9378529269681785

(Here the truncated series is accurate since the Poisson tail decays factorially fast.)
[1] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, Joydeep Ghosh, and John Lafferty. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(10), 2005.
[2] Frank Nielsen. On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius. Entropy, 23(4):464, 2021.