This note illustrates how to apply the generic formula for the Kullback-Leibler divergence between two densities of two different exponential families [2].
This column is also available as the file KLPoissonGeometricDistributions.pdf.
It is well-known that the Kullback-Leibler divergence (KLD) between two densities $P_{\theta_1}$ and $P_{\theta_2}$ of the same exponential family amounts to a reverse Bregman divergence between the corresponding natural parameters, for the Bregman generator set to the cumulant function $F(\theta)$ [1]:

$$\mathrm{KL}(P_{\theta_1} : P_{\theta_2}) = B_F(\theta_2 : \theta_1), \qquad B_F(\theta_2 : \theta_1) := F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)^\top \nabla F(\theta_1).$$
The following formula for the KLD between two densities $p_\theta$ and $q_{\theta'}$ of two different exponential families $\mathcal{P}$ (with cumulant function $F_P$) and $\mathcal{Q}$ (with cumulant function $F_Q$), sharing the same sufficient statistic $t(x)$ and base measure, was reported in [2] (Proposition 5):

$$\mathrm{KL}(p_\theta : q_{\theta'}) = F_Q(\theta') + F_P^*(\eta) - \theta'^\top \eta + E_{p_\theta}\left[k_P(x) - k_Q(x)\right], \tag{1}$$

where $\eta = \nabla F_P(\theta) = E_{p_\theta}[t(x)]$ is the moment parameter, $F_P^*$ is the convex conjugate of $F_P$, and $k_P$, $k_Q$ are the auxiliary measure terms.
When $\mathcal{P} = \mathcal{Q}$ (and $F_P = F_Q = F$, $k_P = k_Q$), we recover the reverse Fenchel-Young divergence, which corresponds to the reverse Bregman divergence:

$$\mathrm{KL}(p_\theta : p_{\theta'}) = F(\theta') + F^*(\eta) - \theta'^\top \eta = Y_F(\theta' : \eta) = B_F(\theta' : \theta).$$
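As a quick numerical sanity check of this identity (our own illustration, not taken from [1] or [2]), the following Maxima snippet compares a truncated KLD series between two Poisson pmfs with the reverse Bregman divergence $B_F(\theta_2 : \theta_1)$ for the Poisson generator $F(\theta) = \exp(\theta)$; the function names are ours:

Poisson(x,lambda):=(lambda**x)*exp(-lambda)/x!;
/* Bregman divergence B_F(theta2 : theta1) for F(theta) = exp(theta) */
BregmanPoisson(theta2,theta1):=exp(theta2)-exp(theta1)-(theta2-theta1)*exp(theta1);
/* KLD series truncated at x = 50 (the Poisson tail decays factorially fast) */
KLPoissonSeries(l1,l2):=sum(Poisson(x,l1)*log(Poisson(x,l1)/Poisson(x,l2)),x,0,50);
float(KLPoissonSeries(2.5,4.0));
float(BregmanPoisson(log(4.0),log(2.5)));

Both evaluations should agree (≈ 0.32499), matching the closed form $\lambda_2 - \lambda_1 + \lambda_1 \log(\lambda_1/\lambda_2)$ for the KLD between two Poisson pmfs.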
Consider the KLD between a Poisson probability mass function (pmf) and a geometric pmf. The canonical decompositions of the Poisson and geometric pmfs are summarized in Table 1.
| | Poisson family | Geometric family |
|---|---|---|
| support | ℕ ∪ {0} | ℕ ∪ {0} |
| base measure | counting measure | counting measure |
| ordinary parameter | rate λ > 0 | success probability p ∈ (0,1) |
| pmf | λ^x exp(-λ)/x! | (1 - p)^x p |
| sufficient statistic | t(x) = x | t(x) = x |
| natural parameter | θ(λ) = log λ | θ(p) = log(1 - p) |
| cumulant function | F(θ) = exp(θ) | F(θ) = -log(1 - exp(θ)) |
| (in the ordinary parameter) | F(λ) = λ | F(p) = -log p |
| auxiliary measure term | k(x) = -log(x!) | k(x) = 0 |
| moment parameter η = E[t(x)] | η = λ | η = (1 - p)/p = 1/p - 1 |
| negentropy (convex conjugate) | F*(η(λ)) = λ log λ - λ | F*(η(p)) = (1/p - 1) log(1 - p) + log p |
| (F*(η) = θ ⋅ η - F(θ)) | | |

Table 1: Canonical decompositions of the Poisson and geometric pmfs.
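The table entries can be double-checked symbolically: differentiating the cumulant function recovers the moment parameter, and substituting θ(p) = log(1 - p) recovers the cumulant in the ordinary parameter. A minimal Maxima sketch (the helper name FG is ours):

/* geometric cumulant function in the natural parameter */
FG(theta):=-log(1-exp(theta));
/* moment parameter: F'(theta) at theta = log(1-p) should equal (1-p)/p */
ratsimp(subst(theta=log(1-p), diff(FG(theta),theta)));
/* cumulant in the ordinary parameter: FG(log(1-p)) should equal -log(p) */
ratsimp(FG(log(1-p)));
/* Poisson side: F(theta) = exp(theta), so F'(theta) = exp(theta) = lambda */
diff(exp(theta),theta);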
Thus we calculate the KLD between two geometric distributions $q_{p_1}$ and $q_{p_2}$ as

$$\mathrm{KL}(q_{p_1} : q_{p_2}) = F(\theta_2) + F^*(\eta_1) - \theta_2\,\eta_1 = -\log p_2 + \log p_1 + \frac{1-p_1}{p_1}\log(1-p_1) - \frac{1-p_1}{p_1}\log(1-p_2).$$

That is, we have

$$\mathrm{KL}(q_{p_1} : q_{p_2}) = \log\frac{p_1}{p_2} - \left(\frac{1}{p_1} - 1\right)\log\frac{1-p_2}{1-p_1}.$$
The following code in Maxima (https://maxima.sourceforge.io/) checks the above formula, approximating the KLD series by truncating it at x = nbterms = 50.
Geometric(x,p):=((1-p)**x)*p;
nbterms:50;
/* KLD estimated by truncating the series at x = nbterms */
KLGeometricSeries(p1,p2):=sum(Geometric(x,p1)*log(Geometric(x,p1)/Geometric(x,p2)),x,0,nbterms);
/* closed-form KLD between two geometric pmfs */
KLGeometricFormula(p1,p2):=log(p1/p2)-log((1-p2)/(1-p1))*((1/p1)-1);
p1:0.2;
p2:0.6;
float(KLGeometricSeries(p1,p2));
float(KLGeometricFormula(p1,p2));
Evaluating the above code, we get:

(%o7) 1.673553688712277
(%o8) 1.673976433571672

The two values differ in the fourth decimal because the series (%o7) is truncated at nbterms = 50 while its summand only decays geometrically, like (1 - p1)^x.
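Raising the truncation level closes the gap; for instance, continuing the same session (this aside is ours, and executing it shifts the output labels of subsequent commands):

nbterms:200$
float(KLGeometricSeries(0.2,0.6)); /* now agrees with the closed form to near machine precision */
nbterms:50$ /* restore the original truncation level */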
Thus, by formula (1), the KLD between a Poisson pmf $p_\lambda$ and a geometric pmf $q_p$ is equal to

$$\mathrm{KL}(p_\lambda : q_p) = F_Q(\theta') + F_P^*(\eta) - \theta'\,\eta + E_{p_\lambda}[k_P(x) - k_Q(x)] = -\log p + \lambda\log\lambda - \lambda - \lambda\log(1-p) + E_{p_\lambda}[-\log x!].$$

Since $E_{p_\lambda}[-\log x!] = -\sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\,\log k!$, we have

$$\mathrm{KL}(p_\lambda : q_p) = -\log p + \lambda\log\lambda - \lambda - \lambda\log(1-p) - \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\,\log k!.$$
We check the above formula in Maxima, truncating both the KLD series and the $E_{p_\lambda}[-\log x!]$ series at nbterms:
Poisson(x,lambda):=(lambda**x)*exp(-lambda)/x!;
/* KLD series truncated at x = nbterms */
KLseries(lambda,p):=sum(Poisson(x,lambda)*log(Poisson(x,lambda)/Geometric(x,p)),x,0,nbterms);
/* closed form, with E[-log x!] approximated by a truncated series */
KLformula(lambda,p):=-log(p)+lambda*log(lambda)-lambda-lambda*log(1-p)
    -sum(exp(-lambda)*(lambda**x)*log(x!)/x!,x,0,nbterms);
lambda:5.6;
p:0.3;
float(KLseries(lambda,p));
float(KLformula(lambda,p));
Evaluating the above code, we get:

(%o14) 0.9378529269681795
(%o15) 0.9378529269681785

Here the two values agree to about 14 digits: both truncated series have Poisson-weighted summands, which decay factorially fast.
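For completeness, formula (1) applies just as well in the reverse direction. The following sketch (our own addition, reading Table 1 with the roles of $\mathcal{P}$ and $\mathcal{Q}$ swapped, so that $k_P(x) = 0$ and $k_Q(x) = -\log x!$) numerically checks $\mathrm{KL}(q_p : p_\lambda) = \lambda + \frac{1-p}{p}\left(\log(1-p) - \log\lambda\right) + \log p + E_{q_p}[\log x!]$:

/* KL(geometric : Poisson): truncated series vs. instantiation of formula (1) */
KLGPseries(p,lam):=sum(Geometric(x,p)*log(Geometric(x,p)/Poisson(x,lam)),x,0,nbterms);
KLGPformula(p,lam):=lam+((1-p)/p)*(log(1-p)-log(lam))+log(p)
    +sum(Geometric(x,p)*log(x!),x,0,nbterms);
float(KLGPseries(0.3,5.6));
float(KLGPformula(0.3,5.6));

The two evaluations should agree up to the geometric truncation error, of order (1 - p)^nbterms.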
[1] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, and Joydeep Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705-1749, 2005.
[2] Frank Nielsen. On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius. Entropy, 23(4):464, 2021.