This note illustrates how to apply the generic formula [2] for the Kullback-Leibler divergence between two densities belonging to two different exponential families.
This column is also available as the file KLPoissonGeometricDistributions.pdf.
It is well known that the Kullback-Leibler divergence between two densities Pθ1 and Pθ2 of the same exponential family amounts to a reverse Bregman divergence between the corresponding natural parameters, where the Bregman generator is the cumulant function F(θ) [1]:
$$D_{\mathrm{KL}}[P_{\theta_1} : P_{\theta_2}] = B_F^*(\theta_1 : \theta_2) = B_F(\theta_2 : \theta_1) := F(\theta_2) - F(\theta_1) - (\theta_2 - \theta_1)\cdot\nabla F(\theta_1).$$
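As a quick sanity check of this identity within a single family, here is a minimal Maxima sketch (the names KLPoissonSeries and KLPoissonBregman are mine, not from the note) comparing a truncated series evaluation of the KLD between two Poisson pmfs with the reverse Bregman divergence induced by the Poisson cumulant function F(θ) = exp(θ), for which ∇F(θ(λ)) = λ:

/* Truncated-series KLD between two Poisson pmfs versus the reverse Bregman
   divergence B_F(theta2 : theta1) with F(theta) = exp(theta), theta = log(lambda). */
Poisson(x, lambda) := (lambda**x) * exp(-lambda) / x!;
nbterms: 50;
KLPoissonSeries(l1, l2) := sum(Poisson(x, l1) * log(Poisson(x, l1) / Poisson(x, l2)), x, 0, nbterms);
/* F(theta(l2)) - F(theta(l1)) - (theta(l2) - theta(l1)) * gradF(theta(l1)) = l2 - l1 - (log(l2) - log(l1)) * l1 */
KLPoissonBregman(l1, l2) := l2 - l1 - (log(l2) - log(l1)) * l1;
float(KLPoissonSeries(2.5, 4.0));
float(KLPoissonBregman(2.5, 4.0));  /* the two values should agree up to the series truncation */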
The following formula for the Kullback-Leibler divergence (KLD) between two densities Pθ and Qθ′ belonging to two different exponential families P (with cumulant function FP) and Q (with cumulant function FQ) was reported in [2] (Proposition 5):
$$D_{\mathrm{KL}}[P_\theta : Q_{\theta'}] = F_Q(\theta') + F_P^*(\eta) - E_{P_\theta}[t_Q(x)]\cdot\theta' + E_{P_\theta}[k_P(x) - k_Q(x)]. \qquad (1)$$
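For self-containedness (this recap is added here; it spells out the quantities that appear in Eq. (1) and in Table 1 below), recall that a pmf of an exponential family is written in the canonical form

$$p_\theta(x) = \exp\big(\theta\cdot t(x) + k(x) - F(\theta)\big),$$

where t(x) denotes the sufficient statistic, k(x) the auxiliary measure term, and F(θ) the cumulant function; the moment parameter of Pθ appearing in Eq. (1) is η = ∇FP(θ) = EPθ[tP(x)].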
When P = Q (and F = FP = FQ), we recover the reverse Fenchel-Young divergence, which corresponds to the reverse Bregman divergence:
$$D_{\mathrm{KL}}[P_\theta : P_{\theta'}] = F(\theta') + F^*(\eta) - \eta\cdot\theta' =: Y_{F,F^*}(\theta' : \eta) = Y_{F^*,F}(\eta : \theta').$$
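For completeness, here is a one-line check (added here) that this Fenchel-Young expression is indeed the reverse Bregman divergence: since η = ∇F(θ) and F*(η) = θ⋅η − F(θ), we have

$$Y_{F,F^*}(\theta' : \eta) = F(\theta') + \theta\cdot\eta - F(\theta) - \eta\cdot\theta' = F(\theta') - F(\theta) - (\theta' - \theta)\cdot\nabla F(\theta) = B_F(\theta' : \theta).$$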
Consider the KLD between a Poisson probability mass function (pmf) and a geometric pmf. The canonical decompositions of the Poisson and geometric pmfs are summarized in Table 1.
Table 1: Canonical decompositions of the Poisson and geometric exponential families.

| | Poisson family | Geometric family |
|---|---|---|
| support | ℕ ∪ {0} | ℕ ∪ {0} |
| base measure | counting measure | counting measure |
| ordinary parameter | rate λ > 0 | success probability p ∈ (0,1) |
| pmf | (λ^x / x!) exp(-λ) | (1 - p)^x p |
| sufficient statistic | t(x) = x | t(x) = x |
| natural parameter | θ(λ) = log λ | θ(p) = log(1 - p) |
| cumulant function | F(θ) = exp(θ), i.e., F(λ) = λ | F(θ) = -log(1 - exp(θ)), i.e., F(p) = -log p |
| auxiliary measure term | k(x) = -log x! | k(x) = 0 |
| moment parameter η = E[t(x)] | η = λ | η = (1 - p)/p = 1/p - 1 |
| negentropy (convex conjugate) F*(η) = θ ⋅ η - F(θ) | F*(θ(λ)) = λ log λ - λ | F*(θ(p)) = (1/p - 1) log(1 - p) + log p |
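The moment parameters listed in Table 1 can be recovered by differentiating the cumulant functions, since η = ∇F(θ). The following minimal Maxima sketch (the symbol names FP, FQ, etaP, and etaQ are mine) performs this check symbolically:

/* Moment parameters as gradients of the cumulant functions of Table 1. */
FP(theta) := exp(theta);              /* Poisson cumulant function */
FQ(theta) := -log(1 - exp(theta));    /* geometric cumulant function */
etaP: subst(theta = log(lambda), diff(FP(theta), theta));          /* expected: lambda */
etaQ: ratsimp(subst(theta = log(1 - p), diff(FQ(theta), theta)));  /* expected: (1-p)/p */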
Thus we calculate the KLD between two geometric distributions Qp1 and Qp2 as
$$D_{\mathrm{KL}}[Q_{p_1} : Q_{p_2}] = B_{F_Q}(\theta(p_2) : \theta(p_1)) = F_Q(\theta(p_2)) - F_Q(\theta(p_1)) - (\theta(p_2) - \theta(p_1))\,\eta(p_1).$$
Plugging in FQ(θ(p)) = -log p, θ(p) = log(1 - p), and η(p1) = (1 - p1)/p1 = 1/p1 - 1, we obtain

$$D_{\mathrm{KL}}[Q_{p_1} : Q_{p_2}] = \log\frac{p_1}{p_2} - \left(\frac{1}{p_1} - 1\right)\log\frac{1-p_2}{1-p_1}.$$
The following Maxima (https://maxima.sourceforge.io/) code checks the above formula against a truncated series evaluation of the KLD.
/* pmf of the geometric distribution on {0, 1, 2, ...} */
Geometric(x, p) := ((1-p)**x) * p;
nbterms: 50;
/* KLD evaluated by truncating the series at x = nbterms */
KLGeometricSeries(p1, p2) := sum(Geometric(x, p1) * log(Geometric(x, p1) / Geometric(x, p2)), x, 0, nbterms);
/* closed-form KLD between two geometric pmfs */
KLGeometricFormula(p1, p2) := log(p1/p2) - log((1-p2)/(1-p1)) * ((1/p1) - 1);
p1: 0.2;
p2: 0.6;
float(KLGeometricSeries(p1, p2));
float(KLGeometricFormula(p1, p2));
Evaluating the above code, we get:
(%o7) 1.673553688712277
(%o8) 1.673976433571672
The small gap between the truncated series (%o7) and the closed form (%o8) stems from truncating the series at x = nbterms = 50.
Using the generic formula of Eq. (1) with the quantities of Table 1, the KLD between a Poisson pmf Pλ and a geometric pmf Qp is equal to
$$\begin{aligned}
D_{\mathrm{KL}}[P_\lambda : Q_p] &= F_Q(\theta') + F_P^*(\eta) - E_{P_\theta}[t_Q(x)]\cdot\theta' + E_{P_\theta}[k_P(x) - k_Q(x)] && (2)\\
&= -\log p + \lambda\log\lambda - \lambda\,\big(1 + \log(1-p)\big) - E_{P_\lambda}[\log x!]. && (3)
\end{aligned}$$
Since $E_{P_\lambda}[-\log x!] = -\sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\,\log(k!)$, we have
$$D_{\mathrm{KL}}[P_\lambda : Q_p] = -\log p + \lambda\log\lambda - \lambda - \lambda\log(1-p) - \sum_{k=0}^{\infty} e^{-\lambda}\,\frac{\lambda^k}{k!}\,\log(k!).$$
We check the above formula in Maxima:
/* pmf of the Poisson distribution */
Poisson(x, lambda) := (lambda**x) * exp(-lambda) / x!;
/* KLD evaluated by truncating the series at x = nbterms */
KLseries(lambda, p) := sum(Poisson(x, lambda) * log(Poisson(x, lambda) / Geometric(x, p)), x, 0, nbterms);
/* closed-form KLD with the E[log x!] series also truncated at nbterms */
KLformula(lambda, p) := -log(p) + lambda*log(lambda) - lambda - lambda*log(1-p)
    - sum(exp(-lambda) * (lambda**x) * log(x!) / x!, x, 0, nbterms);
lambda: 5.6;
p: 0.3;
float(KLseries(lambda, p));
float(KLformula(lambda, p));
Evaluating the above code, we get
(%o14) 0.9378529269681795 (%o15) 0.9378529269681785
[1] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon, Joydeep Ghosh, and John Lafferty. Clustering with Bregman divergences. Journal of Machine Learning Research, 6(10), 2005.
[2] Frank Nielsen. On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius. Entropy, 23(4):464, 2021.