Bangda Sun

Practice makes perfect

Probability Review (2)

Probability Distributions Review & Cheatsheet (2): Discrete Distributions, Continuous Distributions, Exponential Family Distributions, Conjugate Priors.

1. Discrete Distributions

1.1 Bernoulli Distribution

Suppose $X$ has a Bernoulli distribution, $X \sim \text{Bernoulli}(p)$: a trial is performed with probability $p$ of "success", and $X$ takes value 1 on success and 0 on failure.

$$P(X = k) = p^k (1 - p)^{1 - k}, \quad k \in \{0, 1\}.$$

The expectation and variance of $\text{Bernoulli}(p)$ are $p$ and $p(1 - p)$.

1.2 Binomial Distribution

Suppose $X$ has a Binomial distribution, $X \sim \text{Binomial}(n, p)$: it models $n$ independent trials, each with probability $p$ of "success". Therefore it can be regarded as the sum of $n$ i.i.d. $\text{Bernoulli}(p)$ random variables.

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k = 0, 1, 2, \dots, n.$$

The sum of $n$ independent Binomial random variables with parameters $n_i$ and common $p$ is another Binomial random variable with parameters $\sum_{i=1}^{n} n_i$ and $p$.

The expectation and variance of $\text{Binomial}(n, p)$ are $np$ and $np(1 - p)$.
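These moments can be checked directly from the PMF. A minimal sketch using only the standard library (the parameters $n = 10$, $p = 0.3$ are arbitrary choices for illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # arbitrary example parameters
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * w for k, w in enumerate(pmf))              # should equal n*p = 3.0
var = sum((k - mean)**2 * w for k, w in enumerate(pmf))   # should equal n*p*(1-p) = 2.1
```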

1.3 Poisson Distribution

Suppose $X$ has a Poisson distribution, $X \sim \text{Poisson}(\lambda)$. It models the number of times a rare event occurs, with average rate $\lambda$ (per unit time).

$$P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots.$$

The Poisson distribution is an approximation to the Binomial distribution with large $n$ and very small $p$ (with $\lambda = np$). The sum of $n$ independent Poisson random variables with parameters $\lambda_i$ is another Poisson random variable with parameter $\sum_{i=1}^{n} \lambda_i$.

The expectation and variance of $\text{Poisson}(\lambda)$ are both $\lambda$.
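The approximation is easy to check numerically. A quick sketch (the parameters $n = 1000$, $p = 0.002$, giving $\lambda = np = 2$, are arbitrary):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pois_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

n, p = 1000, 0.002   # large n, very small p
lam = n * p          # lambda = 2
# largest pointwise gap between the two PMFs over the bulk of the support
max_gap = max(abs(binom_pmf(k, n, p) - pois_pmf(k, lam)) for k in range(30))
```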

1.4 Geometric Distribution

Suppose $X$ has a Geometric distribution, $X \sim \text{Geometric}(p)$; it models the number of trials related to the first success. There are two scenarios: (1) $X$ is the number of failures before the first success; (2) $X$ is the total number of trials until the first success. The second scenario is a "shifted" version of the first. The PMF for the first scenario is

$$P(X = k) = (1 - p)^k p, \quad k = 0, 1, 2, \dots.$$

The expectation and variance of the Geometric distribution with parameter $p$ are

$$E(X) = \frac{1 - p}{p}, \quad \text{Var}(X) = \frac{1 - p}{p^2}.$$

If the Geometric distribution counts the total number of trials (second scenario), then since it is a "shifted" version of the first scenario, the expectation increases by one unit, i.e.

$$E(X) = \frac{1}{p},$$

and the variance remains the same.
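Both formulas can be verified by truncating the infinite sums defining the moments. A minimal sketch ($p = 0.4$ and the truncation point $K$ are arbitrary choices; the discarded tail is negligible):

```python
p = 0.4   # arbitrary example parameter
K = 2000  # truncation point; the tail beyond (1-p)^K is negligible

# scenario (1): number of failures before the first success
pmf = [(1 - p)**k * p for k in range(K)]
mean = sum(k * w for k, w in enumerate(pmf))              # (1-p)/p = 1.5
var = sum((k - mean)**2 * w for k, w in enumerate(pmf))   # (1-p)/p**2 = 3.75

# scenario (2): total number of trials, a shift by one
mean_total_trials = mean + 1                              # 1/p = 2.5
```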

1.5 Negative Binomial Distribution

Suppose $X$ has a Negative Binomial distribution, $X \sim \text{NegBin}(r, p)$. It models the number of failures before the $r$-th success.

$$P(X = k) = \binom{r + k - 1}{k} (1 - p)^k p^r, \quad k = 0, 1, 2, \dots.$$

Just as i.i.d. Bernoulli random variables sum to a Binomial random variable, i.i.d. Geometric random variables sum to a Negative Binomial random variable.

The expectation and variance of $\text{NegBin}(r, p)$ are

$$E(X) = \frac{r(1 - p)}{p}, \quad \text{Var}(X) = \frac{r(1 - p)}{p^2}.$$
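The sum-of-Geometrics fact can be checked by convolving the Geometric PMF $r$ times and comparing against the Negative Binomial PMF. A small sketch ($p = 0.5$, $r = 3$, and the truncation $K = 40$ are arbitrary; entries below index $K$ are exact despite the truncation):

```python
from math import comb

p, r, K = 0.5, 3, 40
geom = [(1 - p)**k * p for k in range(K)]   # Geometric(p): failures before first success

# r-fold convolution of the Geometric PMF = distribution of a sum of r i.i.d. Geometrics
dist = [1.0] + [0.0] * (K - 1)  # start from a point mass at 0
for _ in range(r):
    new = [0.0] * K
    for i, a in enumerate(dist):
        for j, g in enumerate(geom):
            if i + j < K:
                new[i + j] += a * g
    dist = new

negbin = [comb(r + k - 1, k) * (1 - p)**k * p**r for k in range(K)]
max_gap = max(abs(a - b) for a, b in zip(dist, negbin))
```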

2. Continuous Distributions

2.1 Uniform Distribution

Suppose $X$ has a Uniform distribution, $X \sim \text{Unif}(\alpha, \beta)$, with PDF

$$f(x) = \begin{cases} \frac{1}{\beta - \alpha}, & \text{for } \alpha < x < \beta \\ 0, & \text{otherwise}. \end{cases}$$

The expectation and variance of $\text{Unif}(\alpha, \beta)$ are

$$E(X) = \frac{\alpha + \beta}{2}, \quad \text{Var}(X) = \frac{(\beta - \alpha)^2}{12}.$$

2.2 Exponential Distribution

Suppose $X$ has an Exponential distribution, $X \sim \text{Exp}(\lambda)$, with PDF

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x > 0 \\ 0, & \text{otherwise}. \end{cases}$$

The PDF is strictly decreasing with decay rate $\lambda$. The CDF is given by

$$F(x) = \int_0^x \lambda e^{-\lambda t}\, dt = 1 - e^{-\lambda x}.$$

The Exponential distribution can be used to model lifetimes and times between events. The expectation and variance are

$$E(X) = \frac{1}{\lambda}, \quad \text{Var}(X) = \frac{1}{\lambda^2}.$$

The Exponential distribution has the memoryless property, i.e. $P(X > s + t \mid X > s) = P(X > t)$.
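The memoryless property follows from the survival function $P(X > x) = e^{-\lambda x}$, and is easy to verify numerically (the values of $\lambda$, $s$, $t$ below are arbitrary):

```python
from math import exp

lam = 1.5  # arbitrary rate parameter

def surv(x):
    """Survival function P(X > x) for X ~ Exp(lam)."""
    return exp(-lam * x)

s, t = 2.0, 0.7                # arbitrary times
cond = surv(s + t) / surv(s)   # P(X > s + t | X > s)
# cond equals surv(t): the elapsed time s is "forgotten"
```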

2.3 Gamma Distribution

Suppose $X$ has a Gamma distribution, $X \sim \text{Gamma}(\alpha, \lambda)$, with PDF

$$f(x) = \begin{cases} \frac{\lambda^\alpha}{\Gamma(\alpha)} e^{-\lambda x} x^{\alpha - 1}, & \text{for } x > 0 \\ 0, & \text{otherwise}. \end{cases}$$

It is easy to see that when $\alpha = 1$ it is the Exponential distribution $\text{Exp}(\lambda)$. Here the Gamma function is defined as

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\, dx,$$

which satisfies $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$, and $\Gamma(\alpha) = (\alpha - 1)!$ for positive integer $\alpha$.

Also, $\text{Gamma}(n/2, 1/2)$ is exactly the $\chi^2(n)$ distribution.
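This identity can be confirmed by comparing the two densities pointwise (a sketch; the degrees of freedom $n = 5$ and the evaluation points are arbitrary):

```python
from math import exp, gamma

def gamma_pdf(x, alpha, lam):
    """PDF of Gamma(alpha, lam) at x > 0."""
    return lam**alpha / gamma(alpha) * exp(-lam * x) * x**(alpha - 1)

def chi2_pdf(x, n):
    """PDF of the chi-square distribution with n degrees of freedom."""
    return x**(n / 2 - 1) * exp(-x / 2) / (2**(n / 2) * gamma(n / 2))

n = 5  # arbitrary degrees of freedom
max_gap = max(abs(gamma_pdf(x, n / 2, 0.5) - chi2_pdf(x, n))
              for x in (0.5, 1.0, 2.0, 4.0, 8.0))
```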

2.4 Beta Distribution

Suppose $X$ has a Beta distribution, $X \sim \text{Beta}(\alpha, \beta)$, with PDF

$$f(x) = \begin{cases} \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, & \text{for } x \in [0, 1] \\ 0, & \text{otherwise}, \end{cases}$$

where

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}.$$
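The normalizing constant can be sanity-checked by numerically integrating $x^{\alpha - 1}(1 - x)^{\beta - 1}$ over $(0, 1)$ with a midpoint rule (a rough sketch; the values $\alpha = 2.5$, $\beta = 3.5$ and the grid size are arbitrary):

```python
from math import gamma

def beta_fn(a, b):
    """B(a, b) via the Gamma-function identity."""
    return gamma(a) * gamma(b) / gamma(a + b)

a, b = 2.5, 3.5  # arbitrary example parameters
N = 100_000      # midpoint-rule grid size
h = 1.0 / N
integral = h * sum(((i + 0.5) * h)**(a - 1) * (1 - (i + 0.5) * h)**(b - 1)
                   for i in range(N))
# integral should agree with beta_fn(a, b) up to discretization error
```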

2.5 Normal Distribution

Suppose $X$ has a Normal distribution, $X \sim N(\mu, \sigma^2)$, with PDF

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}.$$

2.6 Exponential Family Distributions

This is a family of distributions; the Bernoulli, Poisson, Exponential, Gamma, Beta, and Normal distributions all belong to the exponential family. The density has the form

$$p(x|\theta) = \frac{h(x)}{Z(\theta)} e^{S(x)^\top \theta},$$

where $S$ is the sufficient statistic. The data $x$ and the parameter $\theta$ interact only through the linear term in the exponent. The MLE of $\theta$ satisfies

$$\nabla_\theta \log Z(\hat\theta) = \frac{1}{n}\sum_{i=1}^{n} S(x_i) = E_{p(x|\hat\theta)}[S(x)].$$
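As a concrete instance, the Bernoulli distribution has natural parameter $\theta = \log\frac{p}{1-p}$, sufficient statistic $S(x) = x$, and $Z(\theta) = 1 + e^\theta$, so the MLE condition says the model mean of $S$ must match the sample mean of $S$. A minimal check (the data below are made up):

```python
from math import exp, log

data = [1, 0, 1, 1, 0, 1, 0, 1]   # made-up Bernoulli observations
s_bar = sum(data) / len(data)     # (1/n) * sum S(x_i), with S(x) = x

# MLE in the natural parameterization: theta_hat = logit(sample mean)
theta_hat = log(s_bar / (1 - s_bar))

# d/dtheta log Z(theta) = e^theta / (1 + e^theta) = E[S(x)] under the fitted model
grad_log_Z = exp(theta_hat) / (1 + exp(theta_hat))
# grad_log_Z recovers s_bar, matching the MLE condition
```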

3. Conjugate Prior (Bayesian Statistics)

A list of commonly used conjugate priors.

3.1 Model Binomial Data

If $\theta \sim \text{Beta}(\alpha, \beta)$ and $y|\theta \sim \text{Binomial}(n, \theta)$, then $\theta|y \sim \text{Beta}(\alpha + y, \beta + n - y)$.

The Beta prior is conjugate for the Binomial likelihood, meaning the posterior has the same parametric form as the prior. The Beta prior can be interpreted as "prior data": $\alpha$ successes in $\alpha + \beta$ trials.

The mean of the posterior is a weighted average of the prior mean and the sample proportion $y/n$:

$$E(\theta|y) = \frac{\alpha + \beta}{\alpha + \beta + n} \cdot \frac{\alpha}{\alpha + \beta} + \frac{n}{\alpha + \beta + n} \cdot \frac{y}{n}.$$
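A quick numerical check of this identity (the prior parameters and data below are made up):

```python
alpha, beta, n, y = 2.0, 3.0, 10, 7  # made-up prior parameters and data

# mean of the posterior Beta(alpha + y, beta + n - y)
posterior_mean = (alpha + y) / (alpha + beta + n)

prior_mean = alpha / (alpha + beta)
sample_prop = y / n
w = (alpha + beta) / (alpha + beta + n)   # weight on the prior
weighted_avg = w * prior_mean + (1 - w) * sample_prop
# posterior_mean and weighted_avg coincide
```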

3.2 Model Event Count Data

If $\theta \sim \text{Gamma}(\alpha, \beta)$ and $y_1, \dots, y_n | \theta \sim \text{Poisson}(\theta)$, then

$$\theta | y_1, \dots, y_n \sim \text{Gamma}(\alpha + n\bar{y}, \beta + n).$$

The mean of the posterior is a weighted average of the prior mean and the sample mean $\bar{y}$:

$$E(\theta|y) = \frac{\beta}{\beta + n} \cdot \frac{\alpha}{\beta} + \frac{n}{\beta + n} \cdot \bar{y}.$$
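And a quick numerical check of this identity as well (the prior parameters and counts below are made up):

```python
alpha, beta = 2.0, 1.0    # made-up Gamma prior parameters
data = [3, 1, 4, 2, 5]    # made-up Poisson counts
n = len(data)
y_bar = sum(data) / n

# mean of the posterior Gamma(alpha + n*y_bar, beta + n)
posterior_mean = (alpha + n * y_bar) / (beta + n)

prior_mean = alpha / beta
w = beta / (beta + n)     # weight on the prior
weighted_avg = w * prior_mean + (1 - w) * y_bar
# posterior_mean and weighted_avg coincide
```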