# Probability Distributions

## Notation

We use a notation that applies equally to discrete and continuous distributions.

• A distribution function, or cumulative distribution function, is denoted by a capital letter, e.g. F(x). It must satisfy:
  • F(x) must be defined for all x and continuous except at a countable number of values of x
  • F(-inf) = 0, F(+inf) = 1
  • F(x) must be monotonically non-decreasing in x
• If F(x) is differentiable, its derivative is denoted f(x) and is called a frequency function or probability density function (pdf). We have dF = dF(x)/dx * dx = f(x)dx.
• A local maximum of f(x) is a mode.
• j = sqrt(-1)

## Properties of Distributions

### Characteristic Function

The characteristic function of a distribution is the conjugate of the Fourier transform of its pdf: g(t) = Integral(exp(jtx) dF, x=-inf..+inf). For a discrete distribution it is g(t) = Sum(p(x_k) exp(jtx_k)) over all values x_k.

The usefulness of characteristic functions arises because the characteristic function of the sum of two independent random variables equals the product of the two characteristic functions concerned.
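As a quick numerical check of this product property, the sketch below (plain standard-library Python; the distributions and names are illustrative choices, not from the source) builds the pmf of the sum of two independent fair dice by convolution and compares characteristic functions:

```python
import cmath

def char_fn(pmf, t):
    """g(t) = Sum over k of p(x_k) exp(j t x_k)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """pmf of the sum of two independent discrete random variables."""
    out = {}
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # a fair die, an arbitrary example
two_dice = convolve(die, die)

t = 0.7                                 # any test point works
g_sum = char_fn(two_dice, t)
g_prod = char_fn(die, t) * char_fn(die, t)
assert abs(g_sum - g_prod) < 1e-12
```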

### Moments

The moments of a distribution (about the origin) are given by m'_i = Integral(x^i dF, x=-inf..+inf) = (-j)^i (d^i g/dt^i)|t=0 = (d^i g/d(jt)^i)|t=0.

• m'_0 always equals 1.
• m'_1 equals the mean of the distribution and, if the integral converges, is denoted by m.
• m'_i equals the coefficient of (jt)^i/i! in the power series expansion of the characteristic function g(t).
• m'_i >= 0 for all even i.

The moments about the mean are given by m_i = Integral((x-m)^i dF, x=-inf..+inf).

• m_0 always equals 1.
• m_1 always equals 0 provided the integral converges.
• m_2 is the variance, v. The standard deviation, s, equals sqrt(v).
• The skewness is defined as m_3/s^3.
• The kurtosis is defined as m_4/s^4 - 3. The sign determines whether a distribution is platykurtic (<0), mesokurtic (=0) or leptokurtic (>0). Relative to a gaussian, platykurtic distributions are generally less peaky and leptokurtic distributions more peaky.
• m_i >= 0 for all even i.
• If m_k exists, then so do all m_i for i<k.

The moments about an arbitrary point x may be obtained by formally expanding m_i(x) = (m* + (m-x))^i and then replacing each power m*^i by the central moment m_i.

• In particular, if x=0:
  • m'_2 = m_2 + m^2
  • m'_3 = m_3 + 3m m_2 + m^3
  • m'_4 = m_4 + 4m m_3 + 6m^2 m_2 + m^4
• The inverse relationships are:
  • m_2 = m'_2 - m^2
  • m_3 = m'_3 - 3m m'_2 + 2m^3
  • m_4 = m'_4 - 4m m'_3 + 6m^2 m'_2 - 3m^4
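These conversions are easy to verify numerically; the sketch below (standard-library Python, with an arbitrary illustrative distribution of my own choosing) checks the x=0 relations for a small discrete pmf:

```python
def raw_moment(pmf, i):
    """m'_i: moment about the origin of a discrete distribution."""
    return sum(p * x**i for x, p in pmf.items())

def central_moment(pmf, i):
    """m_i: moment about the mean."""
    m = raw_moment(pmf, 1)
    return sum(p * (x - m)**i for x, p in pmf.items())

pmf = {0: 0.2, 1: 0.5, 3: 0.3}   # an arbitrary illustrative distribution
m = raw_moment(pmf, 1)

# m'_2 = m_2 + m^2
assert abs(raw_moment(pmf, 2) - (central_moment(pmf, 2) + m**2)) < 1e-12
# m_3 = m'_3 - 3 m m'_2 + 2 m^3
assert abs(central_moment(pmf, 3)
           - (raw_moment(pmf, 3) - 3*m*raw_moment(pmf, 2) + 2*m**3)) < 1e-12
# m_4 = m'_4 - 4 m m'_3 + 6 m^2 m'_2 - 3 m^4
assert abs(central_moment(pmf, 4)
           - (raw_moment(pmf, 4) - 4*m*raw_moment(pmf, 3)
              + 6*m**2*raw_moment(pmf, 2) - 3*m**4)) < 1e-12
```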

### Cumulants

The r'th cumulant of a distribution, k_r, is the coefficient of (jt)^r/r! in the power series expansion of the natural logarithm of the characteristic function, i.e. of ln(Integral(exp(jtx) dF, x=-inf..+inf)). It may be obtained from the characteristic function as k_r = (-j)^r (d^r ln(g)/dt^r)|t=0 = (d^r ln(g)/d(jt)^r)|t=0.

The cumulants are related to the moments as follows:

• m = k_1
• m_2 = k_2 = v = s^2
• m_3 = k_3
• m_4 = k_4 + 3k_2^2
• m_5 = k_5 + 10k_3 k_2
• m_6 = k_6 + 15k_4 k_2 + 10k_3^2 + 15k_2^3

The formula for m_r contains all terms of the form k_a^A k_b^B k_c^C ... where aA + bB + cC + ... = r, the A,B,C,... are all >= 1 and 2 <= a,b,c,... <= r. The coefficient of a general term is r!/(A! (a!)^A B! (b!)^B C! (c!)^C ...).

The inverse relationships are

• k_1 = m
• k_2 = m_2 = v = s^2
• k_3 = m_3
• k_4 = m_4 - 3m_2^2
• k_5 = m_5 - 10m_3 m_2
• k_6 = m_6 - 15m_4 m_2 - 10m_3^2 + 30m_2^3

The formula for k_r contains all terms of the form m_a^A m_b^B m_c^C ... where aA + bB + cC + ... = r, the A,B,C,... are all >= 1 and 2 <= a,b,c,... <= r. The coefficient of a general term is (-1)^(k-1) (k-1)! r!/(A! (a!)^A B! (b!)^B C! (c!)^C ...) where k = A+B+C+.... The two sets of coefficients are the same when k=1 and the same but for their sign when k=2.

We can also define the normalised cumulants g_r = k_r/s^r:

• Skewness: g_3 = k_3/s^3 = m_3/s^3
• Kurtosis: g_4 = k_4/s^4 = m_4/s^4 - 3
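The Poisson distribution, whose cumulants all equal its parameter a, makes a convenient numerical test case for the moment-cumulant relations. The sketch below (standard-library Python; the truncation point of the tail sum is an arbitrary choice) evaluates Poisson central moments directly from the pmf:

```python
import math

def poisson_pmf(a, r):
    return math.exp(-a) * a**r / math.factorial(r)

def poisson_central_moment(a, i, rmax=100):
    """Central moment m_i of a Poisson(a) variable, truncating the tail sum."""
    return sum(poisson_pmf(a, r) * (r - a)**i for r in range(rmax))

a = 2.0            # for a Poisson distribution every cumulant k_r equals a
k = a
assert abs(poisson_central_moment(a, 2) - k) < 1e-9             # m_2 = k_2
assert abs(poisson_central_moment(a, 3) - k) < 1e-9             # m_3 = k_3
assert abs(poisson_central_moment(a, 4) - (k + 3*k**2)) < 1e-9  # m_4 = k_4 + 3 k_2^2
assert abs(poisson_central_moment(a, 5) - (k + 10*k*k)) < 1e-9  # m_5 = k_5 + 10 k_3 k_2
assert abs(poisson_central_moment(a, 6)
           - (k + 15*k*k + 10*k*k + 15*k**3)) < 1e-8            # m_6 relation
```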

## Bounds

Chebyshev Inequality: Pr(|X-m| >= d) = 1 - F(m+d) + F(m-d) <= (s/d)^2 gives a rather weak bound on the sum of the two tail probabilities.
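A quick empirical illustration of how loose the bound is (a Python sketch using exponential samples, for which m = s = 1; any distribution with finite variance would do):

```python
import random

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]  # exponential: m = s = 1
m, s = 1.0, 1.0

d = 2.5
tail = sum(abs(x - m) >= d for x in xs) / n       # empirical two-tail probability
bound = (s / d) ** 2                              # Chebyshev bound = 0.16
assert tail <= bound                              # the true tail (~0.03) is far below it
```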

## Transforming Distributions

Linear Transformation: Suppose Y = aX + b, where a > 0 and X has a pdf f(x) = dF(x)/dx with mean m, standard deviation s and characteristic function g(t). Then:

• Y has mean am+b and standard deviation as
• The pdf of Y is f((y-b)/a)/a
• The cdf of Y is F((y-b)/a)
• The characteristic function of Y is exp(jbt) g(at)
• The cumulants of the two distributions are related by
  • k_1(Y) = a k_1(X) + b
  • k_r(Y) = a^r k_r(X) for r > 1
• The normalised cumulants satisfy
  • g_r(Y) = g_r(X) for r > 1
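The first two properties can be checked directly on samples, since the sample mean and sample spread obey the same linear transformation exactly (a standard-library Python sketch; a and b are arbitrary illustrative values):

```python
import random

random.seed(1)
a, b = 3.0, -2.0                       # illustrative transform parameters
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [a * x + b for x in xs]           # Y = aX + b

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return (sum((x - m)**2 for x in v) / len(v)) ** 0.5

assert abs(mean(ys) - (a * mean(xs) + b)) < 1e-9   # mean maps to am + b
assert abs(std(ys) - a * std(xs)) < 1e-9           # spread scales by a
```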

## Probability Identities

The identities below are expressed in terms of discrete distributions. They also work for continuous distributions with the sums replaced by integrals. S_x() denotes the sum over all values of x.

• p(x,y) = p(y|x)p(x) = p(x|y)p(y)
• p(y|x) = p(x|y)p(y)/p(x). This is Bayes' rule.
• p(x) = S_y(p(x,y))
• p(x|y) = S_z(p(x,z|y)) = S_z(p(x|y,z)p(z|y))
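These identities can be checked mechanically on any small joint distribution; the sketch below (standard-library Python, with a made-up joint table) verifies Bayes' rule for every cell:

```python
# an arbitrary illustrative joint distribution p(x, y)
joint = {('a', 0): 0.1, ('a', 1): 0.3, ('b', 0): 0.2, ('b', 1): 0.4}

def p_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)   # marginal over y

def p_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)   # marginal over x

def p_y_given_x(y, x):
    return joint[(x, y)] / p_x(x)

def p_x_given_y(x, y):
    return joint[(x, y)] / p_y(y)

# Bayes' rule: p(y|x) = p(x|y) p(y) / p(x), checked at every point
for (x, y) in joint:
    lhs = p_y_given_x(y, x)
    rhs = p_x_given_y(x, y) * p_y(y) / p_x(x)
    assert abs(lhs - rhs) < 1e-12
```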

## Discrete Distributions

In these distributions, the variable r takes integer values.

### Binomial Distribution

• p(r) = a^r (1-a)^(n-r) n!/(r!(n-r)!) for 0 <= r <= n, where a is a constant with 0 <= a <= 1.
• Characteristic function g(t) = (1 - a + a exp(jt))^n
• m = na, v = na(1-a), skewness = (1-2a)/sqrt(na(1-a)), kurtosis = (1-6a(1-a))/(na(1-a))
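The moment formulae can be confirmed by summing directly over the pmf (standard-library Python; n and a are arbitrary illustrative values):

```python
import math

def binom_pmf(n, a, r):
    """Binomial point probability a^r (1-a)^(n-r) n!/(r!(n-r)!)."""
    return math.comb(n, r) * a**r * (1 - a)**(n - r)

n, a = 10, 0.3                        # illustrative parameter values
pmf = [binom_pmf(n, a, r) for r in range(n + 1)]
mean = sum(r * p for r, p in enumerate(pmf))
var = sum((r - mean)**2 * p for r, p in enumerate(pmf))
skew = sum((r - mean)**3 * p for r, p in enumerate(pmf)) / var**1.5

assert abs(mean - n * a) < 1e-12
assert abs(var - n * a * (1 - a)) < 1e-12
assert abs(skew - (1 - 2*a) / math.sqrt(n * a * (1 - a))) < 1e-12
```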

### Poisson Distribution

• p(r) = exp(-a) a^r/r! for r >= 0
• Characteristic function g(t) = exp(a(exp(jt)-1))
• k_r = a for all r >= 1
• m = a, v = a, skewness = a^(-½), kurtosis = a^(-1)

## Continuous Distributions

### Cauchy Distribution

• f(x) = 1/(Pi (1+x^2))
• F(x) = ½ + tan^(-1)(x)/Pi
• Characteristic function g(t) = exp(-|t|)
• Mode=0
• Mean = Undefined
• Variance = Undefined
• Skewness = Undefined
• Kurtosis = Undefined

### Chi-Squared Distribution

This is the distribution of the sum of the squares of n independent standard gaussian random variables. If Y=½X, then Y has a gamma distribution with parameter p=½n.

• f(x) = 2^(-½n) x^(½n-1) exp(-½x)/(½n-1)! for x >= 0. Parameter n > 0.
• [n even] F(x) = 1 - C exp(-½x) where C = sum((½x)^k/k!, k=0..(½n-1))
• Characteristic function g(t) = (1-2jt)^(-½n)
• Cumulants: k_r = n 2^(r-1) (r-1)!
• Mode=n-2
• Mean = n
• Variance = 2n
• Skewness = sqrt(8/n)
• Kurtosis = 12/n
• [n=2] X has an exponential distribution, f(x) = ½exp(-½x), and Y = sqrt(X) has a Rayleigh distribution.
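The defining construction can be checked by simulation: summing the squares of n standard gaussians should reproduce the stated mean n and variance 2n (a standard-library Python sketch; the sample size and tolerances are arbitrary choices):

```python
import random

random.seed(2)
n, N = 4, 100_000
# chi-squared sample: sum of squares of n independent standard gaussians
xs = [sum(random.gauss(0.0, 1.0)**2 for _ in range(n)) for _ in range(N)]
mean = sum(xs) / N
var = sum((x - mean)**2 for x in xs) / N
assert abs(mean - n) < 0.1       # theoretical mean = n
assert abs(var - 2 * n) < 0.5    # theoretical variance = 2n
```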

### Non-central Chi-squared Distribution

This is the distribution of the sum of the squares of n independent gaussian random variables with unit variances and non-zero means. The non-centrality parameter d is the sum of the squares of the means [some people call this d^2].

• f(x) = 2^(-½n) exp(-½(x+d)) Sum((¼d)^k x^(½n+k-1)/(k!(½n+k-1)!); k=0..infinity) for x >= 0. Parameters n, d >= 0.
• Mean = n+d
• Variance = 2n+4d
• Skewness = sqrt(8(n+3d)^2 (n+2d)^(-3))
• Kurtosis = 12(n+4d)(n+2d)^(-2)

### Exponential Distribution

• f(x)=exp(-x) for x>=0.
• F(x)=1-exp(-x)
• Mode=0
• Mean = 1
• Variance = 1
• Skewness = 2
• Kurtosis = 6

### Gamma Distribution (Pearson Type III Distribution)

• f(x) = x^(p-1) exp(-x)/(p-1)! for x >= 0. Parameter p > 0.
• Mode=p-1
• Mean = p
• Variance = p
• Skewness = 2/sqrt(p)
• Kurtosis = 6/p

### Gaussian or Normal Distribution

• f(x) = (2 Pi)^(-½) exp(-½x^2)
• Characteristic function g(t) = exp(-½t^2)
• Mode=0, Mean=0, Variance=1, Skewness=0, Kurtosis=0
• Moments: m_i = i!/(2^(i/2) (i/2)!) for even i and m_i = 0 for odd i
• Cumulants: k_2 = 1 and k_i = 0 for i != 2
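The moment formula can be confirmed by numerical integration of x^i against the pdf (a standard-library Python sketch; the step size and integration limits are arbitrary choices that comfortably cover the gaussian tails):

```python
import math

def gauss_moment(i, h=1e-3, lim=10.0):
    """Numerically integrate x^i (2 Pi)^(-1/2) exp(-x^2/2) over [-lim, lim]."""
    n = int(2 * lim / h)
    total = sum((-lim + k * h)**i * math.exp(-0.5 * (-lim + k * h)**2)
                for k in range(n + 1))
    return total * h / math.sqrt(2 * math.pi)

for i in (2, 4, 6):
    exact = math.factorial(i) / (2**(i // 2) * math.factorial(i // 2))
    assert abs(gauss_moment(i) - exact) < 1e-6   # m_2=1, m_4=3, m_6=15
for i in (1, 3, 5):
    assert abs(gauss_moment(i)) < 1e-6           # odd moments vanish
```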

### Laplace Distribution

• f(x)=½exp(-|x|)
• F(x)=½(1+Sgn(x)(1-exp(-|x|)))
• Mode=0
• Mean = 0
• Variance = 2
• Skewness = 0
• Kurtosis = 3

### Lognormal Distribution

This is a distribution such that ln(x) has a gaussian distribution with mean a and standard deviation b.

• f(x) = (2 Pi)^(-½) exp(-½((ln(x)-a)/b)^2)/(bx)
• Mode: exp(a-b^2)
• Median: exp(a)
• Mean: exp(a+½b^2)
• Variance: exp(2a+b^2) (exp(b^2)-1)
• Moments about 0: m'_r = exp(ra+½r^2b^2)
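Since X^r = exp(r(a+bU)) with U standard gaussian, the raw moments reduce to 1-D gaussian integrals, which makes the formula easy to check numerically (standard-library Python; the parameter values, step size and limits are arbitrary choices):

```python
import math

a, b = 0.5, 0.3      # illustrative gaussian mean and standard deviation

def lognormal_raw_moment(r, h=1e-3, lim=10.0):
    """E[X^r] = E[exp(r(a+bU))] with U standard gaussian, by direct integration."""
    n = int(2 * lim / h)
    total = 0.0
    for k in range(n + 1):
        u = -lim + k * h
        total += math.exp(r * (a + b * u)) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2 * math.pi)

for r in (1, 2, 3):
    exact = math.exp(r * a + 0.5 * r**2 * b**2)  # m'_r = exp(ra + r^2 b^2/2)
    assert abs(lognormal_raw_moment(r) - exact) < 1e-6
```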

### Nakagami Distribution

• f(x) = 2(k/w)^k x^(2k-1) exp(-kx^2/w)/Gamma(k)
• m'_2 = w
• m'_r = (w/k)^(r/2) Gamma(k+r/2)/Gamma(k)

### Rayleigh Distribution

This distribution arises in communications theory as the magnitude of a component of the Fourier transform of white noise.

• f(x) = x exp(-½x^2) for x >= 0.
• F(x) = 1 - exp(-½x^2)
• Mode = 1
• Mean = sqrt(½Pi)
• Variance = 2 - ½Pi
• Skewness = (Pi-3) sqrt(4Pi/(4-Pi)^3) = 0.631..
• Kurtosis = (80-24Pi)/(4-Pi)^2 - 6 = 0.245..
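The mean and variance follow from the first two moments of the pdf, which can be evaluated numerically (standard-library Python; the step size and upper limit are arbitrary choices well past the tail):

```python
import math

def rayleigh_moment(i, h=1e-3, lim=12.0):
    """Numerically integrate x^i times the pdf x exp(-x^2/2) over [0, lim]."""
    n = int(lim / h)
    return sum((k * h)**(i + 1) * math.exp(-0.5 * (k * h)**2)
               for k in range(n + 1)) * h

mean = rayleigh_moment(1)
var = rayleigh_moment(2) - mean**2
assert abs(mean - math.sqrt(0.5 * math.pi)) < 1e-6   # mean = sqrt(Pi/2)
assert abs(var - (2 - 0.5 * math.pi)) < 1e-6         # variance = 2 - Pi/2
```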

### Rectangular Distribution

• f(x)=1 for -½ < x < ½
• F(x)=½+x for -½ < x < ½
• Mean = 0
• Variance = 1/12 = 0.08333
• Skewness = 0
• Kurtosis = -1.2

### Rician Distribution

This distribution arises in communications theory as the magnitude of the Fourier transform of a cosine wave (of amplitude A) corrupted by additive white noise.

• f(x) = x exp(-½(x^2+A^2)) I_0(xA)

## Multivariate Gaussian

If x is an n-dimensional multivariate gaussian with mean m and covariance matrix S then

• its pdf is given by (2 Pi)^(-n/2) |S|^(-1/2) exp(-½(x-m)^T S^(-1) (x-m))
• its characteristic function is g(t) = exp(j t^T m - ½ t^T S t)
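A minimal 2-D sketch of the pdf (standard-library Python only; function names and parameter values are my own). When the covariance matrix is diagonal, the joint pdf must factorise into a product of 1-D gaussians, which gives a simple correctness check:

```python
import math

def mvn_pdf2(x, m, S):
    """Bivariate gaussian pdf via the explicit 2x2 inverse and determinant."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def gauss1d(x, mu, var):
    return math.exp(-0.5 * (x - mu)**2 / var) / math.sqrt(2 * math.pi * var)

# with a diagonal covariance the joint pdf factorises into 1-D gaussians
m, S = [1.0, -2.0], [[4.0, 0.0], [0.0, 9.0]]
x = [0.5, 1.5]
assert abs(mvn_pdf2(x, m, S)
           - gauss1d(x[0], 1.0, 4.0) * gauss1d(x[1], -2.0, 9.0)) < 1e-12
```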