# Probability Distributions

## Notation

We use a notation that applies equally to discrete and continuous distributions.

• A distribution function, or cumulative distribution function, is denoted by a capital letter, e.g. F(x). It must satisfy:
  • F(x) must be defined for all x and continuous except at a countable number of values of x
  • F(-inf) = 0, F(+inf) = 1
  • F(x) must be monotonically non-decreasing in x
• If F(x) is differentiable, its derivative is denoted f(x) and is called a frequency function or probability density function (pdf). We have dF = dF(x)/dx * dx = f(x)dx.
• A local maximum of f(x) is a mode.
• j = sqrt(-1)

## Properties of Distributions

### Characteristic Function

The characteristic function of a distribution is the conjugate of the Fourier transform of its pdf: g(t) = Integral(exp(jtx) dF, x=-inf..+inf). For a discrete distribution it is g(t) = Sum(p(x_k) exp(jtx_k)) over all values x_k.

The usefulness of characteristic functions arises because the characteristic function of the sum of two independent random variables equals the product of the two characteristic functions concerned.
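As a quick numerical check of this product property, the sketch below (plain standard-library Python; the distributions and names are illustrative choices, not from the source) builds the pmf of the sum of two independent fair dice by convolution and compares characteristic functions:

```python
import cmath

def char_fn(pmf, t):
    """g(t) = Sum over k of p(x_k) exp(j t x_k)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """pmf of the sum of two independent discrete random variables."""
    out = {}
    for x1, p1 in pmf1.items():
        for x2, p2 in pmf2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

die = {k: 1 / 6 for k in range(1, 7)}   # a fair die, an arbitrary example
two_dice = convolve(die, die)

t = 0.7                                 # any test point works
g_sum = char_fn(two_dice, t)
g_prod = char_fn(die, t) * char_fn(die, t)
assert abs(g_sum - g_prod) < 1e-12
```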

### Moments

The moments of a distribution (about the origin) are given by m'_i = Integral(x^i dF, x=-inf..+inf) = (-j)^i (d^i g/dt^i)|t=0 = (d^i g/d(jt)^i)|t=0.

• m'_0 always equals 1.
• m'_1 equals the mean of the distribution and, if the integral converges, is denoted by m.
• m'_i equals the coefficient of (jt)^i/i! in the power series expansion of the characteristic function g(t).
• m'_i >= 0 for all even i.

The moments about the mean are given by m_i = Integral((x-m)^i dF, x=-inf..+inf).

• m_0 always equals 1.
• m_1 always equals 0 provided the integral converges.
• m_2 is the variance, v. The standard deviation, s, equals sqrt(v).
• The skewness is defined as m_3/s^3.
• The kurtosis is defined as m_4/s^4 - 3. The sign determines whether a distribution is platykurtic (<0), mesokurtic (=0) or leptokurtic (>0). Relative to a gaussian, platykurtic distributions are generally less peaky and leptokurtic distributions more peaky.
• m_i >= 0 for all even i.
• If m_k exists, then so do all m_i for i<k.

The moments about an arbitrary point x may be obtained by formally expanding m_i(x) = (m* + (m-x))^i and then replacing each power m*^i by the central moment m_i.

• In particular, if x=0:
  • m'_2 = m_2 + m^2
  • m'_3 = m_3 + 3m m_2 + m^3
  • m'_4 = m_4 + 4m m_3 + 6m^2 m_2 + m^4
• The inverse relationships are:
  • m_2 = m'_2 - m^2
  • m_3 = m'_3 - 3m m'_2 + 2m^3
  • m_4 = m'_4 - 4m m'_3 + 6m^2 m'_2 - 3m^4
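These conversions are easy to verify numerically; the sketch below (standard-library Python, with an arbitrary illustrative distribution of my own choosing) checks the x=0 relations for a small discrete pmf:

```python
def raw_moment(pmf, i):
    """m'_i: moment about the origin of a discrete distribution."""
    return sum(p * x**i for x, p in pmf.items())

def central_moment(pmf, i):
    """m_i: moment about the mean."""
    m = raw_moment(pmf, 1)
    return sum(p * (x - m)**i for x, p in pmf.items())

pmf = {0: 0.2, 1: 0.5, 3: 0.3}   # an arbitrary illustrative distribution
m = raw_moment(pmf, 1)

# m'_2 = m_2 + m^2
assert abs(raw_moment(pmf, 2) - (central_moment(pmf, 2) + m**2)) < 1e-12
# m_3 = m'_3 - 3 m m'_2 + 2 m^3
assert abs(central_moment(pmf, 3)
           - (raw_moment(pmf, 3) - 3*m*raw_moment(pmf, 2) + 2*m**3)) < 1e-12
# m_4 = m'_4 - 4 m m'_3 + 6 m^2 m'_2 - 3 m^4
assert abs(central_moment(pmf, 4)
           - (raw_moment(pmf, 4) - 4*m*raw_moment(pmf, 3)
              + 6*m**2*raw_moment(pmf, 2) - 3*m**4)) < 1e-12
```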

### Cumulants

The r'th cumulant of a distribution, k_r, is the coefficient of (jt)^r/r! in the power series expansion of the natural logarithm of the characteristic function, i.e. of ln(Integral(exp(jtx) dF, x=-inf..+inf)). It may be obtained from the characteristic function as k_r = (-j)^r (d^r ln(g)/dt^r)|t=0 = (d^r ln(g)/d(jt)^r)|t=0.

The cumulants are related to the moments as follows:

• m = k_1
• m_2 = k_2 = v = s^2
• m_3 = k_3
• m_4 = k_4 + 3k_2^2
• m_5 = k_5 + 10k_3 k_2
• m_6 = k_6 + 15k_4 k_2 + 10k_3^2 + 15k_2^3

The formula for m_r contains all terms of the form k_a^A k_b^B k_c^C ... where aA + bB + cC + ... = r, the A,B,C,... are all >= 1 and 2 <= a,b,c,... <= r. The coefficient of a general term is r!/(A! (a!)^A B! (b!)^B C! (c!)^C ...).

The inverse relationships are

• k_1 = m
• k_2 = m_2 = v = s^2
• k_3 = m_3
• k_4 = m_4 - 3m_2^2
• k_5 = m_5 - 10m_3 m_2
• k_6 = m_6 - 15m_4 m_2 - 10m_3^2 + 30m_2^3

The formula for k_r contains all terms of the form m_a^A m_b^B m_c^C ... where aA + bB + cC + ... = r, the A,B,C,... are all >= 1 and 2 <= a,b,c,... <= r. The coefficient of a general term is (-1)^(k-1) (k-1)! r!/(A! (a!)^A B! (b!)^B C! (c!)^C ...) where k = A+B+C+.... The two sets of coefficients are the same when k=1 and the same but for their sign when k=2.

We can also define the normalised cumulants g_r = k_r/s^r:

• Skewness: g_3 = k_3/s^3 = m_3/s^3
• Kurtosis: g_4 = k_4/s^4 = m_4/s^4 - 3
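The Poisson distribution, whose cumulants all equal its parameter a, makes a convenient numerical test case for the moment-cumulant relations. The sketch below (standard-library Python; the truncation point of the tail sum is an arbitrary choice) evaluates Poisson central moments directly from the pmf:

```python
import math

def poisson_pmf(a, r):
    return math.exp(-a) * a**r / math.factorial(r)

def poisson_central_moment(a, i, rmax=100):
    """Central moment m_i of a Poisson(a) variable, truncating the tail sum."""
    return sum(poisson_pmf(a, r) * (r - a)**i for r in range(rmax))

a = 2.0            # for a Poisson distribution every cumulant k_r equals a
k = a
assert abs(poisson_central_moment(a, 2) - k) < 1e-9             # m_2 = k_2
assert abs(poisson_central_moment(a, 3) - k) < 1e-9             # m_3 = k_3
assert abs(poisson_central_moment(a, 4) - (k + 3*k**2)) < 1e-9  # m_4 = k_4 + 3 k_2^2
assert abs(poisson_central_moment(a, 5) - (k + 10*k*k)) < 1e-9  # m_5 = k_5 + 10 k_3 k_2
assert abs(poisson_central_moment(a, 6)
           - (k + 15*k*k + 10*k*k + 15*k**3)) < 1e-8            # m_6 relation
```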

## Bounds

Chebyshev Inequality: Pr(|X-m| >= d) = 1 - F(m+d) + F(m-d) <= (s/d)^2 gives a rather weak bound on the sum of the two tail probabilities.
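A quick empirical illustration of how loose the bound is (a Python sketch using exponential samples, for which m = s = 1; any distribution with finite variance would do):

```python
import random

random.seed(0)
n = 100_000
xs = [random.expovariate(1.0) for _ in range(n)]  # exponential: m = s = 1
m, s = 1.0, 1.0

d = 2.5
tail = sum(abs(x - m) >= d for x in xs) / n       # empirical two-tail probability
bound = (s / d) ** 2                              # Chebyshev bound = 0.16
assert tail <= bound                              # the true tail (~0.03) is far below it
```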

## Transforming Distributions

Linear Transformation: Suppose Y = aX + b, where a > 0 and X has a pdf f(x) = dF(x)/dx with mean m, standard deviation s and characteristic function g(t). Then:

• Y has mean am+b and standard deviation as
• The pdf of Y is f((y-b)/a)/a
• The cdf of Y is F((y-b)/a)
• The characteristic function of Y is exp(jbt) g(at)
• The cumulants of the two distributions are related by
  • k_1(Y) = a k_1(X) + b
  • k_r(Y) = a^r k_r(X) for r > 1
• The normalised cumulants satisfy
  • g_r(Y) = g_r(X) for r > 1
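The first two properties can be checked directly on samples, since the sample mean and sample spread obey the same linear transformation exactly (a standard-library Python sketch; a and b are arbitrary illustrative values):

```python
import random

random.seed(1)
a, b = 3.0, -2.0                       # illustrative transform parameters
xs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [a * x + b for x in xs]           # Y = aX + b

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return (sum((x - m)**2 for x in v) / len(v)) ** 0.5

assert abs(mean(ys) - (a * mean(xs) + b)) < 1e-9   # mean maps to am + b
assert abs(std(ys) - a * std(xs)) < 1e-9           # spread scales by a
```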

## Probability Identities

The identities below are expressed in terms of discrete distributions. They also work for continuous distributions with the sums replaced by integrals. S_x() denotes the sum over all values of x.

• p(x,y) = p(y|x)p(x) = p(x|y)p(y)
• p(y|x) = p(x|y)p(y)/p(x). This is Bayes' rule.
• p(x) = S_y(p(x,y))
• p(x|y) = S_z(p(x,z|y)) = S_z(p(x|y,z)p(z|y))
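These identities can be checked mechanically on any small joint distribution; the sketch below (standard-library Python, with a made-up joint table) verifies Bayes' rule for every cell:

```python
# an arbitrary illustrative joint distribution p(x, y)
joint = {('a', 0): 0.1, ('a', 1): 0.3, ('b', 0): 0.2, ('b', 1): 0.4}

def p_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)   # marginal over y

def p_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)   # marginal over x

def p_y_given_x(y, x):
    return joint[(x, y)] / p_x(x)

def p_x_given_y(x, y):
    return joint[(x, y)] / p_y(y)

# Bayes' rule: p(y|x) = p(x|y) p(y) / p(x), checked at every point
for (x, y) in joint:
    lhs = p_y_given_x(y, x)
    rhs = p_x_given_y(x, y) * p_y(y) / p_x(x)
    assert abs(lhs - rhs) < 1e-12
```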

## Discrete Distributions

In these distributions, the variable r takes integer values.

### Binomial Distribution

• p(r) = a^r (1-a)^(n-r) n!/(r!(n-r)!) for 0 <= r <= n, where a is a constant with 0 <= a <= 1.
• Characteristic function g(t) = (1 - a + a exp(jt))^n
• m = na, v = na(1-a), skewness = (1-2a)/sqrt(na(1-a)), kurtosis = (1-6a(1-a))/(na(1-a))
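The moment formulae can be confirmed by summing directly over the pmf (standard-library Python; n and a are arbitrary illustrative values):

```python
import math

def binom_pmf(n, a, r):
    """Binomial point probability a^r (1-a)^(n-r) n!/(r!(n-r)!)."""
    return math.comb(n, r) * a**r * (1 - a)**(n - r)

n, a = 10, 0.3                        # illustrative parameter values
pmf = [binom_pmf(n, a, r) for r in range(n + 1)]
mean = sum(r * p for r, p in enumerate(pmf))
var = sum((r - mean)**2 * p for r, p in enumerate(pmf))
skew = sum((r - mean)**3 * p for r, p in enumerate(pmf)) / var**1.5

assert abs(mean - n * a) < 1e-12
assert abs(var - n * a * (1 - a)) < 1e-12
assert abs(skew - (1 - 2*a) / math.sqrt(n * a * (1 - a))) < 1e-12
```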

### Poisson Distribution

• p(r) = exp(-a) a^r/r! for r >= 0
• Characteristic function g(t) = exp(a(exp(jt)-1))
• k_r = a for all r >= 1
• m = a, v = a, skewness = a^(-½), kurtosis = a^(-1)

## Continuous Distributions

### Cauchy Distribution

• f(x) = 1/(Pi (1+x^2))
• F(x) = ½ + tan^(-1)(x)/Pi
• Characteristic function g(t) = exp(-|t|)
• Mode=0
• Mean = Undefined
• Variance = Undefined
• Skewness = Undefined
• Kurtosis = Undefined

### Chi-Squared Distribution

This is the distribution of the sum of the squares of n independent standard gaussian random variables. If Y=½X, then Y has a gamma distribution with parameter p=½n.

• f(x) = 2^(-½n) x^(½n-1) exp(-½x)/(½n-1)! for x >= 0. Parameter n > 0.
• [n even] F(x) = 1 - C exp(-½x) where C = sum((½x)^k/k!, k=0..(½n-1))
• Characteristic function g(t) = (1-2jt)^(-½n)
• Cumulants: k_r = n 2^(r-1) (r-1)!
• Mode=n-2
• Mean = n
• Variance = 2n
• Skewness = sqrt(8/n)
• Kurtosis = 12/n
• [n=2] X has an exponential distribution, f(x) = ½exp(-½x), and Y = sqrt(X) has a Rayleigh distribution.
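The defining construction can be checked by simulation: summing the squares of n standard gaussians should reproduce the stated mean n and variance 2n (a standard-library Python sketch; the sample size and tolerances are arbitrary choices):

```python
import random

random.seed(2)
n, N = 4, 100_000
# chi-squared sample: sum of squares of n independent standard gaussians
xs = [sum(random.gauss(0.0, 1.0)**2 for _ in range(n)) for _ in range(N)]
mean = sum(xs) / N
var = sum((x - mean)**2 for x in xs) / N
assert abs(mean - n) < 0.1       # theoretical mean = n
assert abs(var - 2 * n) < 0.5    # theoretical variance = 2n
```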

### Non-central Chi-squared Distribution

This is the distribution of the sum of the squares of n independent gaussian random variables with unit variances and non-zero means. The non-centrality parameter d is the sum of the squares of the means [some people call this d^2].

• f(x) = 2^(-½n) exp(-½(x+d)) Sum((¼d)^k x^(½n+k-1)/(k!(½n+k-1)!); k=0..infinity) for x >= 0. Parameters n, d >= 0.
• Mean = n+d
• Variance = 2n+4d
• Skewness = sqrt(8(n+3d)^2 (n+2d)^(-3))
• Kurtosis = 12(n+4d)(n+2d)^(-2)

### Exponential Distribution

• f(x)=exp(-x) for x>=0.
• F(x)=1-exp(-x)
• Mode=0
• Mean = 1
• Variance = 1
• Skewness = 2
• Kurtosis = 6

### Gamma Distribution (Pearson Type III Distribution)

• f(x) = x^(p-1) exp(-x)/(p-1)! for x >= 0. Parameter p > 0.
• Mode=p-1
• Mean = p
• Variance = p
• Skewness = 2/sqrt(p)
• Kurtosis = 6/p

### Gaussian or Normal Distribution

• f(x) = (2 Pi)^(-½) exp(-½x^2)
• Characteristic function g(t) = exp(-½t^2)
• Mode=0, Mean=0, Variance=1, Skewness=0, Kurtosis=0
• Moments: m_i = i!/(2^(i/2) (i/2)!) for even i and m_i = 0 for odd i
• Cumulants: k_2 = 1 and k_i = 0 for i != 2
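The moment formula can be confirmed by numerical integration of x^i against the pdf (a standard-library Python sketch; the step size and integration limits are arbitrary choices that comfortably cover the gaussian tails):

```python
import math

def gauss_moment(i, h=1e-3, lim=10.0):
    """Numerically integrate x^i (2 Pi)^(-1/2) exp(-x^2/2) over [-lim, lim]."""
    n = int(2 * lim / h)
    total = sum((-lim + k * h)**i * math.exp(-0.5 * (-lim + k * h)**2)
                for k in range(n + 1))
    return total * h / math.sqrt(2 * math.pi)

for i in (2, 4, 6):
    exact = math.factorial(i) / (2**(i // 2) * math.factorial(i // 2))
    assert abs(gauss_moment(i) - exact) < 1e-6   # m_2=1, m_4=3, m_6=15
for i in (1, 3, 5):
    assert abs(gauss_moment(i)) < 1e-6           # odd moments vanish
```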

### Laplace Distribution

• f(x)=½exp(-|x|)
• F(x)=½(1+Sgn(x)(1-exp(-|x|)))
• Mode=0
• Mean = 0
• Variance = 2
• Skewness = 0
• Kurtosis = 3

### Lognormal Distribution

This is a distribution such that ln(x) has a gaussian distribution with mean a and standard deviation b.

• f(x) = (2 Pi)^(-½) exp(-½((ln(x)-a)/b)^2)/(bx)
• Mode: exp(a-b^2)
• Median: exp(a)
• Mean: exp(a+½b^2)
• Variance: exp(2a+b^2) (exp(b^2)-1)
• Moments about 0: m'_r = exp(ra+½r^2b^2)
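Since X^r = exp(r(a+bU)) with U standard gaussian, the raw moments reduce to 1-D gaussian integrals, which makes the formula easy to check numerically (standard-library Python; the parameter values, step size and limits are arbitrary choices):

```python
import math

a, b = 0.5, 0.3      # illustrative gaussian mean and standard deviation

def lognormal_raw_moment(r, h=1e-3, lim=10.0):
    """E[X^r] = E[exp(r(a+bU))] with U standard gaussian, by direct integration."""
    n = int(2 * lim / h)
    total = 0.0
    for k in range(n + 1):
        u = -lim + k * h
        total += math.exp(r * (a + b * u)) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2 * math.pi)

for r in (1, 2, 3):
    exact = math.exp(r * a + 0.5 * r**2 * b**2)  # m'_r = exp(ra + r^2 b^2/2)
    assert abs(lognormal_raw_moment(r) - exact) < 1e-6
```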

### Nakagami Distribution

• f(x) = 2(k/w)^k x^(2k-1) exp(-kx^2/w)/Gamma(k)
• m'_2 = w
• m'_r = (w/k)^(r/2) Gamma(k+r/2)/Gamma(k)

### Rayleigh Distribution

This distribution arises in communications theory as the magnitude of a component of the Fourier transform of white noise.

• f(x) = x exp(-½x^2) for x >= 0.
• F(x) = 1 - exp(-½x^2)
• Mode = 1
• Mean = sqrt(½Pi)
• Variance = 2 - ½Pi
• Skewness = (Pi-3) sqrt(4Pi/(4-Pi)^3) = 0.631..
• Kurtosis = (80-24Pi)/(4-Pi)^2 - 6 = 0.245..
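The mean and variance follow from the first two moments of the pdf, which can be evaluated numerically (standard-library Python; the step size and upper limit are arbitrary choices well past the tail):

```python
import math

def rayleigh_moment(i, h=1e-3, lim=12.0):
    """Numerically integrate x^i times the pdf x exp(-x^2/2) over [0, lim]."""
    n = int(lim / h)
    return sum((k * h)**(i + 1) * math.exp(-0.5 * (k * h)**2)
               for k in range(n + 1)) * h

mean = rayleigh_moment(1)
var = rayleigh_moment(2) - mean**2
assert abs(mean - math.sqrt(0.5 * math.pi)) < 1e-6   # mean = sqrt(Pi/2)
assert abs(var - (2 - 0.5 * math.pi)) < 1e-6         # variance = 2 - Pi/2
```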

### Rectangular Distribution

• f(x)=1 for -½ < x < ½
• F(x)=½+x for -½ < x < ½
• Mean = 0
• Variance = 1/12 = 0.08333
• Skewness = 0
• Kurtosis = -1.2

### Rician Distribution

This distribution arises in communications theory as the magnitude of the Fourier transform of a cosine wave (of amplitude A) corrupted by additive white noise.

• f(x) = x exp(-½(x^2+A^2)) I_0(xA)

## Multivariate Gaussian

If x is an n-dimensional multivariate gaussian with mean m and covariance matrix S then

• its pdf is given by (2 Pi)^(-n/2) |S|^(-1/2) exp(-½(x-m)^T S^(-1) (x-m))
• its characteristic function is g(t) = exp(j t^T m - ½ t^T S t)
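A minimal 2-D sketch of the pdf (standard-library Python only; function names and parameter values are my own). When the covariance matrix is diagonal, the joint pdf must factorise into a product of 1-D gaussians, which gives a simple correctness check:

```python
import math

def mvn_pdf2(x, m, S):
    """Bivariate gaussian pdf via the explicit 2x2 inverse and determinant."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    d = [x[0] - m[0], x[1] - m[1]]
    q = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def gauss1d(x, mu, var):
    return math.exp(-0.5 * (x - mu)**2 / var) / math.sqrt(2 * math.pi * var)

# with a diagonal covariance the joint pdf factorises into 1-D gaussians
m, S = [1.0, -2.0], [[4.0, 0.0], [0.0, 9.0]]
x = [0.5, 1.5]
assert abs(mvn_pdf2(x, m, S)
           - gauss1d(x[0], 1.0, 4.0) * gauss1d(x[1], -2.0, 9.0)) < 1e-12
```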