In all the expressions below, x is a vector of real or complex random
variables whose mean vector and covariance matrix are given by:
E(x) = m and
Cov(x) = E((x-m)(x-m)^H) = S.
Vectors and matrices a, A, b, B, c, C, d and D are constant
(i.e. not dependent on x).
- The covariance matrix S is Hermitian and positive semi-definite.
- S is strictly positive definite unless there is a deterministic
relation between the elements of x of the form a^H(x-m) = 0 for some
non-zero a.
- If the elements of x are uniformly spaced samples from a continuous
signal, then S is Toeplitz.
- The symmetric correlation coefficient matrix (also called the
correlation matrix) is Corr(x) = DIAG(S)^{-½} S DIAG(S)^{-½}.
WARNING: The term correlation matrix is also used for the matrix
E(xx^H) = S + mm^H.
- The Correlation Coefficient between x_i and x_j equals
Corr(x)_{i,j} = E((x_i-m_i)(x_j-m_j)^C) / sqrt(E(|x_i-m_i|^2) E(|x_j-m_j|^2))
and has magnitude <= 1. [5.11]
- The precision matrix is T = S^{-1}.
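The definitions above can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name and the 3#3 covariance matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def correlation_matrix(S):
    """Return DIAG(S)^(-1/2) S DIAG(S)^(-1/2) for a covariance matrix S."""
    s = np.sqrt(np.real(np.diag(S)))   # standard deviations of each element
    return S / np.outer(s, s)          # scale row i and column j by 1/(s_i s_j)

# Hypothetical covariance matrix (illustration only)
S = np.array([[4.0, 1.2, -0.8],
              [1.2, 1.0, 0.3],
              [-0.8, 0.3, 2.25]])

corr = correlation_matrix(S)           # unit diagonal, |entries| <= 1
T = np.linalg.inv(S)                   # precision matrix T = S^(-1)
```

Dividing elementwise by the outer product of the standard deviations avoids forming the two diagonal matrices explicitly.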
Special Distributions
The expressions for cubic and quartic expectations given below are
restricted to the following special distributions:
Independent
- [x:Real Independent] means that the components of x are real and
independent. In particular, we require that
E(x_i^p x_k^q) = E(x_i^p) E(x_k^q) for i != k.
We define m_r = E((x-m)^r) where
the r'th power of the vector is elementwise. Note that
S = DIAG(m_2) and m_1 = 0.
- [x:Real Gaussian] means that the
components of x[n] are real and have a multivariate
Gaussian pdf: x ~ N(x ; m, S) =
|2×pi×S|^{-½} exp(-½(x-m)^T S^{-1} (x-m))
where S is symmetric and +ve semidefinite (and must be non-singular for
the pdf as written to exist).
- ln(p(x)) = -½ ln(det(2×pi×S)) - ½(x-m)^T S^{-1} (x-m)
- If x is both Gaussian and Independent then
m_r = diag((½S)^{½r}) × r! / (½r)! for even r and 0 for odd r.
- N(x ; m, S) = N(m ; x, S) =
|det(A)| × N(Ax+b ; Am+b, ASA^T) for any
b and non-singular A.
- N(x ; m, S) = |a|^n × N(ax+b ; am+b, a^2 S)
for any b and non-zero a.
- N(x ; m, S) = N(-m ; -x, S) =
N(m ; x, S) = N(x+a ; m+a, S) for any a.
- N(Ax; u, R) = N(x; m, S) × N(0; u, R) / N(0; m, S) where
S = (A^T R^{-1} A)^{-1} and m = S A^T R^{-1} u for any
A (not necessarily square) with full column rank.
- If x ~ N(x; m, S) then
- y = Ax+b ~ N(y; Am+b, ASA^T) [5.10]
- y = ax+b ~ N(y; am+b, a^2 S)
- y = F^{-T}(x-m) ~ N(y; 0, I) where F^T F = S. It is
not necessary for F to be symmetric but it can always be chosen to be
[see Hermitian].
- If SQ = QD with Q an orthonormal set of eigenvectors
and D diagonal containing the corresponding positive eigenvalues, then we
can define F = D^{½} Q^T giving F^{-T} = D^{-½} Q^T.
- x | A^T x = b ~ N(x; (I-HA^T)m + Hb, (I-HA^T)S) where
H = SA(A^T S A)^+ and ()^+ denotes the pseudoinverse or, if
non-singular, the inverse. The symmetry of the covariance may be shown
explicitly by writing (I-HA^T)S = S - SA(A^T S A)^+ A^T S. [5.9]
- x | a^T x = b ~ N(x; m + (b - a^T m)(a^T S a)^{-1} × Sa,
S - (a^T S a)^{-1} × Saa^T S)
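The conditioning formula for x | A^T x = b can be sketched as follows. This is a minimal illustration assuming NumPy; the function name and the numerical values of m, S, A and b are hypothetical. The conditional mean should satisfy the constraint exactly, and the conditional covariance should have no variance along the constrained directions.

```python
import numpy as np

def condition_on_linear_constraint(m, S, A, b):
    """Mean and covariance of x | A^T x = b for x ~ N(x; m, S)."""
    H = S @ A @ np.linalg.pinv(A.T @ S @ A)   # H = S A (A^T S A)^+
    P = np.eye(len(m)) - H @ A.T              # I - H A^T
    return P @ m + H @ b, P @ S

# Hypothetical example: constrain x_1 + x_2 = 3
m = np.array([1.0, -2.0, 0.5])
S = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
A = np.array([[1.0], [1.0], [0.0]])
b = np.array([3.0])

mc, Sc = condition_on_linear_constraint(m, S, A, b)
```

Using `np.linalg.pinv` follows the ()^+ convention of the text, so the same sketch also covers a singular A^T S A.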
- Joint Distribution
If [x; y] ~ N([x; y]; [p; q], [P R^T; R Q]) then in the sections below,
we define the regression coefficient matrix of x on y as
F = R^T Q^{-1}.
- Linear Sum
- z = Ax+By+c ~ N(z; Ap+Bq+c, APA^T + BRA^T + AR^T B^T + BQB^T)
- Conditional Distributions
- x | y ~ N(x; p+F(y-q), P - FR). [5.8]
- The mean, p+F(y-q), is the regression function of x on y.
- The covariance, C = P - FR, is the Schur complement of Q in
[P R^T; R Q].
- y | x ~ N(y; q+RP^{-1}(x-p), Q - RP^{-1}R^T). The covariance is the
Schur complement of P in [P R^T; R Q].
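As a rough numerical illustration of the conditional distribution of x given y, the sketch below assumes NumPy and uses a hypothetical joint covariance partitioned so that x is the first two elements and y the last; the function name is not from the manual.

```python
import numpy as np

def conditional_gaussian(p, q, P, R, Q, y):
    """Mean and covariance of x | y for [x; y] ~ N([p; q], [P R^T; R Q])."""
    F = np.linalg.solve(Q, R).T     # F = R^T Q^(-1), using the symmetry of Q
    return p + F @ (y - q), P - F @ R

# Hypothetical joint covariance (illustration only)
S = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
P, R, Q = S[:2, :2], S[2:, :2], S[2:, 2:]
p, q = np.array([1.0, -2.0]), np.array([0.5])

mean_xy, cov_xy = conditional_gaussian(p, q, P, R, Q, y=np.array([1.5]))
```

Solving with Q rather than inverting it is a design choice for numerical stability; the result is the same regression matrix F.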
- Independence: The following are equivalent:
- x and y are independent
- R = 0
- W = 0 in the precision matrix: T = S^{-1} = [P R^T; R Q]^{-1} =
[U W^T; W V]
- N([x; y]; [p; q], [P R^T; R Q]) = N(x; p, P) × N(y; q, Q)
- Multiple Correlation Coefficients
The vector of multiple correlation coefficients between x and
y has the same dimension as x and is given by
g_{x|y} = sqrt(diag(FR ÷ P)) where the sqrt() function and ÷
are elementwise.
- The minimum (over A) of tr(Cov(x - A^T y)) is obtained when
A = F^T and is equal to tr(P - FR).
- The maximum (over a) of the correlation between x_i and a^T y is
obtained when a = (F^T)_i = Q^{-1} r_i and is equal to
(g_{x|y})_i. [5.13]
- All elements of g_{x|y} lie in the range 0 to 1.
- diag(Cov(x|y) ÷ Cov(x)) = diag((P - FR) ÷ P) =
(1 - g_{x|y} • g_{x|y}) where • and ÷ denote elementwise multiplication
and division.
- Var(x_i|y) = (1 - (g_{x|y})_i^2) Var(x_i) showing that conditioning
reduces variance.
- (g_{x|y})_i^2 = 1 - p_{ii}^{-1} det([p_{ii}, r_i^T; r_i, Q]) det(Q)^{-1}
[5.14]
- Precision Matrix:
The precision matrix T = S^{-1} = [P R^T; R Q]^{-1} = [U W^T; W V].
- With F = R^T Q^{-1} as defined above: [3.5]
- U = Cov(x | y)^{-1} = (P - FR)^{-1}
- W = -F^T U
- V = Q^{-1} + F^T U F
- The elements of T are given by
- t_{ii} = Var(x_i | x_{\i})^{-1} where x_{\i} denotes the
vector x with x_i deleted.
- t_{ij} = -Corr(x_i, x_j | x_{\i,j}) × (t_{ii} t_{jj})^{½}
- t_{ij} = 0 iff x_i and x_j
are conditionally independent given x_{\i,j}.
- If I, J and K are a partitioning of the indices 1:n, then
- The submatrix T_{I,J} = 0 iff x_I and x_J are conditionally
independent: p(x_I, x_J | x_K) = p(x_I | x_K) × p(x_J | x_K)
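The link between the precision matrix and conditional independence can be checked on a small example. The sketch below assumes NumPy; the function names are hypothetical, and the covariance s_{ij} = 0.5^{|i-j|} is an assumed first-order Markov example in which x_1 and x_3 should be conditionally independent given x_2.

```python
import numpy as np

def partial_correlations(S):
    """Correlation of each pair x_i, x_j given all the remaining elements."""
    T = np.linalg.inv(S)                  # precision matrix
    d = np.sqrt(np.diag(T))
    C = -T / np.outer(d, d)               # -t_ij / sqrt(t_ii t_jj)
    np.fill_diagonal(C, 1.0)
    return C

def conditional_variances(S):
    """Variance of each x_i given all the other elements: 1 / t_ii."""
    return 1.0 / np.diag(np.linalg.inv(S))

# Assumed Markov-chain covariance (illustration only)
idx = np.arange(3)
S = 0.5 ** np.abs(idx[:, None] - idx[None, :])

pc = partial_correlations(S)              # pc[0, 2] is zero up to rounding
cv = conditional_variances(S)
```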
- Product of Gaussians:
N(x ; a, A) N(x ; b, B) = N(a ; b, A+B) × N(x ; c, C) [5.1] where
- C = (A^{-1}+B^{-1})^{-1} = A(A+B)^{-1}B = B(A+B)^{-1}A
- c = C(A^{-1}a+B^{-1}b) = A(A+B)^{-1}b + B(A+B)^{-1}a
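The product identity can be verified pointwise. This is a minimal sketch assuming NumPy; the helper names and all numerical values are hypothetical.

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Real multivariate Gaussian density N(x; m, S)."""
    d = x - m
    _, logdet = np.linalg.slogdet(2 * np.pi * S)
    return np.exp(-0.5 * logdet - 0.5 * d @ np.linalg.solve(S, d))

def gauss_product(a, A, b, B):
    """Return (c, C) such that N(x;a,A) N(x;b,B) = N(a;b,A+B) N(x;c,C)."""
    W = np.linalg.inv(A + B)
    C = A @ W @ B
    c = A @ W @ b + B @ W @ a
    return c, C

# Hypothetical values (illustration only)
a, A = np.array([0.0, 1.0]), np.array([[1.0, 0.2], [0.2, 2.0]])
b, B = np.array([1.0, -1.0]), np.array([[0.5, 0.0], [0.0, 0.5]])
x = np.array([0.3, 0.4])

c, C = gauss_product(a, A, b, B)
lhs = gauss_pdf(x, a, A) * gauss_pdf(x, b, B)
rhs = gauss_pdf(a, b, A + B) * gauss_pdf(x, c, C)   # agrees with lhs
```

The form A(A+B)^{-1}B is used for C because it needs only one inverse and stays valid when A or B is singular, provided A+B is not.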
- Power of Gaussian:
- N(x ; a, A)^m = N(0 ; 0, m(2×pi)^{m-2} A^{m-1}) ×
N(x ; a, m^{-1}A) [5.2]
- N(x ; a, A)^2 = N(0 ; 0, 2A) × N(x ; a, ½A)
- Quotient of Gaussians:
N(x ; c, C) / N(x ; a, A) = N(x ; b, B) / N(a ; b, A+B)
provided (A-C) is non-singular, where
- B = (C^{-1}-A^{-1})^{-1} = A(A-C)^{-1}C = C(A-C)^{-1}A
- b = B(C^{-1}c-A^{-1}a) = A(A-C)^{-1}c - C(A-C)^{-1}a
- Characteristic Function and Generating Functions:
- The characteristic function of x is a function of the real
vector t and is phi(t) = E(exp(j t^T x)) =
E(cos(t^T x)) + j E(sin(t^T x)) =
exp(j t^T m - ½ t^T S t) where j=sqrt(-1).
- The moment generating function of x is a function of the real
vector t and is M(t) = E(exp(t^T x)) =
exp(t^T m + ½ t^T S t). The moments of x
are the derivatives of M(t) evaluated at t=0.
- The cumulant generating function of x is a function of the
real vector t and is g(t) = ln E(exp(t^T x)) =
t^T m + ½ t^T S t. The cumulants of x are the
derivatives of g(t) evaluated at t=0.
- Differential Entropy
The differential entropy of x is h(x) =
-E(ln(p(x))) = ½ ln(det(2 pi e S)) nats =
½ log2(det(2 pi e S)) bits
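The entropy expression is straightforward to evaluate. The sketch below assumes NumPy and uses a hypothetical diagonal covariance; `slogdet` is used rather than `det` as a design choice so that the determinant of a large or badly scaled matrix does not overflow or underflow.

```python
import numpy as np

def gaussian_entropy_nats(S):
    """h(x) = 0.5 ln det(2 pi e S) for a real Gaussian with covariance S."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * S)
    if sign <= 0:
        raise ValueError("S must be positive definite")
    return 0.5 * logdet

# Hypothetical covariance (illustration only)
S = np.diag([1.0, 4.0])
h_nats = gaussian_entropy_nats(S)
h_bits = h_nats / np.log(2)        # convert nats to bits
```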
- Cramer-Rao bound
Suppose m[n#1] and S[n#n] are
functions of a parameter vector q[p#1] and that we
take k independent samples of x to form the columns of a data
matrix X[n#k]. In the expressions below,
¤ denotes the Kronecker product, :
denotes vectorization and
dS/dq is a matrix of dimension
n^2#p (see derivatives).
- ln(p(X)) = -½k ln(det(2×pi×S)) -
½ tr((X-M)^T S^{-1} (X-M)) where M[n#k] = m×1[k#1]^T.
- The Fisher Score vector, v, is defined by
v^T = d/dq (ln(p(X)))
= 1[k#1]^T (X-M)^T S^{-1} dm/dq
- ½(k S^{-1} - S^{-1}(X-M)(X-M)^T S^{-1}):^T dS/dq
= 1[k#1]^T (X-M)^T S^{-1} dm/dq
- ½(k S^{-1}:^T - ((X-M)(X-M)^T):^T (S^{-1} ¤ S^{-1})) dS/dq [5.15]
- E(v) = 0
- [k=1] v^T = d/dq (ln(p(X))) =
(x-m)^T S^{-1} dm/dq
- ½(S^{-1} - S^{-1}(x-m)(x-m)^T S^{-1}):^T dS/dq
- The Fisher Information Matrix is defined by J =
E(vv^T) = k ((dm/dq)^T S^{-1} dm/dq +
½(dS/dq)^T (S ¤ S)^{-1} dS/dq) [5.16]
- The i,j element of J is given by J_{ij}
= k ((dm/dq_i)^T S^{-1} dm/dq_j +
½ tr(R_i S^{-1} R_j S^{-1}))
where R_i satisfies R_i: = dS/dq_i
- Cramer-Rao bound: If f[r#1](X) is a
function of X with mean value g(q), then Cov(f)
>= dg/dq J^{-1} (dg/dq)^T where >= represents
the Loewner partial order. [5.17]
- If g(q) = aq then Cov(f) >= a^2 J^{-1}
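The elementwise expression for J lends itself to a direct implementation. The sketch below assumes NumPy; the function name, the argument layout (lists of per-parameter derivatives) and the scalar example with q = [mean; variance] are all assumptions made for illustration. For that example the expected result is J = k × DIAG([1/s, 1/(2s^2)]) where s is the variance.

```python
import numpy as np

def gaussian_fisher_information(S, dm, dS, k=1):
    """J_ij = k (dm_i^T S^-1 dm_j + 0.5 tr(R_i S^-1 R_j S^-1)).

    dm is a list of the p vectors dm/dq_i and dS a list of the p matrices
    R_i = dS/dq_i (assumed layout, not from the manual).
    """
    Sinv = np.linalg.inv(S)
    p = len(dm)
    J = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            J[i, j] = k * (dm[i] @ Sinv @ dm[j]
                           + 0.5 * np.trace(dS[i] @ Sinv @ dS[j] @ Sinv))
    return J

# Hypothetical scalar example: q = [mean, variance] with variance 2, k = 10
S = np.array([[2.0]])
dm = [np.array([1.0]), np.array([0.0])]        # dm/dmean, dm/dvariance
dS = [np.zeros((1, 1)), np.ones((1, 1))]       # dS/dmean, dS/dvariance

J = gaussian_fisher_information(S, dm, dS, k=10)
crb = np.linalg.inv(J)   # lower bound on Cov(f) for unbiased f (g(q) = q)
```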
Definition: In this section, <=> represents the Complex-to-Real isomorphism and <->
represents the related vector mapping.
[x[n]:Complex Gaussian]
means that if x[n] <-> y[2n], then y ~ N(y ; a, ½K) for some complex
m[n] <-> a[2n] and +ve
definite Hermitian S[n#n] <=>
K[2n#2n]. In other words, the real and
imaginary components of x are jointly Gaussian with a symmetric
covariance matrix that lies in the range of the complex-to-real isomorphism.
- E(x) = m[n] <->
a[2n]
- Cov(x) = E((x-m)(x-m)^H) = E(xx^H) - mm^H =
S[n#n] <=> K[2n#2n] [5.3]
- N(y ; a, ½K) =
|pi×K|^{-½} exp(-(y-a)^T K^{-1} (y-a)) =
|pi×S|^{-1} exp(-(x-m)^H S^{-1} (x-m))
- If S is diagonal (and hence also real) then N(x ; m,
S) = N(y ; a, ½K) =
N(|x-m| ; 0, ½S) × |pi×S|^{-½}. Thus we can express a complex pdf as a
truncated real pdf of the same dimension if the components of x are
independent.
- K[2n#2n] may be divided into 2#2
Toeplitz blocks of the form [a -b; b a]
(see Givens Rotation)
- All the 2#2 blocks of K that lie on the main diagonal are positive
multiples of I. That is, for each component of x, the real and
imaginary parts have the same variance and are uncorrelated.
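The equality of the real and complex forms of the pdf can be checked numerically. The sketch below assumes NumPy and assumes an interleaved ordering for the mapping, so that y = [Re x_1; Im x_1; Re x_2; ...] and each element s_{ij} = a + jb of S becomes the 2#2 block [a -b; b a] of K. The function names and numerical values are hypothetical.

```python
import numpy as np

def complex_to_real_matrix(S):
    """Real image K of complex S: each element a+jb becomes [a -b; b a]."""
    J2 = np.array([[0.0, -1.0], [1.0, 0.0]])
    return np.kron(S.real, np.eye(2)) + np.kron(S.imag, J2)

def complex_to_real_vector(x):
    """Interleave real and imaginary parts: [Re x_1, Im x_1, Re x_2, ...]."""
    return np.column_stack([x.real, x.imag]).ravel()

# Hypothetical Hermitian positive definite S, mean m and test point x
S = np.array([[2.0, 0.5 + 0.3j], [0.5 - 0.3j, 1.0]])
m = np.array([1.0 + 1.0j, -0.5j])
x = np.array([0.3 - 0.2j, 1.0 + 0.4j])

K = complex_to_real_matrix(S)
a, y = complex_to_real_vector(m), complex_to_real_vector(x)
n, d = len(x), x - m

# |pi S|^-1 exp(-(x-m)^H S^-1 (x-m))
pdf_complex = (np.exp(-np.real(d.conj() @ np.linalg.solve(S, d)))
               / (np.pi ** n * np.real(np.linalg.det(S))))
# |pi K|^-1/2 exp(-(y-a)^T K^-1 (y-a))
pdf_real = (np.exp(-(y - a) @ np.linalg.solve(K, y - a))
            / np.sqrt(np.linalg.det(np.pi * K)))
```

The two densities agree because det(K) = |det(S)|^2 and the quadratic forms coincide under the mapping.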
- Distribution of Real and Imaginary Parts:
- E(x_R) = m_R
- E(x_I) = m_I
- Cov(x_R) = Cov(x_I) =
E((x_R-m_R)(x_R-m_R)^T)
= E((x_I-m_I)(x_I-m_I)^T)
= ½S_R
- Distribution of Magnitude:
We define s = sqrt(diag(S))
to be a positive real-valued vector of standard deviations and •,
÷ and ^{•2} to be elementwise operators
- E(abs(x)) = ½ pi^{½} s • exp(-abs(m÷s)^{•2})
• 1F1(1.5, 1; abs(m÷s)^{•2})
where 1F1(a;b;z) = M(a,b,z)
is the Confluent Hypergeometric or Kummer function (hypergeom.m in
MATLAB) [R.16]
- [m=0] E(abs(x)) = ½ pi^{½} s
- [m=0] Cov(abs(x)) = ¼ pi ss^T • (
2F1(-0.5, -0.5; 1; (ABS(S)÷ss^T)^{•2}) - 1) where
2F1(a,b;c;z) is
the Gauss Hypergeometric function (hypergeom.m in MATLAB) [R.16]
Linear Expectations
- E(Ax + b) = Am + b
- E(Ax) = Am
- E(x + b) = m + b
- Cov(Ax + b) = ASA^H
- E(tr(Y)) = tr(E(Y)) where Y depends on x.
Quadratic Expectations
- E((Ax + a)(Bx + b)^H) = ASB^H + (Am+a)(Bm+b)^H
- E(xx^H) = S + mm^H
- E(xx^H a) = (S + mm^H)a
- E(a^H xx^H) = a^H(S + mm^H)
- E((Ax)(Ax)^H) = A(S + mm^H)A^H
- E((x + a)(x + a)^H) = S + (m+a)(m+a)^H
- E((Ax+a)^H (Bx+b)) = tr(BSA^H) + (Am+a)^H (Bm+b)
- E(x^H x) = tr(S) + m^H m
- E(x^H Ax) = tr(AS) + m^H Am
- E((Ax)^H (Ax)) = tr(ASA^H) + (Am)^H (Am)
- E((x+a)^H (x+a)) = tr(S) + (m+a)^H (m+a)
- E((Ax + a) ¤ (Bx + b)) = (A ¤ B)
S: + (Am + a) ¤ (Bm + b)
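The quadratic expectations above hold for any distribution with mean m and covariance S, so they can be checked exactly on an empirical distribution. In the sketch below, which assumes NumPy, the k columns of a hypothetical random data matrix are treated as equally likely outcomes, so that m and S are the sample mean and the (biased) sample covariance. The identity E(x^H Ax) = tr(AS) + m^H Am then holds up to rounding error rather than only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 500

# Hypothetical complex data and matrix A (illustration only)
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

m = X.mean(axis=1)                 # E(x) over the empirical distribution
D = X - m[:, None]
S = D @ D.conj().T / k             # Cov(x) = E((x-m)(x-m)^H)

# E(x^H A x): average the quadratic form over all k outcomes
lhs = np.mean(np.einsum('ik,ij,jk->k', X.conj(), A, X))
rhs = np.trace(A @ S) + m.conj() @ A @ m
```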
For [x:Real Gaussian] :
- E(x • x) = diag(S) + m • m = diag(S + mm^T) [5.6]
- Cov(x • x) = 2 S • (S + 2mm^T) [5.7]
For [x:Complex Gaussian] :
- E(xx^T) = mm^T [5.3]
- E(x • x^C) = diag(S) + m • m^C [5.4]
- Cov(x • x^C) = E((x • x^C)(x^T • x^H)) - E(x • x^C) E(x • x^C)^T
= S • S^C + 2(mm^H • S^T)_R [5.5]
Cubic Expectations
For [x:Real Independent] :
- E((Ax + a)(Bx + b)^T (Cx + c)) = A DIAG(B^T C) m_3 +
tr(BSC^T)×(Am+a) + ASC^T (Bm+b) +
(ASB^T + (Am+a)(Bm+b)^T)(Cm+c)
- E(xx^T x) = m_3 + 2Sm + (tr(S) + m^T m)×m
- E((Ax + a)(Ax + a)^T (Ax + a)) = A DIAG(A^T A) m_3 +
(2ASA^T + (Am+a)(Am+a)^T)(Am+a) + tr(ASA^T)×(Am+a)
- [m_3=0] E((Ax + a) b^T (Cx + c)(Dx + d)^T) =
(Am+a) b^T (CSD^T + (Cm+c)(Dm+d)^T) +
(ASC^T + (Am+a)(Cm+c)^T) b (Dm+d)^T +
b^T(Cm+c) × (ASD^T - (Am+a)(Dm+d)^T)
- [m_3=0] E(x b^T x x^T) = m b^T (S + mm^T) +
(S + mm^T) b m^T + b^T m × (S - mm^T)
For [x:Real Gaussian] :
- E((Ax + a)(Bx + b)^T (Cx + c)) =
ASB^T (Cm+c) + ASC^T (Bm+b) + tr(BSC^T)×(Am+a) +
(Am+a)(Bm+b)^T (Cm+c)
- E(xx^T x) = 2Sm + (tr(S) + m^T m)×m
- E((Ax + a)(Ax + a)^T (Ax + a)) =
(2ASA^T + (Am+a)(Am+a)^T)(Am+a) + tr(ASA^T)×(Am+a)
Quartic Expectations
For [x:Independent] :
For [x:Real Gaussian] :
- E((Ax + a)(Bx + b)^T (Cx + c)(Dx + d)^T) =
(ASB^T + (Am+a)(Bm+b)^T)(CSD^T + (Cm+c)(Dm+d)^T) +
(ASC^T + (Am+a)(Cm+c)^T)(BSD^T + (Bm+b)(Dm+d)^T) +
(Bm+b)^T (Cm+c) × (ASD^T - (Am+a)(Dm+d)^T) +
tr(BSC^T) × (ASD^T + (Am+a)(Dm+d)^T)
- E(xx^T xx^T) = 2(S + mm^T)^2 + m^T m × (S - mm^T)
+ tr(S) × (S + mm^T)
- E(xx^T Axx^T) = E((x^T Ax) × xx^T)
= (S + mm^T)(A + A^T)(S + mm^T)
+ m^T Am × (S - mm^T) + tr(AS) × (S + mm^T)
- [m=0] E(xx^T Axx^T) = SAS + SA^T S + tr(AS) × S
- E((Ax + a)(Ax + a)^T (Ax + a)(Ax + a)^T) =
2(ASA^T + (Am+a)(Am+a)^T)^2 +
(Am+a)^T (Am+a) × (ASA^T - (Am+a)(Am+a)^T) +
tr(ASA^T) × (ASA^T + (Am+a)(Am+a)^T)
- E((Ax + a)^T (Bx + b)(Cx + c)^T (Dx + d)) =
tr(AS(C^T D + D^T C)SB^T)
+ ((Am+a)^T B + (Bm+b)^T A) S (C^T (Dm+d) + D^T (Cm+c)) +
(tr(ASB^T) + (Am+a)^T (Bm+b)) (tr(CSD^T) + (Cm+c)^T (Dm+d))
- E(x^T x x^T x) = 2tr(S^2) + 4m^T Sm + (tr(S) + m^T m)^2
- E(x^T Ax x^T Bx) = tr(AS(B + B^T)S) +
m^T (A + A^T) S (B + B^T) m +
(tr(AS) + m^T Am)(tr(BS) + m^T Bm)
- [m=0] E(x^T Ax x^T Bx) = tr(AS(B + B^T)S) + tr(AS) × tr(BS)
- E(a^T x b^T x c^T x d^T x) =
(a^T (S + mm^T) b)(c^T (S + mm^T) d) +
(a^T (S + mm^T) c)(b^T (S + mm^T) d) +
(a^T (S + mm^T) d)(b^T (S + mm^T) c) -
2 a^T m b^T m c^T m d^T m
- E((Ax + a)^T (Ax + a)(Ax + a)^T (Ax + a)) =
2tr(ASA^T ASA^T) + 4(Am+a)^T ASA^T (Am+a) +
(tr(ASA^T) + (Am+a)^T (Am+a))^2
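One of the quartic results, the Real Gaussian expression for E(x^T Ax x^T Bx), can be cross-checked without Monte Carlo noise. The sketch below assumes NumPy and compares the closed form with a tensor-product Gauss-Hermite quadrature, which is exact for polynomials of this degree. The function names, the quadrature order and the values of m, S, A and B are assumptions made for illustration.

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite_e import hermegauss

def quartic_form_expectation(A, B, m, S):
    """Closed form for E(x^T A x x^T B x) with real Gaussian x ~ N(m, S)."""
    Bs = B + B.T
    return (np.trace(A @ S @ Bs @ S)
            + m @ (A + A.T) @ S @ Bs @ m
            + (np.trace(A @ S) + m @ A @ m) * (np.trace(B @ S) + m @ B @ m))

def quadrature_expectation(f, m, S, deg=6):
    """E(f(x)) by tensor-product Gauss-Hermite quadrature."""
    z, w = hermegauss(deg)               # nodes/weights for exp(-z^2/2)
    w = w / np.sqrt(2 * np.pi)           # normalise to the N(0,1) density
    L = np.linalg.cholesky(S)            # x = m + L z with z ~ N(0, I)
    total = 0.0
    for idx in product(range(deg), repeat=len(m)):
        idx = list(idx)
        total += np.prod(w[idx]) * f(m + L @ z[idx])
    return total

# Hypothetical values (illustration only)
m = np.array([0.5, -1.0])
S = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.array([[1.0, 2.0], [0.0, -1.0]])
B = np.array([[0.5, 0.0], [1.0, 3.0]])

closed_form = quartic_form_expectation(A, B, m, S)
numeric = quadrature_expectation(lambda x: (x @ A @ x) * (x @ B @ x), m, S)
```

The quadrature cost grows as deg^n, so this kind of check is only practical for small n.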
High Powers
For [x:Real Gaussian] :
- [n: odd] E(prod(x[n]-m)) = 0. [5.18]
- [n: even] E(prod(x[n]-m)) =
((½n)!)^{-1} 2^{-½n}
sum_v(s_{v(1),v(2)} s_{v(3),v(4)} ... s_{v(n-1),v(n)})
where the sum is over all n! permutations v of the numbers
1:n. [5.18]
Note that each term in the summation arises
(½n)! 2^{½n} times since the
½n factors s_{ij} can be rearranged in
(½n)! orders and for each factor s_{ij} =
s_{ji} since S is symmetric. Thus an equivalent formula
is to omit the normalizing factor,
((½n)!)^{-1} 2^{-½n}, and restrict the
summation to all distinct pairings of the numbers 1:n. This is Wick's
theorem.
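The pairing form of the result can be sketched directly. The code below assumes NumPy and enumerates all distinct pairings recursively; the function names are hypothetical and the index list may repeat an index to obtain higher powers of a single element. The number of pairings grows as (n-1)(n-3)...1, so this is intended only for small n.

```python
import numpy as np

def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def central_moment(S, indices):
    """E(prod_k (x[i_k] - m[i_k])) for a real Gaussian with covariance S."""
    if len(indices) % 2:
        return 0.0                        # odd-order central moments vanish
    return sum(np.prod([S[i, j] for i, j in p])
               for p in pairings(list(indices)))

# Hypothetical covariance (illustration only)
S = np.array([[2.0, 0.5], [0.5, 1.0]])
fourth = central_moment(S, [0, 0, 0, 0])  # 3 s_00^2
mixed = central_moment(S, [0, 0, 1, 1])   # s_00 s_11 + 2 s_01^2
```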
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: expect.html 2851 2013-03-27 09:11:24Z dmb $