# Stochastic Matrices

Go to: Introduction, Notation, Index

## Notation

In all the expressions below, x is a vector of real or complex random variables with whose mean vector and covariance matrix are given by: <x> = m and Cov(x)=<(x-m)(x-m)H> = S. We define the real-valued vector of variances s=diag(S).

•,  ÷, •2, ½, abs() and exp() are elementwise operators for multiplication, division, square, square root, absolute value and exponentiation. ⊗ denotes the Kroneker product and j dentotes (-1)½. <Y> denotes the expected value of Y. ||y|| = √(yHy) is the Euclidean vector norm and |Y| is the matrix determinant.

Vectors and matrices a, A, b, B, c, C, d and D are constant (i.e. not dependent on x).

## General Properties

• The covariance matrix S is Hermitian and positive semi-definite.
• S is strictly positive definite unless there is a deterministic relation between the elements of x of the form aHx = 0 for some non-zero a.
• If the elements of x are uniformly spaced samples from a continuous signal, then S is Toeplitz.
• The symmetric correlation coefficient matrix  (also called correlation matrix) is Corr(x) = S÷(ssT)½ = DIAG(s•-½) S DIAG(s•-½).
WARNING: Correlation matrix is also used by some to dentote the matrix <xxT> = S + mmT.
• The Correlation Coefficient between xi and xj equals Corr(x)i,j = <xi xj>/âˆš(<xi2><xi2>) and has magnitude <= 1.  [5.11]
• The precision matrix is T = S-1.

## Special Distributions

The expressions for cubic and quartic expectations given below are restricted to the following special distributions:

### Independent

• [x:Real Independent] means that the components of x are real and independent. In particular, we require that <xipxkq>=<xip><xkq>. We define mr=<(x-m)r> where the râ€™th power of the vector is elementwise. Note that S=DIAG(m2) and m1=0.

### Real Gaussian

• [x:Real Gaussian] means that the components of x[n] are Real and have a multivariate Gaussian pdf: x ~ N(x ; m, S) =  |2πS|exp( -½(x-m)T S-1 (x-m) ) where S is symmetric and +ve semidefinite.
• ln(p(x)) = -½ ln(det(2π×S)) - ½(x-m)T S-1 (x-m)
• If x is both Gaussian and Independent then mr = diag( (½S)½r r! / (½r)!) for even r and 0 for odd r.
• N(x ; m, S) = N(m ; x, S) =  |A| N(Ax+b ; Am+b, ASAT) for any b and non-singular A.
• N(x ; m, S) = an N(ax+b ; am+b, a2S)  for any b and non-zero a.
• N(x ; m, S) = N(-m ; -x, S) = N(m ; x, S) = N(x+a ; m+a, S) for any a.
• N(Ax+b; m, S) = N(x; u, C) × N(b; m, S) / N(0; u, C) where C = (ATS-1A)-1 and u = CATS-1(m-b) for any A (not necessarily square) with full column rank.
• N(ax+b; m, S) = N(x; u, c2) × N(b; m, S) / N(0; u, c2) where c2 = (aTS-1a)-1 and u = c2aTS-1(m-b)
• If x ~ N(x; m, S) then
• y = Ax+b  ~  N(y; Am+b, ASAT)   [5.10]
• y = ax+b  ~  N(y; am+b, a2S)
• y = F-T(x-m) ~ N(y; 0, I) where FTF= S. It is not necessary for F to be symmetric but it can always be chosen to be [see Hermitian].
• If SQ = QD with Q an orthonormal set of eigenvectors and D diagonal the corresponding positive eigenvalues, then we can define F= D½QT giving F-T= QD.
• x | ATx=b ~ N(x; (I-HAT)m+Hb, (I-HAT)S) where H=SA(ATSA)+ where ()+ denotes the pseudoinverse or, if non-singular, the inverse. The symmetry of the covariance may be shown explicitly by writing (I-HAT)S) = S - SA(ATSA)+ATS.   [5.9]
• x | aTx=b ~ N(x; m+(b - aTm)(aTSa)-1×Sa, S - (aTSa)-1×SaaTS)
• Orthant Probabilities
• Define P(x>m) to be the probability that all elements of x exceed the corresponding element of m and define si,j and si to be elements of S and s respectively. The following formulae do not generalize to n>3:
• [n=2]: P(x>m) = 0.25+(2π)-1arcsin(s1,2(s1s2))
• [n=3]: P(x>m) = 0.125+(4π)-1(arcsin(s1,2(s1s2)) + arcsin(s1,3(s1s3)) + arcsin(s2,3(s2s3)))
• Joint Distribution
If [x; y] ~ N([x; y]; [p; q], [P RT; R Q]) then in the sections below, we define the regression coefficient matrix of x on y as F=RTQ-1 .
• Linear Sum
• z = Ax+By+c ~ N(z; Ap+Bq+c, APAT + BRAT + ARTBT + BQBT)
• Conditional Distributions
• x | y ~ N(x; p+F(y-q), P - FR).  [5.8]
• The mean, p+F(y-q), is the regression function of x on y.
• The covariance, C = P - FR is is the Schur complement of Q in [P RT; R Q].
• y | x ~ N(y; q+RP-1(x-p), Q - RP-1RT). The covariance is the Schur complement of P in [P RT; R Q].
• Independence: The following are equivalent:
1. x and y are independent
2. R = 0
3. W = 0 in the precision matrix: T = S-1 = [P RT; R Q]-1 = [U WT; W V]
4. N([x; y]; [p; q], [P RT; R Q]) = N(x; p, P) × N(y; q, Q)
• Multiple Correlation Coefficients
The vector of multiple correlation coefficients between x and y has the same dimension as x and is given by gx|y = âˆš(diag(FR ÷ P) ) where the âˆš() function and  ÷  are elementwise.
• The minimum (over A) of tr(Cov(x - ATy)) is obtained when A = FT and is equal to tr(P - FR).
• The maximum (over a) of the correlation between xi and aTy is obtained when a = (FT)i = Q-1ri and is equal to (gx|y)i.   [5.13]
• All elements of gx|y lie in the range 0 to 1.
• diag(Cov(x|y) ÷ Cov(x)) = diag((P - FR) ÷ P) = (1- gx|ygx|y) where  • and  ÷ denote elementwise multiplication and division.
• Var(xi|y) = (1-(gx|y)i2) Var(xi) showing that conditioning reduces variance.
• (gx|y)i2 = 1 - pii-1 det([pii , riT; ri , Q]) det(Q)-1  [5.14]
• Precision Matrix:
The precision matrix T = S-1 = [P RT; R Q]-1 = [U WT; W V].
• If we write the and define, then   [3.5]
• U = Cov(x | y)-1 = (P - FR)-1
• W = -FTU
• V = Q-1+FTUF
• The elements of T are given by
• tii = Var(xi | x\i) where x\i denotes the vector x with xi deleted.
• tij = -Corr(xi, xj | x\i,j)×(tii tjj)½
• tij = 0 iff xi and xj are conditionally independent given x\i,j.
• If I, J and K are a partitioning of the indices 1:n, then
• The submatrix TI,J = 0 iff xI and xJ are conditionally independent: p(xI, xJ | xK) = p(xI | xK) × p(xJ | xK)
• Product of Gaussians:
• N(x ; a, A) N(x ; b, B) = N(a ; b , A+B) × N(x ; c, C) [5.1] where
• C = (A-1+B-1)-1 = A(A+B)-1B = B(A+B)-1A = (I - K)A
• c = C(A-1a+B-1b) = A(A+B)-1b + B(A+B)-1a = a + K(b - a)
• K = CB-1 = A(A+B)-1 called the Kalman Gain in a Kalman filer.
• Power of Gaussian:
• N(x ; a, A)m = N(0 ; 0, m(2×π)m-2Am-1) × N(x ; a, m-1A) [5.2]
• N(x ; a, A)2 = N(0 ; 0, 2A) × N(x ; a, ½A)
• Quotient of Gaussians:
• N(x ; c, C) / N(x ; a, A) = N(x ; b, B) / N(a ; b , A+B) provided (A-C) is non-singular, where
• B = (C-1-A-1)-1 = A(A-C)-1C = C(A-C)-1A
• b = B(C-1c-A-1a) = A(A-C)-1- C(A-C)-1a
• Convolution of Gaussians
• N(x ; a, A) *N(x ; b, B) =  N(t ; a, A) N(x-t ; b, B)dt = N(x ; a+b, A+B)
• Exponential times Gaussian
• exp(aTx) N(x; m, S) = exp(½aT(2m+Sa)) N(x; m+Sa, S).
• Truncated Gaussian
• Suppose y ~ kN(y; m, S) is restricted to the domain satisfying b< aTy <c with k a normalizing constant. Then E(y) = m+grSa and Cov(y) = S - g2vSa(Sa)T where k=1/(F(q)-F(p)), g=1/âˆš(aTSa), p=g(b-aTm), q=g(c-aTm), r= (f(p)-f(q))/(F(q)-F(p)), v=(q f(q) - p f(p))/(F(q) - F(p)) + r2  and where f(q) and F(q) are the pdf and cdf respectively of a standard 1-dimensional Gaussian with f(q)=dF/dq. [5.22]
• Suppose y ~ kN(y; m, S) is restricted to the domain satisfying aTy <c with k a normalizing constant. Then E(y) = m+grSa and Cov(y) = S - g2vSa(Sa)T where k=1/F(q), g=1/âˆš(aTSa), q=g(c-aTm), r= -f(q)/F(q), vr2 - rq.
• Characteristic Function and Generating Functions:
• The characteristic function of x is a function of the real vector t and is ÃƒÂ¸(t) = <exp(jtTx)> = <cos(tTx)) + j <sin(tTx)> = exp(jtTm - ½tTSt) where j=âˆš(-1).
• The moment generating function of x is a function of the real vector t and is M(t) = <exp(tTx)> = exp(tTm + ½tTSt).  The moments of x are the derivatives of M(t) evaluated at t=0.
• The cumulant generating function of x is a function of the real vector t and is g(t) = ln <exp(tTx)> = tTm + ½tTSt. The cumulants of x are the derivatives of g(t) evaluated at t=0.
• Exponential of Gaussian (Lognormal distribution):
The random vector exp(Ax+b), where A and b may be complex and exp() operates elementwise, follows a multivariate lognormal distribution with the following means and covariances:
• <exp(Ax+b)> = exp(Am + b + ½ diag(ASAT))
• <exp(aTx+b)> = exp(aTm + b + ½aTSa)
• <exp(ax+b)> = exp(am + b + ½a2s) where s=diag(S)
• <exp(jx)> = exp(jm - ½s) where s=diag(S)
• Cov(exp(Ax+b)) = (<exp(Ax+b)>  <exp(Ax+b))H > (exp(ASAH) - 1) = (exp(Am + b + ½ diag(ASAT))  exp(Am + b + ½ diag(ASAT))H ) (exp(ASAH) - 1)
• Cov(exp(aTx+b)) = (<exp(aTx+b))  <exp(aTx+b))H > (exp(aHSa) - 1) = (exp(aTm + b + ½ diag(aTSa))  exp(aTm + b + ½ diag(aTSa))H ) (exp(aHSa) - 1)
• Cov(exp(ax+b)) = (<exp(ax+b)>  <exp(ax+b)>H ) (exp(|a|2S) - 1) = (exp(am + b +  ½a2s)  exp(am + b +  ½a2s)H ) (exp(|a|2S) - 1)
• Cov(exp(jx)) = (<exp(jx)>  <exp(jx)>H ) (exp(S) - 1) = (exp(jm - ½s)  exp(jm - ½s)H ) (exp(S) - 1)
• Entropy
• The differential entropy of x is h(x) =  -E{ln(p(x)} = ½ ln(|2 π e S|) nats = ½ log2(|2 π e S|) bits.
• Mutual Information
• If u and v with covariance matrices P and Q respectively are independent of each other and of x then
I(Ax+u, Bx+v) = ½ log2(|ASAT+P|×|BSBT+Q|/|[ASAT+P ASBT; BSAT BSBT+Q]|) = -½ log2(det(I - (BSBT+Q)-1BSAT(ASAT+P)-1ASBT)) bits
• I(Ax+u, bTx+v) = ½ log2(det(ASAT+P)(bTSb+q)/det([ASAT+P ASb; bTSAT bTSb+q])) = -½ log2(1 - bTSAT(ASAT+P)-1ASb / (bTSb+q)) bits
• I(Ax, Bx) = ½ log2(det(ASAT)det(BSBT)/det([A; B] S [A; B]T)) = -½ log2(det(I - (BSBT)-1BSAT(ASAT)-1ASBT)) bits.
• Cramer-Rao bound
Suppose m[n#1] and S[n#n] are functions of a parameter vector q[p#1] and that we take k independent samples of x to form the columns of a data matrix X[n#k]. In the expressions below,  ⊗ denotes the kroneker product, : denotes vectorization and dS/dq is a matrix of dimension n2#p (see derivatives).
• ln(p(X)) = -½k ln(det(2π×S)) - ½ tr((X-M)T S-1 (X-M))  where  M[n#k] = m×1[k#1]T.
• The Fisher Score vector, v, is defined by vT = d/dq (ln(p(X))
= 1[k#1]T(X-M)TS-1 dm/dq - ½(k S-1 - S-1(X-M)(X-M)TS-1):T dS/dq
= 1[k#1]T(X-M)TS-1 dm/dq - ½(k S-1:T - ((X-M)(X-M)T):T (S-1S-1)) dS/dq  [5.15]
• <v>= 0
• [k=1] vT = d/dq (ln(p(X)) =  (x-m)TS-1 dm/dq - ½(S-1 - S-1(x-m)(x-m)TS-1):T dS/dq
• The Fisher Information Matrix is defined by J = <vvT> = k ((dm/dq)T S-1 dm/dq + ½(dS/dq)T (SS)-1 dS/dq ) [5.16]
• The i,j element of J is given by Jijk ((dm/dqi)T S-1 dm/dqj + ½ tr(RiS-1RjS-1)) where Ri satisfies Ri: = dS/dqi
• Cramer-Rao bound: If f[r#1](X) is a function of X with mean value g(q), then Cov(f) >= dg/dq J-1 (dg/dq)T where >= represents the Loewner partial order[5.17]
• If g(q) = aq then Cov(f) >= a2 J-1

### Complex Gaussian

Definition: In this section, <=> represents the  Complex-to-Real isomporphism in which we replace each complex element, z, of a complex matrix C by a 2#2 real matrix [zR -zI; zI zR]=|z|×[cos(t) -sin(t); sin(t) cos(t)] where t=arg(z).  <-> represents the corresponding vector mapping in which we replace each complex element, z, of a complex vector c by a 2#1 real vector [zR; zI ].

[x[n]:Complex Gaussian] means that if x[n] <-> y[2n] , then y ~ N(y ; a, ½K) for some complex m[n] <-> a[2n] and +ve definite hermitian S[n#n] <=> K[2n#2n]. In other words, the real and imaginary components of x are jointly gaussian with a symmetric covariance matrix that lies in the range of the complex-to-real isomorphism.
• <x> = m[n] <-> a[2n]
• Cov(x) = <(x-m)(x-m)H> = <xxH> - mmH = S[n#n] <=> K[2n#2n]  [5.3]
• N(y ; a, ½K) = |π×K|exp( -(y-a)T K-1 (y-a) ) = |π×S|-1exp( -(x-m)H S-1 (x-m) )
• If S is diagonal (and hence also real) then N(x ; m, S) =  N(y ; a, ½K) =  N(|x-m| ; 0, ½S) × |π×S|. Thus we can express a complex pdf as a truncated real pdf of the same dimension if the components of x are independent.
• K[2n#2n] may be divided into 2#2  toeplitz blocks of the form [a -b; b a] (see Givens Rotation) where the corresponding element of S is a+jb
• All the 2#2 blocks of K that lie on the main diagonal are positive multiples of I. That is, for each component of x, the real and imaginary parts have the same variance and are uncorrelated.
• tr(K) = 2 tr(S)
• Distribution of Real and Imaginary Parts:
• <xR> = mR
• <xI> = mI
• Cov(xR) = Cov(xI) = <(xR-mR)(xR-mR)T> =<(xI-mI)(xI-mI)T) = ½SR
• <(xI-mI)(xR-mR)T> = -<(xR-mR)(xI-mI)T> = ½SI

In the following sections, we define  d=s½to be a positive real-valued vector of standard deviations. The function 1F1(a,b;z)=M(a,b,z) is the Confluent Hypergeometric or Kummer function, hypergeom(a,b,z) in MATLAB. The function 2F1(a,b;z) is the Confluent Hypergeometric function, hypergeom(a,b,z) in MATLAB.

• Magnitude Vector:  r=abs(x)
• <r> = ½ π½ d• exp(-abs(m÷d)2) 1F1(1.5 , 1 ; abs(m÷d)2)  [R.16 (4.1)]
• [m=0] <r> = ½ π½ d
• diag(COV(r))= s + abs(m)2 - ¼ π s • exp(-2abs(m÷d)2) 1F1(1.5 , 1 ; abs(m÷d)2)2
• [m=0] COV(r) = ¼ π (ddT 2F1([-0.5,-0.5] , 1 ; (ABS(SddT )•2) - ddT) [R.16 (4.19)]
• [m=0] diag(COV(r)) = (1-¼ π)s
• Magnitude-Squared Vector:  v=abs(x)2
• <v> =  s + abs(m)2
• [m=0] <v> =  s
• COV(v) = ABS(S+mmH)•2 - ABS(mmH)•2  [R.16 (2.13)]
• [m=0] COV(v) = ABS(S)•2

## Linear Expectations

• <Ax + b> = Am + b
• <Ax> = Am
• <x + b> = m + b
• Cov(Ax + b) = ASAT
• <tr(Y)> = tr(<Y>) where Y depends on x.

For [x: Real Gaussian] :

• <exp(jx)> = exp(jm - ½s)
• [m=0] <exp(jx)> = exp( - ½s)

• <(Ax + a)(Bx + b)H> = ASBH + (Am+a)(Bm+b)H   [5.23]
• <xxH> = S + mmH
• <xaH x> = (S + mmH)a
• <xH axH> = aH(S + mmH)
• <(Ax)(Ax)H> = A(S + mmH)AH
• <(x + a)(x + a)H> = S + (m+a)(m+a)H
• <(Ax+a)H (Bx+b)> = <tr((Bx+b)(Ax+a)H )> = tr(BSAH) + (Am+a)H (Bm+b)  [5.24]
• <xH x> = <tr(xxH)> = tr(S) + mH m
• <xHAx> = tr(AS) + mHAm
• <(Ax)H (Ax)> = <tr(AxxHAH)> tr(ASAH) + (Am)H (Am)
• <(x+a)H (x+a)> = tr(S) + (m+a)H (m+a)
• <(Ax + a)C ⊗ (Bx + b)> = <(Bx + b)(Ax + a)H): = (ACB) S: + (Am + a)C ⊗ (Bm + b)
• <xCx> = <xxH) = S:

For [x[n]: Real Gaussian] :

• <xx> = diag(S) + mm  = diag(S + mmT)    [5.6]
• Cov(xx) = 2 S • (S + 2mmT)   [5.7]
• <(Ax + a) ⊗ (Bx + b)> = (AB) S: + (Am + a) ⊗ (Bm + b)
• <xx> = S:
• <exp(jx)exp(jx)H> = (exp(jm - ½s) exp(jm - ½s)H) exp(S)
• [m=0] <exp(jx)exp(jx)H> = (exp(-½s) exp(-½s)H) exp(S) = exp(S - ½(1sT + s1T))
• <exp(jx)HAexp(jx)> =exp(jm - ½s)H (A• exp(S))exp(jm - ½s)
• <exp(jx)Hexp(jx)> = n

For [x: Complex Gaussian] :

• <xxT> = mmT  [5.3]
• <xxC> = diag(S) + mm [5.4]
• Cov(xxC) = <(xxC)(xTxH)> - <xxC><xxC>T = SSC + 2(mmHST) [5.5]

## Minimizing Quadratic Expectations

• argminK<||(AKB+C)x||2> = -(AHA)-1AHC(S+mmH)BH(B(S+mmH)BH)-1    [5.25]
• [m=0] argminK<||(AKB+C)x||2> = -(AHA)-1AHCSBH(BSBH)-1
• [m=0] argminK<||(KB+C)x||2> = -CSBH(BSBH)-1
• [x, y independent, zero-mean] argminK<||(AKB+C)x + (AKE+F)y||2> =  -(AHA)-1AH(CSxBH+FSyEH)(BSxBH+ESyEH)-1    [5.26]
• [x, y independent, zero-mean] argminK<||(KB+C)x + (KE+F)y||2> = -(CSxBH+FSyEH)(BSxBH+ESyEH)-1
• [x, y independent, zero-mean] argminK<||(KB+C)x + Ky||2> = -CSxBH(BSxBH+SyE)-1

## Cubic Expectations

For [x:Real Independent] :

For [x: Real Gaussian] :

• <(Ax + a)(Bx + b)T(Cx + c)> = ASBT(Cm+c) + ASCT(Bm+b) + tr(BSCT)×(Am+a) + (Am+a)(Bm+b)T(Cm+c)
• <xxTx> = <xTxxT>T = 2Sm + (tr(S)+ mTm)×m
• <(Ax + a)(Ax + a)T(Ax + a)> = (2ASAT + (Am+a)(Am+a)T)(Am+a) + tr(ASAT)×(Am+a)

## Quartic Expectations

For [x: Real Gaussian] :

• <(Ax + a)(Bx + b)T(Cx + c) (Dx + d)T> = (ASBT+(Am+a)(Bm+b)T)(CSDT+(Cm+c) (Dm+d)T) + (ASCT+(Am+a)(Cm+c)T)(BSDT+(Bm+b) (Dm+d)T) + (Bm+b)T(Cm+c)×(ASDT - (Am+a)(Dm+d)T) + tr(BSCT)×(ASDT + (Am+a)(Dm+d)T)  [5.27]
• <(Ax + a)(Bx + b)T(Cx + c) (Dx + d)T> =  [m=0]  (ASBT+abT)(CSDT+cdT) + (ASCT+acT)(BSDT+bdT) + bTc×(ASDT - adT) + tr(BSCT)×(ASDT + adT)
• <xxTAxxT> = <(xTAx) × xxT> =(S+mmT)(A+AT)(S+mmT) + mTAm * (S - mmT) +  tr(AS)×(S + mmT)
• <xxTxxT> = 2(S+mmT)2 +  (tr(S)+mTm)×(S - mmT)
• [m=0] <xxTAxxT> = S(A+AT)S  + tr(ASS
• [m=0] <xxTxxT> = 2S2  + tr(SS
• <(Ax + a)(Ax + a)T(Ax + a) (Ax + a)T> = 2(ASAT+(Am+a)(Am+a)T)2 + (Am+a)T(Am+a(ASAT - (Am+a)(Am+a)T) + tr(ASAT)×(ASAT + (Am+a)(Am+a)T)
• <(Ax + a)T(Bx + b) (Cx + c)T(Dx + d)> = tr(AS(CTD+DTC)SBT) + ((Am+a)TB + (Bm+b)TA)S(CT(Dm+d) + DT(Cm+c)) + (tr(ASBT)+(Am+a)T(Bm+b))(tr(CSDT)+(Cm+c)T(Dm+d)) [5.28]
• [m=0] <(Ax + a)T(Bx + b) (Cx + c)T(Dx + d)> = tr(AS(CTD+DTC)SBT) + (aTB + bTA)S(CTd + DTc) + (tr(ASBT)+aTb)(tr(CSDT)+cTd)
• <xTxxTx>= 2tr(S2) + 4mTSm + (tr(S) + mTm)2
• [m=0] <xTxxTx> =  2tr(S2) + (tr(S))2
• <xTAxxTBx> = tr(AS(B+BT)S) + mT(A + AT)S(B + BT)m + (tr(AS)+mTAm)(tr(BS)+mTBm)
•  [m=0] <xTAxxTBx> = tr(AS(B+BT)S) + tr(AS)×tr(BS)
• <aTxbTxcTxdTx> = (aT(S+mmT)b)(cT(S+mmT)d)+(aT(S+mmT)c)(bT(S+mmT)d)+(aT(S+mmT)d)(bT(S+mmT)c)-2aTmbTmcTmdTm
• <(Ax + a)T(Ax + a) (Ax + a)T(Ax + a)> = 2tr(ASATASAT) + 4(Am+a)TASAT(Am+a) + (tr(ASAT) + (Am+a)T(Am+a))2
• <exp(jx)Hexp(jx)exp(jx)Hexp(jx)>= n2

For [x: Complex Gaussian] :

• <(Ax + a)(Bx + b)H(Cx + c) (Dx + d)H> =  (ASBH+(Am + a)(Bm + b)H)(CSDH+(Cm + c)(Dm + d)H) + (Bm + b)H(Cm + c)ASDH  + tr(CSBH)×(ASDH + (Am + a)(Dm + d)H)
•  [m=0] <(Ax + a)(Bx + b)H(Cx + c) (Dx + d)H> =  (ASBH+abH)(CSDH+cdH) + bHcASDH  + tr(CSBH)×(ASDH + adH)
•  [m=0] <xxHAxxH> = SAS +tr(ASS
•  [m=0] <xxHxxH> = S2 +tr(SS
• <(Ax + a)H(Bx + b) (Cx + c)H(Dx + d)> = <tr((Bx + b)(Ax + a)H)tr((Dx + d)(Cx + c)H)>= tr(AHBSCHDS) + (Cm + c)HDSAH(Bm + b) +(Am + a)HBSCH(Dm + d) + (tr(BSAH)+(Am + a)H(Bm + b))(tr(DSCH)+(Cm + c)H(Dm + d))
• [m=0] <(Ax + a)H(Bx + b) (Cx + c)H(Dx + d)> = tr(AHBSCHDS) + cHDSAHb +aHBSCHd + (tr(BSAH)+aHb)(tr(DSCH)+cHd)
• <tr2((Bx + b)(Ax + a)H)>= tr((BSAH)2) + 2(Am + a)HBSAH(Bm + b) + (tr(BSAH)+(Am + a)H(Bm + b))2
• <xHxxHx> = <tr2(xxH))> = tr(S2) + 2mHSm + (tr(S)+ mHm)2
• [m=0] <xHxxHx> = <tr2(xxH))> =  tr(S2) +(tr(S))2

## Quintic Expressions

For [x: Real Gaussian] :

• <(Ax + a)(Bx + b)T(Cx + c) (Dx + d)T(Ex + e)> =  (ASBT+(Am+a)(Bm+b)T)CSET(Dm+d) + (ASBT+(Am+a)(Bm+b)T)CSDT(Em+e) + (ASBT+(Am+a)(Bm+b)T)(Cm+c)(Dm+d)T(Em+e)+ ASCTBSET(Dm+d) + ASCT(BSDT+(Bm+b)(Dm+d)T)(Em+e)+ ASDT(ESBT+(Em+e)(Bm+b)T)(Cm+c) + ASDTESCT(Bm+b) + (ASET+(Am+a)(Em+e)T)DSBT(Cm+c)+ ASETDSCT(Bm+b) + ASET(Dm+d)(Cm+c)T(Bm+b)+ tr(BTCSETDS) (Am+a) + tr(BT(CSDT+(Cm+c)(Dm+d)T)ES) (Am+a)+ tr(BTCS) (ASDT+(Am+a)(Dm+d)T)(Em+e) + tr(BTCS) ASET(Dm+d)+ tr(DTES) (ASBT+(Am+a)(Bm+b)T)(Cm+c) + tr(DTES) ASCT(Bm+b) + tr(BTCS) tr(DTES) (Am+a)
• <(Ax )(Bx )T(Cx ) (Dx )T(Ex)> =  A(S+mmT)BTCSETDm + A(S+mmT)BTCSDTEm + A(S+mmT)BTCmmTDTEm+ ASCTBSETDm + ASCTB(S+mmT)DTEm+ ASDTE(S+mmT)BTCm + ASDTESCTBm + A(S+mmT)ETDSBTCm+ ASETDSCTBm+ ASETDmmTCTBm+ tr(BTCSETDS)Am + tr(BT(C(S+mmT)DT)ES)Am+ tr(BTCS) A(S+mmT)DTEm+ tr(BTCS) ASETDm+ tr(DTES) A(S+mmT)BTCm + tr(DTES) ASCTBm + tr(BTCS) tr(DTES)Am
• <xxTxxTx> = (8S2 + 4SmmT + 3mmTS  + (mmT)2+   2tr(S) (2S+mmT) ) m + tr(2S2+mmTS)+ tr(S)2) m

## Recursion Formulae for High Powers

For [x: Real Gaussian] :

• Define Yn=<(xxT)n> then Zn=<(xTx)n> = tr(Yn)
• Y1 = S+mmT
• [m=0] Yn+1= tr(Yn)S+2nSYn  [5.19]
• [m=0] Y1=S
• [m=0] Z1=tr(S)
• [m=0] Y2=2S2+tr(SS
• [m=0] Z2=2tr(S2)+tr2(S)
• [m=0] Y3=8S3+4tr(S)×S2+(2tr(S2)+tr2(S))×S
• [m=0] Z3=8tr(S3)+6tr(S2)tr(S)+tr3(S)
• [m=0] Y4=48S4+24tr(S)×S3+(12tr(S2)+6tr2(S))×S2+(8tr(S3)+6tr(S2)tr(S)+tr3(S))×S
• [m=0] Z4=48tr(S4)+32tr(S3)tr(S)+12tr2(S2)+12tr(S2)tr2(S)+tr4(S)

For [x: Complex Gaussian] :

• Define Yn=<(xxH)n> then Zn=<(xHx)n> = tr(Yn)
• Y1 = S+mmH
• [m=0] Yn+1= tr(Yn)S+nSYn  [5.20]
• [m=0] Y1=S
• [m=0] Z1=tr(S)
• [m=0] Y2=S2+tr(SS
• [m=0] Z2=tr(S2)+tr2(S)
• [m=0] Y3=4S3+4tr(S)×S2+(tr(S2)+tr2(S))×S
• [m=0] Z3=4tr(S3)+5tr(S2)tr(S)+tr3(S)
• [m=0] Y4=12S4+12tr(S)×S3+(3tr(S2)+3tr2(S))×S2+(4tr(S3)+5tr(S2)tr(S)+tr3(S))×S
• [m=0] Z4=12tr(S4)+16tr(S3)tr(S)+3tr2(S2)+8tr(S2)tr2(S)+tr4(S)

## Product of Vector Elements

For [x: Real Gaussian] :

• [n: odd]  <prod(x[n]-m)> = 0.   [5.18]
• [n: even]  <prod(x[n]-m)> = (½n)!-12n sumv(sv(1),v(2)sv(3),v(4)...sv(n-1),v(n)) where the sum is over all n! permutations v of the numbers 1:n.   [5.18]
Note that each term in the summation arises (½n)!2½n times since the ½n factors sij can be rearranged in (½n)! orders and for each factor sij = sji since S is symmetric. Thus an equivalent formula is to omit the normalizing factor,  (½n)!-12n, and restrict the summation to all distinct pairings of the numbers 1:n. This is known as Isserlis' theorem or Wick's theorem.

This page is part of The Matrix Reference Manual. Copyright © 1998-2019 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: \$Id: expect.html 11011 2018-12-03 09:59:00Z dmb \$