5.1 
N(x ; a, A) N(x ; b, B) = N(a ; b, A+B) × N(x ; c, C) where C = (A^{-1}+B^{-1})^{-1} = A(A+B)^{-1}B = B(A+B)^{-1}A and c = C(A^{-1}a+B^{-1}b) = A(A+B)^{-1}b + B(A+B)^{-1}a
 We assume without comment below that A, B, C and their sums and inverses are all symmetric matrices.
 First we define C = (A^{-1}+B^{-1})^{-1} = A(BA^{-1}A+BB^{-1}A)^{-1}B = A(B+A)^{-1}B = A(A+B)^{-1}B = (similarly) B(A+B)^{-1}A.
 We also define c = C(A^{-1}a+B^{-1}b) = B(A+B)^{-1}AA^{-1}a + A(A+B)^{-1}BB^{-1}b = A(A+B)^{-1}b + B(A+B)^{-1}a
 We see that c^{T}C^{-1}c = (a^{T}A^{-1}+b^{T}B^{-1})CC^{-1}C(A^{-1}a+B^{-1}b)
= (a^{T}A^{-1}+b^{T}B^{-1})C(A^{-1}a+B^{-1}b)
= a^{T}A^{-1}CA^{-1}a + 2a^{T}A^{-1}CB^{-1}b + b^{T}B^{-1}CB^{-1}b
= a^{T}(A+B)^{-1}BA^{-1}a + 2a^{T}A^{-1}A(A+B)^{-1}BB^{-1}b + b^{T}(A+B)^{-1}AB^{-1}b substituting for C
= a^{T}(A+B)^{-1}((A+B)A^{-1}-I)a + 2a^{T}(A+B)^{-1}b + b^{T}(A+B)^{-1}((A+B)B^{-1}-I)b
= a^{T}(A^{-1} - (A+B)^{-1})a + 2a^{T}(A+B)^{-1}b + b^{T}(B^{-1} - (A+B)^{-1})b
= a^{T}A^{-1}a + b^{T}B^{-1}b - (a^{T}(A+B)^{-1}a - 2a^{T}(A+B)^{-1}b + b^{T}(A+B)^{-1}b)
= a^{T}A^{-1}a + b^{T}B^{-1}b - (a-b)^{T}(A+B)^{-1}(a-b)
 Hence (x-c)^{T}C^{-1}(x-c) = x^{T}C^{-1}x - 2x^{T}C^{-1}c + c^{T}C^{-1}c
= x^{T}(A^{-1}+B^{-1})x - 2x^{T}C^{-1}C(A^{-1}a+B^{-1}b) + a^{T}A^{-1}a + b^{T}B^{-1}b - (a-b)^{T}(A+B)^{-1}(a-b)
= x^{T}A^{-1}x + x^{T}B^{-1}x - 2x^{T}A^{-1}a - 2x^{T}B^{-1}b + a^{T}A^{-1}a + b^{T}B^{-1}b - (a-b)^{T}(A+B)^{-1}(a-b)
= (x-a)^{T}A^{-1}(x-a) + (x-b)^{T}B^{-1}(x-b) - (a-b)^{T}(A+B)^{-1}(a-b)
 It follows that N(a ; b, A+B) × N(x ; c, C) = det(2 pi (A+B))^{-½} exp(-½(a-b)^{T}(A+B)^{-1}(a-b)) × det(2 pi C)^{-½} exp(-½(x-c)^{T}C^{-1}(x-c))
= det(4 pi^{2} (A+B)C)^{-½} exp(-½((x-c)^{T}C^{-1}(x-c) + (a-b)^{T}(A+B)^{-1}(a-b))) multiplying determinants and adding exponents
= det(4 pi^{2} (A+B)A(A+B)^{-1}B)^{-½} exp(-½((x-a)^{T}A^{-1}(x-a) + (x-b)^{T}B^{-1}(x-b)))
= det(2 pi A)^{-½} exp(-½(x-a)^{T}A^{-1}(x-a)) × det(2 pi B)^{-½} exp(-½(x-b)^{T}B^{-1}(x-b))
= N(x ; a, A) N(x ; b, B)
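As a quick numerical sanity check (not part of the proof), the identity can be tested in the scalar case, where A and B are variances; the helper `npdf` and the test values below are illustrative choices.

```python
import math

def npdf(x, m, s):
    # 1-D Gaussian density N(x; m, s) with mean m and variance s
    return math.exp(-0.5 * (x - m) ** 2 / s) / math.sqrt(2.0 * math.pi * s)

# arbitrary test values (scalar case)
a, A, b, B, x = 0.7, 2.0, -1.3, 0.5, 0.4
C = 1.0 / (1.0 / A + 1.0 / B)   # C = (A^{-1}+B^{-1})^{-1}
c = C * (a / A + b / B)         # c = C(A^{-1}a + B^{-1}b)
lhs = npdf(x, a, A) * npdf(x, b, B)
rhs = npdf(a, b, A + B) * npdf(x, c, C)
assert abs(lhs - rhs) < 1e-12
```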

5.2 
N(x ; a, A)^{m} = N(0 ; 0, m(2×pi)^{m-2}A^{m-1}) × N(x ; a, m^{-1}A)
We prove this by induction. For m=1, we have N(x ; a, A)^{1} = N(0 ; 0, (2×pi)^{-1}I) × N(x ; a, A) which is true.
 N(x ; a, A)^{m} = N(x ; a, A) × N(x ; a, A)^{m-1}
= N(x ; a, A) × N(0 ; 0, (m-1)(2×pi)^{m-3}A^{m-2}) × N(x ; a, (m-1)^{-1}A)
= N(0 ; 0, (m-1)(2×pi)^{m-3}A^{m-2}) × N(x ; a, A) × N(x ; a, (m-1)^{-1}A)
= N(0 ; 0, (m-1)(2×pi)^{m-3}A^{m-2}) × N(a ; a, (1+(m-1)^{-1})A) × N(x ; c, C) where from [5.1] C = (A^{-1}+(m-1)A^{-1})^{-1} = (mA^{-1})^{-1} = m^{-1}A and c = C(A^{-1}a+(m-1)A^{-1}a) = mCA^{-1}a = a
 So N(x ; a, A)^{m} = N(0 ; 0, (m-1)(2×pi)^{m-3}A^{m-2}) × N(a ; a, (1+(m-1)^{-1})A) × N(x ; a, m^{-1}A)
= N(0 ; 0, 2 × pi × (m-1)(2×pi)^{m-3}A^{m-2} × (1+(m-1)^{-1})A) × N(x ; a, m^{-1}A)
= N(0 ; 0, (m-1)(1+(m-1)^{-1})(2×pi)^{m-2}A^{m-1}) × N(x ; a, m^{-1}A)
= N(0 ; 0, m(2×pi)^{m-2}A^{m-1}) × N(x ; a, m^{-1}A)
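A minimal scalar sketch of this identity (the power `m_` and the other values are illustrative, not part of the proof):

```python
import math

def npdf(x, m, s):
    # 1-D Gaussian density N(x; m, s) with variance s
    return math.exp(-0.5 * (x - m) ** 2 / s) / math.sqrt(2.0 * math.pi * s)

m_, a, A, x = 3, 1.1, 0.8, 0.3
lhs = npdf(x, a, A) ** m_
S0 = m_ * (2.0 * math.pi) ** (m_ - 2) * A ** (m_ - 1)  # m(2 pi)^{m-2} A^{m-1}
rhs = npdf(0.0, 0.0, S0) * npdf(x, a, A / m_)
assert abs(lhs - rhs) < 1e-12
```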

5.3 
If x_{[n]} has a complex Gaussian distribution then Cov(x) = E(xx^{H}) - mm^{H} = S_{[n#n]} <=> K_{[2n#2n]} and E(xx^{T}) = mm^{T}
 From the definition of a complex Gaussian we have
 E(x_{i}^{R}x_{k}^{R}) = ½s_{i,k}^{R} + m_{i}^{R}m_{k}^{R}
 E(x_{i}^{I}x_{k}^{I}) = ½s_{i,k}^{R} + m_{i}^{I}m_{k}^{I}
 E(x_{i}^{I}x_{k}^{R}) = ½s_{i,k}^{I} + m_{i}^{I}m_{k}^{R}
 E(x_{i}^{R}x_{k}^{I}) = -½s_{i,k}^{I} + m_{i}^{R}m_{k}^{I}
 E(xx^{H})_{i,k} = E(x_{i}x_{k}^{C}) = E((x_{i}^{R}x_{k}^{R} + x_{i}^{I}x_{k}^{I}) + j(x_{i}^{I}x_{k}^{R} - x_{i}^{R}x_{k}^{I})) = s_{i,k}^{R} + js_{i,k}^{I} + m_{i}m_{k}^{C} = (S + mm^{H})_{i,k}
 E(xx^{T})_{i,k} = E(x_{i}x_{k}) = E((x_{i}^{R}x_{k}^{R} - x_{i}^{I}x_{k}^{I}) + j(x_{i}^{I}x_{k}^{R} + x_{i}^{R}x_{k}^{I})) = m_{i}m_{k} = (mm^{T})_{i,k}

5.4 
If x_{[n]} has a complex Gaussian distribution then E(x • x^{C}) = diag(S) + m • m^{C}
 E(x • x^{C}) = E(diag(xx^{H})) = diag(E(xx^{H})) = diag(S + mm^{H}) = diag(S) + m • m^{C} [5.3]

5.5 
If x_{[n]} has a complex Gaussian distribution and p = x • x^{C} then Cov(p) = E(pp^{T}) - E(p)E(p)^{T} = S • S^{C} + 2(mm^{H} • S^{T})^{R}
 If we define v = diag(S) and w = m • m^{C}, then E(p)E(p)^{T} = (v+w)(v+w)^{T}
 The vector y = x - m has zero mean and, from [5.4], E(y • y^{C}) = diag(S) = v
 E((y • y^{C})(y^{T} • y^{H}))_{i,k} = E(((y_{i}^{R})^{2} + (y_{i}^{I})^{2})((y_{k}^{R})^{2} + (y_{k}^{I})^{2})) = E((y_{i}^{R}y_{k}^{R})^{2} + (y_{i}^{R}y_{k}^{I})^{2} + (y_{i}^{I}y_{k}^{R})^{2} + (y_{i}^{I}y_{k}^{I})^{2})
 Now from Wick's theorem and [5.3], E((y_{i}^{R}y_{k}^{R})^{2}) = E(y_{i}^{R}y_{i}^{R})E(y_{k}^{R}y_{k}^{R}) + 2(E(y_{i}^{R}y_{k}^{R}))^{2} = ¼v_{i}v_{k} + ½(s_{i,k}^{R})^{2} = (¼vv^{T} + ½S^{R} • S^{R})_{i,k}
 Similarly E((y_{i}^{I}y_{k}^{I})^{2}) = (¼vv^{T} + ½S^{R} • S^{R})_{i,k} and E((y_{i}^{I}y_{k}^{R})^{2}) = E((y_{i}^{R}y_{k}^{I})^{2}) = (¼vv^{T} + ½S^{I} • S^{I})_{i,k}
 Hence E((y • y^{C})(y^{T} • y^{H}))_{i,k} = 2(¼vv^{T} + ½S^{R} • S^{R})_{i,k} + 2(¼vv^{T} + ½S^{I} • S^{I})_{i,k} = (vv^{T} + S • S^{C})_{i,k}
 Since the expected value of any odd power of y_{i} is zero and, from [5.3], E(yy^{T}) = 0, we have E((x • x^{C})(x^{T} • x^{H})) = E(((y+m) • (y+m)^{C})((y+m)^{T} • (y+m)^{H}))
= E((y • y^{C})(y^{T} • y^{H}) + (y • y^{C})(m^{T} • m^{H}) + (y • m^{C})(y^{T} • m^{H}) + (y • m^{C})(m^{T} • y^{H}) + (m • y^{C})(y^{T} • m^{H}) + (m • y^{C})(m^{T} • y^{H}) + (m • m^{C})(y^{T} • y^{H}) + (m • m^{C})(m^{T} • m^{H}))
= vv^{T} + S • S^{C} + vw^{T} + 0 + S • (mm^{H})^{C} + S^{C} • (mm^{H}) + 0 + wv^{T} + ww^{T} = S • S^{C} + 2(mm^{H} • S^{T})^{R} + (v+w)(v+w)^{T}
 Subtracting E(p)E(p)^{T} = (v+w)(v+w)^{T} gives Cov(p) = S • S^{C} + 2(mm^{H} • S^{T})^{R}

5.6 
If x_{[n]} has a real Gaussian distribution then E(x • x) = diag(S) + m • m = diag(S + mm^{T})
 E(x • x)_{k} = E(x_{k}x_{k}) = s_{k,k} + m_{k}^{2} = (diag(S) + m • m)_{k}

5.7 
If x_{[n]} has a real Gaussian distribution then Cov(x • x) = 2S • (S + 2mm^{T})
 Assume that x = y + m with E(y) = 0 and Cov(y) = S
 Cov(x • x)_{i,k} = E(x_{i}^{2}x_{k}^{2}) - (s_{i,i} + m_{i}^{2})(s_{k,k} + m_{k}^{2}) = E((y_{i}^{2}+2y_{i}m_{i}+m_{i}^{2})(y_{k}^{2}+2y_{k}m_{k}+m_{k}^{2})) - (s_{i,i} + m_{i}^{2})(s_{k,k} + m_{k}^{2}) = s_{i,i}s_{k,k} + 2s_{i,k}^{2} + s_{i,i}m_{k}^{2} + s_{k,k}m_{i}^{2} + 4s_{i,k}m_{i}m_{k} + m_{i}^{2}m_{k}^{2} - (s_{i,i} + m_{i}^{2})(s_{k,k} + m_{k}^{2}) = 2s_{i,k}(s_{i,k}+2m_{i}m_{k}) = (2S • (S + 2mm^{T}))_{i,k}
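A scalar sanity check of this covariance, using the standard Gaussian raw moments E(x^{2}) = s + m^{2} and E(x^{4}) = m^{4} + 6m^{2}s + 3s^{2} (the numeric values are illustrative):

```python
# scalar case: x ~ N(m, s)
m, s = 0.9, 1.7
Ex2 = s + m * m
Ex4 = m ** 4 + 6 * m * m * s + 3 * s * s
var_direct = Ex4 - Ex2 ** 2             # Var(x^2) from raw moments
var_formula = 2 * s * (s + 2 * m * m)   # 2 S • (S + 2mm^T), scalar case
assert abs(var_direct - var_formula) < 1e-12
```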

5.8 
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R; R^{T} Q]) then x | y ~ N(x; p+RQ^{-1}(y-q), P - RQ^{-1}R^{T}). The functions f_{i}(y) below denote functions that are independent of x.
 p(x | y) = p(x, y) / p(y) = f_{1}(y) exp(-½[x-p; y-q]^{T} S^{-1} [x-p; y-q]) where S = [P R; R^{T} Q]
 From [3.5] S^{-1} = [C^{-1}, -C^{-1}RQ^{-1}; -Q^{-1}R^{T}C^{-1}, Q^{-1}(I+R^{T}C^{-1}RQ^{-1})] where C = (P-RQ^{-1}R^{T})
 Hence [x-p; y-q]^{T} S^{-1} [x-p; y-q] = (x-p)^{T}C^{-1}(x-p) - 2(x-p)^{T}C^{-1}RQ^{-1}(y-q) + f_{2}(y) = (x-p-RQ^{-1}(y-q))^{T}C^{-1}(x-p-RQ^{-1}(y-q)) + f_{3}(y)
 Hence p(x | y) = f_{4}(y) exp(-½(x-p-RQ^{-1}(y-q))^{T}C^{-1}(x-p-RQ^{-1}(y-q))) = f_{5}(y) N(x; p+RQ^{-1}(y-q), C) = f_{5}(y) N(x; p+RQ^{-1}(y-q), P - RQ^{-1}R^{T}) where f_{5}(y) must equal 1 to give the correct integral.
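The conditional density can be checked numerically when x and y are both scalars, so that P, R and Q are numbers and the joint covariance is the 2×2 matrix [P R; R Q]; all numeric values below are illustrative:

```python
import math

def npdf(x, m, s):
    # 1-D Gaussian density with variance s
    return math.exp(-0.5 * (x - m) ** 2 / s) / math.sqrt(2.0 * math.pi * s)

P, R, Q = 2.0, 0.6, 1.5
p, q = 0.2, -0.4
x, y = 1.0, 0.8
det = P * Q - R * R
# quadratic form [x-p; y-q]^T S^{-1} [x-p; y-q] for the 2x2 inverse
quad = (Q * (x - p) ** 2 - 2 * R * (x - p) * (y - q) + P * (y - q) ** 2) / det
joint = math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
cond = joint / npdf(y, q, Q)      # p(x | y) = p(x, y) / p(y)
mu = p + R / Q * (y - q)          # conditional mean p + RQ^{-1}(y-q)
C = P - R * R / Q                 # conditional variance P - RQ^{-1}R^T
assert abs(cond - npdf(x, mu, C)) < 1e-12
```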

5.9 
If x ~ N(x; m, S) then x | A^{T}x=b ~ N(x; (I-HA^{T})m+Hb, (I-HA^{T})S) where H=SA(A^{T}SA)^{-1}
 Define z = [x; A^{T}x] = [I; A^{T}]x. Then z has mean [m; A^{T}m] and Cov(z) = E([I; A^{T}](x-m)(x-m)^{T}[I, A]) = [I; A^{T}]S[I, A] = [S SA; A^{T}S A^{T}SA]
 From [5.8] x | A^{T}x=b ~ N(x; m+SA(A^{T}SA)^{-1}(b-A^{T}m), S - SA(A^{T}SA)^{-1}A^{T}S) = N(x; (I-HA^{T})m+Hb, (I-HA^{T})S)

5.10 
If x ~ N(x; m, S) then y = Ax+b ~ N(y; Am+b, ASA^{T})
 E(y) = E(Ax+b) = A E(x) + b = Am+b
 Var(y) = E(yy^{T}) - E(y)E(y)^{T} = E(Axx^{T}A^{T} + Axb^{T} + bx^{T}A^{T} + bb^{T}) - (Amm^{T}A^{T} + Amb^{T} + bm^{T}A^{T} + bb^{T})
= (A(S+mm^{T})A^{T} + Amb^{T} + bm^{T}A^{T} + bb^{T}) - (Amm^{T}A^{T} + Amb^{T} + bm^{T}A^{T} + bb^{T})
= ASA^{T}

5.11 
If S is +ve semidefinite Hermitian, then the elements of R = DIAG(S)^{-½} S DIAG(S)^{-½} have magnitude <= 1.
 r_{i,j}^{2} = s_{i,j}^{2}(s_{i,i}s_{j,j})^{-1} <= 1 from [3.6]

5.12 
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R^{T}; R Q]) then Var(x_{i} - a^{T}y) is minimized when a = Q^{-1}R_{i}, when its value is diag(P)_{i} - R_{i}^{T}Q^{-1}R_{i}.
This proof is adapted from [R.1]. Since variances are unaffected by a shift of mean we can assume wlog that p=0 and q=0. Now set b = a - Q^{-1}R_{i} so that, for any a, a^{T}y = R_{i}^{T}Q^{-1}y + b^{T}y.
 Now note that for a zero-mean scalar z, Var(z) = E(zz^{T}). Hence Var(x_{i} - a^{T}y) = Var(x_{i} - R_{i}^{T}Q^{-1}y - b^{T}y)
= E((x_{i} - R_{i}^{T}Q^{-1}y - b^{T}y)(x_{i} - R_{i}^{T}Q^{-1}y - b^{T}y)^{T})
= E(x_{i}x_{i}^{T} + R_{i}^{T}Q^{-1}yy^{T}Q^{-1}R_{i} + b^{T}yy^{T}b - 2x_{i}y^{T}Q^{-1}R_{i} - 2x_{i}y^{T}b + 2R_{i}^{T}Q^{-1}yy^{T}b)
= diag(P)_{i} + R_{i}^{T}Q^{-1}QQ^{-1}R_{i} + b^{T}Qb - 2R_{i}^{T}Q^{-1}R_{i} - 2R_{i}^{T}b + 2R_{i}^{T}Q^{-1}Qb
= diag(P)_{i} + R_{i}^{T}Q^{-1}R_{i} + b^{T}Qb - 2R_{i}^{T}Q^{-1}R_{i} - 2R_{i}^{T}b + 2R_{i}^{T}b
= diag(P)_{i} - R_{i}^{T}Q^{-1}R_{i} + b^{T}Qb
Only the last term depends on b and, since Q is positive semidefinite, this is minimized when b = 0.
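A scalar sketch of this minimization: for zero-mean jointly Gaussian scalars (x, y) with Var(x)=P, Cov(x,y)=R, Var(y)=Q, the error variance Var(x - a·y) = P - 2aR + a²Q is minimized at a = R/Q (the numeric values are illustrative):

```python
P, R, Q = 3.0, 1.2, 2.0

def var_err(a):
    # Var(x - a*y) = P - 2 a R + a^2 Q
    return P - 2.0 * a * R + a * a * Q

a_opt = R / Q                    # scalar form of a = Q^{-1} R_i
v_opt = var_err(a_opt)
assert abs(v_opt - (P - R * R / Q)) < 1e-12
for a in (a_opt - 0.5, 0.0, a_opt + 0.3):
    assert var_err(a) >= v_opt - 1e-12
```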

5.13 
If the joint distribution [x; y] ~ N([x; y]; [p; q], [P R^{T}; R Q]) then the maximum (over a) of the correlation between x_{i} and a^{T}y is obtained when a = Q^{-1}R_{i}
This proof is adapted from [R.1]. Since correlations are unaffected by a shift of mean we can assume wlog that p=0 and q=0. For any a we can define c=sqrt(R_{i}^{T}Q^{-1}R_{i} / a^{T}Qa). From [5.12] we know that
 Var(x_{i} - ca^{T}y) >= Var(x_{i} - R_{i}^{T}Q^{-1}y)
 E(x_{i}x_{i}^{T} + c^{2}a^{T}yy^{T}a - 2cx_{i}y^{T}a) >= E(x_{i}x_{i}^{T} + R_{i}^{T}Q^{-1}yy^{T}Q^{-1}R_{i} - 2x_{i}y^{T}Q^{-1}R_{i})
 diag(P)_{i} + c^{2}a^{T}Qa - 2cE(x_{i}y^{T}a) >= diag(P)_{i} + R_{i}^{T}Q^{-1}QQ^{-1}R_{i} - 2E(x_{i}y^{T}Q^{-1}R_{i})
 c^{2}a^{T}Qa - 2cE(x_{i}y^{T}a) >= R_{i}^{T}Q^{-1}R_{i} - 2E(x_{i}y^{T}Q^{-1}R_{i})
 Since c^{2}a^{T}Qa = R_{i}^{T}Q^{-1}R_{i} by the definition of c, this becomes
 R_{i}^{T}Q^{-1}R_{i} - 2cE(x_{i}y^{T}a) >= R_{i}^{T}Q^{-1}R_{i} - 2E(x_{i}y^{T}Q^{-1}R_{i})
 cE(x_{i}y^{T}a) <= E(x_{i}y^{T}Q^{-1}R_{i})
 E(x_{i}y^{T}a) / sqrt(a^{T}Qa) <= E(x_{i}y^{T}Q^{-1}R_{i}) / sqrt(R_{i}^{T}Q^{-1}R_{i})
 E(x_{i}y^{T}a) / sqrt(Var(x_{i}) × a^{T}Qa) <= E(x_{i}y^{T}Q^{-1}R_{i}) / sqrt(Var(x_{i}) × R_{i}^{T}Q^{-1}R_{i})
 E(x_{i}y^{T}a) / sqrt(Var(x_{i}) × Var(a^{T}y)) <= E(x_{i}y^{T}Q^{-1}R_{i}) / sqrt(Var(x_{i}) × Var(R_{i}^{T}Q^{-1}y))
Thus the correlation between x_{i} and a^{T}y is bounded above by the right hand side and attains this bound when a = Q^{-1}R_{i}. The value of the correlation is given by
 E(x_{i}y^{T}Q^{-1}R_{i}) / sqrt(Var(x_{i}) × Var(R_{i}^{T}Q^{-1}y)) = R_{i}^{T}Q^{-1}R_{i} / sqrt(diag(P)_{i} × R_{i}^{T}Q^{-1}R_{i}) = sqrt(R_{i}^{T}Q^{-1}R_{i} / diag(P)_{i})

5.14 
(g_{xy})_{i}^{2} = 1 - p_{ii}^{-1} det([p_{ii}, r_{i}^{T}; r_{i}, Q]) det(Q)^{-1}
 1 - p_{ii}^{-1} det([p_{ii}, r_{i}^{T}; r_{i}, Q]) det(Q)^{-1}
 = 1 - p_{ii}^{-1}((p_{ii} - r_{i}^{T}Q^{-1}r_{i}) det(Q)) det(Q)^{-1} [3.1]
 = 1 - (1 - p_{ii}^{-1}r_{i}^{T}Q^{-1}r_{i}) = p_{ii}^{-1}r_{i}^{T}Q^{-1}r_{i} = (FR)_{ii} ÷ p_{ii} = (g_{xy})_{i}^{2}

5.15 
d/dq (ln(p(X)) = 1_{[k#1]}^{T}(X-M)^{T}S^{-1} dm/dq - ½(k S^{-1} - S^{-1}(X-M)(X-M)^{T}S^{-1}):^{T} dS/dq where M_{[n#k]} = m×1_{[k#1]}^{T}.
 d/dq (ln(p(X)) = d/dm (ln(p(X)) dm/dq + d/dP (ln(p(X)) dP/dS dS/dq where P = S^{-1}
 d/dm (ln(p(X)) = d/dM (ln(p(X)) dM/dm
= -½ d/dM ( tr((X-M)^{T} S^{-1} (X-M))) dM/dm = (S^{-1}(X-M)):^{T} dM/dm [Derivative of Trace]
= (S^{-1}(X-M)):^{T} (1_{[k#1]} ⊗ I) [Derivative of Linear Function]
= 1_{[k#1]}^{T}(X-M)^{T}S^{-1} [Kronecker Product Identity]
 d/dP (ln(p(X)) = ½k d/dP (ln(det(2pi×P))) - ½ d/dP (tr((X-M)^{T} P (X-M)))
= ½kP^{-1}:^{T} - ½ d/dP (tr((X-M)^{T} P (X-M))) [Derivative of Determinant]
= ½kP^{-1}:^{T} - ½((X-M)(X-M)^{T}):^{T} [Derivative of Quadratic]
 dP/dS = -(P ⊗ P) = -(S^{-1} ⊗ S^{-1}) [Derivative of Inverse]
 d/dP (ln(p(X)) dP/dS = -½(kS:^{T} - ((X-M)(X-M)^{T}):^{T})(S^{-1} ⊗ S^{-1})
= -½(S^{-1}(kS - (X-M)(X-M)^{T})S^{-1}):^{T}
= -½(kS^{-1} - S^{-1}(X-M)(X-M)^{T}S^{-1}):^{T}

5.16 
E(vv^{T}) = k((dm/dq)^{T} S^{-1} dm/dq + ½(dS/dq)^{T} (S ⊗ S)^{-1} dS/dq) where v^{T} = 1_{[k#1]}^{T}(X-M)^{T}S^{-1} dm/dq - ½(k S^{-1}:^{T} - ((X-M)(X-M)^{T}):^{T}(S^{-1} ⊗ S^{-1})) dS/dq
 Since v is the sum of k independent identically distributed zero-mean vectors, we can calculate E(vv^{T}) for k=1 and then multiply the result by k. For k=1, v simplifies to
v^{T} = (x-m)^{T}S^{-1} dm/dq + ½(S - (x-m)(x-m)^{T}):^{T} dP/dq where P = S^{-1} (see [5.15])
 When we form the product vv^{T}, the cross terms are cubic in (x-m) and therefore have zero mean. We deal with the two squared terms in vv^{T} separately. For the first term
 k E(((x-m)^{T}S^{-1} dm/dq)^{T} ((x-m)^{T}S^{-1} dm/dq))
= k E((dm/dq)^{T} S^{-1}(x-m)(x-m)^{T}S^{-1} dm/dq)
= k E((dm/dq)^{T} S^{-1} dm/dq)
 Since the second term has zero mean, we can write its contribution as
 ¼k E{(dP/dq)^{T} (S - (x-m)(x-m)^{T}): (S - (x-m)(x-m)^{T}):^{T} dP/dq}
= ¼k (dP/dq)^{T} E{((x-m)(x-m)^{T}): ((x-m)(x-m)^{T}):^{T}} dP/dq - ¼k (dP/dq)^{T} S: S:^{T} dP/dq
 We define T = TVEC(n,n) so that T S: = T^{T} S: = S^{T}: = S: since S and T are symmetric. [see vectorized transpose]
 To calculate the quartic expectation, E{((x-m)(x-m)^{T}): ((x-m)(x-m)^{T}):^{T}}, we use Wick's theorem [5.18] to take the three possible pairings of the (x-m) terms and, treating x and y as distinct random vectors, we get:
 E{((x-m)(x-m)^{T}): ((x-m)(x-m)^{T}):^{T}}
= E{((x-m)(x-m)^{T}): ((y-m)(y-m)^{T}):^{T}} + E{((x-m)(y-m)^{T}): ((x-m)(y-m)^{T}):^{T}} + E{((x-m)(y-m)^{T}): ((y-m)(x-m)^{T}):^{T}}
= S:S:^{T} + E{((y-m) ⊗ (x-m))((y-m) ⊗ (x-m))^{T}} + E{((y-m) ⊗ (x-m))((y-m) ⊗ (x-m))^{T}T^{T}} [see Kronecker product and vectorized transpose]
= S:S:^{T} + E{((y-m)(y-m)^{T}) ⊗ ((x-m)(x-m)^{T})}(I + T^{T})
= S:S:^{T} + (S ⊗ S)(I + T^{T})
 Substituting this into the second term expression gives
¼k (dP/dq)^{T}(S:S:^{T} + (S ⊗ S)(I + T^{T}))dP/dq - ¼k (dP/dq)^{T} S: S:^{T} dP/dq
= ¼k (dP/dq)^{T} (S ⊗ S)(I + T^{T}) dP/dq [Since the first and last terms cancel]
= ½k (dP/dq)^{T}(S ⊗ S) dP/dq [Since the symmetry of dP means that T^{T} dP/dq = dP/dq]
 Putting this all together and using dP/dq = dP/dS dS/dq = -(P ⊗ P) dS/dq = -(S^{-1} ⊗ S^{-1}) dS/dq [Derivative of Inverse] we obtain
J = E{vv^{T}} = k E((dm/dq)^{T} S^{-1} dm/dq) + ½k (dP/dq)^{T}(S ⊗ S) dP/dq
= k E((dm/dq)^{T} S^{-1} dm/dq) + ½k (dS/dq)^{T}(S^{-1} ⊗ S^{-1})(S ⊗ S)(S^{-1} ⊗ S^{-1}) dS/dq
= k E((dm/dq)^{T} S^{-1} dm/dq) + ½k (dS/dq)^{T}(S^{-1} ⊗ S^{-1}) dS/dq

5.17 
If f_{[r#1]}(X) is a function of X with mean value g(q), then Cov(f) >= dg/dq J^{-1} (dg/dq)^{T}
 g = E(f) means that dg/dq = E(df/dq + f v^{T}) = E(f v^{T}) where v^{T} = d/dq (ln(p(X)).
 For any arbitrary vector a, we have the scalar equation a^{T} dg/dq J^{-1} (dg/dq)^{T} a = E(a^{T} f v^{T}J^{-1} (dg/dq)^{T} a) = Cov(a^{T} f, a^{T} dg/dq J^{-1}v)
 Hence from the Cauchy-Schwarz inequality
(a^{T} dg/dq J^{-1} (dg/dq)^{T} a)^{2} <= Var(a^{T} f)Var(a^{T} dg/dq J^{-1}v)
= (a^{T}Cov(f)a)(a^{T} dg/dq J^{-1}Cov(v) J^{-1}(dg/dq)^{T} a) = (a^{T}Cov(f)a)(a^{T} dg/dq J^{-1}(dg/dq)^{T} a) [Since Cov(v) = J]
 Hence (a^{T} dg/dq J^{-1} (dg/dq)^{T} a) <= a^{T}Cov(f)a for any a
 So Cov(f) >= dg/dq J^{-1} (dg/dq)^{T} where >= represents the Loewner partial order.

5.18 
For even n, E(prod(x_{[n]}-m)) = (½n)!^{-1}2^{-½n} sum_{v}(s_{v(1),v(2)}s_{v(3),v(4)}...s_{v(n-1),v(n)}) where the sum is over all n! permutations v of the numbers 1:n. An equivalent formula is to omit the normalizing factor, (½n)!^{-1}2^{-½n}, and to restrict the summation to all distinct pairings of the numbers 1:n.
 Without loss of generality, we assume that m = 0. The characteristic function is f(t) = E(exp(jt^{T}x)) = exp(-½t^{T}St) where j=sqrt(-1).
 Differentiating E(exp(jt^{T}x)) shows that E(prod(x)) = j^{-n} d^{n}f/dt_{1}dt_{2}...dt_{n} evaluated at t=0.
 Expanding the power series f(t) = exp(-½t^{T}St) = sum((-½t^{T}St)^{r}/r!; r=0..inf). Term r in this contains only terms in t_{x}^{r} where x is some combination of the subscripts 1:n. When differentiated n times, terms with r < n/2 will become zero, whereas terms with r > n/2 will still contain powers of t_{x} and will therefore be zero when t is set to 0.
 It follows that when n is odd, all terms vanish and E(prod(x))=0.
 When n is even, E(prod(x)) = d^{n}g/dt_{1}dt_{2}...dt_{n} evaluated at t=0 where g(t) = j^{-n} (-½t^{T}St)^{½n}/(½n)! = ½^{½n}(t^{T}St)^{½n}/(½n)!
 We can expand (t^{T}St)^{½n} = t^{T}St × t^{T}St × ... × t^{T}St and we now differentiate in turn with respect to t_{n}, t_{n-1}, ..., t_{1}. We note that t occurs n times in this expansion and that dt/dt_{i} = e_{i}, a column of the identity matrix.
 When we differentiate with respect to t_{n}, we obtain a sum of n terms, in each of which one of the occurrences of t has been replaced by e_{n} and the remaining n-1 occurrences remain as t.
 If we now differentiate with respect to t_{n-1}, each of the n terms in the previous step changes into a sum of n-1 terms in which one of the occurrences of t is now replaced by e_{n-1} and the remaining n-2 occurrences remain as t. We thus have a total of n(n-1) terms.
 We repeat this process for t_{n-2}, ..., t_{1} and we end up with n! terms of the form (e_{v(1)})^{T}Se_{v(2)}(e_{v(3)})^{T}Se_{v(4)}...(e_{v(n-1)})^{T}Se_{v(n)} = s_{v(1),v(2)}s_{v(3),v(4)}...s_{v(n-1),v(n)} where v runs through all n! permutations of 1:n.
 Thus E(prod(x)) = (½n)!^{-1}2^{-½n} sum_{v}(s_{v(1),v(2)}s_{v(3),v(4)}...s_{v(n-1),v(n)})
 Note that each term in the summation arises (½n)!2^{½n} times since the ½n factors s_{ij} can be rearranged in (½n)! orders and for each factor s_{ij} = s_{ji} since S is symmetric. Thus an equivalent formula is to omit the normalizing factor, (½n)!^{-1}2^{-½n}, and restrict the summation to all distinct pairings of the numbers 1:n. This is known as Isserlis' theorem or Wick's theorem.
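The equivalence of the two formulas (sum over all n! permutations with the normalizing factor versus sum over distinct pairings) can be checked combinatorially for n=4; the symmetric array `s` below is an arbitrary illustrative choice:

```python
import itertools
import math

n = 4
# an arbitrary symmetric array s[i][k] (symmetric by construction;
# positive-definiteness is not needed for this combinatorial identity)
s = [[0.1 * (i + 1) * (k + 1) + (1.0 if i == k else 0.0) for k in range(n)]
     for i in range(n)]
# sum over all n! permutations, normalized by (n/2)! 2^(n/2)
full = sum(s[v[0]][v[1]] * s[v[2]][v[3]] for v in itertools.permutations(range(n)))
full /= math.factorial(n // 2) * 2 ** (n // 2)
# sum over the 3 distinct pairings of {1,2,3,4}
pairings = s[0][1] * s[2][3] + s[0][2] * s[1][3] + s[0][3] * s[1][2]
assert abs(full - pairings) < 1e-9
```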

5.19 
[x Real Gaussian, m=0] Y_{n}=<(xx^{T})^{n}>, then Y_{1}=S and Y_{n+1}= tr(Y_{n})S+2nSY_{n}
 By definition, Y_{1}=S. So we assume n>0.
 Y_{n+1} consists of 2n+2 x terms multiplied together. By Wick's theorem [5.18] we can obtain Y_{n+1} by considering all possible pairings of the x's and treating each pair as an independent Gaussian vector.
 We write Y_{n+1}= <x(x^{T}x)^{n}x^{T}> and note that the central term, (x^{T}x)^{n}, is a product of scalars which may therefore be reordered arbitrarily. If we denote the first x by x_{1}, its pair can be in any of 2n+1 positions; we consider three cases:
 The pair to x_{1} can be the first x in one of the scalars that form (x^{T}x)^{n}; this gives n terms of the form <x_{1}(x_{1}^{T}x)(x^{T}x)^{n-1}x^{T}> = <x_{1}x_{1}^{T}(xx^{T})^{n}> = SY_{n}
 The pair to x_{1} can be the second x in one of the scalars that form (x^{T}x)^{n}; this gives n terms of the form <x_{1}(x^{T}x_{1})(x^{T}x)^{n-1}x^{T}> = <x_{1}(x_{1}^{T}x)(x^{T}x)^{n-1}x^{T}> = <x_{1}x_{1}^{T}(xx^{T})^{n}> = SY_{n}
 The pair to x_{1} can be the final x; this gives a single term of the form <x_{1}(x^{T}x)^{n}x_{1}^{T}> = <(x^{T}x)^{n}x_{1}x_{1}^{T}> = tr(Y_{n})S
 Adding these 2n+1 terms gives Y_{n+1}= nSY_{n} + nSY_{n} + tr(Y_{n})S = tr(Y_{n})S+2nSY_{n}.
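In the scalar case S = s we have tr(Y_{n}) = Y_{n}, so the recursion becomes Y_{n+1} = (2n+1) s Y_{n}, i.e. Y_{n} = (2n-1)!! s^{n}, the standard even moments of a zero-mean Gaussian; a sketch with an illustrative variance:

```python
s = 1.3
Y = s   # Y_1 = S
for n in range(1, 6):
    Y = Y * s + 2 * n * s * Y           # Y_{n+1} = tr(Y_n) S + 2 n S Y_n
    dfact = 1
    for j in range(1, 2 * (n + 1), 2):  # (2(n+1)-1)!! = 1*3*...*(2n+1)
        dfact *= j
    assert abs(Y - dfact * s ** (n + 1)) < 1e-6
```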

5.20 
[x Complex Gaussian, m=0] Y_{n}=<(xx^{H})^{n}>, then Y_{1}=S and Y_{n+1}= tr(Y_{n})S+nSY_{n}
 By definition, Y_{1}=S. So we assume n>0.
 Y_{n+1} consists of 2n+2 x terms multiplied together. By Wick's theorem [5.18] we can obtain Y_{n+1} by considering all possible pairings of the x's and treating each pair as an independent complex Gaussian vector.
 We write Y_{n+1}= <x(x^{H}x)^{n}x^{H}> and note that the central term, (x^{H}x)^{n}, is a product of scalars which may therefore be reordered arbitrarily. If we denote the first x by x_{1}, its pair can be in any of 2n+1 positions; we consider three cases:
 The pair to x_{1} can be the first x in one of the scalars that form (x^{H}x)^{n}; this gives n terms of the form <x_{1}(x_{1}^{H}x)(x^{H}x)^{n-1}x^{H}> = <x_{1}x_{1}^{H}(xx^{H})^{n}> = SY_{n}
 The pair to x_{1} can be the second x in one of the scalars that form (x^{H}x)^{n}; this gives n terms of the form <x_{1}(x^{H}x_{1})(x^{H}x)^{n-1}x^{H}> = <x_{1}(x_{1}^{T}x^{*})(x^{H}x)^{n-1}x^{H}> = <x_{1}x_{1}^{T}x^{*}x^{H}(xx^{H})^{n-1}> = 0 [5.3]
 The pair to x_{1} can be the final x; this gives a single term of the form <x_{1}(x^{H}x)^{n}x_{1}^{H}> = <(x^{H}x)^{n}x_{1}x_{1}^{H}> = tr(Y_{n})S
 Adding these 2n+1 terms gives Y_{n+1}= nSY_{n} + 0 + tr(Y_{n})S = tr(Y_{n})S+nSY_{n}.

5.21 
If y ~ kN(y; 0, 1) is restricted to the domain p<y<q, then E(y)=r= (f(p)-f(q))/(F(q)-F(p)) and Var(y)=v=1-(q f(q) - p f(p))/(F(q)-F(p)) - r^{2} where k=1/(F(q)-F(p)), with f(x) and F(x) being the pdf and cdf of the standard Gaussian.
We first note that f(-x)=f(x), F(-x)=1-F(x), dF/dx=f(x) and df/dx = -x f(x).
So with all integrals going from p to q we can write ∫ f(x)dx = F(q)-F(p), ∫ xf(x)dx = f(p)-f(q), ∫ x^{2}f(x)dx = F(q) - F(p) + p f(p) - q f(q).
From this, E(y) = (f(p)-f(q))/(F(q)-F(p)) = r and Var(y) = 1-(q f(q) - p f(p))/(F(q)-F(p)) - r^{2}.
For the special case p=-∞, f(p)=F(p)=0 and E(y) = r = -f(q)/F(q) and Var(y) = v = 1+r(q-r).
For the special case q=+∞, f(q)=0, F(q)=1 and E(y) = r = f(p)/(1-F(p)) = f(p)/F(-p) and Var(y) = v = 1 + p f(p)/F(-p) - r^{2}.
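These truncated moments can be checked against direct numerical integration of the truncated density; the limits below are illustrative and `math.erf` supplies the Gaussian cdf:

```python
import math

def F(x):  # standard Gaussian cdf via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f(x):  # standard Gaussian pdf
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

p, q = -0.5, 1.5
k = 1.0 / (F(q) - F(p))
r = (f(p) - f(q)) / (F(q) - F(p))
v = 1.0 - (q * f(q) - p * f(p)) / (F(q) - F(p)) - r * r
# midpoint-rule moments of the truncated density k f(x) on (p, q)
N = 20000
h = (q - p) / N
m1 = m2 = 0.0
for i in range(N):
    x = p + (i + 0.5) * h
    w = k * f(x) * h
    m1 += x * w
    m2 += x * x * w
assert abs(m1 - r) < 1e-6
assert abs(m2 - m1 * m1 - v) < 1e-6
```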

5.22 
Suppose y ~ kN(y; m, S) is restricted to the domain satisfying b<a^{T}y<c with k a normalizing constant. Then E(y) = m+grSa and Cov(y) = S - g^{2}vSa(Sa)^{T} where g=1/sqrt(a^{T}Sa), p=g(b-a^{T}m), q=g(c-a^{T}m), r= (f(p)-f(q))/(F(q)-F(p)), v=(q f(q) - p f(p))/(F(q) - F(p)) + r^{2} and f(q) and F(q) are the pdf and cdf respectively of a standard 1-dimensional Gaussian with f(q)=dF/dq.
We use Cholesky decomposition to find T such that S=TT^{T} and define w=T^{-1}(y-m) which implies y = Tw+m.
So w ~ kN(w; 0, T^{-1}ST^{-T}) = kN(w; 0, I) within the domain satisfying b-a^{T}m < a^{T}Tw < c-a^{T}m.
Now find an orthogonal Q such that gQT^{T}a = e where e is the first column of the identity matrix and g = 1/sqrt(a^{T}TT^{T}a)=1/sqrt(a^{T}Sa)> 0. Thus Q is any orthogonal matrix whose first row is ga^{T}T. It follows that gT^{T}a = Q^{T}e which implies ga^{T}TQ^{T}= e^{T}.
Now we define z=Qw which implies w=Q^{T}z.
So z ~ kN(z; 0, I) within the domain defined by b-a^{T}m < a^{T}TQ^{T}z < c-a^{T}m which is equivalent to g(b-a^{T}m) < e^{T}z < g(c-a^{T}m).
Since only the first element of z is constrained and all elements are independent, we can use [5.21] to write E(z) = re and Cov(z) = I - vee^{T} where p=g(b-a^{T}m), q=g(c-a^{T}m), r=(f(p)-f(q))/(F(q)-F(p)) and v=(q f(q) - p f(p))/(F(q) - F(p)) + r^{2}.
We have y = TQ^{T}z+m and so E(y) = rTQ^{T}e+m and Cov(y)=TQ^{T}(I - vee^{T})QT^{T} = TQ^{T}QT^{T} - vTQ^{T}ee^{T}QT^{T}. From above we have gT^{T}a = Q^{T}e which implies TQ^{T}e = gTT^{T}a = gSa.
Using this substitution (along with Q^{T}Q = I) gives E(y) = m+grSa and Cov(y) = S - g^{2}vSa(Sa)^{T} as required.
5.23 
<(Ax + a)(Bx + b)^{H}> = ASB^{H} + (Am+a)(Bm+b)^{H}
 We define the zero-mean vector u = x-m so that <u> = 0 and <uu^{H}> = S.
 <(Ax + a)(Bx + b)^{H}> = <(Au + Am + a)(Bu + Bm + b)^{H}>
 = <Auu^{H}B^{H} + Au(Bm+b)^{H} + (Am+a)u^{H}B^{H} + (Am + a)(Bm + b)^{H}>
 = ASB^{H} + 0 + 0 + (Am + a)(Bm + b)^{H} = ASB^{H} + (Am + a)(Bm + b)^{H}
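A scalar sketch of this quadratic expectation, using E(x) = m and E(x^{2}) = s + m^{2} (the numeric values are illustrative):

```python
A, a, B, b = 1.2, 0.3, -0.7, 0.9
m, s = 0.4, 1.1
# <(Ax+a)(Bx+b)> expanded term by term
lhs = A * B * (s + m * m) + A * b * m + a * B * m + a * b
rhs = A * s * B + (A * m + a) * (B * m + b)
assert abs(lhs - rhs) < 1e-12
```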

5.24 
<(Ax+a)^{H} (Bx+b)> = <tr((Bx+b)(Ax+a)^{H})> = tr(BSA^{H}) + (Am+a)^{H} (Bm+b)
 <(Ax + a)^{H}(Bx + b)> = tr(<(Bx + b)(Ax + a)^{H}>)
 = tr(BSA^{H}) + tr((Bm + b)(Am + a)^{H}) [5.23]
 = tr(BSA^{H}) + (Am + a)^{H}(Bm + b)

5.25 
argmin_{K}<||(AKB+C)x||^{2}> = -(A^{H}A)^{-1}A^{H}C(S+mm^{H})B^{H}(B(S+mm^{H})B^{H})^{-1}
 argmin_{K}<||(AKB+C)x||^{2}> = argmin_{K}<(AKB+C)x((AKB+C)x)^{H}>
 = argmin_{K}<tr{(AKB+C)xx^{H}(AKB+C)^{H}}>
 = argmin_{K}{tr((AKB+C)(S+mm^{H})(AKB+C)^{H})}
 = -(A^{H}A)^{-1}A^{H}C(S+mm^{H})B^{H}(B(S+mm^{H})B^{H})^{-1} [2.7]
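In the scalar case the objective is (AKB+C)^{2}w with w = s + m^{2}, and the formula reduces to K = -C/(AB), which drives the residual to zero; a sketch with illustrative values:

```python
A, B, C = 1.5, -2.0, 0.7
w = 1.3   # w = s + m^2 = <x^2>

def cost(K):
    # scalar objective <((AKB+C)x)^2> = (AKB+C)^2 w
    return (A * K * B + C) ** 2 * w

# scalar form of -(A^H A)^{-1} A^H C w B^H (B w B^H)^{-1}
K_opt = -(1.0 / (A * A)) * A * C * w * B / (B * w * B)
assert cost(K_opt) < 1e-20
assert cost(K_opt + 0.1) > cost(K_opt)
assert cost(K_opt - 0.1) > cost(K_opt)
```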

5.26 
argmin_{K}<||(AKB+C)x + (AKE+F)y||^{2}> = -(A^{H}A)^{-1}A^{H}(CS_{x}B^{H}+FS_{y}E^{H})(BS_{x}B^{H}+ES_{y}E^{H})^{-1}
 We can define S = <[x; y][x; y]^{H}> = [S_{x} 0; 0 S_{y}] since x and y are independent and zero mean
 argmin_{K}<||(AKB+C)x + (AKE+F)y||^{2}>
 = argmin_{K}<||(AK[B E]+[C F])[x; y]||^{2}>
 = -(A^{H}A)^{-1}A^{H}[C F]S[B E]^{H}([B E]S[B E]^{H})^{-1} [5.25]
 = -(A^{H}A)^{-1}A^{H}(CS_{x}B^{H}+FS_{y}E^{H})(BS_{x}B^{H}+ES_{y}E^{H})^{-1}

5.27 
<(Ax + a)(Bx + b)^{T}(Cx + c)(Dx + d)^{T}> = (ASB^{T}+(Am+a)(Bm+b)^{T})(CSD^{T}+(Cm+c)(Dm+d)^{T}) + (ASC^{T}+(Am+a)(Cm+c)^{T})(BSD^{T}+(Bm+b)(Dm+d)^{T}) + (Bm+b)^{T}(Cm+c)×(ASD^{T} - (Am+a)(Dm+d)^{T}) + tr(BSC^{T})×(ASD^{T} + (Am+a)(Dm+d)^{T})
 We define the zero-mean vector u = x-m so that <u> = 0 and <uu^{T}> = S.
 We now multiply out the quartic expression, omitting terms involving odd powers of u since their mean is zero [5.18]
 <(Au + a)(Bu + b)^{T}(Cu + c)(Du + d)^{T}>
 = <Auu^{T}B^{T}Cuu^{T}D^{T} + Auu^{T}B^{T}cd^{T} + Aub^{T}Cud^{T} + Aub^{T}cu^{T}D^{T} + au^{T}B^{T}Cud^{T} + au^{T}B^{T}cu^{T}D^{T} + ab^{T}Cuu^{T}D^{T} + ab^{T}cd^{T}>
 = <Auu^{T}B^{T}Cuu^{T}D^{T} + Auu^{T}B^{T}cd^{T} + Auu^{T}C^{T}bd^{T} + b^{T}c×Auu^{T}D^{T} + au^{T}B^{T}Cud^{T} + ac^{T}Buu^{T}D^{T} + ab^{T}Cuu^{T}D^{T} + ab^{T}cd^{T}>
 where we have moved or transposed some scalar factors of the form p^{T}q. Defining v to have the same statistics as u but independent of it, we use Isserlis' theorem [5.18] to decompose the first term as
 <Auu^{T}B^{T}Cuu^{T}D^{T}> = <Auu^{T}B^{T}Cvv^{T}D^{T}> + <Auv^{T}B^{T}Cuv^{T}D^{T}> + <Auv^{T}B^{T}Cvu^{T}D^{T}>
 By again moving or transposing scalar factors of the form p^{T}q we can rearrange to get
 = <Auu^{T}B^{T}Cvv^{T}D^{T}> + <Auu^{T}C^{T}Bvv^{T}D^{T}> + <Auu^{T}D^{T}×v^{T}B^{T}Cv>
 We now use the quadratic expectation theorems to give
 = ASB^{T}CSD^{T} + ASC^{T}BSD^{T} + ASD^{T}×tr(B^{T}CS)
 Therefore we can write
 <(Au + a)(Bu + b)^{T}(Cu + c)(Du + d)^{T}>
 = <Auu^{T}B^{T}Cuu^{T}D^{T} + Auu^{T}B^{T}cd^{T} + Auu^{T}C^{T}bd^{T} + b^{T}c×Auu^{T}D^{T} + au^{T}B^{T}Cud^{T} + ac^{T}Buu^{T}D^{T} + ab^{T}Cuu^{T}D^{T} + ab^{T}cd^{T}>
 = ASB^{T}CSD^{T} + ASC^{T}BSD^{T} + ASD^{T}×tr(B^{T}CS) + ASB^{T}cd^{T} + ASC^{T}bd^{T} + b^{T}c×ASD^{T} + ad^{T}×tr(B^{T}CS) + ac^{T}BSD^{T} + ab^{T}CSD^{T} + ab^{T}cd^{T}
 = (ASB^{T}+ab^{T})(CSD^{T}+cd^{T}) + (ASC^{T}+ac^{T})(BSD^{T}+bd^{T}) + b^{T}c×(ASD^{T} - ad^{T}) + tr(BSC^{T})×(ASD^{T} + ad^{T})
 where we use tr(B^{T}CS) = tr(BSC^{T}) and also add and subtract a copy of ab^{T}cd^{T} = ac^{T}bd^{T} = b^{T}c×ad^{T}
 Now we note that Ax + a = Au + (Am + a) so if we replace a by Am+a, b by Bm+b etc in the above expression, we obtain
 <(Ax + a)(Bx + b)^{T}(Cx + c)(Dx + d)^{T}> = (ASB^{T}+(Am+a)(Bm+b)^{T})(CSD^{T}+(Cm+c)(Dm+d)^{T}) + (ASC^{T}+(Am+a)(Cm+c)^{T})(BSD^{T}+(Bm+b)(Dm+d)^{T}) + (Bm+b)^{T}(Cm+c)×(ASD^{T} - (Am+a)(Dm+d)^{T}) + tr(BSC^{T})×(ASD^{T} + (Am+a)(Dm+d)^{T})
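The quartic formula can be sanity-checked in the scalar case by expanding the product into powers of x and using the raw Gaussian moments of x ~ N(m, s); all numeric values are illustrative:

```python
import itertools

A, a, B, b = 1.1, 0.2, -0.8, 0.5
C, c, D, d = 0.7, -0.3, 0.4, 0.9
m, s = 0.6, 1.4
# raw moments E(x^k) for k = 0..4
E = [1.0, m, s + m * m, m ** 3 + 3 * m * s, m ** 4 + 6 * m * m * s + 3 * s * s]
# left side: expand (Ax+a)(Bx+b)(Cx+c)(Dx+d) into polynomial coefficients
coef = [0.0] * 5
for pick in itertools.product((0, 1), repeat=4):
    term = 1.0
    for ch, (hi, lo) in zip(pick, ((A, a), (B, b), (C, c), (D, d))):
        term *= hi if ch else lo
    coef[sum(pick)] += term
lhs = sum(coef[k] * E[k] for k in range(5))
# right side: the formula, with tr(BSC^T) = B s C in the scalar case
Am, Bm, Cm, Dm = A * m + a, B * m + b, C * m + c, D * m + d
rhs = ((A * s * B + Am * Bm) * (C * s * D + Cm * Dm)
       + (A * s * C + Am * Cm) * (B * s * D + Bm * Dm)
       + Bm * Cm * (A * s * D - Am * Dm)
       + B * s * C * (A * s * D + Am * Dm))
assert abs(lhs - rhs) < 1e-9
```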

5.28 
<(Ax + a)^{T}(Bx + b)(Cx + c)^{T}(Dx + d)> = tr(AS(C^{T}D+D^{T}C)SB^{T}) + ((Am+a)^{T}B + (Bm+b)^{T}A)S(C^{T}(Dm+d) + D^{T}(Cm+c)) + (tr(ASB^{T})+(Am+a)^{T}(Bm+b))(tr(CSD^{T})+(Cm+c)^{T}(Dm+d))
 <(Ax + a)^{T}(Bx + b)(Cx + c)^{T}(Dx + d)> = tr(<(Bx + b)(Cx + c)^{T}(Dx + d)(Ax + a)^{T}>)
 so, from [5.27] we can write
 = tr((BSC^{T}+(Bm+b)(Cm+c)^{T})(DSA^{T}+(Dm+d)(Am+a)^{T}) + (BSD^{T}+(Bm+b)(Dm+d)^{T})(CSA^{T}+(Cm+c)(Am+a)^{T}) + (Cm+c)^{T}(Dm+d)×(BSA^{T} - (Bm+b)(Am+a)^{T}) + tr(CSD^{T})×(BSA^{T} + (Bm+b)(Am+a)^{T}))
 = tr(BSC^{T}DSA^{T}) + tr(BSC^{T}(Dm+d)(Am+a)^{T}) + tr((Bm+b)(Cm+c)^{T}(DSA^{T}+(Dm+d)(Am+a)^{T})) + tr(BSD^{T}CSA^{T}) + tr(BSD^{T}(Cm+c)(Am+a)^{T}) + tr((Bm+b)(Dm+d)^{T}(CSA^{T}+(Cm+c)(Am+a)^{T})) + (Cm+c)^{T}(Dm+d)×tr(BSA^{T} - (Bm+b)(Am+a)^{T}) + tr(CSD^{T})×tr(BSA^{T} + (Bm+b)(Am+a)^{T})
 Now, noting that tr(pq^{T}) = q^{T}p and also collecting together all terms that equal (Am+a)^{T}(Bm+b)(Cm+c)^{T}(Dm+d), we can write
 = tr(BSC^{T}DSA^{T}) + (Am+a)^{T}BSC^{T}(Dm+d) + (Cm+c)^{T}DSA^{T}(Bm+b) + tr(BSD^{T}CSA^{T}) + (Am+a)^{T}BSD^{T}(Cm+c) + (Dm+d)^{T}CSA^{T}(Bm+b) + (Cm+c)^{T}(Dm+d)×tr(BSA^{T}) + tr(CSD^{T})×(tr(BSA^{T}) + (Am+a)^{T}(Bm+b)) + (2-1)×(Am+a)^{T}(Bm+b)(Cm+c)^{T}(Dm+d)
 = tr(BSC^{T}DSA^{T} + BSD^{T}CSA^{T}) + (Am+a)^{T}BS(C^{T}(Dm+d) + D^{T}(Cm+c)) + (Bm+b)^{T}ASD^{T}(Cm+c) + (Bm+b)^{T}ASC^{T}(Dm+d) + (Cm+c)^{T}(Dm+d)×tr(ASB^{T}) + tr(CSD^{T})×(tr(ASB^{T}) + (Am+a)^{T}(Bm+b)) + (Am+a)^{T}(Bm+b)(Cm+c)^{T}(Dm+d)
 = tr(ASD^{T}CSB^{T} + ASC^{T}DSB^{T}) + (Am+a)^{T}BS(C^{T}(Dm+d) + D^{T}(Cm+c)) + (Bm+b)^{T}AS(C^{T}(Dm+d) + D^{T}(Cm+c)) + (tr(CSD^{T}) + (Cm+c)^{T}(Dm+d)) × (tr(ASB^{T}) + (Am+a)^{T}(Bm+b))
 = tr(AS(D^{T}C+C^{T}D)SB^{T}) + ((Am+a)^{T}B + (Bm+b)^{T}A)S(C^{T}(Dm+d) + D^{T}(Cm+c)) + (tr(CSD^{T}) + (Cm+c)^{T}(Dm+d))(tr(ASB^{T}) + (Am+a)^{T}(Bm+b))
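As with [5.27], the scalar case (where both sides reduce to the same quartic expectation of x ~ N(m, s)) gives a quick numerical check; the values are illustrative:

```python
import itertools

A, a, B, b = 1.1, 0.2, -0.8, 0.5
C, c, D, d = 0.7, -0.3, 0.4, 0.9
m, s = 0.6, 1.4
E = [1.0, m, s + m * m, m ** 3 + 3 * m * s, m ** 4 + 6 * m * m * s + 3 * s * s]
coef = [0.0] * 5
for pick in itertools.product((0, 1), repeat=4):
    term = 1.0
    for ch, (hi, lo) in zip(pick, ((A, a), (B, b), (C, c), (D, d))):
        term *= hi if ch else lo
    coef[sum(pick)] += term
lhs = sum(coef[k] * E[k] for k in range(5))
Am, Bm, Cm, Dm = A * m + a, B * m + b, C * m + c, D * m + d
rhs = (A * s * (C * D + D * C) * s * B                    # tr(AS(C^T D + D^T C)SB^T)
       + (Am * B + Bm * A) * s * (C * Dm + D * Cm)
       + (A * s * B + Am * Bm) * (C * s * D + Cm * Dm))   # product of trace terms
assert abs(lhs - rhs) < 1e-9
```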
