4.1
$\operatorname{diag}(ab^T) = a \bullet b$
If $X = ab^T$, then $x_{i,j} = a_ib_j$, so the diagonal elements of $X$ are $x_{i,i} = a_ib_i$. This is precisely the $i^{\text{th}}$ element of $a \bullet b$.
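As a quick numerical illustration (NumPy is our choice here; the manual itself contains no code, and the helper names are hypothetical), the sketch below compares both sides of [4.1]. The right-hand side is also the cheaper way to obtain the diagonal of an outer product, since it never forms the $n\#n$ matrix $ab^T$.

```python
import numpy as np

def diag_of_outer(a, b):
    """Left side of [4.1]: form X = a b^T explicitly and read off its diagonal."""
    return np.diag(np.outer(a, b))

def hadamard(a, b):
    """Right side of [4.1]: the elementwise (Hadamard) product a . b, O(n) work."""
    return a * b

rng = np.random.default_rng(0)
a, b = rng.standard_normal(5), rng.standard_normal(5)
# x_{i,i} = a_i b_i, so both routes give the same vector.
assert np.allclose(diag_of_outer(a, b), hadamard(a, b))
```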

4.2
$(A^{-1} + UV^H)^{-1} = A - AU(I + V^HAU)^{-1}V^HA$
We show that $A^{-1} + UV^H$ and $A - AU(I + V^HAU)^{-1}V^HA$ are inverses by multiplying them together:
$$
\begin{aligned}
(A^{-1} + UV^H)\left(A - AU(I + V^HAU)^{-1}V^HA\right)
&= I + UV^HA - (U + UV^HAU)(I + V^HAU)^{-1}V^HA \\
&= I + UV^HA - U(I + V^HAU)(I + V^HAU)^{-1}V^HA \\
&= I + UV^HA - UV^HA \\
&= I
\end{aligned}
$$
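The identity in [4.2] lets an inverse be updated by solving only a $k\#k$ system, where $k$ is the number of columns of $U$. A minimal NumPy sketch, assuming $A$ itself (not $A^{-1}$) is the matrix already in hand; `inverse_of_update` is a hypothetical name, and the brute-force inverse is included only as a check.

```python
import numpy as np

def inverse_of_update(A, U, V):
    """Return (A^{-1} + U V^H)^{-1} using [4.2]:  A - A U (I + V^H A U)^{-1} V^H A.
    Only the k-by-k matrix I + V^H A U has to be solved against."""
    k = U.shape[1]
    VH = V.conj().T
    AU = A @ U
    small = np.eye(k) + VH @ AU                       # I + V^H A U  (k-by-k)
    return A - AU @ np.linalg.solve(small, VH @ A)

rng = np.random.default_rng(1)
n, k = 6, 2
A = np.eye(n) + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
U = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
V = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
direct = np.linalg.inv(np.linalg.inv(A) + U @ V.conj().T)   # brute force, O(n^3)
assert np.allclose(inverse_of_update(A, U, V), direct)
```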

4.3
$\det(A^{-1} + UV^H) = \det(I + V^HAU) \times \det(A^{-1})$
From [3.1] we have that
$$
\det\begin{pmatrix} A^{-1} & -U \\ V^H & I \end{pmatrix} = \det(A^{-1}) \times \det(I + V^HAU) = \det(I) \times \det(A^{-1} + UI^{-1}V^H),
$$
which simplifies to $\det(A^{-1}) \times \det(I + V^HAU) = \det(A^{-1} + UV^H)$.
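A NumPy sketch of [4.3] (illustrative function names, not from the manual) that evaluates the left side directly, the right side through the small $k\#k$ determinant, and the block determinant used in the proof.

```python
import numpy as np

def det_of_update(A, U, V):
    """det(A^{-1} + U V^H) via [4.3]: det(I + V^H A U) * det(A^{-1})."""
    k = U.shape[1]
    return np.linalg.det(np.eye(k) + V.conj().T @ A @ U) / np.linalg.det(A)

def det_of_block(A, U, V):
    """det([A^{-1}, -U; V^H, I]), the block matrix expanded in the proof."""
    k = U.shape[1]
    M = np.block([[np.linalg.inv(A), -U], [V.conj().T, np.eye(k)]])
    return np.linalg.det(M)

rng = np.random.default_rng(2)
n, k = 5, 2
A = np.eye(n) + 0.2 * rng.standard_normal((n, n))
U, V = rng.standard_normal((n, k)), rng.standard_normal((n, k))
direct = np.linalg.det(np.linalg.inv(A) + U @ V.T)
assert np.isclose(det_of_update(A, U, V), direct)
assert np.isclose(det_of_block(A, U, V), direct)
```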

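4.4
If $D_{[n\#n]} = \operatorname{DIAG}(d)$ is real with $d_i \ge d_{i+1}$ for all $i$, and if $Y_{[n\#k]}$ is constrained to satisfy $Y^HY = I$, then (a) $\max_Y \operatorname{tr}(Y^HDY) = \operatorname{sum}(d_{1:k})$ and is attained by $Y = [I; 0]$, and (b) $\min_Y \operatorname{tr}(Y^HDY) = \operatorname{sum}(d_{n-k+1:n})$ and is attained by $Y = [0; I]$.
Define $v_i = \sum_j |y_{ij}|^2$. We must have $0 \le v_i \le 1$, since the $k$ orthonormal columns of $Y$ can be extended to an $n\#n$ unitary matrix, each of whose rows has unit length. Also $\operatorname{sum}(v) = k$ since each of the $k$ columns of $Y$ has unit length.
Now $\operatorname{tr}(Y^HDY) = v^Td$, which is a weighted sum of the $d_i$ with the constraints on $v$ given above. Because the $d_i$ are in descending order: (a) the sum is maximized by putting all the available weight on the $k$ largest $d_i$, i.e. $v_{1:k} = 1$ and $v_{k+1:n} = 0$, which gives $\operatorname{tr}(Y^HDY) = \operatorname{sum}(d_{1:k})$; (b) it is minimized by setting $v_{1:n-k} = 0$ and $v_{n-k+1:n} = 1$, which gives $\operatorname{tr}(Y^HDY) = \operatorname{sum}(d_{n-k+1:n})$. These weights are produced by $Y = [I; 0]$ and $Y = [0; I]$ respectively.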
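The proof's key quantity is the weight vector $v$ of squared row norms. The NumPy sketch below (our own construction, with hypothetical helper names) computes $\operatorname{tr}(Y^HDY)$ as $v^Td$ and samples random $Y$ with $Y^HY = I$ to confirm both bounds and their attainment.

```python
import numpy as np

def trace_form(d, Y):
    """tr(Y^H D Y) for D = diag(d), computed as in the proof: v^T d with
    v_i = sum_j |y_ij|^2 (the squared row norms of Y)."""
    v = np.sum(np.abs(Y) ** 2, axis=1)
    return float(v @ d)

def random_orthonormal(n, k, rng):
    """A random complex n-by-k Y with Y^H Y = I (reduced QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
    return Q

rng = np.random.default_rng(3)
n, k = 7, 3
d = np.sort(rng.standard_normal(n))[::-1]            # descending, as in [4.4]
E = np.eye(n)
upper, lower = d[:k].sum(), d[n - k:].sum()
for _ in range(1000):
    t = trace_form(d, random_orthonormal(n, k, rng))
    assert lower - 1e-12 <= t <= upper + 1e-12
assert np.isclose(trace_form(d, E[:, :k]), upper)     # Y = [I; 0]
assert np.isclose(trace_form(d, E[:, n - k:]), lower) # Y = [0; I]
```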

4.5
If $D_{[n\#n]} = \operatorname{DIAG}(d)$ with $d_i \ge d_{i+1}$ for all $i$, then $\max_y (y^HDy \mid y^Hy = 1) = d_1$ and $\min_y (y^HDy \mid y^Hy = 1) = d_n$, and these bounds are attained by $y = e_1$ and $y = e_n$ respectively.
This is a special case of [4.4] with $k = 1$, but it may also be proved directly, since every element of $y^C \bullet y$ is real and nonnegative:
$$
y^HDy = \operatorname{sum}(d \bullet y^C \bullet y) \le \max(d) \times \operatorname{sum}(y^C \bullet y) = d_1 \times y^Hy = d_1
$$
$$
y^HDy = \operatorname{sum}(d \bullet y^C \bullet y) \ge \min(d) \times \operatorname{sum}(y^C \bullet y) = d_n \times y^Hy = d_n
$$
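A sampling check of [4.5] in NumPy (an illustration, not part of the manual; the helper name is hypothetical). The quadratic form is written exactly as in the proof, $\operatorname{sum}(d \bullet y^C \bullet y)$, and random unit vectors never leave the interval $[d_n, d_1]$.

```python
import numpy as np

def quad_form_diag(d, y):
    """y^H D y for D = diag(d), written as in the proof: sum(d . conj(y) . y)."""
    return float(np.sum(d * np.conj(y) * y).real)

rng = np.random.default_rng(4)
d = np.array([5.0, 2.0, 0.5, -1.0])     # d_i >= d_{i+1}
for _ in range(1000):
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y /= np.linalg.norm(y)              # enforce y^H y = 1
    assert d[-1] - 1e-12 <= quad_form_diag(d, y) <= d[0] + 1e-12
```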

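4.6
If $D_{[n\#n]} = \operatorname{DIAG}(d)$ with $d_i \ge d_{i+1}$ for all $i$, and $k < n$, then $\max_{W:[n\#k]} \min_y (y^HDy \mid y^Hy = 1 \text{ and } W^Hy = 0) = d_{n-k}$, and this bound is attained by $W = [0; I_{[k\#k]}]$ and $y = e_{n-k}$, where $e_i$ is the $i$'th column of the identity matrix.
First define $V = [0; I_{[k\#k]}]$ and $P = [I_{[n-k\#n-k]}; 0]$, and note that $V^Hy = 0$ implies $y = PP^Hy$, since $PP^H + VV^H = I$.
For any fixed $W_{[n\#k]}$, the vectors satisfying $W^Hy = 0$ form a subspace of dimension at least $n-k$, and the columns of $Q = [0; I_{[k+1\#k+1]}]$ span a subspace of dimension $k+1$. Since $(n-k) + (k+1) > n$, these subspaces share a unit vector $z$, which satisfies $W^Hz = 0$ and $z = QQ^Hz$. Hence
$$
\begin{aligned}
\min_y (y^HDy \mid y^Hy = 1 \text{ and } W^Hy = 0)
&\le z^HDz = (Q^Hz)^H(Q^HDQ)(Q^Hz) \\
&\le \max(d_{n-k:n}) \quad\text{from [4.5], since } Q^HDQ = \operatorname{diag}(d_{n-k:n}) \text{ and } (Q^Hz)^H(Q^Hz) = 1 \\
&= d_{n-k} \quad\text{since the entries of } d \text{ are in descending order.}
\end{aligned}
$$
Since this is true for any $W_{[n\#k]}$, we must have $\max_{W:[n\#k]} \min_y (y^HDy \mid y^Hy = 1 \text{ and } W^Hy = 0) \le d_{n-k}$. But if we choose $W = V$, every feasible $y$ satisfies $y = PP^Hy$, so
$$
\min_y (y^HDy \mid y^Hy = 1 \text{ and } V^Hy = 0) = \min_y (y^HPP^HDPP^Hy \mid y^Hy = 1 \text{ and } V^Hy = 0) = \min(d_{1:n-k}) = d_{n-k}
$$
from [4.5], since $P^HDP = \operatorname{diag}(d_{1:n-k})$. From [4.5] this bound is attained when $P^Hy = e_{n-k}$, which is in turn when $y = e_{n-k}$.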
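The max-min characterisation with value $d_{n-k}$ can be probed numerically. For a given full-rank $W$, the inner minimum is the smallest eigenvalue of $D$ compressed onto an orthonormal basis of the null space of $W^H$. The NumPy sketch below (helper name hypothetical; the basis is taken from a full SVD, which is our choice) checks that $W = [0; I_{[k\#k]}]$ reaches $d_{n-k}$ and that random $W$ never exceed it.

```python
import numpy as np

def min_on_complement(D, W):
    """min of y^H D y over unit y with W^H y = 0, for full-rank W (n-by-k).
    Feasible y are the unit vectors in the column space of N, an orthonormal
    basis of null(W^H), so the minimum is the smallest eigenvalue of N^H D N."""
    k = W.shape[1]
    _, _, Vh = np.linalg.svd(W.conj().T)      # full SVD: rows k: of Vh span null(W^H)
    N = Vh[k:].conj().T
    return float(np.linalg.eigvalsh(N.conj().T @ D @ N)[0])  # eigvalsh is ascending

rng = np.random.default_rng(5)
n, k = 6, 2
d = np.sort(rng.standard_normal(n))[::-1]     # descending
D = np.diag(d)
E = np.eye(n)
target = d[n - k - 1]                         # d_{n-k} in the 1-based notation above
assert np.isclose(min_on_complement(D, E[:, n - k:]), target)   # W = [0; I_k]
for _ in range(500):                          # no W gives a larger inner minimum
    W = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    assert min_on_complement(D, W) <= target + 1e-10
```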

4.7
If $H_{[n\#n]} = UDU^H$ is Hermitian, $U$ is unitary and $D = \operatorname{diag}(d) = \operatorname{diag}(\operatorname{eig}(H))$ with $d_i \ge d_{i+1}$, then $\max_W \min_x (x^HHx \mid x^Hx = 1 \text{ and } W_{[n\#k]}^Hx = 0) = d_{n-k}$, and this bound is attained by $W = U_{:,n-k+1:n}$ and $x = u_{n-k}$.
Set $y = U^Hx$; then $x^HHx = y^HDy$ and $x^Hx = y^Hy$, so
$$
\max_W \min_x (x^HHx \mid x^Hx = 1 \text{ and } W_{[n\#k]}^Hx = 0) = \max_W \min_y (y^HDy \mid y^Hy = 1 \text{ and } W_{[n\#k]}^HUy = 0).
$$
As $W$ ranges over all $n\#k$ matrices, so does $U^HW$, so from [4.6] the value is $d_{n-k}$ and we attain the bound with $U^HW = [0; I_{[k\#k]}]$ and $y = e_{n-k}$. Hence $W = U[0; I_{[k\#k]}] = U_{:,n-k+1:n}$ and $x = Ue_{n-k} = u_{n-k}$.
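The same check for a Hermitian $H$, sketched in NumPy. `np.linalg.eigh` returns eigenvalues in ascending order, so they are reversed to match the decreasing order used here; the helper name is illustrative.

```python
import numpy as np

def min_rayleigh_on_complement(H, W):
    """min of x^H H x over unit x with W^H x = 0 (H Hermitian, W full rank)."""
    k = W.shape[1]
    _, _, Vh = np.linalg.svd(W.conj().T)
    N = Vh[k:].conj().T                     # orthonormal basis of null(W^H)
    return float(np.linalg.eigvalsh(N.conj().T @ H @ N)[0])

rng = np.random.default_rng(6)
n, k = 6, 2
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (Z + Z.conj().T) / 2                    # random Hermitian matrix
w, Q = np.linalg.eigh(H)                    # ascending order from eigh
d, U = w[::-1], Q[:, ::-1]                  # decreasing: H = U diag(d) U^H
W = U[:, n - k:]                            # W = U_{:, n-k+1:n}
x = U[:, n - k - 1]                         # x = u_{n-k}
assert np.isclose(min_rayleigh_on_complement(H, W), d[n - k - 1])
assert np.isclose((x.conj() @ H @ x).real, d[n - k - 1])
```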

4.8
If $H_{[n\#n]} = UDU^H$ is Hermitian, $U$ is unitary and $D = \operatorname{diag}(d) = \operatorname{diag}(\operatorname{eig}(H))$ contains the eigenvalues in decreasing order, then $\max_x (x^HHx \mid x^Hx = 1) = d_1$ and $\min_x (x^HHx \mid x^Hx = 1) = d_n$, and these bounds are attained by $x = u_1$ and $x = u_n$ respectively.
From [4.5], setting $y = U^Hx$,
$$
d_1 = \max_y (y^HDy \mid y^Hy = 1) = \max_x \left((U^Hx)^HD(U^Hx) \mid (U^Hx)^H(U^Hx) = 1\right) = \max_x (x^HHx \mid x^Hx = 1).
$$
Similarly,
$$
d_n = \min_y (y^HDy \mid y^Hy = 1) = \min_x \left((U^Hx)^HD(U^Hx) \mid (U^Hx)^H(U^Hx) = 1\right) = \min_x (x^HHx \mid x^Hx = 1).
$$
The bounds are attained by $y = e_1$ and $y = e_n$, that is, by $x = Ue_1 = u_1$ and $x = Ue_n = u_n$.
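A NumPy illustration of [4.8], with the constraint $x^Hx = 1$ replaced by dividing by $x^Hx$, which is equivalent. This is a sampling check on a random test matrix of our choosing, not a proof; `rayleigh` is a hypothetical helper name.

```python
import numpy as np

def rayleigh(H, x):
    """x^H H x / x^H x for Hermitian H (real up to rounding)."""
    return float((np.vdot(x, H @ x) / np.vdot(x, x)).real)

rng = np.random.default_rng(7)
n = 5
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (Z + Z.conj().T) / 2
w, Q = np.linalg.eigh(H)
d, U = w[::-1], Q[:, ::-1]                  # decreasing order, as in [4.8]
for _ in range(1000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    assert d[-1] - 1e-10 <= rayleigh(H, x) <= d[0] + 1e-10
```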

4.9
If $D_{[n\#n]} = \operatorname{DIAG}(d)$ is real with $d_i \ge d_{i+1} > 0$ for all $i$, and if $Y_{[n\#k]}$ is constrained to have rank $k$, then (a) $\max \det(Y^HDY)/\det(Y^HY) = \operatorname{prod}(d_{1:k})$ and is attained by $Y = [I; 0]$, and (b) $\min \det(Y^HDY)/\det(Y^HY) = \operatorname{prod}(d_{n-k+1:n})$ and is attained by $Y = [0; I]$.
At an extreme value, the complex gradient must be zero. Hence, using [2.16],
$$
0 = \frac{d\ln\left(\det(Y^HDY)/\det(Y^HY)\right)}{dY^C} = \frac{d\ln\det(Y^HDY)}{dY^C} - \frac{d\ln\det(Y^HY)}{dY^C} = DY(Y^HDY)^{-1} - Y(Y^HY)^{-1}.
$$
Hence $DY(Y^HDY)^{-1} = Y(Y^HY)^{-1}$ and so $DY = Y(Y^HY)^{-1}Y^HDY$.
Taking the Hermitian transpose, we get $Y^HDY(Y^HY)^{-1}Y^H = Y^HD$.
Since $\operatorname{rank}(Y) = k$, $Y^H$ must contain $k$ linearly independent columns. Column $j$ of this equation reads $\left(Y^HDY(Y^HY)^{-1}\right)(Y^He_j) = d_j(Y^He_j)$, so these $k$ columns form a complete set of eigenvectors for the $k\#k$ matrix $Y^HDY(Y^HY)^{-1}$, with eigenvalues given by the corresponding entries of $d$.
Now $\det(Y^HDY)/\det(Y^HY) = \det(Y^HDY(Y^HY)^{-1})$ equals the product of these eigenvalues, and is therefore the product of $k$ elements of $d$ with distinct indices. Since the elements of $d$ are positive and in decreasing order, the maximum possible product is $\operatorname{prod}(d_{1:k})$ and the minimum is $\operatorname{prod}(d_{n-k+1:n})$. These values are attained by $Y = [I; 0]$ and $Y = [0; I]$ respectively.
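A NumPy sketch of [4.9] for a positive, decreasing $d$, as the result requires (helper name hypothetical). Besides sampling random rank-$k$ $Y$, note that the ratio is unchanged when $Y$ is replaced by $YR$ for any nonsingular $k\#k$ matrix $R$, since both determinants pick up the same factor $|\det R|^2$.

```python
import numpy as np

def det_ratio(d, Y):
    """det(Y^H D Y) / det(Y^H Y) for D = diag(d) and a rank-k n-by-k Y."""
    YH = Y.conj().T
    return float((np.linalg.det(YH @ (d[:, None] * Y)) / np.linalg.det(YH @ Y)).real)

rng = np.random.default_rng(8)
n, k = 6, 3
d = np.sort(rng.uniform(0.5, 3.0, n))[::-1]     # d_i >= d_{i+1} > 0
hi, lo = np.prod(d[:k]), np.prod(d[n - k:])
for _ in range(1000):
    Y = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    assert lo * (1 - 1e-9) <= det_ratio(d, Y) <= hi * (1 + 1e-9)
E = np.eye(n)
assert np.isclose(det_ratio(d, E[:, :k]), hi)       # Y = [I; 0]
assert np.isclose(det_ratio(d, E[:, n - k:]), lo)   # Y = [0; I]
```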

4.10
If $A$ is Hermitian and $B$ is +ve definite Hermitian, there exists $X$ such that $X^HBX = I$ and $X^HAX = D$, where $X$ and $D$ may be obtained from the eigendecomposition $B^{-1}A = XDX^{-1}$ with $D = \operatorname{DIAG}(d)$ a diagonal matrix of eigenvalues in nonincreasing order. If $S$ is the +ve definite Hermitian square root of $B^{-1}$ (i.e. $S^2B = I$), then the eigendecomposition of the Hermitian matrix $SAS$ is $SAS = WDW^H$ where $W = S^{-1}X$ is unitary.
Since $B$ is +ve definite Hermitian, we can decompose it as $B = ULU^H$ where $U$ is unitary and $L$ is diagonal with real diagonal entries $> 0$.
Define the real diagonal matrix $M = L^{-1/2}$.
The matrix $M^HU^HAUM$ is Hermitian and can therefore be decomposed as $M^HU^HAUM = VDV^H$ where $V$ is unitary and $D = \operatorname{DIAG}(d)$ is real. We may order the columns of $V$ so that the elements of $d$ are in nonincreasing order. Since $M^H = M$, this also gives $A = UM^{-1}VDV^HM^{-1}U^H$.
Define $X = UMV$. Then
$$
X^HBX = V^HM^HU^HBUMV = V^HM^HLMV = V^HV = I
$$
and
$$
X^HAX = V^HM^HU^HAUMV = V^HVDV^HV = D.
$$
Also
$$
\begin{aligned}
B^{-1}A &= (ULU^H)^{-1}(UM^{-1}VDV^HM^{-1}U^H) = (UM^2U^H)(UM^{-1}VDV^HM^{-1}U^H) \\
&= UMVDV^HM^{-1}U^H = (UMV)D(UMV)^{-1} = XDX^{-1}.
\end{aligned}
$$
The Hermitian +ve definite square root of $B^{-1}$ is $S = UMU^H$. Therefore
$$
SAS = (UMU^H)(UM^{-1}VDV^HM^{-1}U^H)(UMU^H) = UVDV^HU^H = (UV)D(UV)^H = WDW^H,
$$
where $W = UV = (UM^{-1}U^H)(UMV) = S^{-1}X$ is unitary.
If $A$ and $B$ are real, then all the above matrices can be taken as real.
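The construction in the proof translates almost line by line into NumPy. The sketch below is one such transcription (the function name is hypothetical); `np.linalg.eigh` returns ascending eigenvalues, so the columns of $V$ are reversed to obtain the nonincreasing order assumed in [4.10].

```python
import numpy as np

def simultaneous_diag(A, B):
    """Follow the proof of [4.10] for Hermitian A and +ve definite Hermitian B.
    Returns X, d, S, W with X^H B X = I, X^H A X = diag(d) (d nonincreasing),
    S the +ve definite square root of B^{-1}, and W = U V."""
    l, U = np.linalg.eigh(B)                 # B = U L U^H with L > 0
    M = np.diag(1.0 / np.sqrt(l))            # M = L^{-1/2}, real diagonal
    C = M @ U.conj().T @ A @ U @ M           # Hermitian
    dd, V = np.linalg.eigh(C)                # ascending eigenvalues
    d, V = dd[::-1], V[:, ::-1]              # reorder to nonincreasing
    X = U @ M @ V
    S = U @ M @ U.conj().T
    return X, d, S, U @ V

rng = np.random.default_rng(9)
n = 5
Z1 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Z2 = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (Z1 + Z1.conj().T) / 2                   # Hermitian
B = Z2 @ Z2.conj().T + n * np.eye(n)         # +ve definite Hermitian
X, d, S, W = simultaneous_diag(A, B)
assert np.allclose(X.conj().T @ B @ X, np.eye(n))
assert np.allclose(X.conj().T @ A @ X, np.diag(d))
```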

4.11
If $W$ is +ve definite Hermitian and $B$ is Hermitian, then if $X_{[n\#k]}$ is restricted to have $\operatorname{rank}(X) = k$, $\max_X \operatorname{tr}((X^HWX)^{-1}X^HBX) = \operatorname{sum}(d_{1:k})$, where $d$ are the eigenvalues of $W^{-1}B$ sorted into decreasing order, and this is attained by taking the columns of $X$ to be the corresponding eigenvectors.
From [4.10] we can find $G_{[n\#n]}$ such that $G^HWG = I$ and $G^HBG = D = \operatorname{DIAG}(d)$, where $d$ and $G$ are the eigenvalues and corresponding eigenvectors of $W^{-1}B$ with the elements of $d$ in nonincreasing order. Since $G$ is nonsingular, the range of $X = GY_{[n\#k]}$ over $\operatorname{rank}(Y) = k$ includes all $n\#k$ matrices with rank $k$.
Hence
$$
\begin{aligned}
\max_X \operatorname{tr}((X^HWX)^{-1}X^HBX) &= \max_Y \operatorname{tr}\left(((GY)^HW(GY))^{-1}(GY)^HB(GY)\right) \\
&= \max_Y \operatorname{tr}((Y^HG^HWGY)^{-1}Y^HG^HBGY) = \max_Y \operatorname{tr}((Y^HY)^{-1}Y^HDY).
\end{aligned}
$$
From [1.9] any $Y_{[n\#k]}$ with rank $k$ may be decomposed as $Y = Q_{[n\#k]}R_{[k\#k]}$ with $Q^HQ = I$ and $R$ nonsingular.
Hence
$$
\begin{aligned}
\max_Y \operatorname{tr}((Y^HY)^{-1}Y^HDY) &= \max_{Q,R} \operatorname{tr}((R^HQ^HQR)^{-1}R^HQ^HDQR) = \max_{Q,R} \operatorname{tr}(R^{-1}R^{-H}R^HQ^HDQR) \\
&= \max_{Q,R} \operatorname{tr}(R^{-1}Q^HDQR) = \max_{Q,R} \operatorname{tr}(Q^HDQRR^{-1}) = \max_{Q,R} \operatorname{tr}(Q^HDQ) = \max_Q \operatorname{tr}(Q^HDQ).
\end{aligned}
$$
From [4.4], the maximum is $\operatorname{sum}(d_{1:k})$ and is attained by $Y = Q = [I; 0]$. From this we find that $X = GY$ consists of the first $k$ columns of $G$, that is, the eigenvectors corresponding to the $k$ highest eigenvalues.
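A NumPy sketch of [4.11]. The helper `gen_eig` (a hypothetical name) reuses the construction of [4.10] to obtain $G$, and random rank-$k$ $X$ are compared against the first $k$ columns of $G$. Real symmetric test matrices are used for brevity.

```python
import numpy as np

def gen_eig(W, B):
    """G with G^H W G = I and G^H B G = diag(d), d nonincreasing (as in [4.10])."""
    l, U = np.linalg.eigh(W)
    M = U / np.sqrt(l)                       # U @ diag(l^{-1/2})
    d, V = np.linalg.eigh(M.conj().T @ B @ M)
    return M @ V[:, ::-1], d[::-1]

def trace_ratio(W, B, X):
    """tr((X^H W X)^{-1} X^H B X)."""
    XH = X.conj().T
    return float(np.trace(np.linalg.solve(XH @ W @ X, XH @ B @ X)).real)

rng = np.random.default_rng(10)
n, k = 6, 2
Z = rng.standard_normal((n, n)); W = Z @ Z.T + n * np.eye(n)   # +ve definite
Z = rng.standard_normal((n, n)); B = (Z + Z.T) / 2              # symmetric, may be indefinite
G, d = gen_eig(W, B)
best = d[:k].sum()
assert np.isclose(trace_ratio(W, B, G[:, :k]), best)
```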

4.12
If $W$ is +ve definite Hermitian and $B$ is +ve definite Hermitian, then if $X_{[n\#k]}$ is restricted to have $\operatorname{rank}(X) = k$, $\max_X \det((X^HWX)^{-1}X^HBX) = \operatorname{prod}(d_{1:k})$, where $d$ are the eigenvalues of $W^{-1}B$ sorted into decreasing order, and this is attained by taking the columns of $X$ to be the corresponding eigenvectors. (Positive definiteness of $B$ makes every $d_i > 0$, as [4.9] requires.)
From [4.10] we can find $G_{[n\#n]}$ such that $G^HWG = I$ and $G^HBG = D = \operatorname{DIAG}(d)$, where $d$ and $G$ are the eigenvalues and corresponding eigenvectors of $W^{-1}B$ with the elements of $d$ in nonincreasing order. Since $G$ is nonsingular, the range of $X = GY_{[n\#k]}$ over $\operatorname{rank}(Y) = k$ includes all $n\#k$ matrices with rank $k$.
Hence
$$
\begin{aligned}
\max_X \det((X^HWX)^{-1}X^HBX) &= \max_Y \det\left(((GY)^HW(GY))^{-1}(GY)^HB(GY)\right) \\
&= \max_Y \det((Y^HG^HWGY)^{-1}Y^HG^HBGY) = \max_Y \det((Y^HY)^{-1}Y^HDY).
\end{aligned}
$$
From [4.9], the maximum is $\operatorname{prod}(d_{1:k})$ and is attained by $Y = [I; 0]$. From this we find that $X = GY$ consists of the first $k$ columns of $G$, that is, the eigenvectors corresponding to the $k$ highest eigenvalues.
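The determinant version in NumPy, again with illustrative helper names. The test matrix $B$ is taken +ve definite so that every $d_i > 0$; the last lines use a $3\#3$ example of our own to show that, for an indefinite $B$, the product of the top $k$ eigenvalues need not be the maximum.

```python
import numpy as np

def gen_eig(W, B):
    """G with G^H W G = I and G^H B G = diag(d), d nonincreasing (as in [4.10])."""
    l, U = np.linalg.eigh(W)
    M = U / np.sqrt(l)
    d, V = np.linalg.eigh(M.conj().T @ B @ M)
    return M @ V[:, ::-1], d[::-1]

def det_ratio(W, B, X):
    """det((X^H W X)^{-1} X^H B X)."""
    XH = X.conj().T
    return float(np.linalg.det(np.linalg.solve(XH @ W @ X, XH @ B @ X)).real)

rng = np.random.default_rng(11)
n, k = 6, 3
Z = rng.standard_normal((n, n)); W = Z @ Z.T + n * np.eye(n)
Z = rng.standard_normal((n, n)); B = Z @ Z.T + np.eye(n)   # +ve definite, so all d_i > 0
G, d = gen_eig(W, B)
best = np.prod(d[:k])
assert np.isclose(det_ratio(W, B, G[:, :k]), best)

# Indefinite counterexample: d = (1, -1, -5); columns 2,3 give 5 > 1 * (-1).
Dm, I3 = np.diag([1.0, -1.0, -5.0]), np.eye(3)
assert det_ratio(I3, Dm, I3[:, 1:]) > det_ratio(I3, Dm, I3[:, :2])
```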

4.13
If $W$ is +ve definite Hermitian, $B$ is Hermitian and $A_{[n\#m]}$ is a given matrix, then if $X_{[n\#k]}$ is restricted such that $\operatorname{rank}([A\ X]) = m+k$,
$$
\max_X \operatorname{tr}\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) = \operatorname{tr}((A^HWA)^{-1}A^HBA) + \operatorname{sum}(l_{1:k}),
$$
where $l$ are the eigenvalues of $(W^{-1} - A(A^HWA)^{-1}A^H)B$ sorted into decreasing order, and this is attained by taking the columns of $X$ to be the corresponding eigenvectors.
From [4.10] we can find $G_{[n\#n]}$ such that $G^HWG = I$ and $G^HBG = D = \operatorname{DIAG}(d)$, where $d$ and $G$ are the eigenvalues and corresponding eigenvectors of $W^{-1}B$ with the elements of $d$ in nonincreasing order.
We can do the QR decomposition $G^{-1}A = U_{[n\#m]}R_{[m\#m]} = [U_{[n\#m]}\ V_{[n\#n-m]}][R; 0]$, where $[U\ V]^H[U\ V] = I$.
Note that
$$
(A^HWA)^{-1}A^HBA = (R^HU^HG^HWGUR)^{-1}R^HU^HG^HBGUR = R^{-1}U^HDUR.
$$
Now for any $X_{[n\#k]}$ satisfying $\operatorname{rank}([A\ X]) = m+k$, we form the QR decomposition $V^HG^{-1}X = K_{[n-m\#k]}S_{[k\#k]}$ and define the upper triangular matrix
$$
T_{[m+k\#m+k]} = \begin{pmatrix} R & U^HG^{-1}X \\ 0 & S \end{pmatrix},
$$
giving $[A\ X] = G[U\ VK]T$ (because $G[U\ VK]T = [GUR\ \ G(UU^H + VV^H)G^{-1}X]$ and $UU^H + VV^H = I$). $T$ must be nonsingular since $\operatorname{rank}([A\ X]) = m+k$.
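Now we have
$$
\begin{aligned}
&\operatorname{tr}\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) \\
&\quad= \operatorname{tr}\left((T^H[U\ VK]^HG^HWG[U\ VK]T)^{-1}T^H[U\ VK]^HG^HBG[U\ VK]T\right) \quad\text{since } [A\ X] = G[U\ VK]T \\
&\quad= \operatorname{tr}\left(T^{-1}([U\ VK]^H[U\ VK])^{-1}T^{-H}T^H[U\ VK]^HD[U\ VK]T\right) \quad\text{since } T \text{ is nonsingular and } G^HWG = I \\
&\quad= \operatorname{tr}([U\ VK]^HD[U\ VK]) \quad\text{since } [U\ VK]^H[U\ VK] = I_{[m+k\#m+k]}\text{, and from [1.17]} \\
&\quad= \operatorname{tr}(U^HDU) + \operatorname{tr}(K^HV^HDVK) \quad\text{[1.19]} \\
&\quad= \operatorname{tr}((A^HWA)^{-1}A^HBA) + \operatorname{tr}(K^HV^HDVK) \quad\text{since } \operatorname{tr}(R^{-1}U^HDUR) = \operatorname{tr}(U^HDU).
\end{aligned}
$$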
Thus we need to maximize $\operatorname{tr}(K^HV^HDVK)$ subject to $K^HK = I$. From [4.11] (with $W = I$) the maximum equals the sum of the top $k$ eigenvalues of $V^HDV$ and is attained by setting $K$ to the corresponding eigenvectors. From $K$ we can derive $X = GVK$. Note that $S$ does not affect our objective function and so can be taken to be the identity.
We have $V^HDVK = KL$, where $L$ is a diagonal matrix containing the top $k$ eigenvalues of $V^HDV$. Therefore
$$
XL = GVKL = GVV^HDVK = GVV^HG^HBGVK = GVV^HG^HBX,
$$
which means that $X$ consists of the eigenvectors of $GVV^HG^HB$ with eigenvalues $\operatorname{diag}(L)$.
We can use the fact that $A = GUR$, $W = G^{-H}G^{-1}$ and $U^HU = I$ to give $A^HWA = R^HU^HG^HG^{-H}G^{-1}GUR = R^HR$. Thus
$$
A(A^HWA)^{-1}A^H = GURR^{-1}R^{-H}R^HU^HG^H = GUU^HG^H.
$$
Therefore, since $UU^H + VV^H = I$ and $W^{-1} = GG^H$, we get
$$
GVV^HG^HB = G(I - UU^H)G^HB = (GG^H - GUU^HG^H)B = (W^{-1} - A(A^HWA)^{-1}A^H)B.
$$
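A NumPy sketch of [4.13] that builds $X$ directly from the eigenvectors of $(W^{-1} - A(A^HWA)^{-1}A^H)B$ rather than through $G$, $U$ and $V$. The first factor equals $GVV^HG^H$ and has rank $n-m$, so the product has $m$ zero eigenvalues; for this demo we assume $B$ is +ve definite so that those zeros sit below the eigenvalues we want. `np.linalg.eig` is used because the matrix is not Hermitian, and its eigenvalues, real in theory, are ranked by their real parts. Helper names are hypothetical.

```python
import numpy as np

def objective(W, B, Z):
    """tr((Z^H W Z)^{-1} Z^H B Z); depends only on the column space of Z."""
    ZH = Z.conj().T
    return float(np.trace(np.linalg.solve(ZH @ W @ Z, ZH @ B @ Z)).real)

def best_extension(W, B, A, k):
    """Top-k eigenvectors of (W^{-1} - A (A^H W A)^{-1} A^H) B, as in [4.13]."""
    AH = A.conj().T
    P = np.linalg.inv(W) - A @ np.linalg.solve(AH @ W @ A, AH)
    lam, vecs = np.linalg.eig(P @ B)       # may carry tiny imaginary parts
    order = np.argsort(-lam.real)[:k]
    return vecs[:, order], lam.real[order]

rng = np.random.default_rng(12)
n, m, k = 7, 2, 2
Z = rng.standard_normal((n, n)); W = Z @ Z.T + n * np.eye(n)
Z = rng.standard_normal((n, n)); B = Z @ Z.T + np.eye(n)   # +ve definite (demo assumption)
A = rng.standard_normal((n, m))
X, l = best_extension(W, B, A, k)
total = objective(W, B, np.hstack([A, X]))
assert np.isclose(total, objective(W, B, A) + l.sum())
```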

4.14
If $W = F^HF$ is +ve definite Hermitian, $B$ is Hermitian, $A_{[n\#m]}$ is a given matrix and the columns of $V$ are an orthonormal basis for the null space of $A^HF^H$, then if $X_{[n\#k]}$ is restricted such that $\operatorname{rank}([A\ X]) = m+k$,
$$
\max_X \operatorname{tr}\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) = \operatorname{tr}((A^HWA)^{-1}A^HBA) + \operatorname{sum}(d_{1:k}),
$$
where $d$ are the eigenvalues of $V^HF^{-H}BF^{-1}V$ sorted into decreasing order, and this is attained by taking the columns of $X$ to be the corresponding eigenvectors multiplied by $F^{-1}V$.
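For any $X_{[n\#k]}$ we can do a QR decomposition $V^HFX = K_{[n-m\#k]}S_{[k\#k]}$ and define the upper triangular matrix
$$
T_{[m+k\#m+k]} = \begin{pmatrix} R & U^HFX \\ 0 & S_{[k\#k]} \end{pmatrix},
$$
where $FA = U_{[n\#m]}R_{[m\#m]}$ is also a QR decomposition. Because the columns of $V$ span the orthogonal complement of the range of $FA = UR$, we have $UU^H + VV^H = I$ and hence $[A\ X] = F^{-1}[U\ VK]T$. We note that $T$ is nonsingular iff $\operatorname{rank}([A\ X]) = m+k$.
Now
$$
\begin{aligned}
&\operatorname{tr}\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) \\
&\quad= \operatorname{tr}\left((T^H[U\ VK]^HF^{-H}F^HFF^{-1}[U\ VK]T)^{-1}T^H[U\ VK]^HF^{-H}BF^{-1}[U\ VK]T\right) \\
&\quad= \operatorname{tr}\left(T^{-1}([U\ VK]^H[U\ VK])^{-1}T^{-H}T^H[U\ VK]^HF^{-H}BF^{-1}[U\ VK]T\right) \\
&\quad= \operatorname{tr}([U\ VK]^HF^{-H}BF^{-1}[U\ VK]) \\
&\quad= \operatorname{tr}(U^HF^{-H}BF^{-1}U) + \operatorname{tr}(K^HV^HF^{-H}BF^{-1}VK).
\end{aligned}
$$
The first term is independent of $X$ and equals $\operatorname{tr}((A^HWA)^{-1}A^HBA)$, since $A = F^{-1}UR$ gives $A^HWA = R^HR$ and $(A^HWA)^{-1}A^HBA = R^{-1}U^HF^{-H}BF^{-1}UR$. The maximum value of the second term subject to $K^HK = I$ is equal to the sum of the $k$ highest eigenvalues of $V^HF^{-H}BF^{-1}V$, with the columns of $K$ the corresponding eigenvectors. A suitable $X$ is then given by $X = F^{-1}VK$.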
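The recipe of [4.14] needs only a Hermitian eigenproblem, a factor $F$ of $W$ and an orthonormal basis $V$. In this NumPy sketch (our assumption, not the manual's code) $F$ is the conjugate transpose of NumPy's lower Cholesky factor, so that $W = F^HF$, and $V$ comes from a complete QR decomposition of $FA$. Function names are hypothetical.

```python
import numpy as np

def objective(W, B, Z):
    """tr((Z^H W Z)^{-1} Z^H B Z)."""
    ZH = Z.conj().T
    return float(np.trace(np.linalg.solve(ZH @ W @ Z, ZH @ B @ Z)).real)

def best_extension_via_F(W, B, A, k):
    """X = F^{-1} V K with K the top-k eigenvectors of V^H F^{-H} B F^{-1} V."""
    m = A.shape[1]
    F = np.linalg.cholesky(W).conj().T          # W = L L^H with L lower, so F = L^H
    Finv = np.linalg.inv(F)
    Q, _ = np.linalg.qr(F @ A, mode="complete") # Q[:, :m] spans range(F A)
    V = Q[:, m:]                                # orthonormal basis of null(A^H F^H)
    C = V.conj().T @ Finv.conj().T @ B @ Finv @ V
    d, K = np.linalg.eigh(C)                    # ascending order
    d, K = d[::-1][:k], K[:, ::-1][:, :k]       # k largest, decreasing
    return Finv @ V @ K, d

rng = np.random.default_rng(13)
n, m, k = 7, 2, 3
Z = rng.standard_normal((n, n)); W = Z @ Z.T + n * np.eye(n)
Z = rng.standard_normal((n, n)); B = (Z + Z.T) / 2   # Hermitian, may be indefinite
A = rng.standard_normal((n, m))
X, d = best_extension_via_F(W, B, A, k)
total = objective(W, B, np.hstack([A, X]))
assert np.isclose(total, objective(W, B, A) + d.sum())
```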

4.15
If $W$ is +ve definite Hermitian, $B$ is +ve definite Hermitian and $A_{[n\#m]}$ is a given matrix, then
$$
\max_X \left(\det\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) \mid \operatorname{rank}([A\ X_{[n\#k]}]) = m+k\right) = \det((A^HWA)^{-1}A^HBA) \times \operatorname{prod}(l_{1:k}),
$$
where $l$ are the eigenvalues of $W^{-1}B(I - A(A^HBA)^{-1}A^HB)$ sorted into decreasing order, and this maximum may be attained by taking the columns of $X$ to be the corresponding eigenvectors.
From [4.10] we can find $G_{[n\#n]}$ such that $G^HWG = I$ and $G^HBG = D = \operatorname{DIAG}(d)$, where $d$ and $G$ are the eigenvalues and corresponding eigenvectors of $W^{-1}B$ with the elements of $d$ in nonincreasing order.
We can do the QR decomposition $G^{-1}A = U_{[n\#m]}R_{[m\#m]} = [U_{[n\#m]}\ V_{[n\#n-m]}][R; 0]$, where $[U\ V]^H[U\ V] = I$.
Note that
$$
(A^HWA)^{-1}A^HBA = (R^HU^HG^HWGUR)^{-1}R^HU^HG^HBGUR = R^{-1}U^HDUR.
$$
Now for any $X_{[n\#k]}$ satisfying $\operatorname{rank}([A\ X]) = m+k$, we form the QR decomposition $V^HG^{-1}X = K_{[n-m\#k]}S_{[k\#k]}$ and define the upper triangular matrix
$$
T_{[m+k\#m+k]} = \begin{pmatrix} R & U^HG^{-1}X \\ 0 & S \end{pmatrix},
$$
giving $[A\ X] = G[U\ VK]T$. $T$ must be nonsingular since $\operatorname{rank}([A\ X]) = m+k$.
Now we have
$$
\begin{aligned}
&\det\left(([A\ X]^HW[A\ X])^{-1}[A\ X]^HB[A\ X]\right) \\
&\quad= \det\left((T^H[U\ VK]^HG^HWG[U\ VK]T)^{-1}T^H[U\ VK]^HG^HBG[U\ VK]T\right) \quad\text{since } [A\ X] = G[U\ VK]T \\
&\quad= \det\left(T^{-1}([U\ VK]^H[U\ VK])^{-1}T^{-H}T^H[U\ VK]^HD[U\ VK]T\right) \quad\text{since } T \text{ is nonsingular and } G^HWG = I \\
&\quad= \det([U\ VK]^HD[U\ VK]) \quad\text{since } [U\ VK]^H[U\ VK] = I_{[m+k\#m+k]} \\
&\quad= \det\begin{pmatrix} U^HDU & U^HDVK \\ K^HV^HDU & K^HV^HDVK \end{pmatrix} \\
&\quad= \det(U^HDU) \times \det\left(K^HV^HDVK - K^HV^HDU(U^HDU)^{-1}U^HDVK\right) \quad\text{[3.1]} \\
&\quad= \det(U^HDU) \times \det\left(K^H\left(V^HDV - V^HDU(U^HDU)^{-1}U^HDV\right)K\right) \\
&\quad= \det((A^HWA)^{-1}A^HBA) \times \det\left(K^H\left(V^HDV - V^HDU(U^HDU)^{-1}U^HDV\right)K\right).
\end{aligned}
$$
Thus we need to maximize $\det\left(K^H(V^HDV - V^HDU(U^HDU)^{-1}U^HDV)K\right)$ subject to $K^HK = I$. From [4.12] (with $W = I$) the maximum equals the product of the top $k$ eigenvalues of $V^HDV - V^HDU(U^HDU)^{-1}U^HDV$ and is attained by setting $K$ to the corresponding eigenvectors. From $K$ we can derive $X = GVK$ as one possible $X$. Note that $S$ does not affect our objective function and so can be taken to be the identity.
We can manipulate
$$
\begin{aligned}
V^HDV - V^HDU(U^HDU)^{-1}U^HDV
&= V^HG^HBGV - V^HG^HBGU(U^HG^HBGU)^{-1}U^HG^HBGV \\
&= V^HG^HBGV - V^HG^HBGUR(R^HU^HG^HBGUR)^{-1}R^HU^HG^HBGV \\
&= V^HG^HBGV - V^HG^HBA(A^HBA)^{-1}A^HBGV \\
&= V^HG^H(I - BA(A^HBA)^{-1}A^H)BGV.
\end{aligned}
$$
So if $K$ contains eigenvectors of $V^HG^H(I - BA(A^HBA)^{-1}A^H)BGV$, we have $V^HG^H(I - BA(A^HBA)^{-1}A^H)BGVK = KL$ for some diagonal $L$. Hence, using $GVK = X$, $VV^H = I - UU^H$, $GU = AR^{-1}$ and $GG^H = W^{-1}$,
$$
\begin{aligned}
XL = GVKL &= GVV^HG^H(I - BA(A^HBA)^{-1}A^H)BGVK \\
&= G(I - UU^H)G^H(I - BA(A^HBA)^{-1}A^H)BX \\
&= \left(GG^H - GUU^HG^H - GG^HBA(A^HBA)^{-1}A^H + GUU^HG^HBA(A^HBA)^{-1}A^H\right)BX \\
&= \left(GG^H - AR^{-1}R^{-H}A^H - GG^HBA(A^HBA)^{-1}A^H + AR^{-1}R^{-H}A^HBA(A^HBA)^{-1}A^H\right)BX \\
&= \left(GG^H - AR^{-1}R^{-H}A^H - GG^HBA(A^HBA)^{-1}A^H + AR^{-1}R^{-H}A^H\right)BX \\
&= \left(GG^H - GG^HBA(A^HBA)^{-1}A^H\right)BX \\
&= \left(W^{-1}B - W^{-1}BA(A^HBA)^{-1}A^HB\right)X \\
&= W^{-1}B\left(I - A(A^HBA)^{-1}A^HB\right)X.
\end{aligned}
$$
So $X$ consists of the eigenvectors of $W^{-1}B(I - A(A^HBA)^{-1}A^HB)$ corresponding to the $k$ highest eigenvalues. [There must be an easier proof than this, methinks.]
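A NumPy sketch of [4.15], taking both $W$ and $B$ +ve definite for the demo. The factor $I - A(A^HBA)^{-1}A^HB$ annihilates the columns of $A$, so $m$ eigenvalues are zero and the top $k$ are the positive ones we want. As in the trace case, the objective depends only on the column space of $[A\ X]$, so eigenvector scaling does not matter. Helper names are illustrative.

```python
import numpy as np

def det_objective(W, B, Z):
    """det((Z^H W Z)^{-1} Z^H B Z)."""
    ZH = Z.conj().T
    return float(np.linalg.det(np.linalg.solve(ZH @ W @ Z, ZH @ B @ Z)).real)

def best_extension_det(W, B, A, k):
    """Top-k eigenvectors of W^{-1} B (I - A (A^H B A)^{-1} A^H B), as in [4.15]."""
    n = A.shape[0]
    AH = A.conj().T
    Pb = np.eye(n) - A @ np.linalg.solve(AH @ B @ A, AH @ B)   # Pb @ A = 0
    lam, vecs = np.linalg.eig(np.linalg.solve(W, B @ Pb))
    order = np.argsort(-lam.real)[:k]
    return vecs[:, order], lam.real[order]

rng = np.random.default_rng(14)
n, m, k = 7, 2, 3
Z = rng.standard_normal((n, n)); W = Z @ Z.T + n * np.eye(n)
Z = rng.standard_normal((n, n)); B = Z @ Z.T + np.eye(n)   # +ve definite (demo assumption)
A = rng.standard_normal((n, m))
X, l = best_extension_det(W, B, A, k)
total = det_objective(W, B, np.hstack([A, X]))
assert np.isclose(total, det_objective(W, B, A) * np.prod(l))
```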

4.16
$|x^Hy|^2 = x^Hyy^Hx \le x^Hxy^Hy$ for any complex vectors $x$ and $y$
If $y^Hy = 0$ then the inequality is true since $y = 0$, making both sides of the inequality zero. Hence we assume that $y^Hy > 0$.
$$
\begin{aligned}
0 \le \|x\,y^Hy - y\,y^Hx\|^2
&= (y^Hy\,x^H - x^Hy\,y^H)(x\,y^Hy - y\,y^Hx) \\
&= y^Hy\,x^Hx\,y^Hy - x^Hy\,y^Hx\,y^Hy - y^Hy\,x^Hy\,y^Hx + x^Hy\,y^Hy\,y^Hx \\
&= y^Hy\left(x^Hx\,y^Hy - x^Hy\,y^Hx - x^Hy\,y^Hx + x^Hy\,y^Hx\right) \\
&= y^Hy\left(x^Hx\,y^Hy - x^Hy\,y^Hx\right)
\end{aligned}
$$
Dividing by $y^Hy > 0$ gives $0 \le x^Hx\,y^Hy - x^Hy\,y^Hx$, as required.
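The proof's starting point, $\|x\,y^Hy - y\,y^Hx\|^2 \ge 0$, and its expansion can both be checked numerically. In the NumPy sketch below (illustrative names), `np.vdot(a, b)` computes $a^Hb$.

```python
import numpy as np

def cs_gap(x, y):
    """x^H x * y^H y - |x^H y|^2, the quantity [4.16] shows is >= 0."""
    return float((np.vdot(x, x) * np.vdot(y, y)).real - abs(np.vdot(x, y)) ** 2)

def residual_norm_sq(x, y):
    """||x (y^H y) - y (y^H x)||^2, the nonnegative starting point of the proof."""
    z = x * np.vdot(y, y) - y * np.vdot(y, x)
    return float(np.vdot(z, z).real)

rng = np.random.default_rng(15)
for _ in range(1000):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    gap = cs_gap(x, y)
    assert gap >= -1e-12
    # The expansion in the proof: ||...||^2 = (y^H y) * gap
    assert np.isclose(residual_norm_sq(x, y), np.vdot(y, y).real * gap)
```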

4.17
$X^HY(Y^HY)^{-1}Y^HX \le X^HX$, where $\le$ represents the Loewner partial order.
Let $a$ be an arbitrary vector. Then
$$
a^HX^HY(Y^HY)^{-1}Y^HXa = a^HX^H \times Y(Y^HY)^{-1}Y^HXa,
$$
which is of the form $u^Hv$ with $u = Xa$ and $v = Y(Y^HY)^{-1}Y^HXa$. It is also real and nonnegative, being of the form $w^H(Y^HY)^{-1}w$ with $w = Y^HXa$.
Applying the scalar Cauchy-Schwarz inequality [4.16] gives
$$
\begin{aligned}
(a^HX^HY(Y^HY)^{-1}Y^HXa)^2 &\le a^HX^HXa \times a^HX^HY(Y^HY)^{-1}Y^HY(Y^HY)^{-1}Y^HXa \\
&= a^HX^HXa \times a^HX^HY(Y^HY)^{-1}Y^HXa.
\end{aligned}
$$
If $a^HX^HY(Y^HY)^{-1}Y^HXa > 0$ we can divide through to obtain $a^HX^HY(Y^HY)^{-1}Y^HXa \le a^HX^HXa = \|Xa\|^2$.
If, however, $a^HX^HY(Y^HY)^{-1}Y^HXa = 0$, then the inequality is true in any case since the right side is $\ge 0$.
Hence $a^HX^HY(Y^HY)^{-1}Y^HXa \le a^HX^HXa$ for any vector $a$.
Hence $X^HY(Y^HY)^{-1}Y^HX \le X^HX$ in the sense of the Loewner partial order.
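Numerically, the Loewner statement says that every eigenvalue of $X^HX - X^HY(Y^HY)^{-1}Y^HX$ is nonnegative. The NumPy sketch below (helper name hypothetical) samples random complex $X$ and $Y$; the final check uses the fact that the difference vanishes when $X = YC$, i.e. when the columns of $X$ lie in the range of $Y$.

```python
import numpy as np

def loewner_gap(X, Y):
    """X^H X - X^H Y (Y^H Y)^{-1} Y^H X, which [4.17] says is +ve semidefinite."""
    XH, YH = X.conj().T, Y.conj().T
    G = XH @ X - XH @ Y @ np.linalg.solve(YH @ Y, YH @ X)
    return (G + G.conj().T) / 2          # remove rounding asymmetry before eigvalsh

rng = np.random.default_rng(16)
n, p, q = 6, 3, 2
for _ in range(500):
    X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    Y = rng.standard_normal((n, q)) + 1j * rng.standard_normal((n, q))
    assert np.linalg.eigvalsh(loewner_gap(X, Y)).min() >= -1e-9

# Equality case: X = Y C makes the gap zero.
Yt = rng.standard_normal((n, q))
C = rng.standard_normal((q, p))
assert np.allclose(loewner_gap(Yt @ C, Yt), 0)
```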
