Matrix Reference Manual
Proofs Section 4: Matrix Identities and Equations

Go to: Introduction, Notation, Index

 4.1 diag(abT) = a • b If X = abT, then xi,j = aibj and the diagonal elements of X, xi,i = aibi. This is precisely the ith element of a • b. 4.2 (A-1 + UVH)-1 = A - (AU(I + VHAU)-1)VHA We show that A-1 + UVH and A - (AU(I + VHAU)-1)VHA are inverses by multiplying them together: (A-1 + UVH)( A - (AU(I + VHAU)-1)VHA) = I+UVHA-(U+UVHAU)(I + VHAU)-1VHA = I+UVHA-U(I+VHAU)(I + VHAU)-1VHA = I+UVHA-UVHA = I 4.3 det(A-1 + UVH) = det(I + VHAU) × det(A-1) From  [3.1] we have that det([A-1, -U; VH, I]) = det(A-1) × det(I+VHAU) = det(I)*det(A-1+UI-1VH)  which simplifies to det([A-1, -U; VH, I]) = det(A-1) × det(I+VHAU) = det(A-1+UVH) 4.4 If D[n#n]= DIAG(d) is real with di>=di+1 for all i, and if Y[n#k] is constrained to satisfy YHY=I then (a) maxY tr(YHDY) = sum(d1:k) and is attained by Y=[I; 0] and (b)  minY tr(YHDY) = sum(dn-k+1:n) and is attained by Y=[0; I]. Define vi = sumj(|yij|2). We must have 0<=vi<=1 since Y consists of k columns from an n#n unitary matrix. Also sum(v)=k since each of the k columns of Y has unit length. tr(YHDY) = vTd which is a weighted sum of the di with the constraints on v given above. (a) To maximize this sum, we set v1:k=1 and vk+1:n=0 which gives tr(YHDY) = sum(d1:k). (b) To minimize this sum, we set v1:n-k=0 and vn-k+1:n=1 which gives tr(YHDY) = sum(dn-k+1:n). 4.5 If D[n#n]= DIAG(d) with di>=di+1 for all i, then maxy (yHDy | yHy=1) = d1 and miny (yHDy | yHy=1) = dn and these bounds are attained by y=e1 and y=en respectively. This is a special case of [4.4] with k=1, but may also be proved directly. yHDy = sum(d • yC • y) <= max(d) × sum(yC • y) = d1 × yHy = d1 yHDy = sum(d • yC • y) >= min(d) × sum(yC • y) = dn × yHy = dn 4.6 If D[n#n]= DIAG(d) with di>=di+1 for all i, then minW:[n#k] maxy (yHDy | yHy=1 and WHy=0) = dn-k and this bound is attained by W=[0; I[k#k]] and y=en-k. First define V =  [0; I[k#k]] and note that VHy=0 implies that y=PPHy where P=(I[n-k#n-k]; 0) since PPH+VVH=I For any fixed W[k#k], maxy (yHDy | yHy=1 and WHy=0)  >= maxy (yHDy | yHy=1 and WHy=0 and VHy=0) =maxy (yHPPHDPPHy | yHy=1 and WHy=0 and VHy=0) >= min(d1:n-k) from [4.5] since PHDP=diag(d1:n-k) =dn-k since the entries of d are in descending order. From [4.5] this bound is attained when PHy=en-k which is in turn when y=en-k. Since this is true for any W[k#k], we must have minW:[n#k] maxy (yHDy | yHy=1 and WHy=0) >= dn-k But if we choose W=V, then we attain this bound with y=en-k. where ei is the i'th column of the identity matrix. 4.7 If H[n#n]=UDUH is hermitian, U is unitary and D=diag(d)=diag(eig(H)), then minW maxx (xHHx | xHx=1 and W[n#k]Hx=0) =  dn-k and this bound is attained by W=U:,n-k+1:n and y=un-k  Set y=UHx, then xHHx = yHDy and xHx = yHy minW maxx (xHHx | xHx=1 and W[n#k]Hx=0) = minW maxy (yHDy | yHy=1 and W[n#k]HUy=0) From [4.6] we attain the bound with UHW= [0; I[k#k]] and y=en-k. Hence W = U[0; I[k#k]] = U:,n-k+1:n and x = Uen-k = un-k. 4.8 If H[n#n]=UDUH is hermitian, U is unitary and D=diag(d)=diag(eig(H)) contains the eigenvalues in decreasing order, then maxx (xHHx | xHx=1) = d1 and  minx (xHHx | xHx=1) = dn and these bounds are attained by x=u1 and y=un respectively. From [4.5] and setting y=UHx,  d1= maxy (yHDy | yHy=1) = maxx ((UHx)HD(UHx) | (UHx)H(UHx)=1) = maxx (xHHx | xHx=1) Similarly, dn= miny (yHDy | yHy=1) = minx ((UHx)HD(UHx) | (UHx)H(UHx)=1) = minx (xHHx | xHx=1) 4.9 If D[n#n]= DIAG(d) is real with di>=di+1>0 for all i, and if Y[n#k] is constrained to have rank k, then (a) max(det(YHDY)/det(YHY)) = prod(d1:k) and is attained by Y=[I; 0] and (b)  min(det(YHDY)/det(YHY)) = prod(dn-k+1:n) and is attained by Y=[0; I]. At an extreme value, the complex gradient must be zero. Hence 0 = d(ln(det(YHDY)/det(YHY)))/dYC = d(ln(det(YHDY)))/dYC - d(ln(det(YHY)))/dYC = (DY(YHDY)-1 - Y(YHY)-1): [2.16] Hence DY(YHDY)-1 = Y(YHY)-1 and so DY = Y(YHY)-1YHDY Taking the Hermitian transpose, we get YHDY(YHY)-1 YH = YH D Since rank(Y)=k, YH must contain k linearly independent columns and so the columns of YH contain a complete set of eigenvectors for the k#k matrix YHDY(YHY)-1 with eigenvalues given by the corresponding entries in d.  det(YHDY)/det(YHY) = det(YHDY(YHY)-1) equals the product of the eigenvalues and is therefore the product of k elements of d. Since the elements of d and in decreasing order, the maximum possible product is prod(d1:k) and the minimum is prod(dn-k+1:n). These values are attained by Y=[I; 0] and Y=[0; I] respectively. 4.1 If A is hermitian and B is +ve definite hermitian there exists X such that XHBX=I and XHAX=D where X and D may be obtained from the eigendecomposition B-1A=XDX-1 with D=DIAG(d) a diagonal matrix of eigenvalues in non-increasing order. If S is the +ve definite hermitian square root of B-1 (i.e. S2B=I)  then the eigendecomposition of the hermitian matrix SAS is SAS=WDWH where W=S-1X is unitary. Since B is +ve definite hermitian, we can decompose it as B = ULUH where U is unitary and L is diagonal with real diagonal entries >0. Define the real diagonal matrix M = L-½. The matrix MHUHAUM is hermitian and can therefore be decomposed as MHUHAUM = VDVH where V is unitary and D=DIAG(d) is real. We may order the columns of V so that the elements of d are in non-increasing order. Define X = UMV. Then XHBX = VHMHUHBUMV = VHMHLMV = VHV = I and XHAX = VHMHUHAUMV = VHVDVHV = D B-1A = (ULUH)-1(UM-1VDVHM-1UH) = (UM2UH) (UM-1VDVHM-1UH) = UMVDVHM-1UH = (UMV) D (UMV)-1 =  XDX-1 The hermitian +ve definite square root of B-1 is S = UMUH. Therefore SAS = (UMUH) (UM-1VDVHM-1UH) (UMUH) = UVDVHUH = (UV) D (UV)H=WDWH. If A and B are real, then all the above matrices can be taken as real. 4.11 If W is +ve definite Hermitian and B is Hermitian, then if X[n#k] is restricted to have rank(X)=k,  maxX tr((XHWX)-1 XHBX) = sum(d1:k) where d are the eigenvalues of W-1B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors. From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. Since G is non-singular, the range of X=GY[n#k] over rank(Y)=k includes all n#k matrices with rank k. Hence, maxX tr((XHWX)-1 XHBX) = maxY tr(((GY)HW(GY))-1 (GY)HB(GY)) = maxY tr((YHGHWGY)-1 YHGHBGY)  = maxY tr((YHY)-1 YHDY). From [1.9] any Y[n#k] with rank k may be decomposed as Y=Q[n#k]R[k#k] with QHQ=I and R non-singular. Hence, maxY tr((YHY)-1 YHDY) = maxQ,R tr((RHQHQR)-1 RHQHDQR) = maxQ,R tr(R-1R-H RHQHDQR) = maxQ,R tr(R-1QHDQR) = maxQ,R tr(QHDQRR-1) = maxQ,R tr(QHDQ) = maxQ tr(QHDQ) From [4.4], the maximum is sum(d1:k) and is attained by Y=Q=[I; 0]. From this we find that X=GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues. 4.12 If W is +ve definite Hermitian and B is Hermitian, then if X[n#k] is restricted to have rank(X)=k,  maxX det((XHWX)-1 XHBX) = prod(d1:k) where d are the eigenvalues of W-1B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors. From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. Since G is non-singular, the range of X=GY[n#k] over rank(Y)=k includes all n#k matrices with rank k. Hence, maxX det((XHWX)-1 XHBX) = maxY det(((GY)HW(GY))-1 (GY)HB(GY)) = maxY det((YHGHWGY)-1 YHGHBGY)  = maxY det((YHY)-1 YHDY). From [4.9], the maximum is prod(d1:k) and is attained by Y=[I; 0]. From this we find that X=GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues. 4.13 If W is +ve definite Hermitian and B is Hermitian and A[n#m] is a given matrix, then if X[n#k] is restricted such that rank([A X])=m+k,  maxX tr(([A X]HW[A X])-1 [A X]HB[A X]) = tr((AHWA)-1AHBA) + sum(d1:k) where d are the eigenvalues of (W-1-A(AHWA)-1AH)B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors. From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. We can do the QR decomposition G-1A = U[n#m]R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]H[U V] = INote that (AHWA)-1 AHBA = (RHUHGHWGUR)-1 RHUHGHBGUR= R-1UHDUR Now for any X[n#k] satisfying rank([A X])=m+k, We form the QR decomposition VHG-1X = K[n-m#k]S[k#k] and we define the upper triangular matrix  T[m+k#m+k] = [R  UHG-1X; 0 S] giving [A X] = G[U  VK]T.  T must be non singular since rank([A X])=m+k. Now we have tr(([A X]HW[A X])-1 [A X]HB[A X]) =  tr((TH[U VK]HGHWG[U VK]T)-1 TH[U VK]HGHBG[U VK]T) since [A X] = G[U  VK]T =  tr(T-1([U VK]H[U VK])-1 T-HTH[U VK]HD[U VK]T) since T is non-singular, GHWG = I =  tr([U VK]HD[U VK]) since [U  VK]H[U  VK] = I[m+k#m+k] and from [1.17] =  tr(UHDU) + tr(KHVHDVK) [1.19]=  tr((AHWA)-1 AHBA) + tr(KHVHDVK) Thus we need to maximize tr(KHVHDVK) subject to KHK = I . From [4.11] (with W=I) the maximum equals the sum of the top k eigenvalues of VHDV and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK. Note that S does not affect our objective function and so can be taken to be the identity. We have VHDV K = K L where L is a diagonal matrix containing the top  k eigenvalues of VHDV. Therefore X L =  GVK L = GVVHDV K = GVVHGHB GVK  = GVVHGHB X which means that X consists of the eigenvectors of GVVHGHB with eigenvalues diag(L). We can use the fact that A=GUR, W = G-HG-1 and UHU=I to give AHWA = RHUHGHG-HG-1GUR = RHR. Thus A(AHWA)-1AH = GUR R-1R-H RHUHGH = GUUHGH Therefore, since UUH+VVH=I , we get GVVHGHB= G(I-UUH)GHB = (GGH-GUUHGH)B = (W-1-A(AHWA)-1AH)B 4.14 If W=FHF is +ve definite Hermitian, B is Hermitian and A[n#m] is a given matrix and the columns of V are an orthonormal basis for the null space of AHFH, then if X[n#k] is restricted such that rank([A X])=m+k,  maxX tr(([A X]HW[A X])-1 [A X]HB[A X]) = tr((AHWA)-1AHBA) + sum(d1:k) where d are the eigenvalues of VHF-HBF-1V sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors multiplied by F-1V. For any X[n#k] we can do a QR decomposition VHFX = K[n-m#k]S[k#k] and we define the upper triangular matrix T[m+k#m+k] = [R  UHFX; 0 S[k#k]] where FA = U[n#m]R[m#m] is also a QR decomposition. We note that  Tis non singular iff rank([A X])=m+k. Now, tr(([A X]HW[A X])-1 [A X]HB[A X]) =  tr((TH[U VK]HF-HFHFF-1[U VK]T)-1 TH[U VK]HF-HBF-1[U VK]T)  =  tr(T-1([U VK]H[U VK])-1 T-HTH[U VK]HD[U VK]T)  =  tr([U VK]HF-HBF-1[U VK])  =  tr(UHF-HBF-1U) + tr(KHVHF-HBF-1VK) The first term is independent of X, while the maximum value of the second subject to KHK=I is equal to the sum of the k highest eigenvalues of VHF-HBF-1V with the columns of K the corresponding eigenvectors. A suitable X is then given by X = F-1VK. 4.15 If W is +ve definite Hermitian and B is Hermitian and A[n#m] is a given matrix, then  maxX det(([A X]HW[A X])-1 [A X]HB[A X] | rank([A X[n#k]])=m+k) = det((AHWA)-1AHBA)×prod(l1:k) where l are the eigenvalues of  W-1B(I  - A (AHBA)-1AHB  ) sorted into decreasing order and this maximum may be attained by taking the columns of X to be the corresponding eigenvectors. From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. We can do the QR decomposition G-1A = U[n#m]R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]H[U V] = INote that (AHWA)-1 AHBA = (RHUHGHWGUR)-1 RHUHGHBGUR= R-1UHDUR Now for any X[n#k] satisfying rank([A X])=m+k, We form the QR decomposition VHG-1X = K[n-m#k]S[k#k] and we define the upper triangular matrix  T[m+k#m+k] = [R  UHG-1X; 0 S] giving [A X] = G[U  VK]T.  T must be non singular since rank([A X])=m+k. Now we have det(([A X]HW[A X])-1 [A X]HB[A X]) =  det((TH[U VK]HGHWG[U VK]T)-1 TH[U VK]HGHBG[U VK]T) since [A X] = G[U  VK]T =  det(T-1([U VK]H[U VK])-1 T-HTH[U VK]HD[U VK]T) since T is non-singular, GHWG = I =  det([U VK]HD[U VK]) since [U  VK]H[U  VK] = I[m+k#m+k] =  det([UHDU  UHDVK ;  KHVHDU  KHVHDVK ) = det(UHDU)×det(KHVHDVK - KHVHDU (UHDU)-1UHDVK)  [3.1]= det(UHDU)×det(KH (VHDV - VHDU (UHDU)-1UHDV) K) = det((AHWA)-1 AHBA)×det(KH (VHDV - VHDU (UHDU)-1UHDV) K) Thus we need to maximize det(KH (VHDV - VHDU (UHDU)-1UHDV) K) subject to KHK = I. From [4.12] (with W=I) the maximum equals the product of the top k eigenvalues of VHDV - VHDU (UHDU)-1UHDV and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK as one possible X. Note that S does not affect our objective function and so can be taken to be the identity. We can manipulate VHDV - VHDU (UHDU)-1UHDV = VHGHBGV - VHGHBGU (UHGHBGU)-1UHGHBGV = VHGHBGV - VHGHBGUR(RHUHGHBGUR)-1RHUHGHBGV = VHGHBGV - VHGHBA (AHBA)-1AHBGV = VHGH(I - BA (AHBA)-1AH)BGV So if K contains eigenvectors of VHGH(I - BA (AHBA)-1AH)BGV, we have VHGH(I - BA (AHBA)-1AH)BGV K = K L for some diagonal L. Hence X L = GVK L = GVVHGH(I - BA (AHBA)-1AH)BGV K = G (I-UUH)GH(I - BA (AHBA)-1AH) BX = (GGH  - GUUHGH  - G GHBA (AHBA)-1AH  + GUUHGHBA (AHBA)-1AH )BX = (GGH  - AR-1R-HAH  - G GHBA (AHBA)-1AH  + AR-1R-HAHBA (AHBA)-1AH )BX = (GGH  - AR-1R-HAH  - G GHBA (AHBA)-1AH  + AR-1R-HAH )BX = (GGH  - G GHBA (AHBA)-1AH )BX = (W-1B  - W-1BA (AHBA)-1AHB )X = W-1B(I  - A (AHBA)-1AHB )X So X consists of the eigenvectors of W-1B(I  - A (AHBA)-1AHB  ) corresponding to the k highest eigenvalues [there must be an easier proof than this methinks] 4.16 |xHy|2 = xHyyHx <= xHxyHy for any complex vectors x and y If yHy=0 then the inequality is true since y=0 making both sides of the inequality zero. Hence we assume that yHy>0. 0 <= |xyHy - yyHx|2 = (yHyxH - xHyyH)(xyHy - yyHx) = yHyxHxyHy - xHyyHxyHy - yHyxHyyHx + xHyyHyyHx = xHxyHy - xHyyHx - xHyyHx + xHyyHx (after dividing all terms by yHy) = xHxyHy - xHyyHx 4.17 XHY(YHY)-1YHX <= XHX where <= represents the Loewner partial order. Let a be an arbitrary vector aHXHY(YHY)-1YHXa = aHXH  × Y(YHY)-1YHXa which is of the form uHv Applying the scalar Cauchy-Schwarz inequality [4.16] gives (aHXHY(YHY)-1YHXa)2 <= aHXHXa  × aHXHY(YHY)-1YHY(YHY)-1YHXa = aHXHXa  × aHXHY(YHY)-1YHXa If aHXHY(YHY)-1YHXa > 0 we can divide through to obtain aHXHY(YHY)-1YHXa <= aHXHXa = | Xa |2 If, however aHXHY(YHY)-1YHXa = 0, then the inequality is true in any case since the right side is >= 0. Hence aHXHY(YHY)-1YHXa <= aHXHXa for any vector a. Hence XHY(YHY)-1YHX <= XHX in the sense of the Loewner partial order.

This page is part of The Matrix Reference Manual. Copyright © 1998-2021 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: \$Id: proof004.html 11291 2021-01-05 18:26:10Z dmb \$