Matrix Reference Manual
Proofs Section 4: Matrix Identities and Equations





4.1 diag(abT) = a • b
  • If X = abT, then xi,j = aibj and the diagonal elements of X, xi,i = aibi. This is precisely the ith element of a • b.
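The identity is easy to check numerically. The following is a minimal NumPy sketch (the vector length and random seed are arbitrary illustrative choices, not part of the manual):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)

X = np.outer(a, b)                     # X = a b^T, so x_ij = a_i b_j
assert np.allclose(np.diag(X), a * b)  # diag(a b^T) = a . b (elementwise product)
```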
4.2 (A-1 + UVH)-1 = A - (AU(I + VHAU)-1)VHA

We show that A-1 + UVH and A - (AU(I + VHAU)-1)VHA are inverses by multiplying them together:

  • (A-1 + UVH)( A - (AU(I + VHAU)-1)VHA)
  • = I+UVHA-(U+UVHAU)(I + VHAU)-1VHA
  • = I+UVHA-U(I+VHAU)(I + VHAU)-1VHA
  • = I+UVHA-UVHA = I
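As a quick numerical sanity check of [4.2], the sketch below uses random complex matrices; the sizes and seed are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
V = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
VH = V.conj().T

lhs = np.linalg.inv(np.linalg.inv(A) + U @ VH)                    # (A^-1 + U V^H)^-1
rhs = A - A @ U @ np.linalg.inv(np.eye(r) + VH @ A @ U) @ VH @ A  # A - AU(I + V^H A U)^-1 V^H A
assert np.allclose(lhs, rhs)
```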
4.3 det(A-1 + UVH) = det(I + VHAU) × det(A-1)

From [3.1] we can expand det([A-1, -U; VH, I]) in two ways: det([A-1, -U; VH, I]) = det(A-1) × det(I + VHAU) and also det([A-1, -U; VH, I]) = det(I) × det(A-1 + UI-1VH) = det(A-1 + UVH). Equating the two expressions gives det(A-1 + UVH) = det(I + VHAU) × det(A-1).
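A corresponding numerical check of [4.3]; real matrices are used here purely for brevity, and the sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 5, 2
A = rng.standard_normal((n, n))
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

lhs = np.linalg.det(np.linalg.inv(A) + U @ V.T)
rhs = np.linalg.det(np.eye(r) + V.T @ A @ U) * np.linalg.det(np.linalg.inv(A))
assert np.allclose(lhs, rhs)   # det(A^-1 + U V^H) = det(I + V^H A U) det(A^-1)
```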

4.4 If D[n#n]= DIAG(d) is real with di>=di+1 for all i, and if Y[n#k] is constrained to satisfy YHY=I then (a) maxY tr(YHDY) = sum(d1:k) and is attained by Y=[I; 0] and (b)  minY tr(YHDY) = sum(dn-k+1:n) and is attained by Y=[0; I].
  • Define vi = sumj(|yij|2), the squared norm of the ith row of Y. We must have 0<=vi<=1 since Y consists of k columns from an n#n unitary matrix and so each row of Y is part of a row of unit length. Also sum(v)=k since each of the k columns of Y has unit length.
  • tr(YHDY) = vTd which is a weighted sum of the di with the constraints on v given above. (a) To maximize this sum, we set v1:k=1 and vk+1:n=0 which gives tr(YHDY) = sum(d1:k). (b) To minimize this sum, we set v1:n-k=0 and vn-k+1:n=1 which gives tr(YHDY) = sum(dn-k+1:n).
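The following NumPy sketch illustrates [4.4]: the stated maximizer and minimizer achieve sum(d1:k) and sum(dn-k+1:n), and a random Y with orthonormal columns lies between the two bounds. Dimensions and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 7, 3
d = np.sort(rng.standard_normal(n))[::-1]   # d1 >= d2 >= ... >= dn
D = np.diag(d)

Y_max = np.eye(n, k)                        # Y = [I; 0], the first k columns of I
Y_min = np.eye(n)[:, n - k:]                # Y = [0; I], the last k columns of I
assert np.isclose(np.trace(Y_max.T @ D @ Y_max), d[:k].sum())
assert np.isclose(np.trace(Y_min.T @ D @ Y_min), d[n - k:].sum())

Y, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random Y with Y^H Y = I
t = np.trace(Y.T @ D @ Y)
assert d[n - k:].sum() - 1e-9 <= t <= d[:k].sum() + 1e-9
```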
4.5 If D[n#n]= DIAG(d) with di>=di+1 for all i, then maxy (yHDy | yHy=1) = d1 and miny (yHDy | yHy=1) = dn and these bounds are attained by y=e1 and y=en respectively.

This is a special case of [4.4] with k=1, but may also be proved directly.

  • yHDy = sum(d • yC • y) <= max(d) × sum(yC • y) = d1 × yHy = d1
  • yHDy = sum(d • yC • y) >= min(d) × sum(yC • y) = dn × yHy = dn
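A minimal check of [4.5] with a random unit vector (illustrative size and seed):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
d = np.sort(rng.standard_normal(n))[::-1]   # d1 >= ... >= dn
D = np.diag(d)

y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y /= np.linalg.norm(y)                      # y^H y = 1
q = np.real(y.conj() @ D @ y)               # y^H D y is real for real diagonal D
assert d[-1] - 1e-12 <= q <= d[0] + 1e-12   # dn <= y^H D y <= d1

e1, en = np.eye(n)[:, 0], np.eye(n)[:, -1]
assert np.isclose(e1 @ D @ e1, d[0]) and np.isclose(en @ D @ en, d[-1])
```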
4.6 If D[n#n]= DIAG(d) with di>=di+1 for all i, then minW:[n#k] maxy (yHDy | yHy=1 and WHy=0) = dn-k and this bound is attained by W=[0; I[k#k]] and y=en-k.

First define V = [0; I[k#k]] and note that VHy=0 implies that y=PPHy where P=[I[n-k#n-k]; 0], since PPH+VVH=I.

For any fixed W[n#k],

maxy (yHDy | yHy=1 and WHy=0)
 >= maxy (yHDy | yHy=1 and WHy=0 and VHy=0)
=maxy (yHPPHDPPHy | yHy=1 and WHy=0 and VHy=0)
>= min(d1:n-k) from [4.5] since PHDP=diag(d1:n-k)
=dn-k since the entries of d are in descending order.

From [4.5] this bound is attained when PHy=en-k, which in turn holds when y=en-k.

Since this is true for any W[n#k], we must have minW:[n#k] maxy (yHDy | yHy=1 and WHy=0) >= dn-k.
But if we choose W=V, then we attain this bound with y=en-k, where ei denotes the ith column of the identity matrix.

4.7 If H[n#n]=UDUH is hermitian, U is unitary and D=diag(d)=diag(eig(H)), then minW maxx (xHHx | xHx=1 and W[n#k]Hx=0) = dn-k and this bound is attained by W=U:,n-k+1:n and x=un-k
  • Set y=UHx, then xHHx = yHDy and xHx = yHy
  • minW maxx (xHHx | xHx=1 and W[n#k]Hx=0) = minW maxy (yHDy | yHy=1 and W[n#k]HUy=0)
    From [4.6] we attain the bound with UHW= [0; I[k#k]] and y=en-k.
    Hence W = U[0; I[k#k]] = U:,n-k+1:n and x = Uen-k = un-k.
4.8 If H[n#n]=UDUH is hermitian, U is unitary and D=diag(d)=diag(eig(H)) contains the eigenvalues in decreasing order, then maxx (xHHx | xHx=1) = d1 and minx (xHHx | xHx=1) = dn and these bounds are attained by x=u1 and x=un respectively.

From [4.5], setting y=UHx: d1 = maxy (yHDy | yHy=1) = maxx ((UHx)HD(UHx) | (UHx)H(UHx)=1) = maxx (xHHx | xHx=1)
Similarly, dn= miny (yHDy | yHy=1) = minx ((UHx)HD(UHx) | (UHx)H(UHx)=1) = minx (xHHx | xHx=1)
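The bounds of [4.8] can be checked numerically; note that np.linalg.eigh returns eigenvalues in ascending order, so the indexing below is reversed relative to the text. Sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = (M + M.conj().T) / 2                      # random hermitian H

w, U = np.linalg.eigh(H)                      # ascending eigenvalues
d1, dn = w[-1], w[0]                          # largest and smallest eigenvalues
u1, un = U[:, -1], U[:, 0]                    # corresponding eigenvectors

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x /= np.linalg.norm(x)
q = np.real(x.conj() @ H @ x)
assert dn - 1e-12 <= q <= d1 + 1e-12                 # dn <= x^H H x <= d1
assert np.isclose(np.real(u1.conj() @ H @ u1), d1)   # attained at x = u1
assert np.isclose(np.real(un.conj() @ H @ un), dn)   # attained at x = un
```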

4.9 If D[n#n]= DIAG(d) is real with di>=di+1>0 for all i, and if Y[n#k] is constrained to have rank k, then (a) max(det(YHDY)/det(YHY)) = prod(d1:k) and is attained by Y=[I; 0] and (b)  min(det(YHDY)/det(YHY)) = prod(dn-k+1:n) and is attained by Y=[0; I].
  • At an extreme value, the complex gradient must be zero. Hence 0 = d(ln(det(YHDY)/det(YHY)))/dYC = d(ln(det(YHDY)))/dYC - d(ln(det(YHY)))/dYC = DY(YHDY)-1 - Y(YHY)-1 from [2.16]
  • Hence DY(YHDY)-1 = Y(YHY)-1 and so DY = Y(YHY)-1YHDY
  • Taking the Hermitian transpose, we get YHDY(YHY)-1 YH = YH D
  • The equation YHDY(YHY)-1YH = YHD shows that each column of YH is either zero or an eigenvector of the k#k matrix YHDY(YHY)-1 with eigenvalue equal to the corresponding entry of d. Since rank(Y)=k, YH contains k linearly independent columns, so the columns of YH contain a complete set of eigenvectors for YHDY(YHY)-1 with eigenvalues given by the corresponding entries in d.
  • det(YHDY)/det(YHY) = det(YHDY(YHY)-1) equals the product of the eigenvalues and is therefore the product of k elements of d. Since the elements of d are in decreasing order, the maximum possible product is prod(d1:k) and the minimum is prod(dn-k+1:n). These values are attained by Y=[I; 0] and Y=[0; I] respectively.
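A numerical sketch of [4.9]; the entries of d are drawn positive, as the statement requires, and the sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 3
d = np.sort(rng.uniform(0.5, 2.0, n))[::-1]   # d1 >= ... >= dn > 0
D = np.diag(d)

def ratio(Y):
    return np.linalg.det(Y.T @ D @ Y) / np.linalg.det(Y.T @ Y)

Y_max = np.eye(n, k)                 # [I; 0] attains the maximum prod(d1:k)
Y_min = np.eye(n)[:, n - k:]         # [0; I] attains the minimum prod(dn-k+1:n)
assert np.isclose(ratio(Y_max), d[:k].prod())
assert np.isclose(ratio(Y_min), d[n - k:].prod())

Y = rng.standard_normal((n, k))      # a random rank-k Y lies between the bounds
assert d[n - k:].prod() - 1e-9 <= ratio(Y) <= d[:k].prod() + 1e-9
```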

 

4.10 If A is hermitian and B is +ve definite hermitian there exists X such that XHBX=I and XHAX=D where X and D may be obtained from the eigendecomposition B-1A=XDX-1 with D=DIAG(d) a diagonal matrix of eigenvalues in non-increasing order. If S is the +ve definite hermitian square root of B-1 (i.e. S2B=I)  then the eigendecomposition of the hermitian matrix SAS is SAS=WDWH where W=S-1X is unitary.
  • Since B is +ve definite hermitian, we can decompose it as B = ULUH where U is unitary and L is diagonal with real diagonal entries >0.
  • Define the real diagonal matrix M = L-1/2 so that MHLM = I.
  • The matrix MHUHAUM is hermitian and can therefore be decomposed as MHUHAUM = VDVH where V is unitary and D=DIAG(d) is real. We may order the columns of V so that the elements of d are in non-increasing order.
  • Define X = UMV. Then XHBX = VHMHUHBUMV = VHMHLMV = VHV = I and XHAX = VHMHUHAUMV = VHVDVHV = D
  • B-1A = (ULUH)-1(UM-1VDVHM-1UH) = (UM2UH)(UM-1VDVHM-1UH) = UMVDVHM-1UH = (UMV) D (UMV)-1 = XDX-1, where we have used A = X-HDX-1 = UM-1VDVHM-1UH and M2 = L-1.
  • The hermitian +ve definite square root of B-1 is S = UMUH. Therefore SAS = (UMUH)(UM-1VDVHM-1UH)(UMUH) = UVDVHUH = (UV) D (UV)H = WDWH where W = UV = S-1X is unitary.
  • If A and B are real, then all the above matrices can be taken as real.
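The construction of [4.10] can be reproduced numerically. The sketch below uses real symmetric A and positive definite B, obtains the eigenvectors of B-1A with NumPy, and rescales each column so that xiHBxi = 1 (the normalisation implicit in the proof); scipy.linalg.eigh(A, B) would return an equivalently normalised X directly, up to ordering. Sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2             # hermitian (real symmetric) A
F = rng.standard_normal((n, n)); B = F @ F.T + n * np.eye(n)   # +ve definite B

d, X = np.linalg.eig(np.linalg.solve(B, A))                    # eigendecomposition of B^-1 A
d, X = np.real(d), np.real(X)                                  # eigenvalues are real here
order = np.argsort(d)[::-1]                                    # non-increasing order
d, X = d[order], X[:, order]

scales = np.sqrt(np.einsum('ji,jk,ki->i', X, B, X))            # x_i^H B x_i for each column
X = X / scales                                                 # now X^H B X = I
assert np.allclose(X.T @ B @ X, np.eye(n), atol=1e-8)          # X^H B X = I
assert np.allclose(X.T @ A @ X, np.diag(d), atol=1e-8)         # X^H A X = D
```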
4.11 If W is +ve definite Hermitian and B is Hermitian, then if X[n#k] is restricted to have rank(X)=k,  maxX tr((XHWX)-1 XHBX) = sum(d1:k) where d are the eigenvalues of W-1B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. Since G is non-singular, the range of X=GY[n#k] over rank(Y)=k includes all n#k matrices with rank k.
  • Hence, maxX tr((XHWX)-1 XHBX) = maxY tr(((GY)HW(GY))-1 (GY)HB(GY)) = maxY tr((YHGHWGY)-1 YHGHBGY)  = maxY tr((YHY)-1 YHDY).
  • From [1.9] any Y[n#k] with rank k may be decomposed as Y=Q[n#k]R[k#k] with QHQ=I and R non-singular.
  • Hence, maxY tr((YHY)-1 YHDY) = maxQ,R tr((RHQHQR)-1 RHQHDQR) = maxQ,R tr(R-1R-H RHQHDQR) = maxQ,R tr(R-1QHDQR) = maxQ,R tr(QHDQRR-1) = maxQ,R tr(QHDQ) = maxQ tr(QHDQ)
  • From [4.4], the maximum is sum(d1:k) and is attained by Y=Q=[I; 0]. From this we find that X=GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues.
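A numerical sketch of [4.11]; W is positive definite and B an arbitrary symmetric matrix, with sizes and seed as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 6, 2
F = rng.standard_normal((n, n)); W = F @ F.T + n * np.eye(n)   # +ve definite W
M = rng.standard_normal((n, n)); B = (M + M.T) / 2             # hermitian (symmetric) B

d, G = np.linalg.eig(np.linalg.solve(W, B))                    # eigenpairs of W^-1 B
d, G = np.real(d), np.real(G)
order = np.argsort(d)[::-1]                                    # decreasing eigenvalues
d, G = d[order], G[:, order]

def objective(X):
    return np.trace(np.linalg.solve(X.T @ W @ X, X.T @ B @ X))  # tr((X^H W X)^-1 X^H B X)

X_opt = G[:, :k]                        # eigenvectors of the k largest eigenvalues
assert np.isclose(objective(X_opt), d[:k].sum())
X_rand = rng.standard_normal((n, k))    # any other rank-k X does no better
assert objective(X_rand) <= d[:k].sum() + 1e-9
```

The objective is unchanged if X is replaced by XS for any non-singular S, so the scaling of the eigenvector columns returned by the library does not matter.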
4.12 If W is +ve definite Hermitian and B is Hermitian, then if X[n#k] is restricted to have rank(X)=k,  maxX det((XHWX)-1 XHBX) = prod(d1:k) where d are the eigenvalues of W-1B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order. Since G is non-singular, the range of X=GY[n#k] over rank(Y)=k includes all n#k matrices with rank k.
  • Hence, maxX det((XHWX)-1 XHBX) = maxY det(((GY)HW(GY))-1 (GY)HB(GY)) = maxY det((YHGHWGY)-1 YHGHBGY)  = maxY det((YHY)-1 YHDY).
  • From [4.9], the maximum is prod(d1:k) and is attained by Y=[I; 0]. From this we find that X=GY consists of the first k columns of G, that is, the eigenvectors corresponding to the k highest eigenvalues.
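The determinant version [4.12] can be checked in the same way; B is taken positive definite here so that all the eigenvalues d are positive (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 6, 2
F = rng.standard_normal((n, n)); W = F @ F.T + n * np.eye(n)   # +ve definite W
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)   # +ve definite B

d, G = np.linalg.eig(np.linalg.solve(W, B))
d, G = np.real(d), np.real(G)
order = np.argsort(d)[::-1]
d, G = d[order], G[:, order]

def objective(X):
    return np.linalg.det(np.linalg.solve(X.T @ W @ X, X.T @ B @ X))

X_opt = G[:, :k]                                   # top-k eigenvectors of W^-1 B
assert np.isclose(objective(X_opt), d[:k].prod())
X_rand = rng.standard_normal((n, k))
assert objective(X_rand) <= d[:k].prod() + 1e-9
```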
4.13 If W is +ve definite Hermitian and B is Hermitian and A[n#m] is a given matrix, then if X[n#k] is restricted such that rank([A X])=m+k,  maxX tr(([A X]HW[A X])-1 [A X]HB[A X]) = tr((AHWA)-1AHBA) + sum(d1:k) where d are the eigenvalues of (W-1-A(AHWA)-1AH)B sorted into decreasing order and this is attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order.
  • We can do the QR decomposition G-1A = U[n#m]R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]H[U V] = I
    Note that (AHWA)-1 AHBA = (RHUHGHWGUR)-1 RHUHGHBGUR= R-1UHDUR
  • Now for any X[n#k] satisfying rank([A X])=m+k,
    • We form the QR decomposition VHG-1X = K[n-m#k]S[k#k] and we define the upper triangular matrix T[m+k#m+k] = [R  UHG-1X; 0 S] giving [A X] = G[U VK]T. T must be non-singular since rank([A X])=m+k.
    • Now we have tr(([A X]HW[A X])-1 [A X]HB[A X])
      =  tr((TH[U VK]HGHWG[U VK]T)-1 TH[U VK]HGHBG[U VK]T) since [A X] = G[U  VK]T
      =  tr(T-1([U VK]H[U VK])-1 T-HTH[U VK]HD[U VK]T) since T is non-singular, GHWG = I
      =  tr([U VK]HD[U VK]) since [U  VK]H[U  VK] = I[m+k#m+k] and from [1.17]
      =  tr(UHDU) + tr(KHVHDVK) [1.19]
      =  tr((AHWA)-1 AHBA) + tr(KHVHDVK)
    • Thus we need to maximize tr(KHVHDVK) subject to KHK = I . From [4.11] (with W=I) the maximum equals the sum of the top k eigenvalues of VHDV and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK. Note that S does not affect our objective function and so can be taken to be the identity.
  • We have VHDV K = K L where L is a diagonal matrix containing the top  k eigenvalues of VHDV.
    Therefore X L =  GVK L = GVVHDV K = GVVHGHB GVK  = GVVHGHB X which means that X consists of the eigenvectors of GVVHGHB with eigenvalues diag(L).
  • We can use the fact that A=GUR, W = G-HG-1 and UHU=I to give AHWA = RHUHGHG-HG-1GUR = RHR. Thus A(AHWA)-1AH = GUR R-1R-H RHUHGH = GUUHGH
  • Therefore, since UUH+VVH=I , we get GVVHGHB= G(I-UUH)GHB = (GGH-GUUHGH)B = (W-1-A(AHWA)-1AH)B
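A numerical sketch of [4.13]; W and B are taken positive definite so that the relevant eigenvalues are real and positive, and the sizes and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, k = 7, 2, 2
F = rng.standard_normal((n, n)); W = F @ F.T + n * np.eye(n)   # +ve definite W
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)   # +ve definite B
A = rng.standard_normal((n, m))

def objective(X):
    AX = np.hstack([A, X])
    return np.trace(np.linalg.solve(AX.T @ W @ AX, AX.T @ B @ AX))

base = np.trace(np.linalg.solve(A.T @ W @ A, A.T @ B @ A))            # tr((A^H W A)^-1 A^H B A)
C = (np.linalg.inv(W) - A @ np.linalg.solve(A.T @ W @ A, A.T)) @ B    # (W^-1 - A(A^H W A)^-1 A^H) B
d, E = np.linalg.eig(C)
d, E = np.real(d), np.real(E)
order = np.argsort(d)[::-1]
d, E = d[order], E[:, order]

X_opt = E[:, :k]                        # eigenvectors of the k largest eigenvalues
assert np.isclose(objective(X_opt), base + d[:k].sum())
X_rand = rng.standard_normal((n, k))
assert objective(X_rand) <= base + d[:k].sum() + 1e-9
```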
4.14 If W=FHF is +ve definite Hermitian, B is Hermitian and A[n#m] is a given matrix and the columns of V are an orthonormal basis for the null space of AHFH, then if X[n#k] is restricted such that rank([A X])=m+k, maxX tr(([A X]HW[A X])-1 [A X]HB[A X]) = tr((AHWA)-1AHBA) + sum(d1:k) where d are the eigenvalues of VHF-HBF-1V sorted into decreasing order and this is attained by taking X = F-1VK where the columns of K are the corresponding eigenvectors.
  • For any X[n#k] we can do a QR decomposition VHFX = K[n-m#k]S[k#k] and we define the upper triangular matrix T[m+k#m+k] = [R  UHFX; 0 S[k#k]] where FA = U[n#m]R[m#m] is also a QR decomposition. We note that T is non-singular iff rank([A X])=m+k.
  • Now, tr(([A X]HW[A X])-1 [A X]HB[A X]) = tr((TH[U VK]HF-HFHFF-1[U VK]T)-1 TH[U VK]HF-HBF-1[U VK]T) = tr(T-1([U VK]H[U VK])-1T-HTH[U VK]HF-HBF-1[U VK]T) = tr([U VK]HF-HBF-1[U VK]) = tr(UHF-HBF-1U) + tr(KHVHF-HBF-1VK)
  • The first term is independent of X while, from [4.11] with W=I, the maximum value of the second subject to KHK=I equals the sum of the k highest eigenvalues of VHF-HBF-1V with the columns of K the corresponding eigenvectors. A suitable X is then given by X = F-1VK.
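A corresponding sketch of [4.14], computing the orthonormal basis V of the null space of AHFH via the SVD; the sizes, seed and the positive definite choice of B are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(11)
n, m, k = 7, 2, 2
F = rng.standard_normal((n, n)); W = F.T @ F                   # W = F^H F, +ve definite
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)   # +ve definite B
A = rng.standard_normal((n, m))

_, _, Vt = np.linalg.svd(A.T @ F.T)       # SVD of A^H F^H (an m x n matrix)
V = Vt[m:].T                              # orthonormal basis of its null space, A^H F^H V = 0

Finv = np.linalg.inv(F)
C = V.T @ Finv.T @ B @ Finv @ V           # V^H F^-H B F^-1 V (hermitian)
d, K = np.linalg.eigh(C)
d, K = d[::-1], K[:, ::-1]                # re-sort into decreasing order

def objective(X):
    AX = np.hstack([A, X])
    return np.trace(np.linalg.solve(AX.T @ W @ AX, AX.T @ B @ AX))

base = np.trace(np.linalg.solve(A.T @ W @ A, A.T @ B @ A))
X_opt = Finv @ V @ K[:, :k]               # X = F^-1 V K for the top-k eigenvectors
assert np.isclose(objective(X_opt), base + d[:k].sum())
assert objective(rng.standard_normal((n, k))) <= base + d[:k].sum() + 1e-9
```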
4.15 If W is +ve definite Hermitian and B is Hermitian and A[n#m] is a given matrix, then maxX det(([A X]HW[A X])-1 [A X]HB[A X] | rank([A X[n#k]])=m+k) = det((AHWA)-1AHBA)×prod(l1:k) where l are the eigenvalues of W-1B(I - A(AHBA)-1AHB) sorted into decreasing order and this maximum may be attained by taking the columns of X to be the corresponding eigenvectors.
  • From [4.10] we can find G[n#n] such that GHWG = I and GHBG = D = DIAG(d) where d and G  are the eigenvalues and corresponding eigenvectors of W-1B with the elements of d in non-increasing order.
  • We can do the QR decomposition G-1A = U[n#m]R[m#m] = [U[n#m] V[n#n-m]][R; 0] where [U V]H[U V] = I
    Note that (AHWA)-1 AHBA = (RHUHGHWGUR)-1 RHUHGHBGUR= R-1UHDUR
  • Now for any X[n#k] satisfying rank([A X])=m+k,
    • We form the QR decomposition VHG-1X = K[n-m#k]S[k#k] and we define the upper triangular matrix T[m+k#m+k] = [R  UHG-1X; 0 S] giving [A X] = G[U VK]T. T must be non-singular since rank([A X])=m+k.
    • Now we have det(([A X]HW[A X])-1 [A X]HB[A X])
      =  det((TH[U VK]HGHWG[U VK]T)-1 TH[U VK]HGHBG[U VK]T) since [A X] = G[U  VK]T
      =  det(T-1([U VK]H[U VK])-1 T-HTH[U VK]HD[U VK]T) since T is non-singular, GHWG = I
      =  det([U VK]HD[U VK]) since [U  VK]H[U  VK] = I[m+k#m+k]
      =  det([UHDU  UHDVK ;  KHVHDU  KHVHDVK])
      =  det(UHDU)×det(KHVHDVK - KHVHDU (UHDU)-1UHDVK)  [3.1]
      = det(UHDU)×det(KH (VHDV - VHDU (UHDU)-1UHDV) K)
      = det((AHWA)-1 AHBA)×det(KH (VHDV - VHDU (UHDU)-1UHDV) K)
    • Thus we need to maximize det(KH (VHDV - VHDU (UHDU)-1UHDV) K) subject to KHK = I.
      From [4.12] (with W=I) the maximum equals the product of the top k eigenvalues of VHDV - VHDU (UHDU)-1UHDV and is attained by setting K to the corresponding eigenvectors. From K we can derive X = GVK as one possible X. Note that S does not affect our objective function and so can be taken to be the identity.
    • We can manipulate VHDV - VHDU (UHDU)-1UHDV
      = VHGHBGV - VHGHBGU (UHGHBGU)-1UHGHBGV
      = VHGHBGV - VHGHBGUR (RHUHGHBGUR)-1RHUHGHBGV
      = VHGHBGV - VHGHBA (AHBA)-1AHBGV
      = VHGH(I - BA (AHBA)-1AH)BGV
    • So if K contains eigenvectors of VHGH(I - BA (AHBA)-1AH)BGV, we have VHGH(I - BA (AHBA)-1AH)BGV K = K L for some diagonal L. Hence
      X L = GVK L = GVVHGH(I - BA (AHBA)-1AH)BGV K
      = G(I-UUH)GH(I - BA (AHBA)-1AH)BX
      = (GGH - GUUHGH - GGHBA (AHBA)-1AH + GUUHGHBA (AHBA)-1AH)BX
      = (GGH - AR-1R-HAH - GGHBA (AHBA)-1AH + AR-1R-HAHBA (AHBA)-1AH)BX
      = (GGH - AR-1R-HAH - GGHBA (AHBA)-1AH + AR-1R-HAH)BX
      = (GGH - GGHBA (AHBA)-1AH)BX
      = (W-1B - W-1BA (AHBA)-1AHB)X
      = W-1B(I - A (AHBA)-1AHB)X
    • So X consists of the eigenvectors of W-1B(I - A (AHBA)-1AHB) corresponding to the k highest eigenvalues [there must be an easier proof than this methinks]
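A numerical sketch of [4.15], analogous to the one given for [4.13]; W and B are taken positive definite so that the relevant eigenvalues are real and positive, and all sizes and seeds are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(14)
n, m, k = 7, 2, 2
F = rng.standard_normal((n, n)); W = F @ F.T + n * np.eye(n)   # +ve definite W
M = rng.standard_normal((n, n)); B = M @ M.T + n * np.eye(n)   # +ve definite B
A = rng.standard_normal((n, m))

def objective(X):
    AX = np.hstack([A, X])
    return np.linalg.det(np.linalg.solve(AX.T @ W @ AX, AX.T @ B @ AX))

base = np.linalg.det(np.linalg.solve(A.T @ W @ A, A.T @ B @ A))   # det((A^H W A)^-1 A^H B A)
C = np.linalg.solve(W, B) @ (np.eye(n) - A @ np.linalg.solve(A.T @ B @ A, A.T @ B))
l, E = np.linalg.eig(C)                   # eigenpairs of W^-1 B (I - A (A^H B A)^-1 A^H B)
l, E = np.real(l), np.real(E)
order = np.argsort(l)[::-1]
l, E = l[order], E[:, order]

X_opt = E[:, :k]                          # eigenvectors of the k largest eigenvalues
assert np.isclose(objective(X_opt), base * l[:k].prod())
X_rand = rng.standard_normal((n, k))
assert objective(X_rand) <= base * l[:k].prod() + 1e-9
```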
4.16 |xHy|2 = xHyyHx <= xHxyHy for any complex vectors x and y
  • If yHy=0 then the inequality is true since y=0 making both sides of the inequality zero. Hence we assume that yHy>0.
  • 0 <= |xyHy - yyHx|2
    = (yHyxH - xHyyH)(xyHy - yyHx)
    = yHyxHxyHy - xHyyHxyHy - yHyxHyyHx + xHyyHyyHx
    = xHxyHy - xHyyHx - xHyyHx + xHyyHx (after dividing all terms by yHy)
    = xHxyHy - xHyyHx
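The inequality [4.16] is the scalar Cauchy-Schwarz inequality; a short NumPy check with random complex vectors (illustrative size and seed):

```python
import numpy as np

rng = np.random.default_rng(12)
n = 8
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = abs(np.vdot(x, y)) ** 2                    # |x^H y|^2 (vdot conjugates its first argument)
rhs = np.vdot(x, x).real * np.vdot(y, y).real    # (x^H x)(y^H y)
assert lhs <= rhs + 1e-12
```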
4.17 XHY(YHY)-1YHX <= XHX where <= represents the Loewner partial order.
  • Let a be an arbitrary vector
  • aHXHY(YHY)-1YHXa = (Xa)H(Y(YHY)-1YHXa), which is of the form uHv with u = Xa and v = Y(YHY)-1YHXa
  • Applying the scalar Cauchy-Schwarz inequality [4.16] gives
    (aHXHY(YHY)-1YHXa)2 <= aHXHXa  × aHXHY(YHY)-1YHY(YHY)-1YHXa
    = aHXHXa  × aHXHY(YHY)-1YHXa
  • If aHXHY(YHY)-1YHXa > 0 we can divide through to obtain
    aHXHY(YHY)-1YHXa <= aHXHXa = | Xa |2
  • If, however aHXHY(YHY)-1YHXa = 0, then the inequality is true in any case since the right side is >= 0.
  • Hence aHXHY(YHY)-1YHXa <= aHXHXa for any vector a.
  • Hence XHY(YHY)-1YHX <= XHX in the sense of the Loewner partial order.
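The Loewner ordering of [4.17] can be verified numerically by checking that XHX - XHY(YHY)-1YHX has no negative eigenvalues. This is a sketch only: the sizes and seed are illustrative, and Y is assumed to have full column rank so that YHY is invertible:

```python
import numpy as np

rng = np.random.default_rng(13)
n, p, q = 6, 3, 4
X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
Y = rng.standard_normal((n, q)) + 1j * rng.standard_normal((n, q))

XH, YH = X.conj().T, Y.conj().T
P = Y @ np.linalg.solve(YH @ Y, YH)      # the projector Y (Y^H Y)^-1 Y^H
lhs = XH @ P @ X                         # X^H Y (Y^H Y)^-1 Y^H X
rhs = XH @ X

eigvals = np.linalg.eigvalsh(rhs - lhs)  # rhs - lhs is hermitian
assert eigvals.min() >= -1e-10           # positive semi-definite, i.e. lhs <= rhs (Loewner)
```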

This page is part of The Matrix Reference Manual. Copyright © 1998-2022 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: proof004.html 11291 2021-01-05 18:26:10Z dmb $