Matrix Reference Manual
Proofs Section 1: Basic Theorems


1.1 If A[m#n]=F[m#r]G[r#n] has linearly independent columns then r >= n and we can find B[r#r] and a subset of F's columns, D[m#r-n], such that F = [D A]B.

To prove this, we replace the columns of F one at a time with columns taken from A and show that we can do this for all of A's columns.

  • Define F[k] = [D[k] a1:k] consisting of r-k columns selected from F followed by the first k columns of A. We prove by induction that, for each k in the range 0:n, we can choose D[k] and B[k] so that F=F[k]B[k].
  • For k=0, the proposition is true with D[0]=F and B[0]=I.
  • Suppose F=F[k]B[k] is true for some k<n. Then A=F[k]B[k]G and, if we define x = (B[k]G)k+1, the (k+1)th column of B[k]G, we have ak+1 = F[k]x.
  • Suppose xi is the first non-zero element of x. We must have i <= r-k for otherwise ak+1 = [0 a1:k]x or equivalently A [xr-k+1:r; -1; 0[n-k-1#1]] = 0. This would contradict the linear independence of A's columns.
  • If we define D[k+1] to be D[k] with its ith column removed then F[k] = F[k+1]S where S=[I[i-1#i-1] 0 0; 0 -xi^-1 xi+1:r I[r-i#r-i]; 0[1#i-1] xi^-1 0[1#r-i]]. It follows that if B[k+1]=SB[k] we have F=F[k+1]B[k+1] as required.
  • Finally, we have F=F[n]B[n] with F[n] = [D[n] A], where D[n] consists of r-n columns selected from F. In order for D[n] to exist, we must therefore have r>=n.
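The column-exchange induction above can be checked numerically. The sketch below (using numpy; the sizes, random data, solver calls and tolerances are illustrative choices of ours, not part of the proof) replaces columns of F with columns of A one at a time and then recovers B:

```python
import numpy as np

# Illustrative random instance: A = F G with m=5, r=4, n=2.
rng = np.random.default_rng(0)
m, r, n = 5, 4, 2
F = rng.standard_normal((m, r))
G = rng.standard_normal((r, n))
A = F @ G                       # generically has linearly independent columns

# Exchange columns of F for columns of A one at a time, as in the induction.
Fk = F.copy()                   # F[0] = F
for k in range(n):
    # coordinates x of a_{k+1} in F[k]: solve F[k] x = a_{k+1} (consistent)
    x, *_ = np.linalg.lstsq(Fk, A[:, k], rcond=None)
    # the first non-zero element of x occurs within the first r-k positions
    i = next(j for j in range(r - k) if abs(x[j]) > 1e-8)
    # D[k+1] = D[k] with column i removed; F[k+1] = [D[k+1] a_{1:k+1}]
    Fk = np.column_stack([np.delete(Fk[:, :r - k], i, axis=1), A[:, :k + 1]])

assert r >= n                                # guaranteed by 1.1
D = Fk[:, :r - n]                            # r-n columns selected from F
B, *_ = np.linalg.lstsq(Fk, F, rcond=None)   # solve [D A] B = F
assert np.allclose(Fk @ B, F, atol=1e-8)
```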
1.2 For any non-zero A[m#n] we can find a linearly independent subset of A's columns, F, such that A=F[m#r]G[r#n] with G upper triangular and r<=n.

We prove this by induction on n. If n=1, set F=A and G=1.

Now suppose A, F and G satisfy the theorem and consider the m#(n+1) matrix A+=[A x]. If x=Fy for some y then set F+=F and G+=[G y]; otherwise set F+=[F x] and G+=[G 0; 0T 1]. It is straightforward to show in both cases that A+=F+G+, that G+ is upper triangular and that r+<=n+1.

We must also show that, in the second case (i.e. if x != Fy for every y), the columns of [F x] are linearly independent. If [F x][a; b] = 0, then Fa = -bx. We must have b=0 since otherwise x = F(-a/b), which contradicts the assumption that x != Fy. Hence Fa = 0, which implies that a=0 since the columns of F are linearly independent. Thus the columns of [F x] are also linearly independent.
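The induction is constructive: process A's columns left to right, keeping each column that is independent of those already kept. A minimal numpy sketch (the rank-2 example, least-squares test and tolerance are our illustrative choices):

```python
import numpy as np

# Illustrative example: a 4#5 matrix A of rank 2.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))

F_cols, G_cols = [], []          # columns of F and of G, built left to right
for j in range(A.shape[1]):
    x = A[:, j]
    r = len(F_cols)
    if r:
        F = np.column_stack(F_cols)
        y, *_ = np.linalg.lstsq(F, x, rcond=None)
    if r and np.allclose(F @ y, x, atol=1e-8):
        G_cols.append(y)                            # x = F y: keep F, append y
    else:
        F_cols.append(x)                            # x independent: extend F by x
        G_cols = [np.append(g, 0.0) for g in G_cols]
        G_cols.append(np.append(np.zeros(r), 1.0))  # new column [0; ...; 0; 1]

F, G = np.column_stack(F_cols), np.column_stack(G_cols)
assert np.allclose(F @ G, A)
assert np.allclose(G, np.triu(G))   # G is upper triangular
```

Each column of G gains its non-zero entries only in rows no later than its own index, which is exactly the upper triangular property used in the proof.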

1.3 For any matrix A[m#n] we have rank(A) <= m and rank(A) <= n.
  • We have A = A[m#n] I[n#n] = I[m#m]A[m#n] . Since rank(A) is the minimum inner dimension of such a decomposition, we must have rank(A) <= m and rank(A) <= n.
1.4 rank(AB) <= rank(A) and rank(AB) <= rank(B)
  • If A=FG is a full-rank decomposition, then we can decompose AB as F(GB). Hence rank(AB) is <= the inner dimension of this decomposition, which is rank(A). A similar argument shows that rank(AB) <= rank(B).
1.5 rank(A)=n iff A[m#n] has linearly independent columns.
  • If the columns of A are linearly independent and A=F[m#r]G[r#n] is a full-rank decomposition, then from 1.1, we have rank(A) = r >= n, but from 1.3, rank(A)<=n. Hence rank(A)=n.
  • If A's columns are not linearly independent then from 1.2 we can find a linearly independent subset F with A=F[m#r]G[r#n]. This implies that rank(A) <= r < n since the rank is the minimum inner dimension of such a decomposition.
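The rank facts in 1.3-1.5 are easy to confirm numerically (the sizes and random examples below are illustrative; numpy's matrix_rank stands in for the decomposition-based definition of rank):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))  # a 6#5 matrix of rank 3
B = rng.standard_normal((5, 4))

rank = np.linalg.matrix_rank
assert rank(A) <= min(A.shape)                 # 1.3: rank(A) <= m and rank(A) <= n
assert rank(A @ B) <= min(rank(A), rank(B))    # 1.4
C = rng.standard_normal((6, 4))                # generically has independent columns
assert rank(C) == C.shape[1]                   # 1.5
```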
1.6 rank(I[n#n])=n
  • The columns of I are linearly independent since Ix=0 implies x=0. Hence from 1.5 its rank is n.
1.7 If A has linearly independent columns then we can extend it to a square matrix [D[m#m-n] A[m#n]] with linearly independent columns.
  • We have A=IA so from 1.1 we can find D and B such that I[m#m]=[D A][m#m]B. From 1.6, 1.4 and 1.3 we have m = rank(I) <= rank([D A]) <= m. Hence rank([D A])=m and by 1.5 its columns are linearly independent.
1.8 If A[m#n] has linearly independent columns, then there exist Q[m#n] and upper triangular R[n#n] and S[n#n] such that A=QR and QHQ = RS = I. If A is real then Q, R and S may be chosen to be real.

We prove this by induction on n. If n=1, we set Q=a/||a||, R=||a|| and S=||a||^-1 where a=A. Note that ||a|| cannot equal 0 since this would violate the assumption that the columns of A are linearly independent.

Now assume the theorem to be true for A[m#n] with Q, R and S defined as in the theorem. Given an m#(n+1) matrix A+ = [A x] with linearly independent columns, we define y = x - QQHx and note that y cannot equal 0 since we would then have x = QQHx = ASQHx contradicting the assumption that the columns of A+ are linearly independent.

We set Q+ = [Q  y/||y|| ], R+ = [R  QHx; 0T  ||y|| ] and S+ = [S  -SQHx/||y||; 0T  ||y||^-1 ]. It is straightforward to show that A+, Q+, R+ and S+ satisfy the theorem by performing the appropriate multiplications. Note that if A is real then all the above calculations give real results.

This recursive procedure is called Gram-Schmidt orthogonalization but is not recommended in practice because of its poor numerical properties.
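The recursion of 1.8 can be written out directly. The sketch below (our own illustrative implementation, with an arbitrary random example) builds Q and R column by column and builds S by the recursion S+ = [S -SQHx/||y||; 0T ||y||^-1] from the proof:

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt as in the proof of 1.8: returns Q, R, S with
    A = Q R, Q^H Q = I, R S = I, and R and S upper triangular."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=A.dtype)
    R = np.zeros((n, n), dtype=A.dtype)
    S = np.zeros((n, n), dtype=A.dtype)
    for k in range(n):
        qx = Q[:, :k].conj().T @ A[:, k]        # Q^H x
        y = A[:, k] - Q[:, :k] @ qx             # y = x - Q Q^H x
        R[:k, k] = qx
        R[k, k] = np.linalg.norm(y)             # ||y||, non-zero by independence
        Q[:, k] = y / R[k, k]
        S[:k, k] = -S[:k, :k] @ qx / R[k, k]    # -S Q^H x / ||y||
        S[k, k] = 1.0 / R[k, k]                 # ||y||^-1
    return Q, R, S

# illustrative real example (the conj() above keeps the complex case valid)
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
Q, R, S = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(Q @ R, A)
assert np.allclose(R @ S, np.eye(3))
```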

1.9 For any A[m#n] there exists a Q[m#m] and upper triangular R[m#n] such that A=QR  and QHQ = I. If A is real, then Q and R may be taken as real.
  • From 1.2 we decompose A=FG with G upper triangular and F[m#r] linearly independent.
  • From 1.7 we find D such that [F D][m#m] has linearly independent columns.
  • From 1.8 we decompose [F D] = QU where QHQ = I and U is upper triangular.

We now have A = FG = [F D] [I; 0] G = Q U [I; 0] G = QR where R = U [I; 0] G is the product of three upper triangular matrices and is therefore upper triangular itself. If A is real then all of these operations preserve realness.

If m>=n, the lower m-n rows of R are all zero and so the product A=QR is unaffected if we delete the last m-n columns of Q and the last m-n rows of R.
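Theorem 1.9, unlike 1.8, holds even when A is rank deficient; this and the economy form can be checked with numpy's built-in QR (the random rank-2 example is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, 2)) @ rng.standard_normal((2, n))  # rank 2 < n
Q, R = np.linalg.qr(A, mode='complete')   # Q is m#m with Q^H Q = I, R is m#n
assert np.allclose(Q.T @ Q, np.eye(m))
assert np.allclose(Q @ R, A)
assert np.allclose(R, np.triu(R))
# m >= n, so the lower m-n rows of R are zero and the economy form suffices
assert np.allclose(R[n:, :], 0)
assert np.allclose(Q[:, :n] @ R[:n, :], A)
```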

1.10 The diagonal elements of a skew-symmetric matrix are all 0.
[Provided that the underlying field does not have characteristic 2]
  • If A is skew-symmetric then by definition a(i,j)=-a(j,i). By setting j=i, we get a(i,i)=-a(i,i) from which a(i,i)=0.
1.11 The rank of a skew-symmetric matrix is even.
[Provided that the underlying field does not have characteristic 2]
  • We prove this by induction starting with a 1#1 skew-symmetric matrix which must equal 0 and hence has even rank.
  • Given a skew-symmetric A[n+1#n+1], we can, by [1.10], partition it as A = [C b ; -bT 0] where C[n#n] is skew-symmetric and rank(C) is even, by the induction hypothesis.
  • If  b = Cx for some x, then A = [I x]T C [I x] and also C = [I 0] A [I 0]T  and so, by [1.4], rank(A) = rank(C) which is even.
  • If, on the other hand, b is linearly independent of the columns of C, then -bT is also linearly independent of the rows of C = -CT. Hence rank(A) = rank([C b ; -bT 0]) = rank([C b]) + 1 = rank(C) + 2 which is even.
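Both 1.10 and 1.11 can be verified over the reals (characteristic 0) with random skew-symmetric examples; the construction A = M - MT below is our illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(5)
for n in range(1, 7):
    M = rng.standard_normal((n, n))
    A = M - M.T                                  # skew-symmetric by construction
    assert np.allclose(np.diag(A), 0)            # 1.10: zero diagonal
    assert np.linalg.matrix_rank(A) % 2 == 0     # 1.11: even rank
```

A generic skew-symmetric matrix has rank n when n is even and rank n-1 when n is odd, which is even in both cases.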
1.12 If D is diagonal then AD = DA iff ai,j=0 whenever di,i != dj,j.
  • (AD)i,j = ai,jdj,j = (DA)i,j = di,iai,j  ==> ai,j (di,i - dj,j) = 0  ==> ai,j = 0 or di,i = dj,j
1.13 If D = DIAG(c1I1, c2I2, ..., cMIM) where the ck are distinct scalars and the Ij are identity matrices, then AD = DA iff A = DIAG(A1, A2, ..., AM) where each Ak is the same size as the corresponding Ik.
  • From [1.12] ai,j=0 except when di,i = dj,j. However since the ck are distinct, di,i = dj,j only within one of the ckIk blocks. It follows that ai,j=0 except when i and j lie in the same block. The result follows.
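A small numerical illustration of 1.12 and 1.13 (the particular d and block sizes below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
d = np.array([2.0, 2.0, 5.0, 5.0, 5.0])          # D = DIAG(c1*I[2], c2*I[3])
D = np.diag(d)

# a block-diagonal A with matching block sizes commutes with D (1.13)
A1, A2 = rng.standard_normal((2, 2)), rng.standard_normal((3, 3))
A = np.zeros((5, 5))
A[:2, :2], A[2:, 2:] = A1, A2
assert np.allclose(A @ D, D @ A)

# conversely, zeroing b_{i,j} wherever d_i != d_j makes any B commute (1.12)
B = rng.standard_normal((5, 5))
B[~np.equal.outer(d, d)] = 0.0
assert np.allclose(B @ D, D @ B)
```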
1.14 A=UDVH=LDMH are alternative singular value decompositions of A iff UHL = DIAG(Q1, Q2, ..., Qk, R) and VHM = DIAG(Q1, Q2, ..., Qk, S) where Q1, Q2, ..., Qk are unitary matrices whose sizes are given by the multiplicities of the corresponding distinct non-zero singular values and R and S are unitary matrices whose size equals the number of zero singular values. [R.11]
  • Define the unitary matrices P=UHL and Q=VHM
  • UD2UH = AAH = LD2LH = UPD2PHUH ==> D2P = PD2
  • From [1.13], it follows that P (and also Q, by a similar argument based on AHA) is block diagonal with unitary blocks whose sizes match the multiplicities of the singular values in D. It follows that PD = DP.
  • We have UHAV = D = UHLDMH V= PDQH ==> DQ = PD = DP
  • Hence D(P - Q) = 0 and P must equal Q except for rows in which dii = 0. The result follows.
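The "if" direction of 1.14 can be demonstrated constructively: rotating within a repeated singular value, and acting arbitrarily on the zero singular value, yields another valid SVD. The 4#4 real example below is an illustrative construction of ours:

```python
import numpy as np

rng = np.random.default_rng(7)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthogonal U
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthogonal V
d = np.array([3.0, 3.0, 1.0, 0.0])                 # sigma=3 twice, one zero
A = U @ np.diag(d) @ V.T

# rotate within the repeated sigma=3 pair; the zero-block factors may differ
Q1, _ = np.linalg.qr(rng.standard_normal((2, 2)))
P = np.eye(4);  P[:2, :2] = Q1;  P[3, 3] = -1.0    # P = DIAG(Q1, 1, R) with R = -1
Qm = np.eye(4); Qm[:2, :2] = Q1                    # Q = DIAG(Q1, 1, S) with S = +1
L, M = U @ P, V @ Qm
assert np.allclose(L @ np.diag(d) @ M.T, A)        # an alternative SVD of A
```

P and Qm share the unitary block Q1 on the repeated non-zero singular value but differ on the zero singular value, exactly as the theorem permits.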
1.15 If D is diagonal then XDXT = sumi(di × xixiT) and XDXH = sumi(di × xixiH)
  • (XDXT)pq = sumi(xpidixqi) = sumi(di × xpixqi) = sumi(di × (xixiT)pq) = (sumi(di × xixiT))pq
  • The proof for XDXH is identical.
1.16 If D is diagonal then tr(XDXT) = sumi(di × xiTxi) and tr(XDXH) = sumi(di × xiHxi) = sumi(di × |xi|2)
  • tr(XDXT) = tr(sumi(di × xixiT)) [1.15] = tr(sumi(di × xiTxi)) = sumi(di × xiTxi)
  • The proof for tr(XDXH) is identical.
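The sum-of-outer-products identities 1.15 and 1.16 in a short numerical check (random real example; xi denotes the ith column of X):

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((4, 3))
d = rng.standard_normal(3)
D = np.diag(d)

outer_sum = sum(d[i] * np.outer(X[:, i], X[:, i]) for i in range(3))
assert np.allclose(X @ D @ X.T, outer_sum)                          # 1.15
assert np.isclose(np.trace(X @ D @ X.T),
                  sum(d[i] * X[:, i] @ X[:, i] for i in range(3)))  # 1.16
```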
1.17 tr(AB) = tr(BA)
  • (AB)i,j = sumk(aikbkj)
  • tr(AB) = sumi(AB)ii = sumi,k(aikbki) = sumk,i(bkiaik) = sumk(BA)kk = tr(BA)
1.18 tr(AB) = A:TBT: = AT:TB:  = AH:HB: = (A:HBH:)C
  • (AB)i,j = sumk(aikbkj)
  • tr(AB) = sumi(AB)ii = sumi,k(aikbki) = sumi,k(Aik(BT)ik) = A:TBT:
  • tr(AB) = tr(BA) [1.17] = B:TAT: [1.18] = AT:TB:
  • tr(AB) =  AT:TB: =  AH:HB:  since AT:T = AH:H
  • tr(AB) = A:TBT: =  (A:HBH:)C
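Writing A: for the column-stacking (vectorization) of A, the identities of 1.17 and 1.18 check out numerically; the real example below means the T and H forms coincide:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
vec = lambda M: M.reshape(-1, order='F')     # M: i.e. column-stacking of M

assert np.isclose(np.trace(A @ B), np.trace(B @ A))      # 1.17
assert np.isclose(np.trace(A @ B), vec(A) @ vec(B.T))    # 1.18: A:^T B^T:
assert np.isclose(np.trace(A @ B), vec(A.T) @ vec(B))    # 1.18: A^T:^T B:
```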
1.19 tr([A B]T [C D]) = tr(ATC) + tr(BTD)
  • Note that conformality implies that the matrix dimensions are A[n#m], B[n#k], C[n#m] and D[n#k]
  • tr([A B]T [C D]) = tr([ATC ATD ; BTC BTD]) = tr(ATC) + tr(BTD) since these diagonal blocks are square.
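A quick check of 1.19 with conformal random blocks (the dimensions follow the bullet above):

```python
import numpy as np

rng = np.random.default_rng(10)
n, m, k = 4, 2, 3
A, C = rng.standard_normal((n, m)), rng.standard_normal((n, m))
B, D = rng.standard_normal((n, k)), rng.standard_normal((n, k))

lhs = np.trace(np.hstack([A, B]).T @ np.hstack([C, D]))
assert np.isclose(lhs, np.trace(A.T @ C) + np.trace(B.T @ D))
```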
1.20 The pseudoinverse is unique
  • Assume that both X+ and Y satisfy the pseudoinverse equations. We will show that Y = X+.
    1. (a) XX+X=X  and (b)  XYX=X
    2. (a) X+XX+=X+ and (b)  YXY=Y
    3. (a) (XX+)H=XX+  and (b)  (XY)H=XY
    4. (a) (X+X)H=X+X  and (b)  (YX)H=YX
  • Y = YXY [2b] = YYHXH [3b] = YYHXH(X+)HXH [1a] = YXY(X+)HXH [3b] = YXYXX+ [3a] = YXX+ [2b] = XHYHX+ [4b] = XH(X+)HXHYHX+ [1a] = X+XXHYHX+ [4a] = X+XYXX+ [4b] = X+XX+ [1b] = X+ [2a]
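Uniqueness can be illustrated numerically: numpy's pinv satisfies conditions 1-4, and an independently constructed SVD-based solution of the same four conditions coincides with it (the rank-deficient example and tolerance are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
X = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))  # rank-deficient 5#4
P = np.linalg.pinv(X)

# P satisfies the four Moore-Penrose conditions 1-4 above
assert np.allclose(X @ P @ X, X)
assert np.allclose(P @ X @ P, P)
assert np.allclose((X @ P).T, X @ P)
assert np.allclose((P @ X).T, P @ X)

# an independently built solution of the four conditions coincides with P
U, s, Vt = np.linalg.svd(X, full_matrices=False)
s_inv = np.array([1.0 / x if x > 1e-10 * s.max() else 0.0 for x in s])
Y = Vt.T @ np.diag(s_inv) @ U.T
assert np.allclose(Y, P)                       # uniqueness (1.20)
```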
1.21 C=TOE(a)[m#r]TOE(b)[r#n] is Toeplitz iff ar+1:r+m-1b1:n-1T = a1:m-1br+1:r+n-1T
  • The product, C, is Toeplitz iff ci+1,j+1-ci,j = 0 for all 1<=i<=m-1 and 1<=j<=n-1.
  • ci+1,j+1-ci,j=sump=1:r(Ai+1,pBp,j+1) - sumq=1:r(Ai,qBq,j) where Au,v=au-v+r and Bu,v=bu-v+n from the Toeplitz property.
  • Hence ci+1,j+1-ci,j =  sump=1:r(ai+1-p+rbp-j-1+n) - sumq=1:r(ai-q+rbq-j+n)
  • Substituting p=q+1 in the first summation gives ci+1,j+1-ci,j = sumq=0:r-1(ai-q+rbq-j+n) - sumq=1:r(ai-q+rbq-j+n)
  • The summation terms all cancel except for q=0 and q=r which leaves ci+1,j+1-ci,j = ai+rbn-j - aibn-j+r = ai+rbk - aibk+r where k=n-j. Note that 1<=j<=n-1 implies 1<=k<=n-1.
  • Thus C is Toeplitz iff ai+rbk = aibk+r for all 1<=i<=m-1 and 1<=k<=n-1 which gives ar+1:r+m-1b1:n-1T = a1:m-1br+1:r+n-1T as required.
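The condition of 1.21 can be exercised numerically. Geometric sequences a_k = b_k = p^k satisfy it (both a_{i+r}b_k and a_ib_{k+r} equal p^{i+k+r}), while generic random sequences do not; the helper functions and sizes below are our illustrative choices:

```python
import numpy as np

def toe(a, rows, cols):
    """rows#cols Toeplitz matrix with A(u,v) = a_{u-v+cols} in the 1-indexed
    notation of the proof (a holds a_1 .. a_{rows+cols-1})."""
    return np.array([[a[u - v + cols - 1] for v in range(cols)]
                     for u in range(rows)])

def is_toeplitz(C):
    # constant along diagonals: each entry equals its upper-left neighbour
    return np.allclose(C[1:, 1:], C[:-1, :-1])

m, r, n = 4, 3, 4
p = 0.5
a = p ** np.arange(1, m + r)     # a_1 .. a_{m+r-1} geometric
b = p ** np.arange(1, n + r)     # b_1 .. b_{n+r-1} geometric
assert is_toeplitz(toe(a, m, r) @ toe(b, r, n))      # condition holds

rng = np.random.default_rng(12)
a2 = rng.standard_normal(m + r - 1)                  # generic: condition fails
b2 = rng.standard_normal(n + r - 1)
assert not is_toeplitz(toe(a2, m, r) @ toe(b2, r, n))
```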

This page is part of The Matrix Reference Manual. Copyright © 1998-2017 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "".
Updated: $Id: proof001.html 10094 2017-09-01 17:31:37Z dmb $