Matrix Calculus
Notation
- j is the square root of -1
- X_R and X_I are the real and imaginary parts of X = X_R + jX_I
- X^C is the complex conjugate of X
- X: denotes the long column vector formed by concatenating the columns of X (see vectorization).
- A ¤ B = KRON(A,B), the Kronecker product
- A • B is the Hadamard or elementwise product
- matrices and vectors A, B, C do not depend on X
In the main part of this page we express results in terms of differentials
rather than derivatives for two reasons: they avoid notational disagreements
and they cope easily with the complex case. In most cases however, the
differentials have been written in the form dY: =
dY/dX dX: so that the corresponding
derivative may be easily extracted.
Derivatives with respect to a real matrix
If X is p#q and Y is m#n, then
dY: = dY/dX dX: where
the derivative dY/dX is a large mn#pq
matrix. If X and/or Y are column vectors or scalars, then the
vectorization operator : has no effect and may be omitted.
dY/dX is also called the Jacobian Matrix of Y: with respect to X: and det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration, where its absolute value appears:
Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:).
Although they do not generalise so well, other authors use alternative
notations for the cases when X and Y are both vectors or when one
is a scalar. In particular:
- dy/dx (scalar y, vector x) is sometimes written as a column vector rather than a row vector
- dy/dx (vector y, vector x) is sometimes transposed from the above definition or else is sometimes written dy/dx^T to emphasise the correspondence between the columns of the derivative and those of x^T.
- dY/dx (matrix Y, scalar x) and dy/dX (scalar y, matrix X) are often written as matrices rather than, as here, a column vector and row vector respectively. The matrix form may be converted to the form used here by appending : or :^T respectively.
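The definition dY: = dY/dX dX: can be checked numerically. The following is a minimal sketch, assuming NumPy and column-major vectorization for the `:` operator; the helper names `vec` and `numerical_jacobian` are illustrative and not part of this manual. It estimates dY/dX by central differences for Y = AXB and compares it with the closed form B^T ¤ A.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

def numerical_jacobian(func, X, eps=1e-6):
    """Central-difference estimate of dY/dX, an mn#pq matrix."""
    X = np.asarray(X, dtype=float)
    cols = []
    # Perturb the entries of X in column-major order so that columns match X:
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = eps
        E = E.reshape(X.shape, order="F")
        cols.append((vec(func(X + E)) - vec(func(X - E))) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # Y = A X B with X 3#2, so Y is 4#5
B = rng.standard_normal((2, 5))
X = rng.standard_normal((3, 2))

J = numerical_jacobian(lambda Z: A @ Z @ B, X)
print(J.shape)                             # (20, 6), i.e. mn#pq
print(np.allclose(J, np.kron(B.T, A)))     # compares with B^T ¤ A
```

Other authors' layouts (transposed or matrix-shaped derivatives) would correspond to transposing or reshaping `J`.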
Derivatives with respect to a complex matrix
If X is complex then dY: = dY/dX dX: holds iff Y(X) is an analytic function, which normally implies that Y(X) does not depend on X^C or X^H.
Even for non-analytic functions we can write uniquely dY: = dY/dX dX: + dY/dX^C dX^C: provided that Y is analytic with respect to X and X^C individually (or equivalently with respect to X_R and X_I individually). dY/dX is the Generalized Complex Derivative and dY/dX^C is the Complex Conjugate Derivative [R.4, R.9].
We define the generalized derivatives in terms of partial derivatives with respect to X_R and X_I:
- dY/dX = ½ (dY/dX_R - j dY/dX_I)
- dY/dX^C = (dY^C/dX)^C = ½ (dY/dX_R + j dY/dX_I)
We have the following relationships for both analytic and non-analytic
functions Y(X):
- Cauchy Riemann equations: The following are equivalent:
  - Y(X) is an analytic function of X
  - dY: = dY/dX dX:
  - dY/dX^C = 0 for all X
  - dY/dX_R + j dY/dX_I = 0 for all X
- dY: = dY/dX dX: + dY/dX^C dX^C:
- dY/dX_R = dY/dX + dY/dX^C
- dY/dX_I = j (dY/dX - dY/dX^C)
- Chain rule: If Z is a function of Y which is itself a function of X, then dZ/dX = dZ/dY dY/dX + dZ/dY^C dY^C/dX. If Z is an analytic function of Y, the second term vanishes and this is the same as for real derivatives.
- Real-valued: If Y(X) is real for all complex X, then
  - dY/dX^C = (dY/dX)^C
  - dY: = 2(dY/dX dX:)_R
  - If W(X) is analytic with W(X)=Y(X) for all real X, then dW/dX = 2 (dY/dX)_R for all real X
  - Example: If C=C^H, y(x)=x^H C x and w(x)=x^T C x, then dy/dx = x^H C and dw/dx = 2x^T C_R
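The definitions of the generalized derivatives and the example above can be checked with finite differences. This is a sketch only, assuming NumPy; the helper `wirtinger` is a hypothetical name introduced here. It forms dy/dx_R and dy/dx_I numerically and combines them as ½(dy/dx_R - j dy/dx_I) and ½(dy/dx_R + j dy/dx_I).

```python
import numpy as np

def wirtinger(func, x, eps=1e-6):
    """Return (dy/dx, dy/dx^C) as row vectors for a scalar-valued func."""
    n = x.size
    d_re = np.zeros(n, dtype=complex)   # dy/dx_R
    d_im = np.zeros(n, dtype=complex)   # dy/dx_I
    for k in range(n):
        e = np.zeros(n, dtype=complex)
        e[k] = eps
        d_re[k] = (func(x + e) - func(x - e)) / (2 * eps)
        d_im[k] = (func(x + 1j * e) - func(x - 1j * e)) / (2 * eps)
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = M + M.conj().T                      # Hermitian: C = C^H
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
xr = rng.standard_normal(n)             # a real point for the last check

y = lambda v: v.conj() @ C @ v          # y = x^H C x: real valued, not analytic
w = lambda v: v @ C @ v                 # w = x^T C x: analytic

dy_dx, dy_dxc = wirtinger(y, x)
dw_dx, dw_dxc = wirtinger(w, x)
dwr_dx, _ = wirtinger(w, xr)

print(np.allclose(dy_dx, x.conj() @ C))         # dy/dx = x^H C
print(np.allclose(dy_dxc, dy_dx.conj()))        # real-valued y
print(np.allclose(dw_dxc, 0))                   # Cauchy Riemann for analytic w
print(np.allclose(dwr_dx, 2 * xr @ C.real))     # dw/dx = 2 x^T C_R at real x
```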
If f(x) is a real function of a complex vector then df/dx^C = (df/dx)^C and we can define grad(f(x)) = 2 (df/dx)^H = (df/dx_R + j df/dx_I)^T as the Complex Gradient Vector [R.9] with the following properties:
- grad(f(x)) is zero at an extreme value of f.
- grad(f(x)) points in the direction of steepest slope of f(x)
- The magnitude of the steepest slope is equal to |grad(f(x))|. Specifically, if g(x) = grad(f(x)), then lim_{a->0} a^-1 (f(x+ag(x)) - f(x)) = |g(x)|^2
- grad(f(x)) is normal to the surface f(x) = constant which means that it can be used for gradient ascent/descent algorithms.
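As an illustration of these properties, the sketch below uses f(x) = x^H C x with a Hermitian positive definite C, for which df/dx = x^H C and hence grad(f(x)) = 2Cx. NumPy, the test matrix and the step size `mu` are assumptions made for this example only. It estimates the limit in the third property and then takes a few steepest-descent steps.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = M.conj().T @ M + np.eye(n)          # Hermitian positive definite (example choice)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def f(v):
    """Real-valued f(x) = x^H C x."""
    return (v.conj() @ C @ v).real

def grad(v):
    """Complex gradient 2 (df/dx)^H; here df/dx = x^H C, so grad = 2 C x."""
    return 2 * C @ v

g = grad(x)
a = 1e-7
slope = (f(x + a * g) - f(x)) / a        # one-sided estimate of the limit
print(slope, np.linalg.norm(g) ** 2)     # the two values should nearly agree

mu = 0.25 / np.linalg.norm(C, 2)         # step size below 1 / (largest eigenvalue)
values = [f(x)]
for _ in range(5):
    x = x - mu * grad(x)                 # move against the complex gradient
    values.append(f(x))
print(values)                            # f should decrease at every step
```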
Basic Properties
- We may write the following differentials unambiguously without parentheses:
  - Transpose: dY^T = d(Y^T) = (dY)^T
  - Hermitian Transpose: dY^H = d(Y^H) = (dY)^H
  - Conjugate: dY^C = d(Y^C) = (dY)^C
- Linearity: d(Y+Z) = dY + dZ
- Chain Rule: If Z is a function of Y which is itself a function of X, then for both the normal and the generalized complex derivative (the latter when Z is an analytic function of Y): dZ: = dZ/dY dY: = dZ/dY dY/dX dX:
- Product Rule: d(YZ) = Y dZ + dY Z
  - d(YZ): = (I ¤ Y) dZ: + (Z^T ¤ I) dY: = ((I ¤ Y) dZ/dX + (Z^T ¤ I) dY/dX) dX:
- Hadamard Product: d(Y • Z) = Y • dZ + dY • Z
- Kronecker Product: d(Y ¤ Z) = Y ¤ dZ + dY ¤ Z
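The vectorized product rule can be checked directly. The following is a small sketch assuming NumPy and column-major vectorization; `vec` is an illustrative helper name. It compares (I ¤ Y) dZ: + (Z^T ¤ I) dY: with the vectorized Y dZ + dY Z, and confirms that the true change in YZ and in Y ¤ Z differs from the differential only by second-order terms.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(3)
p, q, r = 3, 4, 2
Y = rng.standard_normal((p, q))
Z = rng.standard_normal((q, r))
dY = 1e-6 * rng.standard_normal((p, q))    # small perturbations
dZ = 1e-6 * rng.standard_normal((q, r))

# d(YZ): = (I ¤ Y) dZ: + (Z^T ¤ I) dY:
kron_form = np.kron(np.eye(r), Y) @ vec(dZ) + np.kron(Z.T, np.eye(p)) @ vec(dY)
matrix_form = vec(Y @ dZ + dY @ Z)

# True changes; they differ from the differentials by terms of order dY dZ
product_gap = np.abs(vec((Y + dY) @ (Z + dZ) - Y @ Z) - kron_form).max()
kron_gap = np.abs(np.kron(Y + dY, Z + dZ) - np.kron(Y, Z)
                  - (np.kron(Y, dZ) + np.kron(dY, Z))).max()
print(product_gap, kron_gap)               # both should be tiny compared with 1e-6
```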
Differentials of Linear Functions
- d(Ax) = d(x^T A^T): = A dx
- d(A^T X B): = (A^T dX B): = (B ¤ A)^T dX:
- d(a^T X b) = (b ¤ a)^T dX: = (ab^T):^T dX:
- d(a^T X a) = d(a^T X^T a) = (a ¤ a)^T dX: = (aa^T):^T dX:
- d(XB): = (dX B): = (B^T ¤ I) dX:
- d(x b^T): = (dx b^T): = (b ¤ I) dx
- d(a^T X^T b) = (a ¤ b)^T dX: = (ba^T):^T dX:
- [x: Complex]
- Writing I_n = I[n#n] and T_q,m = TVEC(q,m),
  - d(X[m#n] ¤ A[p#q]): = (I_n ¤ T_q,m ¤ I_p)(I_mn ¤ A:) dX: = (I_nq ¤ T_m,p)(I_n ¤ A: ¤ I_m) dX:
  - d(A[p#q] ¤ X[m#n]): = (I_q ¤ T_n,p ¤ I_m)(A: ¤ I_mn) dX: = (T_q,n ¤ I_mp)(I_n ¤ A: ¤ I_m) dX:
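Because these functions are linear in X, each differential identity is equivalent to the same identity with dX replaced by X, which makes them easy to test. The sketch below assumes NumPy; `vec` and `tvec` are illustrative helpers, with `tvec(rows, cols)` built on the assumption that TVEC(m,n) is the permutation matrix satisfying TVEC(m,n) A[m#n]: = A^T:. The matrices P and Q are hypothetical stand-ins for the constant matrices A and B in the A^T X B rule.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

def tvec(rows, cols):
    """Permutation T with T @ vec(M) == vec(M.T) for any M of size rows#cols."""
    T = np.zeros((rows * cols, rows * cols))
    for i in range(rows):
        for j in range(cols):
            T[j + i * cols, i + j * rows] = 1.0
    return T

rng = np.random.default_rng(4)
m, n, p, q = 2, 3, 4, 5
X = rng.standard_normal((m, n))
A = rng.standard_normal((p, q))
I = np.eye
a_col = vec(A).reshape(-1, 1)             # A: as a column

# (X ¤ A): = (I_n ¤ T_q,m ¤ I_p)(I_mn ¤ A:) X:
xa_rhs = np.kron(np.kron(I(n), tvec(q, m)), I(p)) @ np.kron(I(m * n), a_col) @ vec(X)

# (A ¤ X): = (I_q ¤ T_n,p ¤ I_m)(A: ¤ I_mn) X:
ax_rhs = np.kron(np.kron(I(q), tvec(n, p)), I(m)) @ np.kron(a_col, I(m * n)) @ vec(X)

# (P^T X Q): = (Q ¤ P)^T X:
P = rng.standard_normal((m, 3))
Q = rng.standard_normal((n, 2))
lin_rhs = np.kron(Q, P).T @ vec(X)

print(np.allclose(vec(np.kron(X, A)), xa_rhs),
      np.allclose(vec(np.kron(A, X)), ax_rhs),
      np.allclose(vec(P.T @ X @ Q), lin_rhs))
```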
Differentials of Quadratic Products
- d((Ax+b)^T C (Dx+e)) = ((Ax+b)^T C D + (Dx+e)^T C^T A) dx
- d(x^T C x) = x^T (C+C^T) dx = [C=C^T] 2x^T C dx
- d((Ax+b)^T (Dx+e)) = ((Ax+b)^T D + (Dx+e)^T A) dx
- d((Ax+b)^T (Ax+b)) = 2(Ax+b)^T A dx
- d((Ax+b)^T C (Ax+b)) = [C=C^T] 2(Ax+b)^T C A dx
- d((Ax+b)^H C (Dx+e)) = (Ax+b)^H C D dx + (Dx+e)^T C^T A^C dx^C
- d(x^H C x) = x^H C dx + x^T C^T dx^C = [C=C^H] 2(x^H C dx)_R
- d(x^H x) = 2(x^H dx)_R
- d(a^T X^T X b) = (X(ab^T + ba^T)):^T dX:
- d(a^T X^T X a) = 2(X aa^T):^T dX:
- d(a^T X^T C X b) = (C^T X ab^T + C X ba^T):^T dX:
- d(a^T X^T C X a) = ((C + C^T) X aa^T):^T dX: = [C=C^T] 2(C X aa^T):^T dX:
- d((Xa+b)^T C (Xa+b)) = ((C+C^T)(Xa+b)a^T):^T dX:
- d(X^2): = (X dX + dX X): = (I ¤ X + X^T ¤ I) dX:
- d(X^T C X): = (X^T C dX): + (d(X^T) C X): = (I ¤ X^T C) dX: + (X^T C^T ¤ I) dX^T:
- d(X^H C X): = (X^H C dX): + (d(X^H) C X): = (I ¤ X^H C) dX: + (X^T C^T ¤ I) dX^H:
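Two of the real quadratic results can be checked against finite differences. This is a sketch under the assumption of NumPy and column-major vectorization; `num_grad` is a hypothetical helper that returns the row vector g in df = g dX:.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

def num_grad(f, X, eps=1e-6):
    """Central-difference row vector g such that df = g @ dX: (column-major)."""
    X = np.asarray(X, dtype=float)
    g = np.zeros(X.size)
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = eps
        E = E.reshape(X.shape, order="F")
        g[k] = (f(X + E) - f(X - E)) / (2 * eps)
    return g

rng = np.random.default_rng(5)
m, n = 4, 3
x = rng.standard_normal(n)
Cn = rng.standard_normal((n, n))          # deliberately not symmetric
X = rng.standard_normal((m, n))
Cm = rng.standard_normal((m, m))
a, b = rng.standard_normal(n), rng.standard_normal(n)

# d(x^T C x) = x^T (C + C^T) dx
g_vec = num_grad(lambda v: v @ Cn @ v, x)
p_vec = x @ (Cn + Cn.T)

# d(a^T X^T C X b) = (C^T X a b^T + C X b a^T):^T dX:
g_mat = num_grad(lambda Z: a @ Z.T @ Cm @ Z @ b, X)
p_mat = vec(Cm.T @ X @ np.outer(a, b) + Cm @ X @ np.outer(b, a))

print(np.allclose(g_vec, p_vec, atol=1e-6), np.allclose(g_mat, p_mat, atol=1e-6))
```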
Differentials of Cubic Products
- d(x x^T A x) = (x x^T (A+A^T) + x^T A x I) dx
Differentials of Inverses
- d(X^-1) = -X^-1 dX X^-1 [2.1]
- d(X^-1): = -(X^-T ¤ X^-1) dX:
- d(a^T X^-1 b) = -(X^-T ab^T X^-T):^T dX: = -(ab^T):^T (X^-T ¤ X^-1) dX: [2.6]
- d(tr(A^T X^-1 B)) = d(tr(B^T X^-T A)) = -(X^-T AB^T X^-T):^T dX: = -(AB^T):^T (X^-T ¤ X^-1) dX:
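The differential of the inverse can be checked numerically. The sketch below assumes NumPy; the test matrix X is an arbitrary well-conditioned example and D is an arbitrary perturbation direction, both chosen only for illustration.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

rng = np.random.default_rng(6)
n = 3
X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))   # kept well away from singular
D = rng.standard_normal((n, n))                         # direction of dX
a, b = rng.standard_normal(n), rng.standard_normal(n)
eps = 1e-6
Xi = np.linalg.inv(X)

# Central difference of X^-1 along D, compared with -(X^-T ¤ X^-1) dX:
numeric = vec(np.linalg.inv(X + eps * D) - np.linalg.inv(X - eps * D)) / (2 * eps)
predicted = -np.kron(Xi.T, Xi) @ vec(D)

# Scalar case: d(a^T X^-1 b) = -(X^-T a b^T X^-T):^T dX:
numeric_s = (a @ np.linalg.inv(X + eps * D) @ b
             - a @ np.linalg.inv(X - eps * D) @ b) / (2 * eps)
predicted_s = -vec(Xi.T @ np.outer(a, b) @ Xi.T) @ vec(D)

print(np.allclose(numeric, predicted, atol=1e-6),
      np.isclose(numeric_s, predicted_s, atol=1e-6))
```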
Differentials of Trace
Note: matrix dimensions must result in an n#n argument for tr().
- d(tr(Y)) = tr(dY)
- d(tr(X)) = d(tr(X^T)) = I:^T dX: [2.4]
- d(tr(X^k)) = k(X^(k-1))^T:^T dX:
- d(tr(AX^k)) = (SUM_{r=0:k-1} (X^r A X^(k-r-1))^T):^T dX:
- d(tr(AX^-1 B)) = -(X^-1 BA X^-1)^T:^T dX: = -(X^-T A^T B^T X^-T):^T dX: [2.5]
- d(tr(AX^-1)) = d(tr(X^-1 A)) = -(X^-T A^T X^-T):^T dX:
- d(tr(A^T X B^T)) = d(tr(B X^T A)) = (AB):^T dX: [2.4]
- d(tr(XA^T)) = d(tr(A^T X)) = d(tr(X^T A)) = d(tr(AX^T)) = A:^T dX:
- d(tr(A^T X^-1 B^T)) = d(tr(B X^-T A)) = -(X^-T AB X^-T):^T dX: = -(AB):^T (X^-T ¤ X^-1) dX:
- d(tr(AXBX^T C)) = (A^T C^T X B^T + CAXB):^T dX:
- d(tr(XAX^T)) = d(tr(AX^T X)) = d(tr(X^T XA)) = (X(A+A^T)):^T dX:
- d(tr(X^T AX)) = d(tr(AXX^T)) = d(tr(XX^T A)) = ((A+A^T)X):^T dX:
- d(tr(AXBX)) = (A^T X^T B^T + B^T X^T A^T):^T dX:
- d(tr((AXb+c)(AXb+c)^T)) = 2(A^T (AXb+c) b^T):^T dX:
- d(tr((X^T CX)^-1 A)) = [C: symmetric] d(tr(A (X^T CX)^-1)) = -((CX(X^T CX)^-1)(A+A^T)(X^T CX)^-1):^T dX:
- d(tr((X^T CX)^-1 (X^T BX))) = [B,C: symmetric] d(tr((X^T BX)(X^T CX)^-1)) = 2(BX(X^T CX)^-1 - (CX(X^T CX)^-1) X^T BX (X^T CX)^-1):^T dX:
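The trace differentials can be verified in the same way as the quadratic forms. The following sketch assumes NumPy; `num_grad` is an illustrative finite-difference helper and the matrices P, Q and S are hypothetical stand-ins for A, B and X in the tr(AX^-1 B) rule.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

def num_grad(f, X, eps=1e-6):
    """Central-difference row vector g such that df = g @ dX: (column-major)."""
    X = np.asarray(X, dtype=float)
    g = np.zeros(X.size)
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = eps
        E = E.reshape(X.shape, order="F")
        g[k] = (f(X + E) - f(X - E)) / (2 * eps)
    return g

rng = np.random.default_rng(7)
m, n, r = 3, 4, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((r, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, r))

# d(tr(A X B X^T C)) = (A^T C^T X B^T + C A X B):^T dX:
g1 = num_grad(lambda Z: np.trace(A @ Z @ B @ Z.T @ C), X)
p1 = vec(A.T @ C.T @ X @ B.T + C @ A @ X @ B)

# d(tr(P S^-1 Q)) = -(S^-T P^T Q^T S^-T):^T dS: for square nonsingular S
S = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
P = rng.standard_normal((r, n))
Q = rng.standard_normal((n, r))
Si = np.linalg.inv(S)
g2 = num_grad(lambda Z: np.trace(P @ np.linalg.inv(Z) @ Q), S)
p2 = -vec(Si.T @ P.T @ Q.T @ Si.T)

print(np.allclose(g1, p1, atol=1e-6), np.allclose(g2, p2, atol=1e-6))
```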
Differentials of Determinant
Note: matrix dimensions must result in an n#n argument for det(). Some of the expressions below involve inverses: these forms apply only if the quantity being inverted is square and non-singular; alternative forms involving the adjoint, ADJ(), do not have the non-singular requirement.
- d(det(X)) = d(det(X^T)) = ADJ(X^T):^T dX: = det(X) (X^-T):^T dX: [2.7]
- d(det(A^T XB)) = d(det(B^T X^T A)) = (A ADJ(A^T XB)^T B^T):^T dX: = [A,B: nonsingular] det(A^T XB) × (X^-T):^T dX: [2.8]
- d(ln(det(A^T XB))) = [A,B: nonsingular] (X^-T):^T dX: [2.9]
- d(ln(det(X))) = (X^-T):^T dX:
- d(det(X^k)) = d(det(X)^k) = k × det(X^k) × (X^-T):^T dX: [2.10]
- d(ln(det(X^k))) = k × (X^-T):^T dX:
- d(det(X^T CX)) = [C=C^T] 2det(X^T CX) × (CX(X^T CX)^-1):^T dX: [2.11]
  - = [C=C^T, CX: nonsingular] 2det(X^T CX) × (X^-T):^T dX:
- d(ln(det(X^T CX))) = [C=C^T] 2(CX(X^T CX)^-1):^T dX:
  - = [C=C^T, CX: nonsingular] 2(X^-T):^T dX:
- d(det(X^H CX)) = det(X^H CX) × ((C^T X^C (X^T C^T X^C)^-1):^T dX: + (CX(X^H CX)^-1):^T dX^C:) [2.12]
- d(ln(det(X^H CX))) = (C^T X^C (X^T C^T X^C)^-1):^T dX: + (CX(X^H CX)^-1):^T dX^C: [2.13]
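Three of the real determinant results are checked below by finite differences. This is a sketch assuming NumPy; `num_grad` is an illustrative helper, and the matrices are arbitrary examples constructed to be non-singular. `np.linalg.slogdet` is used for ln(det()), which here equals the log of the absolute determinant since the determinants are positive.

```python
import numpy as np

def vec(M):
    """Column-major vectorization (the ':' operator)."""
    return np.asarray(M).reshape(-1, order="F")

def num_grad(f, X, eps=1e-6):
    """Central-difference row vector g such that df = g @ dX: (column-major)."""
    X = np.asarray(X, dtype=float)
    g = np.zeros(X.size)
    for k in range(X.size):
        E = np.zeros(X.size)
        E[k] = eps
        E = E.reshape(X.shape, order="F")
        g[k] = (f(X + E) - f(X - E)) / (2 * eps)
    return g

rng = np.random.default_rng(8)
n, m = 3, 5
X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
Xit = np.linalg.inv(X).T

# d(det(X)) = det(X) (X^-T):^T dX:   and   d(ln(det(X))) = (X^-T):^T dX:
g_det = num_grad(np.linalg.det, X)
g_logdet = num_grad(lambda Z: np.linalg.slogdet(Z)[1], X)

# d(ln(det(W^T C W))) = 2 (C W (W^T C W)^-1):^T dW: for symmetric C, W m#n
M = rng.standard_normal((m, m))
C = M @ M.T + np.eye(m)                               # symmetric positive definite
W = np.eye(m, n) + 0.2 * rng.standard_normal((m, n))
g_quad = num_grad(lambda Z: np.linalg.slogdet(Z.T @ C @ Z)[1], W)
p_quad = 2 * vec(C @ W @ np.linalg.inv(W.T @ C @ W))

print(np.allclose(g_det, np.linalg.det(X) * vec(Xit), atol=1e-5),
      np.allclose(g_logdet, vec(Xit), atol=1e-5),
      np.allclose(g_quad, p_quad, atol=1e-5))
```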
Jacobian
dY/dX is called the Jacobian Matrix of Y: with respect to X: and J_X(Y) = det(dY/dX) is the corresponding Jacobian. The Jacobian occurs when changing variables in an integration, where its absolute value appears:
Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:).
- J_X(X[n#n]^-1) = (-1)^n det(X)^(-2n)
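The Jacobian of the matrix inverse follows from d(X^-1): = -(X^-T ¤ X^-1) dX:, whose determinant can be evaluated directly. A minimal check, assuming NumPy and arbitrary non-singular example matrices:

```python
import numpy as np

rng = np.random.default_rng(9)
results = []
for n in (2, 3):
    X = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))
    Xi = np.linalg.inv(X)
    jac = -np.kron(Xi.T, Xi)                  # d(X^-1)/dX, an n^2 # n^2 matrix
    lhs = np.linalg.det(jac)                  # J_X(X^-1)
    rhs = (-1) ** n * np.linalg.det(X) ** (-2 * n)
    results.append((lhs, rhs))

for lhs, rhs in results:
    print(lhs, rhs)                           # each pair should agree
```

The sign (-1)^n arises because the n^2 # n^2 matrix is negated and n^2 has the same parity as n.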
Hessian matrix
If f is a real function of x then the Hermitian matrix H_x f = (d/dx (df/dx)^H)^T is the Hessian matrix of f(x). A value of x for which grad f(x) = 0 corresponds to a minimum, maximum or saddle point according to whether H_x f is positive definite, negative definite or indefinite.
- [Real] H_x f = d/dx (df/dx)^T
- H_x f is symmetric
- H_x (a^T x) = 0
- H_x ((Ax+b)^T C (Dx+e)) = A^T CD + D^T C^T A
- H_x ((Ax+b)^T (Dx+e)) = A^T D + D^T A
- H_x ((Ax+b)^T C (Ax+b)) = A^T (C + C^T) A = [C=C^T] 2A^T CA
- H_x ((Ax+b)^T (Ax+b)) = 2A^T A
- H_x (x^T Cx) = C+C^T = [C=C^T] 2C
- H_x (x^T x) = 2I
- [x: Complex] H_x f = (d/dx (df/dx)^H)^T = d/dx^C (df/dx)^T
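For real x, the Hessian of the general quadratic form can be compared with a second-order finite difference. The sketch below assumes NumPy; `num_hessian` is a hypothetical helper and the matrices are arbitrary examples. It also classifies the stationary point of (Ax+b)^T C (Ax+b) from the eigenvalues of 2A^T CA for a symmetric positive definite C.

```python
import numpy as np

def num_hessian(f, x, h=1e-3):
    """Central-difference estimate of the Hessian of a real scalar function f."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

rng = np.random.default_rng(10)
k, n = 4, 3
A = np.eye(k, n) + 0.2 * rng.standard_normal((k, n))
D = rng.standard_normal((k, n))
C = rng.standard_normal((k, k))
b, e = rng.standard_normal(k), rng.standard_normal(k)
x = rng.standard_normal(n)

# H_x ((Ax+b)^T C (Dx+e)) = A^T C D + D^T C^T A
H_num = num_hessian(lambda v: (A @ v + b) @ C @ (D @ v + e), x)
H_pred = A.T @ C @ D + D.T @ C.T @ A
print(np.allclose(H_num, H_pred, atol=1e-5))

# With symmetric positive definite S, H_x ((Ax+b)^T S (Ax+b)) = 2 A^T S A
M = rng.standard_normal((k, k))
S = M @ M.T + np.eye(k)
eigs = np.linalg.eigvalsh(2 * A.T @ S @ A)
print(eigs)          # all positive, so the stationary point is a minimum
```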
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: calculus.html,v 1.28 2009/09/03 07:29:40 dmb Exp $