Matrix Calculus
Notation
 j is the square root of -1
 X^{R} and X^{I} are the real
and imaginary parts of X = X^{R} +
jX^{I}
 X^{C} is the complex conjugate of X
 X: denotes the long column vector formed by concatenating the
columns of X (see vectorization).
 A ¤ B = KRON(A,B), the Kronecker product
 A • B the Hadamard or elementwise product
 matrices and vectors A, B, C do not depend on
X
 I_{n} = I_{[n#n]} the
n#n identity matrix
 T_{m,n} = TVEC(m,n) is the vectorized transpose matrix,
i.e. X^{T}: = T_{m,n} X: for X_{[m#n]}
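The notation above can be sketched in NumPy. This is an illustration only: the helper names `vec` and `tvec` are assumptions chosen to mirror X: and TVEC(m,n), and column-major stacking is assumed for the vectorization.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one long vector (written X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

def tvec(m, n):
    """Build T_{m,n}: the mn x mn permutation with tvec(m, n) @ vec(X) == vec(X.T)."""
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # X[i, j] is entry i + j*m of vec(X) and entry j + i*n of vec(X.T).
            T[j + i * n, i + j * m] = 1.0
    return T

X = np.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
T = tvec(2, 3)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
kron_AB = np.kron(A, B)            # A ¤ B, a 4 x 4 matrix
hadamard_AB = A * B                # A • B, the elementwise product
```

Here `T @ vec(X)` reproduces `vec(X.T)`, which is the defining property X^{T}: = T_{m,n} X:.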
In the main part of this page we express results in terms of differentials
rather than derivatives for two reasons: they avoid notational disagreements
and they cope easily with the complex case. In most cases however, the
differentials have been written in the form dY: =
dY/dX dX: so that the corresponding
derivative may be easily extracted.
Derivatives with respect to a real matrix
If X is p#q and Y is m#n, then
dY: = dY/dX dX: where
the derivative dY/dX is a large mn#pq
matrix. If X and/or Y are column vectors or scalars, then the
vectorization operator : has no effect and may be omitted.
dY/dX is also called the Jacobian Matrix of
Y: with respect to X: and det(dY/dX)
is the corresponding Jacobian. The Jacobian occurs when changing
variables in an integration:
Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:),
where the absolute value of the Jacobian is used.
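As a rough numerical illustration of dY: = dY/dX dX:, the sketch below estimates the mn#pq Jacobian matrix by central differences and compares it with the closed form for Y = AXB. The function name `numerical_jacobian` and the test matrices are assumptions made for this example.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

def numerical_jacobian(func, X, eps=1e-6):
    """Central-difference estimate of dY/dX as an (m*n) x (p*q) matrix."""
    p, q = X.shape
    columns = []
    for k in range(p * q):
        step = np.zeros(p * q)
        step[k] = eps
        step = step.reshape(p, q, order="F")   # perturb the k-th entry of X:
        columns.append(vec(func(X + step) - func(X - step)) / (2 * eps))
    return np.column_stack(columns)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((3, 5))
X = rng.standard_normal((2, 3))               # p = 2, q = 3

J = numerical_jacobian(lambda M: A @ M @ B, X)  # Y = AXB is 4 x 5, so J is 20 x 6
J_closed_form = np.kron(B.T, A)                 # since (AXB): = (B^T ¤ A) X:
print(np.max(np.abs(J - J_closed_form)))
```

The estimate is only as accurate as the finite-difference step allows; for nonlinear Y(X) the agreement is approximate rather than exact.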
Although they do not generalise so well, other authors use alternative
notations for the cases when X and Y are both vectors or when one
is a scalar. In particular:
 dy/dx (scalar y, vector x) is sometimes written as a column vector rather
than a row vector
 dy/dx (vector y, vector x) is sometimes transposed from the above
definition or else is sometimes written dy/dx^{T} to emphasise the
correspondence between the columns of the derivative and those of x^{T}.
 dY/dx (matrix Y, scalar x) and dy/dX (scalar y, matrix X) are often written
as matrices rather than, as here, a column vector and row vector respectively.
The matrix form may be converted to the form used here by appending : or
:^{T} respectively.
Derivatives with respect to a complex matrix
If X is complex then dY: = dY/dX dX: holds iff Y(X) is an analytic
function, which normally implies that Y(X) does not depend on X^{C} or
X^{H}.
Even for nonanalytic functions we can write uniquely dY: =
dY/dX dX: + dY/dX^{C} dX^{C}: provided that Y(X) is
analytic with respect to X and X^{C} individually
(or equivalently with respect to X^{R} and
X^{I} individually).
dY/dX is the Generalized Complex Derivative and
dY/dX^{C} is the Complex Conjugate Derivative
[R.4, R.9].
We define the generalized derivatives in terms of partial derivatives with
respect to X^{R} and X^{I}:
 dY/dX = ½ (dY/dX^{R} - j dY/dX^{I})
 dY/dX^{C} =
(dY^{C}/dX)^{C}
= ½ (dY/dX^{R} + j
dY/dX^{I})
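A minimal sketch of these two definitions for a scalar function of a scalar complex variable, using central differences for the partial derivatives with respect to the real and imaginary parts. The helper name `generalized_derivatives` and the two test functions are assumptions for illustration.

```python
def generalized_derivatives(f, x, eps=1e-6):
    """Return (df/dx, df/dx^C) for a scalar function f of a scalar complex x."""
    d_re = (f(x + eps) - f(x - eps)) / (2 * eps)            # df/dx^R
    d_im = (f(x + 1j * eps) - f(x - 1j * eps)) / (2 * eps)  # df/dx^I
    return 0.5 * (d_re - 1j * d_im), 0.5 * (d_re + 1j * d_im)

x0 = 1.5 - 0.5j

# Nonanalytic example: f(x) = |x|^2 = x x^C, so df/dx = x^C and df/dx^C = x.
d_abs2, dc_abs2 = generalized_derivatives(lambda x: abs(x) ** 2, x0)

# Analytic example: f(x) = x^2, so df/dx = 2x and df/dx^C = 0.
d_sq, dc_sq = generalized_derivatives(lambda x: x ** 2, x0)
```

The vanishing conjugate derivative in the second case is the Cauchy Riemann condition for an analytic function.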
We have the following relationships for both analytic and nonanalytic
functions Y(X):
 Cauchy Riemann equations: The following are equivalent:
 Y(X) is an analytic function of X
 dY: = dY/dX
dX:
 dY/dX^{C} = 0 for all
X
 dY/dX^{R} + j
dY/dX^{I} = 0 for all
X
 dY: = dY/dX dX: +
dY/dX^{C}
dX^{C}:
 dY/dX^{R} =
dY/dX +
dY/dX^{C}
 dY/dX^{I} = j (dY/dX - dY/dX^{C})
 Chain rule: If Z is a function of Y which is itself a
function of X, then dZ/dX =
dZ/dY dY/dX. This is
the same as for real derivatives.
 Real-valued: If Y(X) is real for all complex X,
then
 dY/dX^{C}=
(dY/dX)^{C}
 dY: = 2(dY/dX
dX:)^{R}
 If W(X) is analytic with
W(X)=Y(X) for all real X, then
dW/dX = 2
(dY/dX)^{R} for all real X
 Example: If C=C^{H},
y(x)=x^{H}Cx and
w(x)=x^{T}Cx, then
dy/dx = x^{H}C and
dw/dx =
2x^{T}C^{R}
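The example can be checked numerically. The sketch below builds a random Hermitian C (an assumption for the test), estimates the generalized derivative from the partial derivatives with respect to the real and imaginary parts, and compares the results with x^{H}C and 2x^{T}C^{R}. The helper name `generalized_derivative` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = M + M.conj().T                       # Hermitian by construction: C = C^H

def y(x):
    return (x.conj() @ C @ x).real       # y(x) = x^H C x, real for all complex x

def w(x):
    return x @ C @ x                     # w(x) = x^T C x, analytic in x

def generalized_derivative(f, x, eps=1e-6):
    """Row vector df/dx = 0.5 * (df/dx^R - j df/dx^I) by central differences."""
    row = np.zeros(len(x), dtype=complex)
    for k in range(len(x)):
        step = np.zeros(len(x))
        step[k] = eps
        d_re = (f(x + step) - f(x - step)) / (2 * eps)
        d_im = (f(x + 1j * step) - f(x - 1j * step)) / (2 * eps)
        row[k] = 0.5 * (d_re - 1j * d_im)
    return row

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
dy_dx = generalized_derivative(y, x)              # expected: x^H C

x_real = rng.standard_normal(n)                   # a real point, where w = y
dw_dx = generalized_derivative(w, x_real)         # expected: 2 x^T C^R
dy_dx_real = generalized_derivative(y, x_real)    # dw/dx should equal 2 (dy/dx)^R
```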
If f(x) is a real function of a complex vector then
df/dx^{C}=
(df/dx)^{C} and we can define
grad(f(x)) = 2 (df/dx)^{H} =
(df/dx^{R}+j
df/dx^{I})^{T} as the
Complex Gradient Vector [R.9]
with the following properties:
 grad(f(x)) is zero at an extreme value of f.
 grad(f(x)) points in the direction of steepest slope
of f(x)
 The magnitude of the steepest slope is equal to
|grad(f(x))|. Specifically, if g(x) =
grad(f(x)), then lim_{a->0}
a^{-1}( f(x+ag(x)) - f(x) ) = |g(x)|^{2}
 grad(f(x)) is normal to the surface f(x)
= constant which means that it can be used for gradient ascent/descent
algorithms.
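These properties can be illustrated with the real-valued function f(x) = |Ax - b|^{2} of a complex vector x, which is an example chosen here rather than one taken from the text. For this f, df/dx = (Ax - b)^{H}A, so the complex gradient is 2A^{H}(Ax - b).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)

def f(x):
    r = A @ x - b
    return np.vdot(r, r).real            # |Ax - b|^2 (vdot conjugates its first argument)

def grad(x):
    # grad f = 2 (df/dx)^H with df/dx = (Ax - b)^H A
    return 2 * A.conj().T @ (A @ x - b)

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
g = grad(x)

a = 1e-7
slope = (f(x + a * g) - f(x)) / a        # one-sided estimate of the limit a -> 0
g_norm_sq = np.vdot(g, g).real           # |g(x)|^2

x_min = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares minimiser of f
grad_at_min = grad(x_min)                      # should be numerically zero
```

Stepping along -g(x) instead of +g(x) gives the steepest descent direction used by gradient descent.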
Basic Properties
 We may write the following differentials unambiguously without parentheses:
 Transpose:
dY^{T}=d(Y^{T})=(dY)^{T}
 Hermitian Transpose:
dY^{H}=d(Y^{H})=(dY)^{H}
 Conjugate:
dY^{C}=d(Y^{C})=(dY)^{C}
 Linearity:
d(Y+Z)=dY+dZ
 Chain Rule: If Z
is a function of Y which is itself a function of X, then for both
the normal and the generalized complex derivative:
dZ: = dZ/dY dY: =
dZ/dY dY/dX
dX:
 Product Rule: d(YZ) =Y dZ +
dY Z
 d(YZ): = (I ¤ Y)
dZ: + (Z^{T} ¤ I)
dY: = ((I ¤ Y)
dZ/dX + (Z^{T} ¤
I) dY/dX ) dX:
 Hadamard Product:
d(Y • Z) =Y • dZ +
dY • Z
 Kronecker Product: d(Y ¤ Z) = Y ¤ dZ + dY ¤ Z
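The vectorized product rule can be checked with small random perturbations. The matrices below are arbitrary test data, and `vec` is an assumed helper for column-major vectorization.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

rng = np.random.default_rng(3)
m, k, n = 2, 3, 4
Y, dY = rng.standard_normal((m, k)), 1e-6 * rng.standard_normal((m, k))
Z, dZ = rng.standard_normal((k, n)), 1e-6 * rng.standard_normal((k, n))

# d(YZ): = (I_n ¤ Y) dZ: + (Z^T ¤ I_m) dY:
d_vec = np.kron(np.eye(n), Y) @ vec(dZ) + np.kron(Z.T, np.eye(m)) @ vec(dY)

# The same differential in matrix form: Y dZ + dY Z
d_matrix = vec(Y @ dZ + dY @ Z)

# Actual change in the product; it differs from the differential only by
# the second-order term dY dZ.
actual = vec((Y + dY) @ (Z + dZ) - Y @ Z)
```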
Differentials of Linear
Functions
 d(Ax) = d(x^{T}A^{T}): = A dx
 d(x^{T}a) =
d(a^{T}x) = a^{T}
dx
 d(bx^{T}a) = ba^{T}
dx
 d(AXB): =
(A dX B): = (B^{T}
¤ A) dX:
 d(a^{T}Xb) = (b ¤
a)^{T} dX: =
(ab^{T}):^{T} dX:
 d(a^{T}Xa) =
d(a^{T}X^{T}a) =
(a ¤ a)^{T} dX: =
(aa^{T}):^{T}
dX:
 [X_{[m#n]}] d(AX): = (I_{n} ¤ A) dX:
 [X_{[m#n]}] d(XB): = (dX B): = (B^{T} ¤ I_{m}) dX:
 [x_{[n]}] d(xb^{T}): = (dx b^{T}): = (b ¤ I_{n}) dx
 d(AX^{T}B):
= (B^{T}
¤ A) dX^{T}:
 d(a^{T}X^{T}b) =
(a ¤ b)^{T} dX: = (ab^{T}):^{T}
dX^{T}:= (ba^{T}):^{T} dX:
 d(|x|) = |x|^{-1} x^{T} dx
 [x: Complex]
d (x^{H}A): =
A^{T} dx^{C}
 d(X_{[m#n]} ¤ A_{[p#q]}): = (I_{n} ¤ T_{q,m} ¤ I_{p})(I_{mn} ¤ A:) dX:
= (I_{nq} ¤ T_{m,p})(I_{n} ¤ A: ¤ I_{m}) dX:
 d(A_{[p#q]} ¤
X_{[m#n]}): = (I_{q}
¤ T_{n,p} ¤
I_{m})(A: ¤ I_{mn})
dX: = (T_{m,n} ¤
I_{pq} )(I_{n} ¤ A:
¤ I_{m}) dX:
Differentials of Quadratic
Products
 d(Ax+b)^{T}C(Dx+e)
= ((Ax+b)^{T}CD +
(Dx+e)^{T}C^{T}A)
dx
 d(x^{T}Cx) =
x^{T}(C+C^{T})dx
= [C=C^{T}]
2x^{T}Cdx
 d(Ax+b)^{T} (Dx+e) =
( (Ax+b)^{T}D +
(Dx+e)^{T}A)dx
 d(Ax+b)^{T} (Ax+b) =
2(Ax+b)^{T}Adx
 d(Ax+b)^{T}C(Ax+b)
= [C=C^{T}]
2(Ax+b)^{T}CA dx
 d(Ax+b)^{H}C(Dx+e)
= (Ax+b)^{H}CD dx +
(Dx+e)^{T}C^{T}A^{C}
dx^{C}
 d (x^{H}Cx)
=x^{H}C dx
+x^{T}C^{T}
dx^{C} = [C=C^{H}]
2(x^{H}C dx)^{R}
 d (x^{H}x) =
2(x^{H} dx)^{R}
 d(a^{T}X^{T}Xb) = (X(ab^{T} + ba^{T})):^{T} dX:
 d(a^{T}X^{T}Xa) =
2(Xaa^{T} ):^{T}
dX:
 d(a^{T}X^{T}CXb)
= (C^{T}Xab^{T} +
CXba^{T}):^{T} dX:
 d(a^{T}X^{T}CXa)
= ((C + C^{T})Xaa^{T}
):^{T} dX: = [C=C^{T}]
2(CXaa^{T}):^{T}
dX:
 d((Xa+b)^{T}C(Xa+b)) =
((C+C^{T})(Xa+b)a^{T}
):^{T} dX:
 [X_{[n#n]}] d(X^{2}): = (XdX + dX X): = (I_{n} ¤ X + X^{T} ¤ I_{n}) dX:
 [X_{[m#n]}] d(X^{T}CX): = (I_{n} ¤ X^{T}C) dX: + (X^{T}C^{T} ¤ I_{n}) dX^{T}:
= (I_{n} ¤ X^{T}C + T_{n,n}(I_{n} ¤ X^{T}C^{T})) dX:
 [X_{[m#n]}, C_{[m#m]}=C^{T}] d(X^{T}CX):
= (I_{n×n}+T_{n,n})(I_{n}
¤ X^{T}C) dX:
 [X_{[m#n]}] d(X^{T}X): = (I_{n} ¤ X^{T}) dX: + (X^{T} ¤ I_{n}) dX^{T}:
= (I_{n×n} + T_{n,n})(I_{n} ¤ X^{T}) dX:
 [X_{[m#n]}] d(X^{H}CX): = (X^{H}CdX): + (d(X^{H}) CX):
= (I_{n} ¤ X^{H}C) dX: + (X^{T}C^{T} ¤ I_{n}) dX^{H}:
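A sketch checking two of the quadratic results with small perturbations: d(x^{T}Cx) = x^{T}(C+C^{T})dx and the vectorized form of d(X^{T}X). The helpers `vec` and `tvec` are assumed names, and the identity in the second formula is taken to be the n^2 # n^2 identity.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

def tvec(m, n):
    """T_{m,n}, with tvec(m, n) @ vec(X) == vec(X.T) for an m x n matrix X."""
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            T[j + i * n, i + j * m] = 1.0
    return T

rng = np.random.default_rng(5)
m, n = 4, 3

# d(x^T C x) = x^T (C + C^T) dx
C = rng.standard_normal((m, m))
x, dx = rng.standard_normal(m), 1e-6 * rng.standard_normal(m)
quad_predicted = x @ (C + C.T) @ dx
quad_actual = (x + dx) @ C @ (x + dx) - x @ C @ x

# d(X^T X): = (I + T_{n,n})(I_n ¤ X^T) dX:
X, dX = rng.standard_normal((m, n)), 1e-6 * rng.standard_normal((m, n))
J = (np.eye(n * n) + tvec(n, n)) @ np.kron(np.eye(n), X.T)
gram_predicted = J @ vec(dX)
gram_actual = vec((X + dX).T @ (X + dX) - X.T @ X)
```

In both cases the actual change exceeds the differential only by a term that is second order in the perturbation.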
Differentials of Cubic
Products
 d(xx^{T}Ax) =
(xx^{T}(A+A^{T})+x^{T}Ax×I
)dx
 d(xx^{T}x) =
(2xx^{T}+x^{T}x×I
)dx
 [X_{[m#n]}]
d(XAX^{T}BX): = (X^{T}B^{T}XA^{T}
¤ I_{m} + I_{n}
¤ XAX^{T}B) dX: +
(X^{T}B ¤ XA)
dX^{T}: = (X^{T}B^{T}XA^{T}
¤ I_{m} + T_{n,m}(XA ¤ X^{T}B)
+ I_{n}
¤ XAX^{T}B) dX:
 [X_{[m#n]}]
d(XX^{T}X): = (X^{T}X
¤ I_{m} + I_{n}
¤ XX^{T}) dX: +
(X^{T} ¤ X)
dX^{T}: = (X^{T}X
¤ I_{m} + T_{n,m}(X ¤ X^{T})
+ I_{n}
¤ XX^{T}) dX:
 [X_{[m#n]}]
d(XAXBX): = (X^{T}B^{T}X^{T}A^{T}
¤ I_{m} + X^{T}B^{T}
¤ XA + I_{n}
¤ XAXB) dX:
 [X_{[n#n]}]
d(X^{3}): = ((X^{T})^{2}
¤ I_{n} + X^{T}
¤ X + I_{n}
¤ X^{2}) dX:
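The last result, for d(X^{3}):, follows from d(X^{3}) = dX X^{2} + X dX X + X^{2} dX and can be checked numerically. The test matrix and the helper `vec` are assumptions for this sketch.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

rng = np.random.default_rng(6)
n = 3
X, dX = rng.standard_normal((n, n)), 1e-6 * rng.standard_normal((n, n))
I = np.eye(n)

# d(X^3): = ((X^T)^2 ¤ I_n + X^T ¤ X + I_n ¤ X^2) dX:
J = np.kron(X.T @ X.T, I) + np.kron(X.T, X) + np.kron(I, X @ X)
predicted = J @ vec(dX)

# Matrix form of the same differential, term by term.
matrix_form = vec(dX @ X @ X + X @ dX @ X + X @ X @ dX)

actual = vec(np.linalg.matrix_power(X + dX, 3) - np.linalg.matrix_power(X, 3))
```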
Differentials of Inverses
 d(X^{-1}) = -X^{-1}dX X^{-1} [2.1]
 d(X^{-1}): = -(X^{-T} ¤ X^{-1}) dX:
 d(a^{T}X^{-1}b) = -(X^{-T}ab^{T}X^{-T}):^{T} dX:
= -(ab^{T}):^{T} (X^{-T} ¤ X^{-1}) dX: [2.6]
 d(tr(A^{T}X^{-1}B)) = d(tr(B^{T}X^{-T}A)) = -(X^{-T}AB^{T}X^{-T}):^{T} dX:
= -(AB^{T}):^{T} (X^{-T} ¤ X^{-1}) dX:
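A numerical check of the differential of the inverse, d(X^{-1}) = -X^{-1} dX X^{-1}, and of its vectorized form with derivative -(X^{-T} ¤ X^{-1}). The diagonally dominant test matrix is an arbitrary choice that guarantees X is nonsingular.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])          # strictly diagonally dominant, hence nonsingular
rng = np.random.default_rng(7)
dX = 1e-6 * rng.standard_normal((3, 3))
X_inv = np.linalg.inv(X)

J = -np.kron(X_inv.T, X_inv)             # d(X^{-1})/dX, a 9 x 9 matrix
predicted = J @ vec(dX)
matrix_form = vec(-X_inv @ dX @ X_inv)   # the same differential in matrix form
actual = vec(np.linalg.inv(X + dX) - X_inv)
```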
Differentials of Trace
Note: matrix dimensions must result in an n#n argument for tr().
 d(tr(Y))=tr(dY)
 d(tr(X)) = d(tr(X^{T})) =
I:^{T}
dX: [2.4]
 d(tr(X^{k})) = k(X^{k-1})^{T}:^{T} dX:
 d(tr(AX^{k})) = (SUM_{r=0:k-1}(X^{r}AX^{k-r-1})^{T}):^{T} dX:
 d(tr(AX^{-1}B)) = -(X^{-1}BAX^{-1})^{T}:^{T} dX:
= -(X^{-T}A^{T}B^{T}X^{-T}):^{T} dX: [2.5]
 d(tr(AX^{-1})) = d(tr(X^{-1}A)) = -(X^{-T}A^{T}X^{-T}):^{T} dX:
 d(tr(A^{T}XB^{T})) =
d(tr(BX^{T}A)) =
(AB):^{T}
dX: [2.4]
 d(tr(XA^{T})) =
d(tr(A^{T}X))
=d(tr(X^{T}A)) =
d(tr(AX^{T})) =
A:^{T} dX:

d(tr(A^{T}X^{-1}B^{T})) = d(tr(BX^{-T}A)) = -(X^{-T}ABX^{-T}):^{T} dX:
= -(AB):^{T} (X^{-T} ¤ X^{-1}) dX:
 d(tr(AXBX^{T}C)) =
(A^{T}C^{T}XB^{T}
+ CAXB):^{T} dX:
 d(tr(XAX^{T})) =
d(tr(AX^{T}X)) =
d(tr(X^{T}XA)) =(
X(A+A^{T})):^{T}
dX:
 d(tr(X^{T}AX)) =
d(tr(AXX^{T})) =
d(tr(XX^{T}A)) =
((A+A^{T})X):^{T}
dX:
 d(tr(AXBX)) =
(A^{T}X^{T}B^{T}
+
B^{T}X^{T}A^{T}
):^{T} dX:
 d(tr((AXb+c)(AXb+c)^{T})) = 2(A^{T}(AXb+c)b^{T}):^{T} dX:
 d(tr((X^{T}CX)^{-1}A)) = [C:symmetric] d(tr(A(X^{T}CX)^{-1}))
= -((CX(X^{T}CX)^{-1})(A+A^{T})(X^{T}CX)^{-1}):^{T} dX:

d(tr((X^{T}CX)^{-1}(X^{T}BX))) = [B,C:symmetric]
d(tr((X^{T}BX)(X^{T}CX)^{-1}))
= 2(BX(X^{T}CX)^{-1} - (CX(X^{T}CX)^{-1})X^{T}BX(X^{T}CX)^{-1}):^{T} dX:
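Each trace result has the form d(tr(...)) = G:^{T} dX:, where G is a matrix of the same size as X. The sketch below checks this for tr(AXBX^{T}C) with G = A^{T}C^{T}XB^{T} + CAXB. The dimensions and test data are assumptions for the example.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

rng = np.random.default_rng(8)
m, n, p = 3, 4, 2
A = rng.standard_normal((p, m))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, p))
X, dX = rng.standard_normal((m, n)), 1e-6 * rng.standard_normal((m, n))

def f(X):
    return np.trace(A @ X @ B @ X.T @ C)   # the argument of tr() is p x p

G = A.T @ C.T @ X @ B.T + C @ A @ X @ B    # same shape as X
predicted = vec(G) @ vec(dX)               # G:^T dX:, equal to np.sum(G * dX)
actual = f(X + dX) - f(X)
```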
Differentials of Determinant
Note: matrix dimensions must result in an n#n argument for
det(). Some of the expressions below involve inverses: these forms apply only
if the quantity being inverted is square and nonsingular; alternative forms
involving the adjoint, ADJ(), do not have
the nonsingular requirement.
 d(det(X)) = d(det(X^{T})) = ADJ(X^{T}):^{T} dX:
= det(X) (X^{-T}):^{T} dX: [2.7]
 d(det(A^{T}XB)) = d(det(B^{T}X^{T}A)) = (A ADJ(A^{T}XB)^{T}B^{T}):^{T} dX:
= [A,B: nonsingular] det(A^{T}XB) × (X^{-T}):^{T} dX: [2.8]
 d(ln(det(A^{T}XB))) = [A,B: nonsingular] (X^{-T}):^{T} dX: [2.9]
 d(ln(det(X))) = (X^{-T}):^{T} dX:
 d(det(X^{k})) = d(det(X)^{k}) = k × det(X^{k}) × (X^{-T}):^{T} dX: [2.10]
 d(ln(det(X^{k}))) = k × (X^{-T}):^{T} dX:
 d(det(X^{T}CX)) = [C=C^{T}] 2(CX ADJ(X^{T}CX)):^{T} dX:
= 2det(X^{T}CX)×(CX(X^{T}CX)^{-1}):^{T} dX: [2.11]
 = [C=C^{T}, CX: nonsingular] 2det(X^{T}CX)×(X^{-T}):^{T} dX:
 d(ln(det(X^{T}CX))) = [C=C^{T}] 2(CX(X^{T}CX)^{-1}):^{T} dX:
 = [C=C^{T}, CX: nonsingular] 2(X^{-T}):^{T} dX:
 d(det(X^{H}CX)) = det(X^{H}CX) × ((C^{T}X^{C}(X^{T}C^{T}X^{C})^{-1}):^{T} dX:
+ (CX(X^{H}CX)^{-1}):^{T} dX^{C}:) [2.12]
 d(ln(det(X^{H}CX))) = (C^{T}X^{C}(X^{T}C^{T}X^{C})^{-1}):^{T} dX:
+ (CX(X^{H}CX)^{-1}):^{T} dX^{C}: [2.13]
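A numerical check of the two most commonly used determinant results, d(det(X)) = det(X) (X^{-T}):^{T} dX: and d(ln(det(X))) = (X^{-T}):^{T} dX:. The test matrix is an arbitrary nonsingular example with a positive determinant, so the logarithm is defined.

```python
import numpy as np

def vec(X):
    """Column-major vectorization (X: in the text)."""
    return np.asarray(X).reshape(-1, order="F")

X = np.array([[4.0, 1.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])          # det(X) = 4*14 - 1*6 = 50
rng = np.random.default_rng(9)
dX = 1e-6 * rng.standard_normal((3, 3))

X_inv_T = np.linalg.inv(X).T             # X^{-T}
d_logdet = vec(X_inv_T) @ vec(dX)        # predicted change in ln(det(X))
d_det = np.linalg.det(X) * d_logdet      # predicted change in det(X)

actual_det = np.linalg.det(X + dX) - np.linalg.det(X)
actual_logdet = np.log(np.linalg.det(X + dX)) - np.log(np.linalg.det(X))
```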
Jacobian
dY/dX is called the Jacobian Matrix
of Y: with respect to X: and
J_{X}(Y)=det(dY/dX) is
the corresponding Jacobian. The Jacobian occurs when changing variables
in an integration:
Integral(f(Y) dY:) = Integral(f(Y(X)) |det(dY/dX)| dX:).
 J_{X}(X_{[n#n]}^{-1}) = (-1)^{n}det(X)^{-2n}
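The Jacobian of the matrix inverse can be checked by taking the determinant of the derivative -(X^{-T} ¤ X^{-1}) of Y = X^{-1} directly. The function names and the two test matrices are assumptions for this sketch.

```python
import numpy as np

def jacobian_of_inverse(X):
    """det(dY/dX) for Y = X^{-1}, using dY/dX = -(X^{-T} ¤ X^{-1})."""
    X_inv = np.linalg.inv(X)
    dY_dX = -np.kron(X_inv.T, X_inv)     # an n^2 x n^2 matrix
    return np.linalg.det(dY_dX)

def closed_form(X):
    """(-1)^n det(X)^(-2n) for an n x n matrix X."""
    n = X.shape[0]
    return (-1) ** n * np.linalg.det(X) ** (-2 * n)

X2 = np.array([[2.0, 1.0],
               [1.0, 3.0]])              # det = 5
X3 = np.array([[4.0, 1.0, 0.0],
               [2.0, 5.0, 1.0],
               [0.0, 1.0, 3.0]])         # det = 50

results = [(jacobian_of_inverse(X), closed_form(X)) for X in (X2, X3)]
```

The sign is positive for even n and negative for odd n, which is why the absolute value is taken when changing variables in an integral.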
Hessian matrix
If f is a real function of x then the Hermitian matrix H_{x}
f = (d/dx
(df/dx)^{H})^{T} is
the Hessian matrix of f(x). A value of x for which
grad f(x) = 0 corresponds to a minimum, maximum or
saddle point according to whether H_{x} f is
positive definite, negative definite or indefinite.
 [Real] H_{x} f
= d/dx (df/dx)^{T}
 H_{x} f is symmetric
 H_{x} (a^{T}x) = 0
 H_{x}
(Ax+b)^{T}C(Dx+e) =
A^{T}CD +
D^{T}C^{T}A
 H_{x} (Ax+b)^{T}
(Dx+e) = A^{T}D +
D^{T}A
 H_{x}
(Ax+b)^{T}C(Ax+b) =
A^{T}(C + C^{T})A =
[C=C^{T}]
2A^{T}CA
 H_{x} (Ax+b)^{T}
(Ax+b) = 2A^{T}A
 H_{x} (x^{T}Cx) =
C+C^{T} = [C=C^{T}] 2C
 H_{x} (x^{T}x) =
2I
 [x: Complex] H_{x}
f = (d/dx
(df/dx)^{H})^{T} =
d/dx^{C}
(df/dx)^{T}
 H_{x} f is hermitian
 H_{x}
(Ax+b)^{H}C(Ax+b) =
[C=C^{H}]
(A^{H}CA)^{T}
[2.14]
 H_{x} (x^{H}Cx) =
[C=C^{H}]
C^{T}
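For real x, the Hessian results can be checked with second-order central differences. The sketch below does this for f(x) = (Ax+b)^{T}C(Dx+e), whose Hessian is A^{T}CD + D^{T}C^{T}A. The helper name `numerical_hessian`, the step size and the test data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 3, 4
A, D = rng.standard_normal((k, n)), rng.standard_normal((k, n))
b, e = rng.standard_normal(k), rng.standard_normal(k)
C = rng.standard_normal((k, k))

def f(x):
    return (A @ x + b) @ C @ (D @ x + e)

def numerical_hessian(func, x, h=1e-3):
    """Estimate the matrix of second partial derivatives by central differences."""
    size = len(x)
    steps = h * np.eye(size)
    H = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            H[i, j] = (func(x + steps[i] + steps[j]) - func(x + steps[i] - steps[j])
                       - func(x - steps[i] + steps[j]) + func(x - steps[i] - steps[j])
                       ) / (4 * h * h)
    return H

x0 = rng.standard_normal(n)
H_numeric = numerical_hessian(f, x0)
H_formula = A.T @ C @ D + D.T @ C.T @ A
```

Because f is quadratic, its Hessian does not depend on the point x0 at which it is evaluated.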
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: calculus.html 3437 2013-09-16 14:55:24Z dmb $