Matrix Calculus
Notation
 j is the square root of −1
 X^{R} and X^{I} are the real
and imaginary parts of X = X^{R} +
jX^{I}
 X^{C} is the complex conjugate of X
 X: denotes the long column vector formed by concatenating the
columns of X (see vectorization).
 A ⊗ B = KRON(A,B), the Kronecker product
 A • B the Hadamard or elementwise product
 matrices and vectors A, B, C do not depend on
X
 I_{n} = I_{[n#n]} the
n#n identity matrix
 T_{m,n} = TVEC(m,n) is the vectorized
transpose matrix, i.e.
X^{T}:=T_{m,n}X: for
X_{[m,n]}
 ∂Y/∂X and
∂Y/∂X^{C} are partial derivatives with
X^{C} and X respectively held constant (note
that
X^{H}=(X^{C})^{T})
 ∂Y/∂X^{R} and
∂Y/∂X^{I} are partial derivatives with
X^{I} and X^{R} respectively
held constant
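The vectorization operator and TVEC can be checked numerically. The sketch below (NumPy assumed; the helper names `vec` and `tvec` are ours, not part of the manual) builds T_{m,n} explicitly and verifies X^{T}: = T_{m,n}X: for a random 3#4 matrix.

```python
import numpy as np

def vec(X):
    # X: in the manual's notation - stack the columns of X into one long vector
    return X.flatten(order='F')

def tvec(m, n):
    # TVEC(m,n): permutation matrix T with vec(X^T) = T vec(X) for X of size m#n.
    # Element X[i,j] sits at position i + j*m in vec(X) and at j + i*n in vec(X^T).
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            T[j + i * n, i + j * m] = 1.0
    return T

rng = np.random.default_rng(0)
m, n = 3, 4
X = rng.standard_normal((m, n))
T = tvec(m, n)
ok = np.allclose(T @ vec(X), vec(X.T))
```

Since T_{m,n} is a permutation matrix, T_{m,n}T_{m,n}^{T} = I, which the test below also confirms.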
In the main part of this page we express results in terms of differentials
rather than derivatives for two reasons: they avoid notational disagreements
and they cope easily with the complex case. In most cases however, the
differentials have been written in the form dY: =
dY/dX dX: so that the corresponding
derivative may be easily extracted.
Derivatives with respect to a real matrix
If X is p#q and Y is m#n, then
dY: = dY/dX dX: where
the derivative dY/dX is a large mn#pq
matrix. If X and/or Y are column vectors or scalars, then the
vectorization operator : has no effect and may be omitted.
dY/dX is also called the Jacobian Matrix of
Y: with respect to X: and det(dY/dX)
is the corresponding Jacobian. The Jacobian occurs when changing
variables in an integration:
Integral(f(Y)dY:)=Integral(f(Y(X))
det(dY/dX) dX:).
Although they do not generalise so well, other authors use alternative
notations for the cases when X and Y are both vectors or when one
is a scalar. In particular:
 dy/dx is sometimes written as a column vector rather
than a row vector
 dy/dx is sometimes transposed from the above
definition or else is sometimes written
dy/dx^{T} to emphasise the
correspondence between the columns of the derivative and those of
x^{T}.
 dY/dx and dy/dX are often written
as matrices rather than, as here, a column vector and row vector respectively.
The matrix form may be converted to the form used here by appending : or
:^{T} respectively.
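As a numerical illustration of this definition (a sketch, with arbitrary dimensions chosen by us), the mn#pq Jacobian of the linear map Y = AXB can be built column by column from finite differences and compared against the closed form B^{T} ⊗ A given later in this page:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, m, n = 2, 3, 4, 2          # X is p#q, Y = A X B is m#n
A = rng.standard_normal((m, p))
B = rng.standard_normal((q, n))
X = rng.standard_normal((p, q))

vec = lambda M: M.flatten(order='F')   # the page's ':' operator

# Numerical Jacobian dY:/dX:, one column per element of X:
eps = 1e-6
J = np.zeros((m * n, p * q))
for k in range(p * q):
    dX = np.zeros(p * q)
    dX[k] = eps
    Xp = X + dX.reshape(p, q, order='F')
    J[:, k] = (vec(A @ Xp @ B) - vec(A @ X @ B)) / eps

ok = np.allclose(J, np.kron(B.T, A), atol=1e-5)
```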
Derivatives with respect to a complex matrix
If X is complex then dY: =
dY/dX dX: can hold in general only
if Y(X) is an analytic function. This
normally implies that Y(X) does not depend explicitly on
X^{C} or X^{H}.
Even for nonanalytic functions we can treat X and
X^{C} (with
X^{H}=(X^{C})^{T})
as distinct variables and write uniquely dY: =
∂Y/∂X dX: +
∂Y/∂X^{C}
dX^{C}: provided that Y is
analytic with respect to X and X^{C} individually
(or equivalently with respect to X^{R} and
X^{I} individually). ∂Y/∂X
is the Generalized Complex
Derivative and ∂Y/∂X^{C} is the
Complex Conjugate
Derivative [R.4, R.9]; their properties are studied in
Wirtinger Calculus.
We define the generalized derivatives in terms of partial derivatives with
respect to X^{R} and X^{I}:
 ∂Y/∂X = ½
(∂Y/∂X^{R} − j
∂Y/∂X^{I})
 ∂Y/∂X^{C} =
(∂Y^{C}/∂X)^{C} =
½ (∂Y/∂X^{R} + j
∂Y/∂X^{I})
We have the following relationships for both analytic and nonanalytic
functions Y(X):
 The following are equivalent
ways of saying that Y(X) is analytic:
 Y(X) is an analytic function of X
 dY: = ∂Y/∂X dX:
 ∂Y/∂X^{C} = 0 for all
X
 ∂Y/∂X^{R} + j
∂Y/∂X^{I} = 0 for all X
(these are the Cauchy-Riemann equations)
 dY: = ∂Y/∂X dX: +
∂Y/∂X^{C}
dX^{C}:
 ∂Y/∂X^{R} =
∂Y/∂X +
∂Y/∂X^{C}
 ∂Y/∂X^{I} = j
(∂Y/∂X −
∂Y/∂X^{C})
 Chain rule: If Z is a function of Y which is itself a
function of X, then ∂Z/∂X =
∂Z/∂Y ∂Y/∂X. This is the same
as for real derivatives.
 Real-valued: If Y(X) is real for all complex X,
then
 ∂Y/∂X^{C}=
(∂Y/∂X)^{C}
 dY: = 2(∂Y/∂X
dX:)^{R}
 If Y(X) is real for all complex X and
W(X) is analytic and if
W(X)=Y(X) for all real-valued X, then
∂W/∂X = 2
(∂Y/∂X)^{R} for all real X
 Example: If C=C^{H},
y(x)=x^{H}Cx and
w(x)=x^{T}Cx, then
∂y/∂x = x^{H}C and
∂w/∂x =
2x^{T}C^{R}
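The non-analytic case can be exercised numerically. The sketch below (our own construction, NumPy assumed) takes y(x) = x^{H}Cx with C = C^{H}, which is real-valued and hence not analytic, and checks the differential identity dy = 2(x^{H}C dx)^{R} against a finite perturbation:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = (M + M.conj().T) / 2                       # C = C^H
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y = lambda v: (v.conj() @ C @ v).real          # real-valued, non-analytic in x

# dy = dy/dx dx + dy/dx^C dx^C = 2(x^H C dx)^R when C = C^H
dx = 1e-7 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
dy_pred = 2 * (x.conj() @ C @ dx).real
dy_true = y(x + dx) - y(x)
ok = np.isclose(dy_pred, dy_true, rtol=1e-4)
```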
Suppose f(X) is a scalar real function of a complex matrix (or
vector), X, and g(X) is a
complex-valued vector function of X. To minimize
f(X) subject to
g(X)=0, we use complex
Lagrange multipliers, k, and minimize
f(X)+k^{H}g(X)+k^{T}g(X)^{C}
subject to g(X)=0. Hence we
solve
∂f/∂X+k^{H}∂g/∂X+k^{T}∂g^{C}/∂X=0^{T}
subject to g(X)=0.
 Example: If
f(x)=x^{H}Sx and
g(x)=a^{H}x−1
where S=S^{H},
then
∂f/∂x+k^{H}∂g/∂x+k^{T}∂g^{C}/∂x=x^{H}S+k^{C}a^{H}+0^{T}=0^{T}
which, taking the Hermitian transpose, implies
Sx+ka=0
from which
x=−kS^{−1}a.
Substituting this into the constraint,
g(x)=a^{H}x−1=0,
gives
−ka^{H}S^{−1}a
= 1 from which
k=−(a^{H}S^{−1}a)^{−1}.
Substituting this back into the expression for x gives
x =
S^{−1}a(a^{H}S^{−1}a)^{−1}.
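The closed-form minimizer from this example can be checked numerically. The sketch below (our own test harness, NumPy assumed) forms a random Hermitian positive definite S, computes x = S^{−1}a(a^{H}S^{−1}a)^{−1}, and verifies that it satisfies the constraint and beats other feasible points:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
S = M @ M.conj().T + n * np.eye(n)             # S = S^H, positive definite
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Sinv_a = np.linalg.solve(S, a)
x_opt = Sinv_a / (a.conj() @ Sinv_a)           # x = S^-1 a (a^H S^-1 a)^-1

# feasibility: a^H x = 1
feas = np.isclose(a.conj() @ x_opt, 1.0)

# optimality: any other feasible point x_opt + v with a^H v = 0 scores no better
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v -= (a.conj() @ v) / (a.conj() @ a) * a       # project v onto a^H v = 0
f = lambda x: (x.conj() @ S @ x).real
better = f(x_opt) <= f(x_opt + 0.1 * v)
```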
If f(X) is a real function of a complex matrix (or vector),
X, then ∂f/∂X^{C}=
(∂f/∂X)^{C} and we can define the
complexvalued column vector grad(f(X)) = 2
(∂f/∂X)^{H} =
(∂f/∂X^{R}+j
∂f/∂X^{I})^{T} as
the Complex Gradient Vector [R.9]
with the properties listed below. If we use <> to represent the vector
mapping associated with the ComplextoReal isomporphism, and
X_{[m#n]}: <>
y_{[2mn]} where y is real, then
grad(f(X)) <> grad(f(y))
where the latter is the conventional grad function from vector calculus.
 grad(f(X)) is zero at an extreme value of f
.
 grad(f(X)) points in the direction of steepest slope
of f(x)
 The magnitude of the steepest slope is equal to
||grad(f(X))||. Specifically, if g(X) =
grad(f(X)), then lim_{a→0}
a^{−1}( f(X+ag(X)) −
f(X) ) = ||g(X)||^{2}
 grad(f(X)) is normal to the surface f(X)
= constant which means that it can be used for gradient ascent/descent
algorithms.
 If f(X)=y^{H}y, then
grad(f(X))=2(∂y/∂X)^{H}y+2(∂y/∂X^{C})^{T}y^{C}
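The steepest-slope property of the complex gradient can be demonstrated numerically. The sketch below (our own example, NumPy assumed) takes f(x) = x^{H}Cx with C = C^{H}, so grad(f) = 2(∂f/∂x)^{H} = 2Cx, and checks that the directional slope along the gradient approaches ||grad(f)||^{2}:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = (M + M.conj().T) / 2                       # C = C^H so f is real-valued
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

f = lambda v: (v.conj() @ C @ v).real
g = 2 * C @ x                                  # grad f = 2 (df/dx)^H = 2Cx

# slope along g should approach ||g||^2 as a -> 0
a = 1e-7
slope = (f(x + a * g) - f(x)) / a
ok = np.isclose(slope, np.linalg.norm(g) ** 2, rtol=1e-4)
```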
Basic Properties
 We may write the following differentials unambiguously without parentheses:
 Transpose:
dY^{T}=d(Y^{T})=(dY)^{T}
 Hermitian Transpose:
dY^{H}=d(Y^{H})=(dY)^{H}
 Conjugate:
dY^{C}=d(Y^{C})=(dY)^{C}
 Linearity:
d(Y+Z)=dY+dZ
 Chain Rule: If Z
is a function of Y which is itself a function of X, then for both
the normal and the generalized complex derivative:
dZ: = dZ/dY dY: =
dZ/dY dY/dX
dX:
 Product Rule: d(YZ) =Y dZ +
dY Z
 d(YZ): = (I ⊗ Y)
dZ: + (Z^{T} ⊗ I)
dY: = ((I ⊗ Y)
dZ/dX + (Z^{T} ⊗
I) dY/dX ) dX:
 Hadamard Product:
d(Y • Z) =Y • dZ +
dY • Z
 Kronecker Product: d(Y
⊗ Z) =Y ⊗ dZ + dY
⊗ Z
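The vectorized form of the product rule can be checked directly. The sketch below (our own check, NumPy assumed) compares the first-order prediction (I ⊗ Y)dZ: + (Z^{T} ⊗ I)dY: against the actual change in (YZ): for small perturbations:

```python
import numpy as np

rng = np.random.default_rng(5)
m, k, n = 3, 4, 2
Y = rng.standard_normal((m, k))
Z = rng.standard_normal((k, n))
dY = 1e-7 * rng.standard_normal((m, k))
dZ = 1e-7 * rng.standard_normal((k, n))

vec = lambda M: M.flatten(order='F')

lhs = vec((Y + dY) @ (Z + dZ) - Y @ Z)         # exact change in (YZ):
rhs = np.kron(np.eye(n), Y) @ vec(dZ) + np.kron(Z.T, np.eye(m)) @ vec(dY)
ok = np.allclose(lhs, rhs, atol=1e-12)         # they differ only by vec(dY dZ)
```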
Differentials of Linear
Functions
 d(Ax) =
d(x^{T}A^{T}):
=A dx
 d(x^{T}a) =
d(a^{T}x) = a^{T}
dx
 d(bx^{T}a) =
ba^{T} dx
 d(AXB): =
(A dX B): =
(B^{T} ⊗ A) dX:
 d(a^{T}Xb) = (b ⊗
a)^{T} dX: =
(ab^{T}):^{T} dX:
 d(a^{T}Xa) =
d(a^{T}X^{T}a) =
(a ⊗ a)^{T} dX: =
(aa^{T}):^{T}
dX:
 [X_{[m#n]}]
d(AX): = (I_{n}
⊗ A) dX:
 [X_{[m#n]}]
d(XB): = (dX B): =
(B^{T} ⊗ I_{m})
dX:
 [x_{[n]}]
d(xb^{T}): =
(dx b^{T}): = (b ⊗
I_{n}) dx
 d(AX^{T}B): =
(B^{T} ⊗ A)
dX^{T}:
 d(a^{T}X^{T}b) =
(a ⊗ b)^{T} dX:
= (ab^{T}):^{T}
dX^{T}:=
(ba^{T}):^{T}
dX:
 d(||x||) =
||x||^{−1}x^{T}
dx
 [x: Complex] d
(x^{H}A): = A^{T}
dx^{C}
 d(X_{[m#n]} ⊗
A_{[p#q]}): = (I_{n}
⊗ T_{q,m} ⊗
I_{p})(I_{mn} ⊗ A:)
dX: = (I_{nq} ⊗
T_{m,p} )(I_{n} ⊗ A:
⊗ I_{m}) dX:
 d(A_{[p#q]} ⊗
X_{[m#n]}): = (I_{q}
⊗ T_{n,p} ⊗
I_{m})(A: ⊗ I_{mn})
dX: = (T_{m,n} ⊗
I_{pq} )(I_{n} ⊗ A:
⊗ I_{m}) dX:
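Since X ⊗ A is linear in X, the first Kronecker differential above can be checked exactly rather than by finite differences. The sketch below (our own check, NumPy assumed; `vec` and `tvec` are our helper names) verifies (X ⊗ A): = (I_{n} ⊗ T_{q,m} ⊗ I_{p})(I_{mn} ⊗ A:)X::

```python
import numpy as np

def vec(M):
    return M.flatten(order='F')

def tvec(m, n):
    # TVEC(m,n): vec(Z^T) = TVEC(m,n) vec(Z) for Z of size m#n
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            T[j + i * n, i + j * m] = 1.0
    return T

rng = np.random.default_rng(6)
m, n, p, q = 2, 3, 2, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((p, q))

# (I_n (x) T_{q,m} (x) I_p)(I_{mn} (x) A:) applied to X: gives (X (x) A):
L = (np.kron(np.kron(np.eye(n), tvec(q, m)), np.eye(p))
     @ np.kron(np.eye(m * n), vec(A).reshape(-1, 1)))
ok = np.allclose(L @ vec(X), vec(np.kron(X, A)))
```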
Differentials of Quadratic
Products
 d((Ax+b)^{T}C(Dx+e))
= ((Ax+b)^{T}CD +
(Dx+e)^{T}C^{T}A)
dx
 d(x^{T}Cx) =
x^{T}(C+C^{T})dx
= [C=C^{T}]
2x^{T}Cdx
 d((Ax+b)^{T}(Dx+e)) =
( (Ax+b)^{T}D +
(Dx+e)^{T}A)dx
 d((Ax+b)^{T}(Ax+b)) =
2(Ax+b)^{T}Adx
 d((Ax+b)^{T}C(Ax+b))
= [C=C^{T}]
2(Ax+b)^{T}CA dx
 d((Ax+b)^{H}C(Dx+e))
= (Ax+b)^{H}CD dx +
(Dx+e)^{T}C^{T}A^{C}
dx^{C}
 d (x^{H}Cx)
=x^{H}C dx
+x^{T}C^{T}
dx^{C} = [C=C^{H}]
2(x^{H}C dx)^{R}
 d (x^{H}x) =
2(x^{H} dx)^{R}
 d(a^{T}X^{T}Xb) =
(X(ab^{T} +
ba^{T})):^{T} dX:
 d(a^{T}X^{T}Xa) =
2(Xaa^{T} ):^{T}
dX:
 d(a^{T}X^{T}CXb)
= (C^{T}Xab^{T} +
CXba^{T}):^{T} dX:
 d(a^{T}X^{T}CXa)
= ((C + C^{T})Xaa^{T}
):^{T} dX: = [C=C^{T}]
2(CXaa^{T}):^{T}
dX:
 d((Xa+b)^{T}C(Xa+b)) =
((C+C^{T})(Xa+b)a^{T}
):^{T} dX:
 [X_{[n#n]}]
d(X^{2}): = (XdX + dX X):
= (I_{n} ⊗ X + X^{T}
⊗ I_{n}) dX:
 [X_{[m#n]}]
d(X^{T}CX): =
(I_{n} ⊗ X^{T}C)
dX: + (X^{T}C^{T}
⊗ I_{n})
dX^{T}: = (I_{n}
⊗
X^{T}C+T_{n,n}(I_{n}⊗
X^{T}C^{T})) dX:
 [X_{[m#n]},
C_{[m#m]}=C^{T}]
d(X^{T}CX): =
(I_{n×n}+T_{n,n})(I_{n}⊗
X^{T}C) dX:
 [X_{[m#n]}]
d(X^{T}X): =
(I_{n} ⊗ X^{T})
dX: + (X^{T} ⊗
I_{n}) dX^{T}: =
(I_{n×n} +
T_{n,n})(I_{n} ⊗
X^{T}) dX:
 [X_{[m#n]}]
d(X^{H}CX): =
(X^{H}CdX): +
(d(X^{H}) CX): =
(I_{n} ⊗
X^{H}C) dX: +
(X^{T}C^{T} ⊗
I_{n}) dX^{H}:
 grad((Ax+b)^{H}(Ax+b))
= 2A^{H}(Ax+b)
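The first quadratic identity above can be verified by finite differences. The sketch below (our own check with arbitrary dimensions, NumPy assumed) compares ((Ax+b)^{T}CD + (Dx+e)^{T}C^{T}A)dx against the actual change in the quadratic form:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 4, 3
A = rng.standard_normal((m, n)); b = rng.standard_normal(m)
D = rng.standard_normal((m, n)); e = rng.standard_normal(m)
C = rng.standard_normal((m, m))
x = rng.standard_normal(n)

f = lambda v: (A @ v + b) @ C @ (D @ v + e)
deriv = (A @ x + b) @ C @ D + (D @ x + e) @ C.T @ A   # row vector df/dx

dx = 1e-7 * rng.standard_normal(n)
ok = np.isclose(deriv @ dx, f(x + dx) - f(x), rtol=1e-4)
```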
Differentials of Cubic
Products
 d(xx^{T}Ax) =
(xx^{T}(A+A^{T})+x^{T}Ax×I
)dx
 d(xx^{T}x) =
(2xx^{T}+x^{T}x×I
)dx
 [X_{[m#n]}]
d(XAX^{T}BX): =
(X^{T}B^{T}XA^{T}
⊗ I_{m} + I_{n} ⊗
XAX^{T}B) dX: +
(X^{T}B ⊗ XA)
dX^{T}: =
(X^{T}B^{T}XA^{T}
⊗ I_{m} + T_{n,m}(XA
⊗ X^{T}B) +
I_{n} ⊗ XAX^{T}B)
dX:
 [X_{[m#n]}]
d(XX^{T}X): =
(X^{T}X ⊗ I_{m} +
I_{n} ⊗ XX^{T})
dX: + (X^{T} ⊗ X)
dX^{T}: = (X^{T}X
⊗ I_{m} + T_{n,m}(X
⊗ X^{T}) + I_{n}⊗
XX^{T}) dX:
 [X_{[m#n]}]
d(XAXBX): =
(X^{T}B^{T}X^{T}A^{T}
⊗ I_{m} +
X^{T}B^{T} ⊗ XA +
I_{n} ⊗ XAXB) dX:
 [X_{[n#n]}]
d(X^{3}): =
((X^{T})^{2} ⊗
I_{n} + X^{T} ⊗ X
+ I_{n} ⊗ X^{2})
dX:
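The d(X^{3}) identity is representative of the cubic formulas, and is easy to test. The sketch below (our own check, NumPy assumed) applies ((X^{T})^{2} ⊗ I + X^{T} ⊗ X + I ⊗ X^{2}) to dX: and compares against the change in X^{3}:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
X = rng.standard_normal((n, n))
dX = 1e-7 * rng.standard_normal((n, n))
vec = lambda M: M.flatten(order='F')

I = np.eye(n)
J = (np.kron(np.linalg.matrix_power(X.T, 2), I)
     + np.kron(X.T, X)
     + np.kron(I, X @ X))                       # dY/dX for Y = X^3

lhs = vec(np.linalg.matrix_power(X + dX, 3) - np.linalg.matrix_power(X, 3))
ok = np.allclose(J @ vec(dX), lhs, atol=1e-10)  # agreement to first order
```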
Differentials of Inverses
 d(X^{−1}) = −X^{−1}dX
X^{−1} [2.1]
 d(X^{−1}): =
−(X^{−T} ⊗ X^{−1})
dX:
 d(a^{T}X^{−1}b) =
−(X^{−T}ab^{T}X^{−T}):^{T}
dX: = −(ab^{T}):^{T}
(X^{−T} ⊗ X^{−1})
dX: [2.8]
 d(tr(A^{T}X^{−1}B)) =
d(tr(B^{T}X^{−T}A)) =
−(X^{−T}AB^{T}X^{−T}):^{T}
dX: = −(AB^{T}):^{T}
(X^{−T} ⊗ X^{−1})
dX:
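The basic inverse differential can be checked directly. The sketch below (our own check, NumPy assumed) compares −X^{−1}dX X^{−1} against the actual change in X^{−1} for a small perturbation:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
X = rng.standard_normal((n, n)) + n * np.eye(n)    # comfortably nonsingular
dX = 1e-7 * rng.standard_normal((n, n))

Xinv = np.linalg.inv(X)
d_pred = -Xinv @ dX @ Xinv                         # d(X^-1) = -X^-1 dX X^-1
d_true = np.linalg.inv(X + dX) - Xinv
ok = np.allclose(d_pred, d_true, atol=1e-12)       # agreement to first order
```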
Differentials of Trace
Note: matrix dimensions must result in an n#n argument for tr().
 d(tr(Y))=tr(dY)
 d(tr(X)) = d(tr(X^{T})) =
I:^{T} dX: [2.4]
 d(tr(X^{k})) =
k((X^{k−1})^{T}):^{T}
dX:
 d(tr(AX^{k})) =
(SUM_{r=0:k−1}(X^{r}AX^{k−r−1})^{T}
):^{T} dX:
 d(tr(AX^{−1}B)) =
−(X^{−1}BAX^{−1})^{T}:^{T}
dX: =
−(X^{−T}A^{T}B^{T}X^{−T}):^{T}
dX: [2.5]
 d(tr(AX^{−1}))
= d(tr(X^{−1}A)) =
−(X^{−T}A^{T}X^{−T}
):^{T} dX:
 d(tr(A^{T}XB^{T})) =
d(tr(BX^{T}A)) =
(AB):^{T}
dX: [2.4]
 d(tr(XA^{T})) =
d(tr(A^{T}X))
=d(tr(X^{T}A)) =
d(tr(AX^{T})) =
A:^{T} dX:

d(tr(A^{T}X^{−1}B^{T}))
= d(tr(BX^{−T}A)) =
−(X^{−T}ABX^{−T}):^{T}
dX: = −(AB):^{T}
(X^{−T} ⊗ X^{−1})
dX:
 d(tr(AXBX^{T}C)) =
(A^{T}C^{T}XB^{T}
+ CAXB):^{T} dX:
 d(tr(XAX^{T})) =
d(tr(AX^{T}X)) =
d(tr(X^{T}XA)) =(
X(A+A^{T})):^{T}
dX:
 d(tr(X^{T}AX)) =
d(tr(AXX^{T})) =
d(tr(XX^{T}A)) =
((A+A^{T})X):^{T}
dX:
 d(tr(XX^{T})) =
d(tr(X^{T}X)) = 2X:^{T}
dX:
 d(tr(AXBX)) =
(A^{T}X^{T}B^{T}
+
B^{T}X^{T}A^{T}
):^{T} dX:
 d(tr((AXb+c)(AXb+c)^{T}))
=
2(A^{T}(AXb+c)b^{T}):^{T}
dX:
 [C=C^{T}]
d(tr((X^{T}CX)^{−1}A)) =
d(tr(A(X^{T}CX)^{−1})) =
−((CX(X^{T}CX)^{−1})(A+A^{T})(X^{T}CX)^{−1}):^{T}
dX:
 [B=B^{T},
C=C^{T}]
d(tr((X^{T}CX)^{−1}(X^{T}BX)))
= d(tr(
(X^{T}BX)(X^{T}CX)^{−1}))
=
2(BX(X^{T}CX)^{−1} − CX(X^{T}CX)^{−1}X^{T}BX(X^{T}CX)^{−1}
):^{T} dX:
 [D=D^{H}]
d(tr((AXB+C)D(AXB+C)^{H}))
=
((2A^{H}(AXB+C)DB^{H}):^{H}
dX:)^{R} [2.6]

d(tr((AXB+C)(AXB+C)^{H}))
=
((2A^{H}(AXB+C)B^{H}):^{H}
dX:)^{R}
 [D=D^{H}]
d(tr(XDX^{H})) =
((2XD):^{H}
dX:)^{R}
 d(tr(XX^{H})) =
(2X:^{H} dX:)^{R}
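The trace-of-inverse identity is a typical member of this list and can be checked numerically. The sketch below (our own check, NumPy assumed) uses the derivative matrix G = −X^{−T}A^{T}X^{−T}, so that d(tr(AX^{−1})) = (G:)^{T}dX: = sum(G • dX):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular

Xinv = np.linalg.inv(X)
G = -(Xinv @ A @ Xinv).T                # d(tr(A X^-1)) = (G:)^T dX:

dX = 1e-7 * rng.standard_normal((n, n))
d_pred = np.sum(G * dX)
d_true = np.trace(A @ np.linalg.inv(X + dX)) - np.trace(A @ Xinv)
ok = np.isclose(d_pred, d_true, rtol=1e-4, atol=1e-12)
```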
Trace Minimization
 [D=D^{H}]
argmin_{X}{tr((AXB+C)D(AXB+C)^{H})}
=
−(A^{H}A)^{−1}A^{H}CDB^{H}(BDB^{H})^{−1} [2.7]
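This minimizer can be sanity-checked numerically. The sketch below (our own test with arbitrary dimensions, NumPy assumed; D is made Hermitian positive definite so the objective is convex) forms X = −(A^{H}A)^{−1}A^{H}CDB^{H}(BDB^{H})^{−1} and confirms nearby points do not score lower:

```python
import numpy as np

rng = np.random.default_rng(11)
m, p, q, r = 5, 3, 2, 4                 # A: m#p, X: p#q, B: q#r, C: m#r, D: r#r
A = rng.standard_normal((m, p)) + 1j * rng.standard_normal((m, p))
B = rng.standard_normal((q, r)) + 1j * rng.standard_normal((q, r))
C = rng.standard_normal((m, r)) + 1j * rng.standard_normal((m, r))
M = rng.standard_normal((r, r)) + 1j * rng.standard_normal((r, r))
D = M @ M.conj().T + np.eye(r)          # D = D^H > 0

obj = lambda X: np.trace((A @ X @ B + C) @ D @ (A @ X @ B + C).conj().T).real

AH, BH = A.conj().T, B.conj().T
X_opt = -np.linalg.solve(AH @ A, AH @ C @ D @ BH) @ np.linalg.inv(B @ D @ BH)

dX = 1e-3 * (rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q)))
ok = obj(X_opt) <= obj(X_opt + dX) and obj(X_opt) <= obj(X_opt - dX)
```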
Differentials of Determinant
Note: matrix dimensions must result in an n#n argument for
det(). Some of the expressions below involve inverses: these forms apply only
if the quantity being inverted is square and nonsingular; alternative forms
involving the adjoint, ADJ(), do not have
the nonsingular requirement.
 d(det(X)) = d(det(X^{T}))
= ADJ(X^{T}):^{T}
dX: = det(X)
(X^{−T}):^{T}
dX: [2.9]
 d(det(A^{T}XB)) =
d(det(B^{T}X^{T}A)) =
(A ADJ(A^{T}XB)^{T}B^{T}):^{T}
dX: = [A,B:
nonsingular] det(A^{T}XB) ×
(X^{−T}):^{T} dX:
[2.10]
 d(ln(det(A^{T}XB))) = [A,B: nonsingular]
(X^{−T}):^{T} dX:
[2.11]
 d(ln(det(X))) =
(X^{−T}):^{T}
dX:
 d(det(X^{k})) =
d(det(X)^{k}) = k ×
det(X^{k}) ×
(X^{−T}):^{T} dX:
[2.12]
 d(ln(det(X^{k}))) = k
× (X^{−T}):^{T}
dX:
 d(det(X^{T}CX)) = [C=C^{T}] 2(CX ADJ(X^{T}CX)):^{T}
dX: =
2det(X^{T}CX)×(CX(X^{T}CX)^{−1}):^{T}
dX: [2.13]
 = [C=C^{T},
CX: nonsingular]
2det(X^{T}CX)×(X^{−T}):^{T}
dX:
 d(ln(det(X^{T}CX))) = [C=C^{T}]
2(CX(X^{T}CX)^{−1}):^{T}
dX:
 = [C=C^{T},
CX:
nonsingular] 2(X^{−T}):^{T}
dX:
 d(det(X^{H}CX)) =
det(X^{H}CX) ×
((C^{T}X^{C}
(X^{T}C^{T}X^{C})^{−1}):^{T}dX:
+
(CX(X^{H}CX)^{−1}):^{T}
dX^{C}:) [2.14]
 d(ln(det(X^{H}CX))) =
(C^{T}X^{C}
(X^{T}C^{T}X^{C})^{−1}):^{T}dX:
+
(CX(X^{H}CX)^{−1}):^{T}
dX^{C}: [2.15]
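The workhorse identity d(ln(det(X))) = (X^{−T}:)^{T}dX: = tr(X^{−1}dX) can be checked numerically. The sketch below (our own check, NumPy assumed; X is built positive definite so ln(det(X)) is well defined) compares it against a finite perturbation:

```python
import numpy as np

rng = np.random.default_rng(12)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + np.eye(n)                 # positive definite, so det(X) > 0
dX = 1e-7 * rng.standard_normal((n, n))

G = np.linalg.inv(X).T                  # d(ln det X) = (X^-T :)^T dX:
d_pred = np.sum(G * dX)
d_true = np.log(np.linalg.det(X + dX)) - np.log(np.linalg.det(X))
ok = np.isclose(d_pred, d_true, rtol=1e-4, atol=1e-12)
```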
Jacobian
dY/dX is called the Jacobian Matrix
of Y: with respect to X: and
J_{X}(Y)=det(dY/dX) is
the corresponding Jacobian. The Jacobian occurs when changing variables
in an integration:
Integral(f(Y)dY:)=Integral(f(Y(X))
det(dY/dX) dX:).

J_{X}(X_{[n#n]}^{−1})=
(−1)^{n}det(X)^{−2n}
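This Jacobian follows from dY/dX = −(X^{−T} ⊗ X^{−1}) for Y = X^{−1}, and can be confirmed numerically (our own check, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(13)
n = 3
X = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular

Xinv = np.linalg.inv(X)
J = np.linalg.det(-np.kron(Xinv.T, Xinv))         # det(dY/dX) for Y = X^-1
ok = np.isclose(J, (-1) ** n * np.linalg.det(X) ** (-2 * n), rtol=1e-8)
```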
Hessian matrix
If f is a real function of x then the Hermitian matrix H_{x}
f = (d/dx
(df/dx)^{H})^{T} is
the Hessian matrix of f(x). A value of x for which
grad f(x) = 0 corresponds to a minimum, maximum or
saddle point according to whether H_{x} f is
positive definite, negative definite or indefinite.
 [Real] H_{x} f
= d/dx (df/dx)^{T}
 H_{x} f is symmetric
 H_{x} (a^{T}x) = 0
 H_{x}
(Ax+b)^{T}C(Dx+e) =
A^{T}CD +
D^{T}C^{T}A
 H_{x} (Ax+b)^{T}
(Dx+e) = A^{T}D +
D^{T}A
 H_{x}
(Ax+b)^{T}C(Ax+b) =
A^{T}(C + C^{T})A =
[C=C^{T}]
2A^{T}CA
 H_{x} (Ax+b)^{T}
(Ax+b) = 2A^{T}A
 H_{x} (x^{T}Cx) =
C+C^{T} = [C=C^{T}] 2C
 H_{x} (x^{T}x) =
2I
 [x: Complex] H_{x}
f = (d/dx
(df/dx)^{H})^{T} =
d/dx^{C}
(df/dx)^{T}
 H_{x} f is hermitian
 H_{x}
(Ax+b)^{H}C(Ax+b) =
[C=C^{H}]
(A^{H}CA)^{T} [2.16]
 H_{x} (x^{H}Cx) =
[C=C^{H}]
C^{T}
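The Hessian of a quadratic form can be checked against a numerical Hessian built by central differences. The sketch below (our own check, NumPy assumed) verifies H_{x}(x^{T}Cx) = C + C^{T} in the real case:

```python
import numpy as np

rng = np.random.default_rng(14)
n = 4
C = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ C @ v
H_pred = C + C.T                        # H_x(x^T C x) = C + C^T

# Numerical Hessian by central second differences
eps = 1e-5
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n); ei[i] = eps
        ej = np.zeros(n); ej[j] = eps
        H_num[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)

ok = np.allclose(H_num, H_pred, atol=1e-4)
```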
This page is part of The Matrix Reference
Manual. Copyright © 1998-2005 Mike Brookes, Imperial
College, London, UK. See the file gfl.html for copying
instructions. Please send any comments or suggestions to "mike.brookes" at
"imperial.ac.uk".
Updated: $Id: calculus.html 10071 2017-08-22 14:26:16Z dmb $