Stochastic Matrices

Notation

In all the expressions below, x is a vector of real or complex random variables whose mean vector and covariance matrix are given by: <x> = m and Cov(x)=<(x-m)(x-m)^H> = S. We define the real-valued vector of variances s=diag(S).

•, ÷, ^•2, ^•^½, abs() and exp() are elementwise operators for multiplication, division, square, square root, absolute value and exponentiation. ⊗ denotes the Kronecker product and j denotes (-1)^½. <Y> denotes the expected value of Y. ||y|| = √(y^Hy) is the Euclidean vector norm and |Y| is the matrix determinant.

Vectors and matrices a, A, b, B, c, C, d and D are constant (i.e. not dependent on x).

General Properties

The covariance matrix S is Hermitian and positive semi-definite.
S is strictly positive definite unless there is a deterministic relation between the elements of x of the form a^H(x - m) = 0 for some non-zero a.
If the elements of x are uniformly spaced samples from a wide-sense stationary continuous signal, then S is Toeplitz.
The symmetric correlation coefficient matrix (also called correlation matrix) is Corr(x) = S÷(ss^T)^½ = DIAG(s^•-^½) S DIAG(s^•-^½).
WARNING: Correlation matrix is also used by some to denote the matrix <xx^T> = S + mm^T.
- The Correlation Coefficient between x_i and x_j equals Corr(x)_i,j = S_i,j/√(s_i s_j), which for zero-mean real x is <x_i x_j>/√(<x_i²><x_j²>), and has magnitude ≤ 1. [5.11]
The precision matrix is T = S^-1.
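The two equivalent forms of Corr(x) given above can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name and the 3#3 covariance are hypothetical and chosen only for illustration.

```python
import numpy as np

def correlation_from_covariance(S):
    """Return Corr(x) = S ÷ (s s^T)^(1/2) with s = diag(S)."""
    S = np.asarray(S)
    d = np.sqrt(np.real(np.diag(S)))      # standard deviations, s^(1/2)
    return S / np.outer(d, d)             # elementwise division

# Hypothetical covariance matrix (illustration only)
S = np.array([[4.0, 2.0, 0.0],
              [2.0, 9.0, -3.0],
              [0.0, -3.0, 16.0]])
C = correlation_from_covariance(S)

# Second form: DIAG(s^-1/2) S DIAG(s^-1/2)
D = np.diag(1.0 / np.sqrt(np.diag(S)))
C_alt = D @ S @ D
```

Both forms give a unit diagonal and off-diagonal elements of magnitude at most 1.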

Special Distributions

The expressions for cubic and quartic expectations given below are restricted to the following special distributions:

Independent

[x:Real Independent] means that the components of x are real and independent. In particular, we require that <x_i^p x_k^q> = <x_i^p><x_k^q>. We define the r^th moment, m_r=<(x-m)^•r>, where the r^th power of the vector is elementwise. Note that S=DIAG(m₂) and m₁=0.

Real Gaussian

[x:Real Gaussian] means that the components of x_[n] are real and have a multivariate Gaussian pdf: x ~ N(x ; m, S) = |2πS|^-½exp( -½(x-m)^T S^-1 (x-m) ) where S is symmetric and positive definite (so that S^-1 exists).
- ln(p(x)) = -½ ln(det(2π×S)) - ½(x-m)^T S^-1 (x-m)
- max_x(N(x ; m, S)) = N(m ; m, S) = |2πS|^-½ = (2π)^-½n|S|^-½ where |.| denotes the determinant
If x is both Gaussian and Independent then the r^th moment is m_r = 0 for odd r and m_r = diag((½S)^(½r)) × r!/(½r)! for even r.
N(x ; m, S) = N(m ; x, S) = |A| N(Ax+b ; Am+b, ASA^T) for any b and non-singular A.
- N(x ; m, S) = aⁿ N(ax+b ; am+b, a²S) for any b and non-zero a.
- N(x ; m, S) = N(-m ; -x, S) = N(m ; x, S) = N(x+a ; m+a, S) for any a.
N(Ax+b; m, S) = N(x; u, C) × N(b; m, S) / N(0; u, C) where C = (A^TS^-1A)^-1 and u = CA^TS^-1(m-b) for any A (not necessarily square) with full column rank.
- N(ax+b; m, S) = N(x; u, c²) × N(b; m, S) / N(0; u, c²) where c² = (a^TS^-1a)^-1 and u = c²a^TS^-1(m-b)
If x ~ N(x; m, S) then
- y = Ax+b ~ N(y; Am+b, ASA^T) [5.10]
  - y = ax+b ~ N(y; am+b, a²S)
- y = F^-T(x-m) ~ N(y; 0, I) where F^TF= S. It is not necessary for F to be symmetric but it can always be chosen to be [see Hermitian].
  - If SQ = QD with Q an orthonormal set of eigenvectors and D the diagonal matrix of corresponding positive eigenvalues, then we can define F= D^½Q^T giving F^-1= QD^-½ and F^-T= D^-½Q^T.
- x | A^Tx=b ~ N(x; (I-HA^T)m+Hb, (I-HA^T)S) where H=SA(A^TSA)⁺ and ()⁺ denotes the pseudoinverse or, if non-singular, the inverse. The symmetry of the covariance may be shown explicitly by writing (I-HA^T)S = S - SA(A^TSA)⁺A^TS. [5.9]
  - x | a^Tx=b ~ N(x; m+(b - a^Tm)(a^TSa)^-1×Sa, S - (a^TSa)^-1×Saa^TS)
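The whitening transform y = F^-T(x-m) can be sketched as follows, assuming NumPy and building F from the eigendecomposition of S. The covariance, mean and function name are hypothetical.

```python
import numpy as np

S = np.array([[4.0, 1.2], [1.2, 1.0]])   # hypothetical covariance
m = np.array([1.0, -2.0])                # hypothetical mean

# Eigendecomposition S Q = Q D with orthonormal Q
evals, Q = np.linalg.eigh(S)
F = np.diag(np.sqrt(evals)) @ Q.T        # F = D^(1/2) Q^T, so that F^T F = S

def whiten(x):
    """Map x to y = F^-T (x - m) by solving F^T y = x - m."""
    return np.linalg.solve(F.T, x - m)

# Cov(y) = F^-T S F^-1, which should equal the identity
F_inv_T = np.linalg.inv(F.T)
cov_y = F_inv_T @ S @ F_inv_T.T
```

Solving F^T y = x - m avoids forming the inverse explicitly when only transformed samples are needed.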
Orthant Probabilities
- Define P(x>m) to be the probability that all elements of x exceed the corresponding element of m and define s_i,j and s_i to be elements of S and s respectively. The following formulae do not generalize to n>3:
  - [n=2]: P(x>m) = 0.25+(2π)^-1arcsin(s_1,2(s₁s₂)^-½)
  - [n=3]: P(x>m) = 0.125+(4π)^-1(arcsin(s_1,2(s₁s₂)^-½) + arcsin(s_1,3(s₁s₃)^-½) + arcsin(s_2,3(s₂s₃)^-½))
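The two orthant formulae above translate directly into code. This is a small sketch using only the Python standard library; the function names are hypothetical.

```python
import math

def orthant_prob_2d(s11, s22, s12):
    """P(x > m) for a bivariate Gaussian with the given covariance entries."""
    rho = s12 / math.sqrt(s11 * s22)
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

def orthant_prob_3d(S):
    """P(x > m) for a trivariate Gaussian; S is a 3x3 nested list."""
    def rho(i, j):
        return S[i][j] / math.sqrt(S[i][i] * S[j][j])
    total = math.asin(rho(0, 1)) + math.asin(rho(0, 2)) + math.asin(rho(1, 2))
    return 0.125 + total / (4.0 * math.pi)
```

As sanity checks, uncorrelated components give 2^-n, and perfectly correlated components give 0.5.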
Joint Distribution
If [x; y] ~ N([x; y]; [p; q], [P R^T; R Q]) then in the sections below, we define the regression coefficient matrix of x on y as F=R^TQ^-1 .
- Linear Sum
  - z = Ax+By+c ~ N(z; Ap+Bq+c, APA^T + BRA^T + AR^TB^T + BQB^T)
- Conditional Distributions
  - x | y ~ N(x; p+F(y-q), P - FR). [5.8]
    - The mean, p+F(y-q), is the regression function of x on y.
    - The covariance, C = P - FR, is the Schur complement of Q in [P R^T; R Q].
  - y | x ~ N(y; q+RP^-1(x-p), Q - RP^-1R^T). The covariance is the Schur complement of P in [P R^T; R Q].
- Independence: The following are equivalent:
  1. x and y are independent
  2. R = 0
  3. W = 0 in the precision matrix: T = S^-1 = [P R^T; R Q]^-1 = [U W^T; W V]
  4. N([x; y]; [p; q], [P R^T; R Q]) = N(x; p, P) × N(y; q, Q)
- Multiple Correlation Coefficients
  The vector of multiple correlation coefficients between x and y has the same dimension as x and is given by g_x|y = √(diag(FR ÷ P) ) where the √() function and ÷ are elementwise.
  - The minimum (over A) of tr(Cov(x - A^Ty)) is obtained when A = F^T and is equal to tr(P - FR).
  - The maximum (over a) of the correlation between x_i and a^Ty is obtained when a = (F^T)_i = Q^-1r_i and is equal to (g_x|y)_i. [5.13]
    - All elements of g_x|y lie in the range 0 to 1.
  - diag(Cov(x|y) ÷ Cov(x)) = diag((P - FR) ÷ P) = (1- g_x|y • g_x|y) where • and ÷ denote elementwise multiplication and division.
    - Var(x_i|y) = (1-(g_x|y)_i²) Var(x_i) showing that conditioning reduces variance.
  - (g_x|y)_i² = 1 - p_ii^-1 det([p_ii , r_i^T; r_i , Q]) det(Q)^-1 [5.14]
- Precision Matrix:
  The precision matrix T = S^-1 = [P R^T; R Q]^-1 = [U W^T; W V].
  - With T partitioned as above and F = R^TQ^-1 the regression coefficient matrix defined earlier, we have [3.5]
    - U = Cov(x | y)^-1 = (P - FR)^-1
    - W = -F^TU
    - V = Q^-1+F^TUF
  - The elements of T are given by
    - t_ii = Var(x_i | x_\i)^-1 where x_\i denotes the vector x with x_i deleted.
    - t_ij = -Corr(x_i, x_j | x_\i,j)×(t_ii t_jj)^½
      - t_ij = 0 iff x_i and x_j are conditionally independent given x_\i,j.
  - If I, J and K are a partitioning of the indices 1:n, then
    - The submatrix T_I,J = 0 iff x_I and x_J are conditionally independent: p(x_I, x_J | x_K) = p(x_I | x_K) × p(x_J | x_K)
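The conditional distribution, multiple correlation and precision-matrix block formulae in this subsection can be cross-checked numerically. The sketch below assumes NumPy; the partitioned covariance [P R^T; R Q] and the means are hypothetical values.

```python
import numpy as np

# Hypothetical joint covariance with x of size 2 and y of size 1
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Q = np.array([[1.5]])
R = np.array([[0.4, -0.2]])              # R = Cov(y, x), shape (1, 2)
p = np.array([0.0, 1.0])
q = np.array([2.0])

F = R.T @ np.linalg.inv(Q)               # regression coefficients of x on y
cov_x_given_y = P - F @ R                # Schur complement of Q

def mean_x_given_y(y):
    """Regression function of x on y: p + F (y - q)."""
    return p + F @ (y - q)

# Precision matrix blocks from the formulae for U, W and V
U = np.linalg.inv(cov_x_given_y)
W = -F.T @ U
V = np.linalg.inv(Q) + F.T @ U @ F
T_blocks = np.block([[U, W.T], [W, V]])

# Direct inverse of the joint covariance for comparison
T_direct = np.linalg.inv(np.block([[P, R.T], [R, Q]]))

# Squared multiple correlation coefficients, diag(FR ÷ P)
g_sq = np.diag(F @ R) / np.diag(P)
```

The conditional variances equal (1 - g_sq) times the unconditional ones, so conditioning on y never increases a variance.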
Product of Gaussians:
- N(x ; a, A) N(x ; b, B) = N(a ; b , A+B) × N(x ; c, C) [5.1] where
  - C = (A^-1+B^-1)^-1 = A(A+B)^-1B = B(A+B)^-1A = (I - K)A
  - c = C(A^-1a+B^-1b) = A(A+B)^-1b + B(A+B)^-1a = a + K(b - a)
  - K = CB^-1 = A(A+B)^-1 is called the Kalman gain in a Kalman filter.
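The product identity can be verified pointwise. This is a minimal sketch assuming NumPy, with hypothetical function names and parameter values; it uses the Kalman-gain forms c = a + K(b - a) and C = (I - K)A.

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Real multivariate Gaussian density N(x; m, S)."""
    d = x - m
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * S))
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / norm

def gaussian_product(a, A, b, B):
    """Return (scale, c, C) with N(x;a,A) N(x;b,B) = scale * N(x;c,C)."""
    K = A @ np.linalg.inv(A + B)         # Kalman gain
    c = a + K @ (b - a)
    C = (np.eye(len(a)) - K) @ A
    scale = gauss_pdf(a, b, A + B)       # N(a; b, A+B)
    return scale, c, C

# Hypothetical parameters (illustration only)
a = np.array([0.0, 1.0])
A = np.array([[1.0, 0.2], [0.2, 2.0]])
b = np.array([1.0, -1.0])
B = np.array([[0.5, 0.0], [0.0, 0.5]])
scale, c, C = gaussian_product(a, A, b, B)

x = np.array([0.3, 0.1])                 # arbitrary evaluation point
lhs = gauss_pdf(x, a, A) * gauss_pdf(x, b, B)
rhs = scale * gauss_pdf(x, c, C)
```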
Power of Gaussian:
- N(x ; a, A)^m = |2πm^1/(m-1)A|^½(1-m) × N(x ; a, m^-1A) [5.2]
  - N(x ; a, A)² = |4πA|^-½ × N(x ; a, ½A)
  - N(x ; a, A)^½ = |8πA|^¼ × N(x ; a, 2A)
Quotient of Gaussians:
- N(x ; c, C) / N(x ; a, A) = N(x ; b, B) / N(a ; b , A+B) provided (A-C) is non-singular, where
  - B = (C^-1-A^-1)^-1 = A(A-C)^-1C = C(A-C)^-1A
  - b = B(C^-1c-A^-1a) = A(A-C)^-1c - C(A-C)^-1a
Convolution of Gaussians
- N(x ; a, A) *N(x ; b, B) = ∫N(t ; a, A) N(x-t ; b, B)dt = N(x ; a+b, A+B)
Exponential times Gaussian
- exp(a^Tx) N(x; m, S) = exp(½a^T(2m+Sa)) N(x; m+Sa, S).
Truncated Gaussian
- Suppose y ~ kN(y; m, S) is restricted to the domain satisfying b< a^Ty <c with k a normalizing constant. Then E(y) = m+grSa and Cov(y) = S - g²vSa(Sa)^T where k=1/(F(q)-F(p)), g=1/√(a^TSa), p=g(b-a^Tm), q=g(c-a^Tm), r= (f(p)-f(q))/(F(q)-F(p)), v=(q f(q) - p f(p))/(F(q) - F(p)) + r² and where f(q) and F(q) are the pdf and cdf respectively of a standard 1-dimensional Gaussian with f(q)=dF/dq. [5.22]
  - Suppose y ~ kN(y; m, S) is restricted to the domain satisfying a^Ty <c with k a normalizing constant. Then E(y) = m+grSa and Cov(y) = S - g²vSa(Sa)^T where k=1/F(q), g=1/√(a^TSa), q=g(c-a^Tm), r= -f(q)/F(q), v= r² - rq.
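The truncated-Gaussian moments can be evaluated as in the sketch below, which assumes NumPy plus the standard-library erf. The function names are hypothetical. An infinite limit is handled by taking z f(z) = 0 there, which recovers the one-sided case.

```python
import math
import numpy as np

def std_pdf(z):
    """Standard 1-dimensional Gaussian pdf f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_cdf(z):
    """Standard 1-dimensional Gaussian cdf F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_moments(m, S, a, b, c):
    """Mean and covariance of y ~ N(m, S) restricted to b < a^T y < c."""
    Sa = S @ a
    g = 1.0 / math.sqrt(a @ Sa)
    p, q = g * (b - a @ m), g * (c - a @ m)
    Z = std_cdf(q) - std_cdf(p)          # equals 1/k
    # z f(z) tends to 0 as |z| grows, so treat infinite limits explicitly
    pf = 0.0 if math.isinf(p) else p * std_pdf(p)
    qf = 0.0 if math.isinf(q) else q * std_pdf(q)
    r = (std_pdf(p) - std_pdf(q)) / Z
    v = (qf - pf) / Z + r * r
    mean = m + g * r * Sa
    cov = S - g * g * v * np.outer(Sa, Sa)
    return mean, cov
```

For example, restricting the first component of a standard bivariate Gaussian to be negative gives a half-normal marginal with mean -√(2/π) and variance 1 - 2/π, while the second component is unaffected.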
Characteristic Function and Generating Functions:
- The characteristic function of x is a function of the real vector t and is ø(t) = <exp(jt^Tx)> = <cos(t^Tx)> + j <sin(t^Tx)> = exp(jt^Tm - ½t^TSt) where j=√(-1).
- The moment generating function of x is a function of the real vector t and is M(t) = <exp(t^Tx)> = exp(t^Tm + ½t^TSt). The moments of x are the derivatives of M(t) evaluated at t=0.
- The cumulant generating function of x is a function of the real vector t and is g(t) = ln <exp(t^Tx)> = t^Tm + ½t^TSt. The cumulants of x are the derivatives of g(t) evaluated at t=0.
Exponential of Gaussian (Lognormal distribution):
The random vector exp(Ax+b), where A and b may be complex and exp() operates elementwise, follows a multivariate lognormal distribution with the following means and covariances:
- <exp(Ax+b)> = exp(Am + b + ½ diag(ASA^T))
  - <exp(a^Tx+b)> = exp(a^Tm + b + ½a^TSa)
  - <exp(ax+b)> = exp(am + b + ½a²s) where s=diag(S)
  - <exp(jx)> = exp(jm - ½s) where s=diag(S)
- Cov(exp(Ax+b)) = (<exp(Ax+b)> <exp(Ax+b)>^H) • (exp(ASA^H) - 1) = (exp(Am + b + ½ diag(ASA^T)) exp(Am + b + ½ diag(ASA^T))^H ) • (exp(ASA^H) - 1)
- Cov(exp(a^Tx+b)) = (<exp(a^Tx+b)> <exp(a^Tx+b)>^H) • (exp(a^HSa) - 1) = (exp(a^Tm + b + ½a^TSa) exp(a^Tm + b + ½a^TSa)^H ) • (exp(a^HSa) - 1)
- Cov(exp(ax+b)) = (<exp(ax+b)> <exp(ax+b)>^H ) • (exp(|a|²S) - 1) = (exp(am + b + ½a²s) exp(am + b + ½a²s)^H ) • (exp(|a|²S) - 1)
- Cov(exp(jx)) = (<exp(jx)> <exp(jx)>^H ) • (exp(S) - 1) = (exp(jm - ½s) exp(jm - ½s)^H ) • (exp(S) - 1)
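The lognormal mean and covariance formulae can be coded as below. This is a sketch assuming NumPy and a real Gaussian x; A and b may be complex. The function name and test values are hypothetical.

```python
import numpy as np

def lognormal_moments(A, b, m, S):
    """Mean and covariance of exp(Ax + b) for x ~ N(m, S), exp elementwise."""
    ASA_T = A @ S @ A.T                  # appears in the mean
    ASA_H = A @ S @ A.conj().T           # appears in the covariance
    mean = np.exp(A @ m + b + 0.5 * np.diag(ASA_T))
    cov = np.outer(mean, mean.conj()) * (np.exp(ASA_H) - 1.0)
    return mean, cov

# Hypothetical scalar example: x ~ N(0.5, 0.25)
m = np.array([0.5])
S = np.array([[0.25]])
mean_real, cov_real = lognormal_moments(np.eye(1), 0.0, m, S)        # exp(x)
mean_cplx, cov_cplx = lognormal_moments(np.array([[1j]]), 0.0, m, S)  # exp(jx)
```

In the exp(jx) case the variance reduces to 1 - exp(-s), as expected for a unit-modulus variable whose mean has magnitude exp(-½s).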
Entropy
- The differential entropy of x is h(x) = -E{ln(p(x))} = ½ ln(|2 π e S|) nats = ½ log₂(|2 π e S|) bits.
Mutual Information
- If u and v with covariance matrices P and Q respectively are independent of each other and of x then
  I(Ax+u, Bx+v) = ½ log₂(|ASA^T+P|×|BSB^T+Q|/|[ASA^T+P ASB^T; BSA^T BSB^T+Q]|) = -½ log₂(det(I - (BSB^T+Q)^-1BSA^T(ASA^T+P)^-1ASB^T)) bits
  - I(Ax+u, b^Tx+v) = ½ log₂(det(ASA^T+P)(b^TSb+q)/det([ASA^T+P ASb; b^TSA^T b^TSb+q])) = -½ log₂(1 - b^TSA^T(ASA^T+P)^-1ASb / (b^TSb+q)) bits
  - I(Ax, Bx) = ½ log₂(det(ASA^T)det(BSB^T)/det([A; B] S [A; B]^T)) = -½ log₂(det(I - (BSB^T)^-1BSA^T(ASA^T)^-1ASB^T)) bits.
Cramer-Rao bound
Suppose m_[n#1] and S_[n#n] are functions of a parameter vector q_[p#1] and that we take k independent samples of x to form the columns of a data matrix X_[n#k]. In the expressions below, ⊗ denotes the Kronecker product, : denotes vectorization and dS/dq is a matrix of dimension n²#p (see derivatives).
- ln(p(X)) = -½k ln(det(2π×S)) - ½ tr((X-M)^T S^-1 (X-M)) where M_[n#k] = m×1_[k#1]^T.
- The Fisher Score vector, v, is defined by v^T = d/dq (ln(p(X)))
  = 1_[k#1]^T(X-M)^TS^-1 dm/dq - ½(k S^-1 - S^-1(X-M)(X-M)^TS^-1):^T dS/dq
  = 1_[k#1]^T(X-M)^TS^-1 dm/dq - ½(k S^-1:^T - ((X-M)(X-M)^T):^T (S^-1 ⊗ S^-1)) dS/dq [5.15]
  - <v> = 0
  - [k=1] v^T = d/dq (ln(p(X))) = (x-m)^TS^-1 dm/dq - ½(S^-1 - S^-1(x-m)(x-m)^TS^-1):^T dS/dq
- The Fisher Information Matrix is defined by J = <vv^T> = k ((dm/dq)^T S^-1 dm/dq + ½(dS/dq)^T (S ⊗ S)^-1 dS/dq ) [5.16]
  - The i,j element of J is given by J_ij = k ((dm/dq_i)^T S^-1 dm/dq_j + ½ tr(R_iS^-1R_jS^-1)) where R_i satisfies R_i: = dS/dq_i
- Cramer-Rao bound: If f_[r#1](X) is a function of X with mean value g(q), then Cov(f) ≥ dg/dq J^-1 (dg/dq)^T where ≥ represents the Loewner partial order. [5.17]
  - If g(q) = aq then Cov(f) ≥ a² J^-1
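The element-wise form of the Fisher Information Matrix can be evaluated as in the sketch below, assuming NumPy. The function name and argument layout (lists of derivatives dm/dq_i and R_i = dS/dq_i) are hypothetical. The example is a scalar Gaussian with q = [mean; variance].

```python
import numpy as np

def fisher_information(dm, dS, S, k=1):
    """J_ij = k (dm_i^T S^-1 dm_j + 1/2 tr(R_i S^-1 R_j S^-1)).

    dm: list of p vectors dm/dq_i; dS: list of p matrices R_i = dS/dq_i.
    """
    S_inv = np.linalg.inv(S)
    p = len(dm)
    J = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            mean_term = dm[i] @ S_inv @ dm[j]
            cov_term = 0.5 * np.trace(dS[i] @ S_inv @ dS[j] @ S_inv)
            J[i, j] = k * (mean_term + cov_term)
    return J

# Scalar Gaussian with q = [mu, sigma^2], sigma^2 = 2 and k = 10 samples
sigma2 = 2.0
J = fisher_information(dm=[np.array([1.0]), np.array([0.0])],
                       dS=[np.zeros((1, 1)), np.ones((1, 1))],
                       S=np.array([[sigma2]]), k=10)
crb = np.linalg.inv(J)                   # bound for unbiased estimators of q
```

Here the bound is diag(σ²/k, 2σ⁴/k), the familiar result for estimating the mean and variance of a scalar Gaussian.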

Complex Gaussian

Definition: In this section, <=> represents the Complex-to-Real isomorphism in which we replace each complex element, z, of a complex matrix C by a 2#2 real matrix [z^R -z^I; z^I z^R]=|z|×[cos(t) -sin(t); sin(t) cos(t)] where t=arg(z). <-> represents the corresponding vector mapping in which we replace each complex element, z, of a complex vector c by a 2#1 real vector [z^R; z^I ].

[x_[n]:Complex Gaussian] means that if x_[n] <-> y_[2n] , then y ~ N(y ; a, ½K) for some complex m_[n] <-> a_[2n] and positive definite Hermitian S_[n#n] <=> K_[2n#2n]. In other words, the real and imaginary components of x are jointly Gaussian with a symmetric covariance matrix that lies in the range of the complex-to-real isomorphism.

<x> = m_[n] <-> a_[2n]
Cov(x) = <(x-m)(x-m)^H> = <xx^H> - mm^H = S_[n#n] <=> K_[2n#2n] [5.3]
N(y ; a, ½K) = |π×K|^-½exp( -(y-a)^T K^-1 (y-a) ) = |π×S|^-1exp( -(x-m)^H S^-1 (x-m) )
- If S is diagonal (and hence also real) then N(x ; m, S) = N(y ; a, ½K) = N(|x-m| ; 0, ½S) × |π×S|^-½. Thus we can express a complex pdf as a truncated real pdf of the same dimension if the components of x are independent.
K_[2n#2n] may be divided into 2#2 Toeplitz blocks of the form [a -b; b a] (see Givens Rotation) where the corresponding element of S is a+jb
- All the 2#2 blocks of K that lie on the main diagonal are positive multiples of I. That is, for each component of x, the real and imaginary parts have the same variance and are uncorrelated.
- tr(K) = 2 tr(S)
Distribution of Real and Imaginary Parts:
- <x^R> = m^R
- <x^I> = m^I
- Cov(x^R) = Cov(x^I) = <(x^R-m^R)(x^R-m^R)^T> = <(x^I-m^I)(x^I-m^I)^T> = ½S^R
- <(x^I-m^I)(x^R-m^R)^T> = -<(x^R-m^R)(x^I-m^I)^T> = ½S^I
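The complex-to-real mappings and the equality of the complex and real forms of the pdf can be illustrated as follows. This is a sketch assuming NumPy; function names and numerical values are hypothetical.

```python
import numpy as np

def complex_to_real_matrix(S):
    """Replace each element z of S by the 2x2 block [Re z, -Im z; Im z, Re z]."""
    n, k = S.shape
    K = np.zeros((2 * n, 2 * k))
    K[0::2, 0::2] = S.real
    K[0::2, 1::2] = -S.imag
    K[1::2, 0::2] = S.imag
    K[1::2, 1::2] = S.real
    return K

def complex_to_real_vector(x):
    """Replace each element z of x by [Re z; Im z]."""
    y = np.empty(2 * len(x))
    y[0::2], y[1::2] = x.real, x.imag
    return y

# Hypothetical Hermitian positive definite S, mean m and evaluation point x
S = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])
m = np.array([1.0 + 1.0j, -0.5j])
x = np.array([0.3 - 0.2j, 1.1 + 0.4j])
K = complex_to_real_matrix(S)
a, y = complex_to_real_vector(m), complex_to_real_vector(x)

def pdf_complex(x):
    """|pi S|^-1 exp(-(x-m)^H S^-1 (x-m))."""
    d = x - m
    quad = np.real(d.conj() @ np.linalg.solve(S, d))
    return np.exp(-quad) / np.real(np.linalg.det(np.pi * S))

def pdf_real(y):
    """N(y; a, K/2) = |pi K|^-1/2 exp(-(y-a)^T K^-1 (y-a))."""
    d = y - a
    return np.exp(-d @ np.linalg.solve(K, d)) / np.sqrt(np.linalg.det(np.pi * K))
```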

In the two following sections, we define d=s^•^½=diag(S)^½ to be a positive real-valued vector of standard deviations and write |m|=abs(m) and |S|=abs(S) for elementwise absolute value functions. The function ₁F₁(a,b;z)=M(a,b,z) is the Confluent Hypergeometric or Kummer function, hypergeom(a,b,z) in MATLAB. The function ₂F₁(a,b;z) is the Gauss Hypergeometric function, also hypergeom(a,b,z) in MATLAB where a is a two-element vector.

Magnitude Vector: r = abs(x) = |x|
- <r> = ½ π^½ d• ₁F₁(-0.5 , 1 ; -(|m|÷d)^•2) [R.16 (4.1)]
  - [m=0] <r> = ½ π^½ d
- diag(COV(r))= s + |m|^•2 - ¼ π s • ₁F₁(-0.5 , 1 ; -(|m|÷d)^•2)²
- [m=0] COV(r) = ¼ π (dd^T • ₂F₁([-0.5; -0.5] , 1 ; (|S|÷dd^T )^•2) - dd^T) [R.16 (4.19)]
  - [m=0] diag(COV(r)) = (1-¼ π)s
Magnitude-Squared Vector: v = r^•2 = |x|^•2
- <v> = s + |m|^•2
  - COV(v) = ABS(S+mm^H)^•2 - |mm^H|^•2 [R.16 (2.13)]
- <v^•2> = 2s² + 4s • |m|^•2 + |m|^•4
  - COV(v^•2) = 4 |S|^•4 + 4 ((S^T • (mm^H))^•2)^R + (S^T • (mm^H))^R •(16 |S|^•2 + 8 (2s + |m|^•2) (2s+ |m|^•2)^T) + 16 |S|^•2 • ((s + |m|^•2)(s + |m|^•2)^T)
- <v^•n> = sum_r=0:n n!²/(r! (n-r)!²) s^•r • |m|^•2(n-r)
- <vv^T> = |S+mm^H|^•2 + (s + |m|^•2)(s + |m|^•2)^T - |mm^H|^•2
- <v^•2v^T> = 4 |S|^•2 • ((s + |m|^•2)1^T) + 4(S^T • (mm^H))^R •((2s + |m|^•2)1^T) + (2s^•2 + 4s •|m|^•2 + |m|^•4)(s + |m|^•2)^T
- <(vv^T)^•2> = 4 |S|^•4 + 4 ((S^T • (mm^H))^•2)^R + (S^T • (mm^H))^R •(16 |S|^•2 + 8 (2s + |m|^•2) (2s+ |m|^•2)^T) + (16 |S|^•2 + 2 (s + |m|^•2)(s + |m|^•2)^T ) • ((s + |m|^•2)(s + |m|^•2)^T) + 2(ss^T)•((s + 2 |m|^•2)(s + 2 |m|^•2)^T) - |mm^H|^•4
- <v^•n(v^T)^•m> = SUM_{a=0:m+n, b=max(a-m,0):min(a,n), c=max(a-m,0):min(a,n), d=max(b,c):min(a,b+c)} m!²n!²/((n-b)!(m-a+b)!(n-c)!(m-a+c)!(d-c)!(d-b)!(a-d)!(b+c-d)!) S^•(d-c) • (S^T)^•(d-b) • (s^•(b+c-d) •m^•(n-b) •(m^C)^•(n-c)) (s^•(a-d) •m^•(n-a+b) •(m^C)^•(n-a+c))^T
- [s<<m^•2] <v^•n(v^T)^•m> = {|m|^•2n|m^H|^•2m} + {(|m|^•2n-2|m^H|^•2m-2)• (2mn(S^T • (mm^H))^R+m² |m|² s^T + n² s |m^H|²)} + {(|m|^•2n-4|m^H|^•2m-4)• (m²n²(|S|^•2+ss^T) • |mm^H|^•2 + mn(m-1)(n-1)((S^T • (mm^H))^•2)^R+ 2mn (S^T • (mm^H))^R •(m(m-1) |m|² s^T + n(n-1) s |m^H|²)+½m²(m-1)² |m|⁴ (s^T)^•2 + ½n²(n-1)² s^•2 |m^H|^•4} + …
- [m=0] <v> = s
- [m=0] COV(v) = |S|^•2
- [m=0] <v^•n> = n! s^•n
- [m=0] <vv^T> = |S|^•2 + ss^T
- [m=0] <v^•2v^T> = 4 |S|^•2 • (s1^T) + 2 s^•2s^T
- [m=0] <(vv^T)^•2> = 4 |S|^•4 + 16 |S|^•2 • (ss^T) + 4 (ss^T)^•2
- [m=0] <v^•n(v^T)^•m> = SUM_r=0:min(m,n) m!²n!²/(r!²(m-r)!(n-r)!) |S|^•2r • (s^•(n-r)(s^•(m-r))^T)

Linear Expectations

<Ax + b> = Am + b
- <Ax> = Am
- <x + b> = m + b
Cov(Ax + b) = ASA^H
<tr(Y)> = tr(<Y>) where Y depends on x.

For [x: Real Gaussian] :

<exp(jx)> = exp(jm - ½s)
- [m=0] <exp(jx)> = exp( - ½s)

Quadratic Expectations

<(Ax + a)(Bx + b)^H> = ASB^H + (Am+a)(Bm+b)^H [5.23]
- <xx^H> = S + mm^H
- <xx^H a> = (S + mm^H)a
- <a^H xx^H> = a^H(S + mm^H)
- <(Ax)(Ax)^H> = A(S + mm^H)A^H
- <(x + a)(x + a)^H> = S + (m+a)(m+a)^H
<(Ax+a)^H (Bx+b)> = <tr((Bx+b)(Ax+a)^H )> = tr(BSA^H) + (Am+a)^H (Bm+b) [5.24]
- <x^H x> = <tr(xx^H)> = tr(S) + m^H m
- <x^HAx> = tr(AS) + m^HAm
- <(Ax)^H (Ax)> = <tr(Axx^HA^H)> = tr(ASA^H) + (Am)^H (Am)
- <(x+a)^H (x+a)> = tr(S) + (m+a)^H (m+a)
<(Ax + a)^C ⊗ (Bx + b)> = <(Bx + b)(Ax + a)^H>: = (A^C ⊗ B) S: + (Am + a)^C ⊗ (Bm + b)
- <x^C ⊗ x> = <xx^H>: = S: + m^C ⊗ m
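The quadratic expectations above hold for any distribution with mean m and covariance S, so they can be checked exactly by treating a finite set of points as an equiprobable distribution. The sketch below assumes NumPy; the points and the matrix A are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Treat 50 complex points as the whole (equiprobable) distribution of x
X = rng.standard_normal((3, 50)) + 1j * rng.standard_normal((3, 50))
m = X.mean(axis=1)
Xc = X - m[:, None]
S = Xc @ Xc.conj().T / X.shape[1]        # Cov(x) = <(x-m)(x-m)^H>

A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# <x^H A x> evaluated directly as an average over the points
direct = np.mean([x.conj() @ A @ x for x in X.T])
formula = np.trace(A @ S) + m.conj() @ A @ m

# <x x^H> = S + m m^H
second_moment = X @ X.conj().T / X.shape[1]
```

No Gaussian assumption is needed at this order; the distribution only matters for the cubic and higher expectations.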

For [x_[n]: Real Gaussian] :

<x • x> = diag(S) + m • m = diag(S + mm^T) [5.6]
Cov(x • x) = 2 S • (S + 2mm^T) [5.7]
<(Ax + a) ⊗ (Bx + b)> = (A ⊗ B) S: + (Am + a) ⊗ (Bm + b)
- <x ⊗ x> = S: + m ⊗ m
<exp(jx)exp(jx)^H> = (exp(jm - ½s) exp(jm - ½s)^H) • exp(S)
- [m=0] <exp(jx)exp(jx)^H> = (exp(-½s) exp(-½s)^H) • exp(S) = exp(S - ½(1s^T + s1^T))
<exp(jx)^HAexp(jx)> =exp(jm - ½s)^H (A• exp(S))exp(jm - ½s)
- <exp(jx)^Hexp(jx)> = n

For [x: Complex Gaussian] :

<xx^T> = mm^T [5.3]
<x • x^C> = diag(S) + m • m^C [5.4]
Cov(x • x^C) = <(x • x^C)(x^T • x^H)> - <x • x^C><x • x^C>^T = S • S^C + 2(mm^H • S^T)^R [5.5]

Minimizing Quadratic Expectations

argmin_K<||(AKB+C)x||²> = -(A^HA)^-1A^HC(S+mm^H)B^H(B(S+mm^H)B^H)^-1 [5.25]
- [m=0] argmin_K<||(AKB+C)x||²> = -(A^HA)^-1A^HCSB^H(BSB^H)^-1
  - [m=0] argmin_K<||(KB+C)x||²> = -CSB^H(BSB^H)^-1
[x, y independent, zero-mean] argmin_K<||(AKB+C)x + (AKE+F)y||²> = -(A^HA)^-1A^H(CS_xB^H+FS_yE^H)(BS_xB^H+ES_yE^H)^-1 [5.26]
- [x, y independent, zero-mean] argmin_K<||(KB+C)x + (KE+F)y||²> = -(CS_xB^H+FS_yE^H)(BS_xB^H+ES_yE^H)^-1
  - [x, y independent, zero-mean] argmin_K<||(KB+C)x + Ky||²> = -CS_xB^H(BS_xB^H+S_y)^-1
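The first minimizer, [5.25], can be checked by confirming that the gradient of the cost vanishes there. This is a sketch assuming NumPy and real-valued matrices, so ^H becomes ^T; all matrices are arbitrary and generically have the required rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((5, 3))          # full column rank (generic)
B = rng.standard_normal((2, n))          # full row rank (generic)
C = rng.standard_normal((5, n))
G = rng.standard_normal((n, n))
S = G @ G.T + np.eye(n)                  # positive definite covariance
m = rng.standard_normal(n)
M2 = S + np.outer(m, m)                  # <x x^T> = S + m m^T

def cost(K):
    """<||(AKB + C) x||^2> = tr((AKB + C)(S + m m^T)(AKB + C)^T)."""
    E = A @ K @ B + C
    return np.trace(E @ M2 @ E.T)

# K = -(A^T A)^-1 A^T C (S + m m^T) B^T (B (S + m m^T) B^T)^-1
K_opt = -np.linalg.solve(A.T @ A, A.T @ C @ M2 @ B.T) @ np.linalg.inv(B @ M2 @ B.T)

# Gradient of the cost with respect to K; it should vanish at the optimum
grad = 2.0 * A.T @ (A @ K_opt @ B + C) @ M2 @ B.T
```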

Cubic Expectations

For [x:Real Independent] :

<(Ax + a)(Bx + b)^T (Cx + c)> = A DIAG(B^T C) m₃ + tr(BSC^T)×(Am+a) + ASC^T (Bm+b) + (ASB^T +(Am+a)(Bm+b)^T)(Cm+c)
- <xx^T x> = m₃ + 2Sm + (tr(S)+ m^T m)×m
- <(Ax + a)(Ax + a)^T(Ax + a)> = A DIAG(A^T A) m₃ + (2ASA^T + (Am+a)(Am+a)^T)(Am+a) + tr(ASA^T)×(Am+a)
<(Ax + a)b^T(Cx + c)(Dx + d)^T > = (Am+a)b^T(CSD^T+(Cm+c) (Dm+d)^T) + (ASC^T+(Am+a)(Cm+c)^T) b (Dm+d)^T + b^T(Cm+c) × (ASD^T - (Am+a)(Dm+d)^T)
- <xb^Txx^T> = mb^T(S+mm^T) + (S+mm^T) bm^T + b^Tm × (S - mm^T)

For [x: Real Gaussian] :

<(Ax + a)(Bx + b)^T(Cx + c)> = ASB^T(Cm+c) + ASC^T(Bm+b) + tr(BSC^T)×(Am+a) + (Am+a)(Bm+b)^T(Cm+c)
- <xx^Tx> = <x^Txx^T>^T = 2Sm + (tr(S)+ m^Tm)×m
- <(Ax + a)(Ax + a)^T(Ax + a)> = (2ASA^T + (Am+a)(Am+a)^T)(Am+a) + tr(ASA^T)×(Am+a)

Quartic Expectations

For [x: Real Gaussian] :

<(Ax + a)(Bx + b)^T(Cx + c) (Dx + d)^T> = (ASB^T+(Am+a)(Bm+b)^T)(CSD^T+(Cm+c) (Dm+d)^T) + (ASC^T+(Am+a)(Cm+c)^T)(BSD^T+(Bm+b) (Dm+d)^T) + (Bm+b)^T(Cm+c)×(ASD^T - (Am+a)(Dm+d)^T) + tr(BSC^T)×(ASD^T + (Am+a)(Dm+d)^T) [5.27]
- [m=0] <(Ax + a)(Bx + b)^T(Cx + c) (Dx + d)^T> = (ASB^T+ab^T)(CSD^T+cd^T) + (ASC^T+ac^T)(BSD^T+bd^T) + b^Tc×(ASD^T - ad^T) + tr(BSC^T)×(ASD^T + ad^T)
  - <xx^TAxx^T> = <(x^TAx) × xx^T> = (S+mm^T)(A+A^T)(S+mm^T) + m^TAm × (S - mm^T) + tr(AS)×(S + mm^T)
    - <xx^Txx^T> = 2(S+mm^T)² + m^Tm×(S - mm^T) + tr(S)×(S + mm^T)
    - [m=0] <xx^TAxx^T> = S(A+A^T)S + tr(AS)×S
    - [m=0] <xx^Txx^T> = 2S² + tr(S)×S
  - <(Ax + a)(Ax + a)^T(Ax + a) (Ax + a)^T> = 2(ASA^T+(Am+a)(Am+a)^T)² + (Am+a)^T(Am+a)×(ASA^T - (Am+a)(Am+a)^T) + tr(ASA^T)×(ASA^T + (Am+a)(Am+a)^T)
<(Ax + a)^T(Bx + b) (Cx + c)^T(Dx + d)> = tr(AS(C^TD+D^TC)SB^T) + ((Am+a)^TB + (Bm+b)^TA)S(C^T(Dm+d) + D^T(Cm+c)) + (tr(ASB^T)+(Am+a)^T(Bm+b))(tr(CSD^T)+(Cm+c)^T(Dm+d)) [5.28]
- [m=0] <(Ax + a)^T(Bx + b) (Cx + c)^T(Dx + d)> = tr(AS(C^TD+D^TC)SB^T) + (a^TB + b^TA)S(C^Td + D^Tc) + (tr(ASB^T)+a^Tb)(tr(CSD^T)+c^Td)
- <x^Txx^Tx> = 2tr(S²) + 4m^TSm + (tr(S) + m^Tm)²
  - [m=0] <x^Txx^Tx> = 2tr(S²) + (tr(S))²
- <x^TAxx^TBx> = tr(AS(B+B^T)S) + m^T(A + A^T)S(B + B^T)m + (tr(AS)+m^TAm)(tr(BS)+m^TBm)
  - [m=0] <x^TAxx^TBx> = tr(AS(B+B^T)S) + tr(AS)×tr(BS)
<a^Txb^Txc^Txd^Tx> = (a^T(S+mm^T)b)(c^T(S+mm^T)d)+(a^T(S+mm^T)c)(b^T(S+mm^T)d)+(a^T(S+mm^T)d)(b^T(S+mm^T)c)-2a^Tmb^Tmc^Tmd^Tm
<(Ax + a)^T(Ax + a) (Ax + a)^T(Ax + a)> = 2tr(ASA^TASA^T) + 4(Am+a)^TASA^T(Am+a) + (tr(ASA^T) + (Am+a)^T(Am+a))²
<exp(jx)^Hexp(jx)exp(jx)^Hexp(jx)> = n²
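Two of the zero-mean quartic results above can be checked against a brute-force sum over the fourth moments <x_i x_j x_k x_l> = S_ij S_kl + S_ik S_jl + S_il S_jk (Isserlis' theorem). The sketch assumes NumPy; S, A and B are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
G = rng.standard_normal((n, n))
S = G @ G.T                              # covariance, with m = 0 assumed
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Fourth moments of a zero-mean real Gaussian
M4 = (np.einsum('ij,kl->ijkl', S, S)
      + np.einsum('ik,jl->ijkl', S, S)
      + np.einsum('il,jk->ijkl', S, S))

# <x^T A x x^T B x> = sum_ijkl A_ij B_kl <x_i x_j x_k x_l>
brute = np.einsum('ij,kl,ijkl->', A, B, M4)
closed = np.trace(A @ S @ (B + B.T) @ S) + np.trace(A @ S) * np.trace(B @ S)

# <x x^T A x x^T>_il = sum_jk A_jk <x_i x_j x_k x_l>
brute_mat = np.einsum('jk,ijkl->il', A, M4)
closed_mat = S @ (A + A.T) @ S + np.trace(A @ S) * S
```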

For [x: Complex Gaussian] :

<(Ax + a)(Bx + b)^H(Cx + c) (Dx + d)^H> = (ASB^H+(Am + a)(Bm + b)^H)(CSD^H+(Cm + c)(Dm + d)^H) + (Bm + b)^H(Cm + c)ASD^H + tr(CSB^H)×(ASD^H + (Am + a)(Dm + d)^H)
- [m=0] <(Ax + a)(Bx + b)^H(Cx + c) (Dx + d)^H> = (ASB^H+ab^H)(CSD^H+cd^H) + b^HcASD^H + tr(CSB^H)×(ASD^H + ad^H)
- [m=0] <xx^HAxx^H> = SAS +tr(AS)×S
- [m=0] <xx^Hxx^H> = S² +tr(S)×S
<(Ax + a)^H(Bx + b) (Cx + c)^H(Dx + d)> = <tr((Bx + b)(Ax + a)^H)tr((Dx + d)(Cx + c)^H)> = tr(A^HBSC^HDS) + (Cm + c)^HDSA^H(Bm + b) +(Am + a)^HBSC^H(Dm + d) + (tr(BSA^H)+(Am + a)^H(Bm + b))(tr(DSC^H)+(Cm + c)^H(Dm + d))
- [m=0] <(Ax + a)^H(Bx + b) (Cx + c)^H(Dx + d)> = tr(A^HBSC^HDS) + c^HDSA^Hb +a^HBSC^Hd + (tr(BSA^H)+a^Hb)(tr(DSC^H)+c^Hd)
- <tr²((Bx + b)(Ax + a)^H)> = tr((BSA^H)²) + 2(Am + a)^HBSA^H(Bm + b) + (tr(BSA^H)+(Am + a)^H(Bm + b))²
- <x^Hxx^Hx> = <tr²(xx^H)> = tr(S²) + 2m^HSm + (tr(S)+ m^Hm)²
- [m=0] <x^Hxx^Hx> = <tr²(xx^H)> = tr(S²) +(tr(S))²

Quintic Expressions

For [x: Real Gaussian] :

<(Ax + a)(Bx + b)^T(Cx + c) (Dx + d)^T(Ex + e)> = (ASB^T+(Am+a)(Bm+b)^T)CSE^T(Dm+d) + (ASB^T+(Am+a)(Bm+b)^T)CSD^T(Em+e) + (ASB^T+(Am+a)(Bm+b)^T)(Cm+c)(Dm+d)^T(Em+e)+ ASC^TBSE^T(Dm+d) + ASC^T(BSD^T+(Bm+b)(Dm+d)^T)(Em+e)+ ASD^T(ESB^T+(Em+e)(Bm+b)^T)(Cm+c) + ASD^TESC^T(Bm+b) + (ASE^T+(Am+a)(Em+e)^T)DSB^T(Cm+c)+ ASE^TDSC^T(Bm+b) + ASE^T(Dm+d)(Cm+c)^T(Bm+b)+ tr(B^TCSE^TDS) (Am+a) + tr(B^T(CSD^T+(Cm+c)(Dm+d)^T)ES) (Am+a)+ tr(B^TCS) (ASD^T+(Am+a)(Dm+d)^T)(Em+e) + tr(B^TCS) ASE^T(Dm+d)+ tr(D^TES) (ASB^T+(Am+a)(Bm+b)^T)(Cm+c) + tr(D^TES) ASC^T(Bm+b) + tr(B^TCS) tr(D^TES) (Am+a)
- <(Ax )(Bx )^T(Cx ) (Dx )^T(Ex)> = A(S+mm^T)B^TCSE^TDm + A(S+mm^T)B^TCSD^TEm + A(S+mm^T)B^TCmm^TD^TEm+ ASC^TBSE^TDm + ASC^TB(S+mm^T)D^TEm+ ASD^TE(S+mm^T)B^TCm + ASD^TESC^TBm + A(S+mm^T)E^TDSB^TCm+ ASE^TDSC^TBm+ ASE^TDmm^TC^TBm+ tr(B^TCSE^TDS)Am + tr(B^T(C(S+mm^T)D^T)ES)Am+ tr(B^TCS) A(S+mm^T)D^TEm+ tr(B^TCS) ASE^TDm+ tr(D^TES) A(S+mm^T)B^TCm + tr(D^TES) ASC^TBm + tr(B^TCS) tr(D^TES)Am
- <xx^Txx^Tx> = (8S² + 4Smm^T + 3mm^TS + (mm^T)² + 2tr(S) (2S+mm^T)) m + (tr(2S²+mm^TS) + tr(S)²) m

Recursion Formulae for High Powers

For [x: Real Gaussian] :

Define Y_n=<(xx^T)ⁿ> then Z_n=<(x^Tx)ⁿ> = tr(Y_n)
- Y₁ = S+mm^T
- [m=0] Y_n+1= tr(Y_n)S+2nSY_n [5.19]
  - [m=0] Y₁=S
    - [m=0] Z₁=tr(S)
  - [m=0] Y₂=2S²+tr(S)×S
    - [m=0] Z₂=2tr(S²)+tr²(S)
  - [m=0] Y₃=8S³+4tr(S)×S²+(2tr(S²)+tr²(S))×S
    - [m=0] Z₃=8tr(S³)+6tr(S²)tr(S)+tr³(S)
  - [m=0] Y₄=48S⁴+24tr(S)×S³+(12tr(S²)+6tr²(S))×S²+(8tr(S³)+6tr(S²)tr(S)+tr³(S))×S
    - [m=0] Z₄=48tr(S⁴)+32tr(S³)tr(S)+12tr²(S²)+12tr(S²)tr²(S)+tr⁴(S)

For [x: Complex Gaussian] :

Define Y_n=<(xx^H)ⁿ> then Z_n=<(x^Hx)ⁿ> = tr(Y_n)
- Y₁ = S+mm^H
- [m=0] Y_n+1= tr(Y_n)S+nSY_n [5.20]
- [m=0] Y₁=S
  - [m=0] Z₁=tr(S)
- [m=0] Y₂=S²+tr(S)×S
  - [m=0] Z₂=tr(S²)+tr²(S)
- [m=0] Y₃=2S³+2tr(S)×S²+(tr(S²)+tr²(S))×S
  - [m=0] Z₃=2tr(S³)+3tr(S²)tr(S)+tr³(S)
- [m=0] Y₄=6S⁴+6tr(S)×S³+(3tr(S²)+3tr²(S))×S²+(2tr(S³)+3tr(S²)tr(S)+tr³(S))×S
  - [m=0] Z₄=6tr(S⁴)+8tr(S³)tr(S)+3tr²(S²)+6tr(S²)tr²(S)+tr⁴(S)
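The two zero-mean recursions, [5.19] for real and [5.20] for complex Gaussians, can be run numerically as sketched below, assuming NumPy. The function names are hypothetical. For scalar x with variance s the results should reduce to the well-known moments (2n-1)!! sⁿ (real) and n! sⁿ (complex).

```python
import numpy as np
from math import factorial

def moment_matrices(S, n_max, complex_gaussian=False):
    """Return [Y_1, ..., Y_n_max] with Y_n = <(x x^H)^n> for zero-mean x."""
    step = 1 if complex_gaussian else 2   # n S Y_n (complex) or 2n S Y_n (real)
    Y = [np.array(S, dtype=complex if complex_gaussian else float)]
    for n in range(1, n_max):
        Y.append(np.trace(Y[-1]) * S + step * n * S @ Y[-1])
    return Y

def double_factorial_odd(n):
    """(2n - 1)!! = 1 * 3 * ... * (2n - 1)."""
    out = 1
    for k in range(1, 2 * n, 2):
        out *= k
    return out

s = 1.5                                   # scalar variance for a sanity check
S1 = np.array([[s]])
Y_real = moment_matrices(S1, 4)
Y_cplx = moment_matrices(S1, 4, complex_gaussian=True)
Z_real = [np.trace(Y) for Y in Y_real]    # Z_n = tr(Y_n)
```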

Product of Vector Elements

For [x: Real Gaussian] :

[n: odd, m=0] <prod(x_[n])> = 0. [5.18]
[n: even, m=0] <prod(x_[n])> = ((½n)!)^-1 2^-½n sum_v(s_v(1),v(2) s_v(3),v(4) ... s_v(n-1),v(n)) where the sum is over all n! permutations, v, of the numbers 1:n. [5.18]
Note that each term in the summation arises (½n)! 2^½n times since the ½n factors s_ij can be rearranged in (½n)! orders and for each factor s_ij = s_ji since S is symmetric. Thus an equivalent formula is to omit the normalizing factor, ((½n)!)^-1 2^-½n, and restrict the summation to all distinct pairings of the numbers 1:n. This is known as Isserlis' theorem or Wick's theorem.
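The "distinct pairings" form of Isserlis' theorem can be coded directly by enumerating pairings. This is a sketch using only the Python standard library; the function names are hypothetical and S is a nested list.

```python
def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def gaussian_product_moment(S, idx):
    """<x_i1 x_i2 ... x_in> for zero-mean real Gaussian x with covariance S."""
    if len(idx) % 2:
        return 0.0                        # odd-order moments vanish
    total = 0.0
    for pairing in pairings(list(idx)):
        term = 1.0
        for i, j in pairing:
            term *= S[i][j]
        total += term
    return total
```

For n indices there are (n-1)!! pairings, for example 3 for n=4 and 15 for n=6, so this is only practical for small n.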

For [x: Complex Gaussian] :

[m=0, A,B: [m#n]] <prod([Ax; (Bx)^C])> = sum_v(c_1,v(1)c_2,v(2)...c_m,v(m)) = pet(C) where C=ASB^H, pet(C) is the permanent of a square matrix C and the sum is over all m! permutations, v, of the numbers 1:m. The product equals zero unless A and B have the same number of rows, m, i.e. unless the product contains an equal number of unconjugated and conjugated terms.
[m=0] <(a^Hxx^Hb)^r> = r! (a^HSb)^r
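The permanent used above can be computed for small matrices by direct summation over permutations. This is a sketch using only the Python standard library; the function name is hypothetical. When every element of an r#r matrix equals the same value c, the permanent is r! c^r, which is the form of the second result above.

```python
from itertools import permutations
from math import factorial

def permanent(C):
    """Permanent of a square matrix given as a nested list."""
    n = len(C)
    total = 0
    for v in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= C[i][v[i]]            # c_1,v(1) c_2,v(2) ... c_n,v(n)
        total += term
    return total

# Constant matrix: all r*r entries equal to c (c plays the role of a^H S b)
r = 3
c = 0.5 + 0.25j
constant_case = permanent([[c] * r for _ in range(r)])
```

The cost grows as n! n, so this direct form is suitable only for illustration.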

This page is part of The Matrix Reference Manual. Copyright © 1998-2022 Mike Brookes, Imperial College, London, UK. See the file gfl.html for copying instructions. Please send any comments or suggestions to "mike.brookes" at "imperial.ac.uk".
Updated: $Id: expect.html 11291 2021-01-05 18:26:10Z dmb $