function [d,klfg]=v_gaussmixk(mf,vf,wf,mg,vg,wg)
%V_GAUSSMIXK approximate Kullback-Leibler divergence between two GMMs
%
% Inputs: with kf & kg mixtures, p data dimensions
%
%   mf(kf,p)                mixture means for GMM f
%   vf(kf,p) or vf(p,p,kf)  variances (diagonal or full) for GMM f [default: identity]
%   wf(kf,1)                weights for GMM f - must sum to 1 [default: uniform]
%   mg(kg,p)                mixture means for GMM g [g=f if mg,vg,wg omitted]
%   vg(kg,p) or vg(p,p,kg)  variances (diagonal or full) for GMM g [default: identity]
%   wg(kg,1)                weights for GMM g - must sum to 1 [default: uniform]
%
% Outputs:
%   d             the approximate KL divergence D(f||g)=E_f(log(f(x)/g(x)))
%   klfg(kf,kg)   the exact KL divergence between the unweighted components of f and g
%
% The Kullback-Leibler (KL) divergence, D(f||g), between two distributions,
% f(x) and g(x), is also known as the "relative entropy" of f relative to g.
% It is defined as <log(f(x)/g(x))> where <...> denotes the
% expectation with respect to f(x), i.e. <y(x)> = Integral(f(x) y(x) dx).
% The KL divergence is always >=0 and equals 0 if and only if f(x)=g(x)
% almost everywhere. It is not a distance because it is not symmetric
% between f and g and also does not satisfy the triangle inequality.
% It is normally appropriate for f(x) to be the "true" distribution and
% g(x) to be an approximation to it. See [1].
%
% This routine calculates the "variational approximation" to the KL divergence
% from [2]; the approximation is exact when f and g are single-component Gaussians
% and is zero if f=g. However, the approximation may incorrectly be negative
% if f and g are different.
%
% Refs:
% [1] S. Kullback and R. Leibler. On information and sufficiency.
%     Annals of Mathematical Statistics, 22 (1): 79-86, 1951. doi: 10.1214/aoms/1177729694.
% [2] J. R. Hershey and P. A. Olsen.
%     Approximating the Kullback Leibler divergence between Gaussian mixture models.
%     In Proc. IEEE Intl Conf. Acoustics, Speech and Signal Processing, volume 4,
%     pages IV:317-320, Apr. 2007. doi: 10.1109/ICASSP.2007.366913.

% Copyright (C) Mike Brookes 2012
% Version: $Id: v_gaussmixk.m 10865 2018-09-21 17:22:45Z dmb $
%
% VOICEBOX is a MATLAB toolbox for speech processing.
% Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
% GNU General Public License for more details.
%
% You can obtain a copy of the GNU General Public License from
% http://www.gnu.org/copyleft/gpl.html or by writing to
% Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[kf,p]=size(mf);
if nargin<3 || isempty(wf)
    wf=repmat(1/kf,kf,1);               % default f weights: uniform
end
if nargin<2 || isempty(vf)
    vf=ones(kf,p);                      % default f variances: identity
end
fvf=ndims(vf)>2 || size(vf,1)>kf;       % true if full covariance matrices vf are supplied

% First calculate vf covariance matrix determinants and precision matrices
% and then the individual KL divergences between the components of f

klff=zeros(kf,kf);                      % space for intra-f KL divergences
ixdf=1:kf+1:kf*kf;                      % indices of diagonal elements in kf*kf matrix
ixdp=(1:p+1:p*p)';                      % indices of diagonal elements in p*p matrix
wkf=ones(kf,1);
if fvf                                  % vf is a full matrix
    dvf=zeros(kf,1);                    % space for log(det(vf))
    for i=1:kf
        dvf(i)=log(det(vf(:,:,i)));
    end
    for j=1:kf                          % calculate KL divergence between all f mixtures and f mixture j
        pfj=inv(vf(:,:,j));             % precision matrix for f mixture j
        mffj=mf-mf(j(wkf),:);           % difference in means
        pfjvf=reshape(pfj*reshape(vf,p,p*kf),p^2,kf); % pf(:,:,j) times all the vf matrices
        klff(:,j)=0.5*(dvf(j)-p-dvf+sum(pfjvf(ixdp,:),1)'+sum((mffj*pfj).*mffj,2));
    end
else                                    % vf is diagonal
    dvf=log(prod(vf,2));                % log determinant for all f mixtures
    pf=1./vf;                           % precision matrices for all f mixtures
    mf2p=mf.^2*pf';
    mf2pd=mf2p(ixdf);                   % get diagonal elements
    klff=0.5*(dvf(:,wkf)'-dvf(:,wkf)+vf*pf'-p+mf2p+mf2pd(wkf,:)-2*mf*(mf.*pf)');
end
klff(ixdf)=0;                           % force the diagonal elements to exact zero
if nargin<4                             % g mixtures omitted so make them the same as f
    d=0;                                % KL divergence from f to itself is zero
    klfg=klff;
else
    [kg,pg]=size(mg);
    if pg~=p
        error('GMMs must have the same data dimension (%d versus %d)',p,pg);
    end
    if nargin<6 || isempty(wg)
        wg=repmat(1/kg,kg,1);           % default g weights: uniform
    end
    if nargin<5 || isempty(vg)
        vg=ones(kg,p);                  % default g variances: identity
    end

    % Calculate vg covariance matrix determinants and precision matrices
    % and then the individual inter-component KL divergences between components of f and g

    klfg=zeros(kf,kg);                  % space for inter-f-g KL divergences
    wkg=ones(kg,1);
    if ndims(vg)>2 || size(vg,1)>kg     % vg is a full matrix
        dvg=zeros(kg,1);                % space for log(det(vg))
        pg=zeros(p,p,kg);               % space for g-mixture precision matrices
        if fvf                          % vf and vg are both full matrices
            for j=1:kg
                dvgj=log(det(vg(:,:,j)));
                dvg(j)=dvgj;
                pgj=inv(vg(:,:,j));
                pg(:,:,j)=pgj;
                mfgj=mf-mg(j(wkf),:);
                pgjvf=reshape(pgj*reshape(vf,p,p*kf),p^2,kf); % pg(:,:,j) times all the vf matrices
                klfg(:,j)=0.5*(dvgj-p-dvf+sum(pgjvf(ixdp,:),1)'+sum((mfgj*pgj).*mfgj,2));
            end
        else                            % vf diagonal but vg is full
            for j=1:kg
                dvgj=log(det(vg(:,:,j)));
                dvg(j)=dvgj;
                pgj=inv(vg(:,:,j));
                pg(:,:,j)=pgj;
                mfgj=mf-mg(j(wkf),:);
                klfg(:,j)=0.5*(dvgj-p-dvf+vf*pgj(ixdp)+sum((mfgj*pgj).*mfgj,2));
            end
        end
    else                                % vg is diagonal
        dvg=log(prod(vg,2));            % log(det(vg))
        pg=1./vg;                       % precision matrices pg = inv(vg)
        mg2p=sum(mg.^2.*pg,2)';
        if fvf                          % vf a full matrix, vg diagonal
            vav=reshape(vf,p^2,kf);
            klfg=0.5*(dvg(:,wkf)'-dvf(:,wkg)+vav(ixdp,:)'*pg'-p+mf.^2*pg'+mg2p(wkf,:)-2*mf*(mg.*pg)');
        else                            % vf and vg both diagonal
            klfg=0.5*(dvg(:,wkf)'-dvf(:,wkg)+vf*pg'-p+mf.^2*pg'+mg2p(wkf,:)-2*mf*(mg.*pg)');
        end
    end
    d=wf'*(v_logsum(-klff,2,wf)-v_logsum(-klfg,2,wg));
end
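% As a cross-check of the formulas used above, the diagonal-covariance case of the
% variational approximation from [2] can be sketched in NumPy. This is an illustrative,
% independent re-implementation (not part of VOICEBOX; the function names below are
% invented for the sketch), combining the exact pairwise Gaussian KL divergences with
% the weighted log-sum-exp step corresponding to the final v_logsum line:

```python
import numpy as np

def gauss_kl_diag(mf, vf, mg, vg):
    """Exact KL divergence between every pair of diagonal-covariance
    Gaussians: row i of (mf, vf) against row j of (mg, vg)."""
    p = mf.shape[1]
    ldf = np.log(vf).sum(axis=1)                # log det of each f covariance
    ldg = np.log(vg).sum(axis=1)                # log det of each g covariance
    trace = vf @ (1.0 / vg).T                   # tr(inv(Vg_j) Vf_i) for all pairs
    diff2 = ((mf[:, None, :] - mg[None, :, :]) ** 2
             / vg[None, :, :]).sum(axis=-1)     # Mahalanobis term (mf_i-mg_j)' inv(Vg_j) (mf_i-mg_j)
    return 0.5 * (ldg[None, :] - ldf[:, None] - p + trace + diff2)

def gmm_kl_variational(mf, vf, wf, mg, vg, wg):
    """Variational approximation to D(f||g) from Hershey & Olsen (2007):
    D ~= sum_i wf_i*(log sum_a wf_a exp(-KL(f_i||f_a))
                   - log sum_b wg_b exp(-KL(f_i||g_b)))."""
    klff = gauss_kl_diag(mf, vf, mf, vf)        # intra-f component KL divergences
    klfg = gauss_kl_diag(mf, vf, mg, vg)        # inter f-g component KL divergences

    def wlogsumexp(a, w):                       # row-wise log(sum_j w_j*exp(a_ij)), stabilised
        m = a.max(axis=1)
        return m + np.log(np.exp(a - m[:, None]) @ w)

    return wf @ (wlogsumexp(-klff, wf) - wlogsumexp(-klfg, wg))
```

% For single-component GMMs this reduces to the exact Gaussian KL divergence,
% and for f=g it returns exactly zero, matching the two properties stated in
% the help text above.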