V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

 Usage:
    (1) [m,v,w]=v_gaussmix(x,[],[],k);      % create GMM with k mixtures and diagonal covariances
    (2) [m,v,w]=v_gaussmix(x,[],[],k,'v');  % create GMM with k mixtures and full covariances

 Inputs: n data values, k mixtures, p parameters, l loops

    X(n,p)   Input data vectors, one per row.
    C(1)     Minimum variance of normalized data (use [] to take default value of n^-2)
    L        The integer portion of l gives a maximum loop count. The fractional portion gives
             an optional stopping threshold. Iteration will cease if the increase in
             log likelihood density per data point is less than this value. Thus l=10.001 will
             stop after 10 iterations or when the increase in log likelihood falls below 0.001.
             As a special case, if L=0, then the first three outputs are omitted.
             Use [] to take the default value of 100.0001.
    M0       Number of mixtures required (or initial mixture means, M0(k,p) - see below)
    V0       Initialization mode:
                'f'  Initialize with K randomly selected data points [default]
                'p'  Initialize with centroids and variances of random partitions
                'k'  k-means algorithm ('kf' and 'kp' determine initialization)
                'h'  k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
                's'  use unscaled data during the initialization phase instead of scaling it first
                'm'  M0(k,p) contains the initial centres
                'v'  full covariance matrices
             Mode 'hf' [the default] generally gives the best results, but 'f' is faster and often OK.
    W0(n,1)  Data point weights (need not sum to unity)

 Alternatively, initial values for M0, V0 and W0 can be given explicitly:

    M0(k,p)     Initial mixture means, one row per mixture.
    V0(k,p)     Initial mixture variances, one row per mixture,
                or V0(p,p,k) one full-covariance matrix per mixture.
    W0(k,1)     Initial mixture weights, one per mixture. The weights must sum to unity.
    WX(n,1)     Data point weights (need not sum to unity)

 Outputs: (Note that M, V and W are omitted if L==0)

    M(k,p)      Mixture means, one row per mixture. (omitted if L==0)
    V(k,p)      Mixture variances, one row per mixture, (omitted if L==0)
                or V(p,p,k) if full covariance matrices are in use (i.e. either the 'v' option or V0(p,p,k) was specified)
    W(k,1)      Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
    G           Average log probability of the input data points.
    F           Fisher's discriminant measures how well the data divides into classes.
                It is the ratio of the between-mixture variance to the average mixture variance: a
                high value means the classes (mixtures) are well separated.
    PP(n,1)     Log probability of each data point.
    GG(l+1,1)   Average log probabilities at the beginning of each iteration and at the end.

 The fitting procedure uses one of several initialization methods to create an initial guess
 for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
 the guess. Although the EM algorithm is deterministic, the initialization procedures use
 random numbers, so the routine will not give identical answers if you call it multiple
 times with the same input data. See v_randvec() for generating GMM data vectors.
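 Example (a minimal sketch; the synthetic data below is illustrative, not part of the routine):

    x=[randn(500,2); randn(500,2)+4];    % 1000 points in 2 dimensions forming two well-separated clusters
    [m,v,w,g,f]=v_gaussmix(x,[],[],2);   % fit a GMM with k=2 mixtures and diagonal covariances
    % m(2,2) contains the mixture means, v(2,2) the diagonal variances and w(2,1) the weights;
    % g is the average log probability and f should be large because the clusters are well separated.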
function [m,v,w,g,f,pp,gg]=v_gaussmix(x,c,l,m0,v0,w0,wx)
%V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)
%
% Usage:
%    (1) [m,v,w]=v_gaussmix(x,[],[],k);      % create GMM with k mixtures and diagonal covariances
%    (2) [m,v,w]=v_gaussmix(x,[],[],k,'v');  % create GMM with k mixtures and full covariances
%
% Inputs: n data values, k mixtures, p parameters, l loops
%
%     X(n,p)   Input data vectors, one per row.
%     C(1)     Minimum variance of normalized data (use [] to take default value of n^-2)
%     L        The integer portion of l gives a maximum loop count. The fractional portion gives
%              an optional stopping threshold. Iteration will cease if the increase in
%              log likelihood density per data point is less than this value. Thus l=10.001 will
%              stop after 10 iterations or when the increase in log likelihood falls below 0.001.
%              As a special case, if L=0, then the first three outputs are omitted.
%              Use [] to take the default value of 100.0001.
%     M0       Number of mixtures required (or initial mixture means, M0(k,p) - see below)
%     V0       Initialization mode:
%                  'f'  Initialize with K randomly selected data points [default]
%                  'p'  Initialize with centroids and variances of random partitions
%                  'k'  k-means algorithm ('kf' and 'kp' determine initialization)
%                  'h'  k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
%                  's'  use unscaled data during the initialization phase instead of scaling it first
%                  'm'  M0(k,p) contains the initial centres
%                  'v'  full covariance matrices
%              Mode 'hf' [the default] generally gives the best results, but 'f' is faster and often OK.
%     W0(n,1)  Data point weights (need not sum to unity)
%
% Alternatively, initial values for M0, V0 and W0 can be given explicitly:
%
%     M0(k,p)   Initial mixture means, one row per mixture.
%     V0(k,p)   Initial mixture variances, one row per mixture,
%               or V0(p,p,k) one full-covariance matrix per mixture.
%     W0(k,1)   Initial mixture weights, one per mixture. The weights must sum to unity.
%     WX(n,1)   Data point weights (need not sum to unity)
%
% Outputs: (Note that M, V and W are omitted if L==0)
%
%     M(k,p)    Mixture means, one row per mixture. (omitted if L==0)
%     V(k,p)    Mixture variances, one row per mixture, (omitted if L==0)
%               or V(p,p,k) if full covariance matrices are in use (i.e. either the 'v' option or V0(p,p,k) was specified)
%     W(k,1)    Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
%     G         Average log probability of the input data points.
%     F         Fisher's discriminant measures how well the data divides into classes.
%               It is the ratio of the between-mixture variance to the average mixture variance: a
%               high value means the classes (mixtures) are well separated.
%     PP(n,1)   Log probability of each data point.
%     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end.
%
% The fitting procedure uses one of several initialization methods to create an initial guess
% for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
% the guess. Although the EM algorithm is deterministic, the initialization procedures use
% random numbers, so the routine will not give identical answers if you call it multiple
% times with the same input data. See v_randvec() for generating GMM data vectors.
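
% Example (sketch): generate test vectors from a known GMM and refit it. This assumes
% the v_randvec(n,m,v,w) argument order; check the v_randvec help text before relying on it.
%     x=v_randvec(1000,m,v,w);                   % draw 1000 vectors from the GMM {m,v,w}
%     [m2,v2,w2]=v_gaussmix(x,[],[],size(m,1));  % refit a GMM of the same order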

% Bugs/Suggestions
%     (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
%     (2) Other initialization options:
%             'l'  LBG algorithm
%             'm'  Move-means (dog-rabbit) algorithm
%     (3) Allow updating of weights-only, not means/variances
%     (4) Allow freezing of means and/or variances

%      Copyright (C) Mike Brookes 2000-2009
%      Version: $Id: v_gaussmix.m 10865 2018-09-21 17:22:45Z dmb $
%
%   VOICEBOX is a MATLAB toolbox for speech processing.
%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   This program is free software; you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation; either version 2 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%   GNU General Public License for more details.
%
%   You can obtain a copy of the GNU General Public License from
%   http://www.gnu.org/copyleft/gpl.html or by writing to
%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[n,p]=size(x);                      % n = number of training values, p = dimension of data vector
wn=ones(n,1);
memsize=v_voicebox('memsize');      % set memory size to use
if isempty(c)
    c=1/n^2;
else
    c=c(1);                         % just to prevent legacy code failing
end
fulliv=0;                           % initial variance is not full
if isempty(l)
    l=100+1e-4;                     % max loop count + stopping threshold
end
if nargin<5 || isempty(v0) || ischar(v0)    % no initial values specified for m0, v0, w0
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % No initial values given, so we must use k-means or equivalent
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if nargin<6
        if nargin<5 || isempty(v0)
            v0='hf';                % default initialization mode: hf
        end
        wx=wn;                      % no data point weights
    else
        wx=w0(:);                   % data point weights
    end
    wx=wx(:)/sum(wx);               % normalize and force to be a column vector
    if any(v0=='m')
        k=size(m0,1);
    else
        k=m0;
    end
    fv=any(v0=='v');                % full covariance matrices requested
    mx0=wx'*x;                      % calculate mean of input data in each dimension
    vx0=wx'*x.^2-mx0.^2;            % calculate variance of input data in each dimension
    sx0=sqrt(vx0)+(vx0==0);         % scale factor: std of input data in each dimension (or 1 if std==0)
    if n<=k                         % each data point can have its own mixture
        xs=(x-mx0(wn,:))./sx0(wn,:);    % scale the data
        m=xs(mod((1:k)-1,n)+1,:);   % just include all points several times
        v=zeros(k,p);               % will be set to floor later
        w=zeros(k,1);
        w(1:n)=1/n;
        if l>0
            l=0.1;                  % no point in iterating
        end
    else                            % more points than mixtures
        if any(v0=='s')
            xs=x;                   % do not scale data during initialization
        else
            xs=(x-mx0(wn,:))./sx0(wn,:);    % else scale now
            if any(v0=='m')
                m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % scale specified means as well
            end
        end
        w=repmat(1/k,k,1);          % all mixtures equally likely
        if any(v0=='k')             % k-means initialization
            if any(v0=='m')
                [m,e,j]=v_kmeans(xs,k,m);
            elseif any(v0=='p')
                [m,e,j]=v_kmeans(xs,k,'p');
            else
                [m,e,j]=v_kmeans(xs,k,'f');
            end
        elseif any(v0=='h')         % k-harmonic means initialization
            if any(v0=='m')
                [m,e,j]=v_kmeanhar(xs,k,[],4,m);
            else
                if any(v0=='p')
                    [m,e,j]=v_kmeanhar(xs,k,[],4,'p');
                else
                    [m,e,j]=v_kmeanhar(xs,k,[],4,'f');
                end
            end
        elseif any(v0=='p')         % Initialize using a random partition
            j=ceil(rand(n,1)*k);    % allocate to random clusters
            j(v_rnsubset(k,n))=1:k; % but force at least one point per cluster
            for i=1:k
                m(i,:)=mean(xs(j==i,:),1);
            end
        else
            if any(v0=='m')
                m=m0;               % use specified centres
            else
                m=xs(v_rnsubset(k,n),:);    % Forgy initialization: sample k centres without replacement [default]
            end
            [e,j]=v_kmeans(xs,k,m,0);       % find out the cluster allocation
        end
        if any(v0=='s')
            xs=(x-mx0(wn,:))./sx0(wn,:);    % scale data now if not done previously
        end
        v=zeros(k,p);               % diagonal covariances
        w=zeros(k,1);
        for i=1:k
            ni=sum(j==i);           % number assigned to this centre
            w(i)=(ni+1)/(n+k);      % weight of this mixture
            if ni
                v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
            else
                v(i,:)=zeros(1,p);
            end
        end
    end
else
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % use initial values given as input parameters
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if nargin<7
        wx=wn;                      % no data point weights
    end
    wx=wx(:)/sum(wx);               % normalize weights and force a column vector
    mx0=wx'*x;                      % calculate mean of input data in each dimension
    vx0=wx'*x.^2-mx0.^2;            % calculate variance of input data in each dimension
    sx0=sqrt(vx0);
    sx0(sx0==0)=1;                  % do not divide by zero when scaling
    [k,p]=size(m0);
    xs=(x-mx0(wn,:))./sx0(wn,:);    % scale the data
    m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % and the means
    v=v0;
    w=w0;
    fv=ndims(v)>2 || size(v,1)>k;   % full covariance matrix is supplied
    if fv
        mk=eye(p)==0;               % off-diagonal elements
        fulliv=any(v(repmat(mk,[1 1 k]))~=0);   % check if any are non-zero
        if ~fulliv
            v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1);    % just pick out and scale the diagonal elements for now
        else
            v=v./repmat(sx0'*sx0,[1 1 k]);      % scale the full covariance matrix
        end
    end
end
if length(wx)~=n
    error('%d datapoints but %d weights',n,length(wx));
end
lsx=sum(log(sx0));
xsw=xs.*repmat(wx,1,p);             % weighted data points
if ~fulliv                          % initializing with diagonal covariance if v0 is either unspecified or diagonal
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Diagonal Covariance matrices  %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    v=max(v,c);                     % apply the lower bound
    xs2=xs.^2.*repmat(wx,1,p);      % square and weight the data for variance calculations

    % If data size is large then do calculations in chunks

    nb=min(n,max(1,floor(memsize/(8*p*k))));    % chunk size for testing data points
    nl=ceil(n/nb);                  % number of chunks
    jx0=n-(nl-1)*nb;                % size of first chunk

    im=repmat(1:k,1,nb); im=im(:);
    th=(l-floor(l))*n;
    sd=(nargout > 3*(l~=0));        % = 1 if we are outputting log likelihood values
    lp=floor(l)+sd;                 % extra loop needed to calculate final G value

    lpx=zeros(1,n);                 % log probability of each data point
    wk=ones(k,1);
    wp=ones(1,p);
    wnb=ones(1,nb);
    wnj=ones(1,jx0);
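
    % Outline of each iteration of the EM loop below:
    %   E-step: py(i,j) = log(w(i)) - 0.5*sum(log(v(i,:))) - 0.5*sum((xs(j,:)-m(i,:)).^2./v(i,:))
    %           (the constant -0.5*p*log(2*pi) term is added on at the end);
    %           px(i,j) = posterior probability of mixture i given data point j, computed
    %           with a max-subtraction (log-sum-exp) trick to avoid underflow.
    %   M-step: pk=px*wx gives the effective data fraction of each mixture; the weighted
    %           first and second moments then give the new m and v, with v floored at c.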

    % EM loop

    g=0;                            % dummy initial value for comparison
    gg=zeros(lp+1,1);
    ss=sd;                          % initialize stopping count (0 or 1)
    for j=1:lp
        g1=g;                       % save previous log likelihood (2*pi factor omitted)
        m1=m;                       % save previous means, variances and weights
        v1=v;
        w1=w;
        vi=-0.5*v.^(-1);            % data-independent scale factor in exponent
        lvm=log(w)-0.5*sum(log(v),2);   % log of external scale factor (excluding -0.5*p*log(2pi) term)

        % first do partial chunk (of length jx0)

        jx=jx0;
        ii=1:jx;                    % indices of data points in this chunk
        kk=repmat(ii,k,1);          % kk(k,jx): data point index
        km=repmat(1:k,1,jx);        % km(k,jx): mixture index
        py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj);    % py(k,jx) pdf of each point with each mixture
        mx=max(py,[],1);            % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
        px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
        ps=sum(px,1);               % total normalized likelihood of each data point
        px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
        lpx(ii)=log(ps)+mx;
        pk=px*wx(ii);               % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
        sx=px*xsw(ii,:);
        sx2=px*xs2(ii,:);
        for il=2:nl                 % process the data points in chunks
            ix=jx+1;
            jx=jx+nb;               % increment upper limit
            ii=ix:jx;               % indices of data points in this chunk
            kk=repmat(ii,k,1);
            py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
            mx=max(py,[],1);        % find normalizing factor for each data point to prevent underflow when using exp()
            px=exp(py-mx(wk,:));    % find normalized probability of each mixture for each datapoint
            ps=sum(px,1);           % total normalized likelihood of each data point
            px=px./ps(wk,:);        % relative mixture probabilities for each data point (columns sum to 1)
            lpx(ii)=log(ps)+mx;
            pk=pk+px*wx(ii);        % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
            sx=sx+px*xsw(ii,:);
            sx2=sx2+px*xs2(ii,:);
        end
        g=lpx*wx;                   % total log probability summed over all data points
        gg(j)=g;                    % save log prob at each iteration
        w=pk;                       % normalize to get the weights
        if pk                       % if all elements of pk are non-zero
            m=sx./pk(:,wp);         % calculate mixture means
            v=sx2./pk(:,wp);        % and raw second moments
        else
            wm=pk==0;               % mask indicating mixtures with zero weights
            nz=sum(wm);             % number of zero-weight mixtures
            [vv,mk]=sort(lpx);      % find the lowest probability data points
            m=zeros(k,p);           % initialize means and variances to zero (variances are floored later)
            v=m;
            m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
            w(wm)=1/n;              % set these weights non-zero
            w=w*n/(n+nz);           % normalize so the weights sum to unity
            wm=~wm;                 % mask for non-zero weights
            m(wm,:)=sx(wm,:)./pk(wm,wp);    % recalculate means and raw second moments for mixtures with a non-zero weight
            v(wm,:)=sx2(wm,:)./pk(wm,wp);
        end
        v=max(v-m.^2,c);            % convert raw second moments to variances and apply floor
        if g-g1<=th && j>1
            if ~ss, break; end      % stop
            ss=ss-1;                % stop next time
        end
    end
    if sd && ~fv                    % we need to calculate the final probabilities
        pp=lpx'-0.5*p*log(2*pi)-lsx;    % log of total probability of each data point
        gg=gg(1:j)-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
        g=gg(end);
        m=m1;                       % back up to previous iteration
        v=v1;
        w=w1;
        mm=sum(m,1)/k;
        f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
    end
    if ~fv
        m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
        v=v.*repmat(sx0.^2,k,1);    % and variances
    else
        v1=v;
        v=zeros(p,p,k);
        mk=eye(p)==1;               % mask for diagonal elements
        v(repmat(mk,[1 1 k]))=v1';  % set from v1
    end
end
if fv                               % check if full covariance matrices were requested
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Full Covariance matrices  %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    pl=p*(p+1)/2;                   % number of non-trivial elements in lower triangular matrix
    lix=1:p^2;
    cix=repmat(1:p,p,1);
    rix=cix';
    lix(cix>rix)=[];                % index of lower triangular elements
    cix=cix(lix);                   % index of lower triangular columns
    rix=rix(lix);                   % index of lower triangular rows
    % dix=find(rix==cix);           % index of lower triangular diagonal elements [unused]
    lixi=zeros(p,p);
    lixi(lix)=1:pl;
    lixi=lixi';
    lixi(lix)=1:pl;                 % reverse index to build full matrices
    v=reshape(v,p^2,k);
    v=v(lix,:)';                    % lower triangular in rows (k,p*(p+1)/2)

    % If data size is large then do calculations in chunks

    nb=min(n,max(1,floor(memsize/(24*p*k))));   % chunk size for testing data points
    nl=ceil(n/nb);                  % number of chunks
    jx0=n-(nl-1)*nb;                % size of first chunk
    %
    th=(l-floor(l))*n;
    sd=(nargout > 3*(l~=0));        % = 1 if we are outputting log likelihood values
    lp=floor(l)+sd;                 % extra loop needed to calculate final G value
    %
    lpx=zeros(1,n);                 % log probability of each data point
    wk=ones(k,1);
    wp=ones(1,p);
    wpl=ones(1,pl);                 % 1 index for lower triangular matrix
    wnb=ones(1,nb);
    wnj=ones(1,jx0);

    % EM loop

    g=0;                            % dummy initial value for comparison
    gg=zeros(lp+1,1);
    ss=sd;                          % initialize stopping count (0 or 1)
    vi=zeros(p*k,p);                % stack of k inverse cov matrices each size p*p
    vim=zeros(p*k,1);               % stack of k vectors of the form inv(v)*m
    mtk=vim;                        % stack of k vectors of the form m
    lvm=zeros(k,1);
    wpk=repmat((1:p)',k,1);
    for j=1:lp
        g1=g;                       % save previous log likelihood (2*pi factor omitted)
        m1=m;                       % save previous means, variances and weights
        v1=v;
        w1=w;
        for ik=1:k
            % these lines added for debugging only
            % vk=reshape(v(ik,lixi),p,p);
            % condk(ik)=cond(vk);
            %%%%%%%%%%%%%%%%%%%%
            [uvk,dvk]=eig(reshape(v(ik,lixi),p,p));     % convert lower triangular to full and find eigenvalues
            dvk=max(diag(dvk),c);   % apply variance floor to eigenvalues
            vik=-0.5*uvk*diag(dvk.^(-1))*uvk';          % calculate inverse
            vi((ik-1)*p+(1:p),:)=vik;                   % vi contains all mixture inverses stacked on top of each other
            vim((ik-1)*p+(1:p))=vik*m(ik,:)';           % vim contains vi*m for all mixtures stacked on top of each other
            mtk((ik-1)*p+(1:p))=m(ik,:)';               % mtk contains all mixture means stacked on top of each other
            lvm(ik)=log(w(ik))-0.5*sum(log(dvk));       % lvm contains log(w/sqrt(det(v))) for each mixture
        end
        %
        % first do partial chunk
        %
        jx=jx0;
        ii=1:jx;
        xii=xs(ii,:).';
        py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
        mx=max(py,[],1);            % find normalizing factor for each data point to prevent underflow when using exp()
        px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
        ps=sum(px,1);               % total normalized likelihood of each data point
        px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
        lpx(ii)=log(ps)+mx;
        pk=px*wx(ii);               % effective fraction of data points for each mixture (could be zero due to underflow)
        sx=px*xsw(ii,:);
        sx2=px*(xsw(ii,rix).*xs(ii,cix));   % accumulator for variance calculation (lower tri cov matrix as a row)
        for il=2:nl
            ix=jx+1;
            jx=jx+nb;               % increment upper limit
            ii=ix:jx;
            xii=xs(ii,:).';
            py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
            mx=max(py,[],1);        % find normalizing factor for each data point to prevent underflow when using exp()
            px=exp(py-mx(wk,:));    % find normalized probability of each mixture for each datapoint
            ps=sum(px,1);           % total normalized likelihood of each data point
            px=px./ps(wk,:);        % relative mixture probabilities for each data point (columns sum to 1)
            lpx(ii)=log(ps)+mx;
            pk=pk+px*wx(ii);        % effective fraction of data points for each mixture (could be zero due to underflow)
            sx=sx+px*xsw(ii,:);     % accumulator for mean calculation
            sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix));   % accumulator for variance calculation
        end
        g=lpx*wx;                   % total log probability summed over all data points
        gg(j)=g;                    % save convergence history
        w=pk;                       % w(k,1) normalize to get the column of weights
        if pk                       % if all elements of pk are non-zero
            m=sx./pk(:,wp);         % find mean and mean square
            v=sx2./pk(:,wpl);
        else
            wm=pk==0;               % mask indicating mixtures with zero weights
            nz=sum(wm);             % number of zero-weight mixtures
            [vv,mk]=sort(lpx);      % find the lowest probability data points
            m=zeros(k,p);           % initialize means and variances to zero (variances are floored later)
            v=zeros(k,pl);
            m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
            w(wm)=1/n;              % set these weights non-zero
            w=w*n/(n+nz);           % normalize so the weights sum to unity
            wm=~wm;                 % mask for non-zero weights
            m(wm,:)=sx(wm,:)./pk(wm,wp);    % recalculate means and variances for mixtures with a non-zero weight
            v(wm,:)=sx2(wm,:)./pk(wm,wpl);
        end
        v=v-m(:,cix).*m(:,rix);     % subtract off mean squared
        if g-g1<=th && j>1
            if ~ss, break; end      % stop
            ss=ss-1;                % stop next time
        end
    end
    if sd                           % we need to calculate the final probabilities
        pp=lpx'-0.5*p*log(2*pi)-lsx;    % log of total probability of each data point
        gg=gg(1:j)-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
        g=gg(end);
        % gg'                       % *** DEBUG ONLY ***
        m=m1;                       % back up to previous iteration
        v=zeros(p,p,k);             % reserve space for k full covariance matrices
        trv=0;                      % sum of variance matrix traces
        for ik=1:k                  % loop for each mixture to apply variance floor
            [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
            dvk=max(diag(dvk),c);   % apply variance floor to eigenvalues
            v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
            trv=trv+sum(dvk);       % add trace to the sum
        end
        w=w1;
        mm=sum(m,1)/k;
        f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;
    else
        v1=v;                       % lower triangular form
        v=zeros(p,p,k);             % reserve space for k full covariance matrices
        for ik=1:k                  % loop for each mixture to apply variance floor
            [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
            dvk=max(diag(dvk),c);   % apply variance floor
            v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
        end
    end
    m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
    v=v.*repmat(sx0'*sx0,[1 1 k]);              % and covariances
end
if l==0                             % suppress the first three output arguments if l==0
    m=g;
    v=f;
    w=pp;
end
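
Example (a minimal sketch): with L=0 the first three outputs are suppressed, so an
already-fitted model {m,v,w} can be scored against data x without refitting:

    [g,f,pp]=v_gaussmix(x,[],0,m,v,w);   % g = average log probability, pp(n,1) = per-point log probability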