Home > voicebox > gaussmix.m

gaussmix

PURPOSE ^

GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

SYNOPSIS ^

function [m,v,w,g,f,pp,gg]=gaussmix(x,c,l,m0,v0,w0,wx)

DESCRIPTION ^

GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

 Usage:
    (1) [m,v,w]=gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
    (2) [m,v,w]=gaussmix(x,[],[],k,'v);    % create GMM with k mixtures and full covariances

 Inputs: n data values, k mixtures, p parameters, l loops

     X(n,p)   Input data vectors, one per row.
     c(1)     Minimum variance of normalized data (Use [] to take default value of 1/n^2)
     L        The integer portion of l gives a maximum loop count. The fractional portion gives
              an optional stopping threshold. Iteration will cease if the increase in
              log likelihood density per data point is less than this value. Thus l=10.001 will
              stop after 10 iterations or when the increase in log likelihood falls below
              0.001.
              As a special case, if L=0, then the first three outputs are omitted.
              Use [] to take default value of 100.0001
     M0       Number of mixtures required (or initial mixture means - see below)
     V0       Initialization mode:
                'f'    Initialize with K randomly selected data points [default]
                'p'    Initialize with centroids and variances of random partitions
                'k'    k-means algorithm ('kf' and 'kp' determine initialization)
                'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
                's'    do not scale data during initialization to have equal variances
                'm'    M0 contains the initial centres
                'v'    full covariance matrices
              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
     W0(n,1)  Data point weights

     Alternatively, initial values for M0, V0 and W0 can be given  explicitly:

     M0(k,p)  Initial mixture means, one row per mixture.
     V0(k,p)  Initial mixture variances, one row per mixture.
      or V0(p,p,k)  one full-covariance matrix per mixture
     W0(k,1)  Initial mixture weights, one per mixture. The weights should sum to unity.
     WX(n,1)  Data point weights

 Outputs: (Note that M, V and W are omitted if L==0)

     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
       or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
     G       Average log probability of the input data points.
     F        Fisher's Discriminant measures how well the data divides into classes.
              It is the ratio of the between-mixture variance to the average mixture variance: a
              high value means the classes (mixtures) are well separated.
     PP(n,1)  Log probability of each data point
     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end

 The fitting procedure uses one of several initialization methods to create an initial guess
 for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
 the guess. Although the EM algorithm is deterministic, the initialization procedures use 
 random numbers and so the routine will not give identical answers if you call it multiple
 times with the same input data.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [m,v,w,g,f,pp,gg]=gaussmix(x,c,l,m0,v0,w0,wx)
0002 %GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)
0003 %
0004 % Usage:
0005 %    (1) [m,v,w]=gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
0006 %    (2) [m,v,w]=gaussmix(x,[],[],k,'v);    % create GMM with k mixtures and full covariances
0007 %
0008 % Inputs: n data values, k mixtures, p parameters, l loops
0009 %
0010 %     X(n,p)   Input data vectors, one per row.
0011 %     c(1)     Minimum variance of normalized data (Use [] to take default value of 1/n^2)
0012 %     L        The integer portion of l gives a maximum loop count. The fractional portion gives
0013 %              an optional stopping threshold. Iteration will cease if the increase in
0014 %              log likelihood density per data point is less than this value. Thus l=10.001 will
0015 %              stop after 10 iterations or when the increase in log likelihood falls below
0016 %              0.001.
0017 %              As a special case, if L=0, then the first three outputs are omitted.
0018 %              Use [] to take default value of 100.0001
0019 %     M0       Number of mixtures required (or initial mixture means - see below)
0020 %     V0       Initialization mode:
0021 %                'f'    Initialize with K randomly selected data points [default]
0022 %                'p'    Initialize with centroids and variances of random partitions
0023 %                'k'    k-means algorithm ('kf' and 'kp' determine initialization)
0024 %                'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
0025 %                's'    do not scale data during initialization to have equal variances
0026 %                'm'    M0 contains the initial centres
0027 %                'v'    full covariance matrices
0028 %              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
0029 %     W0(n,1)  Data point weights
0030 %
0031 %     Alternatively, initial values for M0, V0 and W0 can be given  explicitly:
0032 %
0033 %     M0(k,p)  Initial mixture means, one row per mixture.
0034 %     V0(k,p)  Initial mixture variances, one row per mixture.
0035 %      or V0(p,p,k)  one full-covariance matrix per mixture
0036 %     W0(k,1)  Initial mixture weights, one per mixture. The weights should sum to unity.
0037 %     WX(n,1)  Data point weights
0038 %
0039 % Outputs: (Note that M, V and W are omitted if L==0)
0040 %
0041 %     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
0042 %     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
0043 %       or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
0044 %     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
0045 %     G       Average log probability of the input data points.
0046 %     F        Fisher's Discriminant measures how well the data divides into classes.
0047 %              It is the ratio of the between-mixture variance to the average mixture variance: a
0048 %              high value means the classes (mixtures) are well separated.
0049 %     PP(n,1)  Log probability of each data point
0050 %     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
0051 %
0052 % The fitting procedure uses one of several initialization methods to create an initial guess
0053 % for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
0054 % the guess. Although the EM algorithm is deterministic, the initialization procedures use
0055 % random numbers and so the routine will not give identical answers if you call it multiple
0056 % times with the same input data.
0057 
0058 %  Bugs/Suggestions
0059 %     (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
0060 %     (2) Other initialization options:
0061 %              'l'    LBG algorithm
0062 %              'm'    Move-means (dog-rabbit) algorithm
0063 %     (3) Allow updating of weights-only, not means/variances
0064 
0065 %      Copyright (C) Mike Brookes 2000-2009
0066 %      Version: $Id: gaussmix.m 7784 2016-04-15 11:09:50Z dmb $
0067 %
0068 %   VOICEBOX is a MATLAB toolbox for speech processing.
0069 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0070 %
0071 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0072 %   This program is free software; you can redistribute it and/or modify
0073 %   it under the terms of the GNU General Public License as published by
0074 %   the Free Software Foundation; either version 2 of the License, or
0075 %   (at your option) any later version.
0076 %
0077 %   This program is distributed in the hope that it will be useful,
0078 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0079 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0080 %   GNU General Public License for more details.
0081 %
0082 %   You can obtain a copy of the GNU General Public License from
0083 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0084 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0085 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0086 
0087 [n,p]=size(x);
0088 wn=ones(n,1);
0089 mx0=sum(x,1)/n;         % calculate mean and variance of input data in each dimension
0090 vx0=sum(x.^2,1)/n-mx0.^2;
0091 sx0=sqrt(vx0);
0092 sx0(sx0==0)=1;      % do not divide by zero when scaling
0093 scaled=0;           % data is not yet scaled
0094 memsize=voicebox('memsize');    % set memory size to use
0095 if isempty(c)
0096     c=1/n^2;
0097 else
0098     c=c(1);         % just to prevent legacy code failing
0099 end
0100 fulliv=0;           % initial variance is not full
0101 if isempty(l)
0102     l=100+1e-4;         % max loop count + stopping threshold
0103 end
0104 if nargin<5 || isempty(v0) || ischar(v0)             % no initial values specified for m0, v0, w0
0105     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0106     %  No initialvalues given, so we must use k-means or equivalent
0107     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0108     if nargin<6
0109         if nargin<5 || isempty(v0)
0110             v0='hf';                 % default initialization mode: hf
0111         end
0112         wx=wn;                      % no data point weights
0113     else
0114         wx=w0(:);                   % data point weights
0115     end
0116     if any(v0=='m')
0117         k=size(m0,1);
0118     else
0119         k=m0;
0120     end
0121     fv=any(v0=='v');                % full covariance matrices requested
0122     if n<=k                         % each data point can have its own mixture
0123         xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0124         m=xs(mod((1:k)-1,n)+1,:);   % just include all points several times
0125         v=zeros(k,p);               % will be set to floor later
0126         w=zeros(k,1);
0127         w(1:n)=1/n;
0128         if l>0
0129             l=0.1;                  % no point in iterating
0130         end
0131     else                            % more points than mixtures
0132         if any(v0=='s')
0133             xs=x;                   % do not scale data during initialization
0134         else
0135             xs=(x-mx0(wn,:))./sx0(wn,:);  % else scale now
0136             if any(v0=='m')
0137                 m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % scale specified means as well
0138             end
0139         end
0140         w=repmat(1/k,k,1);                      % all mixtures equally likely
0141         if any(v0=='k')                         % k-means initialization
0142             if any(v0=='m')
0143                 [m,e,j]=v_kmeans(xs,k,m);
0144             elseif any(v0=='p')
0145                 [m,e,j]=v_kmeans(xs,k,'p');
0146             else
0147                 [m,e,j]=v_kmeans(xs,k,'f');
0148             end
0149         elseif any(v0=='h')                     % k-harmonic means initialization
0150             if any(v0=='m')
0151                 [m,e,j]=kmeanhar(xs,k,[],4,m);
0152             else
0153                 if any(v0=='p')
0154                     [m,e,j]=kmeanhar(xs,k,[],4,'p');
0155                 else
0156                     [m,e,j]=kmeanhar(xs,k,[],4,'f');
0157                 end
0158             end
0159         elseif any(v0=='p')                     % Initialize using a random partition
0160             j=ceil(rand(n,1)*k);                % allocate to random clusters
0161             j(rnsubset(k,n))=1:k;               % but force at least one point per cluster
0162             for i=1:k
0163                 m(i,:)=mean(xs(j==i,:),1);
0164             end
0165         else
0166             if any(v0=='m')
0167                 m=m0;                           % use specified centres
0168             else
0169                 m=xs(rnsubset(k,n),:);          % Forgy initialization: sample k centres without replacement [default]
0170             end
0171             [e,j]=v_kmeans(xs,k,m,0);             % find out the cluster allocation
0172         end
0173         if any(v0=='s')
0174             xs=(x-mx0(wn,:))./sx0(wn,:);      % scale data now if not done previously
0175         end
0176         v=zeros(k,p);                   % diagonal covariances
0177         w=zeros(k,1);
0178         for i=1:k
0179             ni=sum(j==i);               % number assigned to this centre
0180             w(i)=(ni+1)/(n+k);          % weight of this mixture
0181             if ni
0182                 v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
0183             else
0184                 v(i,:)=zeros(1,p);
0185             end
0186         end
0187     end
0188 else
0189     %%%%%%%%%%%%%%%%%%%%%%%%
0190     % use initial values given as input parameters
0191     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0192     [k,p]=size(m0);
0193     xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0194     m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);          % and the means
0195     v=v0;
0196     w=w0;
0197     fv=ndims(v)>2 || size(v,1)>k;                       % full covariance matrix is supplied
0198     if fv
0199         mk=eye(p)==0;                                    % off-diagonal elements
0200         fulliv=any(v(repmat(mk,[1 1 k]))~=0);            % check if any are non-zero
0201         if ~fulliv
0202             v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1);   % just pick out and scale the diagonal elements for now
0203         else
0204             v=v./repmat(sx0'*sx0,[1 1 k]);              % scale the full covariance matrix
0205         end
0206     end
0207     if nargin<7
0208         wx=wn;              % no data point weights
0209     end
0210 end
0211 if length(wx)~=n
0212     error('%d datapoints but %d weights',n,length(wx));
0213 end
0214 lsx=sum(log(sx0));
0215 xsw=xs.*repmat(wx,1,p); % weighted data points
0216 nwt=sum(wx);        % number of data points counting duplicates
0217 if ~fulliv          % initializing with diagonal covariance
0218     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0219     % Diagonal Covariance matrices  %
0220     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0221     v=max(v,c);         % apply the lower bound
0222     xs2=xs.^2.*repmat(wx,1,p);          % square and weight the data for variance calculations
0223 
0224     % If data size is large then do calculations in chunks
0225 
0226     nb=min(n,max(1,floor(memsize/(8*p*k))));    % chunk size for testing data points
0227     nl=ceil(n/nb);                  % number of chunks
0228     jx0=n-(nl-1)*nb;                % size of first chunk
0229 
0230     im=repmat(1:k,1,nb); im=im(:);
0231     th=(l-floor(l))*n;
0232     sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
0233     lp=floor(l)+sd;   % extra loop needed to calculate final G value
0234 
0235     lpx=zeros(1,n);             % log probability of each data point
0236     wk=ones(k,1);
0237     wp=ones(1,p);
0238     wnb=ones(1,nb);
0239     wnj=ones(1,jx0);
0240 
0241     % EM loop
0242 
0243     g=0;                           % dummy initial value for comparison
0244     gg=zeros(lp+1,1);
0245     ss=sd;                       % initialize stopping count (0 or 1)
0246     for j=1:lp
0247         g1=g;                    % save previous log likelihood (2*pi factor omitted)
0248         m1=m;                       % save previous means, variances and weights
0249         v1=v;
0250         w1=w;
0251         vi=-0.5*v.^(-1);                % data-independent scale factor in exponent
0252         lvm=log(w)-0.5*sum(log(v),2);   % log of external scale factor (excluding -0.5*p*log(2pi) term)
0253 
0254         % first do partial chunk (of length jx0)
0255 
0256         jx=jx0;
0257         ii=1:jx;                        % indices of data points in this chunk
0258         kk=repmat(ii,k,1);              % kk(jx,k): one row per data point, one column per mixture
0259         km=repmat(1:k,1,jx);            % km(jx,k): one row per data point, one column per mixture
0260         py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj); % py(k,jx) pdf of each point with each mixture
0261         mx=max(py,[],1);                % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
0262         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0263         ps=sum(px,1);                   % total normalized likelihood of each data point
0264         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0265         lpx(ii)=log(ps)+mx;
0266         pk=px*wx(ii);                       % pk(k,1) effective number of data points for each mixture (could be zero due to underflow)
0267         sx=px*xsw(ii,:);
0268         sx2=px*xs2(ii,:);
0269         for il=2:nl                     % process the data points in chunks
0270             ix=jx+1;
0271             jx=jx+nb;                   % increment upper limit
0272             ii=ix:jx;                   % indices of data points in this chunk
0273             kk=repmat(ii,k,1);
0274             py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
0275             mx=max(py,[],1);            % find normalizing factor for each data point to prevent underflow when using exp()
0276             px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
0277             ps=sum(px,1);               % total normalized likelihood of each data point
0278             px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
0279             lpx(ii)=log(ps)+mx;
0280             pk=pk+px*wx(ii);                % pk(k,1) effective number of data points for each mixture (could be zero due to underflow)
0281             sx=sx+px*xsw(ii,:);
0282             sx2=sx2+px*xs2(ii,:);
0283         end
0284         g=lpx*wx;                       % total log probability summed over all data points
0285         gg(j)=g;                        % save log prob at each iteration
0286         w=pk/nwt;                       % normalize to get the weights
0287         if pk                           % if all elements of pk are non-zero
0288             m=sx./pk(:,wp);             % calculate mixture means
0289             v=sx2./pk(:,wp);            % and variances
0290         else
0291             wm=pk==0;                   % mask indicating mixtures with zero weights
0292             nz=sum(wm);                  % number of zero-weight mixtures
0293             [vv,mk]=sort(lpx);          % find the lowest probability data points
0294             m=zeros(k,p);               % initialize means and variances to zero (variances are floored later)
0295             v=m;
0296             m(wm,:)=xs(mk(1:nz),:);     % set zero-weight mixture means to worst-fitted data points
0297             w(wm)=1/n;                   % set these weights non-zero
0298             w=w*n/(n+nz);                % normalize so the weights sum to unity
0299             wm=~wm;                     % mask for non-zero weights
0300             m(wm,:)=sx(wm,:)./pk(wm,wp);  % recalculate means and variances for mixtures with a non-zero weight
0301             v(wm,:)=sx2(wm,:)./pk(wm,wp);
0302         end
0303         v=max(v-m.^2,c);                % apply floor to variances
0304         if g-g1<=th && j>1
0305             if ~ss, break; end  %  stop
0306             ss=ss-1;       % stop next time
0307         end
0308 
0309     end
0310     if sd && ~fv  % we need to calculate the final probabilities
0311         pp=lpx'-0.5*p*log(2*pi)-lsx;   % log of total probability of each data point
0312         gg=gg(1:j)/n-0.5*p*log(2*pi)-lsx;    % average log prob at each iteration
0313         g=gg(end);
0314         %     gg' % *** DEBUG ***
0315         m=m1;       % back up to previous iteration
0316         v=v1;
0317         w=w1;
0318         mm=sum(m,1)/k;
0319         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
0320     end
0321     if ~fv
0322         m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
0323         v=v.*repmat(sx0.^2,k,1);                % and variances
0324     else
0325         v1=v;
0326         v=zeros(p,p,k);
0327         mk=eye(p)==1;                           % mask for diagonal elements
0328         v(repmat(mk,[1 1 k]))=v1';              % set from v1
0329     end
0330 end
0331 if fv              % check if full covariance matrices were requested
0332     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0333     % Full Covariance matrices  %
0334     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0335     pl=p*(p+1)/2;
0336     lix=1:p^2;
0337     cix=repmat(1:p,p,1);
0338     rix=cix';
0339     lix(cix>rix)=[];                                        % index of lower triangular elements
0340     cix=cix(lix);                                           % index of lower triangular columns
0341     rix=rix(lix);                                           % index of lower triangular rows
0342     dix=find(rix==cix);
0343     lixi=zeros(p,p);
0344     lixi(lix)=1:pl;
0345     lixi=lixi';
0346     lixi(lix)=1:pl;                                        % reverse index to build full matrices
0347     v=reshape(v,p^2,k);
0348     v=v(lix,:)';                                            % lower triangular in rows
0349 
0350     % If data size is large then do calculations in chunks
0351 
0352     nb=min(n,max(1,floor(memsize/(24*p*k))));    % chunk size for testing data points
0353     nl=ceil(n/nb);                  % number of chunks
0354     jx0=n-(nl-1)*nb;                % size of first chunk
0355     %
0356     th=(l-floor(l))*n;
0357     sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
0358     lp=floor(l)+sd;   % extra loop needed to calculate final G value
0359     %
0360     lpx=zeros(1,n);             % log probability of each data point
0361     wk=ones(k,1);
0362     wp=ones(1,p);
0363     wpl=ones(1,pl);             % 1 index for lower triangular matrix
0364     wnb=ones(1,nb);
0365     wnj=ones(1,jx0);
0366 
0367     % EM loop
0368 
0369     g=0;                        % dummy initial value for comparison
0370     gg=zeros(lp+1,1);
0371     ss=sd;                      % initialize stopping count (0 or 1)
0372     vi=zeros(p*k,p);            % stack of k inverse cov matrices each size p*p
0373     vim=zeros(p*k,1);           % stack of k vectors of the form inv(v)*m
0374     mtk=vim;                      % stack of k vectors of the form m
0375     lvm=zeros(k,1);
0376     wpk=repmat((1:p)',k,1);
0377     for j=1:lp
0378         g1=g;                   % save previous log likelihood (2*pi factor omitted)
0379         m1=m;                    % save previous means, variances and weights
0380         v1=v;
0381         w1=w;
0382         for ik=1:k
0383 
0384             % these lines added for debugging only
0385             %             vk=reshape(v(k,lixi),p,p);
0386             %             condk(ik)=cond(vk);
0387             %%%%%%%%%%%%%%%%%%%%
0388             [uvk,dvk]=eig(reshape(v(ik,lixi),p,p));    % convert lower triangular to full and find eigenvalues
0389             dvk=max(diag(dvk),c);                    % apply variance floor to eigenvalues
0390             vik=-0.5*uvk*diag(dvk.^(-1))*uvk';      % calculate inverse
0391             vi((ik-1)*p+(1:p),:)=vik;               % vi contains all mixture inverses stacked on top of each other
0392             vim((ik-1)*p+(1:p))=vik*m(ik,:)';       % vim contains vi*m for all mixtures stacked on top of each other
0393             mtk((ik-1)*p+(1:p))=m(ik,:)';           % mtk contains all mixture means stacked on top of each other
0394             lvm(ik)=log(w(ik))-0.5*sum(log(dvk));       % vm contains the weighted sqrt of det(vi) for each mixture
0395         end
0396         %
0397         %         % first do partial chunk
0398         %
0399         jx=jx0;
0400         ii=1:jx;
0401         xii=xs(ii,:).';
0402         py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
0403         mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0404         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0405         ps=sum(px,1);                   % total normalized likelihood of each data point
0406         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0407         lpx(ii)=log(ps)+mx;
0408         pk=px*wx(ii);                       % effective number of data points for each mixture (could be zero due to underflow)
0409         sx=px*xsw(ii,:);
0410         sx2=px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation (lower tri cov matrix as a row)
0411         for il=2:nl
0412             ix=jx+1;
0413             jx=jx+nb;        % increment upper limit
0414             ii=ix:jx;
0415             xii=xs(ii,:).';
0416             py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
0417             mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0418             px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0419             ps=sum(px,1);                   % total normalized likelihood of each data point
0420             px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0421             lpx(ii)=log(ps)+mx;
0422             pk=pk+px*wx(ii);                    % effective number of data points for each mixture (could be zero due to underflow)
0423             sx=sx+px*xsw(ii,:);             % accumulator for mean calculation
0424             sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation
0425         end
0426         g=lpx*wx;                   % total log probability summed over all data points
0427         gg(j)=g;                    % save convergence history
0428         w=pk/nwt;                   % w(k,1) normalize to get the column of weights
0429         if pk                       % if all elements of pk are non-zero
0430             m=sx./pk(:,wp);         % find mean and mean square
0431             v=sx2./pk(:,wpl);
0432         else
0433             wm=pk==0;                       % mask indicating mixtures with zero weights
0434             nz=sum(wm);                  % number of zero-weight mixtures
0435             [vv,mk]=sort(lpx);             % find the lowest probability data points
0436             m=zeros(k,p);                   % initialize means and variances to zero (variances are floored later)
0437             v=zeros(k,pl);
0438             m(wm,:)=xs(mk(1:nz),:);                % set zero-weight mixture means to worst-fitted data points
0439             w(wm)=1/n;                      % set these weights non-zero
0440             w=w*n/(n+nz);                   % normalize so the weights sum to unity
0441             wm=~wm;                         % mask for non-zero weights
0442             m(wm,:)=sx(wm,:)./pk(wm,wp);  % recalculate means and variances for mixtures with a non-zero weight
0443             v(wm,:)=sx2(wm,:)./pk(wm,wpl);
0444         end
0445         v=v-m(:,cix).*m(:,rix);                 % subtract off mean squared
0446         if g-g1<=th && j>1
0447             if ~ss, break; end  %  stop
0448             ss=ss-1;       % stop next time
0449         end
0450     end
0451     if sd  % we need to calculate the final probabilities
0452         pp=lpx'-0.5*p*log(2*pi)-lsx;   % log of total probability of each data point
0453         gg=gg(1:j)/nwt-0.5*p*log(2*pi)-lsx;    % average log prob at each iteration
0454         g=gg(end);
0455         %             gg' % *** DEBUG ONLY ***
0456         m=m1;                                           % back up to previous iteration
0457         v=zeros(p,p,k);                                 % reserve spave for k full covariance matrices
0458         trv=0;                                          % sum of variance matrix traces
0459         for ik=1:k                                      % loop for each mixture to apply variance floor
0460             [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0461             dvk=max(diag(dvk),c);                       % apply variance floor to eigenvalues
0462             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0463             trv=trv+sum(dvk);                           % add trace to the sum
0464         end
0465         w=w1;
0466         mm=sum(m,1)/k;
0467         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;
0468     else
0469         v1=v;                                           % lower triangular form
0470         v=zeros(p,p,k);                                 % reserve spave for k full covariance matrices
0471         for ik=1:k                                      % loop for each mixture to apply variance floor
0472             [uvk,dvk,]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0473             dvk=max(diag(dvk),c);                       % apply variance floor
0474             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0475         end
0476     end
0477     m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);  % unscale means
0478     v=v.*repmat(sx0'*sx0,[1 1 k]);
0479 end
0480 if l==0         % suppress the first three output arguments if l==0
0481     m=g;
0482     v=f;
0483     w=pp;
0484 end
0485

Generated on Tue 10-Oct-2017 08:30:10 by m2html © 2003