v_gaussmix

PURPOSE

V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

SYNOPSIS

function [m,v,w,g,f,pp,gg]=v_gaussmix(x,c,l,m0,v0,w0,wx)

DESCRIPTION

V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

 Usage:
    (1) [m,v,w]=v_gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
    (2) [m,v,w]=v_gaussmix(x,[],[],k,'v');    % create GMM with k mixtures and full covariances

 Inputs: n data values, k mixtures, p parameters, l loops

     X(n,p)   Input data vectors, one per row.
     C(1)     Minimum variance of normalized data (Use [] to take default value of n^-2)
     L        The integer portion of L gives a maximum loop count. The fractional portion gives
              an optional stopping threshold. Iteration will cease if the increase in
              log likelihood density per data point is less than this value. Thus L=10.001 will
              stop after 10 iterations or when the increase in log likelihood falls below
              0.001.
              As a special case, if L=0, then the first three outputs are omitted.
              Use [] to take default value of 100.0001
     M0       Number of mixtures required (or initial mixture means M0(k,p) - see below)
     V0       Initialization mode:
                'f'    Initialize with K randomly selected data points [default]
                'p'    Initialize with centroids and variances of random partitions
                'k'    k-means algorithm ('kf' and 'kp' determine initialization)
                'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
                's'    use unscaled data during initialization phase instead of scaling it first
                'm'    M0(k,p) contains the initial centres
                'v'    full covariance matrices
              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
     W0(n,1)  Data point weights (need not sum to unity)

   Alternatively, initial values for M0, V0 and W0 can be given explicitly:

     M0(k,p)       Initial mixture means, one row per mixture.
     V0(k,p)       Initial mixture variances, one row per mixture.
      or V0(p,p,k)    one full-covariance matrix per mixture
     W0(k,1)       Initial mixture weights, one per mixture. The weights must sum to unity.
     WX(n,1)       Data point weights (need not sum to unity)
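
   For example, a fit from explicit starting values might look like this (a minimal
   sketch: the data matrix x(n,p) is assumed to exist already, and the starting
   values shown are illustrative only):

      [n,p]=size(x); k=3;
      m0=x(1:k,:);                 % initial means: the first k data points
      v0=ones(k,p);                % initial variances: unity in every dimension
      w0=repmat(1/k,k,1);          % initial weights (must sum to unity)
      [m,v,w]=v_gaussmix(x,[],30.001,m0,v0,w0); % at most 30 EM iterations, stopping early
                                                % if the gain per point falls below 0.001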

 Outputs: (Note that M, V and W are omitted if L==0)

     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
       or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
     G        Average log probability of the input data points.
     F        Fisher's Discriminant measures how well the data divides into classes.
              It is the ratio of the between-mixture variance to the average mixture variance: a
              high value means the classes (mixtures) are well separated.
     PP(n,1)  Log probability of each data point
     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
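
   For example, the diagnostic outputs can be inspected after a fit (a minimal
   sketch, again assuming x(n,p) already exists):

      [m,v,w,g,f,pp]=v_gaussmix(x,[],[],3); % fit a 3-mixture diagonal-covariance GMM
      % g: a higher average log probability indicates a better fit
      % f: a high Fisher discriminant means the mixtures are well separated
      poor=find(pp<g-5);                    % illustrative check: points whose log
                                            % probability is far below the average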

 The fitting procedure uses one of several initialization methods to create an initial guess
 for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
 the guess. Although the EM algorithm is deterministic, the initialization procedures use
 random numbers and so the routine will not give identical answers if you call it multiple
 times with the same input data. See v_randvec() for generating GMM data vectors.
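
 For example, data can be drawn from a known GMM with v_randvec() and then refitted
 (a minimal sketch; the v_randvec() argument order shown here is assumed from that
 function's own documentation):

    mt=[0 0; 4 4];                 % true means: two mixtures in two dimensions
    vt=ones(2,2);                  % true diagonal variances
    wt=[0.5; 0.5];                 % true weights
    x=v_randvec(1000,mt,vt,wt);    % draw 1000 data vectors from the true GMM
    [m,v,w]=v_gaussmix(x,[],[],2); % the fitted m,v,w should approximate mt,vt,wt
                                   % up to a permutation of the mixture order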

CROSS-REFERENCE INFORMATION

This function calls: v_voicebox, v_kmeans, v_kmeanhar, v_rnsubset
This function is called by:

SOURCE CODE

0001 function [m,v,w,g,f,pp,gg]=v_gaussmix(x,c,l,m0,v0,w0,wx)
0002 %V_GAUSSMIX fits a Gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)
0003 %
0004 % Usage:
0005 %    (1) [m,v,w]=v_gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
0006 %    (2) [m,v,w]=v_gaussmix(x,[],[],k,'v');    % create GMM with k mixtures and full covariances
0007 %
0008 % Inputs: n data values, k mixtures, p parameters, l loops
0009 %
0010 %     X(n,p)   Input data vectors, one per row.
0011 %     C(1)     Minimum variance of normalized data (Use [] to take default value of n^-2)
0012 %     L        The integer portion of L gives a maximum loop count. The fractional portion gives
0013 %              an optional stopping threshold. Iteration will cease if the increase in
0014 %              log likelihood density per data point is less than this value. Thus L=10.001 will
0015 %              stop after 10 iterations or when the increase in log likelihood falls below
0016 %              0.001.
0017 %              As a special case, if L=0, then the first three outputs are omitted.
0018 %              Use [] to take default value of 100.0001
0019 %     M0       Number of mixtures required (or initial mixture means M0(k,p) - see below)
0020 %     V0       Initialization mode:
0021 %                'f'    Initialize with K randomly selected data points [default]
0022 %                'p'    Initialize with centroids and variances of random partitions
0023 %                'k'    k-means algorithm ('kf' and 'kp' determine initialization)
0024 %                'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
0025 %                's'    use unscaled data during initialization phase instead of scaling it first
0026 %                'm'    M0(k,p) contains the initial centres
0027 %                'v'    full covariance matrices
0028 %              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
0029 %     W0(n,1)  Data point weights (need not sum to unity)
0030 %
0031 %   Alternatively, initial values for M0, V0 and W0 can be given explicitly:
0032 %
0033 %     M0(k,p)       Initial mixture means, one row per mixture.
0034 %     V0(k,p)       Initial mixture variances, one row per mixture.
0035 %      or V0(p,p,k)    one full-covariance matrix per mixture
0036 %     W0(k,1)       Initial mixture weights, one per mixture. The weights must sum to unity.
0037 %     WX(n,1)       Data point weights (need not sum to unity)
0038 %
0039 % Outputs: (Note that M, V and W are omitted if L==0)
0040 %
0041 %     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
0042 %     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
0043 %       or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
0044 %     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
0045 %     G        Average log probability of the input data points.
0046 %     F        Fisher's Discriminant measures how well the data divides into classes.
0047 %              It is the ratio of the between-mixture variance to the average mixture variance: a
0048 %              high value means the classes (mixtures) are well separated.
0049 %     PP(n,1)  Log probability of each data point
0050 %     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
0051 %
0052 % The fitting procedure uses one of several initialization methods to create an initial guess
0053 % for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
0054 % the guess. Although the EM algorithm is deterministic, the initialization procedures use
0055 % random numbers and so the routine will not give identical answers if you call it multiple
0056 % times with the same input data. See v_randvec() for generating GMM data vectors.
0057 
0058 %  Bugs/Suggestions
0059 %     (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
0060 %     (2) Other initialization options:
0061 %              'l'    LBG algorithm
0062 %              'm'    Move-means (dog-rabbit) algorithm
0063 %     (3) Allow updating of weights-only, not means/variances
0064 %     (4) Allow freezing of means and/or variances
0065 
0066 %      Copyright (C) Mike Brookes 2000-2009
0067 %      Version: $Id: v_gaussmix.m 10865 2018-09-21 17:22:45Z dmb $
0068 %
0069 %   VOICEBOX is a MATLAB toolbox for speech processing.
0070 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0071 %
0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0073 %   This program is free software; you can redistribute it and/or modify
0074 %   it under the terms of the GNU General Public License as published by
0075 %   the Free Software Foundation; either version 2 of the License, or
0076 %   (at your option) any later version.
0077 %
0078 %   This program is distributed in the hope that it will be useful,
0079 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0080 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0081 %   GNU General Public License for more details.
0082 %
0083 %   You can obtain a copy of the GNU General Public License from
0084 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0085 %   Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0087 [n,p]=size(x); % n = number of training values, p = dimension of data vector
0088 wn=ones(n,1);
0089 memsize=v_voicebox('memsize');    % set memory size to use
0090 if isempty(c)
0091     c=1/n^2;
0092 else
0093     c=c(1);         % just to prevent legacy code failing
0094 end
0095 fulliv=0;           % initial variance is not full
0096 if isempty(l)
0097     l=100+1e-4;         % max loop count + stopping threshold
0098 end
0099 if nargin<5 || isempty(v0) || ischar(v0)             % no initial values specified for m0, v0, w0
0100     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0101 %  No initial values given, so we must use k-means or equivalent
0102     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0103     if nargin<6
0104         if nargin<5 || isempty(v0)
0105             v0='hf';                 % default initialization mode: hf
0106         end
0107         wx=wn;                      % no data point weights
0108     else
0109         wx=w0(:);                   % data point weights
0110     end
0111     wx=wx(:)/sum(wx);               % normalize and force to be a column vector
0112     if any(v0=='m')
0113         k=size(m0,1);
0114     else
0115         k=m0;
0116     end
0117     fv=any(v0=='v');                % full covariance matrices requested
0118     mx0=wx'*x;                      % calculate mean of input data in each dimension
0119     vx0=wx'*x.^2-mx0.^2;            % calculate variance of input data in each dimension
0120     sx0=sqrt(vx0)+(vx0==0);         % scale factor: std of input data in each dimension (or 1 if std==0)
0121     if n<=k                         % each data point can have its own mixture
0122         xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0123         m=xs(mod((1:k)-1,n)+1,:);   % just include all points several times
0124         v=zeros(k,p);               % will be set to floor later
0125         w=zeros(k,1);
0126         w(1:n)=1/n;
0127         if l>0
0128             l=0.1;                  % no point in iterating
0129         end
0130     else                            % more points than mixtures
0131         if any(v0=='s')
0132             xs=x;                   % do not scale data during initialization
0133         else
0134             xs=(x-mx0(wn,:))./sx0(wn,:);  % else scale now
0135             if any(v0=='m')
0136                 m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % scale specified means as well
0137             end
0138         end
0139         w=repmat(1/k,k,1);                      % all mixtures equally likely
0140         if any(v0=='k')                         % k-means initialization
0141             if any(v0=='m')
0142                 [m,e,j]=v_kmeans(xs,k,m);
0143             elseif any(v0=='p')
0144                 [m,e,j]=v_kmeans(xs,k,'p');
0145             else
0146                 [m,e,j]=v_kmeans(xs,k,'f');
0147             end
0148         elseif any(v0=='h')                     % k-harmonic means initialization
0149             if any(v0=='m')
0150                 [m,e,j]=v_kmeanhar(xs,k,[],4,m);
0151             else
0152                 if any(v0=='p')
0153                     [m,e,j]=v_kmeanhar(xs,k,[],4,'p');
0154                 else
0155                     [m,e,j]=v_kmeanhar(xs,k,[],4,'f');
0156                 end
0157             end
0158         elseif any(v0=='p')                     % Initialize using a random partition
0159             j=ceil(rand(n,1)*k);                % allocate to random clusters
0160             j(v_rnsubset(k,n))=1:k;               % but force at least one point per cluster
0161             for i=1:k
0162                 m(i,:)=mean(xs(j==i,:),1);
0163             end
0164         else
0165             if any(v0=='m')
0166                 m=m0;                           % use specified centres
0167             else
0168                 m=xs(v_rnsubset(k,n),:);          % Forgy initialization: sample k centres without replacement [default]
0169             end
0170             [e,j]=v_kmeans(xs,k,m,0);             % find out the cluster allocation
0171         end
0172         if any(v0=='s')
0173             xs=(x-mx0(wn,:))./sx0(wn,:);      % scale data now if not done previously
0174         end
0175         v=zeros(k,p);                   % diagonal covariances
0176         w=zeros(k,1);
0177         for i=1:k
0178             ni=sum(j==i);               % number assigned to this centre
0179             w(i)=(ni+1)/(n+k);          % weight of this mixture
0180             if ni
0181                 v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
0182             else
0183                 v(i,:)=zeros(1,p);
0184             end
0185         end
0186     end
0187 else
0188     %%%%%%%%%%%%%%%%%%%%%%%%
0189     % use initial values given as input parameters
0190     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0191     if nargin<7
0192         wx=wn;              % no data point weights
0193     end
0194     wx=wx(:)/sum(wx); % normalize weights and force a column vector
0195     mx0=wx'*x;         % calculate mean of input data in each dimension
0196     vx0=wx'*x.^2-mx0.^2; % calculate variance of input data in each dimension
0197     sx0=sqrt(vx0);
0198     sx0(sx0==0)=1;      % do not divide by zero when scaling
0199     [k,p]=size(m0);
0200     xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0201     m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);          % and the means
0202     v=v0;
0203     w=w0;
0204     fv=ndims(v)>2 || size(v,1)>k;                       % full covariance matrix is supplied
0205     if fv
0206         mk=eye(p)==0;                                    % off-diagonal elements
0207         fulliv=any(v(repmat(mk,[1 1 k]))~=0);            % check if any are non-zero
0208         if ~fulliv
0209             v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1);   % just pick out and scale the diagonal elements for now
0210         else
0211             v=v./repmat(sx0'*sx0,[1 1 k]);              % scale the full covariance matrix
0212         end
0213     end
0214 end
0215 if length(wx)~=n
0216     error('%d datapoints but %d weights',n,length(wx));
0217 end
0218 lsx=sum(log(sx0));
0219 xsw=xs.*repmat(wx,1,p); % weighted data points
0220 if ~fulliv          % initializing with diagonal covariance if v0 is either unspecified or diagonal
0221     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0222     % Diagonal Covariance matrices  %
0223     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0224     v=max(v,c);         % apply the lower bound
0225     xs2=xs.^2.*repmat(wx,1,p);          % square and weight the data for variance calculations
0226     
0227     % If data size is large then do calculations in chunks
0228     
0229     nb=min(n,max(1,floor(memsize/(8*p*k))));    % chunk size for testing data points
0230     nl=ceil(n/nb);                  % number of chunks
0231     jx0=n-(nl-1)*nb;                % size of first chunk
0232     
0233     im=repmat(1:k,1,nb); im=im(:);
0234     th=(l-floor(l))*n;
0235     sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
0236     lp=floor(l)+sd;   % extra loop needed to calculate final G value
0237     
0238     lpx=zeros(1,n);             % log probability of each data point
0239     wk=ones(k,1);
0240     wp=ones(1,p);
0241     wnb=ones(1,nb);
0242     wnj=ones(1,jx0);
0243     
0244     % EM loop
0245     
0246     g=0;                            % dummy initial value for comparison
0247     gg=zeros(lp+1,1);
0248     ss=sd;                          % initialize stopping count (0 or 1)
0249     for j=1:lp
0250         g1=g;                       % save previous log likelihood (2*pi factor omitted)
0251         m1=m;                       % save previous means, variances and weights
0252         v1=v;
0253         w1=w;
0254         vi=-0.5*v.^(-1);                % data-independent scale factor in exponent
0255         lvm=log(w)-0.5*sum(log(v),2);   % log of external scale factor (excluding -0.5*p*log(2pi) term)
0256         
0257         % first do partial chunk (of length jx0)
0258         
0259         jx=jx0;
0260         ii=1:jx;                        % indices of data points in this chunk
0261         kk=repmat(ii,k,1);              % kk(k,jx): data point index
0262         km=repmat(1:k,1,jx);            % km(k,jx): mixture index
0263         py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj); % py(k,jx) pdf of each point with each mixture
0264         mx=max(py,[],1);                % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
0265         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0266         ps=sum(px,1);                   % total normalized likelihood of each data point
0267         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0268         lpx(ii)=log(ps)+mx;
0269         pk=px*wx(ii);                   % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
0270         sx=px*xsw(ii,:);
0271         sx2=px*xs2(ii,:);
0272         for il=2:nl                     % process the data points in chunks
0273             ix=jx+1;
0274             jx=jx+nb;                   % increment upper limit
0275             ii=ix:jx;                   % indices of data points in this chunk
0276             kk=repmat(ii,k,1);
0277             py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
0278             mx=max(py,[],1);            % find normalizing factor for each data point to prevent underflow when using exp()
0279             px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
0280             ps=sum(px,1);               % total normalized likelihood of each data point
0281             px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
0282             lpx(ii)=log(ps)+mx;
0283             pk=pk+px*wx(ii);            % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
0284             sx=sx+px*xsw(ii,:);
0285             sx2=sx2+px*xs2(ii,:);
0286         end
0287         g=lpx*wx;                       % total log probability summed over all data points
0288         gg(j)=g;                        % save log prob at each iteration
0289         w=pk;                           % normalize to get the weights
0290         if pk                           % if all elements of pk are non-zero
0291             m=sx./pk(:,wp);             % calculate mixture means
0292             v=sx2./pk(:,wp);            % and raw second moments
0293         else
0294             wm=pk==0;                   % mask indicating mixtures with zero weights
0295             nz=sum(wm);                  % number of zero-weight mixtures
0296             [vv,mk]=sort(lpx);          % find the lowest probability data points
0297             m=zeros(k,p);               % initialize means and variances to zero (variances are floored later)
0298             v=m;
0299             m(wm,:)=xs(mk(1:nz),:);     % set zero-weight mixture means to worst-fitted data points
0300             w(wm)=1/n;                   % set these weights non-zero
0301             w=w*n/(n+nz);                % normalize so the weights sum to unity
0302             wm=~wm;                     % mask for non-zero weights
0303             m(wm,:)=sx(wm,:)./pk(wm,wp);  % recalculate means and raw second moments for mixtures with a non-zero weight
0304             v(wm,:)=sx2(wm,:)./pk(wm,wp);
0305         end
0306         v=max(v-m.^2,c);                % convert raw second moments to variances and apply floor
0307         if g-g1<=th && j>1
0308             if ~ss, break; end          %  stop
0309             ss=ss-1;                    % stop next time
0310         end
0311     end
0312     if sd && ~fv                        % we need to calculate the final probabilities
0313         pp=lpx'-0.5*p*log(2*pi)-lsx;    % log of total probability of each data point
0314         gg=gg(1:j)-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
0315         g=gg(end);
0316         m=m1;                           % back up to previous iteration
0317         v=v1;
0318         w=w1;
0319         mm=sum(m,1)/k;
0320         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
0321     end
0322     if ~fv
0323         m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
0324         v=v.*repmat(sx0.^2,k,1);                % and variances
0325     else
0326         v1=v;
0327         v=zeros(p,p,k);
0328         mk=eye(p)==1;                           % mask for diagonal elements
0329         v(repmat(mk,[1 1 k]))=v1';              % set from v1
0330     end
0331 end
0332 if fv              % check if full covariance matrices were requested
0333     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0334     % Full Covariance matrices  %
0335     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0336     pl=p*(p+1)/2;                                           % number of non-trivial elements in lower triangular matrix
0337     lix=1:p^2;
0338     cix=repmat(1:p,p,1);
0339     rix=cix';
0340     lix(cix>rix)=[];                                        % index of lower triangular elements
0341     cix=cix(lix);                                           % index of lower triangular columns
0342     rix=rix(lix);                                           % index of lower triangular rows
0343     % dix=find(rix==cix);                                   % index of lower triangular diagonal elements [unused]
0344     lixi=zeros(p,p);
0345     lixi(lix)=1:pl;
0346     lixi=lixi';
0347     lixi(lix)=1:pl;                                         % reverse index to build full matrices
0348     v=reshape(v,p^2,k);
0349     v=v(lix,:)';                                            % lower triangular in rows (k,p*(p+1)/2)
0350     
0351     % If data size is large then do calculations in chunks
0352     
0353     nb=min(n,max(1,floor(memsize/(24*p*k))));   % chunk size for testing data points
0354     nl=ceil(n/nb);                              % number of chunks
0355     jx0=n-(nl-1)*nb;                            % size of first chunk
0356     %
0357     th=(l-floor(l))*n;
0358     sd=(nargout > 3*(l~=0));                    % = 1 if we are outputting log likelihood values
0359     lp=floor(l)+sd;                             % extra loop needed to calculate final G value
0360     %
0361     lpx=zeros(1,n);                             % log probability of each data point
0362     wk=ones(k,1);
0363     wp=ones(1,p);
0364     wpl=ones(1,pl);                             % 1 index for lower triangular matrix
0365     wnb=ones(1,nb);
0366     wnj=ones(1,jx0);
0367     
0368     % EM loop
0369     
0370     g=0;                        % dummy initial value for comparison
0371     gg=zeros(lp+1,1);
0372     ss=sd;                      % initialize stopping count (0 or 1)
0373     vi=zeros(p*k,p);            % stack of k inverse cov matrices each size p*p
0374     vim=zeros(p*k,1);           % stack of k vectors of the form inv(v)*m
0375     mtk=vim;                      % stack of k vectors of the form m
0376     lvm=zeros(k,1);
0377     wpk=repmat((1:p)',k,1);
0378     for j=1:lp
0379         g1=g;                   % save previous log likelihood (2*pi factor omitted)
0380         m1=m;                    % save previous means, variances and weights
0381         v1=v;
0382         w1=w;
0383         for ik=1:k            
0384             % these lines added for debugging only
0385             %             vk=reshape(v(ik,lixi),p,p);
0386             %             condk(ik)=cond(vk);
0387             %%%%%%%%%%%%%%%%%%%%
0388             [uvk,dvk]=eig(reshape(v(ik,lixi),p,p));    % convert lower triangular to full and find eigenvalues
0389             dvk=max(diag(dvk),c);                    % apply variance floor to eigenvalues
0390             vik=-0.5*uvk*diag(dvk.^(-1))*uvk';      % calculate inverse
0391             vi((ik-1)*p+(1:p),:)=vik;               % vi contains all mixture inverses stacked on top of each other
0392             vim((ik-1)*p+(1:p))=vik*m(ik,:)';       % vim contains vi*m for all mixtures stacked on top of each other
0393             mtk((ik-1)*p+(1:p))=m(ik,:)';           % mtk contains all mixture means stacked on top of each other
0394             lvm(ik)=log(w(ik))-0.5*sum(log(dvk));       % lvm contains the log of the weighted sqrt of det(vi) for each mixture
0395         end
0396         %
0397         %         % first do partial chunk
0398         %
0399         jx=jx0;
0400         ii=1:jx;
0401         xii=xs(ii,:).';
0402         py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
0403         mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0404         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0405         ps=sum(px,1);                   % total normalized likelihood of each data point
0406         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0407         lpx(ii)=log(ps)+mx;
0408         pk=px*wx(ii);                       % effective fraction of data points for each mixture (could be zero due to underflow)
0409         sx=px*xsw(ii,:);
0410         sx2=px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation (lower tri cov matrix as a row)
0411         for il=2:nl
0412             ix=jx+1;
0413             jx=jx+nb;                       % increment upper limit
0414             ii=ix:jx;
0415             xii=xs(ii,:).';
0416             py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
0417             mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0418             px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0419             ps=sum(px,1);                   % total normalized likelihood of each data point
0420             px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0421             lpx(ii)=log(ps)+mx;
0422             pk=pk+px*wx(ii);                % effective fraction of data points for each mixture (could be zero due to underflow)
0423             sx=sx+px*xsw(ii,:);             % accumulator for mean calculation
0424             sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation
0425         end
0426         g=lpx*wx;                           % total log probability summed over all data points
0427         gg(j)=g;                            % save convergence history
0428         w=pk;                               % w(k,1) normalize to get the column of weights
0429         if pk                               % if all elements of pk are non-zero
0430             m=sx./pk(:,wp);                 % find mean and mean square
0431             v=sx2./pk(:,wpl);
0432         else
0433             wm=pk==0;                       % mask indicating mixtures with zero weights
0434             nz=sum(wm);                     % number of zero-weight mixtures
0435             [vv,mk]=sort(lpx);              % find the lowest probability data points
0436             m=zeros(k,p);                   % initialize means and variances to zero (variances are floored later)
0437             v=zeros(k,pl);
0438             m(wm,:)=xs(mk(1:nz),:);         % set zero-weight mixture means to worst-fitted data points
0439             w(wm)=1/n;                      % set these weights non-zero
0440             w=w*n/(n+nz);                   % normalize so the weights sum to unity
0441             wm=~wm;                         % mask for non-zero weights
0442             m(wm,:)=sx(wm,:)./pk(wm,wp);    % recalculate means and variances for mixtures with a non-zero weight
0443             v(wm,:)=sx2(wm,:)./pk(wm,wpl);
0444         end
0445         v=v-m(:,cix).*m(:,rix);             % subtract off mean squared
0446         if g-g1<=th && j>1
0447             if ~ss, break; end              %  stop
0448             ss=ss-1;                        % stop next time
0449         end
0450     end
0451     if sd                                   % we need to calculate the final probabilities
0452         pp=lpx'-0.5*p*log(2*pi)-lsx;        % log of total probability of each data point
0453         gg=gg(1:j)-0.5*p*log(2*pi)-lsx;     % average log prob at each iteration
0454         g=gg(end);
0455         %             gg' % *** DEBUG ONLY ***
0456         m=m1;                                           % back up to previous iteration
0457         v=zeros(p,p,k);                                 % reserve space for k full covariance matrices
0458         trv=0;                                          % sum of variance matrix traces
0459         for ik=1:k                                      % loop for each mixture to apply variance floor
0460             [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0461             dvk=max(diag(dvk),c);                       % apply variance floor to eigenvalues
0462             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0463             trv=trv+sum(dvk);                           % add trace to the sum
0464         end
0465         w=w1;
0466         mm=sum(m,1)/k;
0467         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;
0468     else
0469         v1=v;                                           % lower triangular form
0470         v=zeros(p,p,k);                                 % reserve space for k full covariance matrices
0471         for ik=1:k                                      % loop for each mixture to apply variance floor
0472             [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0473             dvk=max(diag(dvk),c);                       % apply variance floor
0474             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0475         end
0476     end
0477     m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);             % unscale means
0478     v=v.*repmat(sx0'*sx0,[1 1 k]);
0479 end
0480 if l==0                                                 % suppress the first three output arguments if l==0
0481     m=g;
0482     v=f;
0483     w=pp;
0484 end
0485
