Home > voicebox > gaussmix.m

# gaussmix

## PURPOSE

GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

## SYNOPSIS

function [m,v,w,g,f,pp,gg]=gaussmix(x,c,l,m0,v0,w0,wx)

## DESCRIPTION

```GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)

Usage:
(1) [m,v,w]=gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
(2) [m,v,w]=gaussmix(x,[],[],k,'v);    % create GMM with k mixtures and full covariances

Inputs: n data values, k mixtures, p parameters, l loops

X(n,p)   Input data vectors, one per row.
c(1)     Minimum variance of normalized data (Use [] to take default value of 1/n^2)
L        The integer portion of l gives a maximum loop count. The fractional portion gives
an optional stopping threshold. Iteration will cease if the increase in
log likelihood density per data point is less than this value. Thus l=10.001 will
stop after 10 iterations or when the increase in log likelihood falls below
0.001.
As a special case, if L=0, then the first three outputs are omitted.
Use [] to take default value of 100.0001
M0       Number of mixtures required (or initial mixture means - see below)
V0       Initialization mode:
'f'    Initialize with K randomly selected data points [default]
'p'    Initialize with centroids and variances of random partitions
'k'    k-means algorithm ('kf' and 'kp' determine initialization)
'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
's'    do not scale data during initialization to have equal variances
'm'    M0 contains the initial centres
'v'    full covariance matrices
Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
W0(n,1)  Data point weights

Alternatively, initial values for M0, V0 and W0 can be given  explicitly:

M0(k,p)  Initial mixture means, one row per mixture.
V0(k,p)  Initial mixture variances, one row per mixture.
or V0(p,p,k)  one full-covariance matrix per mixture
W0(k,1)  Initial mixture weights, one per mixture. The weights should sum to unity.
WX(n,1)  Data point weights

Outputs: (Note that M, V and W are omitted if L==0)

M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
G       Average log probability of the input data points.
F        Fisher's Discriminant measures how well the data divides into classes.
It is the ratio of the between-mixture variance to the average mixture variance: a
high value means the classes (mixtures) are well separated.
PP(n,1)  Log probability of each data point
GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end

The fitting procedure uses one of several initialization methods to create an initial guess
for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
the guess. Although the EM algorithm is deterministic, the initialization procedures use
random numbers and so the routine will not give identical answers if you call it multiple
times with the same input data.```

## CROSS-REFERENCE INFORMATION

This function calls:
• kmeanhar KMEANS Vector quantisation using K-harmonic means algorithm [X,G,XN,GG]=(D,K,L,E,X0)
• rnsubset RNSUBSET choose k distinct random integers from 1:n M=(K,N)
• v_kmeans V_KMEANS Vector quantisation using K-means algorithm [X,ESQ,J]=(D,K,X0,L)
• voicebox VOICEBOX set global parameters for Voicebox functions Y=(FIELD,VAL)
This function is called by:
• v_sigma Singularity in EGG by Multiscale Analysis (SIGMA) Algorithm

## SOURCE CODE

```0001 function [m,v,w,g,f,pp,gg]=gaussmix(x,c,l,m0,v0,w0,wx)
0002 %GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f]=(x,c,l,m0,v0,w0,wx)
0003 %
0004 % Usage:
0005 %    (1) [m,v,w]=gaussmix(x,[],[],k);    % create GMM with k mixtures and diagonal covariances
0006 %    (2) [m,v,w]=gaussmix(x,[],[],k,'v);    % create GMM with k mixtures and full covariances
0007 %
0008 % Inputs: n data values, k mixtures, p parameters, l loops
0009 %
0010 %     X(n,p)   Input data vectors, one per row.
0011 %     c(1)     Minimum variance of normalized data (Use [] to take default value of 1/n^2)
0012 %     L        The integer portion of l gives a maximum loop count. The fractional portion gives
0013 %              an optional stopping threshold. Iteration will cease if the increase in
0014 %              log likelihood density per data point is less than this value. Thus l=10.001 will
0015 %              stop after 10 iterations or when the increase in log likelihood falls below
0016 %              0.001.
0017 %              As a special case, if L=0, then the first three outputs are omitted.
0018 %              Use [] to take default value of 100.0001
0019 %     M0       Number of mixtures required (or initial mixture means - see below)
0020 %     V0       Initialization mode:
0021 %                'f'    Initialize with K randomly selected data points [default]
0022 %                'p'    Initialize with centroids and variances of random partitions
0023 %                'k'    k-means algorithm ('kf' and 'kp' determine initialization)
0024 %                'h'    k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
0025 %                's'    do not scale data during initialization to have equal variances
0026 %                'm'    M0 contains the initial centres
0027 %                'v'    full covariance matrices
0028 %              Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
0029 %     W0(n,1)  Data point weights
0030 %
0031 %     Alternatively, initial values for M0, V0 and W0 can be given  explicitly:
0032 %
0033 %     M0(k,p)  Initial mixture means, one row per mixture.
0034 %     V0(k,p)  Initial mixture variances, one row per mixture.
0035 %      or V0(p,p,k)  one full-covariance matrix per mixture
0036 %     W0(k,1)  Initial mixture weights, one per mixture. The weights should sum to unity.
0037 %     WX(n,1)  Data point weights
0038 %
0039 % Outputs: (Note that M, V and W are omitted if L==0)
0040 %
0041 %     M(k,p)   Mixture means, one row per mixture. (omitted if L==0)
0042 %     V(k,p)   Mixture variances, one row per mixture. (omitted if L==0)
0043 %       or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
0044 %     W(k,1)   Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
0045 %     G       Average log probability of the input data points.
0046 %     F        Fisher's Discriminant measures how well the data divides into classes.
0047 %              It is the ratio of the between-mixture variance to the average mixture variance: a
0048 %              high value means the classes (mixtures) are well separated.
0049 %     PP(n,1)  Log probability of each data point
0050 %     GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
0051 %
0052 % The fitting procedure uses one of several initialization methods to create an initial guess
0053 % for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
0054 % the guess. Although the EM algorithm is deterministic, the initialization procedures use
0055 % random numbers and so the routine will not give identical answers if you call it multiple
0056 % times with the same input data.
0057
0058 %  Bugs/Suggestions
0059 %     (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
0060 %     (2) Other initialization options:
0061 %              'l'    LBG algorithm
0062 %              'm'    Move-means (dog-rabbit) algorithm
0063 %     (3) Allow updating of weights-only, not means/variances
0064 %     (4) Allow freezing of means and/or variances
0065
0066 %      Copyright (C) Mike Brookes 2000-2009
0067 %      Version: \$Id: gaussmix.m 10656 2018-07-16 20:58:55Z dmb \$
0068 %
0069 %   VOICEBOX is a MATLAB toolbox for speech processing.
0071 %
0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0073 %   This program is free software; you can redistribute it and/or modify
0075 %   the Free Software Foundation; either version 2 of the License, or
0076 %   (at your option) any later version.
0077 %
0078 %   This program is distributed in the hope that it will be useful,
0079 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0080 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0081 %   GNU General Public License for more details.
0082 %
0083 %   You can obtain a copy of the GNU General Public License from
0084 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0085 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0087 [n,p]=size(x); % n = number of training values, p = dimension of data vector
0088 wn=ones(n,1);
0089 memsize=voicebox('memsize');    % set memory size to use
0090 if isempty(c)
0091     c=1/n^2;
0092 else
0093     c=c(1);         % just to prevent legacy code failing
0094 end
0095 fulliv=0;           % initial variance is not full
0096 if isempty(l)
0097     l=100+1e-4;         % max loop count + stopping threshold
0098 end
0099 if nargin<5 || isempty(v0) || ischar(v0)             % no initial values specified for m0, v0, w0
0100     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0101     %  No initialvalues given, so we must use k-means or equivalent
0102     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0103     if nargin<6
0104         if nargin<5 || isempty(v0)
0105             v0='hf';                 % default initialization mode: hf
0106         end
0107         wx=wn;                      % no data point weights
0108     else
0109         wx=w0(:);                   % data point weights
0110     end
0111     wx=wx/sum(wx);
0112     if any(v0=='m')
0113         k=size(m0,1);
0114     else
0115         k=m0;
0116     end
0117     fv=any(v0=='v');                % full covariance matrices requested
0118     mx0=wx'*x;         % calculate mean of input data in each dimension
0119     vx0=wx'*x.^2-mx0.^2; % calculate variance of input data in each dimension
0120     sx0=sqrt(vx0);
0121     sx0(sx0==0)=1;      % do not divide by zero when scaling
0122     if n<=k                         % each data point can have its own mixture
0123         xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0124         m=xs(mod((1:k)-1,n)+1,:);   % just include all points several times
0125         v=zeros(k,p);               % will be set to floor later
0126         w=zeros(k,1);
0127         w(1:n)=1/n;
0128         if l>0
0129             l=0.1;                  % no point in iterating
0130         end
0131     else                            % more points than mixtures
0132         if any(v0=='s')
0133             xs=x;                   % do not scale data during initialization
0134         else
0135             xs=(x-mx0(wn,:))./sx0(wn,:);  % else scale now
0136             if any(v0=='m')
0137                 m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);  % scale specified means as well
0138             end
0139         end
0140         w=repmat(1/k,k,1);                      % all mixtures equally likely
0141         if any(v0=='k')                         % k-means initialization
0142             if any(v0=='m')
0143                 [m,e,j]=v_kmeans(xs,k,m);
0144             elseif any(v0=='p')
0145                 [m,e,j]=v_kmeans(xs,k,'p');
0146             else
0147                 [m,e,j]=v_kmeans(xs,k,'f');
0148             end
0149         elseif any(v0=='h')                     % k-harmonic means initialization
0150             if any(v0=='m')
0151                 [m,e,j]=kmeanhar(xs,k,[],4,m);
0152             else
0153                 if any(v0=='p')
0154                     [m,e,j]=kmeanhar(xs,k,[],4,'p');
0155                 else
0156                     [m,e,j]=kmeanhar(xs,k,[],4,'f');
0157                 end
0158             end
0159         elseif any(v0=='p')                     % Initialize using a random partition
0160             j=ceil(rand(n,1)*k);                % allocate to random clusters
0161             j(rnsubset(k,n))=1:k;               % but force at least one point per cluster
0162             for i=1:k
0163                 m(i,:)=mean(xs(j==i,:),1);
0164             end
0165         else
0166             if any(v0=='m')
0167                 m=m0;                           % use specified centres
0168             else
0169                 m=xs(rnsubset(k,n),:);          % Forgy initialization: sample k centres without replacement [default]
0170             end
0171             [e,j]=v_kmeans(xs,k,m,0);             % find out the cluster allocation
0172         end
0173         if any(v0=='s')
0174             xs=(x-mx0(wn,:))./sx0(wn,:);      % scale data now if not done previously
0175         end
0176         v=zeros(k,p);                   % diagonal covariances
0177         w=zeros(k,1);
0178         for i=1:k
0179             ni=sum(j==i);               % number assigned to this centre
0180             w(i)=(ni+1)/(n+k);          % weight of this mixture
0181             if ni
0182                 v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
0183             else
0184                 v(i,:)=zeros(1,p);
0185             end
0186         end
0187     end
0188 else
0189     %%%%%%%%%%%%%%%%%%%%%%%%
0190     % use initial values given as input parameters
0191     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0192     if nargin<7
0193         wx=wn;              % no data point weights
0194     end
0195     wx=wx(:)/sum(wx); % normalize weights and force a column vector
0196     mx0=wx'*x;         % calculate mean of input data in each dimension
0197     vx0=wx'*x.^2-mx0.^2; % calculate variance of input data in each dimension
0198     sx0=sqrt(vx0);
0199     sx0(sx0==0)=1;      % do not divide by zero when scaling
0200     [k,p]=size(m0);
0201     xs=(x-mx0(wn,:))./sx0(wn,:);          % scale the data
0202     m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:);          % and the means
0203     v=v0;
0204     w=w0;
0205     fv=ndims(v)>2 || size(v,1)>k;                       % full covariance matrix is supplied
0206     if fv
0207         mk=eye(p)==0;                                    % off-diagonal elements
0208         fulliv=any(v(repmat(mk,[1 1 k]))~=0);            % check if any are non-zero
0209         if ~fulliv
0210             v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1);   % just pick out and scale the diagonal elements for now
0211         else
0212             v=v./repmat(sx0'*sx0,[1 1 k]);              % scale the full covariance matrix
0213         end
0214     end
0215 end
0216 if length(wx)~=n
0217     error('%d datapoints but %d weights',n,length(wx));
0218 end
0219 lsx=sum(log(sx0));
0220 xsw=xs.*repmat(wx,1,p); % weighted data points
0221 if ~fulliv          % initializing with diagonal covariance
0222     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0223     % Diagonal Covariance matrices  %
0224     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0225     v=max(v,c);         % apply the lower bound
0226     xs2=xs.^2.*repmat(wx,1,p);          % square and weight the data for variance calculations
0227
0228     % If data size is large then do calculations in chunks
0229
0230     nb=min(n,max(1,floor(memsize/(8*p*k))));    % chunk size for testing data points
0231     nl=ceil(n/nb);                  % number of chunks
0232     jx0=n-(nl-1)*nb;                % size of first chunk
0233
0234     im=repmat(1:k,1,nb); im=im(:);
0235     th=(l-floor(l))*n;
0236     sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
0237     lp=floor(l)+sd;   % extra loop needed to calculate final G value
0238
0239     lpx=zeros(1,n);             % log probability of each data point
0240     wk=ones(k,1);
0241     wp=ones(1,p);
0242     wnb=ones(1,nb);
0243     wnj=ones(1,jx0);
0244
0245     % EM loop
0246
0247     g=0;                            % dummy initial value for comparison
0248     gg=zeros(lp+1,1);
0249     ss=sd;                          % initialize stopping count (0 or 1)
0250     for j=1:lp
0251         g1=g;                       % save previous log likelihood (2*pi factor omitted)
0252         m1=m;                       % save previous means, variances and weights
0253         v1=v;
0254         w1=w;
0255         vi=-0.5*v.^(-1);                % data-independent scale factor in exponent
0256         lvm=log(w)-0.5*sum(log(v),2);   % log of external scale factor (excluding -0.5*p*log(2pi) term)
0257
0258         % first do partial chunk (of length jx0)
0259
0260         jx=jx0;
0261         ii=1:jx;                        % indices of data points in this chunk
0262         kk=repmat(ii,k,1);              % kk(jx,k): one row per data point, one column per mixture
0263         km=repmat(1:k,1,jx);            % km(jx,k): one row per data point, one column per mixture
0264         py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj); % py(k,jx) pdf of each point with each mixture
0265         mx=max(py,[],1);                % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
0266         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0267         ps=sum(px,1);                   % total normalized likelihood of each data point
0268         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0269         lpx(ii)=log(ps)+mx;
0270         pk=px*wx(ii);                   % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
0271         sx=px*xsw(ii,:);
0272         sx2=px*xs2(ii,:);
0273         for il=2:nl                     % process the data points in chunks
0274             ix=jx+1;
0275             jx=jx+nb;                   % increment upper limit
0276             ii=ix:jx;                   % indices of data points in this chunk
0277             kk=repmat(ii,k,1);
0278             py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
0279             mx=max(py,[],1);            % find normalizing factor for each data point to prevent underflow when using exp()
0280             px=exp(py-mx(wk,:));        % find normalized probability of each mixture for each datapoint
0281             ps=sum(px,1);               % total normalized likelihood of each data point
0282             px=px./ps(wk,:);            % relative mixture probabilities for each data point (columns sum to 1)
0283             lpx(ii)=log(ps)+mx;
0284             pk=pk+px*wx(ii);            % pk(k,1) effective fraction of data points for each mixture (could be zero due to underflow)
0285             sx=sx+px*xsw(ii,:);
0286             sx2=sx2+px*xs2(ii,:);
0287         end
0288         g=lpx*wx;                       % total log probability summed over all data points
0289         gg(j)=g;                        % save log prob at each iteration
0290         w=pk;                       % normalize to get the weights
0291         if pk                           % if all elements of pk are non-zero
0292             m=sx./pk(:,wp);             % calculate mixture means
0293             v=sx2./pk(:,wp);            % and variances
0294         else
0295             wm=pk==0;                   % mask indicating mixtures with zero weights
0296             nz=sum(wm);                  % number of zero-weight mixtures
0297             [vv,mk]=sort(lpx);          % find the lowest probability data points
0298             m=zeros(k,p);               % initialize means and variances to zero (variances are floored later)
0299             v=m;
0300             m(wm,:)=xs(mk(1:nz),:);     % set zero-weight mixture means to worst-fitted data points
0301             w(wm)=1/n;                   % set these weights non-zero
0302             w=w*n/(n+nz);                % normalize so the weights sum to unity
0303             wm=~wm;                     % mask for non-zero weights
0304             m(wm,:)=sx(wm,:)./pk(wm,wp);  % recalculate means and variances for mixtures with a non-zero weight
0305             v(wm,:)=sx2(wm,:)./pk(wm,wp);
0306         end
0307         v=max(v-m.^2,c);                % apply floor to variances
0308         if g-g1<=th && j>1
0309             if ~ss, break; end  %  stop
0310             ss=ss-1;       % stop next time
0311         end
0312     end
0313     if sd && ~fv  % we need to calculate the final probabilities
0314         pp=lpx'-0.5*p*log(2*pi)-lsx;   % log of total probability of each data point
0315         gg=gg(1:j)-0.5*p*log(2*pi)-lsx;    % average log prob at each iteration
0316         g=gg(end);
0317         m=m1;       % back up to previous iteration
0318         v=v1;
0319         w=w1;
0320         mm=sum(m,1)/k;
0321         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
0322     end
0323     if ~fv
0324         m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);    % unscale means
0325         v=v.*repmat(sx0.^2,k,1);                % and variances
0326     else
0327         v1=v;
0328         v=zeros(p,p,k);
0329         mk=eye(p)==1;                           % mask for diagonal elements
0330         v(repmat(mk,[1 1 k]))=v1';              % set from v1
0331     end
0332 end
0333 if fv              % check if full covariance matrices were requested
0334     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0335     % Full Covariance matrices  %
0336     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0337     pl=p*(p+1)/2;
0338     lix=1:p^2;
0339     cix=repmat(1:p,p,1);
0340     rix=cix';
0341     lix(cix>rix)=[];                                        % index of lower triangular elements
0342     cix=cix(lix);                                           % index of lower triangular columns
0343     rix=rix(lix);                                           % index of lower triangular rows
0344     dix=find(rix==cix);
0345     lixi=zeros(p,p);
0346     lixi(lix)=1:pl;
0347     lixi=lixi';
0348     lixi(lix)=1:pl;                                        % reverse index to build full matrices
0349     v=reshape(v,p^2,k);
0350     v=v(lix,:)';                                            % lower triangular in rows
0351
0352     % If data size is large then do calculations in chunks
0353
0354     nb=min(n,max(1,floor(memsize/(24*p*k))));    % chunk size for testing data points
0355     nl=ceil(n/nb);                  % number of chunks
0356     jx0=n-(nl-1)*nb;                % size of first chunk
0357     %
0358     th=(l-floor(l))*n;
0359     sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
0360     lp=floor(l)+sd;   % extra loop needed to calculate final G value
0361     %
0362     lpx=zeros(1,n);             % log probability of each data point
0363     wk=ones(k,1);
0364     wp=ones(1,p);
0365     wpl=ones(1,pl);             % 1 index for lower triangular matrix
0366     wnb=ones(1,nb);
0367     wnj=ones(1,jx0);
0368
0369     % EM loop
0370
0371     g=0;                        % dummy initial value for comparison
0372     gg=zeros(lp+1,1);
0373     ss=sd;                      % initialize stopping count (0 or 1)
0374     vi=zeros(p*k,p);            % stack of k inverse cov matrices each size p*p
0375     vim=zeros(p*k,1);           % stack of k vectors of the form inv(v)*m
0376     mtk=vim;                      % stack of k vectors of the form m
0377     lvm=zeros(k,1);
0378     wpk=repmat((1:p)',k,1);
0379     for j=1:lp
0380         g1=g;                   % save previous log likelihood (2*pi factor omitted)
0381         m1=m;                    % save previous means, variances and weights
0382         v1=v;
0383         w1=w;
0384         for ik=1:k
0385
0386             % these lines added for debugging only
0387             %             vk=reshape(v(k,lixi),p,p);
0388             %             condk(ik)=cond(vk);
0389             %%%%%%%%%%%%%%%%%%%%
0390             [uvk,dvk]=eig(reshape(v(ik,lixi),p,p));    % convert lower triangular to full and find eigenvalues
0391             dvk=max(diag(dvk),c);                    % apply variance floor to eigenvalues
0392             vik=-0.5*uvk*diag(dvk.^(-1))*uvk';      % calculate inverse
0393             vi((ik-1)*p+(1:p),:)=vik;               % vi contains all mixture inverses stacked on top of each other
0394             vim((ik-1)*p+(1:p))=vik*m(ik,:)';       % vim contains vi*m for all mixtures stacked on top of each other
0395             mtk((ik-1)*p+(1:p))=m(ik,:)';           % mtk contains all mixture means stacked on top of each other
0396             lvm(ik)=log(w(ik))-0.5*sum(log(dvk));       % vm contains the weighted sqrt of det(vi) for each mixture
0397         end
0398         %
0399         %         % first do partial chunk
0400         %
0401         jx=jx0;
0402         ii=1:jx;
0403         xii=xs(ii,:).';
0404         py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
0405         mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0406         px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0407         ps=sum(px,1);                   % total normalized likelihood of each data point
0408         px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0409         lpx(ii)=log(ps)+mx;
0410         pk=px*wx(ii);                       % effective fraction of data points for each mixture (could be zero due to underflow)
0411         sx=px*xsw(ii,:);
0412         sx2=px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation (lower tri cov matrix as a row)
0413         for il=2:nl
0414             ix=jx+1;
0415             jx=jx+nb;        % increment upper limit
0416             ii=ix:jx;
0417             xii=xs(ii,:).';
0418             py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
0419             mx=max(py,[],1);                % find normalizing factor for each data point to prevent underflow when using exp()
0420             px=exp(py-mx(wk,:));            % find normalized probability of each mixture for each datapoint
0421             ps=sum(px,1);                   % total normalized likelihood of each data point
0422             px=px./ps(wk,:);                % relative mixture probabilities for each data point (columns sum to 1)
0423             lpx(ii)=log(ps)+mx;
0424             pk=pk+px*wx(ii);                    % effective fraction of data points for each mixture (could be zero due to underflow)
0425             sx=sx+px*xsw(ii,:);             % accumulator for mean calculation
0426             sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix));    % accumulator for variance calculation
0427         end
0428         g=lpx*wx;                   % total log probability summed over all data points
0429         gg(j)=g;                    % save convergence history
0430         w=pk;                   % w(k,1) normalize to get the column of weights
0431         if pk                       % if all elements of pk are non-zero
0432             m=sx./pk(:,wp);         % find mean and mean square
0433             v=sx2./pk(:,wpl);
0434         else
0435             wm=pk==0;                       % mask indicating mixtures with zero weights
0436             nz=sum(wm);                  % number of zero-weight mixtures
0437             [vv,mk]=sort(lpx);             % find the lowest probability data points
0438             m=zeros(k,p);                   % initialize means and variances to zero (variances are floored later)
0439             v=zeros(k,pl);
0440             m(wm,:)=xs(mk(1:nz),:);                % set zero-weight mixture means to worst-fitted data points
0441             w(wm)=1/n;                      % set these weights non-zero
0442             w=w*n/(n+nz);                   % normalize so the weights sum to unity
0443             wm=~wm;                         % mask for non-zero weights
0444             m(wm,:)=sx(wm,:)./pk(wm,wp);  % recalculate means and variances for mixtures with a non-zero weight
0445             v(wm,:)=sx2(wm,:)./pk(wm,wpl);
0446         end
0447         v=v-m(:,cix).*m(:,rix);                 % subtract off mean squared
0448         if g-g1<=th && j>1
0449             if ~ss, break; end  %  stop
0450             ss=ss-1;       % stop next time
0451         end
0452     end
0453     if sd  % we need to calculate the final probabilities
0454         pp=lpx'-0.5*p*log(2*pi)-lsx;   % log of total probability of each data point
0455         gg=gg(1:j)-0.5*p*log(2*pi)-lsx;    % average log prob at each iteration
0456         g=gg(end);
0457         %             gg' % *** DEBUG ONLY ***
0458         m=m1;                                           % back up to previous iteration
0459         v=zeros(p,p,k);                                 % reserve spave for k full covariance matrices
0460         trv=0;                                          % sum of variance matrix traces
0461         for ik=1:k                                      % loop for each mixture to apply variance floor
0462             [uvk,dvk]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0463             dvk=max(diag(dvk),c);                       % apply variance floor to eigenvalues
0464             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0465             trv=trv+sum(dvk);                           % add trace to the sum
0466         end
0467         w=w1;
0468         mm=sum(m,1)/k;
0469         f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;
0470     else
0471         v1=v;                                           % lower triangular form
0472         v=zeros(p,p,k);                                 % reserve spave for k full covariance matrices
0473         for ik=1:k                                      % loop for each mixture to apply variance floor
0474             [uvk,dvk,]=eig(reshape(v1(ik,lixi),p,p));    % convert lower triangular to full and find eigenvectors
0475             dvk=max(diag(dvk),c);                       % apply variance floor
0476             v(:,:,ik)=uvk*diag(dvk)*uvk';               % reconstitute full matrix
0477         end
0478     end
0479     m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:);  % unscale means
0480     v=v.*repmat(sx0'*sx0,[1 1 k]);
0481 end
0482 if l==0         % suppress the first three output arguments if l==0
0483     m=g;
0484     v=f;
0485     w=pp;
0486 end
0487```

Generated on Mon 06-Aug-2018 14:48:32 by m2html © 2003