Home > voicebox > dypsa.m

dypsa

PURPOSE ^

DYPSA Derive glottal closure instances from speech [gci,goi] = (s,fs)

SYNOPSIS ^

function [gci,goi] = dypsa(s,fs)

DESCRIPTION ^

DYPSA   Derive glottal closure instances from speech [gci,goi] = (s,fs)
   Note: Needs to be combined with a voiced-voiceless detector to eliminate
   spurious closures in unvoiced and silent regions.

   Inputs:
   s     is the speech signal
   fs    is the sampling frequncy

   Outputs:
   gci   is a vector of glottal closure sample numbers
   gco   is a vector of glottal opening sample numbers derived from
         an assumed constant closed-phase fraction

   References:
   [1]  P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure
        Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio
        Processing, vol. 15, pp. 34–43, Jan. 2007.
   [2]  M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods
        for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing,
        vol. 14, no. 2, pp. 456–466, Mar. 2006.
   [3]  A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal
        closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352.
   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
   [5]  A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis,
        Imperial College, 2001.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [gci,goi] = dypsa(s,fs)
0002 %DYPSA   Derive glottal closure instances from speech [gci,goi] = (s,fs)
0003 %   Note: Needs to be combined with a voiced-voiceless detector to eliminate
0004 %   spurious closures in unvoiced and silent regions.
0005 %
0006 %   Inputs:
0007 %   s     is the speech signal
0008 %   fs    is the sampling frequncy
0009 %
0010 %   Outputs:
0011 %   gci   is a vector of glottal closure sample numbers
0012 %   gco   is a vector of glottal opening sample numbers derived from
0013 %         an assumed constant closed-phase fraction
0014 %
0015 %   References:
0016 %   [1]  P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure
0017 %        Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio
0018 %        Processing, vol. 15, pp. 34–43, Jan. 2007.
0019 %   [2]  M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods
0020 %        for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing,
0021 %        vol. 14, no. 2, pp. 456–466, Mar. 2006.
0022 %   [3]  A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal
0023 %        closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352.
0024 %   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
0025 %        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
0026 %   [5]  A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis,
0027 %        Imperial College, 2001.
0028 
0029 % Algorithm Parameters
0030 %       The following parameters are defined in voicebox()
0031 %
0032 %   dy_cpfrac=0.3;           % presumed closed phase fraction of larynx cycle
0033 %   dy_cproj=0.2;            % cost of projected candidate
0034 %   dy_cspurt=-0.45;         % cost of a talkspurt
0035 %   dy_dopsp=1;              % Use phase slope projection (1) or not (0)?
0036 %   dy_ewdly=0.0008;         % window delay for energy cost function term [~ energy peak delay from closure] (sec)
0037 %   dy_ewlen=0.003;          % window length for energy cost function term (sec)
0038 %   dy_ewtaper=0.001;        % taper length for energy cost function window (sec)
0039 %   dy_fwlen=0.00045;        % window length used to smooth group delay (sec)
0040 %   dy_fxmax=500;            % max larynx frequency (Hz)
0041 %   dy_fxmin=50;             % min larynx frequency (Hz)
0042 %   dy_fxminf=60;            % min larynx frequency (Hz) [used for Frobenius norm only]
0043 %   dy_gwlen=0.0030;         % group delay evaluation window length (sec)
0044 %   dy_lpcdur=0.020;         % lpc analysis frame length (sec)
0045 %   dy_lpcn=2;               % lpc additional poles
0046 %   dy_lpcnf=0.001;          % lpc poles per Hz (1/Hz)
0047 %   dy_lpcstep=0.010;        % lpc analysis step (sec)
0048 %   dy_nbest=5;              % Number of NBest paths to keep
0049 %   dy_preemph=50;           % pre-emphasis filter frequency (Hz) (to avoid preemphasis, make this very large)
0050 %   dy_spitch=0.2;           % scale factor for pitch deviation cost
0051 %   dy_wener=0.3;            % DP energy weighting
0052 %   dy_wpitch=0.5;           % DP pitch weighting
0053 %   dy_wslope=0.1;           % DP group delay slope weighting
0054 %   dy_wxcorr=0.8;           % DP cross correlation weighting
0055 %   dy_xwlen=0.01;           % cross-correlation length for waveform similarity (sec)
0056 
0057 %   Revision History:
0058 %
0059 %   3.0 - 29 Jun 2006  - Rewrote DP function to improve speed
0060 %   2.6 - 29 Jun 2006  - Tidied up algorithm parameters
0061 %   2.4 - 10 Jun 2006  - Made into a single file aand put into VOICEBOX
0062 %   2.3 - 18 Mar 2005  - Removed 4kHz filtering of phase-slope function
0063 %   2.2 - 05 Oct 2004  -  dpgci uses the slopes returned from xewgrdel
0064 %                      -  gdwav from speech with fs<9000 is not filtered
0065 %                      -  Various outputs and inputs of functions have been
0066 %                         removed since now there is no plotting
0067 %   1.0 - 30 Jan 2001  - Initial version [5]
0068 
0069 %   Bugs:
0070 %         1. Allow the projections only to extend to the end of the larynx cycle
0071 %         2. Compensate for false pitch period cost at the start of a voicespurt
0072 %         3. Should include energy and pahse-slope costs for the first closeure of a voicespurt
0073 %         4. should delete candidates that are too close to the beginning or end of speech for the cost measures
0074 %            currently this is 200 samples fixed in the main routine but it should adapt to window lengths of
0075 %            cross-correlation, lpc and energy measures.
0076 %         5. Should have an integrated voiced/voiceless detector
0077 %         6. Allow dypsa to be called in chunks for a long speech file
0078 %         7. Do forward & backward passes to allow for gradient descent and/or discriminative training
0079 %         8. Glottal opening approximation does not really belong in this function
0080 %         9. The cross correlation window is asymmetric (and overcomplex) for no very good reason
0081 %        10. Incorporate -0.5 factor into dy_wxcorr and abolish erroneous (nx2-1)/(nx2-2) factor
0082 %        11. Add in talkspurt cost at the beginning rather than the end of a spurt (more efficient)
0083 %        12. Remove qmin>2 condition from voicespurt start detection (DYPSA 2 compatibility) in two places
0084 %        13. Include energy and phase-slope costs at the start of a voicespurt
0085 %        14. Single-closure voicespurt are only allowed if nbest=1 (should always be forbidden)
0086 %        15. Penultimate closure candidate is always acceptd
0087 %        16. Final element of gcic, Cfn and Ch is unused
0088 %        17. Needs to cope better with irregular voicing (e.g. creaky voice)
0089 %        18. Should give non-integer GCI positions for improved accuracy
0090 %        19. Remove constraint that first voicespurt cannot begin until qrmax after the first candidate
0091 
0092 %      Copyright (C) Tasos Kounoudes, Jon Gudnason, Patrick Naylor and Mike Brookes 2006
0093 %      Version: $Id: dypsa.m 713 2011-10-16 14:45:43Z dmb $
0094 %
0095 %   VOICEBOX is a MATLAB toolbox for speech processing.
0096 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0097 %
0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0099 %   This program is free software; you can redistribute it and/or modify
0100 %   it under the terms of the GNU General Public License as published by
0101 %   the Free Software Foundation; either version 2 of the License, or
0102 %   (at your option) any later version.
0103 %
0104 %   This program is distributed in the hope that it will be useful,
0105 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0106 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0107 %   GNU General Public License for more details.
0108 %
0109 %   You can obtain a copy of the GNU General Public License from
0110 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0111 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0112 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0113 
0114 % Extract algorithm constants from VOICEBOX
0115 
0116 
0117 dy_preemph=voicebox('dy_preemph');
0118 dy_lpcstep=voicebox('dy_lpcstep');
0119 dy_lpcdur=voicebox('dy_lpcdur');
0120 dy_dopsp=voicebox('dy_dopsp');              % Use phase slope projection (1) or not (0)?
0121 dy_ewtaper=voicebox('dy_ewtaper');        % Prediction order of FrobNorm method  in seconds
0122 dy_ewlen=voicebox('dy_ewlen');        % windowlength of FrobNorm method  in seconds
0123 dy_ewdly=voicebox('dy_ewdly');        % shift for assymetric speech shape at start of voiced cycle
0124 dy_cpfrac=voicebox('dy_cpfrac');        % presumed ratio of larynx cycle that is closed
0125 dy_lpcnf=voicebox('dy_lpcnf');          % lpc poles per Hz (1/Hz)
0126 dy_lpcn=voicebox('dy_lpcn');            % lpc additional poles
0127 
0128 lpcord=ceil(fs*dy_lpcnf+dy_lpcn);       % lpc poles
0129 
0130 %PreEmphasise input speech
0131 s_used=filter([1 -exp(-2*pi*dy_preemph/fs)],1,s);
0132 
0133 % perform LPC analysis, AC method with Hamming windowing
0134 [ar, e, k] = lpcauto(s_used,lpcord,floor([dy_lpcstep dy_lpcdur]*fs));
0135 
0136 if any(any(isinf(ar)))    % if the data is bad and gives infinite prediction coefficients we return with a warning
0137     warning('No GCIs returned');
0138     gci=[];
0139     return;
0140 end;
0141 
0142 % compute the prediction residual
0143 r = lpcifilt(s_used,ar,k); 
0144 
0145 % compute the group delay function:  EW method from reference [2] above
0146 [zcr_cand,sew,gdwav,toff]=xewgrdel(r,fs); 
0147 gdwav=-[zeros(toff,1); gdwav(1:end-toff)];
0148 zcr_cand=[round(zcr_cand), ones(size(zcr_cand))];   %flag zero crossing candidates with ones
0149 
0150 sew=0.5+sew';  %the phase slope cost of each candidate
0151 
0152 pro_cand=[];
0153 if dy_dopsp ~= 0
0154     pro_cand = psp(gdwav,fs);
0155     pro_cand = [pro_cand, zeros(length(pro_cand),1)]; %flag projected candidates with zeros
0156     sew =      [sew zeros(1,size(pro_cand,1))];      %the phase slope cost of a projected candidate is zero
0157 end;
0158 
0159 %Sort the zero crossing and projected candidates together and remove any candidates that
0160 %are within 200 samples from the speech boundary
0161 [gcic,sin] = sortrows([zcr_cand; pro_cand],1);  
0162 sew=sew(sin);
0163 sin=find(and(200<gcic,gcic<length(gdwav)-200));
0164 gcic=gcic(sin,:);
0165 sew=sew(sin);
0166 
0167 % compute the frobenious norm function used for a cost in the DP
0168 fnwav=frobfun(s_used,dy_ewtaper*fs,dy_ewlen*fs,dy_ewdly*fs);
0169 
0170 %Dynamic programming, picks the most likely candidates based on the pitch consistency, energy etc.
0171 [gci] = dpgci(gcic, s_used(:), sew, fnwav, fs);
0172 
0173 %Evaluate goi ... determined to be dy_cpfrac percentage of the larynx cylce away from the last gci
0174 goi=simplegci2goi(gci,dy_cpfrac);
0175 
0176 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0177 
0178 function z = psp(g,fs)
0179 %PSP  Calculates the phase slope projections of the group delay function
0180 %   Z = PSP(G) computes the
0181 
0182 %   Revision History:
0183 %       V1.0 July 12th 2002:
0184 %            Nov. 6th 2002 : if statement added to remove "last" midpoint
0185 
0186 g = g(:);
0187 
0188 gdot = [diff(g);0];
0189 gdotdot = [diff(gdot);0];
0190 
0191 % find the turning points  as follows: [tp_number, index_of_tp, min(1) or max(-1), g(index_of_tp)]
0192 turningPoints = zcr(gdot);
0193 turningPoints = [[1:length(turningPoints)]', turningPoints, sign(gdotdot(turningPoints)), g(turningPoints)];
0194 
0195 % useful for debug/plotting
0196 %tplot = zeros(length(g),1);
0197 %tplot(turningPoints(:,1)) = turningPoints(:,2);
0198 
0199 % find any maxima which are < 0
0200 negmaxima = turningPoints(find(turningPoints(:,3) == -1 & turningPoints(:,4) < 0 & turningPoints(:,1)~=1),:);  %Change 01.05.2003 JG: The first row can't be included
0201 
0202 % find the midpoint between the preceding min and the negative max
0203 nmi = negmaxima(:,1);
0204 midPointIndex = turningPoints(nmi-1,2) + round(0.5*(turningPoints(nmi,2) - turningPoints(nmi-1,2)));
0205 midPointValue = g(midPointIndex);
0206 
0207 % project a zero crossing with unit slope
0208 nz = midPointIndex - round(midPointValue);
0209 
0210 % find any minima which are > 0
0211 posminima = turningPoints(find(turningPoints(:,3) == 1 & turningPoints(:,4) > 0),:);
0212 
0213 % find the midpoint between the positive min and the following max
0214 pmi = posminima(:,1); 
0215 
0216 %Remove last midpoint if it is the last sample
0217 if ~isempty(pmi), if pmi(end)==size(turningPoints,1), pmi=pmi(1:end-1); end; end;
0218 
0219 midPointIndex = turningPoints(pmi,2) + round(0.5*(turningPoints(pmi+1,2) - turningPoints(pmi,2)));
0220 midPointValue = g(midPointIndex);
0221 
0222 % project a zero crossing with unit slope
0223 pz = midPointIndex - round(midPointValue);
0224 
0225 z = sort([nz;pz]);
0226 
0227 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0228 
0229 function i = zcr(x, p)
0230 %ZCR  Finds the indices in a vector to  zero crossings
0231 %   I = ZCR(X) finds the indices of vector X which are closest to zero-crossings.
0232 %   I = ZCR(X, P) finds indices for positive-going zeros-crossings for P=1 and
0233 %   negative-going zero-crossings for P=0.
0234 
0235 x = x(:);
0236 
0237 if (nargin==2)
0238     if (p==0) 
0239         z1 = zcrp(x);   % find positive going zero-crossings
0240     elseif (p==1) 
0241         z1 = zcrp(-x);  % find negative going zero-crossings
0242     else
0243         error('ZCR: invalid input parameter 2: must be 0 or 1');
0244     end
0245 else
0246     z1 = [zcrp(x); zcrp(-x)];
0247 end
0248 
0249 % find crossings when x==0 exactly
0250 z0 = find( (x(1:length(x))==0) & ([x(2:length(x));0] ~= 0));
0251 
0252 % concatenate and sort the two types of zero-crossings
0253 i = sort([z0; z1]);
0254 
0255 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0256 
0257 function zz = zcrp(xx)  %only used in zcr
0258 % find positive-going zero-crossing
0259 z1 = find(diff(sign(xx)) == -2);
0260 % find which out of current sample or next sample is closer to zero
0261 [m, z2] = min([abs(xx(z1)), abs(xx(z1+1))], [], 2);
0262 zz =  z1 -1 + z2;
0263 
0264 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0265 
0266 function [frob]=frobfun(sp,p,m,offset)
0267 
0268 % [frob]=frobfun(sp,p,m)
0269 %
0270 % sp is the speech signal assumed to be preemphasised
0271 % p  is the prediction order  : recomended to be 1 ms in above paper
0272 % m  is the window length     : recomended to be 1 ms in above paper
0273 % offset is shift for assymetric speech shape at start of voiced cycle -
0274 % default 1.5ms.
0275 %
0276 % This function implements the frobenius norm based measure C defined in [4] below.
0277 % It equals the square of the Frobenius norm of the m by p+1 data matrix divided by p+1
0278 %
0279 % Reference:
0280 %   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
0281 %        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
0282 
0283 
0284 %   Revision History:
0285 %       V1.0 July 12th 2002:
0286 %            Nov. 6th 2002 : if statement added to remove "last" midpoint
0287 
0288 %force p m and offset to be integers
0289 p=round(p);
0290 m=round(m);
0291 offset=round(offset);
0292 
0293 w=(p+1)*ones(1,m+p);
0294 w(1:p)=1:p;
0295 w(m+1:p+m)=p:-1:1;
0296 
0297 w=w./(p+1); 
0298 frob=filter(w,1,sp.^2);
0299 frob(1:(round((p+m-1)/2) + offset))=[];
0300 
0301 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0302 
0303 function goi=simplegci2goi(gci,pr)
0304 
0305 % Estimate glottal opening instants by assuming a fixed closed-phase fraction
0306 
0307 gci=round(gci);
0308 maxpitch=max(medfilt1(diff(gci),7));
0309 
0310 % calculate opening instants
0311 for kg=1:length(gci)-1
0312     goi(kg)=gci(kg)+min(pr*(gci(kg+1)-gci(kg)),pr*maxpitch);
0313 end;
0314 kg=kg+1;
0315 goi(kg)=round(gci(kg)+pr*(gci(kg)-gci(kg-1)));  %use the previous pitch period instead
0316 goi=round(goi);
0317 
0318 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0319 
0320 function [tew,sew,y,toff]=xewgrdel(u,fs)
0321 
0322 % implement EW group delay epoch extraction
0323 
0324 dy_gwlen=voicebox('dy_gwlen');          % group delay evaluation window length
0325 dy_fwlen=voicebox('dy_fwlen');          % window length used to smooth group delay
0326 
0327 % perform group delay calculation
0328 
0329 gw=2*floor(dy_gwlen*fs/2)+1;            % force window length to be odd
0330 ghw=window('hamming',gw,'s');
0331 ghw = ghw(:);                           % force to be a column (dmb thinks window gives a row - and he should know as he wrote it!)
0332 ghwn=ghw'.*(gw-1:-2:1-gw)/2;            % weighted window: zero in middle
0333 
0334 u2=u.^2;
0335 yn=filter(ghwn,1,u2);
0336 yd=filter(ghw,1,u2);
0337 yd(abs(yd)<eps)=10*eps;                 % prevent infinities
0338 y=yn(gw:end)./yd(gw:end);               % delete filter startup transient
0339 toff=(gw-1)/2;
0340 fw=2*floor(dy_fwlen*fs/2)+1;            % force window length to be odd
0341 if fw>1
0342     daw=window('hamming',fw,'s');
0343     y=filter(daw,1,y)/sum(daw);         % low pass filter
0344     toff=toff-(fw-1)/2;
0345 end
0346 [tew,sew]=zerocros(y,'n');              % find zero crossings
0347 
0348 tew=tew+toff;                           % compensate for filter delay and frame advance
0349 
0350 
0351 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0352 
0353 function Cfn=fnrg(gcic,frob,fs)
0354 
0355 %Frobenious Energy Cost
0356 
0357 dy_fxminf=voicebox('dy_fxminf');
0358 frob=frob(:)';
0359 mm=round(fs/dy_fxminf);
0360 mfrob=maxfilt(frob,1,mm);
0361 mfrob=[mfrob(floor(mm/2)+1:end) max(frob(end-ceil(mm/2):end))*ones(1,floor(mm/2))];
0362 rfr=frob./mfrob;
0363 Cfn=0.5-rfr(round(gcic));
0364 
0365 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0366 
0367 function gci=dpgci(gcic, s, Ch, fnwav, fs)
0368 
0369 %DPGCI   Choose the best Glottal Closure Instances with Dynamic Programming
0370 %   gci=dpgci(gcic, s(:), Ch, fnwav, fs) returns vectors of sample indices corresponding
0371 %   to the instants of glottal closure in the speech signal s at sampling frequency fs Hz.
0372 %
0373 %   Inputs:
0374 %   gcic    is a matrix whos first column are the glottal closure instance candidates and
0375 %           the second column is 1 if the corresponding gci is derived from a zero crossing
0376 %           but zero if the gci is from a a projected zero crossing
0377 %   s       is the speech signal - MUST be a column vector
0378 %   Ch      the phase slope cost of every candidate
0379 %   fnwav   is the frobenious norm function of s
0380 %   fs      is the sampling frequncy
0381 %
0382 %   Outputs:
0383 %   gci     is a vector of glottal closure instances chosen by the DP
0384 
0385 
0386 
0387 %   Revision History:
0388 %   Bugs:  Constants are hardwired but defined in a structure like pv (defined in grpdelpv)
0389 %
0390 
0391 % get algorithm parameters from voicebox()
0392 
0393 dy_fxmin=voicebox('dy_fxmin');        % min larynx frequency (Hz)
0394 dy_fxmax=voicebox('dy_fxmax');        % min larynx frequency (Hz)
0395 dy_xwlen=voicebox('dy_xwlen');        % cross-correlation length for waveform similarity (sec)
0396 dy_nbest=voicebox('dy_nbest');        % Number of NBest paths to keep
0397 dy_spitch=voicebox('dy_spitch');              % scale factor for pitch deviation cost
0398 wproj=voicebox('dy_cproj');           % cost of projected candidate
0399 dy_cspurt=voicebox('dy_cspurt');           % cost of a talkspurt
0400 dy_wpitch=voicebox('dy_wpitch');           % DP pitch weighting
0401 dy_wener=voicebox('dy_wener');           % DP energy weighting
0402 dy_wslope=voicebox('dy_wslope');           % DP group delay slope weighting
0403 dy_wxcorr=voicebox('dy_wxcorr');           % DP cross correlation weighting
0404 
0405 
0406 
0407 %Constants
0408 Ncand=length(gcic);
0409 sv2i=-(2*dy_spitch^2)^(-1);              % scale factor for pitch deviation cost
0410 
0411 %Limit the search:
0412 qrmin=ceil(fs/dy_fxmax);
0413 qrmax=floor(fs/dy_fxmin);
0414 
0415 %Cost and tracking r = current, q = previous, p = preprevious
0416 cost=zeros(Ncand, dy_nbest); cost(:,:)=inf;    %Cost matrix, one row for each candidate
0417 maxcost=zeros(Ncand,1); maxcost(:,:)=inf;   %Maximum cost in each row
0418 imaxcost=ones(Ncand,1);                     %Index of maximum cost
0419 
0420 prev = ones(Ncand, dy_nbest);                  %index of previous, q candidates
0421 ind = ones(Ncand, dy_nbest);                   %index of p in row q (from prev)
0422 qbest = [zeros(Ncand,1), ones(Ncand,2)]; % the minimum cost in any previous q [cost,q,i]
0423 
0424 Cfn=fnrg(gcic(:,1),fnwav,fs);  %Frob.Energy Cost
0425 
0426 %Add start and end state
0427 % === should probably delete candidates that are too close to either end of the input
0428 % === why do we ever need the additional one at the tail end ?
0429 gcic=[[gcic(1,1)-qrmax-2 0];gcic;[gcic(end,1)+qrmax+2 0]];
0430 Cfn=[0 Cfn 0];
0431 Ch = [0 Ch 0];
0432 
0433 % first do parallelized version
0434 
0435 nxc=ceil(dy_xwlen*fs);       % cross correlation window length in samples
0436 % === should delete any gci's that are within this of the end.
0437 % === and for energy window
0438 % rather complicated window specification is for compatibility with DYPSA 2
0439 % === +1 below is for compatibility - probably a bug
0440 wavix=(-floor(nxc/2):floor(nxc/2)+1)';                 % indexes for segments [nx2,1]
0441 nx2=length(wavix);
0442 sqnx2=sqrt(nx2);
0443 g_cr=dy_wener*Cfn+dy_wslope*Ch+wproj*(1-gcic(:,2))';  % fixed costs
0444 
0445 g_n=gcic(:,1)';                  % gci sample number [1,Ncand+2]
0446 g_pr=gcic(:,2)';                 % unprojected flag [1,Ncand+2]
0447 g_sqm=zeros(1,Ncand+1);         % stores: sqrt(nx2) * mean for speech similarity waveform
0448 g_sd=zeros(1,Ncand+1);         % stores: 1/(Std deviation * sqrt(nx2)) for speech similarity waveform
0449 f_pq=zeros((Ncand+1)*dy_nbest,1);   % (q-p) period for each node
0450 f_c=repmat(Inf,(Ncand+1)*dy_nbest,1);    % cumulative cost for each node - initialise to inf
0451 f_c(1)=0;                       % initial cost of zero for starting node
0452 % f_costs=zeros(Ncand*dy_nbest,6);   % === debugging only remember costs of candidate
0453 f_f=ones((Ncand+1)*dy_nbest,1);    % previous node in path
0454 f_fb=ones((Ncand+1),1);    % points back to best end-of-spurt node
0455 fbestc=0;                       % cost of best end-of-spurt node
0456 
0457 qmin=2;
0458 for r=2:Ncand+1   
0459 %     if r==86
0460 %         r;
0461 %     end
0462     r_n=g_n(r);             % sample number of r = current candidate
0463     rix=dy_nbest*(r-1)+(1:dy_nbest);    % index range within node variables
0464     
0465     % determine the range of feasible q candidates
0466     qmin0=qmin;
0467     qmin=find(g_n(qmin0-1:r-1)<r_n-qrmax);      % qmin is the nearest candidate that is >qrmax away
0468     qmin=qmin(end)+qmin0-1;             % convert to absolute index of first viable candidate
0469     qmax=find(g_n(qmin-1:r-1)<=r_n-qrmin);      % qmax is the nearest candidate that is >=qrmin away
0470     qmax=qmax(end)+qmin-2;
0471     
0472     
0473     % calculate waveform similarity cost measure statistics
0474     
0475     sr=s(r_n+wavix);        % note s MUST be a column vector so sr is also
0476     wsum=sum(sr);
0477     g_sqm(r)=wsum/sqnx2;                % mean * sqrt(nx2)
0478     g_sd(r)=1/sqrt(sr.'*sr-wsum^2/nx2);   % 1/(Std deviation * sqrt(nx2))
0479     
0480     % now process the candidates
0481     
0482     if qmin<=qmax
0483         qix=qmin:qmax;      % q index
0484         nq=length(qix);
0485         % === should integrate the -0.5 into dy_wxcorr
0486         % === the factor (nx2-1)/(nx2-2) is to compensate for a bug in swsc()
0487         q_cas=-0.5*(nx2-1)/(nx2-2)*dy_wxcorr*(sum(s(repmat(g_n(qix),nx2,1)+repmat(wavix,1,nq)).*repmat(sr,1,nq),1)-g_sqm(qix)*g_sqm(r)).*g_sd(qix)*g_sd(r);
0488         % compare: i=35; Ca=swsc(g_n(qix(i)),g_n(r),s,fs); [i qix(i) r  g_n(qix(i)) g_n(r) dy_wxcorr*Ca q_cas(i)]
0489         
0490         % now calculate pitch deviation cost
0491         
0492         fix = 1+(qmin-1)*dy_nbest:qmax*dy_nbest;    % node index range
0493         f_qr=repmat(r_n-g_n(qix),dy_nbest,1);    % (r-p) period for each node
0494         f_pr=f_qr(:)+f_pq(fix);
0495         % === could absorb the 2 into sv2i
0496         f_nx=2-2*f_pr./(f_pr+abs(f_qr(:)-f_pq(fix)));
0497         f_cp=dy_wpitch*(0.5-exp(sv2i*f_nx.^2));
0498         % === fudge to match dypsa2.4 - could more efficiently be added
0499         % === onto the cost of a talkspurt end
0500         % === should be a voicebox parameter anyway
0501         f_cp(f_pq(fix)==0)=dy_cspurt*dy_wpitch;
0502         
0503         % now find the N-best paths
0504         
0505         [r_cnb,nbix]=sort(f_c(fix)+f_cp+reshape(repmat(q_cas,dy_nbest,1),nq*dy_nbest,1));
0506         f_c(rix)=r_cnb(1:dy_nbest)+g_cr(r);     % costs
0507         f_f(rix)=nbix(1:dy_nbest)+(qmin-1)*dy_nbest;       % traceback nodes
0508         f_pq(rix)=f_qr(nbix(1:dy_nbest));       % previous period
0509         % === f_costs is only for debugging
0510 %         r;
0511 %         f_costs(rix,1)=f_c(fix(nbix(1:dy_nbest)));
0512 %         f_costs(rix,2)=wproj*(1-gcic(r,2));
0513 %         f_costs(rix,3)=f_cp(nbix(1:dy_nbest));
0514 %         f_costs(rix,4)=dy_wener*Cfn(r);
0515 %         f_costs(rix,5)=dy_wslope*Ch(r);
0516 %         f_costs(rix,6)=reshape(q_cas(1+floor((nbix(1:dy_nbest)-1)/dy_nbest)),dy_nbest,1);
0517         
0518         % check cost of using this candidate as the start of a new spurt
0519         % ==== the qmin>2 condition is for compatibility with dypsa 2 and
0520         % prevents any spurts starting until at least qrmax past the first
0521         % gci. This is probably a bug (see again below)
0522         iNb=rix(end);        
0523         if (qmin>2) && (f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2))<f_c(iNb))        % compare with worst of Nbest paths
0524             f_f(iNb)=f_fb(qmin-1);
0525             % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2
0526             % === this is probably a bug
0527             f_c(iNb)=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2));     % replace worst of the costs with start voicespurt cost
0528             f_pq(iNb)=0;                    % false pq period
0529         end
0530         if f_c(rix(1))<fbestc
0531             f_fb(r)=rix(1);                          % points to the node with lowest end-of-spurt cost
0532             % === should compensate for the pitch period cost incurred at the start of the next spurt
0533             % === note that a node can never be a one-node voicespurt on its own unless dy_nbest=1
0534             % since the start voices[purt option replaced the worst Nbest cost. This is probably good but
0535             % is a bit inconsistent.
0536             fbestc=f_c(rix(1));
0537         else
0538             f_fb(r)=f_fb(r-1);
0539         end
0540     else            % no viable candidates - must be the start of a voicespurt if anything
0541         % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2
0542         % === this is probably a bug
0543         % ==== the qmin>2 condition is for compatibility with dypsa 2 and
0544         % prevents any spurts starting until at least qrmax past the first
0545         % gci. This is probably a bug (see again above)
0546         if (qmin>2)
0547             f_c(rix(1))=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2));  % cost of new voicespurt
0548             f_f(rix)=f_fb(qmin-1);                              % traceback to previous talkspurt end
0549             f_pq(rix)=0;                                        % previous period
0550         end
0551         f_fb(r)=f_fb(r-1);                                  % cannot be the end of a voicespurt
0552     end
0553 end
0554 
0555 % now do the traceback
0556 
0557 gci = zeros(1,Ncand+1);
0558 
0559 % === for compatibility with dypsa2, we force the penultimate candidate to be accepted
0560 % === should be: i=f_fb(Ncand+1) but instead we pick the best of the penultimate candidate
0561 i=rix(1)-dy_nbest;
0562 if f_c(i-dy_nbest+1)<f_c(i)     % check if start of a talkspurt
0563     i=i-dy_nbest+1;
0564 end
0565 k=1;
0566 while i>1
0567     j=1+floor((i-1)/dy_nbest);          % convert node number to candidate number
0568     gci(k)=g_n(j);
0569     i=f_f(i);
0570     k=k+1;
0571 end
0572 gci=gci(k-1:-1:1);           % put into ascending order
0573 
0574 
0575 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0576 
0577

Generated on Thu 02-Feb-2012 09:15:04 by m2html © 2003