Home > voicebox > dypsa.m

dypsa

PURPOSE ^

DYPSA Derive glottal closure instances from speech [gci,goi] = (s,fs)

SYNOPSIS ^

function [gci,goi] = dypsa(s,fs)

DESCRIPTION ^

DYPSA   Derive glottal closure instances from speech [gci,goi] = (s,fs)
   Note: Needs to be combined with a voiced-voiceless detector to eliminate
   spurious closures in unvoiced and silent regions.

   Inputs:
   s     is the speech signal
   fs    is the sampling frequncy

   Outputs:
   gci   is a vector of glottal closure sample numbers
   gco   is a vector of glottal opening sample numbers derived from
         an assumed constant closed-phase fraction

   References:
   [1]  P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure
        Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio
        Processing, vol. 15, pp. 34–43, Jan. 2007.
   [2]  M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods
        for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing,
        vol. 14, no. 2, pp. 456–466, Mar. 2006.
   [3]  A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal
        closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352.
   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
   [5]  A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis,
        Imperial College, 2001.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [gci,goi] = dypsa(s,fs)
0002 %DYPSA   Derive glottal closure instances from speech [gci,goi] = (s,fs)
0003 %   Note: Needs to be combined with a voiced-voiceless detector to eliminate
0004 %   spurious closures in unvoiced and silent regions.
0005 %
0006 %   Inputs:
0007 %   s     is the speech signal
0008 %   fs    is the sampling frequncy
0009 %
0010 %   Outputs:
0011 %   gci   is a vector of glottal closure sample numbers
0012 %   gco   is a vector of glottal opening sample numbers derived from
0013 %         an assumed constant closed-phase fraction
0014 %
0015 %   References:
0016 %   [1]  P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure
0017 %        Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio
0018 %        Processing, vol. 15, pp. 34–43, Jan. 2007.
0019 %   [2]  M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods
0020 %        for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing,
0021 %        vol. 14, no. 2, pp. 456–466, Mar. 2006.
0022 %   [3]  A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal
0023 %        closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352.
0024 %   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
0025 %        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
0026 %   [5]  A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis,
0027 %        Imperial College, 2001.
0028 
0029 % Algorithm Parameters
0030 %       The following parameters are defined in voicebox()
0031 %
0032 %   dy_cpfrac=0.3;           % presumed closed phase fraction of larynx cycle
0033 %   dy_cproj=0.2;            % cost of projected candidate
0034 %   dy_cspurt=-0.45;         % cost of a talkspurt
0035 %   dy_dopsp=1;              % Use phase slope projection (1) or not (0)?
0036 %   dy_ewdly=0.0008;         % window delay for energy cost function term [~ energy peak delay from closure] (sec)
0037 %   dy_ewlen=0.003;          % window length for energy cost function term (sec)
0038 %   dy_ewtaper=0.001;        % taper length for energy cost function window (sec)
0039 %   dy_fwlen=0.00045;        % window length used to smooth group delay (sec)
0040 %   dy_fxmax=500;            % max larynx frequency (Hz)
0041 %   dy_fxmin=50;             % min larynx frequency (Hz)
0042 %   dy_fxminf=60;            % min larynx frequency (Hz) [used for Frobenius norm only]
0043 %   dy_gwlen=0.0030;         % group delay evaluation window length (sec)
0044 %   dy_lpcdur=0.020;         % lpc analysis frame length (sec)
0045 %   dy_lpcn=2;               % lpc additional poles
0046 %   dy_lpcnf=0.001;          % lpc poles per Hz (1/Hz)
0047 %   dy_lpcstep=0.010;        % lpc analysis step (sec)
0048 %   dy_nbest=5;              % Number of NBest paths to keep
0049 %   dy_preemph=50;           % pre-emphasis filter frequency (Hz) (to avoid preemphasis, make this very large)
0050 %   dy_spitch=0.2;           % scale factor for pitch deviation cost
0051 %   dy_wener=0.3;            % DP energy weighting
0052 %   dy_wpitch=0.5;           % DP pitch weighting
0053 %   dy_wslope=0.1;           % DP group delay slope weighting
0054 %   dy_wxcorr=0.8;           % DP cross correlation weighting
0055 %   dy_xwlen=0.01;           % cross-correlation length for waveform similarity (sec)
0056 
0057 %   Revision History:
0058 %
0059 %   3.0 - 29 Jun 2006  - Rewrote DP function to improve speed
0060 %   2.6 - 29 Jun 2006  - Tidied up algorithm parameters
0061 %   2.4 - 10 Jun 2006  - Made into a single file aand put into VOICEBOX
0062 %   2.3 - 18 Mar 2005  - Removed 4kHz filtering of phase-slope function
0063 %   2.2 - 05 Oct 2004  -  dpgci uses the slopes returned from xewgrdel
0064 %                      -  gdwav from speech with fs<9000 is not filtered
0065 %                      -  Various outputs and inputs of functions have been
0066 %                         removed since now there is no plotting
0067 %   1.0 - 30 Jan 2001  - Initial version [5]
0068 
0069 %   Bugs:
0070 %         1. Allow the projections only to extend to the end of the larynx cycle
0071 %         2. Compensate for false pitch period cost at the start of a voicespurt
0072 %         3. Should include energy and phase-slope costs for the first closure of a voicespurt
0073 %         4. should delete candidates that are too close to the beginning or end of speech for the cost measures
0074 %            currently this is 200 samples fixed in the main routine but it should adapt to window lengths of
0075 %            cross-correlation, lpc and energy measures.
0076 %         5. Should have an integrated voiced/voiceless detector
0077 %         6. Allow dypsa to be called in chunks for a long speech file
0078 %         7. Do forward & backward passes to allow for gradient descent and/or discriminative training
0079 %         8. Glottal opening approximation does not really belong in this function
0080 %         9. The cross correlation window is asymmetric (and overcomplex) for no very good reason
0081 %        10. Incorporate -0.5 factor into dy_wxcorr and abolish erroneous (nx2-1)/(nx2-2) factor
0082 %        11. Add in talkspurt cost at the beginning rather than the end of a spurt (more efficient)
0083 %        12. Remove qmin>2 condition from voicespurt start detection (DYPSA 2 compatibility) in two places
0084 %        13. Include energy and phase-slope costs at the start of a voicespurt
0085 %        14. Single-closure voicespurt are only allowed if nbest=1 (should always be forbidden)
0086 %        15. Penultimate closure candidate is always acceptd
0087 %        16. Final element of gcic, Cfn and Ch is unused
0088 %        17. Needs to cope better with irregular voicing (e.g. creaky voice)
0089 %        18. Should give non-integer GCI positions for improved accuracy
0090 %        19. Remove constraint that first voicespurt cannot begin until qrmax after the first candidate
0091 
0092 %      Copyright (C) Tasos Kounoudes, Jon Gudnason, Patrick Naylor and Mike Brookes 2006
0093 %      Version: $Id: dypsa.m 3102 2013-06-13 21:16:11Z dmb $
0094 %
0095 %   VOICEBOX is a MATLAB toolbox for speech processing.
0096 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0097 %
0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0099 %   This program is free software; you can redistribute it and/or modify
0100 %   it under the terms of the GNU General Public License as published by
0101 %   the Free Software Foundation; either version 2 of the License, or
0102 %   (at your option) any later version.
0103 %
0104 %   This program is distributed in the hope that it will be useful,
0105 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0106 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0107 %   GNU General Public License for more details.
0108 %
0109 %   You can obtain a copy of the GNU General Public License from
0110 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0111 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0112 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0113 
0114 % Extract algorithm constants from VOICEBOX
0115 
0116 
0117 dy_preemph=voicebox('dy_preemph');
0118 dy_lpcstep=voicebox('dy_lpcstep');
0119 dy_lpcdur=voicebox('dy_lpcdur');
0120 dy_dopsp=voicebox('dy_dopsp');              % Use phase slope projection (1) or not (0)?
0121 dy_ewtaper=voicebox('dy_ewtaper');        % Prediction order of FrobNorm method  in seconds
0122 dy_ewlen=voicebox('dy_ewlen');        % windowlength of FrobNorm method  in seconds
0123 dy_ewdly=voicebox('dy_ewdly');        % shift for assymetric speech shape at start of voiced cycle
0124 dy_cpfrac=voicebox('dy_cpfrac');        % presumed ratio of larynx cycle that is closed
0125 dy_lpcnf=voicebox('dy_lpcnf');          % lpc poles per Hz (1/Hz)
0126 dy_lpcn=voicebox('dy_lpcn');            % lpc additional poles
0127 dy_xwlen=voicebox('dy_xwlen');            % cross-correlation length for waveform similarity (sec)
0128 dy_fxminf=voicebox('dy_fxminf');            % minimum pitch for Frobenius norm calculation
0129 
0130 lpcord=ceil(fs*dy_lpcnf+dy_lpcn);       % lpc poles
0131 
0132 %PreEmphasise input speech
0133 s_used=filter([1 -exp(-2*pi*dy_preemph/fs)],1,s);
0134 
0135 % perform LPC analysis, AC method with Hamming windowing
0136 [ar, e, k] = lpcauto(s_used,lpcord,floor([dy_lpcstep dy_lpcdur]*fs));
0137 
0138 if any(any(isinf(ar)))    % if the data is bad and gives infinite prediction coefficients we return with a warning
0139     warning('No GCIs returned');
0140     gci=[];
0141     return;
0142 end;
0143 
0144 % compute the prediction residual
0145 r = lpcifilt(s_used,ar,k); 
0146 
0147 % compute the group delay function:  EW method from reference [2] above
0148 [zcr_cand,sew,gdwav,toff]=xewgrdel(r,fs); 
0149 gdwav=-[zeros(toff,1); gdwav(1:end-toff)];
0150 zcr_cand=[round(zcr_cand), ones(size(zcr_cand))];   %flag zero crossing candidates with ones
0151 
0152 sew=0.5+sew';  %the phase slope cost of each candidate
0153 
0154 pro_cand=[];
0155 if dy_dopsp ~= 0
0156     pro_cand = psp(gdwav,fs);
0157     pro_cand = [pro_cand, zeros(length(pro_cand),1)]; %flag projected candidates with zeros
0158     sew =      [sew zeros(1,size(pro_cand,1))];      %the phase slope cost of a projected candidate is zero
0159 end;
0160 
0161 %Sort the zero crossing and projected candidates together and remove any candidates that
0162 %are to close to the start or end of the speech signal because the cost functions
0163 %need room either side.
0164 
0165 [gcic,sin] = sortrows([zcr_cand; pro_cand],1);  
0166 sew=sew(sin);
0167 saf=max([200,dy_xwlen*fs/2+1,fs/dy_fxminf]);
0168 sin=find(and(saf<gcic,gcic<length(gdwav)-saf));
0169 gcic=gcic(sin,:);
0170 sew=sew(sin);
0171 
0172 % compute the frobenious norm function used for a cost in the DP
0173 fnwav=frobfun(s_used,dy_ewtaper*fs,dy_ewlen*fs,dy_ewdly*fs);
0174 
0175 %Dynamic programming, picks the most likely candidates based on the pitch consistency, energy etc.
0176 [gci] = dpgci(gcic, s_used(:), sew, fnwav, fs);
0177 
0178 %Evaluate goi ... determined to be dy_cpfrac percentage of the larynx cylce away from the last gci
0179 goi=simplegci2goi(gci,dy_cpfrac);
0180 
0181 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0182 
0183 function z = psp(g,fs)
0184 %PSP  Calculates the phase slope projections of the group delay function
0185 %   Z = PSP(G) computes the
0186 
0187 %   Revision History:
0188 %       V1.0 July 12th 2002:
0189 %            Nov. 6th 2002 : if statement added to remove "last" midpoint
0190 
0191 g = g(:);
0192 
0193 gdot = [diff(g);0];
0194 gdotdot = [diff(gdot);0];
0195 
0196 % find the turning points  as follows: [tp_number, index_of_tp, min(1) or max(-1), g(index_of_tp)]
0197 turningPoints = zcr(gdot);
0198 turningPoints = [[1:length(turningPoints)]', turningPoints, sign(gdotdot(turningPoints)), g(turningPoints)];
0199 
0200 % useful for debug/plotting
0201 %tplot = zeros(length(g),1);
0202 %tplot(turningPoints(:,1)) = turningPoints(:,2);
0203 
0204 % find any maxima which are < 0
0205 negmaxima = turningPoints(find(turningPoints(:,3) == -1 & turningPoints(:,4) < 0 & turningPoints(:,1)~=1),:);  %Change 01.05.2003 JG: The first row can't be included
0206 
0207 % find the midpoint between the preceding min and the negative max
0208 nmi = negmaxima(:,1);
0209 midPointIndex = turningPoints(nmi-1,2) + round(0.5*(turningPoints(nmi,2) - turningPoints(nmi-1,2)));
0210 midPointValue = g(midPointIndex);
0211 
0212 % project a zero crossing with unit slope
0213 nz = midPointIndex - round(midPointValue);
0214 
0215 % find any minima which are > 0
0216 posminima = turningPoints(find(turningPoints(:,3) == 1 & turningPoints(:,4) > 0),:);
0217 
0218 % find the midpoint between the positive min and the following max
0219 pmi = posminima(:,1); 
0220 
0221 %Remove last midpoint if it is the last sample
0222 if ~isempty(pmi), if pmi(end)==size(turningPoints,1), pmi=pmi(1:end-1); end; end;
0223 
0224 midPointIndex = turningPoints(pmi,2) + round(0.5*(turningPoints(pmi+1,2) - turningPoints(pmi,2)));
0225 midPointValue = g(midPointIndex);
0226 
0227 % project a zero crossing with unit slope
0228 pz = midPointIndex - round(midPointValue);
0229 
0230 z = sort([nz;pz]);
0231 
0232 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0233 
0234 function i = zcr(x, p)
0235 %ZCR  Finds the indices in a vector to  zero crossings
0236 %   I = ZCR(X) finds the indices of vector X which are closest to zero-crossings.
0237 %   I = ZCR(X, P) finds indices for positive-going zeros-crossings for P=1 and
0238 %   negative-going zero-crossings for P=0.
0239 
0240 x = x(:);
0241 
0242 if (nargin==2)
0243     if (p==0) 
0244         z1 = zcrp(x);   % find positive going zero-crossings
0245     elseif (p==1) 
0246         z1 = zcrp(-x);  % find negative going zero-crossings
0247     else
0248         error('ZCR: invalid input parameter 2: must be 0 or 1');
0249     end
0250 else
0251     z1 = [zcrp(x); zcrp(-x)];
0252 end
0253 
0254 % find crossings when x==0 exactly
0255 z0 = find( (x(1:length(x))==0) & ([x(2:length(x));0] ~= 0));
0256 
0257 % concatenate and sort the two types of zero-crossings
0258 i = sort([z0; z1]);
0259 
0260 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0261 
0262 function zz = zcrp(xx)  %only used in zcr
0263 % find positive-going zero-crossing
0264 z1 = find(diff(sign(xx)) == -2);
0265 % find which out of current sample or next sample is closer to zero
0266 [m, z2] = min([abs(xx(z1)), abs(xx(z1+1))], [], 2);
0267 zz =  z1 -1 + z2;
0268 
0269 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0270 
0271 function [frob]=frobfun(sp,p,m,offset)
0272 
0273 % [frob]=frobfun(sp,p,m)
0274 %
0275 % sp is the speech signal assumed to be preemphasised
0276 % p  is the prediction order  : recomended to be 1 ms in above paper
0277 % m  is the window length     : recomended to be 1 ms in above paper
0278 % offset is shift for assymetric speech shape at start of voiced cycle -
0279 % default 1.5ms.
0280 %
0281 % This function implements the frobenius norm based measure C defined in [4] below.
0282 % It equals the square of the Frobenius norm of the m by p+1 data matrix divided by p+1
0283 %
0284 % Reference:
0285 %   [4]  C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
0286 %        from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
0287 
0288 
0289 %   Revision History:
0290 %       V1.0 July 12th 2002:
0291 %            Nov. 6th 2002 : if statement added to remove "last" midpoint
0292 
0293 %force p m and offset to be integers
0294 p=round(p);
0295 m=round(m);
0296 offset=round(offset);
0297 
0298 w=(p+1)*ones(1,m+p);
0299 w(1:p)=1:p;
0300 w(m+1:p+m)=p:-1:1;
0301 
0302 w=w./(p+1); 
0303 frob=filter(w,1,sp.^2);
0304 frob(1:(round((p+m-1)/2) + offset))=[];
0305 
0306 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0307 
0308 function goi=simplegci2goi(gci,pr)
0309 
0310 % Estimate glottal opening instants by assuming a fixed closed-phase fraction
0311 
0312 gci=round(gci);
0313 maxpitch=max(medfilt1(diff(gci),7));
0314 
0315 % calculate opening instants
0316 for kg=1:length(gci)-1
0317     goi(kg)=gci(kg)+min(pr*(gci(kg+1)-gci(kg)),pr*maxpitch);
0318 end;
0319 kg=kg+1;
0320 goi(kg)=round(gci(kg)+pr*(gci(kg)-gci(kg-1)));  %use the previous pitch period instead
0321 goi=round(goi);
0322 
0323 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0324 
0325 function [tew,sew,y,toff]=xewgrdel(u,fs)
0326 
0327 % implement EW group delay epoch extraction
0328 
0329 dy_gwlen=voicebox('dy_gwlen');          % group delay evaluation window length
0330 dy_fwlen=voicebox('dy_fwlen');          % window length used to smooth group delay
0331 
0332 % perform group delay calculation
0333 
0334 gw=2*floor(dy_gwlen*fs/2)+1;            % force window length to be odd
0335 ghw=window('hamming',gw,'s');
0336 ghw = ghw(:);                           % force to be a column (dmb thinks window gives a row - and he should know as he wrote it!)
0337 ghwn=ghw'.*(gw-1:-2:1-gw)/2;            % weighted window: zero in middle
0338 
0339 u2=u.^2;
0340 yn=filter(ghwn,1,u2);
0341 yd=filter(ghw,1,u2);
0342 yd(abs(yd)<eps)=10*eps;                 % prevent infinities
0343 y=yn(gw:end)./yd(gw:end);               % delete filter startup transient
0344 toff=(gw-1)/2;
0345 fw=2*floor(dy_fwlen*fs/2)+1;            % force window length to be odd
0346 if fw>1
0347     daw=window('hamming',fw,'s');
0348     y=filter(daw,1,y)/sum(daw);         % low pass filter
0349     toff=toff-(fw-1)/2;
0350 end
0351 [tew,sew]=zerocros(y,'n');              % find zero crossings
0352 
0353 tew=tew+toff;                           % compensate for filter delay and frame advance
0354 
0355 
0356 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0357 
0358 function Cfn=fnrg(gcic,frob,fs)
0359 
0360 %Frobenious Energy Cost
0361 
0362 dy_fxminf=voicebox('dy_fxminf');
0363 frob=frob(:)';
0364 mm=round(fs/dy_fxminf);
0365 mfrob=maxfilt(frob,1,mm);
0366 mfrob=[mfrob(floor(mm/2)+1:end) max(frob(end-ceil(mm/2):end))*ones(1,floor(mm/2))];
0367 rfr=frob./mfrob;
0368 Cfn=0.5-rfr(round(gcic));
0369 
0370 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0371 
0372 function gci=dpgci(gcic, s, Ch, fnwav, fs)
0373 
0374 %DPGCI   Choose the best Glottal Closure Instances with Dynamic Programming
0375 %   gci=dpgci(gcic, s(:), Ch, fnwav, fs) returns vectors of sample indices corresponding
0376 %   to the instants of glottal closure in the speech signal s at sampling frequency fs Hz.
0377 %
0378 %   Inputs:
0379 %   gcic    is a matrix whos first column are the glottal closure instance candidates and
0380 %           the second column is 1 if the corresponding gci is derived from a zero crossing
0381 %           but zero if the gci is from a a projected zero crossing
0382 %   s       is the speech signal - MUST be a column vector
0383 %   Ch      the phase slope cost of every candidate
0384 %   fnwav   is the frobenious norm function of s
0385 %   fs      is the sampling frequncy
0386 %
0387 %   Outputs:
0388 %   gci     is a vector of glottal closure instances chosen by the DP
0389 
0390 
0391 
0392 %   Revision History:
0393 %   Bugs:  Constants are hardwired but defined in a structure like pv (defined in grpdelpv)
0394 %
0395 
0396 % get algorithm parameters from voicebox()
0397 
0398 dy_fxmin=voicebox('dy_fxmin');        % min larynx frequency (Hz)
0399 dy_fxmax=voicebox('dy_fxmax');        % min larynx frequency (Hz)
0400 dy_xwlen=voicebox('dy_xwlen');        % cross-correlation length for waveform similarity (sec)
0401 dy_nbest=voicebox('dy_nbest');        % Number of NBest paths to keep
0402 dy_spitch=voicebox('dy_spitch');              % scale factor for pitch deviation cost
0403 wproj=voicebox('dy_cproj');           % cost of projected candidate
0404 dy_cspurt=voicebox('dy_cspurt');           % cost of a talkspurt
0405 dy_wpitch=voicebox('dy_wpitch');           % DP pitch weighting
0406 dy_wener=voicebox('dy_wener');           % DP energy weighting
0407 dy_wslope=voicebox('dy_wslope');           % DP group delay slope weighting
0408 dy_wxcorr=voicebox('dy_wxcorr');           % DP cross correlation weighting
0409 
0410 
0411 
0412 %Constants
0413 Ncand=length(gcic);
0414 sv2i=-(2*dy_spitch^2)^(-1);                 % scale factor for pitch deviation cost
0415 nxc=ceil(dy_xwlen*fs);                      % cross correlation window length in samples
0416 % === should delete any gci's that are within this of the end.
0417 % === and for energy window
0418 
0419 %Limit the search:
0420 qrmin=ceil(fs/dy_fxmax);
0421 qrmax=floor(fs/dy_fxmin);
0422 
0423 %Cost and tracking r = current, q = previous, p = preprevious
0424 cost=zeros(Ncand, dy_nbest); cost(:,:)=inf;    %Cost matrix, one row for each candidate
0425 maxcost=zeros(Ncand,1); maxcost(:,:)=inf;   %Maximum cost in each row
0426 imaxcost=ones(Ncand,1);                     %Index of maximum cost
0427 
0428 prev = ones(Ncand, dy_nbest);                  %index of previous, q candidates
0429 ind = ones(Ncand, dy_nbest);                   %index of p in row q (from prev)
0430 qbest = [zeros(Ncand,1), ones(Ncand,2)]; % the minimum cost in any previous q [cost,q,i]
0431 
0432 Cfn=fnrg(gcic(:,1),fnwav,fs);  %Frob.Energy Cost
0433 
0434 %Add start and end state
0435 % === should probably delete candidates that are too close to either end of the input
0436 % === why do we ever need the additional one at the tail end ?
0437 gcic=[[gcic(1,1)-qrmax-2 0];gcic;[gcic(end,1)+qrmax+2 0]];
0438 Cfn=[0 Cfn 0];
0439 Ch = [0 Ch 0];
0440 
0441 % first do parallelized version
0442 
0443 
0444 % rather complicated window specification is for compatibility with DYPSA 2
0445 % === +1 below is for compatibility - probably a bug
0446 wavix=(-floor(nxc/2):floor(nxc/2)+1)';                 % indexes for segments [nx2,1]
0447 nx2=length(wavix);
0448 sqnx2=sqrt(nx2);
0449 g_cr=dy_wener*Cfn+dy_wslope*Ch+wproj*(1-gcic(:,2))';  % fixed costs
0450 
0451 g_n=gcic(:,1)';                  % gci sample number [1,Ncand+2]
0452 g_pr=gcic(:,2)';                 % unprojected flag [1,Ncand+2]
0453 g_sqm=zeros(1,Ncand+1);         % stores: sqrt(nx2) * mean for speech similarity waveform
0454 g_sd=zeros(1,Ncand+1);         % stores: 1/(Std deviation * sqrt(nx2)) for speech similarity waveform
0455 f_pq=zeros((Ncand+1)*dy_nbest,1);   % (q-p) period for each node
0456 f_c=repmat(Inf,(Ncand+1)*dy_nbest,1);    % cumulative cost for each node - initialise to inf
0457 f_c(1)=0;                       % initial cost of zero for starting node
0458 % f_costs=zeros(Ncand*dy_nbest,6);   % === debugging only remember costs of candidate
0459 f_f=ones((Ncand+1)*dy_nbest,1);    % previous node in path
0460 f_fb=ones((Ncand+1),1);    % points back to best end-of-spurt node
0461 fbestc=0;                       % cost of best end-of-spurt node
0462 
0463 qmin=2;
0464 for r=2:Ncand+1   
0465 %     if r==86
0466 %         r;
0467 %     end
0468     r_n=g_n(r);             % sample number of r = current candidate
0469     rix=dy_nbest*(r-1)+(1:dy_nbest);    % index range within node variables
0470     
0471     % determine the range of feasible q candidates
0472     qmin0=qmin;
0473     qmin=find(g_n(qmin0-1:r-1)<r_n-qrmax);      % qmin is the nearest candidate that is >qrmax away
0474     qmin=qmin(end)+qmin0-1;             % convert to absolute index of first viable candidate
0475     qmax=find(g_n(qmin-1:r-1)<=r_n-qrmin);      % qmax is the nearest candidate that is >=qrmin away
0476     qmax=qmax(end)+qmin-2;
0477     
0478     
0479     % calculate waveform similarity cost measure statistics
0480     
0481     sr=s(r_n+wavix);        % note s MUST be a column vector so sr is also
0482     wsum=sum(sr);
0483     g_sqm(r)=wsum/sqnx2;                % mean * sqrt(nx2)
0484     g_sd(r)=1/sqrt(sr.'*sr-wsum^2/nx2);   % 1/(Std deviation * sqrt(nx2))
0485     
0486     % now process the candidates
0487     
0488     if qmin<=qmax
0489         qix=qmin:qmax;      % q index
0490         nq=length(qix);
0491         % === should integrate the -0.5 into dy_wxcorr
0492         % === the factor (nx2-1)/(nx2-2) is to compensate for a bug in swsc()
0493         q_cas=-0.5*(nx2-1)/(nx2-2)*dy_wxcorr*(sum(s(repmat(g_n(qix),nx2,1)+repmat(wavix,1,nq)).*repmat(sr,1,nq),1)-g_sqm(qix)*g_sqm(r)).*g_sd(qix)*g_sd(r);
0494         % compare: i=35; Ca=swsc(g_n(qix(i)),g_n(r),s,fs); [i qix(i) r  g_n(qix(i)) g_n(r) dy_wxcorr*Ca q_cas(i)]
0495         
0496         % now calculate pitch deviation cost
0497         
0498         fix = 1+(qmin-1)*dy_nbest:qmax*dy_nbest;    % node index range
0499         f_qr=repmat(r_n-g_n(qix),dy_nbest,1);    % (r-p) period for each node
0500         f_pr=f_qr(:)+f_pq(fix);
0501         % === could absorb the 2 into sv2i
0502         f_nx=2-2*f_pr./(f_pr+abs(f_qr(:)-f_pq(fix)));
0503         f_cp=dy_wpitch*(0.5-exp(sv2i*f_nx.^2));
0504         % === fudge to match dypsa2.4 - could more efficiently be added
0505         % === onto the cost of a talkspurt end
0506         % === should be a voicebox parameter anyway
0507         f_cp(f_pq(fix)==0)=dy_cspurt*dy_wpitch;
0508         
0509         % now find the N-best paths
0510         
0511         [r_cnb,nbix]=sort(f_c(fix)+f_cp+reshape(repmat(q_cas,dy_nbest,1),nq*dy_nbest,1));
0512         f_c(rix)=r_cnb(1:dy_nbest)+g_cr(r);     % costs
0513         f_f(rix)=nbix(1:dy_nbest)+(qmin-1)*dy_nbest;       % traceback nodes
0514         f_pq(rix)=f_qr(nbix(1:dy_nbest));       % previous period
0515         % === f_costs is only for debugging
0516 %         r;
0517 %         f_costs(rix,1)=f_c(fix(nbix(1:dy_nbest)));
0518 %         f_costs(rix,2)=wproj*(1-gcic(r,2));
0519 %         f_costs(rix,3)=f_cp(nbix(1:dy_nbest));
0520 %         f_costs(rix,4)=dy_wener*Cfn(r);
0521 %         f_costs(rix,5)=dy_wslope*Ch(r);
0522 %         f_costs(rix,6)=reshape(q_cas(1+floor((nbix(1:dy_nbest)-1)/dy_nbest)),dy_nbest,1);
0523         
0524         % check cost of using this candidate as the start of a new spurt
0525         % ==== the qmin>2 condition is for compatibility with dypsa 2 and
0526         % prevents any spurts starting until at least qrmax past the first
0527         % gci. This is probably a bug (see again below)
0528         iNb=rix(end);        
0529         if (qmin>2) && (f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2))<f_c(iNb))        % compare with worst of Nbest paths
0530             f_f(iNb)=f_fb(qmin-1);
0531             % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2
0532             % === this is probably a bug
0533             f_c(iNb)=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2));     % replace worst of the costs with start voicespurt cost
0534             f_pq(iNb)=0;                    % false pq period
0535         end
0536         if f_c(rix(1))<fbestc
0537             f_fb(r)=rix(1);                          % points to the node with lowest end-of-spurt cost
0538             % === should compensate for the pitch period cost incurred at the start of the next spurt
0539             % === note that a node can never be a one-node voicespurt on its own unless dy_nbest=1
0540             % since the start voices[purt option replaced the worst Nbest cost. This is probably good but
0541             % is a bit inconsistent.
0542             fbestc=f_c(rix(1));
0543         else
0544             f_fb(r)=f_fb(r-1);
0545         end
0546     else            % no viable candidates - must be the start of a voicespurt if anything
0547         % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2
0548         % === this is probably a bug
0549         % ==== the qmin>2 condition is for compatibility with dypsa 2 and
0550         % prevents any spurts starting until at least qrmax past the first
0551         % gci. This is probably a bug (see again above)
0552         if (qmin>2)
0553             f_c(rix(1))=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2));  % cost of new voicespurt
0554             f_f(rix)=f_fb(qmin-1);                              % traceback to previous talkspurt end
0555             f_pq(rix)=0;                                        % previous period
0556         end
0557         f_fb(r)=f_fb(r-1);                                  % cannot be the end of a voicespurt
0558     end
0559 end
0560 
0561 % now do the traceback
0562 
0563 gci = zeros(1,Ncand+1);
0564 
0565 % === for compatibility with dypsa2, we force the penultimate candidate to be accepted
0566 % === should be: i=f_fb(Ncand+1) but instead we pick the best of the penultimate candidate
0567 i=rix(1)-dy_nbest;
0568 if f_c(i-dy_nbest+1)<f_c(i)     % check if start of a talkspurt
0569     i=i-dy_nbest+1;
0570 end
0571 k=1;
0572 while i>1
0573     j=1+floor((i-1)/dy_nbest);          % convert node number to candidate number
0574     gci(k)=g_n(j);
0575     i=f_f(i);
0576     k=k+1;
0577 end
0578 gci=gci(k-1:-1:1);           % put into ascending order
0579 
0580 
0581 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0582 
0583

Generated on Fri 22-Sep-2017 19:37:38 by m2html © 2003