


DYPSA Derive glottal closure instances from speech [gci,goi] = (s,fs)
Note: Needs to be combined with a voiced-voiceless detector to eliminate
spurious closures in unvoiced and silent regions.
Inputs:
s is the speech signal
fs is the sampling frequncy
Outputs:
gci is a vector of glottal closure sample numbers
gco is a vector of glottal opening sample numbers derived from
an assumed constant closed-phase fraction
References:
[1] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure
Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio
Processing, vol. 15, pp. 34–43, Jan. 2007.
[2] M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods
for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing,
vol. 14, no. 2, pp. 456–466, Mar. 2006.
[3] A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal
closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352.
[4] C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection
from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994.
[5] A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis,
Imperial College, 2001.


0001 function [gci,goi] = dypsa(s,fs) 0002 %DYPSA Derive glottal closure instances from speech [gci,goi] = (s,fs) 0003 % Note: Needs to be combined with a voiced-voiceless detector to eliminate 0004 % spurious closures in unvoiced and silent regions. 0005 % 0006 % Inputs: 0007 % s is the speech signal 0008 % fs is the sampling frequncy 0009 % 0010 % Outputs: 0011 % gci is a vector of glottal closure sample numbers 0012 % gco is a vector of glottal opening sample numbers derived from 0013 % an assumed constant closed-phase fraction 0014 % 0015 % References: 0016 % [1] P. A. Naylor, A. Kounoudes, J. Gudnason, and M. Brookes, “Estimation of Glottal Closure 0017 % Instants in Voiced Speech using the DYPSA Algorithm,” IEEE Trans on Speech and Audio 0018 % Processing, vol. 15, pp. 34–43, Jan. 2007. 0019 % [2] M. Brookes, P. A. Naylor, and J. Gudnason, “A Quantitative Assessment of Group Delay Methods 0020 % for Identifying Glottal Closures in Voiced Speech,” IEEE Trans on Speech & Audio Processing, 0021 % vol. 14, no. 2, pp. 456–466, Mar. 2006. 0022 % [3] A. Kounoudes, P. A. Naylor, and M. Brookes, “The DYPSA algorithm for estimation of glottal 0023 % closure instants in voiced speech,” in Proc ICASSP 2002, vol. 1, Orlando, 2002, pp. 349–352. 0024 % [4] C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection 0025 % from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994. 0026 % [5] A. Kounoudes, “Epoch Estimation for Closed-Phase Analysis of Speech,” PhD Thesis, 0027 % Imperial College, 2001. 0028 0029 % Algorithm Parameters 0030 % The following parameters are defined in voicebox() 0031 % 0032 % dy_cpfrac=0.3; % presumed closed phase fraction of larynx cycle 0033 % dy_cproj=0.2; % cost of projected candidate 0034 % dy_cspurt=-0.45; % cost of a talkspurt 0035 % dy_dopsp=1; % Use phase slope projection (1) or not (0)? 0036 % dy_ewdly=0.0008; % window delay for energy cost function term [~ energy peak delay from closure] (sec) 0037 % dy_ewlen=0.003; % window length for energy cost function term (sec) 0038 % dy_ewtaper=0.001; % taper length for energy cost function window (sec) 0039 % dy_fwlen=0.00045; % window length used to smooth group delay (sec) 0040 % dy_fxmax=500; % max larynx frequency (Hz) 0041 % dy_fxmin=50; % min larynx frequency (Hz) 0042 % dy_fxminf=60; % min larynx frequency (Hz) [used for Frobenius norm only] 0043 % dy_gwlen=0.0030; % group delay evaluation window length (sec) 0044 % dy_lpcdur=0.020; % lpc analysis frame length (sec) 0045 % dy_lpcn=2; % lpc additional poles 0046 % dy_lpcnf=0.001; % lpc poles per Hz (1/Hz) 0047 % dy_lpcstep=0.010; % lpc analysis step (sec) 0048 % dy_nbest=5; % Number of NBest paths to keep 0049 % dy_preemph=50; % pre-emphasis filter frequency (Hz) (to avoid preemphasis, make this very large) 0050 % dy_spitch=0.2; % scale factor for pitch deviation cost 0051 % dy_wener=0.3; % DP energy weighting 0052 % dy_wpitch=0.5; % DP pitch weighting 0053 % dy_wslope=0.1; % DP group delay slope weighting 0054 % dy_wxcorr=0.8; % DP cross correlation weighting 0055 % dy_xwlen=0.01; % cross-correlation length for waveform similarity (sec) 0056 0057 % Revision History: 0058 % 0059 % 3.0 - 29 Jun 2006 - Rewrote DP function to improve speed 0060 % 2.6 - 29 Jun 2006 - Tidied up algorithm parameters 0061 % 2.4 - 10 Jun 2006 - Made into a single file aand put into VOICEBOX 0062 % 2.3 - 18 Mar 2005 - Removed 4kHz filtering of phase-slope function 0063 % 2.2 - 05 Oct 2004 - dpgci uses the slopes returned from xewgrdel 0064 % - gdwav from speech with fs<9000 is not filtered 0065 % - Various outputs and inputs of functions have been 0066 % removed since now there is no plotting 0067 % 1.0 - 30 Jan 2001 - Initial version [5] 0068 0069 % Bugs: 0070 % 1. Allow the projections only to extend to the end of the larynx cycle 0071 % 2. Compensate for false pitch period cost at the start of a voicespurt 0072 % 3. Should include energy and pahse-slope costs for the first closeure of a voicespurt 0073 % 4. should delete candidates that are too close to the beginning or end of speech for the cost measures 0074 % currently this is 200 samples fixed in the main routine but it should adapt to window lengths of 0075 % cross-correlation, lpc and energy measures. 0076 % 5. Should have an integrated voiced/voiceless detector 0077 % 6. Allow dypsa to be called in chunks for a long speech file 0078 % 7. Do forward & backward passes to allow for gradient descent and/or discriminative training 0079 % 8. Glottal opening approximation does not really belong in this function 0080 % 9. The cross correlation window is asymmetric (and overcomplex) for no very good reason 0081 % 10. Incorporate -0.5 factor into dy_wxcorr and abolish erroneous (nx2-1)/(nx2-2) factor 0082 % 11. Add in talkspurt cost at the beginning rather than the end of a spurt (more efficient) 0083 % 12. Remove qmin>2 condition from voicespurt start detection (DYPSA 2 compatibility) in two places 0084 % 13. Include energy and phase-slope costs at the start of a voicespurt 0085 % 14. Single-closure voicespurt are only allowed if nbest=1 (should always be forbidden) 0086 % 15. Penultimate closure candidate is always acceptd 0087 % 16. Final element of gcic, Cfn and Ch is unused 0088 % 17. Needs to cope better with irregular voicing (e.g. creaky voice) 0089 % 18. Should give non-integer GCI positions for improved accuracy 0090 % 19. Remove constraint that first voicespurt cannot begin until qrmax after the first candidate 0091 0092 % Copyright (C) Tasos Kounoudes, Jon Gudnason, Patrick Naylor and Mike Brookes 2006 0093 % Version: $Id: dypsa.m 713 2011-10-16 14:45:43Z dmb $ 0094 % 0095 % VOICEBOX is a MATLAB toolbox for speech processing. 0096 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0097 % 0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0099 % This program is free software; you can redistribute it and/or modify 0100 % it under the terms of the GNU General Public License as published by 0101 % the Free Software Foundation; either version 2 of the License, or 0102 % (at your option) any later version. 0103 % 0104 % This program is distributed in the hope that it will be useful, 0105 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0106 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0107 % GNU General Public License for more details. 0108 % 0109 % You can obtain a copy of the GNU General Public License from 0110 % http://www.gnu.org/copyleft/gpl.html or by writing to 0111 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0112 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0113 0114 % Extract algorithm constants from VOICEBOX 0115 0116 0117 dy_preemph=voicebox('dy_preemph'); 0118 dy_lpcstep=voicebox('dy_lpcstep'); 0119 dy_lpcdur=voicebox('dy_lpcdur'); 0120 dy_dopsp=voicebox('dy_dopsp'); % Use phase slope projection (1) or not (0)? 0121 dy_ewtaper=voicebox('dy_ewtaper'); % Prediction order of FrobNorm method in seconds 0122 dy_ewlen=voicebox('dy_ewlen'); % windowlength of FrobNorm method in seconds 0123 dy_ewdly=voicebox('dy_ewdly'); % shift for assymetric speech shape at start of voiced cycle 0124 dy_cpfrac=voicebox('dy_cpfrac'); % presumed ratio of larynx cycle that is closed 0125 dy_lpcnf=voicebox('dy_lpcnf'); % lpc poles per Hz (1/Hz) 0126 dy_lpcn=voicebox('dy_lpcn'); % lpc additional poles 0127 0128 lpcord=ceil(fs*dy_lpcnf+dy_lpcn); % lpc poles 0129 0130 %PreEmphasise input speech 0131 s_used=filter([1 -exp(-2*pi*dy_preemph/fs)],1,s); 0132 0133 % perform LPC analysis, AC method with Hamming windowing 0134 [ar, e, k] = lpcauto(s_used,lpcord,floor([dy_lpcstep dy_lpcdur]*fs)); 0135 0136 if any(any(isinf(ar))) % if the data is bad and gives infinite prediction coefficients we return with a warning 0137 warning('No GCIs returned'); 0138 gci=[]; 0139 return; 0140 end; 0141 0142 % compute the prediction residual 0143 r = lpcifilt(s_used,ar,k); 0144 0145 % compute the group delay function: EW method from reference [2] above 0146 [zcr_cand,sew,gdwav,toff]=xewgrdel(r,fs); 0147 gdwav=-[zeros(toff,1); gdwav(1:end-toff)]; 0148 zcr_cand=[round(zcr_cand), ones(size(zcr_cand))]; %flag zero crossing candidates with ones 0149 0150 sew=0.5+sew'; %the phase slope cost of each candidate 0151 0152 pro_cand=[]; 0153 if dy_dopsp ~= 0 0154 pro_cand = psp(gdwav,fs); 0155 pro_cand = [pro_cand, zeros(length(pro_cand),1)]; %flag projected candidates with zeros 0156 sew = [sew zeros(1,size(pro_cand,1))]; %the phase slope cost of a projected candidate is zero 0157 end; 0158 0159 %Sort the zero crossing and projected candidates together and remove any candidates that 0160 %are within 200 samples from the speech boundary 0161 [gcic,sin] = sortrows([zcr_cand; pro_cand],1); 0162 sew=sew(sin); 0163 sin=find(and(200<gcic,gcic<length(gdwav)-200)); 0164 gcic=gcic(sin,:); 0165 sew=sew(sin); 0166 0167 % compute the frobenious norm function used for a cost in the DP 0168 fnwav=frobfun(s_used,dy_ewtaper*fs,dy_ewlen*fs,dy_ewdly*fs); 0169 0170 %Dynamic programming, picks the most likely candidates based on the pitch consistency, energy etc. 0171 [gci] = dpgci(gcic, s_used(:), sew, fnwav, fs); 0172 0173 %Evaluate goi ... determined to be dy_cpfrac percentage of the larynx cylce away from the last gci 0174 goi=simplegci2goi(gci,dy_cpfrac); 0175 0176 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0177 0178 function z = psp(g,fs) 0179 %PSP Calculates the phase slope projections of the group delay function 0180 % Z = PSP(G) computes the 0181 0182 % Revision History: 0183 % V1.0 July 12th 2002: 0184 % Nov. 6th 2002 : if statement added to remove "last" midpoint 0185 0186 g = g(:); 0187 0188 gdot = [diff(g);0]; 0189 gdotdot = [diff(gdot);0]; 0190 0191 % find the turning points as follows: [tp_number, index_of_tp, min(1) or max(-1), g(index_of_tp)] 0192 turningPoints = zcr(gdot); 0193 turningPoints = [[1:length(turningPoints)]', turningPoints, sign(gdotdot(turningPoints)), g(turningPoints)]; 0194 0195 % useful for debug/plotting 0196 %tplot = zeros(length(g),1); 0197 %tplot(turningPoints(:,1)) = turningPoints(:,2); 0198 0199 % find any maxima which are < 0 0200 negmaxima = turningPoints(find(turningPoints(:,3) == -1 & turningPoints(:,4) < 0 & turningPoints(:,1)~=1),:); %Change 01.05.2003 JG: The first row can't be included 0201 0202 % find the midpoint between the preceding min and the negative max 0203 nmi = negmaxima(:,1); 0204 midPointIndex = turningPoints(nmi-1,2) + round(0.5*(turningPoints(nmi,2) - turningPoints(nmi-1,2))); 0205 midPointValue = g(midPointIndex); 0206 0207 % project a zero crossing with unit slope 0208 nz = midPointIndex - round(midPointValue); 0209 0210 % find any minima which are > 0 0211 posminima = turningPoints(find(turningPoints(:,3) == 1 & turningPoints(:,4) > 0),:); 0212 0213 % find the midpoint between the positive min and the following max 0214 pmi = posminima(:,1); 0215 0216 %Remove last midpoint if it is the last sample 0217 if ~isempty(pmi), if pmi(end)==size(turningPoints,1), pmi=pmi(1:end-1); end; end; 0218 0219 midPointIndex = turningPoints(pmi,2) + round(0.5*(turningPoints(pmi+1,2) - turningPoints(pmi,2))); 0220 midPointValue = g(midPointIndex); 0221 0222 % project a zero crossing with unit slope 0223 pz = midPointIndex - round(midPointValue); 0224 0225 z = sort([nz;pz]); 0226 0227 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0228 0229 function i = zcr(x, p) 0230 %ZCR Finds the indices in a vector to zero crossings 0231 % I = ZCR(X) finds the indices of vector X which are closest to zero-crossings. 0232 % I = ZCR(X, P) finds indices for positive-going zeros-crossings for P=1 and 0233 % negative-going zero-crossings for P=0. 0234 0235 x = x(:); 0236 0237 if (nargin==2) 0238 if (p==0) 0239 z1 = zcrp(x); % find positive going zero-crossings 0240 elseif (p==1) 0241 z1 = zcrp(-x); % find negative going zero-crossings 0242 else 0243 error('ZCR: invalid input parameter 2: must be 0 or 1'); 0244 end 0245 else 0246 z1 = [zcrp(x); zcrp(-x)]; 0247 end 0248 0249 % find crossings when x==0 exactly 0250 z0 = find( (x(1:length(x))==0) & ([x(2:length(x));0] ~= 0)); 0251 0252 % concatenate and sort the two types of zero-crossings 0253 i = sort([z0; z1]); 0254 0255 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0256 0257 function zz = zcrp(xx) %only used in zcr 0258 % find positive-going zero-crossing 0259 z1 = find(diff(sign(xx)) == -2); 0260 % find which out of current sample or next sample is closer to zero 0261 [m, z2] = min([abs(xx(z1)), abs(xx(z1+1))], [], 2); 0262 zz = z1 -1 + z2; 0263 0264 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0265 0266 function [frob]=frobfun(sp,p,m,offset) 0267 0268 % [frob]=frobfun(sp,p,m) 0269 % 0270 % sp is the speech signal assumed to be preemphasised 0271 % p is the prediction order : recomended to be 1 ms in above paper 0272 % m is the window length : recomended to be 1 ms in above paper 0273 % offset is shift for assymetric speech shape at start of voiced cycle - 0274 % default 1.5ms. 0275 % 0276 % This function implements the frobenius norm based measure C defined in [4] below. 0277 % It equals the square of the Frobenius norm of the m by p+1 data matrix divided by p+1 0278 % 0279 % Reference: 0280 % [4] C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection 0281 % from the speech signal,” IEEE Trans. Speech Audio Processing, vol. 2, pp. 258–265, Apr. 1994. 0282 0283 0284 % Revision History: 0285 % V1.0 July 12th 2002: 0286 % Nov. 6th 2002 : if statement added to remove "last" midpoint 0287 0288 %force p m and offset to be integers 0289 p=round(p); 0290 m=round(m); 0291 offset=round(offset); 0292 0293 w=(p+1)*ones(1,m+p); 0294 w(1:p)=1:p; 0295 w(m+1:p+m)=p:-1:1; 0296 0297 w=w./(p+1); 0298 frob=filter(w,1,sp.^2); 0299 frob(1:(round((p+m-1)/2) + offset))=[]; 0300 0301 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0302 0303 function goi=simplegci2goi(gci,pr) 0304 0305 % Estimate glottal opening instants by assuming a fixed closed-phase fraction 0306 0307 gci=round(gci); 0308 maxpitch=max(medfilt1(diff(gci),7)); 0309 0310 % calculate opening instants 0311 for kg=1:length(gci)-1 0312 goi(kg)=gci(kg)+min(pr*(gci(kg+1)-gci(kg)),pr*maxpitch); 0313 end; 0314 kg=kg+1; 0315 goi(kg)=round(gci(kg)+pr*(gci(kg)-gci(kg-1))); %use the previous pitch period instead 0316 goi=round(goi); 0317 0318 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0319 0320 function [tew,sew,y,toff]=xewgrdel(u,fs) 0321 0322 % implement EW group delay epoch extraction 0323 0324 dy_gwlen=voicebox('dy_gwlen'); % group delay evaluation window length 0325 dy_fwlen=voicebox('dy_fwlen'); % window length used to smooth group delay 0326 0327 % perform group delay calculation 0328 0329 gw=2*floor(dy_gwlen*fs/2)+1; % force window length to be odd 0330 ghw=window('hamming',gw,'s'); 0331 ghw = ghw(:); % force to be a column (dmb thinks window gives a row - and he should know as he wrote it!) 0332 ghwn=ghw'.*(gw-1:-2:1-gw)/2; % weighted window: zero in middle 0333 0334 u2=u.^2; 0335 yn=filter(ghwn,1,u2); 0336 yd=filter(ghw,1,u2); 0337 yd(abs(yd)<eps)=10*eps; % prevent infinities 0338 y=yn(gw:end)./yd(gw:end); % delete filter startup transient 0339 toff=(gw-1)/2; 0340 fw=2*floor(dy_fwlen*fs/2)+1; % force window length to be odd 0341 if fw>1 0342 daw=window('hamming',fw,'s'); 0343 y=filter(daw,1,y)/sum(daw); % low pass filter 0344 toff=toff-(fw-1)/2; 0345 end 0346 [tew,sew]=zerocros(y,'n'); % find zero crossings 0347 0348 tew=tew+toff; % compensate for filter delay and frame advance 0349 0350 0351 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0352 0353 function Cfn=fnrg(gcic,frob,fs) 0354 0355 %Frobenious Energy Cost 0356 0357 dy_fxminf=voicebox('dy_fxminf'); 0358 frob=frob(:)'; 0359 mm=round(fs/dy_fxminf); 0360 mfrob=maxfilt(frob,1,mm); 0361 mfrob=[mfrob(floor(mm/2)+1:end) max(frob(end-ceil(mm/2):end))*ones(1,floor(mm/2))]; 0362 rfr=frob./mfrob; 0363 Cfn=0.5-rfr(round(gcic)); 0364 0365 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0366 0367 function gci=dpgci(gcic, s, Ch, fnwav, fs) 0368 0369 %DPGCI Choose the best Glottal Closure Instances with Dynamic Programming 0370 % gci=dpgci(gcic, s(:), Ch, fnwav, fs) returns vectors of sample indices corresponding 0371 % to the instants of glottal closure in the speech signal s at sampling frequency fs Hz. 0372 % 0373 % Inputs: 0374 % gcic is a matrix whos first column are the glottal closure instance candidates and 0375 % the second column is 1 if the corresponding gci is derived from a zero crossing 0376 % but zero if the gci is from a a projected zero crossing 0377 % s is the speech signal - MUST be a column vector 0378 % Ch the phase slope cost of every candidate 0379 % fnwav is the frobenious norm function of s 0380 % fs is the sampling frequncy 0381 % 0382 % Outputs: 0383 % gci is a vector of glottal closure instances chosen by the DP 0384 0385 0386 0387 % Revision History: 0388 % Bugs: Constants are hardwired but defined in a structure like pv (defined in grpdelpv) 0389 % 0390 0391 % get algorithm parameters from voicebox() 0392 0393 dy_fxmin=voicebox('dy_fxmin'); % min larynx frequency (Hz) 0394 dy_fxmax=voicebox('dy_fxmax'); % min larynx frequency (Hz) 0395 dy_xwlen=voicebox('dy_xwlen'); % cross-correlation length for waveform similarity (sec) 0396 dy_nbest=voicebox('dy_nbest'); % Number of NBest paths to keep 0397 dy_spitch=voicebox('dy_spitch'); % scale factor for pitch deviation cost 0398 wproj=voicebox('dy_cproj'); % cost of projected candidate 0399 dy_cspurt=voicebox('dy_cspurt'); % cost of a talkspurt 0400 dy_wpitch=voicebox('dy_wpitch'); % DP pitch weighting 0401 dy_wener=voicebox('dy_wener'); % DP energy weighting 0402 dy_wslope=voicebox('dy_wslope'); % DP group delay slope weighting 0403 dy_wxcorr=voicebox('dy_wxcorr'); % DP cross correlation weighting 0404 0405 0406 0407 %Constants 0408 Ncand=length(gcic); 0409 sv2i=-(2*dy_spitch^2)^(-1); % scale factor for pitch deviation cost 0410 0411 %Limit the search: 0412 qrmin=ceil(fs/dy_fxmax); 0413 qrmax=floor(fs/dy_fxmin); 0414 0415 %Cost and tracking r = current, q = previous, p = preprevious 0416 cost=zeros(Ncand, dy_nbest); cost(:,:)=inf; %Cost matrix, one row for each candidate 0417 maxcost=zeros(Ncand,1); maxcost(:,:)=inf; %Maximum cost in each row 0418 imaxcost=ones(Ncand,1); %Index of maximum cost 0419 0420 prev = ones(Ncand, dy_nbest); %index of previous, q candidates 0421 ind = ones(Ncand, dy_nbest); %index of p in row q (from prev) 0422 qbest = [zeros(Ncand,1), ones(Ncand,2)]; % the minimum cost in any previous q [cost,q,i] 0423 0424 Cfn=fnrg(gcic(:,1),fnwav,fs); %Frob.Energy Cost 0425 0426 %Add start and end state 0427 % === should probably delete candidates that are too close to either end of the input 0428 % === why do we ever need the additional one at the tail end ? 0429 gcic=[[gcic(1,1)-qrmax-2 0];gcic;[gcic(end,1)+qrmax+2 0]]; 0430 Cfn=[0 Cfn 0]; 0431 Ch = [0 Ch 0]; 0432 0433 % first do parallelized version 0434 0435 nxc=ceil(dy_xwlen*fs); % cross correlation window length in samples 0436 % === should delete any gci's that are within this of the end. 0437 % === and for energy window 0438 % rather complicated window specification is for compatibility with DYPSA 2 0439 % === +1 below is for compatibility - probably a bug 0440 wavix=(-floor(nxc/2):floor(nxc/2)+1)'; % indexes for segments [nx2,1] 0441 nx2=length(wavix); 0442 sqnx2=sqrt(nx2); 0443 g_cr=dy_wener*Cfn+dy_wslope*Ch+wproj*(1-gcic(:,2))'; % fixed costs 0444 0445 g_n=gcic(:,1)'; % gci sample number [1,Ncand+2] 0446 g_pr=gcic(:,2)'; % unprojected flag [1,Ncand+2] 0447 g_sqm=zeros(1,Ncand+1); % stores: sqrt(nx2) * mean for speech similarity waveform 0448 g_sd=zeros(1,Ncand+1); % stores: 1/(Std deviation * sqrt(nx2)) for speech similarity waveform 0449 f_pq=zeros((Ncand+1)*dy_nbest,1); % (q-p) period for each node 0450 f_c=repmat(Inf,(Ncand+1)*dy_nbest,1); % cumulative cost for each node - initialise to inf 0451 f_c(1)=0; % initial cost of zero for starting node 0452 % f_costs=zeros(Ncand*dy_nbest,6); % === debugging only remember costs of candidate 0453 f_f=ones((Ncand+1)*dy_nbest,1); % previous node in path 0454 f_fb=ones((Ncand+1),1); % points back to best end-of-spurt node 0455 fbestc=0; % cost of best end-of-spurt node 0456 0457 qmin=2; 0458 for r=2:Ncand+1 0459 % if r==86 0460 % r; 0461 % end 0462 r_n=g_n(r); % sample number of r = current candidate 0463 rix=dy_nbest*(r-1)+(1:dy_nbest); % index range within node variables 0464 0465 % determine the range of feasible q candidates 0466 qmin0=qmin; 0467 qmin=find(g_n(qmin0-1:r-1)<r_n-qrmax); % qmin is the nearest candidate that is >qrmax away 0468 qmin=qmin(end)+qmin0-1; % convert to absolute index of first viable candidate 0469 qmax=find(g_n(qmin-1:r-1)<=r_n-qrmin); % qmax is the nearest candidate that is >=qrmin away 0470 qmax=qmax(end)+qmin-2; 0471 0472 0473 % calculate waveform similarity cost measure statistics 0474 0475 sr=s(r_n+wavix); % note s MUST be a column vector so sr is also 0476 wsum=sum(sr); 0477 g_sqm(r)=wsum/sqnx2; % mean * sqrt(nx2) 0478 g_sd(r)=1/sqrt(sr.'*sr-wsum^2/nx2); % 1/(Std deviation * sqrt(nx2)) 0479 0480 % now process the candidates 0481 0482 if qmin<=qmax 0483 qix=qmin:qmax; % q index 0484 nq=length(qix); 0485 % === should integrate the -0.5 into dy_wxcorr 0486 % === the factor (nx2-1)/(nx2-2) is to compensate for a bug in swsc() 0487 q_cas=-0.5*(nx2-1)/(nx2-2)*dy_wxcorr*(sum(s(repmat(g_n(qix),nx2,1)+repmat(wavix,1,nq)).*repmat(sr,1,nq),1)-g_sqm(qix)*g_sqm(r)).*g_sd(qix)*g_sd(r); 0488 % compare: i=35; Ca=swsc(g_n(qix(i)),g_n(r),s,fs); [i qix(i) r g_n(qix(i)) g_n(r) dy_wxcorr*Ca q_cas(i)] 0489 0490 % now calculate pitch deviation cost 0491 0492 fix = 1+(qmin-1)*dy_nbest:qmax*dy_nbest; % node index range 0493 f_qr=repmat(r_n-g_n(qix),dy_nbest,1); % (r-p) period for each node 0494 f_pr=f_qr(:)+f_pq(fix); 0495 % === could absorb the 2 into sv2i 0496 f_nx=2-2*f_pr./(f_pr+abs(f_qr(:)-f_pq(fix))); 0497 f_cp=dy_wpitch*(0.5-exp(sv2i*f_nx.^2)); 0498 % === fudge to match dypsa2.4 - could more efficiently be added 0499 % === onto the cost of a talkspurt end 0500 % === should be a voicebox parameter anyway 0501 f_cp(f_pq(fix)==0)=dy_cspurt*dy_wpitch; 0502 0503 % now find the N-best paths 0504 0505 [r_cnb,nbix]=sort(f_c(fix)+f_cp+reshape(repmat(q_cas,dy_nbest,1),nq*dy_nbest,1)); 0506 f_c(rix)=r_cnb(1:dy_nbest)+g_cr(r); % costs 0507 f_f(rix)=nbix(1:dy_nbest)+(qmin-1)*dy_nbest; % traceback nodes 0508 f_pq(rix)=f_qr(nbix(1:dy_nbest)); % previous period 0509 % === f_costs is only for debugging 0510 % r; 0511 % f_costs(rix,1)=f_c(fix(nbix(1:dy_nbest))); 0512 % f_costs(rix,2)=wproj*(1-gcic(r,2)); 0513 % f_costs(rix,3)=f_cp(nbix(1:dy_nbest)); 0514 % f_costs(rix,4)=dy_wener*Cfn(r); 0515 % f_costs(rix,5)=dy_wslope*Ch(r); 0516 % f_costs(rix,6)=reshape(q_cas(1+floor((nbix(1:dy_nbest)-1)/dy_nbest)),dy_nbest,1); 0517 0518 % check cost of using this candidate as the start of a new spurt 0519 % ==== the qmin>2 condition is for compatibility with dypsa 2 and 0520 % prevents any spurts starting until at least qrmax past the first 0521 % gci. This is probably a bug (see again below) 0522 iNb=rix(end); 0523 if (qmin>2) && (f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2))<f_c(iNb)) % compare with worst of Nbest paths 0524 f_f(iNb)=f_fb(qmin-1); 0525 % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2 0526 % === this is probably a bug 0527 f_c(iNb)=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2)); % replace worst of the costs with start voicespurt cost 0528 f_pq(iNb)=0; % false pq period 0529 end 0530 if f_c(rix(1))<fbestc 0531 f_fb(r)=rix(1); % points to the node with lowest end-of-spurt cost 0532 % === should compensate for the pitch period cost incurred at the start of the next spurt 0533 % === note that a node can never be a one-node voicespurt on its own unless dy_nbest=1 0534 % since the start voices[purt option replaced the worst Nbest cost. This is probably good but 0535 % is a bit inconsistent. 0536 fbestc=f_c(rix(1)); 0537 else 0538 f_fb(r)=f_fb(r-1); 0539 end 0540 else % no viable candidates - must be the start of a voicespurt if anything 0541 % === for now we exclude the energy and phase-slope costs for compatibility with dypsa2 0542 % === this is probably a bug 0543 % ==== the qmin>2 condition is for compatibility with dypsa 2 and 0544 % prevents any spurts starting until at least qrmax past the first 0545 % gci. This is probably a bug (see again above) 0546 if (qmin>2) 0547 f_c(rix(1))=f_c(f_fb(qmin-1))+wproj*(1-gcic(r,2)); % cost of new voicespurt 0548 f_f(rix)=f_fb(qmin-1); % traceback to previous talkspurt end 0549 f_pq(rix)=0; % previous period 0550 end 0551 f_fb(r)=f_fb(r-1); % cannot be the end of a voicespurt 0552 end 0553 end 0554 0555 % now do the traceback 0556 0557 gci = zeros(1,Ncand+1); 0558 0559 % === for compatibility with dypsa2, we force the penultimate candidate to be accepted 0560 % === should be: i=f_fb(Ncand+1) but instead we pick the best of the penultimate candidate 0561 i=rix(1)-dy_nbest; 0562 if f_c(i-dy_nbest+1)<f_c(i) % check if start of a talkspurt 0563 i=i-dy_nbest+1; 0564 end 0565 k=1; 0566 while i>1 0567 j=1+floor((i-1)/dy_nbest); % convert node number to candidate number 0568 gci(k)=g_n(j); 0569 i=f_f(i); 0570 k=k+1; 0571 end 0572 gci=gci(k-1:-1:1); % put into ascending order 0573 0574 0575 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0576 0577