v_modspect

PURPOSE ^

V_MODSPECT Calculate the modulation spectrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)

SYNOPSIS ^

function [c,qq,ff,tt,po]=v_modspect(s,fs,m,nf,nq,p)

DESCRIPTION ^

V_MODSPECT Calculate the modulation spectrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)


 Simple use:

 Inputs:
     s      speech signal
     fs  sample rate in Hz (default 11025)
     m   mode string containing any sensible combination of the following letters; these
         options override any specified in argument p.

             g     apply AGC to the input speech

           r/n/m   time domain windows: rectangular/hanning/hamming [def: hamming]
           T/N/M   mel/mod-mel domain windows: triangular/hanning/hamming [def: triangular]

            a/c    mel filters act in the amplitude or complex domain [def: power]
            p/l    mel outputs are in power or log power [def: amplitude]
             A     mod-mel filters act in the power domain [def: power]
            P/L    mod-mel outputs are in amplitude or log power [def: amplitude]
             k     include DC component of modulation spectrum [not implemented]

             F     Take a DCT in the mel direction
             Z     include 0'th order mel-cepstral coef

             f     Take a DCT in the mod-mel direction
             z     include 0'th order mod-mel cepstral coef

             D     include d/df coefficients in * per mel (not valid with 'F')
             d     include d/dt coefficients in * per second

             u     give frequency axes in mels and mod-mels rather than in Hz

             s     plot the result
             S     plot intermediate results also

     nf  four-element vector giving:
           nf(1) = number of mel bins (or increment in k-mel) [0.1]
           nf(2) = min v_filterbank frequency [40 Hz]
           nf(3) = max v_filterbank frequency [10kHz]
           nf(4) = number of DCT coefficientss excl 0'th [25].
         Omitted values use defaults.
     nq  four-element vector giving
           nq(1) = number of mel-mod bins (or increment in 10 mod-mel units)
           nq(2) = min v_filterbank frequency [50 Hz]
           nq(3) = max v_filterbank frequency [400 Hz]
           nq(4) = number of DCT coefficients excl 0'th [15]
         Omitted values use defaults. Note that the mel-mod frequency scale
         is like the mel scale  but reduced by a factor of 100.
         I.e. mod-mel(f)=mel(100 f)/100; thus 10Hz corresponds to 10 mod-mels.
     p   parameter structure containing some or all of the parameters listed
         at the start of the program code below. p may be given as the
         last input parameter even if some or all of the inputs m, nf and nq
         have been omitted.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [c,qq,ff,tt,po]=v_modspect(s,fs,m,nf,nq,p)
0002 %V_MODSPECT Calculate the modulation spectrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
0003 %
0004 %
0005 % Simple use:
0006 %
0007 % Inputs:
0008 %     s      speech signal
0009 %     fs  sample rate in Hz (default 11025)
0010 %     m   mode string containing any sensible combination of the following letters; these
0011 %         options override any specified in argument p.
0012 %
0013 %             g     apply AGC to the input speech
0014 %
0015 %           r/n/m   time domain windows: rectangular/hanning/hamming [def: hamming]
0016 %           T/N/M   mel/mod-mel domain windows: triangular/hanning/hamming [def: triangular]
0017 %
0018 %            a/c    mel filters act in the amplitude or complex domain [def: power]
0019 %            p/l    mel outputs are in power or log power [def: amplitude]
0020 %             A     mod-mel filters act in the power domain [def: power]
0021 %            P/L    mod-mel outputs are in amplitude or log power [def: amplitude]
0022 %             k     include DC component of modulation spectrum [not implemented]
0023 %
0024 %             F     Take a DCT in the mel direction
0025 %             Z     include 0'th order mel-cepstral coef
0026 %
0027 %             f     Take a DCT in the mod-mel direction
0028 %             z     include 0'th order mod-mel cepstral coef
0029 %
0030 %             D     include d/df coefficients in * per mel (not valid with 'F')
0031 %             d     include d/dt coefficients in * per second
0032 %
0033 %             u     give frequency axes in mels and mod-mels rather than in Hz
0034 %
0035 %             s     plot the result
0036 %             S     plot intermediate results also
0037 %
0038 %     nf  four-element vector giving:
0039 %           nf(1) = number of mel bins (or increment in k-mel) [0.1]
0040 %           nf(2) = min v_filterbank frequency [40 Hz]
0041 %           nf(3) = max v_filterbank frequency [10kHz]
0042 %           nf(4) = number of DCT coefficientss excl 0'th [25].
0043 %         Omitted values use defaults.
0044 %     nq  four-element vector giving
0045 %           nq(1) = number of mel-mod bins (or increment in 10 mod-mel units)
0046 %           nq(2) = min v_filterbank frequency [50 Hz]
0047 %           nq(3) = max v_filterbank frequency [400 Hz]
0048 %           nq(4) = number of DCT coefficients excl 0'th [15]
0049 %         Omitted values use defaults. Note that the mel-mod frequency scale
0050 %         is like the mel scale  but reduced by a factor of 100.
0051 %         I.e. mod-mel(f)=mel(100 f)/100; thus 10Hz corresponds to 10 mod-mels.
0052 %     p   parameter structure containing some or all of the parameters listed
0053 %         at the start of the program code below. p may be given as the
0054 %         last input parameter even if some or all of the inputs m, nf and nq
0055 %         have been omitted.
0056 %
0057 
0058 %
0059 %
0060 % Outputs:
0061 %           c[*,nf,nt]  modulation spectrum where nt the number of time frames and
0062 %                       nf is the normally number of mel frequency bins (but reduced by 1 if
0063 %                       'F' is specified without 'Z').
0064 %                       The middle index is for the output coefficients
0065 %                       which are concatenated in the order [base, delta-p, delta-t].
0066 %                       The number of base coefficients is normally the
0067 %                       number of mon-mel filters but will be reduced by 1
0068 %                       if 'f' is specified but 'z' is not.
0069 %           qq          centre frequencies of modulation bins before any DCT (Hz or mel if 'u' set)
0070 %           ff          centre frequencies of freqiency bins before any DCT (Hz or mel if 'u' set)
0071 %           tt          time axis (sample s(i) is at time i/fs)
0072 %           po          a copy of the parameter structure that was actually used.
0073 %
0074 % Algorithm Outline [parameters are in brackets]:
0075 %
0076 %   (1) Apply AGC to the speech to preserve a constant RMS level [tagc,tagt;'g']
0077 %   (2) Divide into overlapping frames [tinc,tovl], window [twin;'rnm'], zero-pad [tpad,trnd] and FFT
0078 %   (3) Apply mel filters in power or amplitude domain [fmin,fmax,fbin,fwin,fpow;'TNMac'=nf]
0079 %   (4) Regarding each v_filterbank output as a time-varying signal power or amplitude [ppow;'pl'],
0080 %       divide into overlapping frames [pinc,povl], window [pwin;'rnm'], zero-pad [ppad,prnd] and FFT
0081 %   (5) Apply mod-mel filters in power or amplitude domain [mmin,mmax,mbin,mwin,mpow;'TNMA'=nq]
0082 %   (6) Take optional log [qpow;'PL']
0083 %   (7) Optionally apply a DCT in the mel [qdcp;'F'] and/or mod-mel [qdcq;'f'] directions preserving
0084 %       the only some low quefrency coefficients [ dncp, dzcp;'Z'=nf, dncq, dzcq;'z'=nq]
0085 %   (8) Calculate derivatives over mel-frequency [ddep,ddnp;'D'] and/or over time [ddet,ddnt;'d']
0086 %
0087 % Notes: All logs are limitied in dynamic range [logr], output units can be Hz or mel [unit;'u']
0088 %        and optional plots can be generated [dplt;'pP'].
0089 %
0090 
0091 %
0092 % [1] S. D. Ewert and T. Dau. Characterizing frequency slectivity for envelope fluctuations.
0093 %     J. Acoust Soc Amer, 108 (3): 1181-1196, Sept. 2000.
0094 % [2] M. L. Jepsen, S. D. Ewert, and T. Dau. A computational model of human auditory signal processing and perception.
0095 %     J. Acoust Soc Amer, 124 (1): 422-438, July 2008.
0096 % [3] G. Kim, Y. Lu, Y. Hu, and P. C. Loizou. An algorithm that improves speech intelligibility
0097 %     in noise for normal-hearing listeners.
0098 %     J. Acoust Soc Amer, 126 (3): 1486-1494, Sept. 2009. doi: 10.1121/1.3184603.
0099 % [4] J. Tchorz and B. Kollmeier. Estimation of the signal-to-noise ratio with amplitude
0100 %     modulation spectrograms. Speech Commun., 38: 1-17, 2002.
0101 % [5] J. Tchorz and B. Kollmeier. SNR estimation based on amplitude modulation analysis with
0102 %     applications to noise suppression.
0103 %     IEEE Trans Speech Audio Processing, 11 (3): 184-192, May 2003. doi: 10.1109/TSA.2003.811542.
0104 
0105 % bugs:
0106 %  (3) should scale the derivatives so they get the correct units (e.g. power per second or per mel)
0107 %  (5) should take care of the scaling so that FFT size does not affect
0108 %  results
0109 %  (4) sort out quefrency scale for graphs
0110 %  (5) could add an option to draw an algorithm flow diagram
0111 
0112 %      Copyright (C) Mike Brookes 1997
0113 %      Version: $Id: v_modspect.m 10865 2018-09-21 17:22:45Z dmb $
0114 %
0115 %   VOICEBOX is a MATLAB toolbox for speech processing.
0116 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0117 %
0118 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0119 %   This program is free software; you can redistribute it and/or modify
0120 %   it under the terms of the GNU General Public License as published by
0121 %   the Free Software Foundation; either version 2 of the License, or
0122 %   (at your option) any later version.
0123 %
0124 %   This program is distributed in the hope that it will be useful,
0125 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0126 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0127 %   GNU General Public License for more details.
0128 %
0129 %   You can obtain a copy of the GNU General Public License from
0130 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0131 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0132 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0133 persistent PP
0134 if isempty(PP)
0135 
0136     % Algorithm constants: the first letter indicates the signal domain as follows:
0137     %     t=time, f=frequency, p=mel freq, m=modulation, q=mel-modulation, d=mod-cepstral
0138 
0139     PP.logr=120;                                % maximum dynamic range before taking log  (dB)
0140     PP.tagc=0;                                  % 1 = do agc
0141     PP.tagt=2;                                  % agc time constant
0142     PP.tinc=0.25e-3;                            % time domain frame increment
0143     PP.tovl=16;                                 % time domain overlap factor
0144     PP.tpad=30e-3;                              % time domain minimum FFT length (s). must be > 1/min spacing
0145     % of mel filters in Hz
0146     PP.trnd=1;                                  % 1 to round up time domain FFT to the next power of 2
0147     PP.twin='m';                                % time domain window shape: m=hamming, n=hann
0148     PP.fpow='p';                                % mel filters act on a=amplitude or p=power or c=complex FFT coefs
0149     PP.fmin=40;                                 % minimum mel v_filterbank frequency
0150     PP.fmax=10000;                              % maximum mel v_filterbank frequency (0 for Nyquist)
0151     PP.fwin='t';                                % mel v_filterbank filter shape: t=triangle, m=hamming, n=hann
0152     PP.fbin=0.1;                                 % number of mel filters or target spacing in k-mel
0153     PP.ppow='a';                                % modulation signal is a=amplitude or p=power or l=log-power
0154     PP.pinc=16e-3;                              % modulation frame increment
0155     PP.povl=2;                                  % modulation frame overlap factor
0156     PP.pwin='m';                                % mel domain time-window shape
0157     PP.ppad=200e-3;                              % mel domain minimum FFT length (s). must be > 1/min spacing
0158     % of mod-mel filters in Hz
0159     PP.prnd=1;                                  % 1 to round up mel domain FFT to the next power of 2
0160     PP.mpow='p';                                % transform mel domain a=amplitude or p=power
0161     PP.mwin='t';                                % mel-mod v_filterbank filter shape: t=triangle, m=hamming, n=hann
0162     PP.mbin=15;                                 % Number of mel-mod filters
0163     PP.mmin=50;                                 % minimum modulation frequency
0164     PP.mmax=400;                                % maximum modulation frequency (0 for Nyquist)
0165     PP.qpow='a';                                % Use mel-modulation domain a=amplitude or p=power or l=log-power
0166     PP.qdcq=0;                                  % 1 = Take DCT in modulation direction
0167     PP.qdcp=0;                                  % 1 = Take DCT in frequency direction
0168     PP.dzcq=0;                                  % 1 = include 0'th coefficient in modulation direction (always included in derivatives)
0169     PP.dzcp=0;                                  % 1 = include 0'th coefficient in frequency direction
0170     PP.dncp=30;                                 % number of frequency coefficient (excl 0'th)
0171     PP.dncq=15;                                 % number of mel-mod coefficient (excl 0'th)
0172     PP.ddet=0;                                  % 1 = include delta-time coefficients
0173     PP.ddnt=2;                                  % (length of time filter - 1)/2
0174     PP.ddep=0;                                  % 1 = include delta-freq coefficient
0175     PP.ddnp=1;                                  % (length of freq filter - 1)/2
0176     PP.unit=0;                                  % 1 = give frequency axes in mels and mod-mels
0177     PP.dplt=0;                                  % plotting bitfield: 1=new fig,2=link axes, 4=base coefs, 8=d/dp, 16=d/dt,
0178     % 32=waveform, 64=spectrogram, 128=mel-spectrogram, 256=mod-spectrogram, 512 frequency chart
0179     PP.cent=0.2;                                % centile to check for log plotting
0180     PP.cchk=0.2;                                % fraction to require above the centile to allow linear plot
0181 end
0182 po=PP;              % initialize the parameter structure to the default values
0183 switch nargin
0184     case 0
0185         % list all fields
0186         nn=sort(fieldnames(PP));
0187         cnn=char(nn);
0188         fprintf('%d Voicebox parameters:\n',length(nn));
0189 
0190         for i=1:length(nn);
0191             if ischar(PP.(nn{i}))
0192                 fmt='  %s = %s\n';
0193             else
0194                 fmt='  %s = %g\n';
0195             end
0196             fprintf(fmt,cnn(i,:),PP.(nn{i}));
0197         end
0198         return
0199     case 1
0200         error('no sample frequency specified');
0201     case 2
0202         p=[];
0203     case 3
0204         if isstruct(m)
0205             p=m;
0206             m='';
0207         else
0208             p=[];
0209         end
0210         nf=[];
0211         nq=[];
0212     case 4
0213         if isstruct(nf)
0214             p=nf;
0215             nf=[];
0216         else
0217             p=[];
0218         end
0219         nq=[];
0220     case 5
0221         if isstruct(nq)
0222             p=nq;
0223             nq=[];
0224         else
0225             p=[];
0226         end
0227 end
0228 if isstruct(p)      % copy specified fields into po structure
0229     nn=fieldnames(p);
0230     for i=1:length(nn)
0231         if isfield(po,nn{i})
0232             po.(nn{i})=p.(nn{i});
0233         end
0234     end
0235 end
0236 
0237 % now sort out the mode flags
0238 
0239 for i=1:length(m)
0240     switch m(i)
0241         case 'g'
0242             po.tagc=1;
0243         case {'r','n','m'}
0244             po.twin=m(i);
0245             po.pwin=m(i);
0246         case {'T','N','M'}
0247             po.fwin=lower(m(i));
0248             po.mwin=po.fwin;
0249         case {'a','c'}
0250             po.fpow=m(i);
0251         case {'p','l'}
0252             po.ppow=m(i);
0253         case 'A'
0254             po.mpow=lower(m(i));
0255         case {'P','L'}
0256             po.qpow=lower(m(i));
0257         case 'f'
0258             po.qdcq=1;
0259         case 'F'
0260             po.qdcp=1;
0261         case 'z'
0262             po.dzcq=1;
0263         case 'Z'
0264             po.dzcp=1;
0265         case 'd'
0266             po.ddet=1;
0267         case 'D'
0268             po.ddep=1;
0269         case 'u'
0270             po.unit=1;
0271         case 's'
0272             po.dplt=bitor(po.dplt,31+512);
0273         case 'S'
0274             po.dplt=bitor(po.dplt,1023);
0275     end
0276 end
0277 
0278 if ~nargout
0279     po.dplt=bitor(po.dplt,4);   % just plot base coefficients
0280 end
0281 if ~isempty(nf)
0282     po.dncp=nf;
0283 end
0284 if ~isempty(nq)
0285     po.dncq=nq;
0286 end
0287 
0288 lnf=length(nf);
0289 if lnf>=1
0290     po.fbin=nf(1);
0291     if lnf>=2
0292         po.fmin=nf(2);
0293         if lnf>=3
0294             po.fmax=nf(3);
0295             if lnf>=4
0296                 po.dncp=nf(4);
0297             end
0298         end
0299     end
0300 end
0301 lnq=length(nq);
0302 if lnq>=1
0303     po.mbin=nq(1);
0304     if lnq>=2
0305         po.mmin=nq(2);
0306         if lnq>=3
0307             po.mmax=nq(3);
0308             if lnq>=4
0309                 po.dncq=nq(4);
0310             end
0311         end
0312     end
0313 end
0314 
0315 % finally eliminate potential incomaptibilities
0316 
0317 po.ddep=po.ddep & (po.qdcp==0);         % never differentiate along DCT axis
0318 po.dzcq=po.dzcq | (po.qdcq==0);         % include all coefficients if no DCT
0319 po.dzcp=po.dzcp | (po.qdcp==0);         % include all coefficients if no DCT
0320 
0321 logth=10^(-po.logr/10);
0322 ns=length(s);
0323 axhandle=[];
0324 %
0325 % first apply AGC
0326 %
0327 if po.tagc
0328     tau=po.tagt*fs;      % time constant for power filtering
0329     ax2=[1 -exp(-1/tau)];
0330     bx2=sum(ax2);
0331     sm=mean(s(1:min(ns,round(tau))).^2);            % initialize filter using average of 1 tau
0332     if sm>0
0333         s=s./sqrt(filter(bx2,ax2,s.^2,-ax2(2)*sm));    % normalize to RMS=1
0334     end
0335 end
0336 if bitand(po.dplt,32)
0337     if bitand(po.dplt,1)
0338         figure;
0339     end
0340     plot((1:ns)/fs,s);
0341     axhandle(end+1)=gca;
0342     title('Input after AGC');
0343     xlabel('Time (s)');
0344 end
0345 %
0346 % % calculate power spectra
0347 %
0348 ni=ceil(po.tinc*fs);    % frame increment in samples
0349 nw=ni*po.tovl;           % window length
0350 nf=ceil(max(nw,po.tpad*fs));          % FFT length: pad with zeros
0351 if po.trnd
0352     nf=2^nextpow2(nf);  % round up FFT length to next power of 2
0353 else
0354     nf=round(nf);
0355 end
0356 switch po.twin
0357     case 'm'
0358 %         w=hamming(nw+1)'; w(end)=[];        % Hamming window ([2] uses Hanning)
0359         w=0.54-0.46*cos(2*pi*(0:nw-1)/nw); % Hamming window ([2] uses Hanning)
0360     case 'n'
0361 %         w=hanning(nw-1)';           % Hanning window: underlying period is nw
0362         w=0.5-0.5*cos(2*pi*(1:nw-1)/nw);  % Hanning window: underlying period is nw
0363     case 'r'
0364         w=ones(nw,1);                 % rctangular window
0365 end
0366 fx=v_rfft(v_enframe(s,w,ni),nf,2);
0367 [nft,nff]=size(fx);                 % number of frames and frequency bins
0368 if bitand(po.dplt,64)
0369     if bitand(po.dplt,1)
0370         figure;
0371     end
0372     fp=fx.*conj(fx);
0373     imagesc(((0:nft-1)*ni+(nw+1)/2)/fs,(0:nff-1)*fs*0.001/nf,10*log10(max(fp,max(fp(:))*1e-6))');
0374     axhandle(end+1)=gca;
0375     hanc= colorbar;
0376     set(get(hanc,'ylabel'),'String','dB');
0377     axis('xy');
0378     title('Input Spectrogram');
0379     xlabel('Time (s)');
0380     ylabel('Frequency (kHz)');
0381 end
0382 [mbk,ffm]=v_melbankm(po.fbin,nf,fs,po.fmin/fs,min(po.fmax/fs+(po.fmax==0),0.5),po.fwin);
0383 switch [po.fpow po.ppow]
0384     case 'aa'
0385         px=abs(fx)*mbk';         % amplitude domain filtering
0386     case {'ap','al'}
0387         px=(abs(fx)*mbk').^2;         % amplitude domain filtering
0388     case 'pa'
0389         px=sqrt((fx.*conj(fx))*mbk');         % power domain filtering
0390     case {'pp','pl'}
0391         px=(fx.*conj(fx))*mbk';         % power domain filtering
0392     case 'ca'
0393         px=abs(fx*mbk');         % complex domain filtering
0394     case {'cp','cl'}
0395         px=fx*mbk';         % complex domain filtering
0396         px=px.*conj(px);   % convert to power
0397 end
0398 if po.ppow=='l'
0399     px=log(max((px),max(px(:))*logth)); % clip before log
0400 end
0401 if bitand(po.dplt,128)
0402     if bitand(po.dplt,1)
0403         figure;
0404     end
0405     switch po.ppow
0406         case 'a'
0407             imagesc(((0:nft-1)*ni+(nw+1)/2)/fs,ffm/1000,20*log10(max(px,max(px(:))*1e-3))');
0408         case 'p'
0409             imagesc(((0:nft-1)*ni+(nw+1)/2)/fs,ffm/1000,10*log10(max(px,max(px(:))*1e-6))');
0410         case 'l'
0411             imagesc(((0:nft-1)*ni+(nw+1)/2)/fs,ffm/1000,max(px*10/log(10),max(px(:))*10/log(10)-60)');
0412     end
0413     axhandle(end+1)=gca;
0414     hanc= colorbar;
0415     set(get(hanc,'ylabel'),'String','dB');
0416     axis('xy');
0417     title('Mel Spectrogram');
0418     xlabel('Time (s)');
0419     ylabel('Frequency (k-mel)');
0420 end
0421 npf=length(ffm);     % number of mel filters
0422 mni=ceil(po.pinc*fs/ni);    % frame increment in spectral frames
0423 mnw=mni*po.povl;        % window length
0424 mnf=ceil(max(mnw,po.ppad*fs/ni));          % FFT length: pad with zeros
0425 if po.prnd
0426     mnf=2^nextpow2(mnf);  % round up FFT length to next power of 2
0427 else
0428     mnf=round(mnf);
0429 end
0430 nmt=fix((nft-mnw+mni)/mni); % number of modulation spectrum frames
0431 ix=repmat((1:mnw)',1,nmt)+repmat((0:nmt-1)*mni,mnw,1);  % time index for all modumation spectrum frames
0432 mx=v_rfft(reshape(px(ix(:),:),mnw,nmt*npf),mnf,1);          % find modulation spectrum
0433 % nmm=size(mx,1);             % number of modulation spectrum bins [not needed]
0434 [qbk,qqm]=v_melbankm(po.mbin,mnf,100*fs/ni,po.mmin*ni/fs,min(po.mmax*ni/fs+(po.mmax==0),0.5),po.mwin);    % multiply frq by 100 to make it alsmost log
0435 nqq=length(qqm);
0436 switch po.mpow
0437     case 'a'
0438         qx=reshape(qbk*abs(mx),nqq,nmt,npf);
0439     case 'p'
0440         qx=reshape(qbk*(mx.*conj(mx)),nqq,nmt,npf);
0441 end
0442 switch [po.mpow po.qpow]
0443     case 'aa'
0444         qx=reshape(qbk*abs(mx),nqq,nmt,npf);         % amplitude domain filtering
0445     case {'ap','al'}
0446         qx=reshape(qbk*abs(mx),nqq,nmt,npf).^2;         % amplitude domain filtering + power out
0447     case 'pa'
0448         qx=sqrt(reshape(qbk*(mx.*conj(mx)),nqq,nmt,npf));         % power domain filtering + amp out
0449     case {'pp','pl'}
0450         qx=reshape(qbk*(mx.*conj(mx)),nqq,nmt,npf);         % power domain filtering
0451 end
0452 if po.qpow=='l'
0453     qx=log(max((qx),max(qx(:))*logth)); % clip before log
0454 end
0455 tt=((1+mnw*ni+nw-ni)/2+(0:nmt-1)*mni*ni)/fs; % time axis of output frames
0456 
0457 if bitand(po.dplt,256) && (~bitand(po.dplt,4) || po.qdcq>0 || po.qdcp>0)
0458     if bitand(po.dplt,1)
0459         figure;
0460     end
0461     switch po.qpow
0462         case 'a'
0463             dqx=20*log10(max(qx,max(qx(:))*1e-3));
0464         case 'p'
0465             dqx=10*log10(max(qx,max(qx(:))*1e-6));
0466         case 'l'
0467             dqx=max(qx*10/log(10),max(qx(:))*10/log(10)-60);
0468     end
0469     dqx(end+1,:,:)=max(dqx(:));  % insert a border
0470     dqx=reshape(permute(dqx,[2,1,3]),nmt,(nqq+1)*npf);
0471     ffq=ffm(1)+((0:(nqq+1)*npf)-(nqq-1)/2)*(ffm(2)-ffm(1))/(nqq+1); % mel frequencies
0472     imagesc(tt,ffq/1000,dqx');
0473     axhandle(end+1)=gca;
0474     hanc= colorbar;
0475     set(get(hanc,'ylabel'),'String','dB');
0476     axis('xy');
0477     title('Modulation Spectrogram');
0478     xlabel('Time (s)');
0479     ylabel('Frequency (k-mel)');
0480 end
0481 ndq=nqq;    % number of coefficients to use in the q direction
0482 if po.qdcq
0483     dx=reshape(v_rdct(reshape(qx,nqq,nmt*npf)),nqq,nmt,npf);    % take dct in q direction
0484     ndq=min(ndq,po.dncq+1);     % "+1" to include the 0'th coefficient
0485 else
0486     dx=qx;
0487 end
0488 ndf=npf;    % number of coefficients to use in the p direction
0489 if po.qdcp
0490     dx=permute(reshape(v_rdct(reshape(permute(qx,[3,1,2]),npf,nqq*nmt)),npf,nqq,nmt),[2 3 1]);    % take dct in p direction
0491     ndf=min(ndf,po.dncp+1);   % "+1" to include the 0'th coefficient
0492 elseif po.ddep                  % calculate the frequency derivative
0493     nv=po.ddnp;
0494     vv=(-nv:nv)*-3/(nv*(nv+1)*(2*nv+1)*(ffm(2)-ffm(1)));
0495     dxdp=filter(vv,1,dx,[],3);
0496     dxdp=dxdp(:,:,[repmat(2*nv+1,1,nv) 2*nv+1:npf repmat(npf,1,nv)]);   % replicate filter outputs at ends
0497 end
0498 if po.ddet                      % calculate the time derivative
0499     nv=po.ddnt;
0500     vv=(-nv:nv)*-3/(nv*(nv+1)*(2*nv+1)*(tt(2)-tt(1)));
0501     dxdt=filter(vv,1,dx,[],2);
0502     dxdt=dxdt(:,[repmat(2*nv+1,1,nv) 2*nv+1:nmt repmat(nmt,1,nv)],:);   % replicate filter outputs at ends
0503 end
0504 nqj=ndq-(po.dzcq==0);
0505 nqk=nqj+ndq*(po.ddep+po.ddet);
0506 npk=ndf-(po.dzcp==0);
0507 c=zeros(nqk,npk,nmt);
0508 c(1:nqj,:,:)=permute(dx(1+ndq-nqj:ndq,:,1+ndf-npk:ndf),[1,3,2]);        % base coefficients
0509 if bitand(po.dplt,512)
0510     if bitand(po.dplt,1)
0511         figure;
0512     end
0513     ffx=repmat(v_mel2frq(ffm(:)),1,nqq)/1000;
0514     qqx=repmat(mel2frq(qqm(:)')/100,npf,1);
0515     plot(qqx,ffx,'xb');
0516     axis([[qqx(1) qqx(end)]*[1.1 -0.1; -0.1 1.1] [ffx(1) ffx(end)]*[1.1 -0.1; -0.1 1.1]]);
0517     title('Frequency bin centres');
0518     xlabel(sprintf('%.1f : %.1f : %.1f mod-mel = Modulation Frequency (Hz)',qqm(1)/100,(qqm(2)-qqm(1))/100,qqm(end)/100));
0519     ylabel(sprintf('%.0f : %.0f : %.0f mel = Frequency (kHz)',ffm(1),ffm(2)-ffm(1),ffm(end)));
0520 end
0521 if bitand(po.dplt,4)
0522     if bitand(po.dplt,1)
0523         figure;
0524     end
0525     dqx=dx(1+ndq-nqj:ndq,:,1+ndf-npk:ndf);
0526     dqxmx=max(abs(dqx(:)));
0527     dqxge=dqx>=0;
0528     dbfact=2-(po.qpow=='p');        % 2=amplitude, 1=power
0529     dqxa=max(abs(dqx),dqxmx*10^(-6/dbfact));  % clip at -60 dB
0530     dqxmn=min(dqxa(:));
0531     dblab='';
0532     if(mean(dqxa(:)>dqxmn+po.cent*(dqxmx-dqxmn)))<po.cchk
0533         dboff=abs(round(dbfact*10*log10(dqxmn)));
0534         dblab='{\pm}dB';
0535         if ~all(dqxge(:))       % not all positive
0536             dqxa=dqxa/dqxmn;    % force log to be always positive
0537             if dqxmn>1 && dboff~=0
0538                 dblab=sprintf('{\\pm}dB - %d',dboff);
0539             else
0540                 dblab=sprintf('{\\pm}dB + %d',dboff);
0541             end
0542         else
0543             dblab='dB';
0544         end
0545         dqx=dbfact*10*log10(dqxa).*(2*dqxge-1);
0546     end
0547     dqx(end+1,:,:)=max(dqx(:));  % insert a border
0548     dqx=reshape(permute(dqx,[2,1,3]),nmt,(nqj+1)*npk);
0549     ffq=ffm(1)+((0:(nqj+1)*npf)-(nqj-1)/2)*(ffm(2)-ffm(1))/(nqj+1); % mel frequencies
0550     imagesc(tt,ffq/1000,dqx');
0551     axhandle(end+1)=gca;
0552     hanc= colorbar;
0553     set(get(hanc,'ylabel'),'String',dblab);
0554     axis('xy');
0555     if po.qdcq
0556         title('Modulation spectrum DCT');
0557     else
0558         title('Modulation spectrum');
0559     end
0560     xlabel('Time (s)');
0561     if po.qdcp
0562         ylabel('Quefrency (k-mel^{-1})');
0563     else
0564         ylabel('Frequency (k-mel)');
0565     end
0566 end
0567 
0568 if po.ddep
0569     c(nqj+1:nqj+ndq,:,:)=permute(dxdp(1:ndq,:,1+ndf-npk:ndf),[1,3,2]);  % add on p delta
0570     if bitand(po.dplt,8)
0571         if bitand(po.dplt,1)
0572             figure;
0573         end
0574         dqx=dxdp(1:ndq,:,1+ndf-npk:ndf);
0575         dqxmx=max(abs(dqx(:)));
0576         dqxge=dqx>=0;
0577         dbfact=2-(po.qpow=='p');        % 2=amplitude, 1=power
0578         dqxa=max(abs(dqx),dqxmx*10^(-6/dbfact));  % clip at -60 dB
0579         dqxmn=min(dqxa(:));
0580         dblab='';
0581         if(mean(dqxa(:)>dqxmn+po.cent*(dqxmx-dqxmn)))<po.cchk
0582             dboff=abs(round(dbfact*10*log10(dqxmn)));
0583             dblab='{\pm}dB';
0584             if ~all(dqxge(:))       % not all positive
0585                 dqxa=dqxa/dqxmn;    % force log to be always positive
0586                 if dqxmn>1 && dboff~=0
0587                     dblab=sprintf('{\\pm}dB - %d',dboff);
0588                 else
0589                     dblab=sprintf('{\\pm}dB + %d',dboff);
0590                 end
0591             else
0592                 dblab='dB';
0593             end
0594             dqx=dbfact*10*log10(dqxa).*(2*dqxge-1);
0595         end
0596         dqx(end+1,:,:)=max(dqx(:));  % insert a border
0597         dqx=reshape(permute(dqx,[2,1,3]),nmt,(ndq+1)*npk);
0598         ffq=ffm(1)+((0:(ndq+1)*npf)-(ndq-1)/2)*(ffm(2)-ffm(1))/(ndq+1); % mel frequencies
0599         imagesc(tt,ffq/1000,dqx');
0600         axhandle(end+1)=gca;
0601         hanc= colorbar;
0602         set(get(hanc,'ylabel'),'String',dblab);
0603         axis('xy');
0604         if po.qdcq
0605             title('Modulation spectrum DCT = freq derivative');
0606         else
0607             title('Modulation spectrum = freq derivative');
0608         end
0609         xlabel('Time (s)');
0610         if po.qdcp
0611             ylabel('Quefrency (k-mel^{-1})');
0612         else
0613             ylabel('Frequency (k-mel)');
0614         end
0615     end
0616 end
0617 if po.ddet
0618     c(nqk-ndq+1:nqk,:,:)=permute(dxdt(1:ndq,:,1+ndf-npk:ndf),[1,3,2]);  % add on t delta
0619     if bitand(po.dplt,16)
0620         if bitand(po.dplt,1)
0621             figure;
0622         end
0623         dqx=dxdt(1:ndq,:,1+ndf-npk:ndf);
0624         dqxmx=max(abs(dqx(:)));
0625         dqxge=dqx>=0;
0626         dbfact=2-(po.qpow=='p');        % 2=amplitude, 1=power
0627         dqxa=max(abs(dqx),dqxmx*10^(-6/dbfact));  % clip at -60 dB
0628         dqxmn=min(dqxa(:));
0629         dblab='';
0630         if(mean(dqxa(:)>dqxmn+po.cent*(dqxmx-dqxmn)))<po.cchk
0631             dboff=abs(round(dbfact*10*log10(dqxmn)));
0632             dblab='{\pm}dB';
0633             if ~all(dqxge(:))       % not all positive
0634                 dqxa=dqxa/dqxmn;    % force log to be always positive
0635                 if dqxmn>1 && dboff~=0
0636                     dblab=sprintf('{\\pm}dB - %d',dboff);
0637                 else
0638                     dblab=sprintf('{\\pm}dB + %d',dboff);
0639                 end
0640             else
0641                 dblab='dB';
0642             end
0643             dqx=dbfact*10*log10(dqxa).*(2*dqxge-1);
0644         end
0645         dqx(end+1,:,:)=max(dqx(:));  % insert a border
0646         dqx=reshape(permute(dqx,[2,1,3]),nmt,(ndq+1)*npk);
0647         ffq=ffm(1)+((0:(nqj+1)*npf)-(nqj-1)/2)*(ffm(2)-ffm(1))/(nqj+1); % mel frequencies
0648         imagesc(tt,ffq/1000,dqx');
0649         axhandle(end+1)=gca;
0650         hanc= colorbar;
0651         set(get(hanc,'ylabel'),'String',dblab);
0652         axis('xy');
0653         if po.qdcq
0654             title('Modulation spectrum DCT = time derivative');
0655         else
0656             title('Modulation spectrum = time derivative');
0657         end
0658         xlabel('Time (s)');
0659         if po.qdcp
0660             ylabel('Quefrency (k-mel^{-1})');
0661         else
0662             ylabel('Frequency (k-mel)');
0663         end
0664     end
0665 end
0666 if po.unit                      % frequency units are in mels
0667     ff=ffm;
0668     qq=qqm/100;
0669 else
0670     ff=v_mel2frq(ffm);
0671     qq=v_mel2frq(qqm)/100;
0672 end
0673 if length(axhandle)>1
0674     if ~bitand(po.dplt,32+64)
0675         linkaxes(axhandle)
0676     else
0677         linkaxes(axhandle,'x')
0678     end
0679 end
0680 
0681

Generated by m2html © 2003