Home > voicebox > filtbankm.m

filtbankm

PURPOSE ^

FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

SYNOPSIS ^

function [x,cf,il,ih]=filtbankm(p,n,fs,fl,fh,w)

DESCRIPTION ^

FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

 Inputs:
       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
        n   length of fft
           or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz)
        fs  sample rate in Hz
        fl  low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option]
        fh  high end of highest filter in Hz [default = fs/2]
        w   any sensible combination of the following:

             'b' = bark scale instead of mel
             'e' = erb-rate scale
             'l' = log10 Hz frequency scale
             'f' = linear frequency scale [default]
             'm' = mel frequency scale

             'c' = fl & fh specify centre of low and high filters instead of edges
             'h' = fl & fh are in mel/erb/bark/log10 instead of Hz
             'H' = cf outputs are in mel/erb/bark/log10 instead of Hz

              'y' = lowest filter remains at 1 down to 0 frequency and
                    highest filter remains at 1 up to nyquist freqency
                    The total power in the fft is preserved (unless 'u' is specified).
             'Y' = extend only at low frequency end (or high end if 'y' also specified)

             'p' = input P specifies the number of filters [default if P>=1]
             'P' = input P specifies the filter spacing [default if P<1]

             'u' = input and output are power per Hz instead of power.
             'U' = input is power but output is power per Hz.

             's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled)
             'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs)

             'g' = plot filter coefficients as graph
             'G' = plot filter coefficients as image [default if no output arguments present]


 Outputs:    x     a sparse matrix containing the filterbank amplitudes
                  If the il and ih outputs are given then size(x)=[p,mx-mn+1]
                 otherwise size(x)=[p,1+floor(n/2)]
                 Note that the peak filter values equal 2 to account for the power
                 in the negative FFT frequencies.
           cf    the filterbank centre frequencies in Hz (see 'H' option)
            il    the lowest fft bin with a non-zero coefficient
            ih    the highest fft bin with a non-zero coefficient

 The routine performs interpolation of the input spectrum by convolving the power spectrum
 with a triangular filter and then simulates a filterbank with asymetric triangular filters.

 Examples of use:

 (a) Calcuate the Mel-frequency Cepstral Coefficients

       f=rfft(s);                    % rfft() returns only 1+floor(n/2) coefficients
        x=filtbankm(p,n,fs,0,fs/2,'m');            % n is the fft length, p is the number of filters wanted
        z=log(x*abs(f).^2);         % multiply x by the power spectrum
        c=dct(z);                   % take the DCT

 (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently

       f=fft(s);                        % n is the fft length, p is the number of filters wanted
       [x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m');   % na:nb gives the fft bins that are needed
       z=log(x*(f(na:nb)).*conj(f(na:nb)));

 (c) Plot the calculated filterbanks

      plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')')   % fs=sample frequency

 (d) Plot the filterbanks

      filtbankm(p,n,fs,0,fs/2,'m');

 References:

 [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185–19, 1937.
 [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
     monosyllabic word recognition in continuously spoken sentences.
     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357–366, Aug. 1980.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [x,cf,il,ih]=filtbankm(p,n,fs,fl,fh,w)
0002 %FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
0003 %
0004 % Inputs:
0005 %       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
0006 %        n   length of fft
0007 %           or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz)
0008 %        fs  sample rate in Hz
0009 %        fl  low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option]
0010 %        fh  high end of highest filter in Hz [default = fs/2]
0011 %        w   any sensible combination of the following:
0012 %
0013 %             'b' = bark scale instead of mel
0014 %             'e' = erb-rate scale
0015 %             'l' = log10 Hz frequency scale
0016 %             'f' = linear frequency scale [default]
0017 %             'm' = mel frequency scale
0018 %
0019 %             'c' = fl & fh specify centre of low and high filters instead of edges
0020 %             'h' = fl & fh are in mel/erb/bark/log10 instead of Hz
0021 %             'H' = cf outputs are in mel/erb/bark/log10 instead of Hz
0022 %
0023 %              'y' = lowest filter remains at 1 down to 0 frequency and
0024 %                    highest filter remains at 1 up to nyquist freqency
0025 %                    The total power in the fft is preserved (unless 'u' is specified).
0026 %             'Y' = extend only at low frequency end (or high end if 'y' also specified)
0027 %
0028 %             'p' = input P specifies the number of filters [default if P>=1]
0029 %             'P' = input P specifies the filter spacing [default if P<1]
0030 %
0031 %             'u' = input and output are power per Hz instead of power.
0032 %             'U' = input is power but output is power per Hz.
0033 %
0034 %             's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled)
0035 %             'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs)
0036 %
0037 %             'g' = plot filter coefficients as graph
0038 %             'G' = plot filter coefficients as image [default if no output arguments present]
0039 %
0040 %
0041 % Outputs:    x     a sparse matrix containing the filterbank amplitudes
0042 %                  If the il and ih outputs are given then size(x)=[p,mx-mn+1]
0043 %                 otherwise size(x)=[p,1+floor(n/2)]
0044 %                 Note that the peak filter values equal 2 to account for the power
0045 %                 in the negative FFT frequencies.
0046 %           cf    the filterbank centre frequencies in Hz (see 'H' option)
0047 %            il    the lowest fft bin with a non-zero coefficient
0048 %            ih    the highest fft bin with a non-zero coefficient
0049 %
0050 % The routine performs interpolation of the input spectrum by convolving the power spectrum
0051 % with a triangular filter and then simulates a filterbank with asymetric triangular filters.
0052 %
0053 % Examples of use:
0054 %
0055 % (a) Calcuate the Mel-frequency Cepstral Coefficients
0056 %
0057 %       f=rfft(s);                    % rfft() returns only 1+floor(n/2) coefficients
0058 %        x=filtbankm(p,n,fs,0,fs/2,'m');            % n is the fft length, p is the number of filters wanted
0059 %        z=log(x*abs(f).^2);         % multiply x by the power spectrum
0060 %        c=dct(z);                   % take the DCT
0061 %
0062 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
0063 %
0064 %       f=fft(s);                        % n is the fft length, p is the number of filters wanted
0065 %       [x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m');   % na:nb gives the fft bins that are needed
0066 %       z=log(x*(f(na:nb)).*conj(f(na:nb)));
0067 %
0068 % (c) Plot the calculated filterbanks
0069 %
0070 %      plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')')   % fs=sample frequency
0071 %
0072 % (d) Plot the filterbanks
0073 %
0074 %      filtbankm(p,n,fs,0,fs/2,'m');
0075 %
0076 % References:
0077 %
0078 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
0079 %     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185–19, 1937.
0080 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
0081 %     monosyllabic word recognition in continuously spoken sentences.
0082 %     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357–366, Aug. 1980.
0083 
0084 % Bugs/Suggestions
0085 % (1) default frequencies won't work if the h option is specified
0086 % (2) low default frequency is invalid if the 'l' option is specified
0087 
0088 %      Copyright (C) Mike Brookes 1997-2009
0089 %      Version: $Id: filtbankm.m 713 2011-10-16 14:45:43Z dmb $
0090 %
0091 %   VOICEBOX is a MATLAB toolbox for speech processing.
0092 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0093 %
0094 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0095 %   This program is free software; you can redistribute it and/or modify
0096 %   it under the terms of the GNU General Public License as published by
0097 %   the Free Software Foundation; either version 2 of the License, or
0098 %   (at your option) any later version.
0099 %
0100 %   This program is distributed in the hope that it will be useful,
0101 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0102 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0103 %   GNU General Public License for more details.
0104 %
0105 %   You can obtain a copy of the GNU General Public License from
0106 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0107 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0108 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0109 
0110 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1
0111 
0112 if nargin < 6
0113     w='f'; % default option: linear frequency scale
0114 end
0115 wr=' ';   % default warping is linear frequency
0116 for i=1:length(w)
0117     if any(w(i)=='lebm');
0118         wr=w(i);
0119     end
0120 end
0121 if nargin < 5 || ~numel(fh)
0122     fh=0.5*fs; % max freq is the nyquist
0123 end
0124 if nargin < 4 || ~numel(fl)
0125     if wr=='l'
0126         fl=30;  % min freq is 30 Hz for log scale
0127     else
0128     fl=0; % min freq is DC
0129     end
0130 end
0131 
0132 f1=0;
0133 if numel(n)>1
0134     nf=n(1);  % number of input frequency bins
0135     df=n(2);  % input frequency bin spacing
0136     if numel(n)>2
0137         f1=n(3); % frequency of first bin
0138     end
0139 else
0140     nf=1+floor(n/2); % number of input frequency bins
0141     df=fs/n;  % input frequency bin spacing
0142 end
0143 fin0=f1+(0:nf-1)*df;  % input frequency bins
0144 
0145 mflh=[fl fh];
0146 if ~any(w=='h')             % convert Hz to mel/erb/...
0147     switch wr
0148         case 'm'
0149             mflh=frq2mel(mflh);       % convert frequency limits into mel
0150         case 'l'
0151             if fl<=0
0152                 error('Low frequency limit must be >0 for l option');
0153             end
0154             mflh=log10(mflh);       % convert frequency limits into log10 Hz
0155         case 'e'
0156             mflh=frq2erb(mflh);       % convert frequency limits into erb-rate
0157         case 'b'
0158             mflh=frq2bark(mflh);       % convert frequency limits into bark
0159     end
0160 end
0161 melrng=mflh*(-1:2:1)';          % mel/erb/... range
0162 % fn2=floor(n/2);     % bin index of highest positive frequency (Nyquist if n is even)
0163 if isempty(p)
0164     p=ceil(4.6*log10(2*(f1+(nf-1)*df)));         % default number of output filters
0165 end
0166 puc=any(w=='P') || (p<1) && ~any(w=='p');
0167 if any(w=='c')              % c option: specify fiter centres not edges
0168     if puc
0169         p=round(melrng/(p*1000))+1;
0170     end
0171     melinc=melrng/(p-1);
0172     mflh=mflh+(-1:2:1)*melinc;
0173 else
0174     if puc
0175         p=round(melrng/(p*1000))-1;
0176     end
0177     melinc=melrng/(p+1);
0178 end
0179 %
0180 % Calculate the FFT bins0 corresponding to the filters
0181 %
0182 cf=mflh(1)+(0:p+1)*melinc; % centre frequencies in mel/erb/... including dummy ends
0183 cf(2:end)=max(cf(2:end),0); % only the first point can be negative
0184 switch wr    % convert centre frequencies from mel/erb/... to Hz
0185     case 'l'
0186         mb=10.^(cf);
0187     case 'e'
0188         mb=erb2frq(cf);
0189     case 'b'
0190         mb=bark2frq(cf);
0191     case 'm'
0192         mb=mel2frq(cf);
0193     otherwise
0194         mb=cf;
0195 end
0196 
0197 % first sort out 2-sided input frequencies
0198 
0199 fin=fin0;
0200 fin(nf+1)=fin(nf)+df; % add on a dummy point at the high end
0201 if fin(1)==0
0202     fin=[-fin(nf+1:-1:2) fin];
0203 elseif fin(1)<=df/2
0204     fin=[-fin(nf+1:-1:1) fin];
0205 elseif fin(1)<df
0206     fin=[-fin(nf+1:-1:1) fin(1)-df df-fin(1) fin];
0207 elseif fin(1)==df
0208     fin=[-fin(nf+1:-1:1) 0 fin];
0209 else
0210     fin=[-fin(nf+1:-1:1) df-fin(1) fin(1)-df fin];
0211 end
0212 nfin=length(fin);  % length of extended input frequency list
0213 
0214 % now sort out the interleaving
0215 
0216 fout=mb;  % output frequencies in Hz
0217 lowex=any(w=='y')~=any(w=='Y');   % extend to 0 Hz
0218 highex=any(w=='y') && (fout(end-1)<fin(end));  % extend at high end
0219 if lowex
0220     fout=[0 0 fout(2:end)];
0221 end
0222 if highex
0223     fout=[fout(1:end-1) fin(end) fin(end)];
0224 end
0225 mfout=length(fout);
0226 if any(w=='u') || any(w=='U')
0227     gout=fout(3:mfout)-fout(1:mfout-2);
0228     gout=2*(gout+(gout==0)).^(-1); % Gain of output triangles
0229 else
0230     gout=ones(1,mfout-2);
0231 end
0232 if any(w=='S')
0233     msk=fout(2:mfout-1)~=0;
0234     gout(msk)=2*gout(msk); % double non-DC outputs for a 1-sided ouptu spectrum
0235 end
0236 if any(w=='u')
0237     gin=ones(1,nfin-2);
0238 else
0239     gin=fin(3:nfin)-fin(1:nfin-2);
0240     gin=2*(gin+(gin==0)).^(-1); % Gain of input triangles
0241 end
0242 msk=fin(2:end-1)==0;
0243 if any(w=='s')
0244     gin(~msk)=0.5*gin(~msk); % halve non-DC inputs to change back to a 2-sided spectrum
0245 end
0246 if lowex
0247     gin(msk)=2*gin(msk);  % double DC input to preserve its power
0248 end
0249 foutin=[fout fin];
0250 nfall=length(foutin);
0251 wleft=[0 fout(2:mfout)-fout(1:mfout-1) 0 fin(2:nfin)-fin(1:nfin-1)]; % left width
0252 wright=[wleft(2:end) 0]; % right width
0253 ffact=[0 gout 0 0 gin(1:min(nf,nfin-nf-2)) zeros(1,max(nfin-2*nf-2,0)) gin(nfin-nf-1:nfin-2) 0]; % gain of triangle posts
0254 % ffact(wleft+wright==0)=0; % disable null width triangles shouldn't need this if all frequencies are distinct
0255 [fall,ifall]=sort(foutin);
0256 jfall=zeros(1,nfall);
0257 infall=1:nfall;
0258 jfall(ifall)=infall; % unsort->sort index
0259 ffact(ifall([1:max(jfall(1),jfall(mfout+1))-2 min(jfall(mfout),jfall(nfall))+2:nfall]))=0;  % zap nodes that are much too small/big
0260 
0261 nxto=cumsum(ifall<=mfout);
0262 nxti=cumsum(ifall>mfout);
0263 nxtr=min(nxti+1+mfout,nfall);  % next input node to the right of each value (or nfall if none)
0264 nxtr(ifall>mfout)=1+nxto(ifall>mfout); % next post to the right of opposite type (unsorted indexes)
0265 nxtr=nxtr(jfall);  % next post to the right of opposite type (converted to unsorted indices) or if none: nfall/(mfout+1)
0266 
0267 % each triangle is "attached" to the node at its extreme right end
0268 % the general result for integrating the product of two trapesiums with
0269 % heights (a,b) and (c,d) over a width x is (ad+bc+2bd+2ac)*w/6
0270 %
0271 % integrate product of lower triangles
0272 
0273 msk0=(ffact>0);
0274 msk=msk0 & (ffact(nxtr)>0); % select appropriate triangle pairs (unsorted indices)
0275 ix1=infall(msk); % unsorted indices of leftmost post of pair
0276 jx1=nxtr(msk);  % unsorted indices of rightmost post of pair
0277 vfgx=foutin(ix1)-foutin(jx1-1); % length of right triangle to the left of the left post
0278 yx=min(wleft(ix1),vfgx); % integration length
0279 wx1=ffact(ix1).*ffact(jx1).*yx.*(wleft(ix1).*vfgx-yx.*(0.5*(wleft(ix1)+vfgx)-yx/3))./(wleft(ix1).*wleft(jx1)+(yx==0));
0280 
0281 % integrate product of upper triangles
0282 
0283 nxtu=max([nxtr(2:end)-1 0],1);
0284 msk=msk0 & (ffact(nxtu)>0);
0285 ix2=infall(msk); % unsorted indices of leftmost post of pair
0286 jx2=nxtu(msk);  % unsorted indices of rightmost post of pair
0287 vfgx=foutin(ix2+1)-foutin(jx2); % length of left triangle to the right of the right post
0288 yx=min(wright(ix2),vfgx); % integration length
0289 yx(foutin(jx2+1)<foutin(ix2+1))=0; % zap invalid triangles
0290 wx2=ffact(ix2).*ffact(jx2).*yx.^2.*((0.5*(wright(jx2)-vfgx)+yx/3))./(wright(ix2).*wright(jx2)+(yx==0));
0291 
0292 % integrate lower triangle and upper triangle that ends to its right
0293 
0294 nxtu=max(nxtr-1,1);
0295 msk=msk0 & (ffact(nxtu)>0);
0296 ix3=infall(msk); % unsorted indices of leftmost post of pair
0297 jx3=nxtu(msk);  % unsorted indices of rightmost post of pair
0298 vfgx=foutin(ix3)-foutin(jx3); % length of upper triangle to the left of the lower post
0299 yx=min(wleft(ix3),vfgx); % integration length
0300 yx(foutin(jx3+1)<foutin(ix3))=0; % zap invalid triangles
0301 wx3=ffact(ix3).*ffact(jx3).*yx.*(wleft(ix3).*(wright(jx3)-vfgx)+yx.*(0.5*(wleft(ix3)-wright(jx3)+vfgx)-yx/3))./(wleft(ix3).*wright(jx3)+(yx==0));
0302 
0303 % integrate upper triangle and lower triangle that starts to its right
0304 
0305 nxtu=[nxtr(2:end) 1];
0306 msk=msk0 & (ffact(nxtu)>0);
0307 ix4=infall(msk); % unsorted indices of leftmost post of pair
0308 jx4=nxtu(msk);  % unsorted indices of rightmost post of pair
0309 vfgx=foutin(ix4+1)-foutin(jx4-1); % length of upper triangle to the left of the lower post
0310 yx=min(wright(ix4),vfgx); % integration length
0311 wx4=ffact(ix4).*ffact(jx4).*yx.^2.*(0.5*vfgx-yx/3)./(wright(ix4).*wleft(jx4)+(yx==0));
0312 
0313 % now create the matrix
0314 
0315 iox=sort([ix1 ix2 ix3 ix4;jx1 jx2 jx3 jx4]);
0316 % [iox;[wx1 wx2 wx3 wx4]>0 ]
0317 msk=iox(2,:)<=(nfall+mfout)/2;
0318 iox(2,msk)=(nfall+mfout+1)-iox(2,msk);  % convert negative frequencies to positive
0319 if highex
0320     iox(1,iox(1,:)==mfout-1)=mfout-2; % merge highest two output nodes
0321 end
0322 if lowex
0323     iox(1,iox(1,:)==2)=3; % merge lowest two output nodes
0324 end
0325 
0326 x=sparse(iox(1,:)-1-lowex,max(iox(2,:)-nfall+nf+1,1),[wx1 wx2 wx3 wx4],p,nf);
0327 %
0328 % sort out the output argument options
0329 %
0330 if ~any(w=='H')
0331     cf=mb;         % output Hz instead of mel/erb/...
0332 end
0333 cf=cf(2:p+1);  % remove dummy end frequencies
0334 il=1;
0335 ih=nf;
0336 if nargout > 2
0337     msk=full(any(x>0,1));
0338     il=find(msk,1);
0339     if ~numel(il)
0340         ih=1;
0341     elseif nargout >3
0342         ih=find(msk,1,'last');
0343     end
0344     x=x(:,il:ih);
0345 end
0346 if any(w=='u')
0347     sx=sum(x,2);
0348     x=x./repmat(sx+(sx==0),1,size(x,2));
0349 end
0350 %
0351 % plot results if no output arguments or g option given
0352 %
0353 if ~nargout || any(w=='g') || any(w=='G') % plot idealized filters
0354     if ~any(w=='g') && ~any(w=='G')
0355         w=[w 'G'];
0356     end
0357     newfig=0;
0358     if  any(w=='g')
0359         plot(f1-df+(il:ih)*df,x');
0360         title(['filtbankm: mode = ' w]);
0361         xlabel(['Frequency (' xticksi 'Hz)']);
0362         ylabel('Weight');
0363         newfig=1;
0364     end
0365 
0366     if  any(w=='G')
0367         if newfig
0368             figure;
0369         end
0370         imagesc(f1-df+(il:ih)*df,1:p,x);
0371         axis 'xy'
0372         colorbar;
0373         cblabel('Weight');
0374         switch wr
0375             case 'l'
0376                 type='Log-spaced';
0377             case 'e'
0378                 type='Erb-spaced';
0379             case 'b'
0380                 type='Bark-spaced';
0381             case 'm'
0382                 type='Mel-spaced';
0383             otherwise
0384                 type='Linear-spaced';
0385         end
0386         ylabel([type ' Filter']);
0387         xlabel(['Frequency (' xticksi 'Hz)']);
0388         title(['filtbankm: mode = ' w]);
0389     end
0390 
0391 end

Generated on Mon 20-May-2013 10:53:22 by m2html © 2003