Home > voicebox > filtbankm.m

filtbankm

PURPOSE ^

FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,IL,IH]=(P,N,FS,FL,FH,W)

SYNOPSIS ^

function [x,cf,il,ih]=filtbankm(p,n,fs,fl,fh,w)

DESCRIPTION ^

FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,IL,IH]=(P,N,FS,FL,FH,W)

 Usage:
 (1) Calcuate the Mel-frequency Cepstral Coefficients

        f=rfft(s);                          % rfft() returns only 1+floor(n/2) coefficients
         x=filtbankm(p,n,fs,0,fs/2,'m');  % n is the fft length, p is the number of filters wanted
         z=log(x*abs(f).^2);              % multiply x by the power spectrum
         c=dct(z);                        % take the DCT

 (2) Calcuate the Mel-frequency Cepstral Coefficients efficiently

        f=fft(s);                                   % n is the fft length, p is the number of filters wanted
        [x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m');  % na:nb gives the fft bins that are needed
        z=log(x*(f(na:nb)).*conj(f(na:nb)));        % multiply x by the power spectrum
         c=dct(z);                                   % take the DCT

 (3) Plot the calculated filterbanks as a graph

        plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')')   % fs=sample frequency

 (4) Plot the calculated filterbanks as a spectrogram

        filtbankm(p,n,fs,0,fs/2,'m');

 Inputs:
       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
        n   length of fft
           or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz)
        fs  sample rate in Hz
        fl  low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option]
        fh  high end of highest filter in Hz [default = fs/2]
        w   any sensible combination of the following:

             'b' = bark scale instead of mel
             'e' = erb-rate scale
             'l' = log10 Hz frequency scale
             'f' = linear frequency scale [default]
             'm' = mel frequency scale

             'n' - round to the nearest FFT bin so each row of x contains only one non-zero entry

             'c' = fl & fh specify centre of low and high filters instead of edges
             'h' = fl & fh are in mel/erb/bark/log10 instead of Hz
             'H' = cf outputs are in mel/erb/bark/log10 instead of Hz

              'y' = lowest filter remains at 1 down to 0 frequency and
                    highest filter remains at 1 up to nyquist freqency
                    The total power in the fft is preserved (unless 'u' is
                    specified).
             'Y' = extend only at low frequency end (or high end if 'y' also specified)

             'p' = input P specifies the number of filters [default if P>=1]
             'P' = input P specifies the filter spacing [default if P<1]

             'u' = input and output are power per Hz instead of power.
             'U' = input is power but output is power per Hz.

             's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled)
             'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs)

             'g' = plot filter coefficients as graph
             'G' = plot filter coefficients as image [default if no output arguments present]


 Outputs:    x     a sparse matrix containing the filterbank amplitudes
                  If the il and ih outputs are given then size(x)=[p,ih-il+1]
                 otherwise size(x)=[p,1+floor(n/2)]
                 Note that the peak filter values equal 2 to account for the power
                 in the negative FFT frequencies.
           cf    the filterbank centre frequencies in Hz (see 'H' option)
            il    the lowest fft bin with a non-zero coefficient
            ih    the highest fft bin with a non-zero coefficient

 The routine performs interpolation of the input spectrum by convolving the power spectrum
 with a triangular filter and then simulates a filterbank with asymetric triangular filters.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [x,cf,il,ih]=filtbankm(p,n,fs,fl,fh,w)
0002 %FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,IL,IH]=(P,N,FS,FL,FH,W)
0003 %
0004 % Usage:
0005 % (1) Calcuate the Mel-frequency Cepstral Coefficients
0006 %
0007 %        f=rfft(s);                          % rfft() returns only 1+floor(n/2) coefficients
0008 %         x=filtbankm(p,n,fs,0,fs/2,'m');  % n is the fft length, p is the number of filters wanted
0009 %         z=log(x*abs(f).^2);              % multiply x by the power spectrum
0010 %         c=dct(z);                        % take the DCT
0011 %
0012 % (2) Calcuate the Mel-frequency Cepstral Coefficients efficiently
0013 %
0014 %        f=fft(s);                                   % n is the fft length, p is the number of filters wanted
0015 %        [x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m');  % na:nb gives the fft bins that are needed
0016 %        z=log(x*(f(na:nb)).*conj(f(na:nb)));        % multiply x by the power spectrum
0017 %         c=dct(z);                                   % take the DCT
0018 %
0019 % (3) Plot the calculated filterbanks as a graph
0020 %
0021 %        plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')')   % fs=sample frequency
0022 %
0023 % (4) Plot the calculated filterbanks as a spectrogram
0024 %
0025 %        filtbankm(p,n,fs,0,fs/2,'m');
0026 %
0027 % Inputs:
0028 %       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
0029 %        n   length of fft
0030 %           or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz)
0031 %        fs  sample rate in Hz
0032 %        fl  low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option]
0033 %        fh  high end of highest filter in Hz [default = fs/2]
0034 %        w   any sensible combination of the following:
0035 %
0036 %             'b' = bark scale instead of mel
0037 %             'e' = erb-rate scale
0038 %             'l' = log10 Hz frequency scale
0039 %             'f' = linear frequency scale [default]
0040 %             'm' = mel frequency scale
0041 %
0042 %             'n' - round to the nearest FFT bin so each row of x contains only one non-zero entry
0043 %
0044 %             'c' = fl & fh specify centre of low and high filters instead of edges
0045 %             'h' = fl & fh are in mel/erb/bark/log10 instead of Hz
0046 %             'H' = cf outputs are in mel/erb/bark/log10 instead of Hz
0047 %
0048 %              'y' = lowest filter remains at 1 down to 0 frequency and
0049 %                    highest filter remains at 1 up to nyquist freqency
0050 %                    The total power in the fft is preserved (unless 'u' is
0051 %                    specified).
0052 %             'Y' = extend only at low frequency end (or high end if 'y' also specified)
0053 %
0054 %             'p' = input P specifies the number of filters [default if P>=1]
0055 %             'P' = input P specifies the filter spacing [default if P<1]
0056 %
0057 %             'u' = input and output are power per Hz instead of power.
0058 %             'U' = input is power but output is power per Hz.
0059 %
0060 %             's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled)
0061 %             'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs)
0062 %
0063 %             'g' = plot filter coefficients as graph
0064 %             'G' = plot filter coefficients as image [default if no output arguments present]
0065 %
0066 %
0067 % Outputs:    x     a sparse matrix containing the filterbank amplitudes
0068 %                  If the il and ih outputs are given then size(x)=[p,ih-il+1]
0069 %                 otherwise size(x)=[p,1+floor(n/2)]
0070 %                 Note that the peak filter values equal 2 to account for the power
0071 %                 in the negative FFT frequencies.
0072 %           cf    the filterbank centre frequencies in Hz (see 'H' option)
0073 %            il    the lowest fft bin with a non-zero coefficient
0074 %            ih    the highest fft bin with a non-zero coefficient
0075 %
0076 % The routine performs interpolation of the input spectrum by convolving the power spectrum
0077 % with a triangular filter and then simulates a filterbank with asymetric triangular filters.
0078 
0079 %
0080 % References:
0081 %
0082 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
0083 %     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 18519, 1937.
0084 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
0085 %     monosyllabic word recognition in continuously spoken sentences.
0086 %     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357366, Aug. 1980.
0087 
0088 % Bugs/Suggestions
0089 % (1) default frequencies won't work if the h option is specified
0090 % (2) low default frequency is invalid if the 'l' option is specified
0091 
0092 %      Copyright (C) Mike Brookes 1997-2009
0093 %      Version: $Id: filtbankm.m 8483 2016-09-13 15:48:05Z dmb $
0094 %
0095 %   VOICEBOX is a MATLAB toolbox for speech processing.
0096 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0097 %
0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0099 %   This program is free software; you can redistribute it and/or modify
0100 %   it under the terms of the GNU General Public License as published by
0101 %   the Free Software Foundation; either version 2 of the License, or
0102 %   (at your option) any later version.
0103 %
0104 %   This program is distributed in the hope that it will be useful,
0105 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0106 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0107 %   GNU General Public License for more details.
0108 %
0109 %   You can obtain a copy of the GNU General Public License from
0110 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0111 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0112 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0113 
0114 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1
0115 
0116 if nargin < 6
0117     w='f'; % default option: linear frequency scale
0118 end
0119 wr=' ';   % default warping is linear frequency
0120 for i=1:length(w)
0121     if any(w(i)=='lebm');
0122         wr=w(i);
0123     end
0124 end
0125 if nargin < 5 || ~numel(fh)
0126     fh=0.5*fs; % max freq is the nyquist
0127 end
0128 if nargin < 4 || ~numel(fl)
0129     if wr=='l'
0130         fl=30;  % min freq is 30 Hz for log scale
0131     else
0132         fl=0; % min freq is DC
0133     end
0134 end
0135 
0136 fa=0;
0137 if numel(n)>1
0138     nf=n(1);  % number of input frequency bins
0139     df=n(2);  % input frequency bin spacing
0140     if numel(n)>2
0141         fa=n(3); % frequency of first bin
0142     end
0143 else
0144     nf=1+floor(n/2); % number of input frequency bins
0145     df=fs/n;  % input frequency bin spacing
0146 end
0147 fin0=fa+(0:nf-1)*df;  % input frequency bins
0148 
0149 mflh=[fl fh];
0150 if ~any(w=='h')             % convert Hz to mel/erb/...
0151     switch wr
0152         case 'm'
0153             mflh=frq2mel(mflh);       % convert frequency limits into mel
0154         case 'l'
0155             if fl<=0
0156                 error('Low frequency limit must be >0 for l option');
0157             end
0158             mflh=log10(mflh);       % convert frequency limits into log10 Hz
0159         case 'e'
0160             mflh=frq2erb(mflh);       % convert frequency limits into erb-rate
0161         case 'b'
0162             mflh=frq2bark(mflh);       % convert frequency limits into bark
0163     end
0164 end
0165 melrng=mflh*(-1:2:1)';          % mel/erb/... range
0166 % fn2=floor(n/2);     % bin index of highest positive frequency (Nyquist if n is even)
0167 if isempty(p)
0168     p=ceil(4.6*log10(2*(fa+(nf-1)*df)));         % default number of output filters
0169 end
0170 puc=any(w=='P') || (p<1) && ~any(w=='p');
0171 if any(w=='c')              % c option: specify fiter centres not edges
0172     if puc
0173         p=round(melrng/(p*1000))+1;
0174     end
0175     melinc=melrng/(p-1);
0176     mflh=mflh+(-1:2:1)*melinc;
0177 else
0178     if puc
0179         p=round(melrng/(p*1000))-1;
0180     end
0181     melinc=melrng/(p+1);
0182 end
0183 %
0184 % Calculate the FFT bins0 corresponding to the filters
0185 %
0186 cf=mflh(1)+(0:p+1)*melinc; % centre frequencies in mel/erb/... including dummy ends
0187 cf(2:end)=max(cf(2:end),0); % only the first point can be negative
0188 switch wr    % convert centre frequencies from mel/erb/... to Hz
0189     case 'l'
0190         mb=10.^(cf);
0191     case 'e'
0192         mb=erb2frq(cf);
0193     case 'b'
0194         mb=bark2frq(cf);
0195     case 'm'
0196         mb=mel2frq(cf);
0197     otherwise
0198         mb=cf;
0199 end
0200 
0201 % first sort out 2-sided input frequencies
0202 
0203 fin=fin0;
0204 fin(nf+1)=fin(nf)+df; % add on a dummy point at the high end
0205 if fin(1)==0
0206     fin=[-fin(nf+1:-1:2) fin];
0207 elseif fin(1)<=df/2
0208     fin=[-fin(nf+1:-1:1) fin];
0209 elseif fin(1)<df
0210     fin=[-fin(nf+1:-1:1) fin(1)-df df-fin(1) fin];
0211 elseif fin(1)==df
0212     fin=[-fin(nf+1:-1:1) 0 fin];
0213 else
0214     fin=[-fin(nf+1:-1:1) df-fin(1) fin(1)-df fin];
0215 end
0216 nfin=length(fin);  % length of extended input frequency list
0217 
0218 % now sort out the interleaving
0219 
0220 fout=mb;  % output frequencies in Hz
0221 lowex=any(w=='y')~=any(w=='Y');   % extend to 0 Hz
0222 highex=any(w=='y') && (fout(end-1)<fin(end));  % extend at high end
0223 if lowex
0224     fout=[0 0 fout(2:end)];
0225 end
0226 if highex
0227     fout=[fout(1:end-1) fin(end) fin(end)];
0228 end
0229 mfout=length(fout);
0230 if any(w=='u') || any(w=='U')
0231     gout=fout(3:mfout)-fout(1:mfout-2);
0232     gout=2*(gout+(gout==0)).^(-1); % Gain of output triangles
0233 else
0234     gout=ones(1,mfout-2);
0235 end
0236 if any(w=='S')
0237     msk=fout(2:mfout-1)~=0;
0238     gout(msk)=2*gout(msk); % double non-DC outputs for a 1-sided output spectrum
0239 end
0240 if any(w=='u')
0241     gin=ones(1,nfin-2);
0242 else
0243     gin=fin(3:nfin)-fin(1:nfin-2);
0244     gin=2*(gin+(gin==0)).^(-1); % Gain of input triangles
0245 end
0246 msk=fin(2:end-1)==0;
0247 if any(w=='s')
0248     gin(~msk)=0.5*gin(~msk); % halve non-DC inputs to change back to a 2-sided spectrum
0249 end
0250 if lowex
0251     gin(msk)=2*gin(msk);  % double DC input to preserve its power
0252 end
0253 foutin=[fout fin];
0254 nfall=length(foutin);
0255 wleft=[0 fout(2:mfout)-fout(1:mfout-1) 0 fin(2:nfin)-fin(1:nfin-1)]; % left width
0256 wright=[wleft(2:end) 0]; % right width
0257 ffact=[0 gout 0 0 gin(1:min(nf,nfin-nf-2)) zeros(1,max(nfin-2*nf-2,0)) gin(nfin-nf-1:nfin-2) 0]; % gain of triangle posts
0258 % ffact(wleft+wright==0)=0; % disable null width triangles shouldn't need this if all frequencies are distinct
0259 [fall,ifall]=sort(foutin);
0260 jfall=zeros(1,nfall);
0261 infall=1:nfall;
0262 jfall(ifall)=infall; % unsort->sort index
0263 ffact(ifall([1:max(jfall(1),jfall(mfout+1))-2 min(jfall(mfout),jfall(nfall))+2:nfall]))=0;  % zap nodes that are much too small/big
0264 
0265 nxto=cumsum(ifall<=mfout);
0266 nxti=cumsum(ifall>mfout);
0267 nxtr=min(nxti+1+mfout,nfall);  % next input node to the right of each value (or nfall if none)
0268 nxtr(ifall>mfout)=1+nxto(ifall>mfout); % next post to the right of opposite type (unsorted indexes)
0269 nxtr=nxtr(jfall);  % next post to the right of opposite type (converted to unsorted indices) or if none: nfall/(mfout+1)
0270 
0271 % each triangle is "attached" to the node at its extreme right end
0272 % the general result for integrating the product of two trapesiums with
0273 % heights (a,b) and (c,d) over a width x is (ad+bc+2bd+2ac)*w/6
0274 %
0275 % integrate product of lower triangles
0276 
0277 msk0=(ffact>0);
0278 msk=msk0 & (ffact(nxtr)>0); % select appropriate triangle pairs (unsorted indices)
0279 ix1=infall(msk); % unsorted indices of leftmost post of pair
0280 jx1=nxtr(msk);  % unsorted indices of rightmost post of pair
0281 vfgx=foutin(ix1)-foutin(jx1-1); % length of right triangle to the left of the left post
0282 yx=min(wleft(ix1),vfgx); % integration length
0283 wx1=ffact(ix1).*ffact(jx1).*yx.*(wleft(ix1).*vfgx-yx.*(0.5*(wleft(ix1)+vfgx)-yx/3))./(wleft(ix1).*wleft(jx1)+(yx==0));
0284 
0285 % integrate product of upper triangles
0286 
0287 nxtu=max([nxtr(2:end)-1 0],1);
0288 msk=msk0 & (ffact(nxtu)>0);
0289 ix2=infall(msk); % unsorted indices of leftmost post of pair
0290 jx2=nxtu(msk);  % unsorted indices of rightmost post of pair
0291 vfgx=foutin(ix2+1)-foutin(jx2); % length of left triangle to the right of the right post
0292 yx=min(wright(ix2),vfgx); % integration length
0293 yx(foutin(jx2+1)<foutin(ix2+1))=0; % zap invalid triangles
0294 wx2=ffact(ix2).*ffact(jx2).*yx.^2.*((0.5*(wright(jx2)-vfgx)+yx/3))./(wright(ix2).*wright(jx2)+(yx==0));
0295 
0296 % integrate lower triangle and upper triangle that ends to its right
0297 
0298 nxtu=max(nxtr-1,1);
0299 msk=msk0 & (ffact(nxtu)>0);
0300 ix3=infall(msk); % unsorted indices of leftmost post of pair
0301 jx3=nxtu(msk);  % unsorted indices of rightmost post of pair
0302 vfgx=foutin(ix3)-foutin(jx3); % length of upper triangle to the left of the lower post
0303 yx=min(wleft(ix3),vfgx); % integration length
0304 yx(foutin(jx3+1)<foutin(ix3))=0; % zap invalid triangles
0305 wx3=ffact(ix3).*ffact(jx3).*yx.*(wleft(ix3).*(wright(jx3)-vfgx)+yx.*(0.5*(wleft(ix3)-wright(jx3)+vfgx)-yx/3))./(wleft(ix3).*wright(jx3)+(yx==0));
0306 
0307 % integrate upper triangle and lower triangle that starts to its right
0308 
0309 nxtu=[nxtr(2:end) 1];
0310 msk=msk0 & (ffact(nxtu)>0);
0311 ix4=infall(msk); % unsorted indices of leftmost post of pair
0312 jx4=nxtu(msk);  % unsorted indices of rightmost post of pair
0313 vfgx=foutin(ix4+1)-foutin(jx4-1); % length of upper triangle to the left of the lower post
0314 yx=min(wright(ix4),vfgx); % integration length
0315 wx4=ffact(ix4).*ffact(jx4).*yx.^2.*(0.5*vfgx-yx/3)./(wright(ix4).*wleft(jx4)+(yx==0));
0316 
0317 % now create the matrix
0318 
0319 iox=sort([ix1 ix2 ix3 ix4;jx1 jx2 jx3 jx4]);
0320 % [iox;[wx1 wx2 wx3 wx4]>0 ]
0321 msk=iox(2,:)<=(nfall+mfout)/2;
0322 iox(2,msk)=(nfall+mfout+1)-iox(2,msk);  % convert negative frequencies to positive
0323 if highex
0324     iox(1,iox(1,:)==mfout-1)=mfout-2; % merge highest two output nodes
0325 end
0326 if lowex
0327     iox(1,iox(1,:)==2)=3; % merge lowest two output nodes
0328 end
0329 
0330 x=sparse(iox(1,:)-1-lowex,max(iox(2,:)-nfall+nf+1,1),[wx1 wx2 wx3 wx4],p,nf);
0331 %
0332 % sort out the output argument options
0333 %
0334 if ~any(w=='H')
0335     cf=mb;         % output Hz instead of mel/erb/...
0336 end
0337 cf=cf(2:p+1);  % remove dummy end frequencies
0338 il=1;
0339 ih=nf;
0340 if any(w=='n') % round outputs to the centre of gravity bin
0341     sx2=sum(x,2);
0342     msk=full(sx2~=0);
0343     vxc=zeros(p,1);
0344     vxc(msk)=round((x(msk,:)*(1:nf)')./sx2(msk));
0345     x=sparse(1:p,vxc,sx2,p,nf);
0346 end
0347 if nargout > 2
0348     msk=full(any(x>0,1));
0349     il=find(msk,1);
0350     if ~numel(il)
0351         ih=1;
0352     elseif nargout >3
0353         ih=find(msk,1,'last');
0354     end
0355     x=x(:,il:ih);
0356 end
0357 if any(w=='u')
0358     sx=sum(x,2);
0359     x=x./repmat(sx+(sx==0),1,size(x,2));
0360 end
0361 %
0362 % plot results if no output arguments or g option given
0363 %
0364 if ~nargout || any(w=='g') || any(w=='G') % plot idealized filters
0365     if ~any(w=='g') && ~any(w=='G')
0366         w=[w 'G'];
0367     end
0368     newfig=0;
0369     if  any(w=='g')
0370         plot(fa-df+(il:ih)*df,x');
0371         title(['filtbankm: mode = ' w]);
0372         xlabel(['Frequency (' xticksi 'Hz)']);
0373         ylabel('Weight');
0374         newfig=1;
0375     end
0376     
0377     if  any(w=='G')
0378         if newfig
0379             figure;
0380         end
0381         imagesc(fa-df+(il:ih)*df,1:p,x);
0382         axis 'xy'
0383         colorbar;
0384         cblabel('Weight');
0385         switch wr
0386             case 'l'
0387                 type='Log-spaced';
0388             case 'e'
0389                 type='Erb-spaced';
0390             case 'b'
0391                 type='Bark-spaced';
0392             case 'm'
0393                 type='Mel-spaced';
0394             otherwise
0395                 type='Linear-spaced';
0396         end
0397         ylabel([type ' Filter']);
0398         xlabel(['Frequency (' xticksi 'Hz)']);
0399         title(['filtbankm: mode = ' w]);
0400     end
0401     
0402 end

Generated on Tue 10-Oct-2017 08:30:10 by m2html © 2003