


FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
Inputs:
p number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
n length of fft
or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz)
fs sample rate in Hz
fl low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option]
fh high end of highest filter in Hz [default = fs/2]
w any sensible combination of the following:
'b' = bark scale instead of mel
'e' = erb-rate scale
'l' = log10 Hz frequency scale
'f' = linear frequency scale [default]
'm' = mel frequency scale
'c' = fl & fh specify centre of low and high filters instead of edges
'h' = fl & fh are in mel/erb/bark/log10 instead of Hz
'H' = cf outputs are in mel/erb/bark/log10 instead of Hz
'y' = lowest filter remains at 1 down to 0 frequency and
highest filter remains at 1 up to nyquist freqency
The total power in the fft is preserved (unless 'u' is specified).
'Y' = extend only at low frequency end (or high end if 'y' also specified)
'p' = input P specifies the number of filters [default if P>=1]
'P' = input P specifies the filter spacing [default if P<1]
'u' = input and output are power per Hz instead of power.
'U' = input is power but output is power per Hz.
's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled)
'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs)
'g' = plot filter coefficients as graph
'G' = plot filter coefficients as image [default if no output arguments present]
Outputs: x a sparse matrix containing the filterbank amplitudes
If the il and ih outputs are given then size(x)=[p,mx-mn+1]
otherwise size(x)=[p,1+floor(n/2)]
Note that the peak filter values equal 2 to account for the power
in the negative FFT frequencies.
cf the filterbank centre frequencies in Hz (see 'H' option)
il the lowest fft bin with a non-zero coefficient
ih the highest fft bin with a non-zero coefficient
The routine performs interpolation of the input spectrum by convolving the power spectrum
with a triangular filter and then simulates a filterbank with asymetric triangular filters.
Examples of use:
(a) Calcuate the Mel-frequency Cepstral Coefficients
f=rfft(s); % rfft() returns only 1+floor(n/2) coefficients
x=filtbankm(p,n,fs,0,fs/2,'m'); % n is the fft length, p is the number of filters wanted
z=log(x*abs(f).^2); % multiply x by the power spectrum
c=dct(z); % take the DCT
(b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
f=fft(s); % n is the fft length, p is the number of filters wanted
[x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m'); % na:nb gives the fft bins that are needed
z=log(x*(f(na:nb)).*conj(f(na:nb)));
(c) Plot the calculated filterbanks
plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')') % fs=sample frequency
(d) Plot the filterbanks
filtbankm(p,n,fs,0,fs/2,'m');
References:
[1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185–19, 1937.
[2] S. Davis and P. Mermelstein. Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences.
IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357–366, Aug. 1980.

0001 function [x,cf,il,ih]=filtbankm(p,n,fs,fl,fh,w) 0002 %FILTBANKM determine matrix for a linear/mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W) 0003 % 0004 % Inputs: 0005 % p number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))] 0006 % n length of fft 0007 % or [nfrq dfrq frq1] nfrq=number of input frequency bins, frequency increment (Hz), first bin freq (Hz) 0008 % fs sample rate in Hz 0009 % fl low end of the lowest filter in Hz (see 'h' option) [default = 0 or 30Hz for 'l' option] 0010 % fh high end of highest filter in Hz [default = fs/2] 0011 % w any sensible combination of the following: 0012 % 0013 % 'b' = bark scale instead of mel 0014 % 'e' = erb-rate scale 0015 % 'l' = log10 Hz frequency scale 0016 % 'f' = linear frequency scale [default] 0017 % 'm' = mel frequency scale 0018 % 0019 % 'c' = fl & fh specify centre of low and high filters instead of edges 0020 % 'h' = fl & fh are in mel/erb/bark/log10 instead of Hz 0021 % 'H' = cf outputs are in mel/erb/bark/log10 instead of Hz 0022 % 0023 % 'y' = lowest filter remains at 1 down to 0 frequency and 0024 % highest filter remains at 1 up to nyquist freqency 0025 % The total power in the fft is preserved (unless 'u' is specified). 0026 % 'Y' = extend only at low frequency end (or high end if 'y' also specified) 0027 % 0028 % 'p' = input P specifies the number of filters [default if P>=1] 0029 % 'P' = input P specifies the filter spacing [default if P<1] 0030 % 0031 % 'u' = input and output are power per Hz instead of power. 0032 % 'U' = input is power but output is power per Hz. 0033 % 0034 % 's' = single-sided input: do not include symmetric negative frequencies (i.e. non-DC inputs have been doubled) 0035 % 'S' = single-sided output: do not mirror the non-DC filter characteristics (i.e. double non-DC outputs) 0036 % 0037 % 'g' = plot filter coefficients as graph 0038 % 'G' = plot filter coefficients as image [default if no output arguments present] 0039 % 0040 % 0041 % Outputs: x a sparse matrix containing the filterbank amplitudes 0042 % If the il and ih outputs are given then size(x)=[p,mx-mn+1] 0043 % otherwise size(x)=[p,1+floor(n/2)] 0044 % Note that the peak filter values equal 2 to account for the power 0045 % in the negative FFT frequencies. 0046 % cf the filterbank centre frequencies in Hz (see 'H' option) 0047 % il the lowest fft bin with a non-zero coefficient 0048 % ih the highest fft bin with a non-zero coefficient 0049 % 0050 % The routine performs interpolation of the input spectrum by convolving the power spectrum 0051 % with a triangular filter and then simulates a filterbank with asymetric triangular filters. 0052 % 0053 % Examples of use: 0054 % 0055 % (a) Calcuate the Mel-frequency Cepstral Coefficients 0056 % 0057 % f=rfft(s); % rfft() returns only 1+floor(n/2) coefficients 0058 % x=filtbankm(p,n,fs,0,fs/2,'m'); % n is the fft length, p is the number of filters wanted 0059 % z=log(x*abs(f).^2); % multiply x by the power spectrum 0060 % c=dct(z); % take the DCT 0061 % 0062 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently 0063 % 0064 % f=fft(s); % n is the fft length, p is the number of filters wanted 0065 % [x,cf,na,nb]=filtbankm(p,n,fs,0,fs/2,'m'); % na:nb gives the fft bins that are needed 0066 % z=log(x*(f(na:nb)).*conj(f(na:nb))); 0067 % 0068 % (c) Plot the calculated filterbanks 0069 % 0070 % plot((0:floor(n/2))*fs/n,filtbankm(p,n,fs,0,fs/2,'m')') % fs=sample frequency 0071 % 0072 % (d) Plot the filterbanks 0073 % 0074 % filtbankm(p,n,fs,0,fs/2,'m'); 0075 % 0076 % References: 0077 % 0078 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement 0079 % of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185–19, 1937. 0080 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for 0081 % monosyllabic word recognition in continuously spoken sentences. 0082 % IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357–366, Aug. 1980. 0083 0084 % Bugs/Suggestions 0085 % (1) default frequencies won't work if the h option is specified 0086 % (2) low default frequency is invalid if the 'l' option is specified 0087 0088 % Copyright (C) Mike Brookes 1997-2009 0089 % Version: $Id: filtbankm.m 713 2011-10-16 14:45:43Z dmb $ 0090 % 0091 % VOICEBOX is a MATLAB toolbox for speech processing. 0092 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0093 % 0094 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0095 % This program is free software; you can redistribute it and/or modify 0096 % it under the terms of the GNU General Public License as published by 0097 % the Free Software Foundation; either version 2 of the License, or 0098 % (at your option) any later version. 0099 % 0100 % This program is distributed in the hope that it will be useful, 0101 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0102 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0103 % GNU General Public License for more details. 0104 % 0105 % You can obtain a copy of the GNU General Public License from 0106 % http://www.gnu.org/copyleft/gpl.html or by writing to 0107 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0108 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0109 0110 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1 0111 0112 if nargin < 6 0113 w='f'; % default option: linear frequency scale 0114 end 0115 wr=' '; % default warping is linear frequency 0116 for i=1:length(w) 0117 if any(w(i)=='lebm'); 0118 wr=w(i); 0119 end 0120 end 0121 if nargin < 5 || ~numel(fh) 0122 fh=0.5*fs; % max freq is the nyquist 0123 end 0124 if nargin < 4 || ~numel(fl) 0125 if wr=='l' 0126 fl=30; % min freq is 30 Hz for log scale 0127 else 0128 fl=0; % min freq is DC 0129 end 0130 end 0131 0132 f1=0; 0133 if numel(n)>1 0134 nf=n(1); % number of input frequency bins 0135 df=n(2); % input frequency bin spacing 0136 if numel(n)>2 0137 f1=n(3); % frequency of first bin 0138 end 0139 else 0140 nf=1+floor(n/2); % number of input frequency bins 0141 df=fs/n; % input frequency bin spacing 0142 end 0143 fin0=f1+(0:nf-1)*df; % input frequency bins 0144 0145 mflh=[fl fh]; 0146 if ~any(w=='h') % convert Hz to mel/erb/... 0147 switch wr 0148 case 'm' 0149 mflh=frq2mel(mflh); % convert frequency limits into mel 0150 case 'l' 0151 if fl<=0 0152 error('Low frequency limit must be >0 for l option'); 0153 end 0154 mflh=log10(mflh); % convert frequency limits into log10 Hz 0155 case 'e' 0156 mflh=frq2erb(mflh); % convert frequency limits into erb-rate 0157 case 'b' 0158 mflh=frq2bark(mflh); % convert frequency limits into bark 0159 end 0160 end 0161 melrng=mflh*(-1:2:1)'; % mel/erb/... range 0162 % fn2=floor(n/2); % bin index of highest positive frequency (Nyquist if n is even) 0163 if isempty(p) 0164 p=ceil(4.6*log10(2*(f1+(nf-1)*df))); % default number of output filters 0165 end 0166 puc=any(w=='P') || (p<1) && ~any(w=='p'); 0167 if any(w=='c') % c option: specify fiter centres not edges 0168 if puc 0169 p=round(melrng/(p*1000))+1; 0170 end 0171 melinc=melrng/(p-1); 0172 mflh=mflh+(-1:2:1)*melinc; 0173 else 0174 if puc 0175 p=round(melrng/(p*1000))-1; 0176 end 0177 melinc=melrng/(p+1); 0178 end 0179 % 0180 % Calculate the FFT bins0 corresponding to the filters 0181 % 0182 cf=mflh(1)+(0:p+1)*melinc; % centre frequencies in mel/erb/... including dummy ends 0183 cf(2:end)=max(cf(2:end),0); % only the first point can be negative 0184 switch wr % convert centre frequencies from mel/erb/... to Hz 0185 case 'l' 0186 mb=10.^(cf); 0187 case 'e' 0188 mb=erb2frq(cf); 0189 case 'b' 0190 mb=bark2frq(cf); 0191 case 'm' 0192 mb=mel2frq(cf); 0193 otherwise 0194 mb=cf; 0195 end 0196 0197 % first sort out 2-sided input frequencies 0198 0199 fin=fin0; 0200 fin(nf+1)=fin(nf)+df; % add on a dummy point at the high end 0201 if fin(1)==0 0202 fin=[-fin(nf+1:-1:2) fin]; 0203 elseif fin(1)<=df/2 0204 fin=[-fin(nf+1:-1:1) fin]; 0205 elseif fin(1)<df 0206 fin=[-fin(nf+1:-1:1) fin(1)-df df-fin(1) fin]; 0207 elseif fin(1)==df 0208 fin=[-fin(nf+1:-1:1) 0 fin]; 0209 else 0210 fin=[-fin(nf+1:-1:1) df-fin(1) fin(1)-df fin]; 0211 end 0212 nfin=length(fin); % length of extended input frequency list 0213 0214 % now sort out the interleaving 0215 0216 fout=mb; % output frequencies in Hz 0217 lowex=any(w=='y')~=any(w=='Y'); % extend to 0 Hz 0218 highex=any(w=='y') && (fout(end-1)<fin(end)); % extend at high end 0219 if lowex 0220 fout=[0 0 fout(2:end)]; 0221 end 0222 if highex 0223 fout=[fout(1:end-1) fin(end) fin(end)]; 0224 end 0225 mfout=length(fout); 0226 if any(w=='u') || any(w=='U') 0227 gout=fout(3:mfout)-fout(1:mfout-2); 0228 gout=2*(gout+(gout==0)).^(-1); % Gain of output triangles 0229 else 0230 gout=ones(1,mfout-2); 0231 end 0232 if any(w=='S') 0233 msk=fout(2:mfout-1)~=0; 0234 gout(msk)=2*gout(msk); % double non-DC outputs for a 1-sided ouptu spectrum 0235 end 0236 if any(w=='u') 0237 gin=ones(1,nfin-2); 0238 else 0239 gin=fin(3:nfin)-fin(1:nfin-2); 0240 gin=2*(gin+(gin==0)).^(-1); % Gain of input triangles 0241 end 0242 msk=fin(2:end-1)==0; 0243 if any(w=='s') 0244 gin(~msk)=0.5*gin(~msk); % halve non-DC inputs to change back to a 2-sided spectrum 0245 end 0246 if lowex 0247 gin(msk)=2*gin(msk); % double DC input to preserve its power 0248 end 0249 foutin=[fout fin]; 0250 nfall=length(foutin); 0251 wleft=[0 fout(2:mfout)-fout(1:mfout-1) 0 fin(2:nfin)-fin(1:nfin-1)]; % left width 0252 wright=[wleft(2:end) 0]; % right width 0253 ffact=[0 gout 0 0 gin(1:min(nf,nfin-nf-2)) zeros(1,max(nfin-2*nf-2,0)) gin(nfin-nf-1:nfin-2) 0]; % gain of triangle posts 0254 % ffact(wleft+wright==0)=0; % disable null width triangles shouldn't need this if all frequencies are distinct 0255 [fall,ifall]=sort(foutin); 0256 jfall=zeros(1,nfall); 0257 infall=1:nfall; 0258 jfall(ifall)=infall; % unsort->sort index 0259 ffact(ifall([1:max(jfall(1),jfall(mfout+1))-2 min(jfall(mfout),jfall(nfall))+2:nfall]))=0; % zap nodes that are much too small/big 0260 0261 nxto=cumsum(ifall<=mfout); 0262 nxti=cumsum(ifall>mfout); 0263 nxtr=min(nxti+1+mfout,nfall); % next input node to the right of each value (or nfall if none) 0264 nxtr(ifall>mfout)=1+nxto(ifall>mfout); % next post to the right of opposite type (unsorted indexes) 0265 nxtr=nxtr(jfall); % next post to the right of opposite type (converted to unsorted indices) or if none: nfall/(mfout+1) 0266 0267 % each triangle is "attached" to the node at its extreme right end 0268 % the general result for integrating the product of two trapesiums with 0269 % heights (a,b) and (c,d) over a width x is (ad+bc+2bd+2ac)*w/6 0270 % 0271 % integrate product of lower triangles 0272 0273 msk0=(ffact>0); 0274 msk=msk0 & (ffact(nxtr)>0); % select appropriate triangle pairs (unsorted indices) 0275 ix1=infall(msk); % unsorted indices of leftmost post of pair 0276 jx1=nxtr(msk); % unsorted indices of rightmost post of pair 0277 vfgx=foutin(ix1)-foutin(jx1-1); % length of right triangle to the left of the left post 0278 yx=min(wleft(ix1),vfgx); % integration length 0279 wx1=ffact(ix1).*ffact(jx1).*yx.*(wleft(ix1).*vfgx-yx.*(0.5*(wleft(ix1)+vfgx)-yx/3))./(wleft(ix1).*wleft(jx1)+(yx==0)); 0280 0281 % integrate product of upper triangles 0282 0283 nxtu=max([nxtr(2:end)-1 0],1); 0284 msk=msk0 & (ffact(nxtu)>0); 0285 ix2=infall(msk); % unsorted indices of leftmost post of pair 0286 jx2=nxtu(msk); % unsorted indices of rightmost post of pair 0287 vfgx=foutin(ix2+1)-foutin(jx2); % length of left triangle to the right of the right post 0288 yx=min(wright(ix2),vfgx); % integration length 0289 yx(foutin(jx2+1)<foutin(ix2+1))=0; % zap invalid triangles 0290 wx2=ffact(ix2).*ffact(jx2).*yx.^2.*((0.5*(wright(jx2)-vfgx)+yx/3))./(wright(ix2).*wright(jx2)+(yx==0)); 0291 0292 % integrate lower triangle and upper triangle that ends to its right 0293 0294 nxtu=max(nxtr-1,1); 0295 msk=msk0 & (ffact(nxtu)>0); 0296 ix3=infall(msk); % unsorted indices of leftmost post of pair 0297 jx3=nxtu(msk); % unsorted indices of rightmost post of pair 0298 vfgx=foutin(ix3)-foutin(jx3); % length of upper triangle to the left of the lower post 0299 yx=min(wleft(ix3),vfgx); % integration length 0300 yx(foutin(jx3+1)<foutin(ix3))=0; % zap invalid triangles 0301 wx3=ffact(ix3).*ffact(jx3).*yx.*(wleft(ix3).*(wright(jx3)-vfgx)+yx.*(0.5*(wleft(ix3)-wright(jx3)+vfgx)-yx/3))./(wleft(ix3).*wright(jx3)+(yx==0)); 0302 0303 % integrate upper triangle and lower triangle that starts to its right 0304 0305 nxtu=[nxtr(2:end) 1]; 0306 msk=msk0 & (ffact(nxtu)>0); 0307 ix4=infall(msk); % unsorted indices of leftmost post of pair 0308 jx4=nxtu(msk); % unsorted indices of rightmost post of pair 0309 vfgx=foutin(ix4+1)-foutin(jx4-1); % length of upper triangle to the left of the lower post 0310 yx=min(wright(ix4),vfgx); % integration length 0311 wx4=ffact(ix4).*ffact(jx4).*yx.^2.*(0.5*vfgx-yx/3)./(wright(ix4).*wleft(jx4)+(yx==0)); 0312 0313 % now create the matrix 0314 0315 iox=sort([ix1 ix2 ix3 ix4;jx1 jx2 jx3 jx4]); 0316 % [iox;[wx1 wx2 wx3 wx4]>0 ] 0317 msk=iox(2,:)<=(nfall+mfout)/2; 0318 iox(2,msk)=(nfall+mfout+1)-iox(2,msk); % convert negative frequencies to positive 0319 if highex 0320 iox(1,iox(1,:)==mfout-1)=mfout-2; % merge highest two output nodes 0321 end 0322 if lowex 0323 iox(1,iox(1,:)==2)=3; % merge lowest two output nodes 0324 end 0325 0326 x=sparse(iox(1,:)-1-lowex,max(iox(2,:)-nfall+nf+1,1),[wx1 wx2 wx3 wx4],p,nf); 0327 % 0328 % sort out the output argument options 0329 % 0330 if ~any(w=='H') 0331 cf=mb; % output Hz instead of mel/erb/... 0332 end 0333 cf=cf(2:p+1); % remove dummy end frequencies 0334 il=1; 0335 ih=nf; 0336 if nargout > 2 0337 msk=full(any(x>0,1)); 0338 il=find(msk,1); 0339 if ~numel(il) 0340 ih=1; 0341 elseif nargout >3 0342 ih=find(msk,1,'last'); 0343 end 0344 x=x(:,il:ih); 0345 end 0346 if any(w=='u') 0347 sx=sum(x,2); 0348 x=x./repmat(sx+(sx==0),1,size(x,2)); 0349 end 0350 % 0351 % plot results if no output arguments or g option given 0352 % 0353 if ~nargout || any(w=='g') || any(w=='G') % plot idealized filters 0354 if ~any(w=='g') && ~any(w=='G') 0355 w=[w 'G']; 0356 end 0357 newfig=0; 0358 if any(w=='g') 0359 plot(f1-df+(il:ih)*df,x'); 0360 title(['filtbankm: mode = ' w]); 0361 xlabel(['Frequency (' xticksi 'Hz)']); 0362 ylabel('Weight'); 0363 newfig=1; 0364 end 0365 0366 if any(w=='G') 0367 if newfig 0368 figure; 0369 end 0370 imagesc(f1-df+(il:ih)*df,1:p,x); 0371 axis 'xy' 0372 colorbar; 0373 cblabel('Weight'); 0374 switch wr 0375 case 'l' 0376 type='Log-spaced'; 0377 case 'e' 0378 type='Erb-spaced'; 0379 case 'b' 0380 type='Bark-spaced'; 0381 case 'm' 0382 type='Mel-spaced'; 0383 otherwise 0384 type='Linear-spaced'; 0385 end 0386 ylabel([type ' Filter']); 0387 xlabel(['Frequency (' xticksi 'Hz)']); 0388 title(['filtbankm: mode = ' w]); 0389 end 0390 0391 end