V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W) Inputs: p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))] n length of fft fs sample rate in Hz fl low end of the lowest filter as a fraction of fs [default = 0] fh high end of highest filter as a fraction of fs [default = 0.5] w any sensible combination of the following: 'b' = bark scale instead of mel 'e' = erb-rate scale 'l' = log10 Hz frequency scale 'f' = linear frequency scale 'c' = fl/fh specify centre of low and high filters 'h' = fl/fh are in Hz instead of fractions of fs 'H' = fl/fh are in mel/erb/bark/log10 't' = triangular shaped filters in mel/erb/bark domain (default) 'n' = hanning shaped filters in mel/erb/bark domain 'm' = hamming shaped filters in mel/erb/bark domain 'z' = highest and lowest filters taper down to zero [default] 'y' = lowest filter remains at 1 down to 0 frequency and highest filter remains at 1 up to nyquist freqency. See note (1) below. 'u' = scale filters to sum to unity 's' = single-sided: do not double filters to account for negative frequencies 'g' = plot idealized filters [default if no output arguments present] Outputs: x a sparse matrix containing the v_filterbank amplitudes If the mn and mx outputs are given then size(x)=[p,mx-mn+1] otherwise size(x)=[p,1+floor(n/2)] Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies. mc the v_filterbank centre frequencies in mel/erb/bark mn the lowest fft bin with a non-zero coefficient mx the highest fft bin with a non-zero coefficient Note: you must specify both or neither of mn and mx. Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved. (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain rather than in the linear frequency domain which is more common (e.g. [2]). (3) A mel-filterbank can also be created using v_filtbank() which uses triangular filters in the linear frequency domain and copes better with the narrow filters that arise when p is large on n is small. Examples of use: (a) Calcuate the Mel-frequency Cepstral Coefficients f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients x=v_melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted z=log(x*abs(f).^2); % multiply x by the power spectrum c=dct(z); % take the DCT (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently f=fft(s); % n is the fft length, p is the number of filters wanted [x,mc,na,nb]=v_melbankm(p,n,fs); % na:nb gives the fft bins that are needed z=log(x*(f(na:nb)).*conj(f(na:nb))); (c) Plot the calculated filterbanks plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency (d) Plot the idealized filterbanks (without output sampling) v_melbankm(p,n,fs); References: [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937. [2] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.
0001 function [x,mc,mn,mx]=v_melbankm(p,n,fs,fl,fh,w) 0002 %V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W) 0003 % 0004 % Inputs: 0005 % p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))] 0006 % n length of fft 0007 % fs sample rate in Hz 0008 % fl low end of the lowest filter as a fraction of fs [default = 0] 0009 % fh high end of highest filter as a fraction of fs [default = 0.5] 0010 % w any sensible combination of the following: 0011 % 'b' = bark scale instead of mel 0012 % 'e' = erb-rate scale 0013 % 'l' = log10 Hz frequency scale 0014 % 'f' = linear frequency scale 0015 % 0016 % 'c' = fl/fh specify centre of low and high filters 0017 % 'h' = fl/fh are in Hz instead of fractions of fs 0018 % 'H' = fl/fh are in mel/erb/bark/log10 0019 % 0020 % 't' = triangular shaped filters in mel/erb/bark domain (default) 0021 % 'n' = hanning shaped filters in mel/erb/bark domain 0022 % 'm' = hamming shaped filters in mel/erb/bark domain 0023 % 0024 % 'z' = highest and lowest filters taper down to zero [default] 0025 % 'y' = lowest filter remains at 1 down to 0 frequency and highest filter 0026 % remains at 1 up to nyquist freqency. See note (1) below. 0027 % 0028 % 'u' = scale filters to sum to unity 0029 % 0030 % 's' = single-sided: do not double filters to account for negative frequencies 0031 % 0032 % 'g' = plot idealized filters [default if no output arguments present] 0033 % 0034 % Outputs: x a sparse matrix containing the v_filterbank amplitudes 0035 % If the mn and mx outputs are given then size(x)=[p,mx-mn+1] 0036 % otherwise size(x)=[p,1+floor(n/2)] 0037 % Note that the peak filter values equal 2 to account for the power 0038 % in the negative FFT frequencies. 0039 % mc the v_filterbank centre frequencies in mel/erb/bark 0040 % mn the lowest fft bin with a non-zero coefficient 0041 % mx the highest fft bin with a non-zero coefficient 0042 % Note: you must specify both or neither of mn and mx. 0043 % 0044 % Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved. 0045 % (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain 0046 % rather than in the linear frequency domain which is more common (e.g. [2]). 0047 % (3) A mel-filterbank can also be created using v_filtbank() which uses triangular 0048 % filters in the linear frequency domain and copes better with the narrow filters 0049 % that arise when p is large on n is small. 0050 % 0051 % Examples of use: 0052 % 0053 % (a) Calcuate the Mel-frequency Cepstral Coefficients 0054 % 0055 % f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients 0056 % x=v_melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted 0057 % z=log(x*abs(f).^2); % multiply x by the power spectrum 0058 % c=dct(z); % take the DCT 0059 % 0060 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently 0061 % 0062 % f=fft(s); % n is the fft length, p is the number of filters wanted 0063 % [x,mc,na,nb]=v_melbankm(p,n,fs); % na:nb gives the fft bins that are needed 0064 % z=log(x*(f(na:nb)).*conj(f(na:nb))); 0065 % 0066 % (c) Plot the calculated filterbanks 0067 % 0068 % plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency 0069 % 0070 % (d) Plot the idealized filterbanks (without output sampling) 0071 % 0072 % v_melbankm(p,n,fs); 0073 % 0074 % References: 0075 % 0076 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement 0077 % of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937. 0078 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for 0079 % monosyllabic word recognition in continuously spoken sentences. 0080 % IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980. 0081 0082 0083 % Copyright (C) Mike Brookes 1997-2009 0084 % Version: $Id: v_melbankm.m 10865 2018-09-21 17:22:45Z dmb $ 0085 % 0086 % VOICEBOX is a MATLAB toolbox for speech processing. 0087 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0088 % 0089 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0090 % This program is free software; you can redistribute it and/or modify 0091 % it under the terms of the GNU General Public License as published by 0092 % the Free Software Foundation; either version 2 of the License, or 0093 % (at your option) any later version. 0094 % 0095 % This program is distributed in the hope that it will be useful, 0096 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0097 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0098 % GNU General Public License for more details. 0099 % 0100 % You can obtain a copy of the GNU General Public License from 0101 % http://www.gnu.org/copyleft/gpl.html or by writing to 0102 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0103 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0104 0105 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1 0106 0107 if nargin < 6 0108 w='tz'; % default options 0109 end 0110 if nargin < 5 || isempty(fh) 0111 fh=0.5; % max freq is the nyquist 0112 end 0113 if nargin < 4 || isempty(fl) 0114 fl=0; % min freq is DC 0115 end 0116 0117 sfact=2-any(w=='s'); % 1 if single sided else 2 0118 wr=' '; % default warping is mel 0119 for i=1:length(w) 0120 if any(w(i)=='lebf'); 0121 wr=w(i); 0122 end 0123 end 0124 if any(w=='h') || any(w=='H') 0125 mflh=[fl fh]; 0126 else 0127 mflh=[fl fh]*fs; 0128 end 0129 if ~any(w=='H') 0130 switch wr 0131 case 'f' % no transformation 0132 case 'l' 0133 if fl<=0 0134 error('Low frequency limit must be >0 for l option'); 0135 end 0136 mflh=log10(mflh); % convert frequency limits into log10 Hz 0137 case 'e' 0138 mflh=v_frq2erb(mflh); % convert frequency limits into erb-rate 0139 case 'b' 0140 mflh=v_frq2bark(mflh); % convert frequency limits into bark 0141 otherwise 0142 mflh=v_frq2mel(mflh); % convert frequency limits into mel 0143 end 0144 end 0145 melrng=mflh*(-1:2:1)'; % mel range 0146 fn2=floor(n/2); % bin index of highest positive frequency (Nyquist if n is even) 0147 if isempty(p) 0148 p=ceil(4.6*log10(fs)); % default number of filters 0149 end 0150 if any(w=='c') % c option: specify fiter centres not edges 0151 if p<1 0152 p=round(melrng/(p*1000))+1; 0153 end 0154 melinc=melrng/(p-1); 0155 mflh=mflh+(-1:2:1)*melinc; 0156 else 0157 if p<1 0158 p=round(melrng/(p*1000))-1; 0159 end 0160 melinc=melrng/(p+1); 0161 end 0162 0163 % 0164 % Calculate the FFT bins corresponding to [filter#1-low filter#1-mid filter#p-mid filter#p-high] 0165 % 0166 switch wr 0167 case 'f' 0168 blim=(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0169 case 'l' 0170 blim=10.^(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0171 case 'e' 0172 blim=v_erb2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0173 case 'b' 0174 blim=v_bark2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0175 otherwise 0176 blim=v_mel2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0177 end 0178 mc=mflh(1)+(1:p)*melinc; % mel centre frequencies 0179 b1=floor(blim(1))+1; % lowest FFT bin_0 required might be negative) 0180 b4=min(fn2,ceil(blim(4))-1); % highest FFT bin_0 required 0181 % 0182 % now map all the useful FFT bins_0 to filter1 centres 0183 % 0184 switch wr 0185 case 'f' 0186 pf=((b1:b4)*fs/n-mflh(1))/melinc; 0187 case 'l' 0188 pf=(log10((b1:b4)*fs/n)-mflh(1))/melinc; 0189 case 'e' 0190 pf=(v_frq2erb((b1:b4)*fs/n)-mflh(1))/melinc; 0191 case 'b' 0192 pf=(v_frq2bark((b1:b4)*fs/n)-mflh(1))/melinc; 0193 otherwise 0194 pf=(v_frq2mel((b1:b4)*fs/n)-mflh(1))/melinc; 0195 end 0196 % 0197 % remove any incorrect entries in pf due to rounding errors 0198 % 0199 if pf(1)<0 0200 pf(1)=[]; 0201 b1=b1+1; 0202 end 0203 if pf(end)>=p+1 0204 pf(end)=[]; 0205 b4=b4-1; 0206 end 0207 fp=floor(pf); % FFT bin_0 i contributes to filters_1 fp(1+i-b1)+[0 1] 0208 pm=pf-fp; % multiplier for upper filter 0209 k2=find(fp>0,1); % FFT bin_1 k2+b1 is the first to contribute to both upper and lower filters 0210 k3=find(fp<p,1,'last'); % FFT bin_1 k3+b1 is the last to contribute to both upper and lower filters 0211 k4=numel(fp); % FFT bin_1 k4+b1 is the last to contribute to any filters 0212 if isempty(k2) 0213 k2=k4+1; 0214 end 0215 if isempty(k3) 0216 k3=0; 0217 end 0218 if any(w=='y') % preserve power in FFT 0219 mn=1; % lowest fft bin required (1 = DC) 0220 mx=fn2+1; % highest fft bin required (1 = DC) 0221 r=[ones(1,k2+b1-1) 1+fp(k2:k3) fp(k2:k3) repmat(p,1,fn2-k3-b1+1)]; % filter number_1 0222 c=[1:k2+b1-1 k2+b1:k3+b1 k2+b1:k3+b1 k3+b1+1:fn2+1]; % FFT bin1 0223 v=[ones(1,k2+b1-1) pm(k2:k3) 1-pm(k2:k3) ones(1,fn2-k3-b1+1)]; 0224 else 0225 r=[1+fp(1:k3) fp(k2:k4)]; % filter number_1 0226 c=[1:k3 k2:k4]; % FFT bin_1 - b1 0227 v=[pm(1:k3) 1-pm(k2:k4)]; 0228 mn=b1+1; % lowest fft bin_1 0229 mx=b4+1; % highest fft bin_1 0230 end 0231 if b1<0 0232 c=abs(c+b1-1)-b1+1; % convert negative frequencies into positive 0233 end 0234 % end 0235 if any(w=='n') 0236 v=0.5-0.5*cos(v*pi); % convert triangles to Hanning 0237 elseif any(w=='m') 0238 v=0.5-0.46/1.08*cos(v*pi); % convert triangles to Hamming 0239 end 0240 if sfact==2 % double all except the DC and Nyquist (if any) terms 0241 msk=(c+mn>2) & (c+mn<n-fn2+2); % there is no Nyquist term if n is odd 0242 v(msk)=2*v(msk); 0243 end 0244 % 0245 % sort out the output argument options 0246 % 0247 if nargout > 2 0248 x=sparse(r,c,v); 0249 if nargout == 3 % if exactly three output arguments, then 0250 mc=mn; % delete mc output for legacy code compatibility 0251 mn=mx; 0252 end 0253 else 0254 x=sparse(r,c+mn-1,v,p,1+fn2); 0255 end 0256 if any(w=='u') 0257 sx=sum(x,2); 0258 x=x./repmat(sx+(sx==0),1,size(x,2)); 0259 end 0260 % 0261 % plot results if no output arguments or g option given 0262 % 0263 if ~nargout || any(w=='g') % plot idealized filters 0264 ng=201; % 201 points 0265 me=mflh(1)+(0:p+1)'*melinc; 0266 switch wr 0267 case 'f' 0268 fe=me; % defining frequencies 0269 xg=repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng); 0270 case 'l' 0271 fe=10.^me; % defining frequencies 0272 xg=10.^(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0273 case 'e' 0274 fe=v_erb2frq(me); % defining frequencies 0275 xg=v_erb2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0276 case 'b' 0277 fe=v_bark2frq(me); % defining frequencies 0278 xg=v_bark2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0279 otherwise 0280 fe=v_mel2frq(me); % defining frequencies 0281 xg=v_mel2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0282 end 0283 0284 v=1-abs(linspace(-1,1,ng)); 0285 if any(w=='n') 0286 v=0.5-0.5*cos(v*pi); % convert triangles to Hanning 0287 elseif any(w=='m') 0288 v=0.5-0.46/1.08*cos(v*pi); % convert triangles to Hamming 0289 end 0290 v=v*sfact; % multiply by 2 if double sided 0291 v=repmat(v,p,1); 0292 if any(w=='y') % extend first and last filters 0293 v(1,xg(1,:)<fe(2))=sfact; 0294 v(end,xg(end,:)>fe(p+1))=sfact; 0295 end 0296 if any(w=='u') % scale to unity sum 0297 dx=(xg(:,3:end)-xg(:,1:end-2))/2; 0298 dx=dx(:,[1 1:ng-2 ng-2]); 0299 vs=sum(v.*dx,2); 0300 v=v./repmat(vs+(vs==0),1,ng)*fs/n; 0301 end 0302 plot(xg',v','b'); 0303 set(gca,'xlim',[fe(1) fe(end)]); 0304 xlabel(['Frequency (' v_xticksi 'Hz)']); 0305 end