V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W) Inputs: p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))] n length of fft fs sample rate in Hz fl low end of the lowest filter as a fraction of fs [default = 0] fh high end of highest filter as a fraction of fs [default = 0.5] w any sensible combination of the following: 'b' = bark scale instead of mel 'e' = erb-rate scale 'l' = log10 Hz frequency scale 'f' = linear frequency scale 'c' = fl/fh specify centre of low and high filters 'h' = fl/fh are in Hz instead of fractions of fs 'H' = fl/fh are in mel/erb/bark/log10 't' = triangular shaped filters in mel/erb/bark domain (default) 'n' = hanning shaped filters in mel/erb/bark domain 'm' = hamming shaped filters in mel/erb/bark domain 'z' = highest and lowest filters taper down to zero [default] 'y' = lowest filter remains at 1 down to 0 frequency and highest filter remains at 1 up to nyquist freqency. See note (1) below. 'u' = scale filters to sum to unity 's' = single-sided: do not double filters to account for negative frequencies 'g' = plot idealized filters [default if no output arguments present] Outputs: x a sparse matrix containing the v_filterbank amplitudes If the mn and mx outputs are given then size(x)=[p,mx-mn+1] otherwise size(x)=[p,1+floor(n/2)] Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies. mc the v_filterbank centre frequencies in mel/erb/bark mn the lowest fft bin with a non-zero coefficient mx the highest fft bin with a non-zero coefficient NOTE: For legacy compatibility reasons, you must specify both or neither of mn and mx. Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved. (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain rather than in the linear frequency domain which is more common (e.g. [2]). (3) A mel-filterbank can also be created using v_filtbank() which uses triangular filters in the linear frequency domain and copes better with the narrow filters that arise when p is large on n is small. Examples of use: (a) Calcuate the Mel-frequency Cepstral Coefficients f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients x=v_melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted z=log(x*abs(f).^2); % multiply x by the power spectrum c=dct(z); % take the DCT (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently f=fft(s); % n is the fft length, p is the number of filters wanted [x,mc,na,nb]=v_melbankm(p,n,fs); % na:nb gives the fft bins that are needed z=log(x*(f(na:nb)).*conj(f(na:nb))); c=dct(z); % take the DCT (c) Plot the calculated filterbanks plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency (d) Plot the idealized filterbanks (without output sampling) v_melbankm(p,n,fs); References: [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937. [2] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.
0001 function [x,mc,mn,mx]=v_melbankm(p,n,fs,fl,fh,w) 0002 %V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W) 0003 % 0004 % Inputs: 0005 % p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))] 0006 % n length of fft 0007 % fs sample rate in Hz 0008 % fl low end of the lowest filter as a fraction of fs [default = 0] 0009 % fh high end of highest filter as a fraction of fs [default = 0.5] 0010 % w any sensible combination of the following: 0011 % 'b' = bark scale instead of mel 0012 % 'e' = erb-rate scale 0013 % 'l' = log10 Hz frequency scale 0014 % 'f' = linear frequency scale 0015 % 0016 % 'c' = fl/fh specify centre of low and high filters 0017 % 'h' = fl/fh are in Hz instead of fractions of fs 0018 % 'H' = fl/fh are in mel/erb/bark/log10 0019 % 0020 % 't' = triangular shaped filters in mel/erb/bark domain (default) 0021 % 'n' = hanning shaped filters in mel/erb/bark domain 0022 % 'm' = hamming shaped filters in mel/erb/bark domain 0023 % 0024 % 'z' = highest and lowest filters taper down to zero [default] 0025 % 'y' = lowest filter remains at 1 down to 0 frequency and highest filter 0026 % remains at 1 up to nyquist freqency. See note (1) below. 0027 % 0028 % 'u' = scale filters to sum to unity 0029 % 0030 % 's' = single-sided: do not double filters to account for negative frequencies 0031 % 0032 % 'g' = plot idealized filters [default if no output arguments present] 0033 % 0034 % Outputs: x a sparse matrix containing the v_filterbank amplitudes 0035 % If the mn and mx outputs are given then size(x)=[p,mx-mn+1] 0036 % otherwise size(x)=[p,1+floor(n/2)] 0037 % Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies. 0038 % mc the v_filterbank centre frequencies in mel/erb/bark 0039 % mn the lowest fft bin with a non-zero coefficient 0040 % mx the highest fft bin with a non-zero coefficient 0041 % NOTE: For legacy compatibility reasons, you must specify both or neither of mn and mx. 0042 % 0043 % Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved. 0044 % (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain 0045 % rather than in the linear frequency domain which is more common (e.g. [2]). 0046 % (3) A mel-filterbank can also be created using v_filtbank() which uses triangular 0047 % filters in the linear frequency domain and copes better with the narrow filters 0048 % that arise when p is large on n is small. 0049 % 0050 % Examples of use: 0051 % 0052 % (a) Calcuate the Mel-frequency Cepstral Coefficients 0053 % 0054 % f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients 0055 % x=v_melbankm(p,n,fs); % n is the fft length, p is the number of filters wanted 0056 % z=log(x*abs(f).^2); % multiply x by the power spectrum 0057 % c=dct(z); % take the DCT 0058 % 0059 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently 0060 % 0061 % f=fft(s); % n is the fft length, p is the number of filters wanted 0062 % [x,mc,na,nb]=v_melbankm(p,n,fs); % na:nb gives the fft bins that are needed 0063 % z=log(x*(f(na:nb)).*conj(f(na:nb))); 0064 % c=dct(z); % take the DCT 0065 % 0066 % (c) Plot the calculated filterbanks 0067 % 0068 % plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)') % fs=sample frequency 0069 % 0070 % (d) Plot the idealized filterbanks (without output sampling) 0071 % 0072 % v_melbankm(p,n,fs); 0073 % 0074 % References: 0075 % 0076 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement 0077 % of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937. 0078 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for 0079 % monosyllabic word recognition in continuously spoken sentences. 0080 % IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980. 0081 0082 0083 % Copyright (C) Mike Brookes 1997-2009 0084 % Version: $Id: v_melbankm.m 10865 2018-09-21 17:22:45Z dmb $ 0085 % 0086 % VOICEBOX is a MATLAB toolbox for speech processing. 0087 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0088 % 0089 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0090 % This program is free software; you can redistribute it and/or modify 0091 % it under the terms of the GNU General Public License as published by 0092 % the Free Software Foundation; either version 2 of the License, or 0093 % (at your option) any later version. 0094 % 0095 % This program is distributed in the hope that it will be useful, 0096 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0097 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0098 % GNU General Public License for more details. 0099 % 0100 % You can obtain a copy of the GNU General Public License from 0101 % http://www.gnu.org/copyleft/gpl.html or by writing to 0102 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0103 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0104 0105 % Note, in the comments, "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" assumes DC = bin 1 0106 0107 if nargin < 6 0108 w='tz'; % default options 0109 end 0110 if nargin < 5 || isempty(fh) 0111 fh=0.5; % max freq is the nyquist 0112 end 0113 if nargin < 4 || isempty(fl) 0114 fl=0; % min freq is DC 0115 end 0116 0117 sfact=2-any(w=='s'); % 1 if single sided else 2 0118 wr=' '; % default warping is mel 0119 for i=1:length(w) 0120 if any(w(i)=='lebf') 0121 wr=w(i); 0122 end 0123 end 0124 if any(w=='h') || any(w=='H') 0125 mflh=[fl fh]; 0126 else 0127 mflh=[fl fh]*fs; 0128 end 0129 if ~any(w=='H') 0130 switch wr 0131 case 'f' % no transformation 0132 case 'l' 0133 if fl<=0 0134 error('Low frequency limit must be >0 for ''l'' option'); 0135 end 0136 mflh=log10(mflh); % convert frequency limits into log10 Hz 0137 case 'e' 0138 mflh=v_frq2erb(mflh); % convert frequency limits into erb-rate 0139 case 'b' 0140 mflh=v_frq2bark(mflh); % convert frequency limits into bark 0141 otherwise 0142 mflh=v_frq2mel(mflh); % convert frequency limits into mel 0143 end 0144 end 0145 melrng=mflh*(-1:2:1)'; % mel range 0146 fn2=floor(n/2); % bin index of highest positive frequency (Nyquist if n is even) 0147 if isempty(p) 0148 p=ceil(4.6*log10(fs)); % default number of filters 0149 end 0150 if any(w=='c') % c option: specify filter centres not edges 0151 if p<1 0152 p=round(melrng/(p*1000))+1; 0153 end 0154 melinc=melrng/(p-1); 0155 mflh=mflh+(-1:2:1)*melinc; 0156 else 0157 if p<1 0158 p=round(melrng/(p*1000))-1; 0159 end 0160 melinc=melrng/(p+1); 0161 end 0162 % 0163 % Calculate the FFT bins corresponding to [filter#1-low filter#1-mid filter#p-mid filter#p-high] 0164 % 0165 switch wr 0166 case 'f' 0167 blim=(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0168 case 'l' 0169 blim=10.^(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0170 case 'e' 0171 blim=v_erb2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0172 case 'b' 0173 blim=v_bark2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0174 otherwise 0175 blim=v_mel2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs; 0176 end 0177 mc=mflh(1)+(1:p)*melinc; % mel centre frequencies 0178 b1=floor(blim(1))+1; % lowest FFT bin_0 required (might be negative) 0179 b4=min(fn2,ceil(blim(4))-1); % highest FFT bin_0 required 0180 % 0181 % now map all the useful FFT bins_0 to filter_1 centres 0182 % 0183 switch wr 0184 case 'f' 0185 pf=((b1:b4)*fs/n-mflh(1))/melinc; 0186 case 'l' 0187 pf=(log10((b1:b4)*fs/n)-mflh(1))/melinc; 0188 case 'e' 0189 pf=(v_frq2erb((b1:b4)*fs/n)-mflh(1))/melinc; 0190 case 'b' 0191 pf=(v_frq2bark((b1:b4)*fs/n)-mflh(1))/melinc; 0192 otherwise 0193 pf=(v_frq2mel((b1:b4)*fs/n)-mflh(1))/melinc; 0194 end 0195 % 0196 % remove any incorrect entries in pf due to rounding errors 0197 % 0198 if pf(1)<0 0199 pf(1)=[]; 0200 b1=b1+1; 0201 end 0202 if pf(end)>=p+1 0203 pf(end)=[]; 0204 b4=b4-1; 0205 end 0206 fp=floor(pf); % FFT bin_0 i contributes to filters_1 fp(1+i-b1)+[0 1] 0207 pm=pf-fp; % multiplier for upper filter 0208 k2=find(fp>0,1); % FFT bin_1 k2+b1 is the first to contribute to both upper and lower filters 0209 k3=find(fp<p,1,'last'); % FFT bin_1 k3+b1 is the last to contribute to both upper and lower filters 0210 k4=numel(fp); % FFT bin_1 k4+b1 is the last to contribute to any filters 0211 if isempty(k2) 0212 k2=k4+1; 0213 end 0214 if isempty(k3) 0215 k3=0; 0216 end 0217 if any(w=='y') % preserve power in FFT 0218 mn=1; % lowest fft bin required (1 = DC) 0219 mx=fn2+1; % highest fft bin required (1 = DC) 0220 r=[ones(1,k2+b1-1) 1+fp(k2:k3) fp(k2:k3) repmat(p,1,fn2-k3-b1+1)]; % filter number_1 0221 c=[1:k2+b1-1 k2+b1:k3+b1 k2+b1:k3+b1 k3+b1+1:fn2+1]; % FFT bin1 0222 v=[ones(1,k2+b1-1) pm(k2:k3) 1-pm(k2:k3) ones(1,fn2-k3-b1+1)]; 0223 else 0224 r=[1+fp(1:k3) fp(k2:k4)]; % filter number_1 0225 c=[1:k3 k2:k4]; % FFT bin_1 - b1 0226 v=[pm(1:k3) 1-pm(k2:k4)]; 0227 mn=b1+1; % lowest fft bin_1 0228 mx=b4+1; % highest fft bin_1 0229 end 0230 if b1<0 0231 c=abs(c+b1-1)-b1+1; % convert negative frequencies into positive 0232 end 0233 % end 0234 if any(w=='n') 0235 v=0.5-0.5*cos(v*pi); % convert triangles to Hanning 0236 elseif any(w=='m') 0237 v=0.5-0.46/1.08*cos(v*pi); % convert triangles to Hamming 0238 end 0239 if sfact==2 % double all except the DC and Nyquist (if any) terms 0240 msk=(c+mn>2) & (c+mn<n-fn2+2); % there is no Nyquist term if n is odd 0241 v(msk)=2*v(msk); 0242 end 0243 % 0244 % sort out the output argument options 0245 % 0246 if nargout > 2 0247 x=sparse(r,c,v); 0248 if nargout == 3 % if exactly three output arguments, then 0249 mc=mn; % delete mc output for legacy code compatibility 0250 mn=mx; 0251 end 0252 else 0253 x=sparse(r,c+mn-1,v,p,1+fn2); 0254 end 0255 if any(w=='u') 0256 sx=sum(x,2); 0257 x=x./repmat(sx+(sx==0),1,size(x,2)); 0258 end 0259 % 0260 % plot results if no output arguments or g option given 0261 % 0262 if ~nargout || any(w=='g') % plot idealized filters 0263 ng=201; % 201 points 0264 me=mflh(1)+(0:p+1)'*melinc; 0265 switch wr 0266 case 'f' 0267 fe=me; % defining frequencies 0268 xg=repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng); 0269 case 'l' 0270 fe=10.^me; % defining frequencies 0271 xg=10.^(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0272 case 'e' 0273 fe=v_erb2frq(me); % defining frequencies 0274 xg=v_erb2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0275 case 'b' 0276 fe=v_bark2frq(me); % defining frequencies 0277 xg=v_bark2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0278 otherwise 0279 fe=v_mel2frq(me); % defining frequencies 0280 xg=v_mel2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng)); 0281 end 0282 0283 v=1-abs(linspace(-1,1,ng)); 0284 if any(w=='n') 0285 v=0.5-0.5*cos(v*pi); % convert triangles to Hanning 0286 elseif any(w=='m') 0287 v=0.5-0.46/1.08*cos(v*pi); % convert triangles to Hamming 0288 end 0289 v=v*sfact; % multiply by 2 if double sided 0290 v=repmat(v,p,1); 0291 if any(w=='y') % extend first and last filters 0292 v(1,xg(1,:)<fe(2))=sfact; 0293 v(end,xg(end,:)>fe(p+1))=sfact; 0294 end 0295 if any(w=='u') % scale to unity sum 0296 dx=(xg(:,3:end)-xg(:,1:end-2))/2; 0297 dx=dx(:,[1 1:ng-2 ng-2]); 0298 vs=sum(v.*dx,2); 0299 v=v./repmat(vs+(vs==0),1,ng)*fs/n; 0300 end 0301 plot(xg',v','b'); 0302 set(gca,'xlim',[fe(1) fe(end)]); 0303 xlabel(['Frequency (' v_xticksi 'Hz)']); 0304 end