Home > voicebox > melbankm.m

melbankm

PURPOSE ^

MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

SYNOPSIS ^

function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)

DESCRIPTION ^

MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

 Inputs:
       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
        n   length of fft
        fs  sample rate in Hz
        fl  low end of the lowest filter as a fraction of fs [default = 0]
        fh  high end of highest filter as a fraction of fs [default = 0.5]
        w   any sensible combination of the following:
             'b' = bark scale instead of mel
             'e' = erb-rate scale
             'l' = log10 Hz frequency scale
             'f' = linear frequency scale

             'c' = fl/fh specify centre of low and high filters
             'h' = fl/fh are in Hz instead of fractions of fs
             'H' = fl/fh are in mel/erb/bark/log10

              't' = triangular shaped filters in mel/erb/bark domain (default)
              'n' = hanning shaped filters in mel/erb/bark domain
              'm' = hamming shaped filters in mel/erb/bark domain

              'z' = highest and lowest filters taper down to zero [default]
              'y' = lowest filter remains at 1 down to 0 frequency and
                    highest filter remains at 1 up to nyquist freqency

             'u' = scale filters to sum to unity

             's' = single-sided: do not double filters to account for negative frequencies

             'g' = plot idealized filters [default if no output arguments present]

 Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
 Some people instead define an asymmetric triangular filter in the frequency domain.

               If 'ty' or 'ny' is specified, the total power in the fft is preserved.

 Outputs:    x     a sparse matrix containing the filterbank amplitudes
                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
                 otherwise size(x)=[p,1+floor(n/2)]
                 Note that the peak filter values equal 2 to account for the power
                 in the negative FFT frequencies.
           mc    the filterbank centre frequencies in mel/erb/bark
            mn    the lowest fft bin with a non-zero coefficient
            mx    the highest fft bin with a non-zero coefficient
                 Note: you must specify both or neither of mn and mx.

 Examples of use:

 (a) Calcuate the Mel-frequency Cepstral Coefficients

       f=rfft(s);                    % rfft() returns only 1+floor(n/2) coefficients
        x=melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
        z=log(x*abs(f).^2);         % multiply x by the power spectrum
        c=dct(z);                   % take the DCT

 (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently

       f=fft(s);                        % n is the fft length, p is the number of filters wanted
       [x,mc,na,nb]=melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
       z=log(x*(f(na:nb)).*conj(f(na:nb)));

 (c) Plot the calculated filterbanks

      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency

 (d) Plot the idealized filterbanks (without output sampling)

      melbankm(p,n,fs);

 References:

 [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 18519, 1937.
 [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
     monosyllabic word recognition in continuously spoken sentences.
     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357366, Aug. 1980.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [x,mc,mn,mx]=melbankm(p,n,fs,fl,fh,w)
0002 %MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
0003 %
0004 % Inputs:
0005 %       p   number of filters in filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
0006 %        n   length of fft
0007 %        fs  sample rate in Hz
0008 %        fl  low end of the lowest filter as a fraction of fs [default = 0]
0009 %        fh  high end of highest filter as a fraction of fs [default = 0.5]
0010 %        w   any sensible combination of the following:
0011 %             'b' = bark scale instead of mel
0012 %             'e' = erb-rate scale
0013 %             'l' = log10 Hz frequency scale
0014 %             'f' = linear frequency scale
0015 %
0016 %             'c' = fl/fh specify centre of low and high filters
0017 %             'h' = fl/fh are in Hz instead of fractions of fs
0018 %             'H' = fl/fh are in mel/erb/bark/log10
0019 %
0020 %              't' = triangular shaped filters in mel/erb/bark domain (default)
0021 %              'n' = hanning shaped filters in mel/erb/bark domain
0022 %              'm' = hamming shaped filters in mel/erb/bark domain
0023 %
0024 %              'z' = highest and lowest filters taper down to zero [default]
0025 %              'y' = lowest filter remains at 1 down to 0 frequency and
0026 %                    highest filter remains at 1 up to nyquist freqency
0027 %
0028 %             'u' = scale filters to sum to unity
0029 %
0030 %             's' = single-sided: do not double filters to account for negative frequencies
0031 %
0032 %             'g' = plot idealized filters [default if no output arguments present]
0033 %
0034 % Note that the filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain.
0035 % Some people instead define an asymmetric triangular filter in the frequency domain.
0036 %
0037 %               If 'ty' or 'ny' is specified, the total power in the fft is preserved.
0038 %
0039 % Outputs:    x     a sparse matrix containing the filterbank amplitudes
0040 %                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
0041 %                 otherwise size(x)=[p,1+floor(n/2)]
0042 %                 Note that the peak filter values equal 2 to account for the power
0043 %                 in the negative FFT frequencies.
0044 %           mc    the filterbank centre frequencies in mel/erb/bark
0045 %            mn    the lowest fft bin with a non-zero coefficient
0046 %            mx    the highest fft bin with a non-zero coefficient
0047 %                 Note: you must specify both or neither of mn and mx.
0048 %
0049 % Examples of use:
0050 %
0051 % (a) Calcuate the Mel-frequency Cepstral Coefficients
0052 %
0053 %       f=rfft(s);                    % rfft() returns only 1+floor(n/2) coefficients
0054 %        x=melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
0055 %        z=log(x*abs(f).^2);         % multiply x by the power spectrum
0056 %        c=dct(z);                   % take the DCT
0057 %
0058 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
0059 %
0060 %       f=fft(s);                        % n is the fft length, p is the number of filters wanted
0061 %       [x,mc,na,nb]=melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
0062 %       z=log(x*(f(na:nb)).*conj(f(na:nb)));
0063 %
0064 % (c) Plot the calculated filterbanks
0065 %
0066 %      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
0067 %
0068 % (d) Plot the idealized filterbanks (without output sampling)
0069 %
0070 %      melbankm(p,n,fs);
0071 %
0072 % References:
0073 %
0074 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
0075 %     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 18519, 1937.
0076 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
0077 %     monosyllabic word recognition in continuously spoken sentences.
0078 %     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357366, Aug. 1980.
0079 
0080 
0081 %      Copyright (C) Mike Brookes 1997-2009
0082 %      Version: $Id: melbankm.m 9519 2017-02-23 07:52:51Z dmb $
0083 %
0084 %   VOICEBOX is a MATLAB toolbox for speech processing.
0085 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0086 %
0087 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0088 %   This program is free software; you can redistribute it and/or modify
0089 %   it under the terms of the GNU General Public License as published by
0090 %   the Free Software Foundation; either version 2 of the License, or
0091 %   (at your option) any later version.
0092 %
0093 %   This program is distributed in the hope that it will be useful,
0094 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0095 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0096 %   GNU General Public License for more details.
0097 %
0098 %   You can obtain a copy of the GNU General Public License from
0099 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0100 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0101 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0102 
0103 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1
0104 
0105 if nargin < 6
0106     w='tz'; % default options
0107 end
0108 if nargin < 5 || isempty(fh)
0109     fh=0.5; % max freq is the nyquist
0110 end
0111 if nargin < 4 || isempty(fl)
0112     fl=0; % min freq is DC
0113 end
0114 
0115 sfact=2-any(w=='s');   % 1 if single sided else 2
0116 wr=' ';   % default warping is mel
0117 for i=1:length(w)
0118     if any(w(i)=='lebf');
0119         wr=w(i);
0120     end
0121 end
0122 if any(w=='h') || any(w=='H')
0123     mflh=[fl fh];
0124 else
0125     mflh=[fl fh]*fs;
0126 end
0127 if ~any(w=='H')
0128     switch wr
0129                     case 'f'       % no transformation
0130         case 'l'
0131             if fl<=0
0132                 error('Low frequency limit must be >0 for l option');
0133             end
0134             mflh=log10(mflh);       % convert frequency limits into log10 Hz
0135         case 'e'
0136             mflh=frq2erb(mflh);       % convert frequency limits into erb-rate
0137         case 'b'
0138             mflh=frq2bark(mflh);       % convert frequency limits into bark
0139         otherwise
0140             mflh=frq2mel(mflh);       % convert frequency limits into mel
0141     end
0142 end
0143 melrng=mflh*(-1:2:1)';          % mel range
0144 fn2=floor(n/2);     % bin index of highest positive frequency (Nyquist if n is even)
0145 if isempty(p)
0146     p=ceil(4.6*log10(fs));         % default number of filters
0147 end
0148 if any(w=='c')              % c option: specify fiter centres not edges
0149 if p<1
0150     p=round(melrng/(p*1000))+1;
0151 end
0152 melinc=melrng/(p-1);
0153 mflh=mflh+(-1:2:1)*melinc;
0154 else
0155     if p<1
0156     p=round(melrng/(p*1000))-1;
0157 end
0158 melinc=melrng/(p+1);
0159 end
0160 
0161 %
0162 % Calculate the FFT bins corresponding to [filter#1-low filter#1-mid filter#p-mid filter#p-high]
0163 %
0164 switch wr
0165         case 'f'
0166         blim=(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0167     case 'l'
0168         blim=10.^(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0169     case 'e'
0170         blim=erb2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0171     case 'b'
0172         blim=bark2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0173     otherwise
0174         blim=mel2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0175 end
0176 mc=mflh(1)+(1:p)*melinc;    % mel centre frequencies
0177 b1=floor(blim(1))+1;            % lowest FFT bin_0 required might be negative)
0178 b4=min(fn2,ceil(blim(4))-1);    % highest FFT bin_0 required
0179 %
0180 % now map all the useful FFT bins_0 to filter1 centres
0181 %
0182 switch wr
0183         case 'f'
0184         pf=((b1:b4)*fs/n-mflh(1))/melinc;
0185     case 'l'
0186         pf=(log10((b1:b4)*fs/n)-mflh(1))/melinc;
0187     case 'e'
0188         pf=(frq2erb((b1:b4)*fs/n)-mflh(1))/melinc;
0189     case 'b'
0190         pf=(frq2bark((b1:b4)*fs/n)-mflh(1))/melinc;
0191     otherwise
0192         pf=(frq2mel((b1:b4)*fs/n)-mflh(1))/melinc;
0193 end
0194 %
0195 %  remove any incorrect entries in pf due to rounding errors
0196 %
0197 if pf(1)<0
0198     pf(1)=[];
0199     b1=b1+1;
0200 end
0201 if pf(end)>=p+1
0202     pf(end)=[];
0203     b4=b4-1;
0204 end
0205 fp=floor(pf);                  % FFT bin_0 i contributes to filters_1 fp(1+i-b1)+[0 1]
0206 pm=pf-fp;                       % multiplier for upper filter
0207 k2=find(fp>0,1);   % FFT bin_1 k2+b1 is the first to contribute to both upper and lower filters
0208 k3=find(fp<p,1,'last');  % FFT bin_1 k3+b1 is the last to contribute to both upper and lower filters
0209 k4=numel(fp); % FFT bin_1 k4+b1 is the last to contribute to any filters
0210 if isempty(k2)
0211     k2=k4+1;
0212 end
0213 if isempty(k3)
0214     k3=0;
0215 end
0216 if any(w=='y')          % preserve power in FFT
0217     mn=1; % lowest fft bin required (1 = DC)
0218     mx=fn2+1; % highest fft bin required (1 = DC)
0219     r=[ones(1,k2+b1-1) 1+fp(k2:k3) fp(k2:k3) repmat(p,1,fn2-k3-b1+1)]; % filter number_1
0220     c=[1:k2+b1-1 k2+b1:k3+b1 k2+b1:k3+b1 k3+b1+1:fn2+1]; % FFT bin1
0221     v=[ones(1,k2+b1-1) pm(k2:k3) 1-pm(k2:k3) ones(1,fn2-k3-b1+1)];
0222 else
0223     r=[1+fp(1:k3) fp(k2:k4)]; % filter number_1
0224     c=[1:k3 k2:k4]; % FFT bin_1 - b1
0225     v=[pm(1:k3) 1-pm(k2:k4)];
0226     mn=b1+1; % lowest fft bin_1
0227     mx=b4+1;  % highest fft bin_1
0228 end
0229 if b1<0
0230     c=abs(c+b1-1)-b1+1;     % convert negative frequencies into positive
0231 end
0232 % end
0233 if any(w=='n')
0234     v=0.5-0.5*cos(v*pi);      % convert triangles to Hanning
0235 elseif any(w=='m')
0236     v=0.5-0.46/1.08*cos(v*pi);  % convert triangles to Hamming
0237 end
0238 if sfact==2  % double all except the DC and Nyquist (if any) terms
0239     msk=(c+mn>2) & (c+mn<n-fn2+2);  % there is no Nyquist term if n is odd
0240     v(msk)=2*v(msk);
0241 end
0242 %
0243 % sort out the output argument options
0244 %
0245 if nargout > 2
0246     x=sparse(r,c,v);
0247     if nargout == 3     % if exactly three output arguments, then
0248         mc=mn;          % delete mc output for legacy code compatibility
0249         mn=mx;
0250     end
0251 else
0252     x=sparse(r,c+mn-1,v,p,1+fn2);
0253 end
0254 if any(w=='u')
0255     sx=sum(x,2);
0256     x=x./repmat(sx+(sx==0),1,size(x,2));
0257 end
0258 %
0259 % plot results if no output arguments or g option given
0260 %
0261 if ~nargout || any(w=='g') % plot idealized filters
0262     ng=201;     % 201 points
0263     me=mflh(1)+(0:p+1)'*melinc;
0264     switch wr
0265                 case 'f'
0266             fe=me; % defining frequencies
0267             xg=repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng);
0268         case 'l'
0269             fe=10.^me; % defining frequencies
0270             xg=10.^(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0271         case 'e'
0272             fe=erb2frq(me); % defining frequencies
0273             xg=erb2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0274         case 'b'
0275             fe=bark2frq(me); % defining frequencies
0276             xg=bark2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0277         otherwise
0278             fe=mel2frq(me); % defining frequencies
0279             xg=mel2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0280     end
0281 
0282     v=1-abs(linspace(-1,1,ng));
0283     if any(w=='n')
0284         v=0.5-0.5*cos(v*pi);      % convert triangles to Hanning
0285     elseif any(w=='m')
0286         v=0.5-0.46/1.08*cos(v*pi);  % convert triangles to Hamming
0287     end
0288     v=v*sfact;  % multiply by 2 if double sided
0289     v=repmat(v,p,1);
0290     if any(w=='y')  % extend first and last filters
0291         v(1,xg(1,:)<fe(2))=sfact;
0292         v(end,xg(end,:)>fe(p+1))=sfact;
0293     end
0294     if any(w=='u') % scale to unity sum
0295         dx=(xg(:,3:end)-xg(:,1:end-2))/2;
0296         dx=dx(:,[1 1:ng-2 ng-2]);
0297         vs=sum(v.*dx,2);
0298         v=v./repmat(vs+(vs==0),1,ng)*fs/n;
0299     end
0300     plot(xg',v','b');
0301     set(gca,'xlim',[fe(1) fe(end)]);
0302     xlabel(['Frequency (' xticksi 'Hz)']);
0303 end

Generated on Tue 10-Oct-2017 08:30:10 by m2html © 2003