v_melbankm

PURPOSE ^

V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

SYNOPSIS ^

function [x,mc,mn,mx]=v_melbankm(p,n,fs,fl,fh,w)

DESCRIPTION ^

V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

 Inputs:
       p   number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
        n   length of fft
        fs  sample rate in Hz
        fl  low end of the lowest filter as a fraction of fs [default = 0]
        fh  high end of highest filter as a fraction of fs [default = 0.5]
        w   any sensible combination of the following:
             'b' = bark scale instead of mel
             'e' = erb-rate scale
             'l' = log10 Hz frequency scale
             'f' = linear frequency scale

             'c' = fl/fh specify centre of low and high filters
             'h' = fl/fh are in Hz instead of fractions of fs
             'H' = fl/fh are in mel/erb/bark/log10

              't' = triangular shaped filters in mel/erb/bark domain (default)
              'n' = hanning shaped filters in mel/erb/bark domain
              'm' = hamming shaped filters in mel/erb/bark domain

              'z' = highest and lowest filters taper down to zero [default]
              'y' = lowest filter remains at 1 down to 0 frequency and highest filter
                   remains at 1 up to nyquist freqency. See note (1) below.

             'u' = scale filters to sum to unity

             's' = single-sided: do not double filters to account for negative frequencies

             'g' = plot idealized filters [default if no output arguments present]

 Outputs:    x     a sparse matrix containing the v_filterbank amplitudes
                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
                 otherwise size(x)=[p,1+floor(n/2)]
                 Note that the peak filter values equal 2 to account for the power
                 in the negative FFT frequencies.
           mc    the v_filterbank centre frequencies in mel/erb/bark
            mn    the lowest fft bin with a non-zero coefficient
            mx    the highest fft bin with a non-zero coefficient
                 Note: you must specify both or neither of mn and mx.

 Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved.
        (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain
            rather than in the linear frequency domain which is more common (e.g. [2]).
        (3) A mel-filterbank can also be created using v_filtbank() which uses triangular
            filters in the linear frequency domain and copes better with the narrow filters
            that arise when p is large on n is small. 

 Examples of use:

 (a) Calcuate the Mel-frequency Cepstral Coefficients

       f=v_rfft(s);                    % v_rfft() returns only 1+floor(n/2) coefficients
        x=v_melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
        z=log(x*abs(f).^2);         % multiply x by the power spectrum
        c=dct(z);                   % take the DCT

 (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently

       f=fft(s);                        % n is the fft length, p is the number of filters wanted
       [x,mc,na,nb]=v_melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
       z=log(x*(f(na:nb)).*conj(f(na:nb)));

 (c) Plot the calculated filterbanks

      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency

 (d) Plot the idealized filterbanks (without output sampling)

      v_melbankm(p,n,fs);

 References:

 [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937.
 [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
     monosyllabic word recognition in continuously spoken sentences.
     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [x,mc,mn,mx]=v_melbankm(p,n,fs,fl,fh,w)
0002 %V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
0003 %
0004 % Inputs:
0005 %       p   number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
0006 %        n   length of fft
0007 %        fs  sample rate in Hz
0008 %        fl  low end of the lowest filter as a fraction of fs [default = 0]
0009 %        fh  high end of highest filter as a fraction of fs [default = 0.5]
0010 %        w   any sensible combination of the following:
0011 %             'b' = bark scale instead of mel
0012 %             'e' = erb-rate scale
0013 %             'l' = log10 Hz frequency scale
0014 %             'f' = linear frequency scale
0015 %
0016 %             'c' = fl/fh specify centre of low and high filters
0017 %             'h' = fl/fh are in Hz instead of fractions of fs
0018 %             'H' = fl/fh are in mel/erb/bark/log10
0019 %
0020 %              't' = triangular shaped filters in mel/erb/bark domain (default)
0021 %              'n' = hanning shaped filters in mel/erb/bark domain
0022 %              'm' = hamming shaped filters in mel/erb/bark domain
0023 %
0024 %              'z' = highest and lowest filters taper down to zero [default]
0025 %              'y' = lowest filter remains at 1 down to 0 frequency and highest filter
0026 %                   remains at 1 up to nyquist freqency. See note (1) below.
0027 %
0028 %             'u' = scale filters to sum to unity
0029 %
0030 %             's' = single-sided: do not double filters to account for negative frequencies
0031 %
0032 %             'g' = plot idealized filters [default if no output arguments present]
0033 %
0034 % Outputs:    x     a sparse matrix containing the v_filterbank amplitudes
0035 %                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
0036 %                 otherwise size(x)=[p,1+floor(n/2)]
0037 %                 Note that the peak filter values equal 2 to account for the power
0038 %                 in the negative FFT frequencies.
0039 %           mc    the v_filterbank centre frequencies in mel/erb/bark
0040 %            mn    the lowest fft bin with a non-zero coefficient
0041 %            mx    the highest fft bin with a non-zero coefficient
0042 %                 Note: you must specify both or neither of mn and mx.
0043 %
0044 % Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved.
0045 %        (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain
0046 %            rather than in the linear frequency domain which is more common (e.g. [2]).
0047 %        (3) A mel-filterbank can also be created using v_filtbank() which uses triangular
0048 %            filters in the linear frequency domain and copes better with the narrow filters
0049 %            that arise when p is large on n is small.
0050 %
0051 % Examples of use:
0052 %
0053 % (a) Calcuate the Mel-frequency Cepstral Coefficients
0054 %
0055 %       f=v_rfft(s);                    % v_rfft() returns only 1+floor(n/2) coefficients
0056 %        x=v_melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
0057 %        z=log(x*abs(f).^2);         % multiply x by the power spectrum
0058 %        c=dct(z);                   % take the DCT
0059 %
0060 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
0061 %
0062 %       f=fft(s);                        % n is the fft length, p is the number of filters wanted
0063 %       [x,mc,na,nb]=v_melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
0064 %       z=log(x*(f(na:nb)).*conj(f(na:nb)));
0065 %
0066 % (c) Plot the calculated filterbanks
0067 %
0068 %      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
0069 %
0070 % (d) Plot the idealized filterbanks (without output sampling)
0071 %
0072 %      v_melbankm(p,n,fs);
0073 %
0074 % References:
0075 %
0076 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
0077 %     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937.
0078 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
0079 %     monosyllabic word recognition in continuously spoken sentences.
0080 %     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.
0081 
0082 
0083 %      Copyright (C) Mike Brookes 1997-2009
0084 %      Version: $Id: v_melbankm.m 10865 2018-09-21 17:22:45Z dmb $
0085 %
0086 %   VOICEBOX is a MATLAB toolbox for speech processing.
0087 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0088 %
0089 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0090 %   This program is free software; you can redistribute it and/or modify
0091 %   it under the terms of the GNU General Public License as published by
0092 %   the Free Software Foundation; either version 2 of the License, or
0093 %   (at your option) any later version.
0094 %
0095 %   This program is distributed in the hope that it will be useful,
0096 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0097 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0098 %   GNU General Public License for more details.
0099 %
0100 %   You can obtain a copy of the GNU General Public License from
0101 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0102 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0103 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0104 
0105 % Note "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1
0106 
0107 if nargin < 6
0108     w='tz'; % default options
0109 end
0110 if nargin < 5 || isempty(fh)
0111     fh=0.5; % max freq is the nyquist
0112 end
0113 if nargin < 4 || isempty(fl)
0114     fl=0; % min freq is DC
0115 end
0116 
0117 sfact=2-any(w=='s');   % 1 if single sided else 2
0118 wr=' ';   % default warping is mel
0119 for i=1:length(w)
0120     if any(w(i)=='lebf');
0121         wr=w(i);
0122     end
0123 end
0124 if any(w=='h') || any(w=='H')
0125     mflh=[fl fh];
0126 else
0127     mflh=[fl fh]*fs;
0128 end
0129 if ~any(w=='H')
0130     switch wr
0131                     case 'f'       % no transformation
0132         case 'l'
0133             if fl<=0
0134                 error('Low frequency limit must be >0 for l option');
0135             end
0136             mflh=log10(mflh);       % convert frequency limits into log10 Hz
0137         case 'e'
0138             mflh=v_frq2erb(mflh);       % convert frequency limits into erb-rate
0139         case 'b'
0140             mflh=v_frq2bark(mflh);       % convert frequency limits into bark
0141         otherwise
0142             mflh=v_frq2mel(mflh);       % convert frequency limits into mel
0143     end
0144 end
0145 melrng=mflh*(-1:2:1)';          % mel range
0146 fn2=floor(n/2);     % bin index of highest positive frequency (Nyquist if n is even)
0147 if isempty(p)
0148     p=ceil(4.6*log10(fs));         % default number of filters
0149 end
0150 if any(w=='c')              % c option: specify fiter centres not edges
0151 if p<1
0152     p=round(melrng/(p*1000))+1;
0153 end
0154 melinc=melrng/(p-1);
0155 mflh=mflh+(-1:2:1)*melinc;
0156 else
0157     if p<1
0158     p=round(melrng/(p*1000))-1;
0159 end
0160 melinc=melrng/(p+1);
0161 end
0162 
0163 %
0164 % Calculate the FFT bins corresponding to [filter#1-low filter#1-mid filter#p-mid filter#p-high]
0165 %
0166 switch wr
0167         case 'f'
0168         blim=(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0169     case 'l'
0170         blim=10.^(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0171     case 'e'
0172         blim=v_erb2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0173     case 'b'
0174         blim=v_bark2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0175     otherwise
0176         blim=v_mel2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0177 end
0178 mc=mflh(1)+(1:p)*melinc;    % mel centre frequencies
0179 b1=floor(blim(1))+1;            % lowest FFT bin_0 required might be negative)
0180 b4=min(fn2,ceil(blim(4))-1);    % highest FFT bin_0 required
0181 %
0182 % now map all the useful FFT bins_0 to filter1 centres
0183 %
0184 switch wr
0185         case 'f'
0186         pf=((b1:b4)*fs/n-mflh(1))/melinc;
0187     case 'l'
0188         pf=(log10((b1:b4)*fs/n)-mflh(1))/melinc;
0189     case 'e'
0190         pf=(v_frq2erb((b1:b4)*fs/n)-mflh(1))/melinc;
0191     case 'b'
0192         pf=(v_frq2bark((b1:b4)*fs/n)-mflh(1))/melinc;
0193     otherwise
0194         pf=(v_frq2mel((b1:b4)*fs/n)-mflh(1))/melinc;
0195 end
0196 %
0197 %  remove any incorrect entries in pf due to rounding errors
0198 %
0199 if pf(1)<0
0200     pf(1)=[];
0201     b1=b1+1;
0202 end
0203 if pf(end)>=p+1
0204     pf(end)=[];
0205     b4=b4-1;
0206 end
0207 fp=floor(pf);                  % FFT bin_0 i contributes to filters_1 fp(1+i-b1)+[0 1]
0208 pm=pf-fp;                       % multiplier for upper filter
0209 k2=find(fp>0,1);   % FFT bin_1 k2+b1 is the first to contribute to both upper and lower filters
0210 k3=find(fp<p,1,'last');  % FFT bin_1 k3+b1 is the last to contribute to both upper and lower filters
0211 k4=numel(fp); % FFT bin_1 k4+b1 is the last to contribute to any filters
0212 if isempty(k2)
0213     k2=k4+1;
0214 end
0215 if isempty(k3)
0216     k3=0;
0217 end
0218 if any(w=='y')          % preserve power in FFT
0219     mn=1; % lowest fft bin required (1 = DC)
0220     mx=fn2+1; % highest fft bin required (1 = DC)
0221     r=[ones(1,k2+b1-1) 1+fp(k2:k3) fp(k2:k3) repmat(p,1,fn2-k3-b1+1)]; % filter number_1
0222     c=[1:k2+b1-1 k2+b1:k3+b1 k2+b1:k3+b1 k3+b1+1:fn2+1]; % FFT bin1
0223     v=[ones(1,k2+b1-1) pm(k2:k3) 1-pm(k2:k3) ones(1,fn2-k3-b1+1)];
0224 else
0225     r=[1+fp(1:k3) fp(k2:k4)]; % filter number_1
0226     c=[1:k3 k2:k4]; % FFT bin_1 - b1
0227     v=[pm(1:k3) 1-pm(k2:k4)];
0228     mn=b1+1; % lowest fft bin_1
0229     mx=b4+1;  % highest fft bin_1
0230 end
0231 if b1<0
0232     c=abs(c+b1-1)-b1+1;     % convert negative frequencies into positive
0233 end
0234 % end
0235 if any(w=='n')
0236     v=0.5-0.5*cos(v*pi);      % convert triangles to Hanning
0237 elseif any(w=='m')
0238     v=0.5-0.46/1.08*cos(v*pi);  % convert triangles to Hamming
0239 end
0240 if sfact==2  % double all except the DC and Nyquist (if any) terms
0241     msk=(c+mn>2) & (c+mn<n-fn2+2);  % there is no Nyquist term if n is odd
0242     v(msk)=2*v(msk);
0243 end
0244 %
0245 % sort out the output argument options
0246 %
0247 if nargout > 2
0248     x=sparse(r,c,v);
0249     if nargout == 3     % if exactly three output arguments, then
0250         mc=mn;          % delete mc output for legacy code compatibility
0251         mn=mx;
0252     end
0253 else
0254     x=sparse(r,c+mn-1,v,p,1+fn2);
0255 end
0256 if any(w=='u')
0257     sx=sum(x,2);
0258     x=x./repmat(sx+(sx==0),1,size(x,2));
0259 end
0260 %
0261 % plot results if no output arguments or g option given
0262 %
0263 if ~nargout || any(w=='g') % plot idealized filters
0264     ng=201;     % 201 points
0265     me=mflh(1)+(0:p+1)'*melinc;
0266     switch wr
0267                 case 'f'
0268             fe=me; % defining frequencies
0269             xg=repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng);
0270         case 'l'
0271             fe=10.^me; % defining frequencies
0272             xg=10.^(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0273         case 'e'
0274             fe=v_erb2frq(me); % defining frequencies
0275             xg=v_erb2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0276         case 'b'
0277             fe=v_bark2frq(me); % defining frequencies
0278             xg=v_bark2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0279         otherwise
0280             fe=v_mel2frq(me); % defining frequencies
0281             xg=v_mel2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0282     end
0283 
0284     v=1-abs(linspace(-1,1,ng));
0285     if any(w=='n')
0286         v=0.5-0.5*cos(v*pi);      % convert triangles to Hanning
0287     elseif any(w=='m')
0288         v=0.5-0.46/1.08*cos(v*pi);  % convert triangles to Hamming
0289     end
0290     v=v*sfact;  % multiply by 2 if double sided
0291     v=repmat(v,p,1);
0292     if any(w=='y')  % extend first and last filters
0293         v(1,xg(1,:)<fe(2))=sfact;
0294         v(end,xg(end,:)>fe(p+1))=sfact;
0295     end
0296     if any(w=='u') % scale to unity sum
0297         dx=(xg(:,3:end)-xg(:,1:end-2))/2;
0298         dx=dx(:,[1 1:ng-2 ng-2]);
0299         vs=sum(v.*dx,2);
0300         v=v./repmat(vs+(vs==0),1,ng)*fs/n;
0301     end
0302     plot(xg',v','b');
0303     set(gca,'xlim',[fe(1) fe(end)]);
0304     xlabel(['Frequency (' v_xticksi 'Hz)']);
0305 end

Generated by m2html © 2003