Description of v

V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)

 Inputs:
       p   number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
        n   length of fft
        fs  sample rate in Hz
        fl  low end of the lowest filter as a fraction of fs [default = 0]
        fh  high end of highest filter as a fraction of fs [default = 0.5]
        w   any sensible combination of the following:
             'b' = bark scale instead of mel
             'e' = erb-rate scale
             'l' = log10 Hz frequency scale
             'f' = linear frequency scale

             'c' = fl/fh specify centre of low and high filters
             'h' = fl/fh are in Hz instead of fractions of fs
             'H' = fl/fh are in mel/erb/bark/log10

              't' = triangular shaped filters in mel/erb/bark domain (default)
              'n' = hanning shaped filters in mel/erb/bark domain
              'm' = hamming shaped filters in mel/erb/bark domain

              'z' = highest and lowest filters taper down to zero [default]
              'y' = lowest filter remains at 1 down to 0 frequency and highest filter
                   remains at 1 up to nyquist freqency. See note (1) below.

             'u' = scale filters to sum to unity

             's' = single-sided: do not double filters to account for negative frequencies

             'g' = plot idealized filters [default if no output arguments present]

 Outputs:    x     a sparse matrix containing the v_filterbank amplitudes
                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
                 otherwise size(x)=[p,1+floor(n/2)]
                 Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies.
           mc    the v_filterbank centre frequencies in mel/erb/bark
            mn    the lowest fft bin with a non-zero coefficient
            mx    the highest fft bin with a non-zero coefficient
                 NOTE: For legacy compatibility reasons, you must specify both or neither of mn and mx.

 Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved.
        (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain
            rather than in the linear frequency domain which is more common (e.g. [2]).
        (3) A mel-filterbank can also be created using v_filtbank() which uses triangular
            filters in the linear frequency domain and copes better with the narrow filters
            that arise when p is large on n is small.

 Examples of use:

 (a) Calcuate the Mel-frequency Cepstral Coefficients

       f=v_rfft(s);                    % v_rfft() returns only 1+floor(n/2) coefficients
        x=v_melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
        z=log(x*abs(f).^2);         % multiply x by the power spectrum
        c=dct(z);                   % take the DCT

 (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently

       f=fft(s);                        % n is the fft length, p is the number of filters wanted
       [x,mc,na,nb]=v_melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
       z=log(x*(f(na:nb)).*conj(f(na:nb)));
        c=dct(z);                   % take the DCT

 (c) Plot the calculated filterbanks

      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency

 (d) Plot the idealized filterbanks (without output sampling)

      v_melbankm(p,n,fs);

 References:

 [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937.
 [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
     monosyllabic word recognition in continuously spoken sentences.
     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.

0001 function [x,mc,mn,mx]=v_melbankm(p,n,fs,fl,fh,w)
0002 %V_MELBANKM determine matrix for a mel/erb/bark-spaced filterbank [X,MN,MX]=(P,N,FS,FL,FH,W)
0003 %
0004 % Inputs:
0005 %       p   number of filters in v_filterbank or the filter spacing in k-mel/bark/erb [ceil(4.6*log10(fs))]
0006 %        n   length of fft
0007 %        fs  sample rate in Hz
0008 %        fl  low end of the lowest filter as a fraction of fs [default = 0]
0009 %        fh  high end of highest filter as a fraction of fs [default = 0.5]
0010 %        w   any sensible combination of the following:
0011 %             'b' = bark scale instead of mel
0012 %             'e' = erb-rate scale
0013 %             'l' = log10 Hz frequency scale
0014 %             'f' = linear frequency scale
0015 %
0016 %             'c' = fl/fh specify centre of low and high filters
0017 %             'h' = fl/fh are in Hz instead of fractions of fs
0018 %             'H' = fl/fh are in mel/erb/bark/log10
0019 %
0020 %              't' = triangular shaped filters in mel/erb/bark domain (default)
0021 %              'n' = hanning shaped filters in mel/erb/bark domain
0022 %              'm' = hamming shaped filters in mel/erb/bark domain
0023 %
0024 %              'z' = highest and lowest filters taper down to zero [default]
0025 %              'y' = lowest filter remains at 1 down to 0 frequency and highest filter
0026 %                   remains at 1 up to nyquist freqency. See note (1) below.
0027 %
0028 %             'u' = scale filters to sum to unity
0029 %
0030 %             's' = single-sided: do not double filters to account for negative frequencies
0031 %
0032 %             'g' = plot idealized filters [default if no output arguments present]
0033 %
0034 % Outputs:    x     a sparse matrix containing the v_filterbank amplitudes
0035 %                  If the mn and mx outputs are given then size(x)=[p,mx-mn+1]
0036 %                 otherwise size(x)=[p,1+floor(n/2)]
0037 %                 Note that the peak filter values equal 2 to account for the power in the negative FFT frequencies.
0038 %           mc    the v_filterbank centre frequencies in mel/erb/bark
0039 %            mn    the lowest fft bin with a non-zero coefficient
0040 %            mx    the highest fft bin with a non-zero coefficient
0041 %                 NOTE: For legacy compatibility reasons, you must specify both or neither of mn and mx.
0042 %
0043 % Notes: (1) If 'ty' or 'ny' is specified, the total power in the fft is preserved.
0044 %        (2) The filter shape (triangular, hamming etc) is defined in the mel (or erb etc) domain
0045 %            rather than in the linear frequency domain which is more common (e.g. [2]).
0046 %        (3) A mel-filterbank can also be created using v_filtbank() which uses triangular
0047 %            filters in the linear frequency domain and copes better with the narrow filters
0048 %            that arise when p is large on n is small.
0049 %
0050 % Examples of use:
0051 %
0052 % (a) Calcuate the Mel-frequency Cepstral Coefficients
0053 %
0054 %       f=v_rfft(s);                    % v_rfft() returns only 1+floor(n/2) coefficients
0055 %        x=v_melbankm(p,n,fs);            % n is the fft length, p is the number of filters wanted
0056 %        z=log(x*abs(f).^2);         % multiply x by the power spectrum
0057 %        c=dct(z);                   % take the DCT
0058 %
0059 % (b) Calcuate the Mel-frequency Cepstral Coefficients efficiently
0060 %
0061 %       f=fft(s);                        % n is the fft length, p is the number of filters wanted
0062 %       [x,mc,na,nb]=v_melbankm(p,n,fs);   % na:nb gives the fft bins that are needed
0063 %       z=log(x*(f(na:nb)).*conj(f(na:nb)));
0064 %        c=dct(z);                   % take the DCT
0065 %
0066 % (c) Plot the calculated filterbanks
0067 %
0068 %      plot((0:floor(n/2))*fs/n,melbankm(p,n,fs)')   % fs=sample frequency
0069 %
0070 % (d) Plot the idealized filterbanks (without output sampling)
0071 %
0072 %      v_melbankm(p,n,fs);
0073 %
0074 % References:
0075 %
0076 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement
0077 %     of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-19, 1937.
0078 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for
0079 %     monosyllabic word recognition in continuously spoken sentences.
0080 %     IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.
0081 
0082 
0083 %      Copyright (C) Mike Brookes 1997-2009
0084 %      Version: $Id: v_melbankm.m 10865 2018-09-21 17:22:45Z dmb $
0085 %
0086 %   VOICEBOX is a MATLAB toolbox for speech processing.
0087 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0088 %
0089 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0090 %   This program is free software; you can redistribute it and/or modify
0091 %   it under the terms of the GNU Lesser General Public License as published by
0092 %   the Free Software Foundation; either version 3 of the License, or
0093 %   (at your option) any later version.
0094 %
0095 %   This program is distributed in the hope that it will be useful,
0096 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0097 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0098 %   GNU Lesser General Public License for more details.
0099 %
0100 %   You can obtain a copy of the GNU Lesser General Public License from
0101 %   https://www.gnu.org/licenses/ .
0102 %
0103 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0104 
0105 % Note, in the comments, "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" assumes DC = bin 1
0106 
0107 if nargin < 6
0108     w='tz'; % default options
0109 end
0110 if nargin < 5 || isempty(fh)
0111     fh=0.5; % max freq is the nyquist
0112 end
0113 if nargin < 4 || isempty(fl)
0114     fl=0; % min freq is DC
0115 end
0116 
0117 sfact=2-any(w=='s');   % 1 if single sided else 2
0118 wr=' ';   % default warping is mel
0119 for i=1:length(w)
0120     if any(w(i)=='lebf')
0121         wr=w(i);
0122     end
0123 end
0124 if any(w=='h') || any(w=='H')
0125     mflh=[fl fh];
0126 else
0127     mflh=[fl fh]*fs;
0128 end
0129 if ~any(w=='H')
0130     switch wr
0131         case 'f'                        % no transformation
0132         case 'l'
0133             if fl<=0
0134                 error('Low frequency limit must be >0 for ''l'' option');
0135             end
0136             mflh=log10(mflh);           % convert frequency limits into log10 Hz
0137         case 'e'
0138             mflh=v_frq2erb(mflh);       % convert frequency limits into erb-rate
0139         case 'b'
0140             mflh=v_frq2bark(mflh);      % convert frequency limits into bark
0141         otherwise
0142             mflh=v_frq2mel(mflh);       % convert frequency limits into mel
0143     end
0144 end
0145 melrng=mflh*(-1:2:1)';                  % mel range
0146 fn2=floor(n/2);                         % bin index of highest positive frequency (Nyquist if n is even)
0147 if isempty(p)
0148     p=ceil(4.6*log10(fs));              % default number of filters
0149 end
0150 if any(w=='c')                          % c option: specify filter centres not edges
0151     if p<1
0152         p=round(melrng/(p*1000))+1;
0153     end
0154     melinc=melrng/(p-1);
0155     mflh=mflh+(-1:2:1)*melinc;
0156 else
0157     if p<1
0158         p=round(melrng/(p*1000))-1;
0159     end
0160     melinc=melrng/(p+1);
0161 end
0162 %
0163 % Calculate the FFT bins corresponding to [filter#1-low filter#1-mid filter#p-mid filter#p-high]
0164 %
0165 switch wr
0166     case 'f'
0167         blim=(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0168     case 'l'
0169         blim=10.^(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0170     case 'e'
0171         blim=v_erb2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0172     case 'b'
0173         blim=v_bark2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0174     otherwise
0175         blim=v_mel2frq(mflh(1)+[0 1 p p+1]*melinc)*n/fs;
0176 end
0177 mc=mflh(1)+(1:p)*melinc;        % mel centre frequencies
0178 b1=floor(blim(1))+1;            % lowest FFT bin_0 required (might be negative)
0179 b4=min(fn2,ceil(blim(4))-1);    % highest FFT bin_0 required
0180 %
0181 % now map all the useful FFT bins_0 to filter_1 centres
0182 %
0183 switch wr
0184     case 'f'
0185         pf=((b1:b4)*fs/n-mflh(1))/melinc;
0186     case 'l'
0187         pf=(log10((b1:b4)*fs/n)-mflh(1))/melinc;
0188     case 'e'
0189         pf=(v_frq2erb((b1:b4)*fs/n)-mflh(1))/melinc;
0190     case 'b'
0191         pf=(v_frq2bark((b1:b4)*fs/n)-mflh(1))/melinc;
0192     otherwise
0193         pf=(v_frq2mel((b1:b4)*fs/n)-mflh(1))/melinc;
0194 end
0195 %
0196 %  remove any incorrect entries in pf due to rounding errors
0197 %
0198 if pf(1)<0
0199     pf(1)=[];
0200     b1=b1+1;
0201 end
0202 if pf(end)>=p+1
0203     pf(end)=[];
0204     b4=b4-1;
0205 end
0206 fp=floor(pf);                   % FFT bin_0 i contributes to filters_1 fp(1+i-b1)+[0 1]
0207 pm=pf-fp;                       % multiplier for upper filter
0208 k2=find(fp>0,1);                % FFT bin_1 k2+b1 is the first to contribute to both upper and lower filters
0209 k3=find(fp<p,1,'last');         % FFT bin_1 k3+b1 is the last to contribute to both upper and lower filters
0210 k4=numel(fp);                   % FFT bin_1 k4+b1 is the last to contribute to any filters
0211 if isempty(k2)
0212     k2=k4+1;
0213 end
0214 if isempty(k3)
0215     k3=0;
0216 end
0217 if any(w=='y')                  % preserve power in FFT
0218     mn=1;                       % lowest fft bin required (1 = DC)
0219     mx=fn2+1;                   % highest fft bin required (1 = DC)
0220     r=[ones(1,k2+b1-1) 1+fp(k2:k3) fp(k2:k3) repmat(p,1,fn2-k3-b1+1)];  % filter number_1
0221     c=[1:k2+b1-1 k2+b1:k3+b1 k2+b1:k3+b1 k3+b1+1:fn2+1];                % FFT bin1
0222     v=[ones(1,k2+b1-1) pm(k2:k3) 1-pm(k2:k3) ones(1,fn2-k3-b1+1)];
0223 else
0224     r=[1+fp(1:k3) fp(k2:k4)];       % filter number_1
0225     c=[1:k3 k2:k4];                 % FFT bin_1 - b1
0226     v=[pm(1:k3) 1-pm(k2:k4)];
0227     mn=b1+1;                        % lowest fft bin_1
0228     mx=b4+1;                        % highest fft bin_1
0229 end
0230 if b1<0
0231     c=abs(c+b1-1)-b1+1;             % convert negative frequencies into positive
0232 end
0233 % end
0234 if any(w=='n')
0235     v=0.5-0.5*cos(v*pi);            % convert triangles to Hanning
0236 elseif any(w=='m')
0237     v=0.5-0.46/1.08*cos(v*pi);      % convert triangles to Hamming
0238 end
0239 if sfact==2                         % double all except the DC and Nyquist (if any) terms
0240     msk=(c+mn>2) & (c+mn<n-fn2+2);  % there is no Nyquist term if n is odd
0241     v(msk)=2*v(msk);
0242 end
0243 %
0244 % sort out the output argument options
0245 %
0246 if nargout > 2
0247     x=sparse(r,c,v);
0248     if nargout == 3                 % if exactly three output arguments, then
0249         mc=mn;                      % delete mc output for legacy code compatibility
0250         mn=mx;
0251     end
0252 else
0253     x=sparse(r,c+mn-1,v,p,1+fn2);
0254 end
0255 if any(w=='u')
0256     sx=sum(x,2);
0257     x=x./repmat(sx+(sx==0),1,size(x,2));
0258 end
0259 %
0260 % plot results if no output arguments or g option given
0261 %
0262 if ~nargout || any(w=='g')          % plot idealized filters
0263     ng=201;                         % 201 points
0264     me=mflh(1)+(0:p+1)'*melinc;
0265     switch wr
0266         case 'f'
0267             fe=me;                  % defining frequencies
0268             xg=repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng);
0269         case 'l'
0270             fe=10.^me;              % defining frequencies
0271             xg=10.^(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0272         case 'e'
0273             fe=v_erb2frq(me);       % defining frequencies
0274             xg=v_erb2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0275         case 'b'
0276             fe=v_bark2frq(me);      % defining frequencies
0277             xg=v_bark2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0278         otherwise
0279             fe=v_mel2frq(me);       % defining frequencies
0280             xg=v_mel2frq(repmat(linspace(0,1,ng),p,1).*repmat(me(3:end)-me(1:end-2),1,ng)+repmat(me(1:end-2),1,ng));
0281     end
0282 
0283     v=1-abs(linspace(-1,1,ng));
0284     if any(w=='n')
0285         v=0.5-0.5*cos(v*pi);        % convert triangles to Hanning
0286     elseif any(w=='m')
0287         v=0.5-0.46/1.08*cos(v*pi);  % convert triangles to Hamming
0288     end
0289     v=v*sfact;                      % multiply by 2 if double sided
0290     v=repmat(v,p,1);
0291     if any(w=='y')                  % extend first and last filters
0292         v(1,xg(1,:)<fe(2))=sfact;
0293         v(end,xg(end,:)>fe(p+1))=sfact;
0294     end
0295     if any(w=='u')                  % scale to unity sum
0296         dx=(xg(:,3:end)-xg(:,1:end-2))/2;
0297         dx=dx(:,[1 1:ng-2 ng-2]);
0298         vs=sum(v.*dx,2);
0299         v=v./repmat(vs+(vs==0),1,ng)*fs/n;
0300     end
0301     plot(xg',v','b');
0302     set(gca,'xlim',[fe(1) fe(end)]);
0303     xlabel(['Frequency (' v_xticksi 'Hz)']);
0304 end

v_melbankm

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE