V_FILTBANKM determine matrix for a linear/mel/erb/bark-spaced v_filterbank [X,IL,IH]=(P,N,FS,FL,FH,W) Usage: (1) Calcuate the Mel-frequency Cepstral Coefficients f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients x=v_filtbankm(p,n,fs,0,fs/2,'m'); % n is the fft length, p is the number of filters wanted z=log(x*abs(f).^2); % multiply the power spectrum by x to get log mel-spectrum c=dct(z); % take the DCT to get the mel-cepstrum (2) Calcuate the Mel-frequency Cepstral Coefficients efficiently f=fft(s); % n is the fft length, p is the number of filters wanted [x,cf,na,nb]=v_filtbankm(p,n,fs,0,fs/2,'m'); % na:nb gives the fft bins that are needed z=log(x*(f(na:nb)).*conj(f(na:nb))); % multiply x by the power spectrum c=dct(z); % take the DCT (3) Plot the calculated filterbanks as a graph or spectrogram v_filtbankm(p,n,fs,0,fs/2,'mg'); % use option 'mg' for a graph or 'mG' for a spectrogram (4) Convert to mel-spectrum and back again [x,cf,xi]=v_filtbankm(p,n,fs,0,fs/2,'mxXzq'); % n is the fft length, p is the number of filters wanted f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients z=x*abs(f).^2; % multiply the power spectrum by x to get mel-spectrum gp=xi*z; % multiply by xi to recover the approximate power spectrum g=v_irfft(sqrt(gp).*exp(1i*angle(f))); % take the inverse DFT using the original phase to recover the time domain signal Inputs: p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb (see 'p' and 'P' options) [ceil(4.6*log10(fs))] n length of dft fs sample rate in Hz fl low end of the lowest filter in Hz (or in mel/erb/bark/log10 with 'h' option) [default = 0Hz or, if 'l' option given, 30Hz] fh high end of highest filter in Hz (or in mel/erb/bark/log10 with 'h' option) [default = fs/2] w any sensible combination of the following: 'b' = bark scale 'e' = erb-rate scale 'l' = log10 Hz frequency scale 'f' = linear frequency scale [default] 'm' = mel frequency scale 'n' = round to the nearest FFT bin so each row of x contains only one non-zero entry 'c' = fl specifies centre of low filters instead of low edge 'C' = fh specifies centre of high filter instead of high edge 'h' = fl & fh are in mel/erb/bark/log10 instead of Hz 'H' = give cf outputs in mel/erb/bark/log10 instead of Hz 'x' = lowest filter remains at 1 down to 0 frequency 'X' = highest filter remains at 1 up to nyquist freqency 'p' = input p specifies the number of filters [default if p>=1] 'P' = input p specifies the approximate filter spacing in kHz/kmel/... [default if p<1] 'z' = Treat input power spectrum at 0Hz as an impulse rather than being diffuse 'Z' = Treat input power spectrum at 0Hz as the sum of an impulse and a continuous component with the same amlitude as the adjacent bin 'q' = The first output filter gives the power of the impulse at 0Hz (regardless of the 'D' option). 'zq' ensures exact retention of DC component by zi*z 'd' = input is power spectral density (power per Hz) instead of power 'D' = output is power spectral density (power per Hz) instead of power 's' = single-sided input: do not add power from symmetric negative frequencies (i.e. non-DC/Nyquist inputs have already been doubled) 'S' = single-sided output: include power from both positive and negative frequencies (this doubles the non-DC/Nyquist outputs) 'w' = size(x,2)=size(xi,1)=n rather than 1+floor(n/2) although the rightmost half of x is all zeros 'g' = plot filter coefficients as graph 'G' = plot filter coefficients as spectrogram image [default if no output arguments present] Legacy options: 'y'='xX', 'Y'='x', 'yY'='X', 'u'='dD', 'U'='D' Outputs: x(p,k) a sparse matrix containing the v_filterbank amplitudes If the il and ih output arguments are included then k=ih-il+1 otherwise k=1+floor(n/2) Note that, with the 'S' option, the peak filter values equal 2 to account for the energy in the negative FFT frequencies. cf(p) the v_filterbank centre frequencies in Hz (or in mel/erb/bark/log10 with 'H' option) xi(k,p) [optional] sparse matrix that is an approximate inverse of x il the lowest fft bin with a non-zero coefficient ih the highest fft bin with a non-zero coefficient The input power will be preserved if the options 'xXS' are given The output of the routine is a sparse filterbank matrix. The vector output of the filterbank can then be obtained by pre-multiplying an input power spectrum vector by the filterbank matrix. The input and output vectors can optionally be in either the power domain or the power spectral density domain. The routine implements the filterbank in two conceptual stages (which are merged in the practical implementation): Stage 1: The discrete input spectrum is converted to a continuous power spectral density using linear interpolation in frequency. Each element of the input spectrum influences a frequency interval of width 2d where d is the input frequency bin width. The DC component of the input is treated specially in one of three ways: (a) it can be treated as a normal element that influences an interval (-d,+d) like the other elements [default]; (b) it can be treated as an impulse at DC ['z' option]; (c) it can be treated as a mixture of an impulse and a normal component whose value equals that of the adjacent frequency bin ['Z' option]. Stage 2: The filterbank outputs are calculated by integrating the product of the continuous spectrum and a filter weight that is triangular in the frequency domain. Optionally, the first filterbank preserves the DC impulse component of the continuous spectrum ['q' option]. References: [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-190, 1937. [2] S. Davis and P. Mermelstein. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980.
0001 function [x,cf,xi,il,ih]=v_filtbankm(p,n,fs,fl,fh,w) 0002 %V_FILTBANKM determine matrix for a linear/mel/erb/bark-spaced v_filterbank [X,IL,IH]=(P,N,FS,FL,FH,W) 0003 % 0004 % Usage: 0005 % (1) Calcuate the Mel-frequency Cepstral Coefficients 0006 % 0007 % f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients 0008 % x=v_filtbankm(p,n,fs,0,fs/2,'m'); % n is the fft length, p is the number of filters wanted 0009 % z=log(x*abs(f).^2); % multiply the power spectrum by x to get log mel-spectrum 0010 % c=dct(z); % take the DCT to get the mel-cepstrum 0011 % 0012 % (2) Calcuate the Mel-frequency Cepstral Coefficients efficiently 0013 % 0014 % f=fft(s); % n is the fft length, p is the number of filters wanted 0015 % [x,cf,na,nb]=v_filtbankm(p,n,fs,0,fs/2,'m'); % na:nb gives the fft bins that are needed 0016 % z=log(x*(f(na:nb)).*conj(f(na:nb))); % multiply x by the power spectrum 0017 % c=dct(z); % take the DCT 0018 % 0019 % (3) Plot the calculated filterbanks as a graph or spectrogram 0020 % 0021 % v_filtbankm(p,n,fs,0,fs/2,'mg'); % use option 'mg' for a graph or 'mG' for a spectrogram 0022 % 0023 % (4) Convert to mel-spectrum and back again 0024 % 0025 % [x,cf,xi]=v_filtbankm(p,n,fs,0,fs/2,'mxXzq'); % n is the fft length, p is the number of filters wanted 0026 % f=v_rfft(s); % v_rfft() returns only 1+floor(n/2) coefficients 0027 % z=x*abs(f).^2; % multiply the power spectrum by x to get mel-spectrum 0028 % gp=xi*z; % multiply by xi to recover the approximate power spectrum 0029 % g=v_irfft(sqrt(gp).*exp(1i*angle(f))); % take the inverse DFT using the original phase to recover the time domain signal 0030 % 0031 % Inputs: 0032 % p number of filters in v_filterbank or the filter spacing in k-mel/bark/erb (see 'p' and 'P' options) [ceil(4.6*log10(fs))] 0033 % n length of dft 0034 % fs sample rate in Hz 0035 % fl low end of the lowest filter in Hz (or in mel/erb/bark/log10 with 'h' option) [default = 0Hz or, if 'l' option given, 30Hz] 0036 % fh high end of highest filter in Hz (or in mel/erb/bark/log10 with 'h' option) [default = fs/2] 0037 % w any sensible combination of the following: 0038 % 0039 % 'b' = bark scale 0040 % 'e' = erb-rate scale 0041 % 'l' = log10 Hz frequency scale 0042 % 'f' = linear frequency scale [default] 0043 % 'm' = mel frequency scale 0044 % 0045 % 'n' = round to the nearest FFT bin so each row of x contains only one non-zero entry 0046 % 0047 % 'c' = fl specifies centre of low filters instead of low edge 0048 % 'C' = fh specifies centre of high filter instead of high edge 0049 % 'h' = fl & fh are in mel/erb/bark/log10 instead of Hz 0050 % 'H' = give cf outputs in mel/erb/bark/log10 instead of Hz 0051 % 0052 % 'x' = lowest filter remains at 1 down to 0 frequency 0053 % 'X' = highest filter remains at 1 up to nyquist freqency 0054 % 0055 % 'p' = input p specifies the number of filters [default if p>=1] 0056 % 'P' = input p specifies the approximate filter spacing in kHz/kmel/... [default if p<1] 0057 % 0058 % 'z' = Treat input power spectrum at 0Hz as an impulse rather than being diffuse 0059 % 'Z' = Treat input power spectrum at 0Hz as the sum of an impulse and a continuous component with the same amlitude as the adjacent bin 0060 % 'q' = The first output filter gives the power of the impulse at 0Hz (regardless of the 'D' option). 'zq' ensures exact retention of DC component by zi*z 0061 % 0062 % 'd' = input is power spectral density (power per Hz) instead of power 0063 % 'D' = output is power spectral density (power per Hz) instead of power 0064 % 0065 % 's' = single-sided input: do not add power from symmetric negative frequencies (i.e. non-DC/Nyquist inputs have already been doubled) 0066 % 'S' = single-sided output: include power from both positive and negative frequencies (this doubles the non-DC/Nyquist outputs) 0067 % 'w' = size(x,2)=size(xi,1)=n rather than 1+floor(n/2) although the rightmost half of x is all zeros 0068 % 0069 % 'g' = plot filter coefficients as graph 0070 % 'G' = plot filter coefficients as spectrogram image [default if no output arguments present] 0071 % 0072 % Legacy options: 'y'='xX', 'Y'='x', 'yY'='X', 'u'='dD', 'U'='D' 0073 % 0074 % Outputs: x(p,k) a sparse matrix containing the v_filterbank amplitudes 0075 % If the il and ih output arguments are included then k=ih-il+1 otherwise k=1+floor(n/2) 0076 % Note that, with the 'S' option, the peak filter values equal 2 to account for the energy in the negative FFT frequencies. 0077 % cf(p) the v_filterbank centre frequencies in Hz (or in mel/erb/bark/log10 with 'H' option) 0078 % xi(k,p) [optional] sparse matrix that is an approximate inverse of x 0079 % il the lowest fft bin with a non-zero coefficient 0080 % ih the highest fft bin with a non-zero coefficient 0081 % 0082 % The input power will be preserved if the options 'xXS' are given 0083 % 0084 % The output of the routine is a sparse filterbank matrix. The vector output of the filterbank can then be obtained 0085 % by pre-multiplying an input power spectrum vector by the filterbank matrix. The input and output vectors can optionally 0086 % be in either the power domain or the power spectral density domain. 0087 % The routine implements the filterbank in two conceptual stages (which are merged in the practical implementation): 0088 % 0089 % Stage 1: 0090 % The discrete input spectrum is converted to a continuous power spectral density using linear interpolation in frequency. 0091 % Each element of the input spectrum influences a frequency interval of width 2d where d is the input frequency bin width. 0092 % The DC component of the input is treated specially in one of three ways: (a) it can be treated as a normal element that 0093 % influences an interval (-d,+d) like the other elements [default]; (b) it can be treated as an impulse at DC ['z' option]; 0094 % (c) it can be treated as a mixture of an impulse and a normal component whose value equals that of the adjacent frequency 0095 % bin ['Z' option]. 0096 % 0097 % Stage 2: 0098 % The filterbank outputs are calculated by integrating the product of the continuous spectrum and a filter weight that is 0099 % triangular in the frequency domain. Optionally, the first filterbank preserves the DC impulse component of the continuous 0100 % spectrum ['q' option]. 0101 % 0102 % References: 0103 % 0104 % [1] S. S. Stevens, J. Volkman, and E. B. Newman. A scale for the measurement 0105 % of the psychological magnitude of pitch. J. Acoust Soc Amer, 8: 185-190, 1937. 0106 % [2] S. Davis and P. Mermelstein. Comparison of parametric representations for 0107 % monosyllabic word recognition in continuously spoken sentences. 0108 % IEEE Trans Acoustics Speech and Signal Processing, 28 (4): 357-366, Aug. 1980. 0109 0110 % Bugs/Suggestions 0111 % (1) default frequencies won't work if the h option is specified 0112 % (2) low default frequency is invalid if the 'l' option is specified 0113 % (3) Add 'z' option to include a DC output as the first coefficient 0114 % (4) Add 'Z' option to ignore the DC input component 0115 % (5) Add 'i' option to calculate the inverse of x instead 0116 % (6) Add option to choose the domain in which linear interpolation is performed 0117 0118 % Copyright (C) Mike Brookes 1997-2024 0119 % Version: $Id: v_filtbankm.m $ 0120 % 0121 % VOICEBOX is a MATLAB toolbox for speech processing. 0122 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0123 % 0124 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0125 % This program is free software; you can redistribute it and/or modify 0126 % it under the terms of the GNU General Public License as published by 0127 % the Free Software Foundation; either version 2 of the License, or 0128 % (at your option) any later version. 0129 % 0130 % This program is distributed in the hope that it will be useful, 0131 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0132 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0133 % GNU General Public License for more details. 0134 % 0135 % You can obtain a copy of the GNU General Public License from 0136 % http://www.gnu.org/copyleft/gpl.html or by writing to 0137 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0138 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0139 0140 % Notes: 0141 % (1) In the comments, "FFT bin_0" assumes DC = bin 0 whereas "FFT bin_1" means DC = bin 1nfout 0142 % (2) "input" and "output" need to be interchanged if the 'i' option is given 0143 0144 if nargin<6 || isempty(w) % if no mode option, w, is specified 0145 w='f'; % default mode option: 'f' = linear output frequency scale 0146 end 0147 wr=max(any(repmat('lebm',length(w),1)==repmat(w',1,4),1).*(1:4)); % output warping: 0=linear,1=log,2=erbrate,3=bark,4=mel 0148 ww=any(repmat('ncChHxXyYpPzZqdDuUsSgGw',length(w),1)==repmat(w',1,23),1); % decode all other options 0149 % ww elements: 1=n,2=c,3=C,4=h,5=H,6=x,7=X,8=y,9=Y,10=p,11=P,12=z,13=Z,14=q,15=d,16=D,17=u,18=U,19=s,20=S,21=g,22=G,23=w 0150 % Convert legacy option codes: 'y'='xX', 'Y'='x', 'yY'='X', 'u'='dD', 'U'='D' 0151 ww(6)=ww(6) || (ww(8) ~= ww(9)); % convert 'y' or 'Y' (but not both) to 'x'; extend low frequencies 0152 ww(7)=ww(7) || ww(8); % convert 'y' to 'X'; extend high frequencies 0153 ww(15)=ww(15) || ww(17); % convert 'u' to 'd'; input is also in power spectral density 0154 ww(16)=ww(16) || ww(17) || ww(18); % convert 'u' or 'U' to 'd'; output is in power spectral density 0155 flhconv=repmat(wr>0 && ~ww(4),1,2); % flag indicating need to convert filterbank limits from Hz to mel/erb/bark/log10 0156 if nargin < 4 || isempty(fl) 0157 fl=30*(wr==1); % lower limit is 0 Hz unless 'l' option specified, in which case it is 30 Hz 0158 flhconv(1)=wr>0; 0159 end 0160 if nargin < 5 || isempty(fh) 0161 fh=0.5*fs; % max freq is the nyquist frequency 0162 flhconv(2)=wr>0; 0163 end 0164 % 0165 % Sort out input frequency bins 0166 % 0167 if numel(n)>1 0168 error('non-standard input frequency spacing no longer supported'); 0169 else % n gives dft length 0170 nf=1+floor(n/2); % number of input positive-frequency bins from DFT 0171 df=fs/n; % input frequency bin spacing 0172 end 0173 % 0174 % Sort out output frequency bins 0175 % 0176 mflh=[fl fh]; % low and high limits of filterbank triangular filters 0177 if any(flhconv) % convert mflh from Hz to mel/erb/... unless already converted via 'h' option 0178 switch wr 0179 case 1 % 'l' = log scaled 0180 if fl<=0 0181 error('Low frequency limit must be >0 for ''l'' log10-frequency option'); 0182 end 0183 mflh(flhconv)=log10(mflh(flhconv)); % convert frequency limits into log10 Hz 0184 case 2 % 'e' = erb-rate scaled 0185 mflh(flhconv)=v_frq2erb(mflh(flhconv)); % convert frequency limits into erb-rate 0186 case 3 % 'b' = bark scaled 0187 mflh(flhconv)=v_frq2bark(mflh(flhconv)); % convert frequency limits into bark 0188 case 4 % 'm' = mel scaled 0189 mflh(flhconv)=v_frq2mel(mflh(flhconv)); % convert frequency limits into mel 0190 end 0191 end 0192 melrng=mflh(2)-mflh(1); % filterbank range in Hz/mel/erb/... 0193 if isempty(p) 0194 p=ceil(4.6*log10(2*(nf-1)*df)); % default number of output filters 0195 end 0196 puc=ww(11) || (p<1) && ~ww(10); % input p specifies the filter spacing rather than the number of filters 0197 if puc 0198 p=round(melrng/(p*1000))+ww(2)+ww(3)-1+ww(14); % p now gives the number of filters (excluding DC impulse) 0199 end 0200 melinc=melrng/(p+ww(2)+ww(3)+1-ww(14)); % inter-filter increment in mel 0201 mflh=mflh+[-ww(2) ww(3)]*melinc; % update mflh to include the full width of all filters 0202 % 0203 % Calculate the output centre frequencies in Hz including dummy end points 0204 % 0205 pmq=p-ww(14); % number of filters excluding the one for the DC impulse 0206 cf=mflh(1)+(0:pmq+1)*melinc; % centre frequencies in mel/erb/... including dummy ends 0207 cf(2:end)=max(cf(2:end),0); % only the first point can be negative [*** doesn't make sense for log scale ***] 0208 switch wr % convert centre frequencies to Hz from mel/erb/... 0209 case 1 % 'l' = log scaled 0210 mb=10.^(cf); 0211 case 2 % 'e' = erb-rate scaled 0212 mb=v_erb2frq(cf); 0213 case 3 % 'b' = bark scaled 0214 mb=v_bark2frq(cf); 0215 case 4 % 'm' = mel scaled 0216 mb=v_mel2frq(cf); 0217 otherwise % [default] = linear scaled; no conversion needed 0218 mb=cf; 0219 end 0220 % 0221 % sort out 2-sided input frequencies 0222 % 0223 fin=(-nf:nf)*df; % reflect negative frequencies excluding DC 0224 nfin=length(fin); % length of extended input frequency list [nfin=2*nf+1] 0225 % 0226 % now sort out the list of output frequencies 0227 % 0228 fout=mb; % output centre frequencies in Hz including dummy values at each end 0229 highex=ww(7) && (fout(end-1)<fin(end)); % extend at high end if 'X' specified and final centre frequency < Nyquist 0230 if ww(6) % ww(6)='x': extend first filter at low end to DC 0231 fout=[0 0 fout(2:end)]; % ... add two dummy values at DC instead of previous single dummy value 0232 end 0233 if highex % extend last filter at high end to Nyquist 0234 fout=[fout(1:end-1) fin(end) fin(end)]; % ... add two dummy values at Nyquist instead of previous single dummy value 0235 end 0236 fout=min(fout,fs/2); % limit output filters to Nyquist frequency 0237 nfout=length(fout); % number of output filters including one or two dummy points at each end 0238 foutin=[fout fin]; 0239 nfall=length(foutin); % = nfout + nfin 0240 wleft=[0 fout(2:nfout)-fout(1:nfout-1) 0 fin(2:nfin)-fin(1:nfin-1)]; % width of lower triangle attached to each node 0241 wright=[wleft(2:end) 0]; % width of upper triangle attached to each node 0242 ffact=[0 ones(1,nfout-2) 0 0 ones(1,2*nf-1) 0]; % valid triangle posts 0243 ffact(wleft+wright==0)=0; % disable null width triangles (*** probably unnecessary if all frequencies are distinct ***) 0244 [fall,ifall]=sort(foutin); % fall is sorted frequencies with fall=foutin(ifall) 0245 jfall=zeros(1,nfall); % create inverse index ... 0246 infall=1:nfall; % ... 0247 jfall(ifall)=infall; % ... inverse-index satisfying foutin=fall(jfall) 0248 ffact(ifall([1:max(jfall(1),jfall(nfout+1))-2 min(jfall(nfout),jfall(nfall))+2:nfall]))=0; % zap input nodes that lie outside the output filters 0249 nxto=cumsum(ifall<=nfout); % next output node to the right (or equal) to each node 0250 nxti=cumsum(ifall>nfout); % number of input nodes to the left (or equal) to each node 0251 nxtr=min(nxti+1+nfout,nfall); % next input node to the right of each value (or nfall if none) 0252 nxtr(ifall>nfout)=1+nxto(ifall>nfout); % next post to the right of opposite input/output type (using sorted indexes) 0253 nxtr=nxtr(jfall); % next post to the right of opposite input/output type (converted to unsorted indices) or if none: nfall or (nfout+1) 0254 % 0255 % The interpolated spectrum at any frequency can be expressed as the sum of the values at the adjacent input bins 0256 % multiplied by triangular weights that decreases from 1 to 0 between the two bins. The value at an output bin 0257 % is equal to the integral of the interpolated spectrum multiplied by a triangular weight that decreases from 0258 % 1 to 0 either side of the output bin. Thus, if all input/output bins are sorted into ascending order, the 0259 % interval between two adjacent bins contains four partial triangles (a.k.a. trapeziums): two "lower" triangles 0260 % that increase with frequency and two "upper" triangles that decrease with frequency. We need to integrate the 0261 % resultant four input-output trapezium products and add the integrals onto the sum for the appropriate output bins. 0262 % Each triangle has a "post" at one end and is zero at the other end; we enumerate the triangle pairs by pairing 0263 % all input and output triangles with the first available triangle of the other type (i.e. output or input) whose 0264 % rightmost node is to the right of the entire first triangle. 0265 % 0266 % The general result for integrating the product of two trapesiums with 0267 % heights (a,b) and (c,d) over a width x is (ad+bc+2bd+2ac)*x/6 0268 % 0269 % integrate product of lower triangles whose posts (and rightmost nodes) are ix1 and jx1 0270 % 0271 msk0=(ffact>0); % posts with a non-zero magnitude 0272 msk=msk0 & (ffact(nxtr)>0); % select triangle pairs with both posts having non-zero magnitudes 0273 ix1=infall(msk); % unsorted indices of leftmost post of pair 0274 jx1=nxtr(msk); % unsorted indices of rightmost post of pair 0275 vfgx=foutin(ix1)-foutin(jx1-1); % portion of triangle attached to rightmost post that lies to the left of the leftmost post 0276 yx=min(wleft(ix1),vfgx); % integration length. Maybe more efficient: dfall=diff(fall); yx=dfall(jfall(ix1)-1) 0277 wx1=ffact(ix1).*ffact(jx1).*yx.*(wleft(ix1).*vfgx-yx.*(0.5*(wleft(ix1)+vfgx)-yx/3))./(wleft(ix1).*wleft(jx1)+(yx==0)); 0278 0279 % integrate product of upper triangles whose posts are ix2 and jx2 and whose rightmost nodes are ix2+1 and jx2+1 0280 0281 nxtu=max([nxtr(2:end)-1 0],1); % post of the upper triangle of opposite type whose rightmost end is to the right of this triangle's rightmost end 0282 msk=msk0 & (ffact(nxtu)>0); % select triangle pairs with both posts having non-zero magnitudes 0283 ix2=infall(msk); % unsorted indices of leftmost post of pair 0284 jx2=nxtu(msk); % unsorted indices of rightmost post of pair 0285 vfgx=foutin(ix2+1)-foutin(jx2); % length of left triangle to the right of the right post 0286 yx=min(wright(ix2),vfgx); % integration length 0287 yx(foutin(jx2+1)<foutin(ix2+1))=0; % zap invalid triangles where the rightmost ends are in the wrong order 0288 wx2=ffact(ix2).*ffact(jx2).*yx.^2.*((0.5*(wright(jx2)-vfgx)+yx/3))./(wright(ix2).*wright(jx2)+(yx==0)); 0289 0290 % integrate lower triangle and upper triangle that ends to its right 0291 0292 nxtu=max(nxtr-1,1); % post of the upper triangle of opposite type whose rightmost end is to the right of this triangle's post 0293 msk=msk0 & (ffact(nxtu)>0); % select triangle pairs with both posts having non-zero magnitudes 0294 ix3=infall(msk); % unsorted indices of lower triangle 0295 jx3=nxtu(msk); % unsorted indices of upper triangle 0296 vfgx=foutin(ix3)-foutin(jx3); % length of upper triangle to the left of the lower post 0297 yx=min(wleft(ix3),vfgx); % integration length 0298 yx(foutin(jx3+1)<foutin(ix3))=0; % zap invalid triangles where the rightmost ends are in the wrong order 0299 wx3=ffact(ix3).*ffact(jx3).*yx.*(wleft(ix3).*(wright(jx3)-vfgx)+yx.*(0.5*(wleft(ix3)-wright(jx3)+vfgx)-yx/3))./(wleft(ix3).*wright(jx3)+(yx==0)); 0300 0301 % integrate upper triangle and lower triangle that starts to its right 0302 0303 nxtu=[nxtr(2:end) 1]; 0304 msk=msk0 & (ffact(nxtu)>0); % select triangle pairs with both posts having non-zero magnitudes 0305 ix4=infall(msk); % unsorted indices of upper triangle 0306 jx4=nxtu(msk); % unsorted indices of lower triangle 0307 vfgx=foutin(ix4+1)-foutin(jx4-1); % length of upper triangle to the left of the lower post 0308 yx=min(wright(ix4),vfgx); % integration length 0309 wx4=ffact(ix4).*ffact(jx4).*yx.^2.*(0.5*vfgx-yx/3)./(wright(ix4).*wleft(jx4)+(yx==0)); 0310 % 0311 % now assemble the matrix 0312 % 0313 iox=sort([ix1 ix2 ix3 ix4;jx1 jx2 jx3 jx4]); % iox(1,:) are output posts, iox(2,:) are input posts 0314 msk=iox(2,:)<=(nfall+nfout)/2; % find references to negative input frequencies 0315 iox(2,msk)=(nfall+nfout+1)-iox(2,msk); % convert negative frequencies to positive 0316 % 0317 % Sort out output gains: 0318 % If output is power then output gain is 1; if output is power/Hz then output gain is 1/area of output filter 0319 % 0320 if ww(6) % ww(6)='x': if lowest filter extended to DC, we added a dummy point at 0Hz, so 0321 iox(1,iox(1,:)==2)=3; % merge lowest two output nodes 0322 end 0323 if highex % if highest filter extended, we added a dummy point at Nyquist, so 0324 iox(1,iox(1,:)==nfout-1)=nfout-2; % merge highest two output nodes 0325 end 0326 x=sparse(iox(1,:)-1-ww(6),max(iox(2,:)-nfout-nf,1),[wx1 wx2 wx3 wx4],pmq,nf); % forward transformation matrix without input/output gains 0327 gout=full(sum(x,2)); % area of each output integral 0328 goutd=sparse(1:pmq,1:pmq,(gout+(gout==0)).^(-1)); % create sparse diagonal matrix of output gains 0329 gouti=full(sum(x(:,1+ww(12):end),2)); % area of each output integral excluding DC if 'z' option given 0330 goutid=sparse(1:pmq,1:pmq,(gouti+(gouti==0)).^(-1)); % create sparse diagonal matrix of output gains 0331 % 0332 % Sort out input gains: 0333 % If input is power then input gain is 1/area; if input is power/Hz then input gain is 1 0334 % 0335 gin=fin(3:nfin)-fin(1:nfin-2); % full width of input interpolation filters 0336 gin=2*(gin+(gin==0)).^(-1); % input gain equals 1/area 0337 ginsi=repmat(1+ww(19),1,nf-2); % 's' option means all inputs except DC and Nyquist have been doubled 0338 ginsd=sparse(1:nf,1:nf,[1-ww(12) ginsi.^(-1) 1]); % ... cancel this out with additional input scaling for forward transform 0339 ginsid=sparse(1:nf,1:nf,[2*(1-ww(12)) ginsi 2]); % and back again for inverse transform 0340 gind=sparse(1:nf,1:nf,gin(end-nf+1:end)); % input gains 0341 % 0342 % Now create the x and xi matrices 0343 % 0344 switch 2*ww(16)+ww(15) 0345 case 0 % '': input and output are both power 0346 xi=ginsid*x'*goutid; 0347 x=x*(gind*ginsd); 0348 case 1 % 'd': input is power/Hz, output is power 0349 xi=(ginsid*gind)*x'*goutid; 0350 x=x*ginsd; 0351 case 2 % 'D': input is power, output is power/Hz 0352 xi=ginsid*x'; 0353 x=goutd*x*(gind*ginsd); 0354 case 3 % 'dD': input and output are both power/Hz 0355 xi=(ginsid*gind)*x'; 0356 x=goutd*x*ginsd; 0357 end 0358 if ww(20) % 'S': double outputs to include negative frequency energy 0359 x=2*x; 0360 xi=0.5*xi; 0361 end 0362 if ww(13) % 'Z': DC input is an impulse plus a diffuse component 0363 x(:,2)=x(:,2)+x(:,1)*ginsd(2,2); % power of diffuse component at DC is equal to that opf adjacent bin corrected for 's' option 0364 x(:,1)=0; % Eliminate references to DC input in forward transform only 0365 end 0366 if ww(14) % 'q': we need an extra output that replicates the DC component 0367 if ww(12) % 'z': DC input is an impulse 0368 x=[sparse(1,1,1,1,nf); x]; 0369 xi=[sparse(1,1,1,nf,1) xi]; 0370 elseif ww(13) % 'Z': DC input is an impulse plus a diffuse component 0371 x=[sparse([1 1],[1 2],[1 -ginsd(2,2)],1,nf); x]; % impulse component is DC input minus adjacent bin corrected for 's' option 0372 xi=[sparse(1,1,1,nf,1) xi]; 0373 else 0374 x=[sparse(1,nf); x]; % '': DC input is diffuse as normal 0375 xi=[sparse(nf,1) xi]; 0376 end 0377 end 0378 % 0379 % sort out the output argument options 0380 % 0381 if ~ww(5) % output cf in Hz instead of mel/erb/... 0382 cf=[zeros(1,ww(14)) mb(2:pmq+1)]; % ... and include an initial 0 if 'q' option (ww(14)==1) 0383 else % keep cf in mel/erb/... 0384 if ww(14) % 'q' (ww(14)==1): we need an extra output for the DC component 0385 if wr==1 % log-scaled so ... 0386 cf=[-Inf cf(2:p)]; % ... DC corresponds to -Inf 0387 else % not log-scaled ... 0388 cf=[0 cf(2:p)]; % ... DC corresponds to 0 0389 end 0390 else % no 'q' option (ww(14)==0) ... 0391 cf=cf(2:p+1); % ... just remove dummy end frequencies 0392 end 0393 end 0394 if ww(1) % round outputs to the centre of gravity bin 0395 sx2=sum(x,2); % sum of each row 0396 msk=full(sx2~=0); 0397 vxc=zeros(pmq,1); 0398 vxc(msk)=round((x(msk,:)*(1:nf)')./sx2(msk)); % find centre of gravity of each row 0399 x=sparse(1:pmq,vxc,sx2,pmq,nf); % put all the weight into the centre of gravity bin 0400 end 0401 il=1; % default range is entire x maxtrix 0402 ih=nf; 0403 if nargout > 3 % if il and/or ih output arguments are specified ... 0404 if nargout==4 % xi has been omitted ... 0405 msk=full(any(x>0,1)); % find non-zero columns in x 0406 else % xi output included 0407 msk=full(any(x>0,1) | any(xi>0,2)'); % find non-zero columns in x or rows in xi 0408 end 0409 il=find(msk,1); % il is first non-zero column 0410 if ~numel(il) % if x is all zeros ... 0411 il=1; % ... set il and ih to 1 0412 ih=1; 0413 elseif nargout >3 0414 ih=find(msk,1,'last'); % ih is last non-zero column 0415 end 0416 x=x(:,il:ih); % remove redundant columns from x 0417 if nargout==4 % xi has been omitted ... 0418 xi=il; % shift the il and ih outputs up by one position 0419 il=ih; 0420 else 0421 xi=xi(il:ih,:); % remove redundant rows from xi 0422 end 0423 elseif ww(23) % ww(23)='w': use whole dft 0424 x=[x sparse(p,n-nf)]; % append zeros onto x 0425 xi=[xi; xi(n-nf+1:-1:2,:)]; % reflect elements other than the DC and Nyquist 0426 end 0427 % 0428 % plot results if no output arguments or 'g','G' options given 0429 % 0430 if ~nargout || ww(21) || ww(22) % plot idealized filters 0431 ww(22)=~ww(21); % 'G' option is the default unless 'g' is specified 0432 finax=(il-1:ih-1)*df; % input frequency axis 0433 newfig=0; 0434 if ww(21) 0435 plot(finax,x(:,il:ih)'); 0436 hold on 0437 plot(finax,sum(x,1),'--k'); 0438 v_axisenlarge([-1 -1.05]); 0439 plot(repmat(mb(2:end-1),2,1),get(gca,'ylim'),':k'); 0440 hold off 0441 title(['filtbankm: mode = ''' w '''']); 0442 xlabel(['Frequency (' v_xticksi 'Hz)']); 0443 ylabel('Weight'); 0444 newfig=1; 0445 end 0446 if ww(22) 0447 if newfig 0448 figure; 0449 end 0450 imagesc(finax,1:pmq,x); 0451 axis 'xy' 0452 colorbar; 0453 hold on 0454 ylim=get(gca,'ylim'); 0455 plot(repmat(mb(2:end-1),2,1),ylim,':w'); 0456 hold off 0457 v_cblabel('Weight'); 0458 switch wr 0459 case 1 0460 type='Log-spaced'; 0461 case 2 0462 type='Erb-spaced'; 0463 case 3 0464 type='Bark-spaced'; 0465 case 4 0466 type='Mel-spaced'; 0467 otherwise 0468 type='Linear-spaced'; 0469 end 0470 ylabel([type ' Filter']); 0471 xlabel(['Frequency (' v_xticksi 'Hz)']); 0472 title(['filtbankm: mode = ''' w '''']); 0473 end 0474 0475 end