V_STFTW converts a time-domain signal into the time-frequency domain with the Short-time Fourier Transform [Y,SO,T,F]=(X,NW,M,OV) Usage: (1) [y,so]=v_stftw(x,nw); % default parameters: m='rM', ov=2 % ... % time-frequency domain processing goes here z=v_istftw(y,so); % z is the same as x except for the first and last half-frames (2) [y,so]=v_stftw(x,nw,'en',4); % Hanning window with ov=4 z=v_istftw(y,so); % z is the same as x everywhere due to 'e' option (3) [y,so]=v_stftw([],nw,'pn',4); % initialise v_stftw state variable, so io=[]; % initialise v_istftw state variable, io while true % loop forever % ... [acquire input chunk x] % acquire new chunk of input samples [y,so]=v_stftw(x,so,'p'); % 'p' option means more chunks to come % ... [process in T-F domain] % process signal in time-frequency domain [z,io]=v_istftw(y,so,io); % convert back to time domain % ... [output chunk z]; % output chunk of processed samples end (4) v_stftw(x,nw); % Plot spectrogram of first channel (5) [y1,so]=v_stftw(x(1:1000),nw,'ep'); % convert chunk 1 of 3 [y2,so]=v_stftw(x(1001:2000),so,'p'); % Options other than 'p' will be remembered [y3,so]=v_stftw(x(2001:end),so); % Omit 'p' for final chunk (which can have x=[] if desired) y=[y1; y2; y3]; % Concatenate to get complete T-F data [z1,io]=v_istftw(y(1:5,:),so); % convert back to time domain [z2,io]=v_istftw(y(6:10,:),so,io); % convert back to time domain z3=v_istftw(y(11:end,:),so,io); % omit 'io' output on final chunk z=[z1; z2; z3]; % z is the same as x everywhere due to 'e' option Inputs: x(t,...) input signal (one signal per column) nw Window length (will be rounded up to a multiple of ov) alternatively, the so output from the previous chunk's call to v_stftw m mode string including window code ov integer overlap factor; the frame hop is nw/ov. [2] nt Optional DFT length (normally >=nw). [Default is nw as modified by 'i','I','u' options] Outputs: y(tf,k,...) output STFT (frame=tf, freq=k). Number of frequencies is normally 1+nw/2 from DC to Nyquist tax sample numbers corresponding to the centre of each frame; divide by fs to get times fax normalized centre frequencies of each bin (multiply by fs for actual frequencies) so structure used in subsequent call as the nw argument to alllow processing in chunks. Also used in v_istftw The mode string, m, is a character string containing a combination of the following options and a window code (see below): 'p' x is a partial signal which will be followed by another chunk (prevents paddding of the final frame)
0001 function [y,so,tax,fax]=v_stftw(x,nw,m,ov,nt) 0002 %V_STFTW converts a time-domain signal into the time-frequency domain with the Short-time Fourier Transform [Y,SO,T,F]=(X,NW,M,OV) 0003 % Usage: (1) [y,so]=v_stftw(x,nw); % default parameters: m='rM', ov=2 0004 % % ... % time-frequency domain processing goes here 0005 % z=v_istftw(y,so); % z is the same as x except for the first and last half-frames 0006 % 0007 % (2) [y,so]=v_stftw(x,nw,'en',4); % Hanning window with ov=4 0008 % z=v_istftw(y,so); % z is the same as x everywhere due to 'e' option 0009 % 0010 % (3) [y,so]=v_stftw([],nw,'pn',4); % initialise v_stftw state variable, so 0011 % io=[]; % initialise v_istftw state variable, io 0012 % while true % loop forever 0013 % % ... [acquire input chunk x] % acquire new chunk of input samples 0014 % [y,so]=v_stftw(x,so,'p'); % 'p' option means more chunks to come 0015 % % ... [process in T-F domain] % process signal in time-frequency domain 0016 % [z,io]=v_istftw(y,so,io); % convert back to time domain 0017 % % ... [output chunk z]; % output chunk of processed samples 0018 % end 0019 % 0020 % (4) v_stftw(x,nw); % Plot spectrogram of first channel 0021 % 0022 % (5) [y1,so]=v_stftw(x(1:1000),nw,'ep'); % convert chunk 1 of 3 0023 % [y2,so]=v_stftw(x(1001:2000),so,'p'); % Options other than 'p' will be remembered 0024 % [y3,so]=v_stftw(x(2001:end),so); % Omit 'p' for final chunk (which can have x=[] if desired) 0025 % y=[y1; y2; y3]; % Concatenate to get complete T-F data 0026 % [z1,io]=v_istftw(y(1:5,:),so); % convert back to time domain 0027 % [z2,io]=v_istftw(y(6:10,:),so,io); % convert back to time domain 0028 % z3=v_istftw(y(11:end,:),so,io); % omit 'io' output on final chunk 0029 % z=[z1; z2; z3]; % z is the same as x everywhere due to 'e' option 0030 % 0031 % Inputs: x(t,...) input signal (one signal per column) 0032 % nw Window length (will be rounded up to a multiple of ov) 0033 % alternatively, the so output from the previous chunk's call to v_stftw 0034 % m mode string including window code 0035 % ov integer overlap factor; the frame hop is nw/ov. [2] 0036 % nt Optional DFT length (normally >=nw). [Default is nw as modified by 'i','I','u' options] 0037 % 0038 % Outputs: y(tf,k,...) output STFT (frame=tf, freq=k). Number of frequencies is normally 1+nw/2 from DC to Nyquist 0039 % tax sample numbers corresponding to the centre of each frame; divide by fs to get times 0040 % fax normalized centre frequencies of each bin (multiply by fs for actual frequencies) 0041 % so structure used in subsequent call as the nw argument to 0042 % alllow processing in chunks. Also used in v_istftw 0043 % 0044 % The mode string, m, is a character string containing a combination of the following options and a window code (see below): 0045 % 0046 % 'p' x is a partial signal which will be followed by another chunk (prevents paddding of the final frame) 0047 0048 % 'r' pad final frame with reflected data if necessary [default] 0049 % 'z' pad final frame with zeros if necessary 0050 % 't' truncate data to an exact number of frames 0051 % 'e' pad with samples at beginning and end of signal to include all frames with at least one signal sample 0052 % 0053 % 'i' zero-pad by a factor of 2 to increase apparent frequency resolution 0054 % 'I' zero-pad by a factor of 4 to increase apparent frequency resolution 0055 % 'u' zero-pad to round transform length up to a power of 2 to speed computation 0056 % 0057 % 's' plot spectrogram of first channel (default if no outputs) 0058 % 'S' plot mean spectra and mean power waveforms of all channels 0059 % 0060 % Code Window wmode ov Sidelobe 3dB-BW 6dB-BW Falloff Comment 0061 % 0062 % 'c' cos(1) s 2,3,5 -23dB 1.2 1.6 -12dB/oct used in MP3 0063 % 'k' kaiser(5) boq 2 -23dB 1.2 1.7 -6dB/oct used in AAC 0064 % 'm' hamming s 3,4,5 -43dB 1.3 1.8 -6dB/oct low sidelobes [default if ov>2] 0065 % 'M' hamming sq 2,3,5 -24dB 1.1 1.5 -6dB/oct sqrt Hamming [default if ov=2] 0066 % 'n' hanning s 3,4,5 -31dB 1.4 2.0 -18dB/oct rapid falloff 0067 % 'R' rectangle 1 -13dB 0.9 1.2 -6dB/oct narrow bandwidth [default if ov=1] 0068 % 'v' vorbis s 2,15 -21dB 1.3 1.8 -18dB/oct used in Vorbis 0069 % 'V' rsqvorbis sq 2 -26dB 1.1 1.5 -6dB/oct sqrt Vorbis 0070 % 0071 % where wmode is the mode string for the v_windows call. 0072 % Notes: 0073 % 0074 % (1) The scaling preserves power in a 2-sided transform so that 0075 % 0076 % p = mean(x.^2) ~=~ mean(sum(abs(yy).^2,2)) where yy=[y conj(y(:,nw-size(y,2)+1:-1:2))] 0077 % 0078 % where yy is the double-sided spectrum. This equivalence is stochastic rather than exact. 0079 % (2) For perfect reconstruction (see [3]), the overlap factor, ov, must be a multiple of one of 0080 % the values listed above in the ov column for the selected window. 0081 % (3) Using the default parameters, perfect reconstruction will not be obtained for the first and 0082 % the last frames of the signal. Using the 'e' option will add padding frames to the start and 0083 % end of the signal so that the entire signal will be reconstructed perfectly. 0084 % (4) By default, padding at the start and end of the signal will avoid introducing a discontinuity 0085 % by using a time-reversed reflection of the input signal. Alternatively, the 'z' option will 0086 % pad with zeros and the 't' option will truncate the signal to fit into an exact number of frames. 0087 % (5) To avoid temporal aliasing in the time-frequency domain , an overlap factor of ov>=4 is required for 0088 % most windows. Using an overlap factor of ov=2 (the default) halves the number of frames to process 0089 % but may result in aliasing for some signals although this is normally slight (see [1]). 0090 % 0091 % Refs: [1] J. B. Allen. Short term spectral analysis, synthesis, and modification by discrete Fourier transform. 0092 % IEEE Trans. Acoustics, Speech and Signal Processing, 25 (3): 235–238, June 1977. 0093 % [2] R. Crochiere. A weighted overlap-add method of short-time fourier analysis/synthesis. 0094 % IEEE Trans. Acoustics, Speech and Signal Processing, 28 (1): 99–102, 1980. 0095 % [3] J. Princen and A. Bradley. Analysis/synthesis filter bank design based on time domain aliasing cancellation. 0096 % IEEE Trans. Acoustics, Speech and Signal Processing, 34 (5): 1153–1161, 1986. doi: 10.1109/TASSP.1986.1164954. 0097 % 0098 % Revision History 0099 % 2022-04-10 First version 0100 persistent nw0 nt0 wi0 w0 wt wm wp 0101 if isempty(nw0) 0102 nw0=0; 0103 wt={'rectangle','hamming','hamming','cos','kaiser','hanning','rsqvorbis','vorbis'}; 0104 wm={'','sq','s','s','boq','s','sq','s'}; 0105 wp=[0 0 0 1 5 0 0 0]; 0106 end 0107 sx=size(x); 0108 if sx(1)==1 && length(sx)==2 % transpose x if it is a row vector 0109 x=x.'; 0110 sx=size(x); 0111 end 0112 nx=sx(1); % number of samples 0113 nc=prod(sx(2:end)); % total number of channels 0114 x=reshape(x,nx,nc); % make input data 2-dimensional 0115 if nargin<3 || ~numel(m) % no mode options specified 0116 m=''; % use default modes 0117 end 0118 if isstruct(nw) % nw is structure from previous call to v_stftw 0119 so=nw; % replicate into so since most fields will be identical 0120 wa=nw.wa; % analysis window 0121 nh=nw.nh; % frame hop (samples) 0122 po=nw.po; % padding option 0123 nt=nw.nt; % overwrite nt with length of transform 0124 me=nw.me; % 'e' option given 0125 ff=nw.ff; % first frame flag 0126 mf=nw.mf; 0127 if ~isempty(nw.xx) % there is a data tail from a previous call 0128 x=cat(1,nw.xx,x); % pre-append saved data 0129 end 0130 so.mx=so.mx+nx; % cumulative signal length (using original value of nx) 0131 sx=size(x); % recalculate sx (especially necessary if input argument x=[] 0132 nx=sx(1); % recalculate the number of samples 0133 nw=length(wa); % overwrite nw with the window length 0134 else 0135 if nargin<4 || ~numel(ov) 0136 ov=2; % default overlap factor 0137 else 0138 ov=round(ov); % force integer overlap factor 0139 end 0140 nh=ceil(nw/ov); % force integer hop length 0141 nw=ov*nh; % recalculate DFT length 0142 wi=find(sum(repmat(m(:),1,8)==repmat('RMmcknVv',length(m),1),1),1); % check if m specifies the window 0143 if isempty(wi) % if no window specified 0144 wi=1+(ov>1)+(ov>2); % use 'R', 'M' or 'm' 0145 end 0146 % sort out zero-padding 0147 if nargin<5 % DFT length is not explicitly specified 0148 nt=nw; % default transform length 0149 if any(m=='i') % 'i' = multiply nt by 2 for increased frequency resolution 0150 nt=2*nt; 0151 end 0152 if any(m=='I') % 'I' = multiply nt by 4 for increased frequency resolution 0153 nt=4*nt; 0154 end 0155 if any(m=='u') % 'u' = round up nt to a power of 2 for increased frequency resolution 0156 [fw,pw]=log2(nt); 0157 nt=pow2(pw-(fw==0.5)); 0158 end 0159 end 0160 me=any(m=='e'); 0161 if nw==nw0 && wi==wi0 && nt==nt0 % check if window already cached 0162 wa=w0; % use cached version if available 0163 else 0164 wa = v_windows(wt{wi},nw,wm{wi},wp(wi)); % only need the parameter for 'c' and 'k' windows 0165 wa=wa/sqrt(sum(wa.^2)*nt); % scale to preserve power 0166 nw0=nw; % save parameters of cached window 0167 nt0=nt; 0168 wi0=wi; 0169 w0=wa; % save window in cache 0170 end 0171 % sort out enframing details 0172 ff=1; % flag to indicate the first frame 0173 po=find(sum(repmat(m(:),1,3)==repmat('zrt',length(m),1),1),1); % find padding option 0174 if isempty(po) 0175 po=2; % default to po=2 ('r') 0176 end 0177 % create so structure 0178 mf=0; 0179 so.wa=wa; % save analysis window 0180 so.po=po; % save padding option for final frame 0181 so.nh=nh; % save frame hop 0182 so.nt=nt; % length of transform 0183 so.me=me; % 'e' option given 0184 so.mx=nx; % cumulative signal length 0185 so.mf=0; % zero cumulative frames output so far (excluding prefix frames) 0186 end 0187 % now do the STFT 0188 mp=any(m=='p'); % 'p' option given: this is not the last chunk 0189 f0=(1-nw/nh)*(me && ff); % initial frame: either 0 or (1-ov) 0190 if nx>0 % there are some samples 0191 if me && ~mp % include all frames that contain at least one sample 0192 f1=ceil(nx/nh)-1; % last frame that include at least one sample 0193 elseif po<3 && ~mp % ensure all samples are in at least one frame 0194 f1=max(ceil((nx-nw)/nh),0); 0195 else % only include full frames 0196 f1=floor((nx-nw)/nh); % last full frame (might be negative and/or <f0) 0197 end 0198 else % input signal is empty 0199 f1=-1; 0200 end 0201 nf=max((f1-f0+1)*(f1>=0),0); % number of frames to output 0202 if nf>0 % if there are frames to output 0203 ff=0; % turn off first frame flag 0204 indx=repmat(nh*(f0:f1).',nw,1)+reshape(repmat(0:nw-1,nf,1),nf*nw,1); % index into x 0205 mk=indx<0 | indx>nx-1; % mask for values outside 0:nx-1 0206 indx=nx+0.5-abs(mod(indx,2*nx)+0.5-nx); % reflect values outside the range 0207 y=x(indx,:); % create output 0208 if po==1 % padding option was 'z' 0209 y(mk,:)=0; % set padding values to zero 0210 end 0211 y=v_rfft(reshape(y,[nf,nw,sx(2:end)]).*repmat(wa,[nf,1,sx(2:end)]),nt,2); % perform windowed DFT 0212 else 0213 y=zeros([0,nw,sx(2:end)]); 0214 end 0215 if nargout>1 % need additional outputs 0216 if nf>0 0217 tax=(f0+mf:f1+mf)*nh+0.5*(1+nw); % frame centre times (where input samples are 1,2,...) 0218 else 0219 tax=[]; 0220 end 0221 fax=(0:size(y,2)-1)/nt; % bin centre frequencies (normalized by sample frequency) 0222 if nf>0 0223 so.xx=x((f1+1)*nh+1:end,:); % save data tail for next chunk (starts at sample nh+1 of the last frame in y) 0224 else 0225 so.xx=x; % if nf==0 then data tail is entire signal (including any previous tail) 0226 end 0227 so.ff=ff; % first frame flag 0228 so.mp=mp; % flag indicating more chunks to follow 0229 so.mf=mf+nf+f0; % add number of additional non-prefix frames output 0230 end 0231 if nf>0 && (~nargout || any(lower(m)=='s')) % no outputs or 'sS' options so plot graphs 0232 ta=(f0+mf:f1+mf)*nh+0.5*(1+nw); 0233 nv=size(y,2); 0234 fa=(0:nv-1)/nt; 0235 clf; 0236 if any(m=='S') 0237 subplot(2,1,1); 0238 yp=abs(reshape(y,[nf,nv,nc])).^2; % calculate power spectrum 0239 plot(ta,10*log10(reshape(sum(cat(2,yp,yp(:,2:floor((nt+1)/2),:)),2),nf,nc))); % add in power from negative frequencies 0240 v_axisenlarge([-1 -1.05]); 0241 xlabel('Sample Number'); 0242 ylabel('Frame power (dB)'); 0243 subplot(2,1,2); 0244 plot(fa,10*log10(reshape(mean(yp,1),nv,nc))); 0245 v_axisenlarge([-1 -1.05]); 0246 xlabel('Normalized Frequency'); 0247 ylabel('Mean Power (dB)'); 0248 else 0249 imagesc(ta,fa,20*log10(abs(y(:,:,1)))'); % plot only the first channel 0250 colorbar; 0251 axis 'xy'; 0252 ylabel('Normalized Frequency'); 0253 xlabel('Sample Number'); 0254 v_cblabel('Energy (dB)'); 0255 hold on; 0256 clim=get(gca,'clim'); 0257 set(gca,'clim',[max(clim(1),clim(2)-40) clim(2)]); 0258 ylim=get(gca,'ylim'); 0259 plot([0.5;0.5],ylim',':k'); % indicate start of signal 0260 if ~mp 0261 plot([1;1]*(so.mx+0.5),ylim',':k'); % indicate end of signal 0262 end 0263 hold off 0264 end 0265 end 0266