v_stftw

PURPOSE ^

V_STFTW converts a time-domain signal into the time-frequency domain with the Short-time Fourier Transform [Y,SO,T,F]=(X,NW,M,OV)

SYNOPSIS ^

function [y,so,tax,fax]=v_stftw(x,nw,m,ov,nt)

DESCRIPTION ^

V_STFTW converts a time-domain signal into the time-frequency domain with the Short-time Fourier Transform [Y,SO,T,F]=(X,NW,M,OV)
  Usage:  (1) [y,so]=v_stftw(x,nw); % default parameters: m='rM', ov=2
              % ...                 % time-frequency domain processing goes here
              z=v_istftw(y,so);     % z is the same as x except for the first and last half-frames

          (2) [y,so]=v_stftw(x,nw,'en',4);         % Hanning window with ov=4
              z=v_istftw(y,so);                    % z is the same as x everywhere due to 'e' option

          (3) [y,so]=v_stftw([],nw,'pn',4);          % initialise v_stftw state variable, so
              io=[];                               % initialise v_istftw state variable, io
              while true                           % loop forever
                % ... [acquire input chunk x]         % acquire new chunk of input samples
                [y,so]=v_stftw(x,so,'p');           % 'p' option means more chunks to come
                % ... [process in T-F domain]      % process signal in time-frequency domain
                [z,io]=v_istftw(y,so,io);             % convert back to time domain
                % ... [output chunk z];            % output chunk of processed samples
              end

          (4) v_stftw(x,nw);                        % Plot spectrogram of first channel

          (5) [y1,so]=v_stftw(x(1:1000),nw,'ep');      % convert chunk 1 of 3
              [y2,so]=v_stftw(x(1001:2000),so,'p'); % Options other than 'p' will be remembered
              [y3,so]=v_stftw(x(2001:end),so);      % Omit 'p' for final chunk (which can have x=[] if desired)
              y=[y1; y2; y3];                       % Concatenate to get complete T-F data
              [z1,io]=v_istftw(y(1:5,:),so);          % convert back to time domain
              [z2,io]=v_istftw(y(6:10,:),so,io);    % convert back to time domain
              z3=v_istftw(y(11:end,:),so,io);         % omit 'io' output on final chunk
              z=[z1; z2; z3];                       % z is the same as x everywhere due to 'e' option

  Inputs: x(t,...)    input signal (one signal per column)
          nw       Window length (will be rounded up to a multiple of ov)
                       alternatively, the so output from the previous chunk's call to v_stftw
          m        mode string including window code
          ov       integer overlap factor; the frame hop is nw/ov. [2]
          nt       Optional DFT length (normally >=nw). [Default is nw as modified by 'i','I','u' options]

 Outputs: y(tf,k,...)  output STFT (frame=tf, freq=k). Number of frequencies is normally 1+nw/2 from DC to Nyquist
          tax          sample numbers corresponding to the centre of each frame; divide by fs to get times
          fax          normalized centre frequencies of each bin (multiply by fs for actual frequencies)
          so           structure used in subsequent call as the nw argument to
                       alllow processing in chunks. Also used in v_istftw

 The mode string, m, is a character string containing a combination of the following options and a window code (see below):

             'p'   x is a partial signal which will be followed by another chunk (prevents paddding of the final frame)

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [y,so,tax,fax]=v_stftw(x,nw,m,ov,nt)
0002 %V_STFTW converts a time-domain signal into the time-frequency domain with the Short-time Fourier Transform [Y,SO,T,F]=(X,NW,M,OV)
0003 %  Usage:  (1) [y,so]=v_stftw(x,nw); % default parameters: m='rM', ov=2
0004 %              % ...                 % time-frequency domain processing goes here
0005 %              z=v_istftw(y,so);     % z is the same as x except for the first and last half-frames
0006 %
0007 %          (2) [y,so]=v_stftw(x,nw,'en',4);         % Hanning window with ov=4
0008 %              z=v_istftw(y,so);                    % z is the same as x everywhere due to 'e' option
0009 %
0010 %          (3) [y,so]=v_stftw([],nw,'pn',4);          % initialise v_stftw state variable, so
0011 %              io=[];                               % initialise v_istftw state variable, io
0012 %              while true                           % loop forever
0013 %                % ... [acquire input chunk x]         % acquire new chunk of input samples
0014 %                [y,so]=v_stftw(x,so,'p');           % 'p' option means more chunks to come
0015 %                % ... [process in T-F domain]      % process signal in time-frequency domain
0016 %                [z,io]=v_istftw(y,so,io);             % convert back to time domain
0017 %                % ... [output chunk z];            % output chunk of processed samples
0018 %              end
0019 %
0020 %          (4) v_stftw(x,nw);                        % Plot spectrogram of first channel
0021 %
0022 %          (5) [y1,so]=v_stftw(x(1:1000),nw,'ep');      % convert chunk 1 of 3
0023 %              [y2,so]=v_stftw(x(1001:2000),so,'p'); % Options other than 'p' will be remembered
0024 %              [y3,so]=v_stftw(x(2001:end),so);      % Omit 'p' for final chunk (which can have x=[] if desired)
0025 %              y=[y1; y2; y3];                       % Concatenate to get complete T-F data
0026 %              [z1,io]=v_istftw(y(1:5,:),so);          % convert back to time domain
0027 %              [z2,io]=v_istftw(y(6:10,:),so,io);    % convert back to time domain
0028 %              z3=v_istftw(y(11:end,:),so,io);         % omit 'io' output on final chunk
0029 %              z=[z1; z2; z3];                       % z is the same as x everywhere due to 'e' option
0030 %
0031 %  Inputs: x(t,...)    input signal (one signal per column)
0032 %          nw       Window length (will be rounded up to a multiple of ov)
0033 %                       alternatively, the so output from the previous chunk's call to v_stftw
0034 %          m        mode string including window code
0035 %          ov       integer overlap factor; the frame hop is nw/ov. [2]
0036 %          nt       Optional DFT length (normally >=nw). [Default is nw as modified by 'i','I','u' options]
0037 %
0038 % Outputs: y(tf,k,...)  output STFT (frame=tf, freq=k). Number of frequencies is normally 1+nw/2 from DC to Nyquist
0039 %          tax          sample numbers corresponding to the centre of each frame; divide by fs to get times
0040 %          fax          normalized centre frequencies of each bin (multiply by fs for actual frequencies)
0041 %          so           structure used in subsequent call as the nw argument to
0042 %                       alllow processing in chunks. Also used in v_istftw
0043 %
0044 % The mode string, m, is a character string containing a combination of the following options and a window code (see below):
0045 %
0046 %             'p'   x is a partial signal which will be followed by another chunk (prevents paddding of the final frame)
0047 
0048 %             'r'   pad final frame with reflected data if necessary [default]
0049 %             'z'   pad final frame with zeros if necessary
0050 %             't'   truncate data to an exact number of frames
0051 %             'e'   pad with samples at beginning and end of signal to include all frames with at least one signal sample
0052 %
0053 %             'i'   zero-pad by a factor of 2 to increase apparent frequency resolution
0054 %             'I'   zero-pad by a factor of 4 to increase apparent frequency resolution
0055 %             'u'   zero-pad to round transform length up to a power of 2 to speed computation
0056 %
0057 %             's'   plot spectrogram of first channel (default if no outputs)
0058 %             'S'   plot mean spectra and mean power waveforms of all channels
0059 %
0060 %            Code Window      wmode  ov   Sidelobe  3dB-BW  6dB-BW  Falloff    Comment
0061 %
0062 %             'c'   cos(1)     s   2,3,5   -23dB     1.2     1.6  -12dB/oct   used in MP3
0063 %             'k'   kaiser(5)  boq   2     -23dB     1.2     1.7   -6dB/oct   used in AAC
0064 %             'm'   hamming    s   3,4,5   -43dB     1.3     1.8   -6dB/oct   low sidelobes [default if ov>2]
0065 %             'M'   hamming    sq  2,3,5   -24dB     1.1     1.5   -6dB/oct   sqrt Hamming  [default if ov=2]
0066 %             'n'   hanning    s   3,4,5   -31dB     1.4     2.0  -18dB/oct   rapid falloff
0067 %             'R'   rectangle        1     -13dB     0.9     1.2   -6dB/oct   narrow bandwidth [default if ov=1]
0068 %             'v'   vorbis     s    2,15   -21dB     1.3     1.8  -18dB/oct   used in Vorbis
0069 %             'V'   rsqvorbis  sq    2     -26dB     1.1     1.5   -6dB/oct   sqrt Vorbis
0070 %
0071 %             where wmode is the mode string for the v_windows call.
0072 % Notes:
0073 %
0074 %  (1) The scaling preserves power in a 2-sided transform so that
0075 %
0076 %        p = mean(x.^2) ~=~ mean(sum(abs(yy).^2,2)) where yy=[y conj(y(:,nw-size(y,2)+1:-1:2))]
0077 %
0078 %      where yy is the double-sided spectrum. This equivalence is stochastic rather than exact.
0079 %  (2) For perfect reconstruction (see [3]), the overlap factor, ov, must be a multiple of one of
0080 %      the values listed above in the ov column for the selected window.
0081 %  (3) Using the default parameters, perfect reconstruction will not be obtained for the first and
0082 %      the last frames of the signal. Using the 'e' option will add padding frames to the start and
0083 %      end of the signal so that the entire signal will be reconstructed perfectly.
0084 %  (4) By default, padding at the start and end of the signal will avoid introducing a discontinuity
0085 %      by using a time-reversed reflection of the input signal. Alternatively, the 'z' option will
0086 %      pad with zeros and the 't' option will truncate the signal to fit into an exact number of frames.
0087 %  (5) To avoid temporal aliasing in the time-frequency domain , an overlap factor of ov>=4 is required for
0088 %      most windows. Using an overlap factor of ov=2 (the default) halves the number of frames to process
0089 %      but may result in aliasing for some signals although this is normally slight (see [1]).
0090 %
0091 % Refs: [1] J. B. Allen. Short term spectral analysis, synthesis, and modification by discrete Fourier transform.
0092 %           IEEE Trans. Acoustics, Speech and Signal Processing, 25 (3): 235–238, June 1977.
0093 %       [2]    R. Crochiere. A weighted overlap-add method of short-time fourier analysis/synthesis.
0094 %           IEEE Trans. Acoustics, Speech and Signal Processing, 28 (1): 99–102, 1980.
0095 %       [3]    J. Princen and A. Bradley. Analysis/synthesis filter bank design based on time domain aliasing cancellation.
0096 %           IEEE Trans. Acoustics, Speech and Signal Processing, 34 (5): 1153–1161, 1986. doi: 10.1109/TASSP.1986.1164954.
0097 %
0098 % Revision History
0099 %    2022-04-10 First version
0100 persistent nw0 nt0 wi0 w0 wt wm wp
0101 if isempty(nw0)
0102     nw0=0;
0103     wt={'rectangle','hamming','hamming','cos','kaiser','hanning','rsqvorbis','vorbis'};
0104     wm={'','sq','s','s','boq','s','sq','s'};
0105     wp=[0 0 0 1 5 0 0 0];
0106 end
0107 sx=size(x);
0108 if sx(1)==1 && length(sx)==2 % transpose x if it is a row vector
0109     x=x.';
0110     sx=size(x);
0111 end
0112 nx=sx(1);                   % number of samples
0113 nc=prod(sx(2:end));         % total number of channels
0114 x=reshape(x,nx,nc);         % make input data 2-dimensional
0115 if nargin<3 || ~numel(m)    % no mode options specified
0116     m='';                   % use default modes
0117 end
0118 if isstruct(nw)             % nw is structure from previous call to v_stftw
0119     so=nw;                  % replicate into so since most fields will be identical
0120     wa=nw.wa;               % analysis window
0121     nh=nw.nh;               % frame hop (samples)
0122     po=nw.po;               % padding option
0123     nt=nw.nt;               % overwrite nt with length of transform
0124     me=nw.me;               % 'e' option given
0125     ff=nw.ff;               % first frame flag
0126     mf=nw.mf;
0127     if ~isempty(nw.xx)        % there is a data tail from a previous call
0128         x=cat(1,nw.xx,x);     % pre-append saved data
0129     end
0130     so.mx=so.mx+nx;          % cumulative signal length (using original value of nx)
0131     sx=size(x);             % recalculate sx (especially necessary if input argument x=[]
0132     nx=sx(1);               % recalculate the number of samples
0133     nw=length(wa);          % overwrite nw with the window length
0134 else
0135     if nargin<4 || ~numel(ov)
0136         ov=2;               % default overlap factor
0137     else
0138         ov=round(ov);         % force integer overlap factor
0139     end
0140     nh=ceil(nw/ov);         % force integer hop length
0141     nw=ov*nh;               % recalculate DFT length
0142     wi=find(sum(repmat(m(:),1,8)==repmat('RMmcknVv',length(m),1),1),1); % check if m specifies the window
0143     if isempty(wi)          % if no window specified
0144         wi=1+(ov>1)+(ov>2); % use 'R', 'M' or 'm'
0145     end
0146     % sort out zero-padding
0147     if nargin<5             % DFT length is not explicitly specified
0148         nt=nw;                  % default transform length
0149         if any(m=='i')          % 'i' = multiply nt by 2 for increased frequency resolution
0150             nt=2*nt;
0151         end
0152         if any(m=='I')          % 'I' = multiply nt by 4 for increased frequency resolution
0153             nt=4*nt;
0154         end
0155         if any(m=='u')          % 'u' = round up nt to a power of 2 for increased frequency resolution
0156             [fw,pw]=log2(nt);
0157             nt=pow2(pw-(fw==0.5));
0158         end
0159     end
0160     me=any(m=='e');
0161     if nw==nw0 && wi==wi0 && nt==nt0    % check if window already cached
0162         wa=w0;                          % use cached version if available
0163     else
0164         wa = v_windows(wt{wi},nw,wm{wi},wp(wi)); % only need the parameter for 'c' and 'k' windows
0165         wa=wa/sqrt(sum(wa.^2)*nt);      % scale to preserve power
0166         nw0=nw;                         % save parameters of cached window
0167         nt0=nt;
0168         wi0=wi;
0169         w0=wa;                          % save window in cache
0170     end
0171     % sort out enframing details
0172     ff=1;                               % flag to indicate the first frame
0173     po=find(sum(repmat(m(:),1,3)==repmat('zrt',length(m),1),1),1); % find padding option
0174     if isempty(po)
0175         po=2;                           % default to po=2 ('r')
0176     end
0177     % create so structure
0178     mf=0;
0179     so.wa=wa;                       % save analysis window
0180     so.po=po;                       % save padding option for final frame
0181     so.nh=nh;                       % save frame hop
0182     so.nt=nt;                       % length of transform
0183     so.me=me;                       % 'e' option given
0184     so.mx=nx;                       % cumulative signal length
0185     so.mf=0;                        % zero cumulative frames output so far (excluding prefix frames)
0186 end
0187 % now do the STFT
0188 mp=any(m=='p');                     % 'p' option given: this is not the last chunk
0189 f0=(1-nw/nh)*(me && ff);            % initial frame: either 0 or (1-ov)
0190 if nx>0                             % there are some samples
0191     if me && ~mp                    % include all frames that contain at least one sample
0192         f1=ceil(nx/nh)-1;           % last frame that include at least one sample
0193     elseif po<3 && ~mp              % ensure all samples are in at least one frame
0194         f1=max(ceil((nx-nw)/nh),0);
0195     else                            % only include full frames
0196         f1=floor((nx-nw)/nh);       % last full frame (might be negative and/or <f0)
0197     end
0198 else                                % input signal is empty
0199     f1=-1;
0200 end
0201 nf=max((f1-f0+1)*(f1>=0),0);        % number of frames to output
0202 if nf>0                             % if there are frames to output
0203     ff=0;                           % turn off first frame flag
0204     indx=repmat(nh*(f0:f1).',nw,1)+reshape(repmat(0:nw-1,nf,1),nf*nw,1); % index into x
0205     mk=indx<0 | indx>nx-1;          % mask for values outside 0:nx-1
0206     indx=nx+0.5-abs(mod(indx,2*nx)+0.5-nx); % reflect values outside the range
0207     y=x(indx,:);                    % create output
0208     if po==1                        % padding option was 'z'
0209         y(mk,:)=0;                  % set padding values to zero
0210     end
0211     y=v_rfft(reshape(y,[nf,nw,sx(2:end)]).*repmat(wa,[nf,1,sx(2:end)]),nt,2);  % perform windowed DFT
0212 else
0213     y=zeros([0,nw,sx(2:end)]);
0214 end
0215 if nargout>1                        % need additional outputs
0216     if nf>0
0217         tax=(f0+mf:f1+mf)*nh+0.5*(1+nw);      % frame centre times (where input samples are 1,2,...)
0218     else
0219         tax=[];
0220     end
0221     fax=(0:size(y,2)-1)/nt;           % bin centre frequencies (normalized by sample frequency)
0222     if nf>0
0223         so.xx=x((f1+1)*nh+1:end,:);    % save data tail for next chunk (starts at sample nh+1 of the last frame in y)
0224     else
0225         so.xx=x;               % if nf==0 then data tail is entire signal (including any previous tail)
0226     end
0227     so.ff=ff;                  % first frame flag
0228     so.mp=mp;                  % flag indicating more chunks to follow
0229     so.mf=mf+nf+f0;            % add number of additional non-prefix frames output
0230 end
0231 if nf>0 && (~nargout || any(lower(m)=='s'))     % no outputs or 'sS' options so plot graphs
0232     ta=(f0+mf:f1+mf)*nh+0.5*(1+nw);
0233     nv=size(y,2);
0234     fa=(0:nv-1)/nt;
0235     clf;
0236     if any(m=='S')
0237         subplot(2,1,1);
0238         yp=abs(reshape(y,[nf,nv,nc])).^2;       % calculate power spectrum
0239         plot(ta,10*log10(reshape(sum(cat(2,yp,yp(:,2:floor((nt+1)/2),:)),2),nf,nc))); % add in power from negative frequencies
0240         v_axisenlarge([-1 -1.05]);
0241         xlabel('Sample Number');
0242         ylabel('Frame power (dB)');
0243         subplot(2,1,2);
0244         plot(fa,10*log10(reshape(mean(yp,1),nv,nc)));
0245         v_axisenlarge([-1 -1.05]);
0246         xlabel('Normalized Frequency');
0247         ylabel('Mean Power (dB)');
0248     else
0249         imagesc(ta,fa,20*log10(abs(y(:,:,1)))'); % plot only the first channel
0250         colorbar;
0251         axis 'xy';
0252         ylabel('Normalized Frequency');
0253         xlabel('Sample Number');
0254         v_cblabel('Energy (dB)');
0255         hold on;
0256         clim=get(gca,'clim');
0257         set(gca,'clim',[max(clim(1),clim(2)-40) clim(2)]);
0258         ylim=get(gca,'ylim');
0259         plot([0.5;0.5],ylim',':k');             % indicate start of signal
0260         if ~mp
0261             plot([1;1]*(so.mx+0.5),ylim',':k'); % indicate end of signal
0262         end
0263         hold off
0264     end
0265 end
0266

Generated by m2html © 2003