Home > voicebox > ssubmmse.m

ssubmmse

PURPOSE ^

SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,P)

SYNOPSIS ^

function [ss,gg,tt,ff,zo]=ssubmmse(si,fsz,pp)

DESCRIPTION ^

SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,P)

 Usage: y=ssubmmse(x,fs);   % enhance the speech using default parameters

 Inputs:
   si      input speech signal
   fsz     sample frequency in Hz
           Alternatively, the input state from a previous call (see below)
   pp      algorithm parameters [optional]

 Outputs:
   ss        output enhanced speech
   gg(t,f,i) selected time-frequency values (see pp.tf below)
   tt        centre of frames (in seconds)
   ff        centre of frequency bins (in Hz)
   zo        output state (or the 2nd argument if gg,tt,ff are omitted)

 The algorithm operation is controlled by a small number of parameters:

        pp.of          % overlap factor = (fft length)/(frame increment) [2]
        pp.ti          % desired frame increment [0.016 seconds]
        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]
        pp.ta          % time const for smoothing SNR estimate [0.396 seconds]
        pp.gx          % maximum posterior SNR as a power ratio [1000 = +30dB]
        pp.gn          % min posterior SNR as a power ratio when estimating prior SNR [1 = 0dB]
        pp.gz          % min posterior SNR as a power ratio [0.001 = -30dB]
        pp.xn          % minimum prior SNR [0]
        pp.xb          % bias compensation factor for prior SNR [1]
        pp.lg          % MMSE target: 0=amplitude, 1=log amplitude, 2=perceptual Bayes [1]
        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]
        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]
        pp.mx          % input mixture gain [0]
        pp.rf          % round output signal to an exact number of frames [0]
        pp.tf          % selects time-frequency planes to output in the gg() variable ['g']
                           'i' = input power spectrum
                           'I' = input complex spectrum
                           'n' = noise power spectrum
                           'z' = "posterior" SNR (i.e. (S+N)/N )
                           'x' = "prior" SNR (i.e. S/N )
                           'g' = gain
                           'o' = output power spectrum
                           'O' = output complex spectrum

 The applied gain is mx+(1-mx)*optgain where optgain is calculated according to [1] or [2].
 If pp.bt>=0 then optgain is first thresholded with pp.bt to produce a binary gain 0 or 1.

 The default parameters implement the original algorithm in [1,2].

 Several parameters relate to the estimation of xi, the so-called "prior SNR",

             xi=max(a*pp.xb*xu+(1-a)*max(gami-1,pp.gn-1),pp.xn);

 This is estimated as a smoothed version of 1 less than gami, the "posterior SNR"
 which is the noisy speech power divided by the noise power. This is
 clipped to a min of (pp.gn-1), smoothed using a factor "a" which corresponds to a
 time-constant of pp.ta and then clipped to a minimum of pp.xn. The
 previous value is taken to be pp.xb*xu where xu is the ratio of the
 estimated speech amplitude squared to the noise power.

 In addition it is possible to specify parameters for the noise estimation algorithm
 which implements reference [3] or [7] according to the setting of pp.ne

 Minimum statistics noise estimate [3]: pp.ne=0
        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
        pp.qeqmin    % (23): minimum value of Qeq [2]
        pp.qeqmax    % max value of Qeq per frame [14]
        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
        pp.td        % time to take minimum over [1.536 seconds]
        pp.nu        % number of subwindows to use [3]
        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]

 MMSE noise estimate [7]: pp.ne=1
        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
        pp.pnsaf    % noise probability safety value [0.01] (24)
        pp.pspri    % prior speech probability [0.5] (18)
        pp.asnr     % active SNR in dB [15] (18)
        pp.psini    % initial speech probability [0.5] (23)
        pp.tavini   % assumed speech absent time at start [0.064 seconds]

 If convenient, you can call specsub in chunks of arbitrary size. Thus the following are equivalent:

                   (a) y=ssubmmse(s,fs);

                   (b) [y1,z]=ssubmmse(s(1:1000),fs);
                       [y2,z]=ssubmmse(s(1001:2000),z);
                       y3=ssubmmse(s(2001:end),z);
                       y=[y1; y2; y3];

 If the number of output arguments is either 2 or 5, the last partial frame of samples will
 be retained for overlap adding with the output from the next call to ssubmmse().

 See also specsub() for an alternative gain function

 Refs:
    [1] Ephraim, Y. & Malah, D.
        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984
    [2] Ephraim, Y. & Malah, D.
        Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
        IEEE Trans Acoustics Speech and Signal Processing, 33(2):443-445, Apr 1985
    [3] Rainer Martin.
        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
    [4] O. Cappe.
        Elimination of the musical noise phenomenon with the ephraim and malah noise suppressor.
        IEEE Trans Speech Audio Processing, 2 (2): 345349, Apr. 1994. doi: 10.1109/89.279283.
    [5] J. Erkelens, J. Jensen, and R. Heusdens.
        A data-driven approach to optimizing spectral speech enhancement methods for various error criteria.
        Speech Communication, 49: 530541, 2007. doi: 10.1016/j.specom.2006.06.012.
    [6] R. Martin.
        Statistical methods for the enhancement of noisy speech.
        In J. Benesty, S. Makino, and J. Chen, editors,
        Speech Enhancement, chapter 3, pages 4364. Springer-Verlag, 2005.
    [7] Gerkmann, T. & Hendriks, R. C.
        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393
    [8] Loizou, P.
        Speech enhancement based on perceptually motivated Bayesian estimators of the speech magnitude spectrum.
        IEEE Trans. Speech and Audio Processing, 13(5), 857-869, 2005.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [ss,gg,tt,ff,zo]=ssubmmse(si,fsz,pp)
0002 %SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,P)
0003 %
0004 % Usage: y=ssubmmse(x,fs);   % enhance the speech using default parameters
0005 %
0006 % Inputs:
0007 %   si      input speech signal
0008 %   fsz     sample frequency in Hz
0009 %           Alternatively, the input state from a previous call (see below)
0010 %   pp      algorithm parameters [optional]
0011 %
0012 % Outputs:
0013 %   ss        output enhanced speech
0014 %   gg(t,f,i) selected time-frequency values (see pp.tf below)
0015 %   tt        centre of frames (in seconds)
0016 %   ff        centre of frequency bins (in Hz)
0017 %   zo        output state (or the 2nd argument if gg,tt,ff are omitted)
0018 %
0019 % The algorithm operation is controlled by a small number of parameters:
0020 %
0021 %        pp.of          % overlap factor = (fft length)/(frame increment) [2]
0022 %        pp.ti          % desired frame increment [0.016 seconds]
0023 %        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]
0024 %        pp.ta          % time const for smoothing SNR estimate [0.396 seconds]
0025 %        pp.gx          % maximum posterior SNR as a power ratio [1000 = +30dB]
0026 %        pp.gn          % min posterior SNR as a power ratio when estimating prior SNR [1 = 0dB]
0027 %        pp.gz          % min posterior SNR as a power ratio [0.001 = -30dB]
0028 %        pp.xn          % minimum prior SNR [0]
0029 %        pp.xb          % bias compensation factor for prior SNR [1]
0030 %        pp.lg          % MMSE target: 0=amplitude, 1=log amplitude, 2=perceptual Bayes [1]
0031 %        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [0]
0032 %        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]
0033 %        pp.mx          % input mixture gain [0]
0034 %        pp.rf          % round output signal to an exact number of frames [0]
0035 %        pp.tf          % selects time-frequency planes to output in the gg() variable ['g']
0036 %                           'i' = input power spectrum
0037 %                           'I' = input complex spectrum
0038 %                           'n' = noise power spectrum
0039 %                           'z' = "posterior" SNR (i.e. (S+N)/N )
0040 %                           'x' = "prior" SNR (i.e. S/N )
0041 %                           'g' = gain
0042 %                           'o' = output power spectrum
0043 %                           'O' = output complex spectrum
0044 %
0045 % The applied gain is mx+(1-mx)*optgain where optgain is calculated according to [1] or [2].
0046 % If pp.bt>=0 then optgain is first thresholded with pp.bt to produce a binary gain 0 or 1.
0047 %
0048 % The default parameters implement the original algorithm in [1,2].
0049 %
0050 % Several parameters relate to the estimation of xi, the so-called "prior SNR",
0051 %
0052 %             xi=max(a*pp.xb*xu+(1-a)*max(gami-1,pp.gn-1),pp.xn);
0053 %
0054 % This is estimated as a smoothed version of 1 less than gami, the "posterior SNR"
0055 % which is the noisy speech power divided by the noise power. This is
0056 % clipped to a min of (pp.gn-1), smoothed using a factor "a" which corresponds to a
0057 % time-constant of pp.ta and then clipped to a minimum of pp.xn. The
0058 % previous value is taken to be pp.xb*xu where xu is the ratio of the
0059 % estimated speech amplitude squared to the noise power.
0060 %
0061 % In addition it is possible to specify parameters for the noise estimation algorithm
0062 % which implements reference [3] or [7] according to the setting of pp.ne
0063 %
0064 % Minimum statistics noise estimate [3]: pp.ne=0
0065 %        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
0066 %        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
0067 %        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
0068 %        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
0069 %        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
0070 %        pp.qeqmin    % (23): minimum value of Qeq [2]
0071 %        pp.qeqmax    % max value of Qeq per frame [14]
0072 %        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
0073 %        pp.td        % time to take minimum over [1.536 seconds]
0074 %        pp.nu        % number of subwindows to use [3]
0075 %        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
0076 %        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]
0077 %
0078 % MMSE noise estimate [7]: pp.ne=1
0079 %        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
0080 %        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
0081 %        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
0082 %        pp.pnsaf    % noise probability safety value [0.01] (24)
0083 %        pp.pspri    % prior speech probability [0.5] (18)
0084 %        pp.asnr     % active SNR in dB [15] (18)
0085 %        pp.psini    % initial speech probability [0.5] (23)
0086 %        pp.tavini   % assumed speech absent time at start [0.064 seconds]
0087 %
0088 % If convenient, you can call specsub in chunks of arbitrary size. Thus the following are equivalent:
0089 %
0090 %                   (a) y=ssubmmse(s,fs);
0091 %
0092 %                   (b) [y1,z]=ssubmmse(s(1:1000),fs);
0093 %                       [y2,z]=ssubmmse(s(1001:2000),z);
0094 %                       y3=ssubmmse(s(2001:end),z);
0095 %                       y=[y1; y2; y3];
0096 %
0097 % If the number of output arguments is either 2 or 5, the last partial frame of samples will
0098 % be retained for overlap adding with the output from the next call to ssubmmse().
0099 %
0100 % See also specsub() for an alternative gain function
0101 %
0102 % Refs:
0103 %    [1] Ephraim, Y. & Malah, D.
0104 %        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
0105 %        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984
0106 %    [2] Ephraim, Y. & Malah, D.
0107 %        Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
0108 %        IEEE Trans Acoustics Speech and Signal Processing, 33(2):443-445, Apr 1985
0109 %    [3] Rainer Martin.
0110 %        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
0111 %        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
0112 %    [4] O. Cappe.
0113 %        Elimination of the musical noise phenomenon with the ephraim and malah noise suppressor.
0114 %        IEEE Trans Speech Audio Processing, 2 (2): 345349, Apr. 1994. doi: 10.1109/89.279283.
0115 %    [5] J. Erkelens, J. Jensen, and R. Heusdens.
0116 %        A data-driven approach to optimizing spectral speech enhancement methods for various error criteria.
0117 %        Speech Communication, 49: 530541, 2007. doi: 10.1016/j.specom.2006.06.012.
0118 %    [6] R. Martin.
0119 %        Statistical methods for the enhancement of noisy speech.
0120 %        In J. Benesty, S. Makino, and J. Chen, editors,
0121 %        Speech Enhancement, chapter 3, pages 4364. Springer-Verlag, 2005.
0122 %    [7] Gerkmann, T. & Hendriks, R. C.
0123 %        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
0124 %        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393
0125 %    [8] Loizou, P.
0126 %        Speech enhancement based on perceptually motivated Bayesian estimators of the speech magnitude spectrum.
0127 %        IEEE Trans. Speech and Audio Processing, 13(5), 857-869, 2005.
0128 
0129 % Bugs/suggestions:
0130 %   (1) sort out behaviour when si() is a matrix rather than a vector
0131 %
0132 %      Copyright (C) Mike Brookes 2004-2011
0133 %      Version: $Id: ssubmmse.m 5488 2014-11-26 16:17:52Z dmb $
0134 %
0135 %   VOICEBOX is a MATLAB toolbox for speech processing.
0136 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0137 %
0138 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0139 %   This program is free software; you can redistribute it and/or modify
0140 %   it under the terms of the GNU General Public License as published by
0141 %   the Free Software Foundation; either version 2 of the License, or
0142 %   (at your option) any later version.
0143 %
0144 %   This program is distributed in the hope that it will be useful,
0145 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0146 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0147 %   GNU General Public License for more details.
0148 %
0149 %   You can obtain a copy of the GNU General Public License from
0150 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0151 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0152 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0153 persistent cc kk
0154 if ~numel(kk)
0155     kk=sqrt(2*pi);      % sqrt(8)*Gamma(1.5) - required constant
0156     cc=sqrt(2/pi);      %sqrt(2)/Gamma(0.5)
0157 end
0158 if numel(si)>length(si)
0159     error('Input speech signal must be a vector not a matrix');
0160 end
0161 if isstruct(fsz)
0162     fs=fsz.fs;
0163     qq=fsz.qq;
0164     qp=fsz.qp;
0165     ze=fsz.ze;
0166     s=zeros(length(fsz.si)+length(si(:)),1); % allocate space for speech
0167     s(1:length(fsz.si))=fsz.si;
0168     s(length(fsz.si)+1:end)=si(:);
0169 else
0170     fs=fsz;     % sample frequency
0171     s=si(:);
0172     % default algorithm constants
0173     
0174     qq.of=2;        % overlap factor = (fft length)/(frame increment)
0175     qq.ti=16e-3;    % desired frame increment (16 ms)
0176     qq.ri=0;        % round ni to the nearest power of 2
0177     qq.ta=0.396;    % Time const for smoothing SNR estimate = -tinc/log(0.98) from [1]
0178     qq.gx=1000;     % maximum posterior SNR = 30dB
0179     qq.gn=1;        % min posterior SNR as a power ratio when estimating prior SNR [1]
0180     qq.gz=0.001;    % min posterior SNR as a power ratio [0.001 = -30dB]
0181     qq.xn=0;        % minimum prior SNR = -Inf dB
0182     qq.xb=1;        % bias compensation factor for prior SNR [1]
0183     qq.lg=1;        % use log-domain estimator by default
0184     qq.ne=0;        % noise estimation: 0=min statistics, 1=MMSE [0]
0185     qq.bt=-1;       % suppress binary masking
0186     qq.mx=0;        % no input mixing
0187     qq.tf='g';      % output the gain time-frequency plane by default
0188     qq.rf=0;
0189     if nargin>=3 && ~isempty(pp)
0190         qp=pp;      % save for estnoisem call
0191         qqn=fieldnames(qq);
0192         for i=1:length(qqn)
0193             if isfield(pp,qqn{i})
0194                 qq.(qqn{i})=pp.(qqn{i});
0195             end
0196         end
0197     else
0198         qp=struct;  % make an empty structure
0199     end
0200 end
0201 % derived algorithm constants
0202 if qq.ri
0203     ni=pow2(nextpow2(qq.ti*fs*sqrt(0.5)));
0204 else
0205     ni=round(qq.ti*fs);    % frame increment in samples
0206 end
0207 tinc=ni/fs;         % true frame increment time
0208 a=exp(-tinc/qq.ta); % SNR smoothing coefficient
0209 gx=qq.gx;           % max posterior SNR as a power ratio
0210 gz=qq.gz;           % min posterior SNR as a power ratio
0211 xn=qq.xn;           % floor for prior SNR, xi
0212 ne=qq.ne;           % noise estimation: 0=min statistics, 1=MMSE [0]
0213 gn1=max(qq.gn-1,0); % floor for posterior SNR when estimating prior SNR
0214 xb=qq.xb;
0215 tf=qq.tf;
0216 rf=qq.rf || nargout==2 || nargout==5;            % round down to an exact number of frames
0217 
0218 % calculate power spectrum in frames
0219 
0220 no=round(qq.of);                      % integer overlap factor
0221 nf=ni*no;                           % fft length
0222 w=sqrt(hamming(nf+1))'; w(end)=[];  % for now always use sqrt hamming window
0223 w=w/sqrt(sum(w(1:ni:nf).^2));       % normalize to give overall gain of 1
0224 if rf>0
0225     rfm='';                         % truncated input to an exact number of frames
0226 else
0227     rfm='r';
0228 end
0229 [y,tt]=enframe(s,w,ni,rfm);
0230 tt=tt/fs;                           % frame times
0231 yf=rfft(y,nf,2);
0232 yp=yf.*conj(yf);                    % power spectrum of input speech
0233 [nr,nf2]=size(yp);                  % number of frames
0234 ff=(0:nf2-1)*fs/nf;
0235 if isstruct(fsz)
0236     if ne>0
0237         [dp,ze]=estnoiseg(yp,ze);       % estimate the noise using MMSE
0238     else
0239         [dp,ze]=estnoisem(yp,ze);       % estimate the noise using minimum statistics
0240     end
0241     ssv=fsz.ssv;
0242     xu=fsz.xu;                      % saved unsmoothed SNR
0243 else
0244     if ne>0
0245         [dp,ze]=estnoiseg(yp,tinc,qp);    % estimate the noise using MMSE
0246     else
0247         [dp,ze]=estnoisem(yp,tinc,qp);    % estimate the noise using minimum statistics
0248     end
0249     ssv=zeros(ni*(no-1),1);            % dummy saved overlap
0250     xu=1;                           % dummy unsmoothed SNR from previous frame
0251 end
0252 if ~nr                                 % no data frames
0253     ss=[];
0254     gg=[];
0255 else
0256     gam=max(min(yp./dp,gx),gz);     % gamma = posterior SNR
0257     g=zeros(nr,nf2);                % create space for gain matrix
0258     x=zeros(nr,nf2);                % create space for prior SNR
0259     switch qq.lg
0260         case 0                      % use amplitude domain estimator from [1]
0261             for i=1:nr
0262                 gami=gam(i,:);
0263                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0264                 v=0.5*xi.*gami./(1+xi);    % note that this is 0.5*vk in [1]
0265                 gi=(0.277+2*v)./gami;     % accurate to 0.02 dB for v>0.5
0266                 mv=v<0.5;
0267                 if any(mv)
0268                     vmv=v(mv);
0269                     gi(mv)=kk*sqrt(vmv).*((0.5+vmv).*besseli(0,vmv)+vmv.*besseli(1,vmv))./(gami(mv).*exp(vmv));
0270                 end
0271                 g(i,:)=gi;              % save gain for later
0272                 x(i,:)=xi;              % save prior SNR
0273                 xu=gami.*gi.^2;         % unsmoothed prior SNR
0274             end
0275         case 2                          % perceptually motivated estimator from [8]
0276             for i=1:nr
0277                 gami=gam(i,:);
0278                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0279                 v=0.5*xi.*gami./(1+xi);    % note that this is 0.5*vk in [8]
0280                 gi=cc*sqrt(v).*exp(v)./(gami.*besseli(0,v));
0281                 g(i,:)=gi;              % save gain for later
0282                 x(i,:)=xi;              % save prior SNR
0283                 xu=gami.*gi.^2;         % unsmoothed prior SNR
0284             end
0285         otherwise                       % use log domain estimator from [2]
0286             for i=1:nr
0287                 gami=gam(i,:);
0288                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0289                 xir=xi./(1+xi);
0290                 gi=xir.*exp(0.5*expint(xir.*gami));
0291                 g(i,:)=gi;                 % save gain for later
0292                 x(i,:)=xi;              % save prior SNR
0293                 xu=gami.*gi.^2;         % unsmoothed prior SNR
0294             end
0295     end
0296     if qq.bt>=0
0297         g=g>qq.bt;
0298     end
0299     g=qq.mx+(1-qq.mx)*g;                    % mix in some of the input
0300     se=(irfft((yf.*g).',nf).').*repmat(w,nr,1);     % inverse dft and apply output window
0301     ss=zeros(ni*(nr+no-1),no);                      % space for overlapped output speech
0302     ss(1:ni*(no-1),end)=ssv;
0303     for i=1:no
0304         nm=nf*(1+floor((nr-i)/no));         % number of samples in this set
0305         ss(1+(i-1)*ni:nm+(i-1)*ni,i)=reshape(se(i:no:nr,:)',nm,1);
0306     end
0307     ss=sum(ss,2);
0308     if nargout>2 && ~isempty(tf)
0309         gg=zeros(nr,nf2,length(tf));  % make space
0310         for i=1:length(tf)
0311             switch tf(i)
0312                 case 'i'            % 'i' = input power spectrum
0313                     gg(:,:,i)=yp;
0314                 case 'I'            % 'i' = input power spectrum
0315                     gg(:,:,i)=yf;
0316                 case 'n'            % 'n' = noise power spectrum
0317                     gg(:,:,i)=dp;
0318                 case 'z'            % 'z' = posterior SNR (i.e. (S+N)/N )
0319                     gg(:,:,i)=gam;
0320                 case 'x'            % 'x' = prior SNR
0321                     gg(:,:,i)=x;
0322                 case 'g'            % 'g' = gain
0323                     gg(:,:,i)=g;
0324                 case 'o'            % 'o' = output power spectrum
0325                     gg(:,:,i)=yp.*g.^2;
0326                 case 'O'            % 'o' = output power spectrum
0327                     gg(:,:,i)=yf.*g;
0328             end
0329         end
0330     end
0331 end
0332 if nargout==2 || nargout==5
0333     if nr
0334         zo.ssv=ss(end-ni*(no-1)+1:end);    % save the output tail for next time
0335         ss(end-ni*(no-1)+1:end)=[];        % only output the frames that are completed
0336     else
0337         zo.ssv=ssv;  %
0338     end
0339     zo.si=s(length(ss)+1:end);      % save the tail end of the input speech signal
0340     zo.fs=fs;                       % save sample frequency
0341     zo.qq=qq;                       % save local parameters
0342     zo.qp=qp;                       % save estnoisem parameters
0343     zo.ze=ze;                       % save state of noise estimation
0344     zo.xu=xu;
0345     if nargout==2
0346         gg=zo;                      % 2nd of two arguments is zo
0347     end
0348 elseif rf==0
0349     ss=ss(1:length(s));             % trim to the correct length if not an exact number of frames
0350 end
0351 if ~nargout && nr>0
0352     ffax=ff/1000;
0353     ax=zeros(4,1);
0354     ax(1)=subplot(223);
0355     imagesc(tt,ffax,20*log10(g)');
0356     colorbar;
0357     axis('xy');
0358     title(sprintf('Filter Gain (dB): ta=%.2g',qq.ta));
0359     xlabel('Time (s)');
0360     ylabel('Frequency (kHz)');
0361     
0362     ax(2)=subplot(222);
0363     imagesc(tt,ffax,10*log10(yp)');
0364     colorbar;
0365     axis('xy');
0366     title('Noisy Speech (dB)');
0367     xlabel('Time (s)');
0368     ylabel('Frequency (kHz)');
0369     
0370     ax(3)=subplot(224);
0371     imagesc(tt,ffax,10*log10(yp.*g.^2)');
0372     colorbar;
0373     axis('xy');
0374     title('Enhanced Speech (dB)');
0375     xlabel('Time (s)');
0376     ylabel('Frequency (kHz)');
0377     
0378     ax(4)=subplot(221);
0379     imagesc(tt,ffax,10*log10(dp)');
0380     colorbar;
0381     axis('xy');
0382     title('Noise Estimate (dB)');
0383     xlabel('Time (s)');
0384     ylabel('Frequency (kHz)');
0385     linkaxes(ax);
0386 end

Generated on Tue 10-Oct-2017 08:30:10 by m2html © 2003