Description of v

V_SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,PP)

 Usage:
    (1) %%%%%%%%%%%% Simple Enhancement %%%%%%%%%%%%
        y=v_ssubmmse(x,fs);               % enhance the noisy speech using default parameters

    (2) %%%%%%%%%%%% Reading input signal in chunks %%%%%%%%%%%%
        [y1,zo]=v_ssubmmse(x(1:1000,fs);       % enhance the first chunck of x
        [y2,zo]=v_ssubmmse(x(1001:2000),zo);   % enhance the second  chunck of x
        y3=v_ssubmmse(x(1001:2000),zo);        % enhance the remainder of x outputting all samples
        y=[y1; y2; y3];                      % reassembling the chunks gives same y as example (1)

    (3) %%%%%%%%%%%% Multiple channels using the same gain %%%%%%%%%%%%
        y=v_ssubmmse([s+n s n],fs);       % enhance the noisy speech, s+n, and
                                          also apply the same gain to s and n separately

    (4) %%%%%%%%%%%% Multiple channels using maximum of gains %%%%%%%%%%%%
        pp.gw=[0 0 0 0 1];              % find the maximum of the calculated time-frequency gains
        y=v_ssubmmse([x1 x2 x3],fs,pp);   % and apply this maximum gain to each of the signals

    (5) %%%%%%%%%%%% Plot gain function %%%%%%%%%%%%
        [y,g,t,f]=v_ssubmmse(x,fs);       % save the time-frequency gain function
        imagesc(t,f,20*log10(g'));      % display the time-frequency gain
        axis 'xy';                      % origin at bottom left
        xlabel('Time (s)');             % time axis
        ylabel('Frequency (Hz)');       % frequency axis
        colorbar;                       % show a color bar
        v_cblabel('Gain dB');             % label the color bar
        set(gca,'clim',[-30,5]);        % limit the displayed range to [-30,5] dB


 Inputs:
   si(n,c,p) input speech signal with n samples in each of c channels where each channel has p planes.
             All planes will use the same set of gain functions.
   fsz       sample frequency in Hz
             Alternatively, the input state from a previous call (see below)
   pp        algorithm parameters [optional]

 Outputs:
   ss(n,c,p)   output enhanced speech. Same number of samples as si unless
               zo output is given in which case incomplete frames will be retained
   gg(t,f,c,p,length(pp.tf)) selected time-frequency values (see pp.tf below)
               note that multiple planes will only be included if pp.tf  includes 'i', 'I', 'o' or 'O'
               Note also that the tt output MUST also be specified
   tt          centre of frames (in seconds)
   ff          centre of frequency bins (in Hz)
   zo          output state (or the 2nd argument if gg,tt,ff are omitted)

 The algorithm operation is controlled by a small number of parameters:

        pp.of          % overlap factor = (fft length)/(frame increment) [2]
        pp.ti          % desired frame increment [0.016 seconds]
        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]
        pp.ta          % time const for smoothing SNR estimate [0.396 seconds]
        pp.gx          % maximum posterior SNR as a power ratio [1000 = +30dB]
        pp.gn          % min posterior SNR as a power ratio when estimating prior SNR [1 = 0dB]
        pp.gz          % min posterior SNR as a power ratio [0.001 = -30dB]
        pp.xn          % minimum prior SNR [0]
        pp.xb          % bias compensation factor for prior SNR [1]
        pp.lg          % MMSE target: 0=amplitude, 1=log amplitude, 2=perceptual Bayes [1]
        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [1]
        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]
        pp.mx          % input mixture gain [0]
        pp.gc          % maximum amplitude gain [10 = 20 dB]
        pp.rf          % round output signal to an exact number of frames [0]
        pp.gw          % multichannel gain weights: [chan-n chan-1 ave min max] default:[0 1 0 0 0]
        pp.tf          % selects one or more time-frequency planes to output in the gg() variable ['g']
                           'i' = input power spectrum
                           'I' = input complex spectrum
                           'n' = noise power spectrum
                           'z' = "posterior" SNR (i.e. (S+N)/N )
                           'x' = "prior" SNR (i.e. S/N )
                           'G' = raw gain (before clipping and multichannel weighting)
                           'g' = final gain
                           'o' = output power spectrum
                           'O' = output complex spectrum

 The applied gain is mx+(1-mx)*optgain where optgain is calculated according to [1] or [2].
 If pp.bt>=0 then optgain is first thresholded with pp.bt to produce a binary gain 0 or 1.

 The default parameters implement the original algorithm in [1,2].

 Several parameters relate to the estimation of xi, the so-called "prior SNR",

             xi=max(a*pp.xb*xu+(1-a)*max(gami-1,pp.gn-1),pp.xn);

 This is estimated as a smoothed version of 1 less than gami, the "posterior SNR"
 which is the noisy speech power divided by the noise power. This is
 clipped to a min of (pp.gn-1), smoothed using a factor "a" which corresponds to a
 time-constant of pp.ta and then clipped to a minimum of pp.xn. The
 previous value is taken to be pp.xb*xu where xu is the ratio of the
 estimated speech amplitude squared to the noise power.

 In addition it is possible to specify parameters for the noise estimation algorithm
 which implements reference [3] or [7] according to the setting of pp.ne

 Minimum statistics noise estimate [3]: pp.ne=0
        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
        pp.qeqmin    % (23): minimum value of Qeq [2]
        pp.qeqmax    % max value of Qeq per frame [14]
        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
        pp.td        % time to take minimum over [1.536 seconds]
        pp.nu        % number of subwindows to use [3]
        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]

 MMSE noise estimate [7]: pp.ne=1
        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
        pp.pnsaf    % noise probability safety value [0.01] (24)
        pp.pspri    % prior speech probability [0.5] (18)
        pp.asnr     % active SNR in dB [15] (18)
        pp.psini    % initial speech probability [0.5] (23)
        pp.tavini   % assumed speech absent time at start [0.064 seconds]

 If convenient, you can call v_specsub in chunks of arbitrary size. Thus the following are equivalent:

                   (a) y=v_ssubmmse(s,fs);

                   (b) [y1,z]=v_ssubmmse(s(1:1000),fs);
                       [y2,z]=v_ssubmmse(s(1001:2000),z);
                       y3=v_ssubmmse(s(2001:end),z);
                       y=[y1; y2; y3];

 If the number of output arguments is either 2 or 5, the last partial frame of samples will
 be retained for overlap adding with the output from the next call to v_ssubmmse().

 See also v_specsub() for an alternative gain function

 Refs:
    [1] Ephraim, Y. & Malah, D.
        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984
    [2] Ephraim, Y. & Malah, D.
        Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
        IEEE Trans Acoustics Speech and Signal Processing, 33(2):443-445, Apr 1985
    [3] Rainer Martin.
        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
    [4] O. Cappe.
        Elimination of the musical noise phenomenon with the ephraim and malah noise suppressor.
        IEEE Trans Speech Audio Processing, 2 (2): 345-349, Apr. 1994. doi: 10.1109/89.279283.
    [5] J. Erkelens, J. Jensen, and R. Heusdens.
        A data-driven approach to optimizing spectral speech enhancement methods for various error criteria.
        Speech Communication, 49: 530-541, 2007. doi: 10.1016/j.specom.2006.06.012.
    [6] R. Martin.
        Statistical methods for the enhancement of noisy speech.
        In J. Benesty, S. Makino, and J. Chen, editors,
        Speech Enhancement, chapter 3, pages 43-64. Springer-Verlag, 2005.
    [7] Gerkmann, T. & Hendriks, R. C.
        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393
    [8] Loizou, P.
        Speech enhancement based on perceptually motivated Bayesian estimators of the speech magnitude spectrum.
        IEEE Trans. Speech and Audio Processing, 13(5), 857-869, 2005.

0001 function [ss,gg,tt,ff,zo]=v_ssubmmse(si,fsz,pp)
0002 %V_SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,PP)
0003 %
0004 % Usage:
0005 %    (1) %%%%%%%%%%%% Simple Enhancement %%%%%%%%%%%%
0006 %        y=v_ssubmmse(x,fs);               % enhance the noisy speech using default parameters
0007 %
0008 %    (2) %%%%%%%%%%%% Reading input signal in chunks %%%%%%%%%%%%
0009 %        [y1,zo]=v_ssubmmse(x(1:1000,fs);       % enhance the first chunck of x
0010 %        [y2,zo]=v_ssubmmse(x(1001:2000),zo);   % enhance the second  chunck of x
0011 %        y3=v_ssubmmse(x(1001:2000),zo);        % enhance the remainder of x outputting all samples
0012 %        y=[y1; y2; y3];                      % reassembling the chunks gives same y as example (1)
0013 %
0014 %    (3) %%%%%%%%%%%% Multiple channels using the same gain %%%%%%%%%%%%
0015 %        y=v_ssubmmse([s+n s n],fs);       % enhance the noisy speech, s+n, and
0016 %                                          also apply the same gain to s and n separately
0017 %
0018 %    (4) %%%%%%%%%%%% Multiple channels using maximum of gains %%%%%%%%%%%%
0019 %        pp.gw=[0 0 0 0 1];              % find the maximum of the calculated time-frequency gains
0020 %        y=v_ssubmmse([x1 x2 x3],fs,pp);   % and apply this maximum gain to each of the signals
0021 %
0022 %    (5) %%%%%%%%%%%% Plot gain function %%%%%%%%%%%%
0023 %        [y,g,t,f]=v_ssubmmse(x,fs);       % save the time-frequency gain function
0024 %        imagesc(t,f,20*log10(g'));      % display the time-frequency gain
0025 %        axis 'xy';                      % origin at bottom left
0026 %        xlabel('Time (s)');             % time axis
0027 %        ylabel('Frequency (Hz)');       % frequency axis
0028 %        colorbar;                       % show a color bar
0029 %        v_cblabel('Gain dB');             % label the color bar
0030 %        set(gca,'clim',[-30,5]);        % limit the displayed range to [-30,5] dB
0031 %
0032 %
0033 % Inputs:
0034 %   si(n,c,p) input speech signal with n samples in each of c channels where each channel has p planes.
0035 %             All planes will use the same set of gain functions.
0036 %   fsz       sample frequency in Hz
0037 %             Alternatively, the input state from a previous call (see below)
0038 %   pp        algorithm parameters [optional]
0039 %
0040 % Outputs:
0041 %   ss(n,c,p)   output enhanced speech. Same number of samples as si unless
0042 %               zo output is given in which case incomplete frames will be retained
0043 %   gg(t,f,c,p,length(pp.tf)) selected time-frequency values (see pp.tf below)
0044 %               note that multiple planes will only be included if pp.tf  includes 'i', 'I', 'o' or 'O'
0045 %               Note also that the tt output MUST also be specified
0046 %   tt          centre of frames (in seconds)
0047 %   ff          centre of frequency bins (in Hz)
0048 %   zo          output state (or the 2nd argument if gg,tt,ff are omitted)
0049 %
0050 % The algorithm operation is controlled by a small number of parameters:
0051 %
0052 %        pp.of          % overlap factor = (fft length)/(frame increment) [2]
0053 %        pp.ti          % desired frame increment [0.016 seconds]
0054 %        pp.ri          % set to 1 to round ti to the nearest power of 2 samples [0]
0055 %        pp.ta          % time const for smoothing SNR estimate [0.396 seconds]
0056 %        pp.gx          % maximum posterior SNR as a power ratio [1000 = +30dB]
0057 %        pp.gn          % min posterior SNR as a power ratio when estimating prior SNR [1 = 0dB]
0058 %        pp.gz          % min posterior SNR as a power ratio [0.001 = -30dB]
0059 %        pp.xn          % minimum prior SNR [0]
0060 %        pp.xb          % bias compensation factor for prior SNR [1]
0061 %        pp.lg          % MMSE target: 0=amplitude, 1=log amplitude, 2=perceptual Bayes [1]
0062 %        pp.ne          % noise estimation: 0=min statistics, 1=MMSE [1]
0063 %        pp.bt          % threshold for binary gain or -1 for continuous gain [-1]
0064 %        pp.mx          % input mixture gain [0]
0065 %        pp.gc          % maximum amplitude gain [10 = 20 dB]
0066 %        pp.rf          % round output signal to an exact number of frames [0]
0067 %        pp.gw          % multichannel gain weights: [chan-n chan-1 ave min max] default:[0 1 0 0 0]
0068 %        pp.tf          % selects one or more time-frequency planes to output in the gg() variable ['g']
0069 %                           'i' = input power spectrum
0070 %                           'I' = input complex spectrum
0071 %                           'n' = noise power spectrum
0072 %                           'z' = "posterior" SNR (i.e. (S+N)/N )
0073 %                           'x' = "prior" SNR (i.e. S/N )
0074 %                           'G' = raw gain (before clipping and multichannel weighting)
0075 %                           'g' = final gain
0076 %                           'o' = output power spectrum
0077 %                           'O' = output complex spectrum
0078 %
0079 % The applied gain is mx+(1-mx)*optgain where optgain is calculated according to [1] or [2].
0080 % If pp.bt>=0 then optgain is first thresholded with pp.bt to produce a binary gain 0 or 1.
0081 %
0082 % The default parameters implement the original algorithm in [1,2].
0083 %
0084 % Several parameters relate to the estimation of xi, the so-called "prior SNR",
0085 %
0086 %             xi=max(a*pp.xb*xu+(1-a)*max(gami-1,pp.gn-1),pp.xn);
0087 %
0088 % This is estimated as a smoothed version of 1 less than gami, the "posterior SNR"
0089 % which is the noisy speech power divided by the noise power. This is
0090 % clipped to a min of (pp.gn-1), smoothed using a factor "a" which corresponds to a
0091 % time-constant of pp.ta and then clipped to a minimum of pp.xn. The
0092 % previous value is taken to be pp.xb*xu where xu is the ratio of the
0093 % estimated speech amplitude squared to the noise power.
0094 %
0095 % In addition it is possible to specify parameters for the noise estimation algorithm
0096 % which implements reference [3] or [7] according to the setting of pp.ne
0097 %
0098 % Minimum statistics noise estimate [3]: pp.ne=0
0099 %        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
0100 %        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
0101 %        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
0102 %        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
0103 %        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
0104 %        pp.qeqmin    % (23): minimum value of Qeq [2]
0105 %        pp.qeqmax    % max value of Qeq per frame [14]
0106 %        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
0107 %        pp.td        % time to take minimum over [1.536 seconds]
0108 %        pp.nu        % number of subwindows to use [3]
0109 %        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
0110 %        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]
0111 %
0112 % MMSE noise estimate [7]: pp.ne=1
0113 %        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
0114 %        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
0115 %        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
0116 %        pp.pnsaf    % noise probability safety value [0.01] (24)
0117 %        pp.pspri    % prior speech probability [0.5] (18)
0118 %        pp.asnr     % active SNR in dB [15] (18)
0119 %        pp.psini    % initial speech probability [0.5] (23)
0120 %        pp.tavini   % assumed speech absent time at start [0.064 seconds]
0121 %
0122 % If convenient, you can call v_specsub in chunks of arbitrary size. Thus the following are equivalent:
0123 %
0124 %                   (a) y=v_ssubmmse(s,fs);
0125 %
0126 %                   (b) [y1,z]=v_ssubmmse(s(1:1000),fs);
0127 %                       [y2,z]=v_ssubmmse(s(1001:2000),z);
0128 %                       y3=v_ssubmmse(s(2001:end),z);
0129 %                       y=[y1; y2; y3];
0130 %
0131 % If the number of output arguments is either 2 or 5, the last partial frame of samples will
0132 % be retained for overlap adding with the output from the next call to v_ssubmmse().
0133 %
0134 % See also v_specsub() for an alternative gain function
0135 %
0136 % Refs:
0137 %    [1] Ephraim, Y. & Malah, D.
0138 %        Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
0139 %        IEEE Trans Acoustics Speech and Signal Processing, 32(6):1109-1121, Dec 1984
0140 %    [2] Ephraim, Y. & Malah, D.
0141 %        Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
0142 %        IEEE Trans Acoustics Speech and Signal Processing, 33(2):443-445, Apr 1985
0143 %    [3] Rainer Martin.
0144 %        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
0145 %        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
0146 %    [4] O. Cappe.
0147 %        Elimination of the musical noise phenomenon with the ephraim and malah noise suppressor.
0148 %        IEEE Trans Speech Audio Processing, 2 (2): 345-349, Apr. 1994. doi: 10.1109/89.279283.
0149 %    [5] J. Erkelens, J. Jensen, and R. Heusdens.
0150 %        A data-driven approach to optimizing spectral speech enhancement methods for various error criteria.
0151 %        Speech Communication, 49: 530-541, 2007. doi: 10.1016/j.specom.2006.06.012.
0152 %    [6] R. Martin.
0153 %        Statistical methods for the enhancement of noisy speech.
0154 %        In J. Benesty, S. Makino, and J. Chen, editors,
0155 %        Speech Enhancement, chapter 3, pages 43-64. Springer-Verlag, 2005.
0156 %    [7] Gerkmann, T. & Hendriks, R. C.
0157 %        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
0158 %        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393
0159 %    [8] Loizou, P.
0160 %        Speech enhancement based on perceptually motivated Bayesian estimators of the speech magnitude spectrum.
0161 %        IEEE Trans. Speech and Audio Processing, 13(5), 857-869, 2005.
0162 
0163 % Bugs/suggestions:
0164 %   (1) sort out behaviour when si() is a matrix rather than a vector
0165 %
0166 %      Copyright (C) Mike Brookes 2004-2017
0167 %      Version: $Id: v_ssubmmse.m 10865 2018-09-21 17:22:45Z dmb $
0168 %
0169 %   VOICEBOX is a MATLAB toolbox for speech processing.
0170 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0171 %
0172 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0173 %   This program is free software; you can redistribute it and/or modify
0174 %   it under the terms of the GNU Lesser General Public License as published by
0175 %   the Free Software Foundation; either version 3 of the License, or
0176 %   (at your option) any later version.
0177 %
0178 %   This program is distributed in the hope that it will be useful,
0179 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0180 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0181 %   GNU Lesser General Public License for more details.
0182 %
0183 %   You can obtain a copy of the GNU Lesser General Public License from
0184 %   https://www.gnu.org/licenses/ .
0185 %
0186 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0187 persistent cc kk
0188 if ~numel(kk)           % if precomputed constants don't exist yet
0189     kk=sqrt(2*pi);      % sqrt(8)*Gamma(1.5)
0190     cc=sqrt(2/pi);      % sqrt(2)/Gamma(0.5)
0191 end
0192 szs=size(si);
0193 if szs(1)==1 && szs(2)>1 && (~isstruct(fsz) || ~size(fsz.si,1) || size(fsz.si,2)==1)
0194     si=si'; % flip if only one channel and size(si,1)==1
0195     szs=size(si);
0196 end
0197 nc=szs(2);    % number of channels
0198 if length(szs)>2
0199     np=szs(3);  % number of planes
0200 else
0201     np=1;
0202 end
0203 ncp=nc*np;
0204 if isstruct(fsz)
0205     fs=fsz.fs;          % sample frequency
0206     qq=fsz.qq;          % parameter structure
0207     qp=fsz.qp;
0208     ze=fsz.ze;          % saved structure for noise estimation
0209     nrs=size(fsz.si,1);         % number of samples saved from last time
0210     s=zeros(nrs+szs(1),nc*np);     % allocate space for speech
0211     s(1:nrs,:)=fsz.si;          % preappend the saved samples
0212     s(nrs+1:end,:)=reshape(si,[],ncp);  % append the new samples
0213 else
0214     fs=fsz;     % sample frequency
0215     s=si;
0216     % default algorithm constants
0217     
0218     qq.of=2;        % overlap factor = (fft length)/(frame increment)
0219     qq.ti=16e-3;    % desired frame increment (16 ms)
0220     qq.ri=0;        % round ni to the nearest power of 2
0221     qq.ta=0.396;    % Time const for smoothing SNR estimate = -tinc/log(0.98) from [1]
0222     qq.gx=1000;     % maximum posterior SNR = 30dB
0223     qq.gn=1;        % min posterior SNR as a power ratio when estimating prior SNR [1]
0224     qq.gz=0.001;    % min posterior SNR as a power ratio [0.001 = -30dB]
0225     qq.xn=0;        % minimum prior SNR = -Inf dB
0226     qq.xb=1;        % bias compensation factor for prior SNR [1]
0227     qq.lg=1;        % use log-domain estimator by default
0228     qq.ne=1;        % noise estimation: 0=min statistics, 1=MMSE [1]
0229     qq.bt=-1;       % suppress binary masking
0230     qq.mx=0;        % no input mixing
0231     qq.gc=10;       % maximum amplitude gain [10 = 20 dB]
0232     qq.gw=[0 1 0 0 0];  % multichannel gain weights: [chan-n chan-1 ave min max]
0233     qq.tf='g';      % output the gain time-frequency plane by default
0234     qq.rf=0;
0235     if nargin>=3 && ~isempty(pp)
0236         qp=pp;      % save for v_estnoisem call
0237         qqn=fieldnames(qq);
0238         for i=1:length(qqn)
0239             if isfield(pp,qqn{i})
0240                 qq.(qqn{i})=pp.(qqn{i});
0241             end
0242         end
0243         if length(qq.gw)<5
0244             qq.gw(5)=0;
0245         end
0246         qq.gw=qq.gw(1:5)/sum(qq.gw(1:5)); % ensure gain weights are normaized and trim to length 5
0247     else
0248         qp=struct;  % make an empty structure for v_estnoisem
0249     end
0250 end
0251 % derived algorithm constants
0252 if qq.ri
0253     ni=pow2(nextpow2(qq.ti*fs*sqrt(0.5)));
0254 else
0255     ni=round(qq.ti*fs);    % frame increment in samples
0256 end
0257 tinc=ni/fs;         % true frame increment time
0258 a=exp(-tinc/qq.ta); % SNR smoothing coefficient
0259 gx=qq.gx;           % max posterior SNR as a power ratio
0260 gz=qq.gz;           % min posterior SNR as a power ratio
0261 xn=qq.xn;           % floor for prior SNR, xi
0262 ne=qq.ne;           % noise estimation: 0=min statistics, 1=MMSE [0]
0263 gn1=max(qq.gn-1,0); % floor for posterior SNR when estimating prior SNR
0264 xb=qq.xb;
0265 tf=qq.tf;
0266 
0267 % calculate power spectrum in frames
0268 
0269 no=round(qq.of);                      % integer overlap factor
0270 nf=ni*no;                           % fft length
0271 w=sqrt(hamming(nf+1))'; w(end)=[];  % for now always use sqrt hamming window
0272 w=w/sqrt(sum(w(1:ni:nf).^2));       % normalize to give overall gain of 1
0273 rf=qq.rf || nargout==2 || nargout==5;   % rf=1 if we need to round down to an exact number of frames
0274 if rf>0                             % set flag for call to v_enframe
0275     rfm='';                         % truncate input to an exact number of frames
0276 else
0277     rfm='r';                        % reflect final few input samples to fill up final frame
0278 end
0279 [y,tt]=v_enframe(s(:,1),w,ni,rfm);    % v_enframe channel 1
0280 tt=tt/fs;                           % frame times
0281 yf=v_rfft(y,nf,2);                    % complex spectrum of input speech
0282 [nr,nf2]=size(yf);                  % nr = number of frames in this chunk
0283 nf2nc=nf2*nc;                       % number of gains to calculate per frame
0284 nf2ncp=nf2*ncp;                     % number of STFT values per frame
0285 yf(1,1,ncp)=yf(1,1);                % enlarge yf to cope with nc*np channels
0286 for ic=2:ncp                        % loop for each additional channel
0287     yf(:,:,ic)=v_rfft(v_enframe(s(:,ic),w,ni,rfm),nf,2);   % complex spectrum of channel ic
0288 end
0289 yf=reshape(yf,nr,nf2ncp);           % concatenate all the channels
0290 tfe='nzxG';                         % tf entries that require enhancement on all channels
0291 nzxG=nargout>2 && ~isempty(tf) && any(any(tf(ones(length(tfe),1),:) == tfe(ones(length(tf),1),:)')); % tf parameter includes 'n','z','x' or 'G'
0292 nce= 1 + (nc-1)*(qq.gw(2)<1 || nzxG); % channels to enhance
0293 nf2nce=nf2*nce;                     % total number of frequency bins for enhancement
0294 if nc > nce                         % only need the gain from channel 1
0295     yp=yf(:,1:nf2).*conj(yf(:,1:nf2));  % power spectrum of channel 1 of input speech
0296 else
0297     yp=yf(:,1:nf2nc).*conj(yf(:,1:nf2nc));  % power spectrum of all channels but only plane 1 of input speech
0298 end
0299 ff=(0:nf2-1)*fs/nf;                 % frequency axis
0300 if isstruct(fsz)                    % check if we are following a previous call
0301     if ne>0                         % check noise estimation method
0302         [dp,ze]=v_estnoiseg(yp,ze);   % estimate the noise using MMSE in all frequency bins of nce channels
0303     else
0304         [dp,ze]=v_estnoisem(yp,ze);   % estimate the noise using minimum statistics
0305     end
0306     ssv=fsz.ssv;                    % saved output samples from previous call for overlap adding
0307     xu=fsz.xu;                      % saved unsmoothed SNR
0308 else
0309     if ne>0                         % check noise estimation method
0310         [dp,ze]=v_estnoiseg(yp,tinc,qp);    % estimate the noise using MMSE
0311     else
0312         [dp,ze]=v_estnoisem(yp,tinc,qp);    % estimate the noise using minimum statistics
0313     end
0314     ssv=zeros(ni*(no-1),ncp);       % dummy saved output samples from previous call for overlap adding
0315     xu=1;                           % dummy unsmoothed SNR from previous frame %%%%%%%%%%% check dimensions
0316 end
0317 
0318 ss=[]; % in case of no data frames
0319 gg=[]; % in case of no data frames or empty tf
0320 if nr>0
0321     gam=max(min(yp./dp,gx),gz);        % gamma = posterior SNR
0322     g=zeros(nr,nf2nce);                 % create space for gain matrix
0323     x=zeros(nr,nf2nce);                 % create space for prior SNR
0324     switch qq.lg
0325         case 0                      % use amplitude domain estimator from [1]
0326             for i=1:nr              % loop for each frame
0327                 gami=gam(i,:);
0328                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0329                 v=0.5*xi.*gami./(1+xi);    % note that this equals 0.5*vk in [1]
0330                 gi=(0.277+2*v)./gami;     % accurate to 0.02 dB for v>0.5
0331                 mv=v<0.5;
0332                 if any(mv)
0333                     vmv=v(mv);
0334                     gi(mv)=kk*sqrt(vmv).*((0.5+vmv).*besseli(0,vmv)+vmv.*besseli(1,vmv))./(gami(mv).*exp(vmv));
0335                 end
0336                 g(i,:)=gi;           % save gain for later
0337                 x(i,:)=xi;          % save prior SNR
0338                 xu=gami.*gi.^2;       % unsmoothed prior SNR
0339             end
0340         case 2                        % perceptually motivated estimator from [8]
0341             for i=1:nr              % loop for each frame
0342                 gami=gam(i,:);
0343                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0344                 v=0.5*xi.*gami./(1+xi);    % note that this is 0.5*vk in [8]
0345                 gi=cc*sqrt(v).*exp(v)./(gami.*besseli(0,v));
0346                 g(i,:)=gi;              % save gain for later
0347                 x(i,:)=xi;              % save prior SNR
0348                 xu=gami.*gi.^2;         % unsmoothed prior SNR
0349             end
0350         otherwise                       % use log domain estimator from [2]
0351             for i=1:nr              % loop for each frame
0352                 gami=gam(i,:);
0353                 xi=max(a*xb*xu+(1-a)*max(gami-1,gn1),xn);  % prior SNR
0354                 xir=xi./(1+xi);
0355                 gi=xir.*exp(0.5*expint(xir.*gami));
0356                 g(i,:)=gi;                 % save gain for later
0357                 x(i,:)=xi;              % save prior SNR
0358                 xu=gami.*gi.^2;         % unsmoothed prior SNR
0359             end
0360     end
0361     if nc>nce                           % only calculated the gain from channel 1
0362         g=repmat(g,1,nc);               % replicate channel 1 gain for all channels
0363     else
0364         gr=g;                           % save raw gain in case we need to output it
0365         gwx=find(qq.gw(2:end));         % find which channel-independent gain terms to include
0366         if nc>1 && ~isempty(gwx)
0367             g3=reshape(g,[nr,nf2,nc]);
0368             gx=zeros(nr,nf2);              % space for channel-independent gain component
0369             for ig=gwx                  % loop for each gain component with a non-zero weight
0370                 switch ig
0371                     case 1                      % chan 1
0372                         gx=g3(:,:,1)*qq.gw(2);
0373                     case 2                      % average
0374                         gx=gx+mean(g3,3)*qq.gw(3);
0375                     case 3                      % min
0376                         gx=gx+min(g3,[],3)*qq.gw(4);
0377                     case 4                      % max
0378                         gx=gx+max(g3,[],3)*qq.gw(5);
0379                 end
0380             end
0381             g=g*qq.gw(1)+repmat(gx,1,nc); % add in the channel-independent gain components
0382         end
0383     end
0384     g=min(qq.mx+(1-qq.mx)*g,qq.gc);                 % mix in some of the input and clip to qq.gc
0385     if qq.bt>=0
0386         g=g>qq.bt;                      % apply binary masking threshold
0387     end
0388     se=v_irfft(reshape(yf.*repmat(g,1,np),[nr nf2 nc*np]),nf,2).*repmat(w,[nr 1 nc*np]);     % inverse dft and apply output window
0389     ss=zeros(ni*(nr+no-1),nc*np,no);                % space for overlapped output speech
0390     ss(1:ni*(no-1),:,end)=ssv;                      % insert saved output speech (already overlap-added)
0391     for i=1:no                                      % insert frames into no columns for overlap-adding
0392         nm=nf*(1+floor((nr-i)/no));                 % number of samples in this column
0393         ss(1+(i-1)*ni:nm+(i-1)*ni,:,i)=reshape(permute(se(i:no:nr,:,:),[2 1 3]),nm,nc*np); % concatenate every no'th frame
0394     end
0395     ss=reshape(sum(ss,3),[],nc,np);                 % perform overlap add and split into planes
0396     if nargout>2 && ~isempty(tf)
0397         if (any(lower(tf)=='i') || any(lower(tf)=='o'))                         % tf entries that require all planes
0398             npg=np;  % number of planes in gg array
0399         else
0400             npg=1;
0401         end
0402         gg=zeros(nr,nf2nc*npg,length(tf));  % make space
0403         if ncp>nce && (any(tf=='i') || any(tf=='o'))
0404             yp=yf.*conj(yf);        % calculate power spectrum for all channels/planes
0405         end
0406         for i=1:length(tf)
0407             switch tf(i)
0408                 case 'i'            % 'i' = input power spectrum (nc*np)
0409                     gg(:,:,i)=yp;
0410                 case 'I'            % 'I' = input complex spectrum (nc*np)
0411                     gg(:,:,i)=yf;
0412                 case 'n'            % 'n' = noise power spectrum (nc)
0413                     gg(:,:,i)=repmat(dp,1,npg);
0414                 case 'z'            % 'z' = posterior SNR (i.e. (S+N)/N ) (nc)
0415                     gg(:,:,i)=repmat(gam,1,npg);
0416                 case 'x'            % 'x' = prior SNR (nc)
0417                     gg(:,:,i)=repmat(x,1,npg);
0418                 case 'G'            % 'G' = raw gain (nc)
0419                     gg(:,:,i)=repmat(gr,1,npg);
0420                 case 'g'            % 'g' = final gain (nc)
0421                     gg(:,:,i)=repmat(g,1,npg);
0422                 case 'o'            % 'o' = output power spectrum (nc*np)
0423                     gg(:,:,i)=yp.*g.^2;
0424                 case 'O'            % 'O' = output complex spectrum (nc*np)
0425                     gg(:,:,i)=yf.*g;
0426             end
0427         end
0428         gg=reshape(gg,[nr,nf2,nc,npg,length(tf)]); % make it 5D
0429     end
0430 end % if ~nr
0431 if nargout==2 || nargout==5         % we have a zo output argument
0432     if nr
0433         zo.ssv=ss(end-ni*(no-1)+1:end,:);    % save the last no-1 hops for overlap-adding next time
0434         ss(end-ni*(no-1)+1:end,:)=[];        % only output the frames that are completed
0435     else
0436         zo.ssv=ssv;                 % if no new frames just keep the old tail
0437     end
0438     zo.si=s(length(ss)+1:end,:);      % save the tail end of the input speech signal
0439     zo.fs=fs;                       % save sample frequency
0440     zo.qq=qq;                       % save local parameters
0441     zo.qp=qp;                       % save v_estnoisem parameters
0442     zo.ze=ze;                       % save state of noise estimation
0443     zo.xu=xu;
0444     if nargout==2
0445         gg=zo;                      % 2nd of two arguments is zo
0446     end
0447 elseif rf==0
0448     ss=ss(1:size(s,1),:,:);             % trim to the correct length if not an exact number of frames
0449 end
0450 if ~nargout && nr>0
0451     ffax=ff/1000;
0452     ax=zeros(4,1);
0453     ax(1)=subplot(223);
0454     imagesc(tt,ffax,20*log10(g(:,1:nf2))');
0455     colorbar;
0456     axis('xy');
0457     title(sprintf('Filter Gain (dB): ta=%.2g',qq.ta));
0458     xlabel('Time (s)');
0459     ylabel('Frequency (kHz)');
0460     
0461     ax(2)=subplot(222);
0462     imagesc(tt,ffax,10*log10(yp(:,1:nf2))');
0463     colorbar;
0464     axis('xy');
0465     title('Noisy Speech (dB)');
0466     xlabel('Time (s)');
0467     ylabel('Frequency (kHz)');
0468     
0469     ax(3)=subplot(224);
0470     imagesc(tt,ffax,10*log10(yp(:,1:nf2).*g(:,1:nf2).^2)');
0471     colorbar;
0472     axis('xy');
0473     title('Enhanced Speech (dB)');
0474     xlabel('Time (s)');
0475     ylabel('Frequency (kHz)');
0476     
0477     ax(4)=subplot(221);
0478     imagesc(tt,ffax,10*log10(dp(:,1:nf2))');
0479     colorbar;
0480     axis('xy');
0481     title('Noise Estimate (dB)');
0482     xlabel('Time (s)');
0483     ylabel('Frequency (kHz)');
0484     linkaxes(ax);
0485 end

v_ssubmmse

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE