Description of v

V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)

 Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]
           ovf=2;                  % overlap factor
           f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2);  % enframe with hanning window
           f=f.*conj(f);           % convert to power spectrum
           x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum

 Inputs:
   yf      input power spectra (one row per frame)
   tz      frame increment in seconds
           Alternatively, the input state from a previous call (see below)
   pp      algorithm parameters [optional]

 Outputs:
   x       estimated noise power spectra (one row per frame)
   zo      output state

 The algorithm parameters are defined in reference [1] from which equation
 numbers are given in parentheses. They are as follows:

        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
        pp.pnsaf    % noise probability safety value [0.01] (24)
        pp.pspri    % prior speech probability [0.5] (18)
        pp.asnr     % active SNR in dB [15] (18)
        pp.psini    % initial speech probability [0.5] (23)
        pp.tavini   % assumed speech absent time at start [0.064 seconds]

 If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent:

                   (a) dp=v_estnoiseg(yp(1:300),tinc);

                   (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc);
                       [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z);
                       [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z);

0001 function [x,zo]=v_estnoiseg(yf,tz,pp)
0002 %V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp)
0003 %
0004 % Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]
0005 %           ovf=2;                  % overlap factor
0006 %           f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2);  % enframe with hanning window
0007 %           f=f.*conj(f);           % convert to power spectrum
0008 %           x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum
0009 %
0010 % Inputs:
0011 %   yf      input power spectra (one row per frame)
0012 %   tz      frame increment in seconds
0013 %           Alternatively, the input state from a previous call (see below)
0014 %   pp      algorithm parameters [optional]
0015 %
0016 % Outputs:
0017 %   x       estimated noise power spectra (one row per frame)
0018 %   zo      output state
0019 %
0020 % The algorithm parameters are defined in reference [1] from which equation
0021 % numbers are given in parentheses. They are as follows:
0022 %
0023 %        pp.tax      % smoothing time constant for noise power estimate [0.0717 seconds](8)
0024 %        pp.tap      % smoothing time constant for smoothed speech prob [0.152 seconds](23)
0025 %        pp.psthr    % threshold for smoothed speech probability [0.99] (24)
0026 %        pp.pnsaf    % noise probability safety value [0.01] (24)
0027 %        pp.pspri    % prior speech probability [0.5] (18)
0028 %        pp.asnr     % active SNR in dB [15] (18)
0029 %        pp.psini    % initial speech probability [0.5] (23)
0030 %        pp.tavini   % assumed speech absent time at start [0.064 seconds]
0031 %
0032 % If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent:
0033 %
0034 %                   (a) dp=v_estnoiseg(yp(1:300),tinc);
0035 %
0036 %                   (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc);
0037 %                       [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z);
0038 %                       [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z);
0039 
0040 
0041 % This is intended to be a precise implementation of [1] for a frame rate of 62.5 Hz.
0042 % Time constants are adjusted for other frame rates.
0043 %
0044 % Refs:
0045 %    [1] Gerkmann, T. & Hendriks, R. C.
0046 %        Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay
0047 %        IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393
0048 
0049 %       Copyright (C) Mike Brookes 2012
0050 %      Version: $Id: v_estnoiseg.m 10865 2018-09-21 17:22:45Z dmb $
0051 %
0052 %   VOICEBOX is a MATLAB toolbox for speech processing.
0053 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0054 %
0055 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0056 %   This program is free software; you can redistribute it and/or modify
0057 %   it under the terms of the GNU Lesser General Public License as published by
0058 %   the Free Software Foundation; either version 3 of the License, or
0059 %   (at your option) any later version.
0060 %
0061 %   This program is distributed in the hope that it will be useful,
0062 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0063 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0064 %   GNU Lesser General Public License for more details.
0065 %
0066 %   You can obtain a copy of the GNU Lesser General Public License from
0067 %   https://www.gnu.org/licenses/ .
0068 %    See files gpl-3.0.txt and lgpl-3.0.txt included in this distribution.
0069 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0070 
0071 [nr,nrf]=size(yf);          % number of frames and freq bins
0072 x=zeros(nr,nrf);            % initialize output arrays
0073 if isempty(yf) && isstruct(tz)             % no real data
0074     zo=tz;                  % just keep the same state
0075 else
0076     if isstruct(tz)         % take parameters from a previous call
0077         nrcum=tz.nrcum;     % cumulative number of frames
0078         xt=tz.xt;           % smoothed power spectrum
0079         pslp=tz.pslp;       % correction factor (9)
0080         tinc=tz.tinc;       % frame increment
0081         qq=tz.qq;           % parameter structure
0082     else
0083         tinc = tz;          % second argument is frame increment
0084         nrcum=0;            % no frames so far
0085         % default algorithm constants
0086         qq.tax=0.0717;      % noise output smoothing time constant = -tinc/log(0.8) (8)
0087         qq.tap=0.152;       % speech prob smoothing time constant = -tinc/log(0.9) (23)
0088         qq.psthr=0.99;      % threshold for smoothed speech probability [0.99] (24)
0089         qq.pnsaf=0.01;      % noise probability safety value [0.01] (24)
0090         qq.pspri=0.5;       % prior speech probability [0.5] (18)
0091         qq.asnr=15;         % active SNR in dB [15] (18)
0092         qq.psini=0.5;       % initial speech probability [0.5] (23)
0093         qq.tavini=0.064;        % assumed speech absent time at start [64 ms]
0094 
0095         if nargin>=3 && ~isempty(pp)  % update fields from pp input
0096             qqn=fieldnames(qq);
0097             for i=1:length(qqn)
0098                 if isfield(pp,qqn{i})
0099                     qq.(qqn{i})=pp.(qqn{i});
0100                 end
0101             end
0102         end
0103         pslp=repmat(qq.psini,1,nrf); % initialize smoothed speech presence prob
0104         xt=[];                       % initialize just in case the first call has no data
0105     end
0106 
0107     % unpack parameters needed within the loop
0108 
0109     psthr=qq.psthr;     % threshold for smoothed speech probability [0.99] (24)
0110     pnsaf=qq.pnsaf;     % noise probability safety value [0.01] (24)
0111 
0112     % derived algorithm constants
0113 
0114     ax=exp(-tinc/qq.tax); % noise output smoothing factor = 0.8 (8)
0115     axc=1-ax;
0116     ap=exp(-tinc/qq.tap); % noise output smoothing factor = 0.9 (23)
0117     apc=1-ap;
0118     xih1=10^(qq.asnr/10); % speech-present SNR
0119     xih1r=1/(1+xih1)-1;
0120     pfac=(1/qq.pspri-1)*(1+xih1); % p(noise)/p(speech) (18)
0121 
0122     if nrcum==0 && nr>0       % initialize values for first frame
0123         xt=qq.psini*mean(yf(1:max(1,min(nr,round(1+qq.tavini/tinc))),:),1);  % initial noise estimate
0124     end
0125 
0126     % loop for each frame
0127     for t=1:nr
0128         yft=yf(t,:);        % noisy speech power spectrum
0129         ph1y=(1+pfac*exp(xih1r*yft./xt)).^(-1); % a-posteriori speech presence prob (18)
0130         pslp=ap*pslp+apc*ph1y; % smoothed speech presence prob (23)
0131         ph1y=min(ph1y,1-pnsaf*(pslp>psthr)); % limit ph1y (24)
0132         xtr=(1-ph1y).*yft+ph1y.*xt; % estimated raw noise spectrum (22)
0133         xt=ax*xt+axc*xtr;  % smooth the noise estimate (8)
0134         x(t,:)=xt;  % save the noise estimate
0135     end
0136     if nargout>1    % we need to store the state for next time
0137         zo.nrcum=nrcum+nr;      % number of frames so far
0138         zo.xt=xt;          % smoothed power spectrum
0139         zo.pslp=pslp;               % correction factor (9)
0140         zo.tinc=tinc;     % must be the last one
0141         zo.qq=qq;
0142     end
0143     if ~nargout
0144         clf;
0145         subplot(212);
0146         plot((1:nr)*tinc,10*log10([sum(yf,2) sum(x,2)]))
0147         ylabel('Frame Energy (dB)');
0148         xlabel(sprintf('Time (s)   [%d ms frame incr]',round(tinc*1000)));
0149         v_axisenlarge([-1 -1.05]);
0150         legend('input','noise','Location','Best');
0151         subplot(211);
0152         plot(1:nrf,10*log10([sum(yf,1)'/nr sum(x,1)'/nr]))
0153         ylabel('Power (dB)');
0154         xlabel('Frequency bin');
0155         v_axisenlarge([-1 -1.05]);
0156         legend('input','noise','Location','Best');
0157     end
0158 end
0159

v_estnoiseg

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE