Home > voicebox > estnoisem.m

# estnoisem

## PURPOSE

ESTNOISEM - estimate noise spectrum using minimum statistics

## SYNOPSIS

function [x,zo,xs]=estnoisem(yf,tz,pp)

## DESCRIPTION

```ESTNOISEM - estimate noise spectrum using minimum statistics

Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]
ovf=2;                  % overlap factor
f=rfft(enframe(s,hanning(ovf*ninc,'periodic'),ninc),ovf*ninc,2);
f=f.*conj(f);           % convert to power spectrum
x=estnoisem(f,ninc/fs); % estimate the noise power spectrum

Inputs:
yf      input power spectra (one row per frame)
tz      frame increment in seconds
Alternatively, the input state from a previous call (see below)
pp      algorithm parameters [optional]

Outputs:
x       estimated noise power spectra (one row per frame)
zo      output state
xs      estimated std error of x (one row per frame)
xs seems often to be an underestimate by a factor of 2 or 3

The algorithm parameters are defined in reference [1] from which equation
numbers are given in parentheses. They are as follows:

pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
pp.tamax     % (3): max smoothing time constant [0.392 seconds]
pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
pp.qeqmin    % (23): minimum value of Qeq [2]
pp.qeqmax    % max value of Qeq per frame [14]
pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
pp.td        % time to take minimum over [1.536 seconds]
pp.nu        % number of subwindows to use [3]
pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]

Example use:      y=enframe(s,w,ni);                  % divide speech signal s(n) into
% overlapping frames using window w(n)
yf=rfft(y,nf,2);                    % take fourier transform
dp=estnoisem(yf.*conj(yf),tinc);    % estimate the noise

If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent:

(a) dp=estnoisem(yp(1:300),tinc);

(b) [dp(1:100),z]=estnoisem(yp(1:100),tinc);
[dp(101:200),z]=estnoisem(yp(101:200),z);
[dp(201:300),z]=estnoisem(yp(201:300),z);```

## CROSS-REFERENCE INFORMATION

This function calls:
• axisenlarge AXISENLARGE - enlarge the axes of a figure (f,h)
This function is called by:
• specsub SPECSUB performs speech enhancement using spectral subtraction [SS,ZO]=(S,FSZ,P)
• ssubmmse SSUBMMSE performs speech enhancement using mmse estimate of spectral amplitude or log amplitude [SS,ZO]=(S,FSZ,PP)

## SOURCE CODE

```0001 function [x,zo,xs]=estnoisem(yf,tz,pp)
0002 %ESTNOISEM - estimate noise spectrum using minimum statistics
0003 %
0004 % Usage:    ninc=round(0.016*fs);   % frame increment [fs=sample frequency]
0005 %           ovf=2;                  % overlap factor
0006 %           f=rfft(enframe(s,hanning(ovf*ninc,'periodic'),ninc),ovf*ninc,2);
0007 %           f=f.*conj(f);           % convert to power spectrum
0008 %           x=estnoisem(f,ninc/fs); % estimate the noise power spectrum
0009 %
0010 % Inputs:
0011 %   yf      input power spectra (one row per frame)
0012 %   tz      frame increment in seconds
0013 %           Alternatively, the input state from a previous call (see below)
0014 %   pp      algorithm parameters [optional]
0015 %
0016 % Outputs:
0017 %   x       estimated noise power spectra (one row per frame)
0018 %   zo      output state
0019 %   xs      estimated std error of x (one row per frame)
0020 %           xs seems often to be an underestimate by a factor of 2 or 3
0021 %
0022 % The algorithm parameters are defined in reference [1] from which equation
0023 % numbers are given in parentheses. They are as follows:
0024 %
0025 %        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
0026 %        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
0027 %        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
0028 %        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
0029 %        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
0030 %        pp.qeqmin    % (23): minimum value of Qeq [2]
0031 %        pp.qeqmax    % max value of Qeq per frame [14]
0032 %        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
0033 %        pp.td        % time to take minimum over [1.536 seconds]
0034 %        pp.nu        % number of subwindows to use [3]
0035 %        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
0036 %        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]
0037 %
0038 % Example use:      y=enframe(s,w,ni);                  % divide speech signal s(n) into
0039 %                                                       % overlapping frames using window w(n)
0040 %                   yf=rfft(y,nf,2);                    % take fourier transform
0041 %                   dp=estnoisem(yf.*conj(yf),tinc);    % estimate the noise
0042 %
0043 % If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent:
0044 %
0045 %                   (a) dp=estnoisem(yp(1:300),tinc);
0046 %
0047 %                   (b) [dp(1:100),z]=estnoisem(yp(1:100),tinc);
0048 %                       [dp(101:200),z]=estnoisem(yp(101:200),z);
0049 %                       [dp(201:300),z]=estnoisem(yp(201:300),z);
0050
0051
0052 % This is intended to be a precise implementation of [1] with Table III
0053 % replaced by the updated table 5 from [2]. The only deliberate algorithm
0054 % change is the introduction of a minimum value for 1/Qeq in equation (23).
0055 % This change only affects the first few frames and improves the
0056 % convergence of the algorithm. A minor improveemnt was reported in [3] but
0057 % this has not yet been included.
0058 %
0059 % Refs:
0060 %    [1] Rainer Martin.
0061 %        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
0062 %        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
0063 %    [2] Rainer Martin.
0064 %        Bias compensation methods for minimum statistics noise power spectral density estimation
0065 %        Signal Processing, 2006, 86, 1215-1229
0066 %    [3] Dirk Mauler and Rainer Martin
0067 %        Noise power spectral density estimation on highly correlated data
0068 %        Proc IWAENC, 2006
0069
0070 %       Copyright (C) Mike Brookes 2008
0071 %      Version: \$Id: estnoisem.m 1718 2012-03-31 16:40:41Z dmb \$
0072 %
0073 %   VOICEBOX is a MATLAB toolbox for speech processing.
0075 %
0076 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0077 %   This program is free software; you can redistribute it and/or modify
0079 %   the Free Software Foundation; either version 2 of the License, or
0080 %   (at your option) any later version.
0081 %
0082 %   This program is distributed in the hope that it will be useful,
0083 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0084 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0085 %   GNU General Public License for more details.
0086 %
0087 %   You can obtain a copy of the GNU General Public License from
0088 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0089 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0090 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0091
0092 [nr,nrf]=size(yf);          % number of frames and freq bins
0093 x=zeros(nr,nrf);            % initialize output arrays
0094 xs=zeros(nr,nrf);           % will hold std error in the future
0095 if isempty(yf) && isstruct(tz)             % no real data
0096     zo=tz;              % just keep the same state
0097 else
0098     if isstruct(tz)       % take parameters from a previous call
0099         nrcum=tz.nrcum;
0100         p=tz.p;          % smoothed power spectrum
0101         ac=tz.ac;               % correction factor (9)
0102         sn2=tz.sn2;              % estimated noise power
0103         pb=tz.pb;               % smoothed noisy speech power (20)
0104         pb2=tz.pb2;
0105         pminu=tz.pminu;
0106         actmin=tz.actmin;   % Running minimum estimate
0107         actminsub=tz.actminsub;           % sub-window minimum estimate
0108         subwc=tz.subwc;                   % force a buffer switch on first loop
0109         actbuf=tz.actbuf;  % buffer to store subwindow minima
0110         ibuf=tz.ibuf;
0111         lminflag=tz.lminflag;      % flag to remember local minimum
0112         tinc=tz.tinc;     % frame increment
0113         qq=tz.qq;         % parameter structure
0114     else
0115         tinc = tz;          % second argument is frame increment
0116         nrcum=0;            % no frames so far
0117         % default algorithm constants
0118
0119         qq.taca=0.0449;    % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11)
0120         qq.tamax=0.392;    % max smoothing time constant in (3) = -tinc/log(0.96)
0121         qq.taminh=0.0133;    % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3)
0122         qq.tpfall=0.064;   % time constant for P to fall (12)
0123         qq.tbmax=0.0717;   % max smoothing time constant in (20) = -tinc/log(0.8)
0124         qq.qeqmin=2;       % minimum value of Qeq (23)
0125         qq.qeqmax=14;      % max value of Qeq per frame
0126         qq.av=2.12;             % fudge factor for bc calculation (23 + 13 lines)
0127         qq.td=1.536;       % time to take minimum over
0128         qq.nu=8;           % number of subwindows
0129         qq.qith=[0.03 0.05 0.06 Inf]; % noise slope thresholds in dB/s
0130         qq.nsmdb=[47 31.4 15.7 4.1];
0131
0132         if nargin>=3 && ~isempty(pp)
0133             qqn=fieldnames(qq);
0134             for i=1:length(qqn)
0135                 if isfield(pp,qqn{i})
0136                     qq.(qqn{i})=pp.(qqn{i});
0137                 end
0138             end
0139         end
0140     end
0141
0142     % unpack parameter structure
0143
0144     taca=qq.taca;    % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11)
0145     tamax=qq.tamax;    % max smoothing time constant in (3) = -tinc/log(0.96)
0146     taminh=qq.taminh;    % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3)
0147     tpfall=qq.tpfall;   % time constant for P to fall (12)
0148     tbmax=qq.tbmax;   % max smoothing time constant in (20) = -tinc/log(0.8)
0149     qeqmin=qq.qeqmin;       % minimum value of Qeq (23)
0150     qeqmax=qq.qeqmax;      % max value of Qeq per frame
0151     av=qq.av;             % fudge factor for bc calculation (23 + 13 lines)
0152     td=qq.td;       % time to take minimum over
0153     nu=qq.nu;           % number of subwindows
0154     qith=qq.qith; % noise slope thresholds in dB/s
0155     nsmdb=qq.nsmdb;   % maximum permitted +ve noise slope in dB/s
0156
0157     % derived algorithm constants
0158
0159     aca=exp(-tinc/taca); % smoothing constant for alpha_c in equ (11) = 0.7
0160     acmax=aca;          % min value of alpha_c = 0.7 in equ (11) also = 0.7
0161     amax=exp(-tinc/tamax); % max smoothing constant in (3) = 0.96
0162     aminh=exp(-tinc/taminh); % min smoothing constant (upper limit) in (3) = 0.3
0163     bmax=exp(-tinc/tbmax); % max smoothing constant in (20) = 0.8
0164     snrexp = -tinc/tpfall;
0165     nv=round(td/(tinc*nu));    % length of each subwindow in frames
0166     if nv<4            % algorithm doesn't work for miniscule frames
0167         nv=4;
0168         nu=max(round(td/(tinc*nv)),1);
0169     end
0170     nd=nu*nv;           % length of total window in frames
0171     [md,hd]=mhvals(nd); % calculate the constants M(D) and H(D) from Table III
0172     [mv,hv]=mhvals(nv); % calculate the constants M(D) and H(D) from Table III
0173     nsms=10.^(nsmdb*nv*tinc/10);  % [8 4 2 1.2] in paper
0174     qeqimax=1/qeqmin;  % maximum value of Qeq inverse (23)
0175     qeqimin=1/qeqmax; % minumum value of Qeq per frame inverse
0176
0177     if isempty(yf)      % provide dummy initialization
0178         ac=1;               % correction factor (9)
0179         subwc=nv;                   % force a buffer switch on first loop
0180         ibuf=0;
0181         p=x;          % smoothed power spectrum
0182         sn2=p;              % estimated noise power
0183         pb=p;               % smoothed noisy speech power (20)
0184         pb2=pb.^2;
0185         pminu=p;
0186         actmin=repmat(Inf,1,nrf);   % Running minimum estimate
0187         actminsub=actmin;           % sub-window minimum estimate
0188         actbuf=repmat(Inf,nu,nrf);  % buffer to store subwindow minima
0189         lminflag=zeros(1,nrf);      % flag to remember local minimum
0190     else
0191
0192         if ~nrcum       % initialize values for first frame
0193             p=yf(1,:);          % smoothed power spectrum
0194             ac=1;               % correction factor (9)
0195             sn2=p;              % estimated noise power
0196             pb=p;               % smoothed noisy speech power (20)
0197             pb2=pb.^2;
0198             pminu=p;
0199             actmin=repmat(Inf,1,nrf);   % Running minimum estimate
0200             actminsub=actmin;           % sub-window minimum estimate
0201             subwc=nv;                   % force a buffer switch on first loop
0202             actbuf=repmat(Inf,nu,nrf);  % buffer to store subwindow minima
0203             ibuf=0;
0204             lminflag=zeros(1,nrf);      % flag to remember local minimum
0205         end
0206
0207         % loop for each frame
0208
0209         for t=1:nr              % we use t instead of lambda in the paper
0210             yft=yf(t,:);        % noise speech power spectrum
0211             acb=(1+(sum(p)./sum(yft)-1).^2).^(-1);  % alpha_c-bar(t)  (9)
0212             ac=aca*ac+(1-aca)*max(acb,acmax);       % alpha_c(t)  (10)
0213             ah=amax*ac.*(1+(p./sn2-1).^2).^(-1);    % alpha_hat: smoothing factor per frequency (11)
0214             snr=sum(p)/sum(sn2);
0215             ah=max(ah,min(aminh,snr^snrexp));       % lower limit for alpha_hat (12)
0216
0217             p=ah.*p+(1-ah).*yft;            % smoothed noisy speech power (3)
0218             b=min(ah.^2,bmax);              % smoothing constant for estimating periodogram variance (22 + 2 lines)
0219             pb=b.*pb + (1-b).*p;            % smoothed periodogram (20)
0220             pb2=b.*pb2 + (1-b).*p.^2;         % smoothed periodogram squared (21)
0221
0222             qeqi=max(min((pb2-pb.^2)./(2*sn2.^2),qeqimax),qeqimin/(t+nrcum));   % Qeq inverse (23)
0223             qiav=sum(qeqi)/nrf;             % Average over all frequencies (23+12 lines) (ignore non-duplication of DC and nyquist terms)
0224             bc=1+av*sqrt(qiav);             % bias correction factor (23+11 lines)
0225             bmind=1+2*(nd-1)*(1-md)./(qeqi.^(-1)-2*md);      % we use the simplified form (17) instead of (15)
0226             bminv=1+2*(nv-1)*(1-mv)./(qeqi.^(-1)-2*mv);      % same expression but for sub windows
0227             kmod=bc*p.*bmind<actmin;        % Frequency mask for new minimum
0228             if any(kmod)
0229                 actmin(kmod)=bc*p(kmod).*bmind(kmod);
0230                 actminsub(kmod)=bc*p(kmod).*bminv(kmod);
0231             end
0232             if subwc>1 && subwc<nv              % middle of buffer - allow a local minimum
0233                 lminflag=lminflag | kmod;        % potential local minimum frequency bins
0234                 pminu=min(actminsub,pminu);
0235                 sn2=pminu;
0236             else
0237                 if subwc>=nv                    % end of buffer - do a buffer switch
0238                     ibuf=1+rem(ibuf,nu);         % increment actbuf storage pointer
0239                     actbuf(ibuf,:)=actmin;        % save sub-window minimum
0240                     pminu=min(actbuf,[],1);
0241                     i=find(qiav<qith);
0242                     nsm=nsms(i(1));              % noise slope max
0243                     lmin=lminflag & ~kmod & actminsub<nsm*pminu & actminsub>pminu;
0244                     if any(lmin)
0245                         pminu(lmin)=actminsub(lmin);
0246                         actbuf(:,lmin)=repmat(pminu(lmin),nu,1);
0247                     end
0248                     lminflag(:)=0;
0249                     actmin(:)=Inf;
0250                     subwc=0;
0251                 end
0252             end
0253             subwc=subwc+1;
0254             x(t,:)=sn2;
0255             qisq=sqrt(qeqi);
0256             % empirical formula for standard error based on Fig 15 of [2]
0257             xs(t,:)=sn2.*sqrt(0.266*(nd+100*qisq).*qisq/(1+0.005*nd+6/nd)./(0.5*qeqi.^(-1)+nd-1));
0258         end
0259     end
0260     if nargout>1    % we need to store the state for next time
0261         zo.nrcum=nrcum+nr;      % number of frames so far
0262         zo.p=p;          % smoothed power spectrum
0263         zo.ac=ac;               % correction factor (9)
0264         zo.sn2=sn2;              % estimated noise power
0265         zo.pb=pb;               % smoothed noisy speech power (20)
0266         zo.pb2=pb2;
0267         zo.pminu=pminu;
0268         zo.actmin=actmin;   % Running minimum estimate
0269         zo.actminsub=actminsub;           % sub-window minimum estimate
0270         zo.subwc=subwc;                   % force a buffer switch on first loop
0271         zo.actbuf=actbuf;  % buffer to store subwindow minima
0272         zo.ibuf=ibuf;
0273         zo.lminflag=lminflag;      % flag to remember local minimum
0274         zo.tinc=tinc;     % must be the last one
0275         zo.qq=qq;
0276     end
0277     if ~nargout
0278         clf;
0279         subplot(212);
0280         plot((1:nr)*tinc,10*log10([sum(yf,2) sum(x,2)]))
0281         ylabel('Frame Energy (dB)');
0282         xlabel(sprintf('Time (s)   [%d ms frame incr]',round(tinc*1000)));
0283         axisenlarge([-1 -1.05]);
0284         legend('input','noise','Location','Best');
0285         subplot(211);
0286         plot(1:nrf,10*log10([sum(yf,1)'/nr sum(x,1)'/nr]))
0287         ylabel('Power (dB)');
0288         xlabel('Frequency bin');
0289         axisenlarge([-1 -1.05]);
0290         legend('input','noise','Location','Best');
0291     end
0292 end
0293
0294 function [m,h,d]=mhvals(d)
0295 % Values are taken from Table 5 in [2]
0296 %[2] R. Martin,"Bias compensation methods for minimum statistics noise power
0297 %               spectral density estimation", Signal Processing Vol 86, pp1215-1229, 2006.
0298
0299 % approx: plot(d.^(-0.5),[m 1-d.^(-0.5)],'x-'), plot(d.^0.5,h,'x-')
0300 persistent dmh
0301 if isempty(dmh)
0302     dmh=[
0303         1   0       0;
0304         2   0.26    0.15;
0305         5   0.48    0.48;
0306         8   0.58    0.78;
0307         10  0.61    0.98;
0308         15  0.668   1.55;
0309         20  0.705   2;
0310         30  0.762   2.3;
0311         40  0.8     2.52;
0312         60  0.841   3.1;
0313         80  0.865   3.38;
0314         120 0.89    4.15;
0315         140 0.9     4.35;
0316         160 0.91    4.25;
0317         180 0.92    3.9;
0318         220 0.93    4.1;
0319         260 0.935   4.7;
0320         300 0.94    5];
0321 end
0322
0323 if nargin>=1
0324     i=find(d<=dmh(:,1));
0325     if isempty(i)
0326         i=size(dmh,1);
0327         j=i;
0328     else
0329         i=i(1);
0330         j=i-1;
0331     end
0332     if d==dmh(i,1)
0333         m=dmh(i,2);
0334         h=dmh(i,3);
0335     else
0336         qj=sqrt(dmh(i-1,1));    % interpolate using sqrt(d)
0337         qi=sqrt(dmh(i,1));
0338         q=sqrt(d);
0339         h=dmh(i,3)+(q-qi)*(dmh(j,3)-dmh(i,3))/(qj-qi);
0340         m=dmh(i,2)+(qi*qj/q-qj)*(dmh(j,2)-dmh(i,2))/(qi-qj);
0341     end
0342 else
0343     d=dmh(:,1);
0344     m=dmh(:,2);
0345     h=dmh(:,3);
0346 end```

Generated on Mon 06-Aug-2018 14:48:32 by m2html © 2003