Home > voicebox > estnoisem.m

estnoisem

PURPOSE ^

ESTNOISEM - estimate noise spectrum using minimum statistics

SYNOPSIS ^

function [x,zo,xs]=estnoisem(yf,tz,pp)

DESCRIPTION ^

ESTNOISEM - estimate noise spectrum using minimum statistics
 Inputs:
   yf      input power spectra (one row per frame)
   tz      frame increment in seconds
           Alternatively, the input state from a previous call (see below)
   pp      algorithm parameters [optional]

 Outputs:
   x       estimated noise power spectra (one row per frame)
   zo      output state
   xs      estimated std error of x (one row per frame)
           xs seems often to be an underestimate by a factor of 2 or 3

 The algorithm parameters are defined in reference [1] from which equation
 numbers are given in parentheses. They are as follows:

        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
        pp.qeqmin    % (23): minimum value of Qeq [2]
        pp.qeqmax    % max value of Qeq per frame [14]
        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
        pp.td        % time to take minimum over [1.536 seconds]
        pp.nu        % number of subwindows to use [3]
        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]

 Example use:      y=enframe(s,w,ni);                  % divide speech signal s(n) into
                                                       % overlapping frames using window w(n)
                   yf=rfft(y,nf,2);                    % take fourier transform
                   dp=estnoisem(yf.*conj(yf),tinc);    % estimate the noise

 If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent:

                   (a) dp=estnoisem(yp(1:300),tinc);

                   (b) [dp(1:100),z]=estnoisem(yp(1:100),tinc);
                       [dp(101:200),z]=estnoisem(yp(101:200),z);
                       [dp(201:300),z]=estnoisem(yp(201:300),z);

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [x,zo,xs]=estnoisem(yf,tz,pp)
0002 %ESTNOISEM - estimate noise spectrum using minimum statistics
0003 % Inputs:
0004 %   yf      input power spectra (one row per frame)
0005 %   tz      frame increment in seconds
0006 %           Alternatively, the input state from a previous call (see below)
0007 %   pp      algorithm parameters [optional]
0008 %
0009 % Outputs:
0010 %   x       estimated noise power spectra (one row per frame)
0011 %   zo      output state
0012 %   xs      estimated std error of x (one row per frame)
0013 %           xs seems often to be an underestimate by a factor of 2 or 3
0014 %
0015 % The algorithm parameters are defined in reference [1] from which equation
0016 % numbers are given in parentheses. They are as follows:
0017 %
0018 %        pp.taca      % (11): smoothing time constant for alpha_c [0.0449 seconds]
0019 %        pp.tamax     % (3): max smoothing time constant [0.392 seconds]
0020 %        pp.taminh    % (3): min smoothing time constant (upper limit) [0.0133 seconds]
0021 %        pp.tpfall    % (12): time constant for P to fall [0.064 seconds]
0022 %        pp.tbmax     % (20): max smoothing time constant [0.0717 seconds]
0023 %        pp.qeqmin    % (23): minimum value of Qeq [2]
0024 %        pp.qeqmax    % max value of Qeq per frame [14]
0025 %        pp.av        % (23)+13 lines: fudge factor for bc calculation  [2.12]
0026 %        pp.td        % time to take minimum over [1.536 seconds]
0027 %        pp.nu        % number of subwindows to use [3]
0028 %        pp.qith      % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
0029 %        pp.nsmdb     % corresponding noise slope thresholds in dB/second   [47 31.4 15.7 4.1]
0030 %
0031 % Example use:      y=enframe(s,w,ni);                  % divide speech signal s(n) into
0032 %                                                       % overlapping frames using window w(n)
0033 %                   yf=rfft(y,nf,2);                    % take fourier transform
0034 %                   dp=estnoisem(yf.*conj(yf),tinc);    % estimate the noise
0035 %
0036 % If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent:
0037 %
0038 %                   (a) dp=estnoisem(yp(1:300),tinc);
0039 %
0040 %                   (b) [dp(1:100),z]=estnoisem(yp(1:100),tinc);
0041 %                       [dp(101:200),z]=estnoisem(yp(101:200),z);
0042 %                       [dp(201:300),z]=estnoisem(yp(201:300),z);
0043 
0044 
0045 % This is intended to be a precise implementation of [1] with Table III
0046 % replaced by the updated table 5 from [2]. The only deliberate algorithm
0047 % change is the introduction of a minimum value for 1/Qeq in equation (23).
0048 % This change only affects the first few frames and improves the
0049 % convergence of the algorithm. A minor improveemnt was reported in [3] but
0050 % this has not yet been included.
0051 %
0052 % Refs:
0053 %    [1] Rainer Martin.
0054 %        Noise power spectral density estimation based on optimal smoothing and minimum statistics.
0055 %        IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001.
0056 %    [2] Rainer Martin.
0057 %        Bias compensation methods for minimum statistics noise power spectral density estimation
0058 %        Signal Processing, 2006, 86, 1215-1229
0059 %    [3] Dirk Mauler and Rainer Martin
0060 %        Noise power spectral density estimation on highly correlated data
0061 %        Proc IWAENC, 2006
0062 
0063 %       Copyright (C) Mike Brookes 2008
0064 %      Version: $Id: estnoisem.m 713 2011-10-16 14:45:43Z dmb $
0065 %
0066 %   VOICEBOX is a MATLAB toolbox for speech processing.
0067 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0068 %
0069 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0070 %   This program is free software; you can redistribute it and/or modify
0071 %   it under the terms of the GNU General Public License as published by
0072 %   the Free Software Foundation; either version 2 of the License, or
0073 %   (at your option) any later version.
0074 %
0075 %   This program is distributed in the hope that it will be useful,
0076 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0077 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0078 %   GNU General Public License for more details.
0079 %
0080 %   You can obtain a copy of the GNU General Public License from
0081 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0082 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0083 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0084 
0085 [nr,nrf]=size(yf);          % number of frames and freq bins
0086 x=zeros(nr,nrf);            % initialize output arrays
0087 xs=zeros(nr,nrf);           % will hold std error in the future
0088 if isempty(yf) && isstruct(tz)             % no real data
0089     zo=tz;              % just keep the same state
0090 else
0091     if isstruct(tz)       % take parameters from a previous call
0092         nrcum=tz.nrcum;
0093         p=tz.p;          % smoothed power spectrum
0094         ac=tz.ac;               % correction factor (9)
0095         sn2=tz.sn2;              % estimated noise power
0096         pb=tz.pb;               % smoothed noisy speech power (20)
0097         pb2=tz.pb2;
0098         pminu=tz.pminu;
0099         actmin=tz.actmin;   % Running minimum estimate
0100         actminsub=tz.actminsub;           % sub-window minimum estimate
0101         subwc=tz.subwc;                   % force a buffer switch on first loop
0102         actbuf=tz.actbuf;  % buffer to store subwindow minima
0103         ibuf=tz.ibuf;
0104         lminflag=tz.lminflag;      % flag to remember local minimum
0105         tinc=tz.tinc;     % frame increment
0106         qq=tz.qq;         % parameter structure
0107     else
0108         tinc = tz;          % second argument is frame increment
0109         nrcum=0;            % no frames so far
0110         % default algorithm constants
0111 
0112         qq.taca=0.0449;    % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11)
0113         qq.tamax=0.392;    % max smoothing time constant in (3) = -tinc/log(0.96)
0114         qq.taminh=0.0133;    % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3)
0115         qq.tpfall=0.064;   % time constant for P to fall (12)
0116         qq.tbmax=0.0717;   % max smoothing time constant in (20) = -tinc/log(0.8)
0117         qq.qeqmin=2;       % minimum value of Qeq (23)
0118         qq.qeqmax=14;      % max value of Qeq per frame
0119         qq.av=2.12;             % fudge factor for bc calculation (23 + 13 lines)
0120         qq.td=1.536;       % time to take minimum over
0121         qq.nu=8;           % number of subwindows
0122         qq.qith=[0.03 0.05 0.06 Inf]; % noise slope thresholds in dB/s
0123         qq.nsmdb=[47 31.4 15.7 4.1];
0124 
0125         if nargin>=3 && ~isempty(pp)
0126             qqn=fieldnames(qq);
0127             for i=1:length(qqn)
0128                 if isfield(pp,qqn{i})
0129                     qq.(qqn{i})=pp.(qqn{i});
0130                 end
0131             end
0132         end
0133     end
0134 
0135     % unpack parameter structure
0136 
0137     taca=qq.taca;    % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11)
0138     tamax=qq.tamax;    % max smoothing time constant in (3) = -tinc/log(0.96)
0139     taminh=qq.taminh;    % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3)
0140     tpfall=qq.tpfall;   % time constant for P to fall (12)
0141     tbmax=qq.tbmax;   % max smoothing time constant in (20) = -tinc/log(0.8)
0142     qeqmin=qq.qeqmin;       % minimum value of Qeq (23)
0143     qeqmax=qq.qeqmax;      % max value of Qeq per frame
0144     av=qq.av;             % fudge factor for bc calculation (23 + 13 lines)
0145     td=qq.td;       % time to take minimum over
0146     nu=qq.nu;           % number of subwindows
0147     qith=qq.qith; % noise slope thresholds in dB/s
0148     nsmdb=qq.nsmdb;   % maximum permitted +ve noise slope in dB/s
0149 
0150     % derived algorithm constants
0151 
0152     aca=exp(-tinc/taca); % smoothing constant for alpha_c in equ (11) = 0.7
0153     acmax=aca;          % min value of alpha_c = 0.7 in equ (11) also = 0.7
0154     amax=exp(-tinc/tamax); % max smoothing constant in (3) = 0.96
0155     aminh=exp(-tinc/taminh); % min smoothing constant (upper limit) in (3) = 0.3
0156     bmax=exp(-tinc/tbmax); % max smoothing constant in (20) = 0.8
0157     snrexp = -tinc/tpfall;
0158     nv=round(td/(tinc*nu));    % length of each subwindow in frames
0159     if nv<4            % algorithm doesn't work for miniscule frames
0160         nv=4;
0161         nu=max(round(td/(tinc*nv)),1);
0162     end
0163     nd=nu*nv;           % length of total window in frames
0164     [md,hd]=mhvals(nd); % calculate the constants M(D) and H(D) from Table III
0165     [mv,hv]=mhvals(nv); % calculate the constants M(D) and H(D) from Table III
0166     nsms=10.^(nsmdb*nv*tinc/10);  % [8 4 2 1.2] in paper
0167     qeqimax=1/qeqmin;  % maximum value of Qeq inverse (23)
0168     qeqimin=1/qeqmax; % minumum value of Qeq per frame inverse
0169 
0170     if isempty(yf)      % provide dummy initialization
0171         ac=1;               % correction factor (9)
0172         subwc=nv;                   % force a buffer switch on first loop
0173         ibuf=0;
0174         p=x;          % smoothed power spectrum
0175         sn2=p;              % estimated noise power
0176         pb=p;               % smoothed noisy speech power (20)
0177         pb2=pb.^2;
0178         pminu=p;
0179         actmin=repmat(Inf,1,nrf);   % Running minimum estimate
0180         actminsub=actmin;           % sub-window minimum estimate
0181         actbuf=repmat(Inf,nu,nrf);  % buffer to store subwindow minima
0182         lminflag=zeros(1,nrf);      % flag to remember local minimum
0183     else
0184 
0185         if ~nrcum       % initialize values for first frame
0186             p=yf(1,:);          % smoothed power spectrum
0187             ac=1;               % correction factor (9)
0188             sn2=p;              % estimated noise power
0189             pb=p;               % smoothed noisy speech power (20)
0190             pb2=pb.^2;
0191             pminu=p;
0192             actmin=repmat(Inf,1,nrf);   % Running minimum estimate
0193             actminsub=actmin;           % sub-window minimum estimate
0194             subwc=nv;                   % force a buffer switch on first loop
0195             actbuf=repmat(Inf,nu,nrf);  % buffer to store subwindow minima
0196             ibuf=0;
0197             lminflag=zeros(1,nrf);      % flag to remember local minimum
0198         end
0199 
0200         % loop for each frame
0201 
0202         for t=1:nr              % we use t instead of lambda in the paper
0203             yft=yf(t,:);        % noise speech power spectrum
0204             acb=(1+(sum(p)./sum(yft)-1).^2).^(-1);  % alpha_c-bar(t)  (9)
0205             ac=aca*ac+(1-aca)*max(acb,acmax);       % alpha_c(t)  (10)
0206             ah=amax*ac.*(1+(p./sn2-1).^2).^(-1);    % alpha_hat: smoothing factor per frequency (11)
0207             snr=sum(p)/sum(sn2);
0208             ah=max(ah,min(aminh,snr^snrexp));       % lower limit for alpha_hat (12)
0209 
0210             p=ah.*p+(1-ah).*yft;            % smoothed noisy speech power (3)
0211             b=min(ah.^2,bmax);              % smoothing constant for estimating periodogram variance (22 + 2 lines)
0212             pb=b.*pb + (1-b).*p;            % smoothed periodogram (20)
0213             pb2=b.*pb2 + (1-b).*p.^2;         % smoothed periodogram squared (21)
0214 
0215             qeqi=max(min((pb2-pb.^2)./(2*sn2.^2),qeqimax),qeqimin/(t+nrcum));   % Qeq inverse (23)
0216             qiav=sum(qeqi)/nrf;             % Average over all frequencies (23+12 lines) (ignore non-duplication of DC and nyquist terms)
0217             bc=1+av*sqrt(qiav);             % bias correction factor (23+11 lines)
0218             bmind=1+2*(nd-1)*(1-md)./(qeqi.^(-1)-2*md);      % we use the simplified form (17) instead of (15)
0219             bminv=1+2*(nv-1)*(1-mv)./(qeqi.^(-1)-2*mv);      % same expression but for sub windows
0220             kmod=bc*p.*bmind<actmin;        % Frequency mask for new minimum
0221             if any(kmod)
0222                 actmin(kmod)=bc*p(kmod).*bmind(kmod);
0223                 actminsub(kmod)=bc*p(kmod).*bminv(kmod);
0224             end
0225             if subwc>1 && subwc<nv              % middle of buffer - allow a local minimum
0226                 lminflag=lminflag | kmod;        % potential local minimum frequency bins
0227                 pminu=min(actminsub,pminu);
0228                 sn2=pminu;
0229             else
0230                 if subwc>=nv                    % end of buffer - do a buffer switch
0231                     ibuf=1+rem(ibuf,nu);         % increment actbuf storage pointer
0232                     actbuf(ibuf,:)=actmin;        % save sub-window minimum
0233                     pminu=min(actbuf,[],1);
0234                     i=find(qiav<qith);
0235                     nsm=nsms(i(1));              % noise slope max
0236                     lmin=lminflag & ~kmod & actminsub<nsm*pminu & actminsub>pminu;
0237                     if any(lmin)
0238                         pminu(lmin)=actminsub(lmin);
0239                         actbuf(:,lmin)=repmat(pminu(lmin),nu,1);
0240                     end
0241                     lminflag(:)=0;
0242                     actmin(:)=Inf;
0243                     subwc=0;
0244                 end
0245             end
0246             subwc=subwc+1;
0247             x(t,:)=sn2;
0248             qisq=sqrt(qeqi);
0249             % empirical formula for standard error based on Fig 15 of [2]
0250             xs(t,:)=sn2.*sqrt(0.266*(nd+100*qisq).*qisq/(1+0.005*nd+6/nd)./(0.5*qeqi.^(-1)+nd-1));
0251         end
0252     end
0253     if nargout>1    % we need to store the state for next time
0254         zo.nrcum=nrcum+nr;      % number of frames so far
0255         zo.p=p;          % smoothed power spectrum
0256         zo.ac=ac;               % correction factor (9)
0257         zo.sn2=sn2;              % estimated noise power
0258         zo.pb=pb;               % smoothed noisy speech power (20)
0259         zo.pb2=pb2;
0260         zo.pminu=pminu;
0261         zo.actmin=actmin;   % Running minimum estimate
0262         zo.actminsub=actminsub;           % sub-window minimum estimate
0263         zo.subwc=subwc;                   % force a buffer switch on first loop
0264         zo.actbuf=actbuf;  % buffer to store subwindow minima
0265         zo.ibuf=ibuf;
0266         zo.lminflag=lminflag;      % flag to remember local minimum
0267         zo.tinc=tinc;     % must be the last one
0268         zo.qq=qq;
0269     end
0270     if ~nargout
0271         plot((1:nr),10*log10([sum(x,2)/nrf sum(yf,2)/nrf]))
0272         legend('noise','input');
0273         ylabel('Power (dB)');
0274         xlabel(sprintf('Time (%d ms frames)',round(tinc*1000)));
0275     end
0276 end
0277 
0278 function [m,h,d]=mhvals(d)
0279 % Values are taken from Table 5 in [2]
0280 %[2] R. Martin,"Bias compensation methods for minimum statistics noise power
0281 %               spectral density estimation", Signal Processing Vol 86, pp1215-1229, 2006.
0282 
0283 % approx: plot(d.^(-0.5),[m 1-d.^(-0.5)],'x-'), plot(d.^0.5,h,'x-')
0284 persistent dmh
0285 if isempty(dmh)
0286     dmh=[
0287         1   0       0;
0288         2   0.26    0.15;
0289         5   0.48    0.48;
0290         8   0.58    0.78;
0291         10  0.61    0.98;
0292         15  0.668   1.55;
0293         20  0.705   2;
0294         30  0.762   2.3;
0295         40  0.8     2.52;
0296         60  0.841   3.1;
0297         80  0.865   3.38;
0298         120 0.89    4.15;
0299         140 0.9     4.35;
0300         160 0.91    4.25;
0301         180 0.92    3.9;
0302         220 0.93    4.1;
0303         260 0.935   4.7;
0304         300 0.94    5];
0305 end
0306 
0307 if nargin>=1
0308     i=find(d<=dmh(:,1));
0309     if isempty(i)
0310         i=size(dmh,1);
0311         j=i;
0312     else
0313         i=i(1);
0314         j=i-1;
0315     end
0316     if d==dmh(i,1)
0317         m=dmh(i,2);
0318         h=dmh(i,3);
0319     else
0320         qj=sqrt(dmh(i-1,1));    % interpolate using sqrt(d)
0321         qi=sqrt(dmh(i,1));
0322         q=sqrt(d);
0323         h=dmh(i,3)+(q-qi)*(dmh(j,3)-dmh(i,3))/(qj-qi);
0324         m=dmh(i,2)+(qi*qj/q-qj)*(dmh(j,2)-dmh(i,2))/(qi-qj);
0325     end
0326 else
0327     d=dmh(:,1);
0328     m=dmh(:,2);
0329     h=dmh(:,3);
0330 end

Generated on Thu 02-Feb-2012 09:15:04 by m2html © 2003