


ESTNOISEM - estimate noise spectrum using minimum statistics
Inputs:
yf input power spectra (one row per frame)
tz frame increment in seconds
Alternatively, the input state from a previous call (see below)
pp algorithm parameters [optional]
Outputs:
x estimated noise power spectra (one row per frame)
zo output state
xs estimated std error of x (one row per frame)
xs seems often to be an underestimate by a factor of 2 or 3
The algorithm parameters are defined in reference [1] from which equation
numbers are given in parentheses. They are as follows:
pp.taca % (11): smoothing time constant for alpha_c [0.0449 seconds]
pp.tamax % (3): max smoothing time constant [0.392 seconds]
pp.taminh % (3): min smoothing time constant (upper limit) [0.0133 seconds]
pp.tpfall % (12): time constant for P to fall [0.064 seconds]
pp.tbmax % (20): max smoothing time constant [0.0717 seconds]
pp.qeqmin % (23): minimum value of Qeq [2]
pp.qeqmax % max value of Qeq per frame [14]
pp.av % (23)+13 lines: fudge factor for bc calculation [2.12]
pp.td % time to take minimum over [1.536 seconds]
pp.nu % number of subwindows to use [3]
pp.qith % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ]
pp.nsmdb % corresponding noise slope thresholds in dB/second [47 31.4 15.7 4.1]
Example use: y=enframe(s,w,ni); % divide speech signal s(n) into
% overlapping frames using window w(n)
yf=rfft(y,nf,2); % take fourier transform
dp=estnoisem(yf.*conj(yf),tinc); % estimate the noise
If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent:
(a) dp=estnoisem(yp(1:300),tinc);
(b) [dp(1:100),z]=estnoisem(yp(1:100),tinc);
[dp(101:200),z]=estnoisem(yp(101:200),z);
[dp(201:300),z]=estnoisem(yp(201:300),z);


0001 function [x,zo,xs]=estnoisem(yf,tz,pp) 0002 %ESTNOISEM - estimate noise spectrum using minimum statistics 0003 % Inputs: 0004 % yf input power spectra (one row per frame) 0005 % tz frame increment in seconds 0006 % Alternatively, the input state from a previous call (see below) 0007 % pp algorithm parameters [optional] 0008 % 0009 % Outputs: 0010 % x estimated noise power spectra (one row per frame) 0011 % zo output state 0012 % xs estimated std error of x (one row per frame) 0013 % xs seems often to be an underestimate by a factor of 2 or 3 0014 % 0015 % The algorithm parameters are defined in reference [1] from which equation 0016 % numbers are given in parentheses. They are as follows: 0017 % 0018 % pp.taca % (11): smoothing time constant for alpha_c [0.0449 seconds] 0019 % pp.tamax % (3): max smoothing time constant [0.392 seconds] 0020 % pp.taminh % (3): min smoothing time constant (upper limit) [0.0133 seconds] 0021 % pp.tpfall % (12): time constant for P to fall [0.064 seconds] 0022 % pp.tbmax % (20): max smoothing time constant [0.0717 seconds] 0023 % pp.qeqmin % (23): minimum value of Qeq [2] 0024 % pp.qeqmax % max value of Qeq per frame [14] 0025 % pp.av % (23)+13 lines: fudge factor for bc calculation [2.12] 0026 % pp.td % time to take minimum over [1.536 seconds] 0027 % pp.nu % number of subwindows to use [3] 0028 % pp.qith % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ] 0029 % pp.nsmdb % corresponding noise slope thresholds in dB/second [47 31.4 15.7 4.1] 0030 % 0031 % Example use: y=enframe(s,w,ni); % divide speech signal s(n) into 0032 % % overlapping frames using window w(n) 0033 % yf=rfft(y,nf,2); % take fourier transform 0034 % dp=estnoisem(yf.*conj(yf),tinc); % estimate the noise 0035 % 0036 % If convenient, you can call estnoisem in chunks of arbitrary size. Thus the following are equivalent: 0037 % 0038 % (a) dp=estnoisem(yp(1:300),tinc); 0039 % 0040 % (b) [dp(1:100),z]=estnoisem(yp(1:100),tinc); 0041 % [dp(101:200),z]=estnoisem(yp(101:200),z); 0042 % [dp(201:300),z]=estnoisem(yp(201:300),z); 0043 0044 0045 % This is intended to be a precise implementation of [1] with Table III 0046 % replaced by the updated table 5 from [2]. The only deliberate algorithm 0047 % change is the introduction of a minimum value for 1/Qeq in equation (23). 0048 % This change only affects the first few frames and improves the 0049 % convergence of the algorithm. A minor improveemnt was reported in [3] but 0050 % this has not yet been included. 0051 % 0052 % Refs: 0053 % [1] Rainer Martin. 0054 % Noise power spectral density estimation based on optimal smoothing and minimum statistics. 0055 % IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001. 0056 % [2] Rainer Martin. 0057 % Bias compensation methods for minimum statistics noise power spectral density estimation 0058 % Signal Processing, 2006, 86, 1215-1229 0059 % [3] Dirk Mauler and Rainer Martin 0060 % Noise power spectral density estimation on highly correlated data 0061 % Proc IWAENC, 2006 0062 0063 % Copyright (C) Mike Brookes 2008 0064 % Version: $Id: estnoisem.m 713 2011-10-16 14:45:43Z dmb $ 0065 % 0066 % VOICEBOX is a MATLAB toolbox for speech processing. 0067 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0068 % 0069 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0070 % This program is free software; you can redistribute it and/or modify 0071 % it under the terms of the GNU General Public License as published by 0072 % the Free Software Foundation; either version 2 of the License, or 0073 % (at your option) any later version. 0074 % 0075 % This program is distributed in the hope that it will be useful, 0076 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0077 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0078 % GNU General Public License for more details. 0079 % 0080 % You can obtain a copy of the GNU General Public License from 0081 % http://www.gnu.org/copyleft/gpl.html or by writing to 0082 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0083 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0084 0085 [nr,nrf]=size(yf); % number of frames and freq bins 0086 x=zeros(nr,nrf); % initialize output arrays 0087 xs=zeros(nr,nrf); % will hold std error in the future 0088 if isempty(yf) && isstruct(tz) % no real data 0089 zo=tz; % just keep the same state 0090 else 0091 if isstruct(tz) % take parameters from a previous call 0092 nrcum=tz.nrcum; 0093 p=tz.p; % smoothed power spectrum 0094 ac=tz.ac; % correction factor (9) 0095 sn2=tz.sn2; % estimated noise power 0096 pb=tz.pb; % smoothed noisy speech power (20) 0097 pb2=tz.pb2; 0098 pminu=tz.pminu; 0099 actmin=tz.actmin; % Running minimum estimate 0100 actminsub=tz.actminsub; % sub-window minimum estimate 0101 subwc=tz.subwc; % force a buffer switch on first loop 0102 actbuf=tz.actbuf; % buffer to store subwindow minima 0103 ibuf=tz.ibuf; 0104 lminflag=tz.lminflag; % flag to remember local minimum 0105 tinc=tz.tinc; % frame increment 0106 qq=tz.qq; % parameter structure 0107 else 0108 tinc = tz; % second argument is frame increment 0109 nrcum=0; % no frames so far 0110 % default algorithm constants 0111 0112 qq.taca=0.0449; % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11) 0113 qq.tamax=0.392; % max smoothing time constant in (3) = -tinc/log(0.96) 0114 qq.taminh=0.0133; % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3) 0115 qq.tpfall=0.064; % time constant for P to fall (12) 0116 qq.tbmax=0.0717; % max smoothing time constant in (20) = -tinc/log(0.8) 0117 qq.qeqmin=2; % minimum value of Qeq (23) 0118 qq.qeqmax=14; % max value of Qeq per frame 0119 qq.av=2.12; % fudge factor for bc calculation (23 + 13 lines) 0120 qq.td=1.536; % time to take minimum over 0121 qq.nu=8; % number of subwindows 0122 qq.qith=[0.03 0.05 0.06 Inf]; % noise slope thresholds in dB/s 0123 qq.nsmdb=[47 31.4 15.7 4.1]; 0124 0125 if nargin>=3 && ~isempty(pp) 0126 qqn=fieldnames(qq); 0127 for i=1:length(qqn) 0128 if isfield(pp,qqn{i}) 0129 qq.(qqn{i})=pp.(qqn{i}); 0130 end 0131 end 0132 end 0133 end 0134 0135 % unpack parameter structure 0136 0137 taca=qq.taca; % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11) 0138 tamax=qq.tamax; % max smoothing time constant in (3) = -tinc/log(0.96) 0139 taminh=qq.taminh; % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3) 0140 tpfall=qq.tpfall; % time constant for P to fall (12) 0141 tbmax=qq.tbmax; % max smoothing time constant in (20) = -tinc/log(0.8) 0142 qeqmin=qq.qeqmin; % minimum value of Qeq (23) 0143 qeqmax=qq.qeqmax; % max value of Qeq per frame 0144 av=qq.av; % fudge factor for bc calculation (23 + 13 lines) 0145 td=qq.td; % time to take minimum over 0146 nu=qq.nu; % number of subwindows 0147 qith=qq.qith; % noise slope thresholds in dB/s 0148 nsmdb=qq.nsmdb; % maximum permitted +ve noise slope in dB/s 0149 0150 % derived algorithm constants 0151 0152 aca=exp(-tinc/taca); % smoothing constant for alpha_c in equ (11) = 0.7 0153 acmax=aca; % min value of alpha_c = 0.7 in equ (11) also = 0.7 0154 amax=exp(-tinc/tamax); % max smoothing constant in (3) = 0.96 0155 aminh=exp(-tinc/taminh); % min smoothing constant (upper limit) in (3) = 0.3 0156 bmax=exp(-tinc/tbmax); % max smoothing constant in (20) = 0.8 0157 snrexp = -tinc/tpfall; 0158 nv=round(td/(tinc*nu)); % length of each subwindow in frames 0159 if nv<4 % algorithm doesn't work for miniscule frames 0160 nv=4; 0161 nu=max(round(td/(tinc*nv)),1); 0162 end 0163 nd=nu*nv; % length of total window in frames 0164 [md,hd]=mhvals(nd); % calculate the constants M(D) and H(D) from Table III 0165 [mv,hv]=mhvals(nv); % calculate the constants M(D) and H(D) from Table III 0166 nsms=10.^(nsmdb*nv*tinc/10); % [8 4 2 1.2] in paper 0167 qeqimax=1/qeqmin; % maximum value of Qeq inverse (23) 0168 qeqimin=1/qeqmax; % minumum value of Qeq per frame inverse 0169 0170 if isempty(yf) % provide dummy initialization 0171 ac=1; % correction factor (9) 0172 subwc=nv; % force a buffer switch on first loop 0173 ibuf=0; 0174 p=x; % smoothed power spectrum 0175 sn2=p; % estimated noise power 0176 pb=p; % smoothed noisy speech power (20) 0177 pb2=pb.^2; 0178 pminu=p; 0179 actmin=repmat(Inf,1,nrf); % Running minimum estimate 0180 actminsub=actmin; % sub-window minimum estimate 0181 actbuf=repmat(Inf,nu,nrf); % buffer to store subwindow minima 0182 lminflag=zeros(1,nrf); % flag to remember local minimum 0183 else 0184 0185 if ~nrcum % initialize values for first frame 0186 p=yf(1,:); % smoothed power spectrum 0187 ac=1; % correction factor (9) 0188 sn2=p; % estimated noise power 0189 pb=p; % smoothed noisy speech power (20) 0190 pb2=pb.^2; 0191 pminu=p; 0192 actmin=repmat(Inf,1,nrf); % Running minimum estimate 0193 actminsub=actmin; % sub-window minimum estimate 0194 subwc=nv; % force a buffer switch on first loop 0195 actbuf=repmat(Inf,nu,nrf); % buffer to store subwindow minima 0196 ibuf=0; 0197 lminflag=zeros(1,nrf); % flag to remember local minimum 0198 end 0199 0200 % loop for each frame 0201 0202 for t=1:nr % we use t instead of lambda in the paper 0203 yft=yf(t,:); % noise speech power spectrum 0204 acb=(1+(sum(p)./sum(yft)-1).^2).^(-1); % alpha_c-bar(t) (9) 0205 ac=aca*ac+(1-aca)*max(acb,acmax); % alpha_c(t) (10) 0206 ah=amax*ac.*(1+(p./sn2-1).^2).^(-1); % alpha_hat: smoothing factor per frequency (11) 0207 snr=sum(p)/sum(sn2); 0208 ah=max(ah,min(aminh,snr^snrexp)); % lower limit for alpha_hat (12) 0209 0210 p=ah.*p+(1-ah).*yft; % smoothed noisy speech power (3) 0211 b=min(ah.^2,bmax); % smoothing constant for estimating periodogram variance (22 + 2 lines) 0212 pb=b.*pb + (1-b).*p; % smoothed periodogram (20) 0213 pb2=b.*pb2 + (1-b).*p.^2; % smoothed periodogram squared (21) 0214 0215 qeqi=max(min((pb2-pb.^2)./(2*sn2.^2),qeqimax),qeqimin/(t+nrcum)); % Qeq inverse (23) 0216 qiav=sum(qeqi)/nrf; % Average over all frequencies (23+12 lines) (ignore non-duplication of DC and nyquist terms) 0217 bc=1+av*sqrt(qiav); % bias correction factor (23+11 lines) 0218 bmind=1+2*(nd-1)*(1-md)./(qeqi.^(-1)-2*md); % we use the simplified form (17) instead of (15) 0219 bminv=1+2*(nv-1)*(1-mv)./(qeqi.^(-1)-2*mv); % same expression but for sub windows 0220 kmod=bc*p.*bmind<actmin; % Frequency mask for new minimum 0221 if any(kmod) 0222 actmin(kmod)=bc*p(kmod).*bmind(kmod); 0223 actminsub(kmod)=bc*p(kmod).*bminv(kmod); 0224 end 0225 if subwc>1 && subwc<nv % middle of buffer - allow a local minimum 0226 lminflag=lminflag | kmod; % potential local minimum frequency bins 0227 pminu=min(actminsub,pminu); 0228 sn2=pminu; 0229 else 0230 if subwc>=nv % end of buffer - do a buffer switch 0231 ibuf=1+rem(ibuf,nu); % increment actbuf storage pointer 0232 actbuf(ibuf,:)=actmin; % save sub-window minimum 0233 pminu=min(actbuf,[],1); 0234 i=find(qiav<qith); 0235 nsm=nsms(i(1)); % noise slope max 0236 lmin=lminflag & ~kmod & actminsub<nsm*pminu & actminsub>pminu; 0237 if any(lmin) 0238 pminu(lmin)=actminsub(lmin); 0239 actbuf(:,lmin)=repmat(pminu(lmin),nu,1); 0240 end 0241 lminflag(:)=0; 0242 actmin(:)=Inf; 0243 subwc=0; 0244 end 0245 end 0246 subwc=subwc+1; 0247 x(t,:)=sn2; 0248 qisq=sqrt(qeqi); 0249 % empirical formula for standard error based on Fig 15 of [2] 0250 xs(t,:)=sn2.*sqrt(0.266*(nd+100*qisq).*qisq/(1+0.005*nd+6/nd)./(0.5*qeqi.^(-1)+nd-1)); 0251 end 0252 end 0253 if nargout>1 % we need to store the state for next time 0254 zo.nrcum=nrcum+nr; % number of frames so far 0255 zo.p=p; % smoothed power spectrum 0256 zo.ac=ac; % correction factor (9) 0257 zo.sn2=sn2; % estimated noise power 0258 zo.pb=pb; % smoothed noisy speech power (20) 0259 zo.pb2=pb2; 0260 zo.pminu=pminu; 0261 zo.actmin=actmin; % Running minimum estimate 0262 zo.actminsub=actminsub; % sub-window minimum estimate 0263 zo.subwc=subwc; % force a buffer switch on first loop 0264 zo.actbuf=actbuf; % buffer to store subwindow minima 0265 zo.ibuf=ibuf; 0266 zo.lminflag=lminflag; % flag to remember local minimum 0267 zo.tinc=tinc; % must be the last one 0268 zo.qq=qq; 0269 end 0270 if ~nargout 0271 plot((1:nr),10*log10([sum(x,2)/nrf sum(yf,2)/nrf])) 0272 legend('noise','input'); 0273 ylabel('Power (dB)'); 0274 xlabel(sprintf('Time (%d ms frames)',round(tinc*1000))); 0275 end 0276 end 0277 0278 function [m,h,d]=mhvals(d) 0279 % Values are taken from Table 5 in [2] 0280 %[2] R. Martin,"Bias compensation methods for minimum statistics noise power 0281 % spectral density estimation", Signal Processing Vol 86, pp1215-1229, 2006. 0282 0283 % approx: plot(d.^(-0.5),[m 1-d.^(-0.5)],'x-'), plot(d.^0.5,h,'x-') 0284 persistent dmh 0285 if isempty(dmh) 0286 dmh=[ 0287 1 0 0; 0288 2 0.26 0.15; 0289 5 0.48 0.48; 0290 8 0.58 0.78; 0291 10 0.61 0.98; 0292 15 0.668 1.55; 0293 20 0.705 2; 0294 30 0.762 2.3; 0295 40 0.8 2.52; 0296 60 0.841 3.1; 0297 80 0.865 3.38; 0298 120 0.89 4.15; 0299 140 0.9 4.35; 0300 160 0.91 4.25; 0301 180 0.92 3.9; 0302 220 0.93 4.1; 0303 260 0.935 4.7; 0304 300 0.94 5]; 0305 end 0306 0307 if nargin>=1 0308 i=find(d<=dmh(:,1)); 0309 if isempty(i) 0310 i=size(dmh,1); 0311 j=i; 0312 else 0313 i=i(1); 0314 j=i-1; 0315 end 0316 if d==dmh(i,1) 0317 m=dmh(i,2); 0318 h=dmh(i,3); 0319 else 0320 qj=sqrt(dmh(i-1,1)); % interpolate using sqrt(d) 0321 qi=sqrt(dmh(i,1)); 0322 q=sqrt(d); 0323 h=dmh(i,3)+(q-qi)*(dmh(j,3)-dmh(i,3))/(qj-qi); 0324 m=dmh(i,2)+(qi*qj/q-qj)*(dmh(j,2)-dmh(i,2))/(qi-qj); 0325 end 0326 else 0327 d=dmh(:,1); 0328 m=dmh(:,2); 0329 h=dmh(:,3); 0330 end