V_ESTNOISEM - estimate noise spectrum using minimum statistics Usage: ninc=round(0.016*fs); % frame increment [fs=sample frequency] ovf=2; % overlap factor f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2); f=f.*conj(f); % convert to power spectrum x=v_estnoisem(f,ninc/fs); % estimate the noise power spectrum Inputs: yf input power spectra (one row per frame) tz frame increment in seconds Alternatively, the input state from a previous call (see below) pp algorithm parameters [optional] Outputs: x estimated noise power spectra (one row per frame) zo output state xs estimated std error of x (one row per frame) xs seems often to be an underestimate by a factor of 2 or 3 The algorithm parameters are defined in reference [1] from which equation numbers are given in parentheses. They are as follows: pp.taca % (11): smoothing time constant for alpha_c [0.0449 seconds] pp.tamax % (3): max smoothing time constant [0.392 seconds] pp.taminh % (3): min smoothing time constant (upper limit) [0.0133 seconds] pp.tpfall % (12): time constant for P to fall [0.064 seconds] pp.tbmax % (20): max smoothing time constant [0.0717 seconds] pp.qeqmin % (23): minimum value of Qeq [2] pp.qeqmax % max value of Qeq per frame [14] pp.av % (23)+13 lines: fudge factor for bc calculation [2.12] pp.td % time to take minimum over [1.536 seconds] pp.nu % number of subwindows to use [3] pp.qith % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ] pp.nsmdb % corresponding noise slope thresholds in dB/second [47 31.4 15.7 4.1] Example use: y=v_enframe(s,w,ni); % divide speech signal s(n) into % overlapping frames using window w(n) yf=v_rfft(y,nf,2); % take fourier transform dp=v_estnoisem(yf.*conj(yf),tinc); % estimate the noise If convenient, you can call v_estnoisem in chunks of arbitrary size. Thus the following are equivalent: (a) dp=v_estnoisem(yp(1:300),tinc); (b) [dp(1:100),z]=v_estnoisem(yp(1:100),tinc); [dp(101:200),z]=v_estnoisem(yp(101:200),z); [dp(201:300),z]=v_estnoisem(yp(201:300),z);
0001 function [x,zo,xs]=v_estnoisem(yf,tz,pp) 0002 %V_ESTNOISEM - estimate noise spectrum using minimum statistics 0003 % 0004 % Usage: ninc=round(0.016*fs); % frame increment [fs=sample frequency] 0005 % ovf=2; % overlap factor 0006 % f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2); 0007 % f=f.*conj(f); % convert to power spectrum 0008 % x=v_estnoisem(f,ninc/fs); % estimate the noise power spectrum 0009 % 0010 % Inputs: 0011 % yf input power spectra (one row per frame) 0012 % tz frame increment in seconds 0013 % Alternatively, the input state from a previous call (see below) 0014 % pp algorithm parameters [optional] 0015 % 0016 % Outputs: 0017 % x estimated noise power spectra (one row per frame) 0018 % zo output state 0019 % xs estimated std error of x (one row per frame) 0020 % xs seems often to be an underestimate by a factor of 2 or 3 0021 % 0022 % The algorithm parameters are defined in reference [1] from which equation 0023 % numbers are given in parentheses. They are as follows: 0024 % 0025 % pp.taca % (11): smoothing time constant for alpha_c [0.0449 seconds] 0026 % pp.tamax % (3): max smoothing time constant [0.392 seconds] 0027 % pp.taminh % (3): min smoothing time constant (upper limit) [0.0133 seconds] 0028 % pp.tpfall % (12): time constant for P to fall [0.064 seconds] 0029 % pp.tbmax % (20): max smoothing time constant [0.0717 seconds] 0030 % pp.qeqmin % (23): minimum value of Qeq [2] 0031 % pp.qeqmax % max value of Qeq per frame [14] 0032 % pp.av % (23)+13 lines: fudge factor for bc calculation [2.12] 0033 % pp.td % time to take minimum over [1.536 seconds] 0034 % pp.nu % number of subwindows to use [3] 0035 % pp.qith % Q-inverse thresholds to select maximum noise slope [0.03 0.05 0.06 Inf ] 0036 % pp.nsmdb % corresponding noise slope thresholds in dB/second [47 31.4 15.7 4.1] 0037 % 0038 % Example use: y=v_enframe(s,w,ni); % divide speech signal s(n) into 0039 % % overlapping frames using window w(n) 0040 % yf=v_rfft(y,nf,2); % take fourier transform 0041 % dp=v_estnoisem(yf.*conj(yf),tinc); % estimate the noise 0042 % 0043 % If convenient, you can call v_estnoisem in chunks of arbitrary size. Thus the following are equivalent: 0044 % 0045 % (a) dp=v_estnoisem(yp(1:300),tinc); 0046 % 0047 % (b) [dp(1:100),z]=v_estnoisem(yp(1:100),tinc); 0048 % [dp(101:200),z]=v_estnoisem(yp(101:200),z); 0049 % [dp(201:300),z]=v_estnoisem(yp(201:300),z); 0050 0051 0052 % This is intended to be a precise implementation of [1] with Table III 0053 % replaced by the updated table 5 from [2]. The only deliberate algorithm 0054 % change is the introduction of a minimum value for 1/Qeq in equation (23). 0055 % This change only affects the first few frames and improves the 0056 % convergence of the algorithm. A minor improveemnt was reported in [3] but 0057 % this has not yet been included. 0058 % 0059 % Refs: 0060 % [1] Rainer Martin. 0061 % Noise power spectral density estimation based on optimal smoothing and minimum statistics. 0062 % IEEE Trans. Speech and Audio Processing, 9(5):504-512, July 2001. 0063 % [2] Rainer Martin. 0064 % Bias compensation methods for minimum statistics noise power spectral density estimation 0065 % Signal Processing, 2006, 86, 1215-1229 0066 % [3] Dirk Mauler and Rainer Martin 0067 % Noise power spectral density estimation on highly correlated data 0068 % Proc IWAENC, 2006 0069 0070 % Copyright (C) Mike Brookes 2008 0071 % Version: $Id: v_estnoisem.m 10865 2018-09-21 17:22:45Z dmb $ 0072 % 0073 % VOICEBOX is a MATLAB toolbox for speech processing. 0074 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0075 % 0076 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0077 % This program is free software; you can redistribute it and/or modify 0078 % it under the terms of the GNU General Public License as published by 0079 % the Free Software Foundation; either version 2 of the License, or 0080 % (at your option) any later version. 0081 % 0082 % This program is distributed in the hope that it will be useful, 0083 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0084 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0085 % GNU General Public License for more details. 0086 % 0087 % You can obtain a copy of the GNU General Public License from 0088 % http://www.gnu.org/copyleft/gpl.html or by writing to 0089 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0090 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0091 0092 [nr,nrf]=size(yf); % number of frames and freq bins 0093 x=zeros(nr,nrf); % initialize output arrays 0094 xs=zeros(nr,nrf); % will hold std error in the future 0095 if isempty(yf) && isstruct(tz) % no real data 0096 zo=tz; % just keep the same state 0097 else 0098 if isstruct(tz) % take parameters from a previous call 0099 nrcum=tz.nrcum; 0100 p=tz.p; % smoothed power spectrum 0101 ac=tz.ac; % correction factor (9) 0102 sn2=tz.sn2; % estimated noise power 0103 pb=tz.pb; % smoothed noisy speech power (20) 0104 pb2=tz.pb2; 0105 pminu=tz.pminu; 0106 actmin=tz.actmin; % Running minimum estimate 0107 actminsub=tz.actminsub; % sub-window minimum estimate 0108 subwc=tz.subwc; % force a buffer switch on first loop 0109 actbuf=tz.actbuf; % buffer to store subwindow minima 0110 ibuf=tz.ibuf; 0111 lminflag=tz.lminflag; % flag to remember local minimum 0112 tinc=tz.tinc; % frame increment 0113 qq=tz.qq; % parameter structure 0114 else 0115 tinc = tz; % second argument is frame increment 0116 nrcum=0; % no frames so far 0117 % default algorithm constants 0118 0119 qq.taca=0.0449; % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11) 0120 qq.tamax=0.392; % max smoothing time constant in (3) = -tinc/log(0.96) 0121 qq.taminh=0.0133; % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3) 0122 qq.tpfall=0.064; % time constant for P to fall (12) 0123 qq.tbmax=0.0717; % max smoothing time constant in (20) = -tinc/log(0.8) 0124 qq.qeqmin=2; % minimum value of Qeq (23) 0125 qq.qeqmax=14; % max value of Qeq per frame 0126 qq.av=2.12; % fudge factor for bc calculation (23 + 13 lines) 0127 qq.td=1.536; % time to take minimum over 0128 qq.nu=8; % number of subwindows 0129 qq.qith=[0.03 0.05 0.06 Inf]; % noise slope thresholds in dB/s 0130 qq.nsmdb=[47 31.4 15.7 4.1]; 0131 0132 if nargin>=3 && ~isempty(pp) 0133 qqn=fieldnames(qq); 0134 for i=1:length(qqn) 0135 if isfield(pp,qqn{i}) 0136 qq.(qqn{i})=pp.(qqn{i}); 0137 end 0138 end 0139 end 0140 end 0141 0142 % unpack parameter structure 0143 0144 taca=qq.taca; % smoothing time constant for alpha_c = -tinc/log(0.7) in equ (11) 0145 tamax=qq.tamax; % max smoothing time constant in (3) = -tinc/log(0.96) 0146 taminh=qq.taminh; % min smoothing time constant (upper limit) in (3) = -tinc/log(0.3) 0147 tpfall=qq.tpfall; % time constant for P to fall (12) 0148 tbmax=qq.tbmax; % max smoothing time constant in (20) = -tinc/log(0.8) 0149 qeqmin=qq.qeqmin; % minimum value of Qeq (23) 0150 qeqmax=qq.qeqmax; % max value of Qeq per frame 0151 av=qq.av; % fudge factor for bc calculation (23 + 13 lines) 0152 td=qq.td; % time to take minimum over 0153 nu=qq.nu; % number of subwindows 0154 qith=qq.qith; % noise slope thresholds in dB/s 0155 nsmdb=qq.nsmdb; % maximum permitted +ve noise slope in dB/s 0156 0157 % derived algorithm constants 0158 0159 aca=exp(-tinc/taca); % smoothing constant for alpha_c in equ (11) = 0.7 0160 acmax=aca; % min value of alpha_c = 0.7 in equ (11) also = 0.7 0161 amax=exp(-tinc/tamax); % max smoothing constant in (3) = 0.96 0162 aminh=exp(-tinc/taminh); % min smoothing constant (upper limit) in (3) = 0.3 0163 bmax=exp(-tinc/tbmax); % max smoothing constant in (20) = 0.8 0164 snrexp = -tinc/tpfall; 0165 nv=round(td/(tinc*nu)); % length of each subwindow in frames 0166 if nv<4 % algorithm doesn't work for miniscule frames 0167 nv=4; 0168 nu=max(round(td/(tinc*nv)),1); 0169 end 0170 nd=nu*nv; % length of total window in frames 0171 [md,hd]=mhvals(nd); % calculate the constants M(D) and H(D) from Table III 0172 [mv,hv]=mhvals(nv); % calculate the constants M(D) and H(D) from Table III 0173 nsms=10.^(nsmdb*nv*tinc/10); % [8 4 2 1.2] in paper 0174 qeqimax=1/qeqmin; % maximum value of Qeq inverse (23) 0175 qeqimin=1/qeqmax; % minumum value of Qeq per frame inverse 0176 0177 if isempty(yf) % provide dummy initialization 0178 ac=1; % correction factor (9) 0179 subwc=nv; % force a buffer switch on first loop 0180 ibuf=0; 0181 p=x; % smoothed power spectrum 0182 sn2=p; % estimated noise power 0183 pb=p; % smoothed noisy speech power (20) 0184 pb2=pb.^2; 0185 pminu=p; 0186 actmin=repmat(Inf,1,nrf); % Running minimum estimate 0187 actminsub=actmin; % sub-window minimum estimate 0188 actbuf=repmat(Inf,nu,nrf); % buffer to store subwindow minima 0189 lminflag=zeros(1,nrf); % flag to remember local minimum 0190 else 0191 0192 if ~nrcum % initialize values for first frame 0193 p=yf(1,:); % smoothed power spectrum 0194 ac=1; % correction factor (9) 0195 sn2=p; % estimated noise power 0196 pb=p; % smoothed noisy speech power (20) 0197 pb2=pb.^2; 0198 pminu=p; 0199 actmin=repmat(Inf,1,nrf); % Running minimum estimate 0200 actminsub=actmin; % sub-window minimum estimate 0201 subwc=nv; % force a buffer switch on first loop 0202 actbuf=repmat(Inf,nu,nrf); % buffer to store subwindow minima 0203 ibuf=0; 0204 lminflag=zeros(1,nrf); % flag to remember local minimum 0205 end 0206 0207 % loop for each frame 0208 0209 for t=1:nr % we use t instead of lambda in the paper 0210 yft=yf(t,:); % noise speech power spectrum 0211 acb=(1+(sum(p)./sum(yft)-1).^2).^(-1); % alpha_c-bar(t) (9) 0212 ac=aca*ac+(1-aca)*max(acb,acmax); % alpha_c(t) (10) 0213 ah=amax*ac.*(1+(p./sn2-1).^2).^(-1); % alpha_hat: smoothing factor per frequency (11) 0214 snr=sum(p)/sum(sn2); 0215 ah=max(ah,min(aminh,snr^snrexp)); % lower limit for alpha_hat (12) 0216 0217 p=ah.*p+(1-ah).*yft; % smoothed noisy speech power (3) 0218 b=min(ah.^2,bmax); % smoothing constant for estimating periodogram variance (22 + 2 lines) 0219 pb=b.*pb + (1-b).*p; % smoothed periodogram (20) 0220 pb2=b.*pb2 + (1-b).*p.^2; % smoothed periodogram squared (21) 0221 0222 qeqi=max(min((pb2-pb.^2)./(2*sn2.^2),qeqimax),qeqimin/(t+nrcum)); % Qeq inverse (23) 0223 qiav=sum(qeqi)/nrf; % Average over all frequencies (23+12 lines) (ignore non-duplication of DC and nyquist terms) 0224 bc=1+av*sqrt(qiav); % bias correction factor (23+11 lines) 0225 bmind=1+2*(nd-1)*(1-md)./(qeqi.^(-1)-2*md); % we use the simplified form (17) instead of (15) 0226 bminv=1+2*(nv-1)*(1-mv)./(qeqi.^(-1)-2*mv); % same expression but for sub windows 0227 kmod=bc*p.*bmind<actmin; % Frequency mask for new minimum 0228 if any(kmod) 0229 actmin(kmod)=bc*p(kmod).*bmind(kmod); 0230 actminsub(kmod)=bc*p(kmod).*bminv(kmod); 0231 end 0232 if subwc>1 && subwc<nv % middle of buffer - allow a local minimum 0233 lminflag=lminflag | kmod; % potential local minimum frequency bins 0234 pminu=min(actminsub,pminu); 0235 sn2=pminu; 0236 else 0237 if subwc>=nv % end of buffer - do a buffer switch 0238 ibuf=1+rem(ibuf,nu); % increment actbuf storage pointer 0239 actbuf(ibuf,:)=actmin; % save sub-window minimum 0240 pminu=min(actbuf,[],1); 0241 i=find(qiav<qith); 0242 nsm=nsms(i(1)); % noise slope max 0243 lmin=lminflag & ~kmod & actminsub<nsm*pminu & actminsub>pminu; 0244 if any(lmin) 0245 pminu(lmin)=actminsub(lmin); 0246 actbuf(:,lmin)=repmat(pminu(lmin),nu,1); 0247 end 0248 lminflag(:)=0; 0249 actmin(:)=Inf; 0250 subwc=0; 0251 end 0252 end 0253 subwc=subwc+1; 0254 x(t,:)=sn2; 0255 qisq=sqrt(qeqi); 0256 % empirical formula for standard error based on Fig 15 of [2] 0257 xs(t,:)=sn2.*sqrt(0.266*(nd+100*qisq).*qisq/(1+0.005*nd+6/nd)./(0.5*qeqi.^(-1)+nd-1)); 0258 end 0259 end 0260 if nargout>1 % we need to store the state for next time 0261 zo.nrcum=nrcum+nr; % number of frames so far 0262 zo.p=p; % smoothed power spectrum 0263 zo.ac=ac; % correction factor (9) 0264 zo.sn2=sn2; % estimated noise power 0265 zo.pb=pb; % smoothed noisy speech power (20) 0266 zo.pb2=pb2; 0267 zo.pminu=pminu; 0268 zo.actmin=actmin; % Running minimum estimate 0269 zo.actminsub=actminsub; % sub-window minimum estimate 0270 zo.subwc=subwc; % force a buffer switch on first loop 0271 zo.actbuf=actbuf; % buffer to store subwindow minima 0272 zo.ibuf=ibuf; 0273 zo.lminflag=lminflag; % flag to remember local minimum 0274 zo.tinc=tinc; % must be the last one 0275 zo.qq=qq; 0276 end 0277 if ~nargout 0278 clf; 0279 subplot(212); 0280 plot((1:nr)*tinc,10*log10([sum(yf,2) sum(x,2)])) 0281 ylabel('Frame Energy (dB)'); 0282 xlabel(sprintf('Time (s) [%d ms frame incr]',round(tinc*1000))); 0283 v_axisenlarge([-1 -1.05]); 0284 legend('input','noise','Location','Best'); 0285 subplot(211); 0286 plot(1:nrf,10*log10([sum(yf,1)'/nr sum(x,1)'/nr])) 0287 ylabel('Power (dB)'); 0288 xlabel('Frequency bin'); 0289 v_axisenlarge([-1 -1.05]); 0290 legend('input','noise','Location','Best'); 0291 end 0292 end 0293 0294 function [m,h,d]=mhvals(d) 0295 % Values are taken from Table 5 in [2] 0296 %[2] R. Martin,"Bias compensation methods for minimum statistics noise power 0297 % spectral density estimation", Signal Processing Vol 86, pp1215-1229, 2006. 0298 0299 % approx: plot(d.^(-0.5),[m 1-d.^(-0.5)],'x-'), plot(d.^0.5,h,'x-') 0300 persistent dmh 0301 if isempty(dmh) 0302 dmh=[ 0303 1 0 0; 0304 2 0.26 0.15; 0305 5 0.48 0.48; 0306 8 0.58 0.78; 0307 10 0.61 0.98; 0308 15 0.668 1.55; 0309 20 0.705 2; 0310 30 0.762 2.3; 0311 40 0.8 2.52; 0312 60 0.841 3.1; 0313 80 0.865 3.38; 0314 120 0.89 4.15; 0315 140 0.9 4.35; 0316 160 0.91 4.25; 0317 180 0.92 3.9; 0318 220 0.93 4.1; 0319 260 0.935 4.7; 0320 300 0.94 5]; 0321 end 0322 0323 if nargin>=1 0324 i=find(d<=dmh(:,1)); 0325 if isempty(i) 0326 i=size(dmh,1); 0327 j=i; 0328 else 0329 i=i(1); 0330 j=i-1; 0331 end 0332 if d==dmh(i,1) 0333 m=dmh(i,2); 0334 h=dmh(i,3); 0335 else 0336 qj=sqrt(dmh(i-1,1)); % interpolate using sqrt(d) 0337 qi=sqrt(dmh(i,1)); 0338 q=sqrt(d); 0339 h=dmh(i,3)+(q-qi)*(dmh(j,3)-dmh(i,3))/(qj-qi); 0340 m=dmh(i,2)+(qi*qj/q-qj)*(dmh(j,2)-dmh(i,2))/(qi-qj); 0341 end 0342 else 0343 d=dmh(:,1); 0344 m=dmh(:,2); 0345 h=dmh(:,3); 0346 end