V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp) Usage: ninc=round(0.016*fs); % frame increment [fs=sample frequency] ovf=2; % overlap factor f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2); % enframe with hanning window f=f.*conj(f); % convert to power spectrum x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum Inputs: yf input power spectra (one row per frame) tz frame increment in seconds Alternatively, the input state from a previous call (see below) pp algorithm parameters [optional] Outputs: x estimated noise power spectra (one row per frame) zo output state The algorithm parameters are defined in reference [1] from which equation numbers are given in parentheses. They are as follows: pp.tax % smoothing time constant for noise power estimate [0.0717 seconds](8) pp.tap % smoothing time constant for smoothed speech prob [0.152 seconds](23) pp.psthr % threshold for smoothed speech probability [0.99] (24) pp.pnsaf % noise probability safety value [0.01] (24) pp.pspri % prior speech probability [0.5] (18) pp.asnr % active SNR in dB [15] (18) pp.psini % initial speech probability [0.5] (23) pp.tavini % assumed speech absent time at start [0.064 seconds] If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent: (a) dp=v_estnoiseg(yp(1:300),tinc); (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc); [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z); [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z);
0001 function [x,zo]=v_estnoiseg(yf,tz,pp) 0002 %V_ESTNOISEG - estimate MMSE noise spectrum [x,zo]=(yf,tz,pp) 0003 % 0004 % Usage: ninc=round(0.016*fs); % frame increment [fs=sample frequency] 0005 % ovf=2; % overlap factor 0006 % f=v_rfft(v_enframe(s,v_windows(2,ovf*ninc,'l'),ninc),ovf*ninc,2); % enframe with hanning window 0007 % f=f.*conj(f); % convert to power spectrum 0008 % x=v_estnoiseg(f,ninc/fs); % estimate the noise power spectrum 0009 % 0010 % Inputs: 0011 % yf input power spectra (one row per frame) 0012 % tz frame increment in seconds 0013 % Alternatively, the input state from a previous call (see below) 0014 % pp algorithm parameters [optional] 0015 % 0016 % Outputs: 0017 % x estimated noise power spectra (one row per frame) 0018 % zo output state 0019 % 0020 % The algorithm parameters are defined in reference [1] from which equation 0021 % numbers are given in parentheses. They are as follows: 0022 % 0023 % pp.tax % smoothing time constant for noise power estimate [0.0717 seconds](8) 0024 % pp.tap % smoothing time constant for smoothed speech prob [0.152 seconds](23) 0025 % pp.psthr % threshold for smoothed speech probability [0.99] (24) 0026 % pp.pnsaf % noise probability safety value [0.01] (24) 0027 % pp.pspri % prior speech probability [0.5] (18) 0028 % pp.asnr % active SNR in dB [15] (18) 0029 % pp.psini % initial speech probability [0.5] (23) 0030 % pp.tavini % assumed speech absent time at start [0.064 seconds] 0031 % 0032 % If convenient, you can call v_estnoiseg in chunks of arbitrary size. Thus the following are equivalent: 0033 % 0034 % (a) dp=v_estnoiseg(yp(1:300),tinc); 0035 % 0036 % (b) [dp(1:100,:),z]=v_estnoiseg(yp(1:100,:),tinc); 0037 % [dp(101:200,:),z]=v_estnoiseg(yp(101:200,:),z); 0038 % [dp(201:300,:),z]=v_estnoiseg(yp(201:300,:),z); 0039 0040 0041 % This is intended to be a precise implementation of [1] for a frame rate of 62.5 Hz. 0042 % Time constants are adjusted for other frame rates. 0043 % 0044 % Refs: 0045 % [1] Gerkmann, T. & Hendriks, R. C. 0046 % Unbiased MMSE-Based Noise Power Estimation With Low Complexity and Low Tracking Delay 0047 % IEEE Trans Audio, Speech, Language Processing, 2012, 20, 1383-1393 0048 0049 % Copyright (C) Mike Brookes 2012 0050 % Version: $Id: v_estnoiseg.m 10865 2018-09-21 17:22:45Z dmb $ 0051 % 0052 % VOICEBOX is a MATLAB toolbox for speech processing. 0053 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0054 % 0055 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0056 % This program is free software; you can redistribute it and/or modify 0057 % it under the terms of the GNU General Public License as published by 0058 % the Free Software Foundation; either version 2 of the License, or 0059 % (at your option) any later version. 0060 % 0061 % This program is distributed in the hope that it will be useful, 0062 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0063 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0064 % GNU General Public License for more details. 0065 % 0066 % You can obtain a copy of the GNU General Public License from 0067 % http://www.gnu.org/copyleft/gpl.html or by writing to 0068 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0069 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0070 0071 [nr,nrf]=size(yf); % number of frames and freq bins 0072 x=zeros(nr,nrf); % initialize output arrays 0073 if isempty(yf) && isstruct(tz) % no real data 0074 zo=tz; % just keep the same state 0075 else 0076 if isstruct(tz) % take parameters from a previous call 0077 nrcum=tz.nrcum; % cumulative number of frames 0078 xt=tz.xt; % smoothed power spectrum 0079 pslp=tz.pslp; % correction factor (9) 0080 tinc=tz.tinc; % frame increment 0081 qq=tz.qq; % parameter structure 0082 else 0083 tinc = tz; % second argument is frame increment 0084 nrcum=0; % no frames so far 0085 % default algorithm constants 0086 qq.tax=0.0717; % noise output smoothing time constant = -tinc/log(0.8) (8) 0087 qq.tap=0.152; % speech prob smoothing time constant = -tinc/log(0.9) (23) 0088 qq.psthr=0.99; % threshold for smoothed speech probability [0.99] (24) 0089 qq.pnsaf=0.01; % noise probability safety value [0.01] (24) 0090 qq.pspri=0.5; % prior speech probability [0.5] (18) 0091 qq.asnr=15; % active SNR in dB [15] (18) 0092 qq.psini=0.5; % initial speech probability [0.5] (23) 0093 qq.tavini=0.064; % assumed speech absent time at start [64 ms] 0094 0095 if nargin>=3 && ~isempty(pp) % update fields from pp input 0096 qqn=fieldnames(qq); 0097 for i=1:length(qqn) 0098 if isfield(pp,qqn{i}) 0099 qq.(qqn{i})=pp.(qqn{i}); 0100 end 0101 end 0102 end 0103 pslp=repmat(qq.psini,1,nrf); % initialize smoothed speech presence prob 0104 xt=[]; % initialize just in case the first call has no data 0105 end 0106 0107 % unpack parameters needed within the loop 0108 0109 psthr=qq.psthr; % threshold for smoothed speech probability [0.99] (24) 0110 pnsaf=qq.pnsaf; % noise probability safety value [0.01] (24) 0111 0112 % derived algorithm constants 0113 0114 ax=exp(-tinc/qq.tax); % noise output smoothing factor = 0.8 (8) 0115 axc=1-ax; 0116 ap=exp(-tinc/qq.tap); % noise output smoothing factor = 0.9 (23) 0117 apc=1-ap; 0118 xih1=10^(qq.asnr/10); % speech-present SNR 0119 xih1r=1/(1+xih1)-1; 0120 pfac=(1/qq.pspri-1)*(1+xih1); % p(noise)/p(speech) (18) 0121 0122 if nrcum==0 && nr>0 % initialize values for first frame 0123 xt=qq.psini*mean(yf(1:max(1,min(nr,round(1+qq.tavini/tinc))),:),1); % initial noise estimate 0124 end 0125 0126 % loop for each frame 0127 for t=1:nr 0128 yft=yf(t,:); % noisy speech power spectrum 0129 ph1y=(1+pfac*exp(xih1r*yft./xt)).^(-1); % a-posteriori speech presence prob (18) 0130 pslp=ap*pslp+apc*ph1y; % smoothed speech presence prob (23) 0131 ph1y=min(ph1y,1-pnsaf*(pslp>psthr)); % limit ph1y (24) 0132 xtr=(1-ph1y).*yft+ph1y.*xt; % estimated raw noise spectrum (22) 0133 xt=ax*xt+axc*xtr; % smooth the noise estimate (8) 0134 x(t,:)=xt; % save the noise estimate 0135 end 0136 if nargout>1 % we need to store the state for next time 0137 zo.nrcum=nrcum+nr; % number of frames so far 0138 zo.xt=xt; % smoothed power spectrum 0139 zo.pslp=pslp; % correction factor (9) 0140 zo.tinc=tinc; % must be the last one 0141 zo.qq=qq; 0142 end 0143 if ~nargout 0144 clf; 0145 subplot(212); 0146 plot((1:nr)*tinc,10*log10([sum(yf,2) sum(x,2)])) 0147 ylabel('Frame Energy (dB)'); 0148 xlabel(sprintf('Time (s) [%d ms frame incr]',round(tinc*1000))); 0149 v_axisenlarge([-1 -1.05]); 0150 legend('input','noise','Location','Best'); 0151 subplot(211); 0152 plot(1:nrf,10*log10([sum(yf,1)'/nr sum(x,1)'/nr])) 0153 ylabel('Power (dB)'); 0154 xlabel('Frequency bin'); 0155 v_axisenlarge([-1 -1.05]); 0156 legend('input','noise','Location','Best'); 0157 end 0158 end 0159