V_EARNOISE Add noise to simulate the hearing threshold of a listener [Y,X,V]=(S,FS,M,SPL)

Usage: (1) y=v_earnoise(s,fs);          % scale the speech to 62.35 dB SPL, add "internal ear noise" and then filter
       (2) spl=62.35;                   % this code does the same but with explicit signal scaling
           x=10^(0.05*spl)*v_activlev(s,fs,'n');
           y=v_earnoise(x,fs,'u');
       (3) v_earnoise(s,fs);            % If outputs are omitted, a graph is plotted showing the SNR spectrum
       (4) y=v_earnoise(s,fs,[],50);    % scale the speech to 50 dB SPL instead of the default 62.35
       (5) y=v_earnoise(s,fs,'n',spl);  % Assume the input signal, s, has already been scaled to 0 dB (saves computation)

Inputs:  s(n,c)  speech signal: n samples with one channel per column
         fs      sample frequency in Hz
         m       mode string as shown below [default: ' ' (no options set)]
                 'n'  Input s has been normalized to 0 dB (e.g. with the 'n' option of v_activlev.m)
                 'u'  Input s is already scaled correctly in SPL (so ignore the spl input argument)
         spl     target active speech level in dB SPL [default: 62.35]

Outputs: y(n,c)  filtered speech signal with added noise which simulates the ear input signal
         x(n,c)  filtered input speech signal
         v(n,c)  noise added to filtered speech signal

This function adds fictitious "internal ear noise" to an audio signal to simulate the effects of the
frequency-dependent hearing threshold of a normal listener. To avoid having to add very high noise
levels at low and high frequencies, it instead filters the input signal by the inverse of the desired
noise spectrum and then adds white noise with 0 dB power spectral density. The noise spectrum is taken
from Table 1 of [1] (which derived it from [2]) and, at a particular frequency, equals the pure-tone
hearing threshold minus 10*log10(R) where R is the critical ratio. The critical ratio, R, is the power
of a pure tone divided by the power spectral density of a white noise that just masks it; this ratio is
approximately independent of level.
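The relationship between threshold, critical ratio and noise level described above can be written out as a short worked equation. The symbols T, R and N are introduced here for illustration, and the numbers are made up for the example; they are not values from Table 1 of [1].

```latex
% T(f): pure-tone hearing threshold in dB SPL
% R(f): critical ratio (tone power divided by the PSD of the just-masking white noise)
% N(f): equivalent internal noise spectrum level in dB
N(f) = T(f) - 10\log_{10} R(f)

% Illustrative numbers only: T = 10 dB SPL and R = 100 give
N = 10 - 10\log_{10}(100) = 10 - 20 = -10\ \text{dB}
```

Because only the ratio of signal to noise at each frequency matters here, filtering the speech by the inverse of N(f) and adding white noise at 0 dB gives the same SNR spectrum as adding noise with spectrum N(f) to the unfiltered speech.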

By default the input speech for the strongest channel is scaled to correspond to a normal speaking level
at 1 metre from the lips (62.35 dB from [1]). The speech level at the centre of the listener's head can
alternatively be specified explicitly in dB SPL using the spl input parameter. For distant sources, the
level should be reduced by 20*log10(dist) where dist is the distance in metres between the speaker's
lips and the centre of the listener's head. The same scaling is used for all channels.

This function assumes normal hearing; to account for hearing loss, use the 'u' option (as in usage
example 2 above) and apply a filter to x that reduces the signal level by the hearing loss at each
frequency. For example, if the hearing loss is 20 dB at all frequencies, then x should be multiplied by 0.1.

Refs: [1] ANSI. Methods for the calculation of the speech intelligibility index.
          ANSI Standard S3.5-1997 (R2007), American National Standards Institute, 1997.
      [2] C. V. Pavlovic. Derivation of primary parameters and procedures for use in speech
          intelligibility predictions. J. Acoust. Soc. Amer., 82: 413–422, 1987.
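The distance and hearing-loss rules above reduce to two small calculations, sketched here in MATLAB. The distance `dist` and the flat loss `hl_db` are hypothetical values chosen for illustration, and the commented calls at the end assume VOICEBOX is on the path with a speech signal `s` sampled at `fs`.

```matlab
% Sketch: derive the spl argument and a hearing-loss gain from the rules above.
% dist and hl_db are illustrative values, not defaults of v_earnoise.
spl0  = 62.35;                    % default level at 1 metre in dB SPL, from [1]
dist  = 4;                        % hypothetical talker-to-listener distance in metres
spl   = spl0 - 20*log10(dist);    % level at the centre of the listener's head in dB SPL
hl_db = 20;                       % hypothetical flat hearing loss in dB
g     = 10^(-hl_db/20);           % amplitude gain for that loss (0.1 for 20 dB)
fprintf('spl = %.2f dB SPL, hearing-loss gain = %.3f\n', spl, g);

% With VOICEBOX available, these values could be used as in usage example (2):
%   x = g*10^(0.05*spl)*v_activlev(s,fs,'n');  % scale to spl, then apply the loss
%   y = v_earnoise(x,fs,'u');                  % 'u': x is already scaled in SPL
```

A frequency-dependent hearing loss would need a filter in place of the scalar gain `g`; the scalar form only covers the flat-loss case mentioned in the text.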

function [y,x,v]=v_earnoise(s,fs,m,spl)
%V_EARNOISE Add noise to simulate the hearing threshold of a listener [Y,X,V]=(S,FS,M,SPL)
%
% Usage: (1) y=v_earnoise(s,fs);          % scale the speech to 62.35 dB SPL, add "internal ear noise" and then filter
%        (2) spl=62.35;                   % this code does the same but with explicit signal scaling
%            x=10^(0.05*spl)*v_activlev(s,fs,'n');
%            y=v_earnoise(x,fs,'u');
%        (3) v_earnoise(s,fs);            % If outputs are omitted, a graph is plotted showing the SNR spectrum
%        (4) y=v_earnoise(s,fs,[],50);    % scale the speech to 50 dB SPL instead of the default 62.35
%        (5) y=v_earnoise(s,fs,'n',spl);  % Assume the input signal, s, has already been scaled to 0 dB (saves computation)
%
%  Inputs:  s(n,c)  speech signal: n samples with one channel per column
%           fs      sample frequency in Hz
%           m       mode string as shown below [default: ' ' (no options set)]
%                   'n'  Input s has been normalized to 0 dB (e.g. with the 'n' option of v_activlev.m)
%                   'u'  Input s is already scaled correctly in SPL (so ignore the spl input argument)
%           spl     target active speech level in dB SPL [default: 62.35]
%
% Outputs:  y(n,c)  filtered speech signal with added noise which simulates the ear input signal
%           x(n,c)  filtered input speech signal
%           v(n,c)  noise added to filtered speech signal
%
% This function adds fictitious "internal ear noise" to an audio signal to simulate the effects of the
% frequency-dependent hearing threshold of a normal listener. To avoid having to add very high noise
% levels at low and high frequencies, it instead filters the input signal by the inverse of the desired
% noise spectrum and then adds white noise with 0 dB power spectral density. The noise spectrum is taken
% from Table 1 of [1] (which derived it from [2]) and, at a particular frequency, equals the pure-tone
% hearing threshold minus 10*log10(R) where R is the critical ratio. The critical ratio, R, is the power
% of a pure tone divided by the power spectral density of a white noise that just masks it; this ratio is
% approximately independent of level.
%
% By default the input speech for the strongest channel is scaled to correspond to a normal speaking level
% at 1 metre from the lips (62.35 dB from [1]). The speech level at the centre of the listener's head can
% alternatively be specified explicitly in dB SPL using the spl input parameter. For distant sources, the
% level should be reduced by 20*log10(dist) where dist is the distance in metres between the speaker's
% lips and the centre of the listener's head. The same scaling is used for all channels.
%
% This function assumes normal hearing; to account for hearing loss, use the 'u' option (as in usage
% example 2 above) and apply a filter to x that reduces the signal level by the hearing loss at each
% frequency. For example, if the hearing loss is 20 dB at all frequencies, then x should be multiplied by 0.1.
%
% Refs: [1] ANSI. Methods for the calculation of the speech intelligibility index.
%           ANSI Standard S3.5-1997 (R2007), American National Standards Institute, 1997.
%       [2] C. V. Pavlovic. Derivation of primary parameters and procedures for use in speech
%           intelligibility predictions. J. Acoust. Soc. Amer., 82: 413–422, 1987.
%
persistent fs0 a b
if isempty(fs0) || fs~=fs0
    [b,a]=v_stdspectrum(7,'z',fs);      % inverse internal noise spectrum filter
    fs0=fs;                             % remember fs so the filter is only redesigned when fs changes
end
[ns,nc]=size(s);
if nc>ns
    error('s input has more columns (channels) than rows (samples)');
end
if nargin<3 || isempty(m)
    m=' ';
end
if nargin<4 || isempty(spl)
    spl=62.35;
end
if any(m=='n') || any(m=='u')
    if any(m=='n')
        dboff=spl;                      % input is at 0 dB so the gain equals the target level
    else
        dboff=0;                        % input is already scaled in SPL
    end
else
    if nc>1
        sal=zeros(1,nc);
        for i=1:nc
            sal(i)=v_activlev(s(:,i),fs,'d');
        end
        dboff=spl-max(sal);             % gain to apply to speech signal in dB
    else
        dboff=spl-v_activlev(s,fs,'d'); % gain to apply to speech signal in dB
    end
end
x=10^(0.05*dboff)*filter(b,a,s);        % scale and filter the speech
v=sqrt(0.5*fs)*randn(size(s));          % Add noise at 0 dB power spectral density
y=x+v;
if ~nargout
    nfft=2*round(10e-3*fs/2);           % FFT length is even number approximately 10 ms long
    fax=(0:nfft/2)*fs/nfft;             % frequency axis for plot
    win=hamming(nfft);
    sal=zeros(1,nc);
    for i=1:nc
        sal(i)=v_activlev(s(:,i),fs,'d')+dboff;
    end
    [salmax,imax]=max(sal);
    [salmax,af,fso,vad]=v_activlev(s(:,imax),fs,'d'); % Get VAD from highest power input signal
    fvad=sum(v_enframe(vad,nfft,nfft/2),2)>nfft/2;    % frames with mostly speech in them
    minmax=[Inf -Inf];
    leg=cell(nc,1);
    gsnr=zeros(nc,1);
    cols='brgcmyk';
    for i=1:nc
        col=cols(1+mod(i-1,length(cols)));
        px=v_enframe(x(:,i),win,nfft/2,'sdp',fs);     % compute first half of PSD
        psxm=mean(px(fvad,:),1);                      % average PSD over the speech frames
        psxmdb=db(psxm,'p');
        minmax=[min(minmax(1),min(psxmdb)) max(minmax(2),max(psxmdb))];
        gsnr(i)=db(mean(psxm),'p');                   % mean PSD over frequency in dB
        semilogx(fax,psxmdb,[col '-']);
        hold on
        leg{i}=sprintf('Chan %d: %+.1f dB SPL',i,sal(i));
    end
    for i=1:nc
        col=cols(1+mod(i-1,length(cols)));
        semilogx(fax([2 end]),gsnr([i i]),[col '--']); % dashed line at the mean level of each channel
    end
    snrrange=60;
    ylim=[max(minmax(1),minmax(2)-snrrange) minmax(2)]*[1.05 -0.05; -0.05 1.05]; % limit range and add 5% margins
    set(gca,'ylim',ylim,'xlim',[100 fs/2]);
    if ylim(1)<0 && ylim(2)>0
        semilogx(fax([2 end]),[0 0],'k:');            % mark the 0 dB SNR level
    end
    hold off
    grid on;
    legend(leg,'location','best')
    xlabel(['Frequency (' v_xticksi 'Hz)']);
    ylabel('SNR (dB)')
    title('Hearing threshold equivalent SNR');
end
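
The noise line `v=sqrt(0.5*fs)*randn(size(s))` gives the noise a variance of fs/2. The sketch below checks numerically that this corresponds to a flat spectrum of about 0 dB, under the assumption that "0 dB power spectral density" means unit power per Hz over the one-sided band from 0 to fs/2. The values of `fs` and `n` are illustrative and the code does not need VOICEBOX.

```matlab
% Sketch: check that sqrt(0.5*fs)*randn(...) has a one-sided PSD close to 0 dB.
% Assumes a one-sided PSD convention; fs and n are illustrative values.
fs = 16000;                       % sample frequency in Hz
n  = 2^18;                        % number of noise samples
v  = sqrt(0.5*fs)*randn(n,1);     % same scaling as the noise in v_earnoise
pwr   = mean(v.^2);               % total noise power, expected to be near fs/2
psd1  = pwr/(fs/2);               % power spread evenly over 0..fs/2 Hz
psddb = 10*log10(psd1);           % one-sided PSD in dB, expected to be near 0
fprintf('power = %.1f (fs/2 = %.1f), PSD = %+.3f dB\n', pwr, fs/2, psddb);
```

Because the noise level is fixed at 0 dB, the spectrum of the filtered speech `x` in dB can be read directly as an SNR, which is presumably why the plot labels its vertical axis "SNR (dB)".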