V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF) Usage: (1) seg=v_snrseg(s,r,fs); % s & r are noisy and clean signal (2) seg=v_snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default] (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03); % 30 ms frames Inputs: s test signal r reference signal fs sample frequency (Hz) m mode [default = 'Vq'] w = No VAD - use whole file v = use sohn VAD to discard silent portions V = use P.56-based VAD to discard silent portions [default] a = A-weight the signals b = weight signals by BS-468 q = use linear interpolation to remove delays +- 1 sample [default] z = do not do any alignment p = plot results tf frame increment [0.01] Outputs: seg = Segmental SNR in dB glo = Global SNR in dB (typically 7 dB greater than SNR-seg) tc = time at centre of each frame (seconds) snf = Segmental SNR in dB in each frame vf = Boolean mask indicating frames that valid (from VAD) This function compares a noisy signal, S, with a clean reference, R, and computes the segemntal signal-to-noise ratio (SNR) in dB. The signals, which must be of the same length, are split into non-overlapping frames of length TF (default 10 ms) and the SNR of each frame in dB is calculated. The segmental SNR is the average of these values, i.e. SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2)) where the mean is over frames and the sum runs over one particular frame. Two optional modifications can be made to this basic formula: (a) Frames are excluded if there is no significant energy in the R signal. The idea is to limit the calculation to frames in which speech is active. By default, the v_voicebox function "v_activlev" is used to detect the inactive frames (the 'V' mode option). (b) In each frame independently, the reference signal is shifted by up to +- 1 sample to find the alignment than minimizes the noise component (S-R)^2. This shifting accounts for small misalignments and/or sample frequency differences between the two signals. For larger shifts, you can use the v_voicebox function "v_sigalign". Accurate alignemnt is especially important at high SNR values. If no M argument is specified, both these modifications will be applied; this is equivalent to specifying M='Vq'.
0001 function [seg,glo,tc,snf,vf]=v_snrseg(s,r,fs,m,tf) 0002 %V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF) 0003 % 0004 %Usage: (1) seg=v_snrseg(s,r,fs); % s & r are noisy and clean signal 0005 % (2) seg=v_snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default] 0006 % (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03); % 30 ms frames 0007 % 0008 % Inputs: s test signal 0009 % r reference signal 0010 % fs sample frequency (Hz) 0011 % m mode [default = 'Vq'] 0012 % w = No VAD - use whole file 0013 % v = use sohn VAD to discard silent portions 0014 % V = use P.56-based VAD to discard silent portions [default] 0015 % a = A-weight the signals 0016 % b = weight signals by BS-468 0017 % q = use linear interpolation to remove delays +- 1 sample [default] 0018 % z = do not do any alignment 0019 % p = plot results 0020 % tf frame increment [0.01] 0021 % 0022 % Outputs: seg = Segmental SNR in dB 0023 % glo = Global SNR in dB (typically 7 dB greater than SNR-seg) 0024 % tc = time at centre of each frame (seconds) 0025 % snf = Segmental SNR in dB in each frame 0026 % vf = Boolean mask indicating frames that valid (from VAD) 0027 % 0028 % This function compares a noisy signal, S, with a clean reference, R, and 0029 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals, 0030 % which must be of the same length, are split into non-overlapping frames 0031 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated. 0032 % The segmental SNR is the average of these values, i.e. 0033 % SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2)) 0034 % where the mean is over frames and the sum runs over one particular frame. 0035 % Two optional modifications can be made to this basic formula: 0036 % 0037 % (a) Frames are excluded if there is no significant energy in the R 0038 % signal. The idea is to limit the calculation to frames in which 0039 % speech is active. By default, the v_voicebox function "v_activlev" is 0040 % used to detect the inactive frames (the 'V' mode option). 0041 % 0042 % (b) In each frame independently, the reference signal is shifted by up 0043 % to +- 1 sample to find the alignment than minimizes the noise 0044 % component (S-R)^2. This shifting accounts for small misalignments 0045 % and/or sample frequency differences between the two signals. For 0046 % larger shifts, you can use the v_voicebox function "v_sigalign". 0047 % Accurate alignemnt is especially important at high SNR values. 0048 % 0049 % If no M argument is specified, both these modifications will be applied; 0050 % this is equivalent to specifying M='Vq'. 0051 0052 % Bugs/suggestions 0053 % (1) Optionally restrict the bandwidth to the smaller of the two 0054 % bandwidths either with an extra parameter or automatically determined 0055 % (2) Optionally apply a gain to the s signal to maximize the SNR and 0056 % output the gain as an additional parameter 0057 0058 % Copyright (C) Mike Brookes 2011 0059 % Version: $Id: v_snrseg.m 10865 2018-09-21 17:22:45Z dmb $ 0060 % 0061 % VOICEBOX is a MATLAB toolbox for speech processing. 0062 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0063 % 0064 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0065 % This program is free software; you can redistribute it and/or modify 0066 % it under the terms of the GNU General Public License as published by 0067 % the Free Software Foundation; either version 2 of the License, or 0068 % (at your option) any later version. 0069 % 0070 % This program is distributed in the hope that it will be useful, 0071 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0072 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0073 % GNU General Public License for more details. 0074 % 0075 % You can obtain a copy of the GNU General Public License from 0076 % http://www.gnu.org/copyleft/gpl.html or by writing to 0077 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0078 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0079 0080 if nargin<4 || ~ischar(m) 0081 m='Vq'; 0082 end 0083 if nargin<5 || ~numel(tf) 0084 tf=0.01; % default frame length is 10 ms 0085 end 0086 snmax=100; % clipping limit for SNR 0087 0088 % filter the input signals if required 0089 0090 if any(m=='a') % A-weighting 0091 [b,a]=v_stdspectrum(2,'z',fs); % create bandpass filter 0092 s=filter(b,a,s); % filter test signal 0093 r=filter(b,a,r); % filter reference signal 0094 elseif any(m=='b') % BS-468 weighting 0095 [b,a]=v_stdspectrum(8,'z',fs); % create bandpass filter 0096 s=filter(b,a,s); % filter test signal 0097 r=filter(b,a,r); % filter reference signal 0098 end 0099 0100 mq=~any(m=='z'); % flag indicating we need to do alignment 0101 nr=min(length(r), length(s)); % signal length to align 0102 kf=round(tf*fs); % length of frame in samples 0103 ifr=kf+mq:kf:nr-mq; % ending sample of each frame (allowing +-mq at each end) 0104 ifl=ifr(end); % end sample of last frame 0105 nf=numel(ifr); % number of frames 0106 if mq % perform linear interpolation 0107 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0108 % Consider the linearly interpolated signal sm(i)=(1-am)*s(i)+am*s(i+1) where 0<=am<=1. % 0109 % For each frame we calculate the value of am that minimizes the error sum((sm(i)-r(i))^2). % 0110 % The solution is am = sum((s(i)-r(i)).*(s(i)-s(i+1))) / sum((s(i)-s(i+1))^2). % 0111 % Similarly, we calculate the optimum ap in sp(i)=(1-ap)*s(i)+ap*s(i-1) and then % 0112 % finally, we select whichever of sm(i) and sp(i) with the lowest error. % 0113 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0114 ssm=reshape(s(2:ifl)-s(3:ifl+1),kf,nf); 0115 ssp=reshape(s(2:ifl)-s(1:ifl-1),kf,nf); 0116 sr=reshape(s(2:ifl)-r(2:ifl),kf,nf); % enframed error: test-ref (one frame per column) 0117 am=min(max(sum(sr.*ssm,1)./sum(ssm.^2,1),0),1); % optimum dist between s(2:ifl) and s(3:ifl+1) 0118 ap=min(max(sum(sr.*ssp,1)./sum(ssp.^2,1),0),1); % optimum dist between s(2:ifl) and s(1:ifl-1) 0119 ef=min(sum((sr-repmat(am,kf,1).*ssm).^2,1),sum((sr-repmat(ap,kf,1).*ssp).^2,1)); % select the best for each frame 0120 else % no interpolation 0121 ef=sum(reshape((s(1:ifl)-r(1:ifl)).^2,kf,nf),1); % sum of squared errors in each frame 0122 end 0123 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1); % sum of sqared reference signal in each frame 0124 em=ef==0; % mask for zero noise frames 0125 rm=rf==0; % mask for zero reference frames 0126 snf=10*log10((rf+rm)./(ef+em)); 0127 snf(rm)=-snmax; 0128 snf(em)=snmax; 0129 0130 % select the frames to include 0131 0132 if any(m=='w') % 'w' option: do not discard silent frames 0133 vf=true(1,nf); % include all frames 0134 elseif any(m=='v') % 'v' option: use Sohn VAD to discard silent frames 0135 vs=v_vadsohn(r,fs,'na'); 0136 nvs=length(vs); 0137 [vss,vix]=sort([ifr'; vs(:,2)]); 0138 vjx=zeros(nvs+nf,5); 0139 vjx(vix,1)=(1:nvs+nf)'; % sorted position 0140 vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none) 0141 vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none) 0142 dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)]; % number of samples from previous frame boundary 0143 vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary 0144 vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one 0145 vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one 0146 vjx(1:nf,5)=1:nf; % SNR frame to accumulate into 0147 vjx(vjx(nf+1:end,2)>=nf,3)=0; % zap any VAD frame beyond the last snr frame 0148 vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into 0149 vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold 0150 else % default is 'V': use P.56 VAD to discard silent frames 0151 [lev,af,fso,vad]=v_activlev(r,fs); % do VAD on reference signal 0152 vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active 0153 tc=((1:nf)*kf+(1-kf)/2)/fs; % time at centre of frame 0154 end 0155 seg=mean(snf(vf)); 0156 glo=10*log10(sum(rf(vf))/sum(ef(vf))); 0157 if ~nargout || any (m=='p') 0158 subplot(312); % plot reference signal 0159 plot((1:length(r))/fs,r); 0160 ylabel('Reference'); 0161 axh(2)=gca; 0162 subplot(313); % plot SNR + global and segmental means 0163 snv=snf; 0164 snv(~vf)=NaN; % plot valid frames in blue 0165 snu=snf; 0166 snu(vf>0)=NaN; % plot invalid frames in red 0167 plot([1 nr]/fs,[glo seg; glo seg],':k',tc,snv,'-b',tc,snu,'-r'); 0168 ylabel('Frame SNR'); 0169 xlabel('Time (s)'); 0170 set(gca,'ylim',min([snv(:);snu(:)])+[-3 60]); % restrict displayed SNR range 0171 axh(3)=gca; 0172 subplot(311); % plot test signal 0173 plot((1:length(s))/fs,s); 0174 ylabel('Signal'); 0175 title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg)); 0176 axh(1)=gca; 0177 linkaxes(axh,'x'); % link time axes 0178 linkaxes(axh(1:2),'y'); % link signal amplitude axes 0179 set(gca,'xlim',[1 nr]/fs); 0180 end