


SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)
Usage: (1) seg=snrseg(s,r,fs); % s & r are noisy and clean signal
(2) seg=snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default]
(3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03); % 30 ms frames
Inputs: s test signal
r reference signal
fs sample frequency (Hz)
m mode [default = 'V']
w = No VAD - use whole file
v = use sohn VAD to discard silent portions
V = use P.56-based VAD to discard silent portions [default]
a = A-weight the signals
b = weight signals by BS-468
q = use quadratic interpolation to remove delays +- 1 sample
z = do not do any alignment
p = plot results
tf frame increment [0.01]
Outputs: seg = Segmental SNR in dB
glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
This function compares a noisy signal, S, with a clean reference, R, and
computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
which must be of the same length, are split into non-overlapping frames
of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
The segmental SNR is the average of these values, i.e.
SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
where the mean is over frames and the sum runs over one particular frame.
Two optional modifications can be made to this basic formula:
(a) Frames are excluded if there is no significant energy in the R
signal. The idea is to limit the calculation to frames in which
speech is active. By default, the voicebox function "activlev" is
used to detect the inactive frames (the 'V' mode option).
(b) In each frame independently, the reference signal is shifted by up
to +- 1 sample to find the alignment than minimizes the noise
component (S-R)^2. This shifting accounts for small misalignments
and/or sample frequency differences between the two signals. For
larger shifts, you can use the voicebox function "sigalign".
Accurate alignemnt is especially important at high SNR values.
If no M argument is specified, both these modifications will be applied;
this is equivalent to specifying M='Vq'.

0001 function [seg,glo]=snrseg(s,r,fs,m,tf) 0002 %SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF) 0003 % 0004 %Usage: (1) seg=snrseg(s,r,fs); % s & r are noisy and clean signal 0005 % (2) seg=snrseg(s,r,fs,'wz'); % no VAD or inerpolation used ['Vq' is default] 0006 % (3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03); % 30 ms frames 0007 % 0008 % Inputs: s test signal 0009 % r reference signal 0010 % fs sample frequency (Hz) 0011 % m mode [default = 'V'] 0012 % w = No VAD - use whole file 0013 % v = use sohn VAD to discard silent portions 0014 % V = use P.56-based VAD to discard silent portions [default] 0015 % a = A-weight the signals 0016 % b = weight signals by BS-468 0017 % q = use quadratic interpolation to remove delays +- 1 sample 0018 % z = do not do any alignment 0019 % p = plot results 0020 % tf frame increment [0.01] 0021 % 0022 % Outputs: seg = Segmental SNR in dB 0023 % glo = Global SNR in dB (typically 7 dB greater than SNR-seg) 0024 % 0025 % This function compares a noisy signal, S, with a clean reference, R, and 0026 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals, 0027 % which must be of the same length, are split into non-overlapping frames 0028 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated. 0029 % The segmental SNR is the average of these values, i.e. 0030 % SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2)) 0031 % where the mean is over frames and the sum runs over one particular frame. 0032 % Two optional modifications can be made to this basic formula: 0033 % 0034 % (a) Frames are excluded if there is no significant energy in the R 0035 % signal. The idea is to limit the calculation to frames in which 0036 % speech is active. By default, the voicebox function "activlev" is 0037 % used to detect the inactive frames (the 'V' mode option). 0038 % 0039 % (b) In each frame independently, the reference signal is shifted by up 0040 % to +- 1 sample to find the alignment than minimizes the noise 0041 % component (S-R)^2. This shifting accounts for small misalignments 0042 % and/or sample frequency differences between the two signals. For 0043 % larger shifts, you can use the voicebox function "sigalign". 0044 % Accurate alignemnt is especially important at high SNR values. 0045 % 0046 % If no M argument is specified, both these modifications will be applied; 0047 % this is equivalent to specifying M='Vq'. 0048 0049 % Bugs/suggestions 0050 % (1) Optionally restrict the bandwidth to the smaller of the two 0051 % bandwidths either with an extra parameter or automatically determined 0052 0053 % Copyright (C) Mike Brookes 2011 0054 % Version: $Id: snrseg.m 713 2011-10-16 14:45:43Z dmb $ 0055 % 0056 % VOICEBOX is a MATLAB toolbox for speech processing. 0057 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0058 % 0059 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0060 % This program is free software; you can redistribute it and/or modify 0061 % it under the terms of the GNU General Public License as published by 0062 % the Free Software Foundation; either version 2 of the License, or 0063 % (at your option) any later version. 0064 % 0065 % This program is distributed in the hope that it will be useful, 0066 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0067 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0068 % GNU General Public License for more details. 0069 % 0070 % You can obtain a copy of the GNU General Public License from 0071 % http://www.gnu.org/copyleft/gpl.html or by writing to 0072 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0074 0075 if nargin<4 || ~ischar(m) 0076 m='Vq'; 0077 end 0078 if nargin<5 || ~numel(tf) 0079 tf=0.01; % default frame length is 10 ms 0080 end 0081 snmax=100; % clipping limit for SNR 0082 0083 % filter the input signals if required 0084 0085 if any(m=='a') % A-weighting 0086 [b,a]=stdspectrum(2,'z',fs); 0087 s=filter(b,a,s); 0088 r=filter(b,a,r); 0089 elseif any(m=='b') % BS-468 weighting 0090 [b,a]=stdspectrum(8,'z',fs); 0091 s=filter(b,a,s); 0092 r=filter(b,a,r); 0093 end 0094 0095 mq=~any(m=='z'); 0096 nr=min(length(r), length(s)); 0097 kf=round(tf*fs); % length of frame in samples 0098 ifr=kf+mq:kf:nr-mq; % ending sample of each frame 0099 ifl=ifr(end); 0100 nf=numel(ifr); 0101 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1); 0102 ef=sum(reshape((s(mq+1:ifl)-r(mq+1:ifl)).^2,kf,nf),1); 0103 if mq 0104 efm=sum(reshape((s(3:ifl+1)-r(2:ifl)).^2,kf,nf),1); 0105 efp=sum(reshape((s(1:ifl-1)-r(2:ifl)).^2,kf,nf),1); 0106 efa=0.5*(efp+efm)-ef; 0107 efb=0.5*(efp-efm); 0108 efmk=(abs(efb)<2*efa) & (efa>0); % mask for frames with a valid minimum 0109 if any(efmk) 0110 ef(efmk)=ef(efmk)-0.25*efb(efmk).^2./efa(efmk); 0111 end 0112 ef=min(min(ef,efm),efp); 0113 end 0114 0115 em=ef==0; % mask for zero noise frames 0116 rm=rf==0; % mask for zero reference frames 0117 snf=10*log10((rf+rm)./(ef+em)); 0118 snf(rm)=-snmax; 0119 snf(em)=snmax; 0120 0121 % select the frames to include 0122 0123 if any(m=='w') 0124 vf=ones(1,nf); % include all frames 0125 elseif any(m=='v'); 0126 vs=vadsohn(r,fs,'na'); 0127 nvs=length(vs); 0128 [vss,vix]=sort([ifr'; vs(:,2)]); 0129 vjx=zeros(nvs+nf,5); 0130 vjx(vix,1)=(1:nvs+nf)'; % sorted position 0131 vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none) 0132 vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none) 0133 dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)]; % number of samples from previous frame boundary 0134 vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary 0135 vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one 0136 vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one 0137 vjx(1:nf,5)=1:nf; % SNR frame to accumulte into 0138 vjx(vjx(nf+1:end,2)>=nf,3)=0; % zap any VAD frame beyond the last snr fram 0139 vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into 0140 vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold 0141 else % default is 'V' 0142 [lev,af,fso,vad]=activlev(r,fs); % do VAD on reference signal 0143 vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active 0144 end 0145 seg=mean(snf(vf)); 0146 glo=10*log10(sum(rf(vf))/sum(ef(vf))); 0147 0148 if ~nargout || any (m=='p') 0149 subplot(311); 0150 plot((1:length(s))/fs,s); 0151 ylabel('Signal'); 0152 title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg)); 0153 axh(1)=gca; 0154 subplot(312); 0155 plot((1:length(r))/fs,r); 0156 ylabel('Reference'); 0157 axh(2)=gca; 0158 subplot(313); 0159 snv=snf; 0160 snv(~vf)=NaN; 0161 snu=snf; 0162 snu(vf>0)=NaN; 0163 plot([1 nr]/fs,[glo seg; glo seg],':k',((1:nf)*kf+(1-kf)/2)/fs,snv,'-b',((1:nf)*kf+(1-kf)/2)/fs,snu,'-r'); 0164 ylabel('Frame SNR'); 0165 xlabel('Time (s)'); 0166 axh(3)=gca; 0167 linkaxes(axh,'x'); 0168 end 0169