v_snrseg

PURPOSE

V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

SYNOPSIS

function [seg,glo]=v_snrseg(s,r,fs,m,tf)

DESCRIPTION

```V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

Usage: (1) seg=v_snrseg(s,r,fs);                  % s & r are noisy and clean signal
(2) seg=v_snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
(3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames

Inputs:    s  test signal
r  reference signal
fs  sample frequency (Hz)
m  mode [default = 'Vq']
w = No VAD - use whole file
a = A-weight the signals
b = weight signals by BS-468
q = use quadratic interpolation to remove delays +- 1 sample
z = do not do any alignment
p = plot results
tf  frame increment [0.01]

Outputs: seg = Segmental SNR in dB
glo = Global SNR in dB (typically 7 dB greater than SNR-seg)

This function compares a noisy signal, S, with a clean reference, R, and
computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
which must be of the same length, are split into non-overlapping frames
of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
The segmental SNR is the average of these values, i.e.
SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
where the mean is over frames and the sum runs over one particular frame.
Two optional modifications can be made to this basic formula:

(a) Frames are excluded if there is no significant energy in the R
signal. The idea is to limit the calculation to frames in which
speech is active. By default, the v_voicebox function "v_activlev" is
used to detect the inactive frames (the 'V' mode option).

(b) In each frame independently, the reference signal is shifted by up
to +- 1 sample to find the alignment than minimizes the noise
component (S-R)^2. This shifting accounts for small misalignments
and/or sample frequency differences between the two signals. For
larger shifts, you can use the v_voicebox function "v_sigalign".
Accurate alignemnt is especially important at high SNR values.

If no M argument is specified, both these modifications will be applied;
this is equivalent to specifying M='Vq'.```

CROSS-REFERENCE INFORMATION

This function calls:
• v_activlev V_ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
• v_stdspectrum V_STDSPECTRUM Generate standard acoustic/speech spectra in s- or z-domain [B,A,SI,SN]=(S,M,F,N,ZI,BS,AS)
This function is called by:

SOURCE CODE

```0001 function [seg,glo]=v_snrseg(s,r,fs,m,tf)
0002 %V_SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)
0003 %
0004 %Usage: (1) seg=v_snrseg(s,r,fs);                  % s & r are noisy and clean signal
0005 %       (2) seg=v_snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
0006 %       (3) [seg,glo]=v_snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames
0007 %
0008 % Inputs:    s  test signal
0009 %            r  reference signal
0010 %           fs  sample frequency (Hz)
0011 %            m  mode [default = 'Vq']
0012 %                 w = No VAD - use whole file
0014 %                 V = use P.56-based VAD to discard silent portions [default]
0015 %                 a = A-weight the signals
0016 %                 b = weight signals by BS-468
0017 %                 q = use quadratic interpolation to remove delays +- 1 sample
0018 %                 z = do not do any alignment
0019 %                 p = plot results
0020 %           tf  frame increment [0.01]
0021 %
0022 % Outputs: seg = Segmental SNR in dB
0023 %          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
0024 %
0025 % This function compares a noisy signal, S, with a clean reference, R, and
0026 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
0027 % which must be of the same length, are split into non-overlapping frames
0028 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
0029 % The segmental SNR is the average of these values, i.e.
0030 %         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
0031 % where the mean is over frames and the sum runs over one particular frame.
0032 % Two optional modifications can be made to this basic formula:
0033 %
0034 %    (a) Frames are excluded if there is no significant energy in the R
0035 %        signal. The idea is to limit the calculation to frames in which
0036 %        speech is active. By default, the v_voicebox function "v_activlev" is
0037 %        used to detect the inactive frames (the 'V' mode option).
0038 %
0039 %    (b) In each frame independently, the reference signal is shifted by up
0040 %        to +- 1 sample to find the alignment than minimizes the noise
0041 %        component (S-R)^2. This shifting accounts for small misalignments
0042 %        and/or sample frequency differences between the two signals. For
0043 %        larger shifts, you can use the v_voicebox function "v_sigalign".
0044 %        Accurate alignemnt is especially important at high SNR values.
0045 %
0046 % If no M argument is specified, both these modifications will be applied;
0047 % this is equivalent to specifying M='Vq'.
0048
0049 % Bugs/suggestions
0050 % (1) Optionally restrict the bandwidth to the smaller of the two
0051 %     bandwidths either with an extra parameter or automatically determined
0052
0053 %      Copyright (C) Mike Brookes 2011
0054 %      Version: \$Id: v_snrseg.m 10865 2018-09-21 17:22:45Z dmb \$
0055 %
0056 %   VOICEBOX is a MATLAB toolbox for speech processing.
0058 %
0059 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0060 %   This program is free software; you can redistribute it and/or modify
0062 %   the Free Software Foundation; either version 2 of the License, or
0063 %   (at your option) any later version.
0064 %
0065 %   This program is distributed in the hope that it will be useful,
0066 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0067 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0068 %   GNU General Public License for more details.
0069 %
0070 %   You can obtain a copy of the GNU General Public License from
0071 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0072 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0074
0075 if nargin<4 || ~ischar(m)
0076     m='Vq';
0077 end
0078 if nargin<5 || ~numel(tf)
0079     tf=0.01; % default frame length is 10 ms
0080 end
0081 snmax=100;  % clipping limit for SNR
0082
0083 % filter the input signals if required
0084
0085 if any(m=='a')  % A-weighting
0086     [b,a]=v_stdspectrum(2,'z',fs);
0087     s=filter(b,a,s);
0088     r=filter(b,a,r);
0089 elseif any(m=='b') %  BS-468 weighting
0090     [b,a]=v_stdspectrum(8,'z',fs);
0091     s=filter(b,a,s);
0092     r=filter(b,a,r);
0093 end
0094
0095 mq=~any(m=='z');
0096 nr=min(length(r), length(s));
0097 kf=round(tf*fs); % length of frame in samples
0098 ifr=kf+mq:kf:nr-mq; % ending sample of each frame
0099 ifl=ifr(end);
0100 nf=numel(ifr); % number of frames
0101 if mq % perform interpolation
0102     ssm=reshape(s(2:ifl)-s(3:ifl+1),kf,nf);
0103     ssp=reshape(s(2:ifl)-s(1:ifl-1),kf,nf);
0104     sr=reshape(s(2:ifl)-r(2:ifl),kf,nf);
0105     am=min(max(sum(sr.*ssm,1)./sum(ssm.^2,1),0),1); %optimum dist between s(2:ifl) and s(3:ifl+1)
0106     ap=min(max(sum(sr.*ssp,1)./sum(ssp.^2,1),0),1); %optimum dist between s(2:ifl) and s(1:ifl-1)
0107     ef=min(sum((sr-repmat(am,kf,1).*ssm).^2,1),sum((sr-repmat(ap,kf,1).*ssp).^2,1)); % select the best for each frame
0108 else % no interpolation
0109     ef=sum(reshape((s(1:ifl)-r(1:ifl)).^2,kf,nf),1);
0110 end
0111 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1);
0112 em=ef==0; % mask for zero noise frames
0113 rm=rf==0; % mask for zero reference frames
0114 snf=10*log10((rf+rm)./(ef+em));
0115 snf(rm)=-snmax;
0116 snf(em)=snmax;
0117
0118 % select the frames to include
0119
0120 if any(m=='w')
0121     vf=true(1,nf); % include all frames
0122 elseif any(m=='v')
0124     nvs=length(vs);
0125     [vss,vix]=sort([ifr'; vs(:,2)]);
0126     vjx=zeros(nvs+nf,5);
0127     vjx(vix,1)=(1:nvs+nf)'; % sorted position
0128     vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none)
0129     vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none)
0130     dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)];  % number of samples from previous frame boundary
0131     vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary
0132     vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one
0133     vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one
0134     vjx(1:nf,5)=1:nf; % SNR frame to accumulate into
0135     vjx(vjx(nf+1:end,2)>=nf,3)=0;  % zap any VAD frame beyond the last snr frame
0136     vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into
0137     vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold
0138 else  % default is 'V'
0140     vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active
0141 end
0142 seg=mean(snf(vf));
0143 glo=10*log10(sum(rf(vf))/sum(ef(vf)));
0144
0145 if ~nargout || any (m=='p')
0146     subplot(311);
0147     plot((1:length(s))/fs,s);
0148     ylabel('Signal');
0149     title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg));
0150     axh(1)=gca;
0151     subplot(312);
0152     plot((1:length(r))/fs,r);
0153     ylabel('Reference');
0154     axh(2)=gca;
0155     subplot(313);
0156     snv=snf;
0157     snv(~vf)=NaN;
0158     snu=snf;
0159     snu(vf>0)=NaN;
0160     plot([1 nr]/fs,[glo seg; glo seg],':k',((1:nf)*kf+(1-kf)/2)/fs,snv,'-b',((1:nf)*kf+(1-kf)/2)/fs,snu,'-r');
0161     ylabel('Frame SNR');
0162     xlabel('Time (s)');
0163     axh(3)=gca;