Home > voicebox > snrseg.m

# snrseg

## PURPOSE

SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

## SYNOPSIS

function [seg,glo]=snrseg(s,r,fs,m,tf)

## DESCRIPTION

```SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

Usage: (1) seg=snrseg(s,r,fs);                  % s & r are noisy and clean signal
(2) seg=snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
(3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames

Inputs:    s  test signal
r  reference signal
fs  sample frequency (Hz)
m  mode [default = 'V']
w = No VAD - use whole file
a = A-weight the signals
b = weight signals by BS-468
q = use quadratic interpolation to remove delays +- 1 sample
z = do not do any alignment
p = plot results
tf  frame increment [0.01]

Outputs: seg = Segmental SNR in dB
glo = Global SNR in dB (typically 7 dB greater than SNR-seg)

This function compares a noisy signal, S, with a clean reference, R, and
computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
which must be of the same length, are split into non-overlapping frames
of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
The segmental SNR is the average of these values, i.e.
SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
where the mean is over frames and the sum runs over one particular frame.
Two optional modifications can be made to this basic formula:

(a) Frames are excluded if there is no significant energy in the R
signal. The idea is to limit the calculation to frames in which
speech is active. By default, the voicebox function "activlev" is
used to detect the inactive frames (the 'V' mode option).

(b) In each frame independently, the reference signal is shifted by up
to +- 1 sample to find the alignment than minimizes the noise
component (S-R)^2. This shifting accounts for small misalignments
and/or sample frequency differences between the two signals. For
larger shifts, you can use the voicebox function "sigalign".
Accurate alignemnt is especially important at high SNR values.

If no M argument is specified, both these modifications will be applied;
this is equivalent to specifying M='Vq'.```

## CROSS-REFERENCE INFORMATION

This function calls:
• activlev ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
• stdspectrum STDSPECTRUM Generate standard acoustic/speech spectra in s- or z-domain [B,A,SI,SN]=(S,M,F,N,ZI,BS,AS)
This function is called by:

## SOURCE CODE

```0001 function [seg,glo]=snrseg(s,r,fs,m,tf)
0002 %SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)
0003 %
0004 %Usage: (1) seg=snrseg(s,r,fs);                  % s & r are noisy and clean signal
0005 %       (2) seg=snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
0006 %       (3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames
0007 %
0008 % Inputs:    s  test signal
0009 %            r  reference signal
0010 %           fs  sample frequency (Hz)
0011 %            m  mode [default = 'V']
0012 %                 w = No VAD - use whole file
0014 %                 V = use P.56-based VAD to discard silent portions [default]
0015 %                 a = A-weight the signals
0016 %                 b = weight signals by BS-468
0017 %                 q = use quadratic interpolation to remove delays +- 1 sample
0018 %                 z = do not do any alignment
0019 %                 p = plot results
0020 %           tf  frame increment [0.01]
0021 %
0022 % Outputs: seg = Segmental SNR in dB
0023 %          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
0024 %
0025 % This function compares a noisy signal, S, with a clean reference, R, and
0026 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
0027 % which must be of the same length, are split into non-overlapping frames
0028 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
0029 % The segmental SNR is the average of these values, i.e.
0030 %         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
0031 % where the mean is over frames and the sum runs over one particular frame.
0032 % Two optional modifications can be made to this basic formula:
0033 %
0034 %    (a) Frames are excluded if there is no significant energy in the R
0035 %        signal. The idea is to limit the calculation to frames in which
0036 %        speech is active. By default, the voicebox function "activlev" is
0037 %        used to detect the inactive frames (the 'V' mode option).
0038 %
0039 %    (b) In each frame independently, the reference signal is shifted by up
0040 %        to +- 1 sample to find the alignment than minimizes the noise
0041 %        component (S-R)^2. This shifting accounts for small misalignments
0042 %        and/or sample frequency differences between the two signals. For
0043 %        larger shifts, you can use the voicebox function "sigalign".
0044 %        Accurate alignemnt is especially important at high SNR values.
0045 %
0046 % If no M argument is specified, both these modifications will be applied;
0047 % this is equivalent to specifying M='Vq'.
0048
0049 % Bugs/suggestions
0050 % (1) Optionally restrict the bandwidth to the smaller of the two
0051 %     bandwidths either with an extra parameter or automatically determined
0052
0053 %      Copyright (C) Mike Brookes 2011
0054 %      Version: \$Id: snrseg.m 10712 2018-08-06 13:46:52Z dmb \$
0055 %
0056 %   VOICEBOX is a MATLAB toolbox for speech processing.
0058 %
0059 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0060 %   This program is free software; you can redistribute it and/or modify
0062 %   the Free Software Foundation; either version 2 of the License, or
0063 %   (at your option) any later version.
0064 %
0065 %   This program is distributed in the hope that it will be useful,
0066 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0067 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0068 %   GNU General Public License for more details.
0069 %
0070 %   You can obtain a copy of the GNU General Public License from
0071 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0072 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0074
0075 if nargin<4 || ~ischar(m)
0076     m='Vq';
0077 end
0078 if nargin<5 || ~numel(tf)
0079     tf=0.01; % default frame length is 10 ms
0080 end
0081 snmax=100;  % clipping limit for SNR
0082
0083 % filter the input signals if required
0084
0085 if any(m=='a')  % A-weighting
0086     [b,a]=stdspectrum(2,'z',fs);
0087     s=filter(b,a,s);
0088     r=filter(b,a,r);
0089 elseif any(m=='b') %  BS-468 weighting
0090     [b,a]=stdspectrum(8,'z',fs);
0091     s=filter(b,a,s);
0092     r=filter(b,a,r);
0093 end
0094
0095 mq=~any(m=='z');
0096 nr=min(length(r), length(s));
0097 kf=round(tf*fs); % length of frame in samples
0098 ifr=kf+mq:kf:nr-mq; % ending sample of each frame
0099 ifl=ifr(end);
0100 nf=numel(ifr); % number of frames
0101 if mq % perform interpolation
0102     ssm=reshape(s(2:ifl)-s(3:ifl+1),kf,nf);
0103     ssp=reshape(s(2:ifl)-s(1:ifl-1),kf,nf);
0104     sr=reshape(s(2:ifl)-r(2:ifl),kf,nf);
0105     am=min(max(sum(sr.*ssm,1)./sum(ssm.^2,1),0),1); %optimum dist between s(2:ifl) and s(3:ifl+1)
0106     ap=min(max(sum(sr.*ssp,1)./sum(ssp.^2,1),0),1); %optimum dist between s(2:ifl) and s(1:ifl-1)
0107     ef=min(sum((sr-repmat(am,kf,1).*ssm).^2,1),sum((sr-repmat(ap,kf,1).*ssp).^2,1)); % select the best for each frame
0108 else % no interpolation
0109     ef=sum(reshape((s(1:ifl)-r(1:ifl)).^2,kf,nf),1);
0110 end
0111 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1);
0112 em=ef==0; % mask for zero noise frames
0113 rm=rf==0; % mask for zero reference frames
0114 snf=10*log10((rf+rm)./(ef+em));
0115 snf(rm)=-snmax;
0116 snf(em)=snmax;
0117
0118 % select the frames to include
0119
0120 if any(m=='w')
0121     vf=true(1,nf); % include all frames
0122 elseif any(m=='v')
0124     nvs=length(vs);
0125     [vss,vix]=sort([ifr'; vs(:,2)]);
0126     vjx=zeros(nvs+nf,5);
0127     vjx(vix,1)=(1:nvs+nf)'; % sorted position
0128     vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none)
0129     vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none)
0130     dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)];  % number of samples from previous frame boundary
0131     vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary
0132     vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one
0133     vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one
0134     vjx(1:nf,5)=1:nf; % SNR frame to accumulate into
0135     vjx(vjx(nf+1:end,2)>=nf,3)=0;  % zap any VAD frame beyond the last snr frame
0136     vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into
0137     vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold
0138 else  % default is 'V'
0140     vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active
0141 end
0142 seg=mean(snf(vf));
0143 glo=10*log10(sum(rf(vf))/sum(ef(vf)));
0144
0145 if ~nargout || any (m=='p')
0146     subplot(311);
0147     plot((1:length(s))/fs,s);
0148     ylabel('Signal');
0149     title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg));
0150     axh(1)=gca;
0151     subplot(312);
0152     plot((1:length(r))/fs,r);
0153     ylabel('Reference');
0154     axh(2)=gca;
0155     subplot(313);
0156     snv=snf;
0157     snv(~vf)=NaN;
0158     snu=snf;
0159     snu(vf>0)=NaN;
0160     plot([1 nr]/fs,[glo seg; glo seg],':k',((1:nf)*kf+(1-kf)/2)/fs,snv,'-b',((1:nf)*kf+(1-kf)/2)/fs,snu,'-r');
0161     ylabel('Frame SNR');
0162     xlabel('Time (s)');
0163     axh(3)=gca;