Home > voicebox > snrseg.m

snrseg

PURPOSE ^

SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

SYNOPSIS ^

function [seg,glo]=snrseg(s,r,fs,m,tf)

DESCRIPTION ^

SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)

Usage: (1) seg=snrseg(s,r,fs);                  % s & r are noisy and clean signal
       (2) seg=snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
       (3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames

 Inputs:    s  test signal
            r  reference signal
           fs  sample frequency (Hz)
            m  mode [default = 'V']
                 w = No VAD - use whole file
                 v = use sohn VAD to discard silent portions
                 V = use P.56-based VAD to discard silent portions [default]
                 a = A-weight the signals
                 b = weight signals by BS-468
                 q = use quadratic interpolation to remove delays +- 1 sample
                 z = do not do any alignment
                 p = plot results
           tf  frame increment [0.01]

 Outputs: seg = Segmental SNR in dB
          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)

 This function compares a noisy signal, S, with a clean reference, R, and
 computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
 which must be of the same length, are split into non-overlapping frames
 of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
 The segmental SNR is the average of these values, i.e.
         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
 where the mean is over frames and the sum runs over one particular frame.
 Two optional modifications can be made to this basic formula:

    (a) Frames are excluded if there is no significant energy in the R
        signal. The idea is to limit the calculation to frames in which
        speech is active. By default, the voicebox function "activlev" is
        used to detect the inactive frames (the 'V' mode option).

    (b) In each frame independently, the reference signal is shifted by up
        to +- 1 sample to find the alignment than minimizes the noise
        component (S-R)^2. This shifting accounts for small misalignments
        and/or sample frequency differences between the two signals. For
        larger shifts, you can use the voicebox function "sigalign".
        Accurate alignemnt is especially important at high SNR values.

 If no M argument is specified, both these modifications will be applied;
 this is equivalent to specifying M='Vq'.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [seg,glo]=snrseg(s,r,fs,m,tf)
0002 %SNRSEG Measure segmental and global SNR [SEG,GLO]=(S,R,FS,M,TF)
0003 %
0004 %Usage: (1) seg=snrseg(s,r,fs);                  % s & r are noisy and clean signal
0005 %       (2) seg=snrseg(s,r,fs,'wz');             % no VAD or inerpolation used ['Vq' is default]
0006 %       (3) [seg,glo]=snrseg(s,r,fs,'Vq',0.03);  % 30 ms frames
0007 %
0008 % Inputs:    s  test signal
0009 %            r  reference signal
0010 %           fs  sample frequency (Hz)
0011 %            m  mode [default = 'V']
0012 %                 w = No VAD - use whole file
0013 %                 v = use sohn VAD to discard silent portions
0014 %                 V = use P.56-based VAD to discard silent portions [default]
0015 %                 a = A-weight the signals
0016 %                 b = weight signals by BS-468
0017 %                 q = use quadratic interpolation to remove delays +- 1 sample
0018 %                 z = do not do any alignment
0019 %                 p = plot results
0020 %           tf  frame increment [0.01]
0021 %
0022 % Outputs: seg = Segmental SNR in dB
0023 %          glo = Global SNR in dB (typically 7 dB greater than SNR-seg)
0024 %
0025 % This function compares a noisy signal, S, with a clean reference, R, and
0026 % computes the segemntal signal-to-noise ratio (SNR) in dB. The signals,
0027 % which must be of the same length, are split into non-overlapping frames
0028 % of length TF (default 10 ms) and the SNR of each frame in dB is calculated.
0029 % The segmental SNR is the average of these values, i.e.
0030 %         SEG = mean(10*log10(sum(Ri^2)/sum((Si-Ri)^2))
0031 % where the mean is over frames and the sum runs over one particular frame.
0032 % Two optional modifications can be made to this basic formula:
0033 %
0034 %    (a) Frames are excluded if there is no significant energy in the R
0035 %        signal. The idea is to limit the calculation to frames in which
0036 %        speech is active. By default, the voicebox function "activlev" is
0037 %        used to detect the inactive frames (the 'V' mode option).
0038 %
0039 %    (b) In each frame independently, the reference signal is shifted by up
0040 %        to +- 1 sample to find the alignment than minimizes the noise
0041 %        component (S-R)^2. This shifting accounts for small misalignments
0042 %        and/or sample frequency differences between the two signals. For
0043 %        larger shifts, you can use the voicebox function "sigalign".
0044 %        Accurate alignemnt is especially important at high SNR values.
0045 %
0046 % If no M argument is specified, both these modifications will be applied;
0047 % this is equivalent to specifying M='Vq'.
0048 
0049 % Bugs/suggestions
0050 % (1) Optionally restrict the bandwidth to the smaller of the two
0051 %     bandwidths either with an extra parameter or automatically determined
0052 
0053 %      Copyright (C) Mike Brookes 2011
0054 %      Version: $Id: snrseg.m 2953 2013-05-02 12:51:26Z dmb $
0055 %
0056 %   VOICEBOX is a MATLAB toolbox for speech processing.
0057 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0058 %
0059 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0060 %   This program is free software; you can redistribute it and/or modify
0061 %   it under the terms of the GNU General Public License as published by
0062 %   the Free Software Foundation; either version 2 of the License, or
0063 %   (at your option) any later version.
0064 %
0065 %   This program is distributed in the hope that it will be useful,
0066 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0067 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0068 %   GNU General Public License for more details.
0069 %
0070 %   You can obtain a copy of the GNU General Public License from
0071 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0072 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0073 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0074 
0075 if nargin<4 || ~ischar(m)
0076     m='Vq';
0077 end
0078 if nargin<5 || ~numel(tf)
0079     tf=0.01; % default frame length is 10 ms
0080 end
0081 snmax=100;  % clipping limit for SNR
0082 
0083 % filter the input signals if required
0084 
0085 if any(m=='a')  % A-weighting
0086     [b,a]=stdspectrum(2,'z',fs);
0087     s=filter(b,a,s);
0088     r=filter(b,a,r);
0089 elseif any(m=='b') %  BS-468 weighting
0090     [b,a]=stdspectrum(8,'z',fs);
0091     s=filter(b,a,s);
0092     r=filter(b,a,r);
0093 end
0094 
0095 mq=~any(m=='z');
0096 nr=min(length(r), length(s));
0097 kf=round(tf*fs); % length of frame in samples
0098 ifr=kf+mq:kf:nr-mq; % ending sample of each frame
0099 ifl=ifr(end);
0100 nf=numel(ifr);
0101 rf=sum(reshape(r(mq+1:ifl).^2,kf,nf),1);
0102 ef=sum(reshape((s(mq+1:ifl)-r(mq+1:ifl)).^2,kf,nf),1);
0103 if mq
0104     efm=sum(reshape((s(3:ifl+1)-r(2:ifl)).^2,kf,nf),1);
0105     efp=sum(reshape((s(1:ifl-1)-r(2:ifl)).^2,kf,nf),1);
0106     efa=0.5*(efp+efm)-ef;
0107     efb=0.5*(efp-efm);
0108     efmk=(abs(efb)<2*efa) & (efa>0); % mask for frames with a valid minimum
0109     if any(efmk)
0110         ef(efmk)=ef(efmk)-0.25*efb(efmk).^2./efa(efmk);
0111     end
0112     ef=min(min(ef,efm),efp);
0113 end
0114 
0115 em=ef==0; % mask for zero noise frames
0116 rm=rf==0; % mask for zero reference frames
0117 snf=10*log10((rf+rm)./(ef+em));
0118 snf(rm)=-snmax;
0119 snf(em)=snmax;
0120 
0121 % select the frames to include
0122 
0123 if any(m=='w')
0124     vf=true(1,nf); % include all frames
0125 elseif any(m=='v');
0126     vs=vadsohn(r,fs,'na');
0127     nvs=length(vs);
0128     [vss,vix]=sort([ifr'; vs(:,2)]);
0129     vjx=zeros(nvs+nf,5);
0130     vjx(vix,1)=(1:nvs+nf)'; % sorted position
0131     vjx(1:nf,2)=vjx(1:nf,1)-(1:nf)'; % prev VAD frame end (or 0 or nvs+1 if none)
0132     vjx(nf+1:end,2)=vjx(nf+1:end,1)-(1:nvs)'; % prev snr frame end (or 0 or nvs+1 if none)
0133     dvs=[vss(1)-mq; vss(2:end)-vss(1:end-1)];  % number of samples from previous frame boundary
0134     vjx(:,3)=dvs(vjx(:,1)); % number of samples from previous frame boundary
0135     vjx(1:nf,4)=vs(min(1+vjx(1:nf,2),nvs),3); % VAD result for samples between prev frame boundary and this one
0136     vjx(nf+1:end,4)=vs(:,3); % VAD result for samples between prev frame boundary and this one
0137     vjx(1:nf,5)=1:nf; % SNR frame to accumulte into
0138     vjx(vjx(nf+1:end,2)>=nf,3)=0;  % zap any VAD frame beyond the last snr fram
0139     vjx(nf+1:end,5)=min(vjx(nf+1:end,2)+1,nf); % SNR frame to accumulate into
0140     vf=full(sparse(1,vjx(:,5),vjx(:,3).*vjx(:,4),1,nf))>kf/2; % accumulate into SNR frames and compare with threshold
0141 else  % default is 'V'
0142     [lev,af,fso,vad]=activlev(r,fs);    % do VAD on reference signal
0143     vf=sum(reshape(vad(mq+1:ifl),kf,nf),1)>kf/2; % find frames that are mostly active
0144 end
0145 seg=mean(snf(vf));
0146 glo=10*log10(sum(rf(vf))/sum(ef(vf)));
0147 
0148 if ~nargout || any (m=='p')
0149     subplot(311);
0150     plot((1:length(s))/fs,s);
0151     ylabel('Signal');
0152     title(sprintf('SNR = %.1f dB, SNR_{seg} = %.1f dB',glo,seg));
0153     axh(1)=gca;
0154     subplot(312);
0155     plot((1:length(r))/fs,r);
0156     ylabel('Reference');
0157     axh(2)=gca;
0158     subplot(313);
0159     snv=snf;
0160     snv(~vf)=NaN;
0161     snu=snf;
0162     snu(vf>0)=NaN;
0163     plot([1 nr]/fs,[glo seg; glo seg],':k',((1:nf)*kf+(1-kf)/2)/fs,snv,'-b',((1:nf)*kf+(1-kf)/2)/fs,snu,'-r');
0164     ylabel('Frame SNR');
0165     xlabel('Time (s)');
0166     axh(3)=gca;
0167     linkaxes(axh,'x');
0168 end
0169

Generated on Tue 19-Sep-2017 12:07:31 by m2html © 2003