Home > voicebox > activlev.m

activlev

PURPOSE ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

SYNOPSIS ^

function [lev,af,fso,vad]=activlev(sp,fs,mode)

DESCRIPTION ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

Usage: (1) lev=activlev(s,fs);     % speech level in units of power
       (2) db=activlev(s,fs,'d');  % speech level in dB
       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB

Inputs: sp     is the speech signal (with better than 20dB SNR)
        FS     is the sample frequency in Hz (see also FSO below)
        MODE   is a combination of the following:
               0 - omit high pass filter completely (i.e. include DC)
               3 - high pass filter at 30 Hz instead of 200 Hz (but allows mains hum to pass)
               4 - high pass filter at 40 Hz instead of 200 Hz (but allows mains hum to pass)
               1 - use cheybyshev 1 filter
               2 - use chebyshev 2 filter (default)
               e - use elliptic filter
               h - omit low pass filter at 5.5, 12 or 18 kHz
               w - use wideband filter frequencies: 70 Hz to 12 kHz
               W - use ultra wideband filter frequencies: 30 Hz to 18 kHz
               d - give outputs in dB rather than power
               n - output a normalized speech signal as the first argument
               N - output a normalized filtered speech signal as the first argument
               l - give both active and long-term power levels
               a - include A-weighting filter
               i - include ITU-R-BS.468/ITU-T-J.16 weighting filter
               z - do NOT zero-pad the signal by 0.35 s

Outputs:
    If the "n" option is specified, a speech signal normalized to 0dB will be given as
    the first output followed by the other outputs.
        LEV    gives the speech level in units of power (or dB if mode='d')
               if mode='l' is specified, LEV is a row vector with the "long term
               level" as its second element (this is just the mean power)
        AF     is the activity factor (or duty cycle) in the range 0 to 1
        FSO    is a column vector of intermediate information that allows
               you to process a speech signal in chunks. Thus:
                       fso=fs;
                       for i=1:inc:nsamp
                           [lev,af,fso]=activlev(sp(i:min(i+inc-1,nsamp)),fso,['z' mode]);
                       end
                       lev=activlev([],fso)
               is equivalent to:
                       lev=activlev(sp(1:nsamp),fs,mode)
               but is much slower. The two methods will not give identical results
               because they will use slightly different thresholds. Note you need
               the 'z' option for all calls except the last.
        VAD    is a boolean vector the same length as sp that acts as an approximate voice activity detector

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [lev,af,fso,vad]=activlev(sp,fs,mode)
0002 %ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
0003 %
0004 %Usage: (1) lev=activlev(s,fs);     % speech level in units of power
0005 %       (2) db=activlev(s,fs,'d');  % speech level in dB
0006 %       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB
0007 %
0008 %Inputs: sp     is the speech signal (with better than 20dB SNR)
0009 %        FS     is the sample frequency in Hz (see also FSO below)
0010 %        MODE   is a combination of the following:
0011 %               0 - omit high pass filter completely (i.e. include DC)
0012 %               3 - high pass filter at 30 Hz instead of 200 Hz (but allows mains hum to pass)
0013 %               4 - high pass filter at 40 Hz instead of 200 Hz (but allows mains hum to pass)
0014 %               1 - use cheybyshev 1 filter
0015 %               2 - use chebyshev 2 filter (default)
0016 %               e - use elliptic filter
0017 %               h - omit low pass filter at 5.5, 12 or 18 kHz
0018 %               w - use wideband filter frequencies: 70 Hz to 12 kHz
0019 %               W - use ultra wideband filter frequencies: 30 Hz to 18 kHz
0020 %               d - give outputs in dB rather than power
0021 %               n - output a normalized speech signal as the first argument
0022 %               N - output a normalized filtered speech signal as the first argument
0023 %               l - give both active and long-term power levels
0024 %               a - include A-weighting filter
0025 %               i - include ITU-R-BS.468/ITU-T-J.16 weighting filter
0026 %               z - do NOT zero-pad the signal by 0.35 s
0027 %
0028 %Outputs:
0029 %    If the "n" option is specified, a speech signal normalized to 0dB will be given as
0030 %    the first output followed by the other outputs.
0031 %        LEV    gives the speech level in units of power (or dB if mode='d')
0032 %               if mode='l' is specified, LEV is a row vector with the "long term
0033 %               level" as its second element (this is just the mean power)
0034 %        AF     is the activity factor (or duty cycle) in the range 0 to 1
0035 %        FSO    is a column vector of intermediate information that allows
0036 %               you to process a speech signal in chunks. Thus:
0037 %                       fso=fs;
0038 %                       for i=1:inc:nsamp
0039 %                           [lev,af,fso]=activlev(sp(i:min(i+inc-1,nsamp)),fso,['z' mode]);
0040 %                       end
0041 %                       lev=activlev([],fso)
0042 %               is equivalent to:
0043 %                       lev=activlev(sp(1:nsamp),fs,mode)
0044 %               but is much slower. The two methods will not give identical results
0045 %               because they will use slightly different thresholds. Note you need
0046 %               the 'z' option for all calls except the last.
0047 %        VAD    is a boolean vector the same length as sp that acts as an approximate voice activity detector
0048 
0049 %For completeness we list here the contents of the FSO structure:
0050 %
0051 %   ffs : sample frequency
0052 %   fmd : mode string
0053 %    nh : hangover time in samples
0054 %    ae : smoothing filter coefs
0055 %    abl: HP filter numerator and denominator coefficient
0056 %    bh : LP filter numerator coefficient
0057 %    ah : LP filter denominator coefficients
0058 %    ze : smoothing filter state
0059 %    zl : HP filter state
0060 %    zh : LP filter state
0061 %    zx : hangover max filter state
0062 %  emax : maximum envelope exponent + 1
0063 %   ssq : signal sum of squares
0064 %    ns : number of signal samples
0065 %    ss : sum of speech samples (not actually used here)
0066 %    kc : cumulative occupancy counts
0067 %    aw : weighting filter denominator
0068 %    bw : weighting filter numerator
0069 %    zw : weighting filter state
0070 %
0071 % This routine implements "Method B" from [1],[2] to calculate the active
0072 % speech level which is defined to be the speech energy divided by the
0073 % duration of speech activity. Speech is designated as "active" based on an
0074 % adaptive threshold applied to the smoothed rectified speech signal. A
0075 % bandpass filter is first applied to the input speech whose -0.25 dB points
0076 % are at 200 Hz & 5.5 kHz by default but this can be changed to 70 Hz & 5.5 kHz
0077 % or to 30 Hz & 18 kHz by specifying the 'w' or 'W' options; these
0078 % correspond respectively to Annexes B and C in [2].
0079 %
0080 % References:
0081 % [1]    ITU-T. Objective measurement of active speech level. Recommendation P.56, Mar. 1993.
0082 % [2]    ITU-T. Objective measurement of active speech level. Recommendation P.56, Dec. 2011.
0083 
0084 %      Copyright (C) Mike Brookes 2008-2016
0085 %      Version: $Id: activlev.m 10436 2018-03-22 21:49:12Z dmb $
0086 %
0087 %   VOICEBOX is a MATLAB toolbox for speech processing.
0088 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0089 %
0090 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0091 %   This program is free software; you can redistribute it and/or modify
0092 %   it under the terms of the GNU General Public License as published by
0093 %   the Free Software Foundation; either version 2 of the License, or
0094 %   (at your option) any later version.
0095 %
0096 %   This program is distributed in the hope that it will be useful,
0097 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0098 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0099 %   GNU General Public License for more details.
0100 %
0101 %   You can obtain a copy of the GNU General Public License from
0102 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0103 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0104 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0105 
0106 persistent nbin thresh c25zp c15zp e5zp
0107 if isempty(nbin)
0108     nbin=20;    % 60 dB range at 3dB per bin
0109     thresh=15.9;    % threshold in dB
0110     % High pass s-domain zeros and poles of filters with passband ripple<0.25dB, stopband<-50dB, w0=1
0111     %    w0=fzero(@ch2,0.5); [c2z,c2p,k]=cheby2(5,50,w0,'high','s');
0112     %    function v=ch2(w); [c2z,c2p,k]=cheby2(5,50,w,'high','s'); v= 20*log10(prod(abs(1i-c2z))/prod(abs(1i-c2p)))+0.25;
0113     c25zp=[0.37843443673309i 0.23388534441447i; -0.20640255179496+0.73942185906851i -0.54036889596392+0.45698784092898i];
0114     c25zp=[[0; -0.66793268833792] c25zp conj(c25zp)];
0115     %       [c1z,c1p,c1k] = cheby1(5,0.25,1,'high','s');
0116     c15zp=[-0.659002835294875+1.195798636925079i -0.123261821596263+0.947463030958881i];
0117     c15zp=[zeros(1,5); -2.288586431066945 c15zp conj(c15zp)];
0118     %      [ez,ep,ek] = ellip(5,0.25,50,1,'high','s')
0119     e5zp=[0.406667680649209i 0.613849362744881i; -0.538736390607201+1.130245082677107i -0.092723126159100+0.958193646330194i];
0120     e5zp=[[0; -1.964538608244084]  e5zp conj(e5zp)];
0121     %    w=linspace(0.2,2,100);
0122     %    figure(1); plot(w,20*log10(abs(freqs(real(poly(c15zp(1,:))),real(poly(c15zp(2,:))),w)))); title('Chebyshev 1');
0123     %    figure(2); plot(w,20*log10(abs(freqs(real(poly(c25zp(1,:))),real(poly(c25zp(2,:))),w)))); title('Chebyshev 2');
0124     %    figure(3); plot(w,20*log10(abs(freqs(real(poly(e5zp(1,:))),real(poly(e5zp(2,:))),w)))); title('Elliptic');
0125 end
0126 
0127 if ~isstruct(fs)                        % no state vector given
0128     if nargin<3
0129         mode=' ';
0130     end
0131     fso.ffs=fs;                           % sample frequency
0132     ti=1/fs;
0133     g=exp(-ti/0.03);                    % pole position for envelope filter
0134     fso.ae=[1 -2*g g^2]/(1-g)^2;        % envelope filter coefficients (DC gain = 1)
0135     fso.ze=zeros(2,1);
0136     fso.nh=ceil(0.2/ti)+1;              % hangover time in samples
0137     fso.zx=-Inf;                        % initial value for maxfilt()
0138     fso.emax=-Inf;                      % maximum exponent
0139     fso.ns=0;
0140     fso.ssq=0;
0141     fso.ss=0;
0142     fso.kc=zeros(nbin,1);               % cumulative occupancy counts
0143     % s-plane zeros and poles of high pass 5'th order filter -0.25dB at w=1 and -50dB stopband
0144     if any(mode=='1')
0145         szp=c15zp;              % Chebyshev 1
0146     elseif any(mode=='e')
0147         szp=e5zp;               % Elliptic
0148     else
0149         szp=c25zp;              % Chebyshev 2
0150     end
0151     flh=[200 5500];             % default frequency range +- 0.25 dB
0152     if any(mode=='w')
0153         flh=[70 12000];         % super-wideband (Annex B of [2])
0154     elseif any(mode=='W')
0155         flh=[30 18000];         % full band (Annex C of [2])
0156     end
0157     if any(mode=='3')
0158         flh(1)=30;              % force a 30 Hz HPF cutoff
0159     end
0160     if any(mode=='4')
0161         flh(1)=40;              % force a 40 Hz HPF cutoff
0162     end
0163     if any(mode=='r')              % included for backward compatibility
0164         mode=['0h' mode];        % abolish both filters
0165     elseif fs<flh(2)*2.2
0166         mode=['h' mode];           % abolish lowpass filter at low sample rates
0167     end
0168     fso.fmd=mode;                % save mode flags
0169     if all(mode~='0')           % implement the HPF as biquads to avoid rounding errors
0170         zl=2./(1-szp*tan(flh(1)*pi/fs))-1;      % Transform s-domain poles/zeros with bilinear transform
0171         abl=[ones(2,1) -zl(:,1) -2*real(zl(:,2:3))  abs(zl(:,2:3)).^2];     % biquad coefficients
0172         hfg=(abl*[1 -1 0 0 0 0]').*(abl*[1 0 -1 0 1 0]').*(abl*[1 0 0 -1 0 1]');
0173         abl=abl(:,[1 2 1 3 5 1 4 6]);               % reorder into biquads
0174         abl(1,1:2)= abl(1,1:2)*hfg(2)/hfg(1);       % force Nyquist gain to equal 1
0175         fso.abl=abl;
0176         fso.zl=zeros(5,1);                          % space for HPF filter state
0177     end
0178     if all(mode~='h')
0179         zh=2./(szp/tan(flh(2)*pi/fs)-1)+1;     % Transform s-domain poles/zeros with bilinear transform
0180         ah=real(poly(zh(2,:)));
0181         bh=real(poly(zh(1,:)));
0182         fso.bh=bh*sum(ah)/sum(bh);
0183         fso.ah=ah;
0184         fso.zh=zeros(5,1);
0185     end
0186     if any(mode=='a')
0187         [fso.bw,fso.aw]=stdspectrum(2,'z',fs);
0188         fso.zw=zeros(length(fso.aw)-1,1);
0189     elseif any(mode=='i')
0190         [fso.bw,fso.aw]=stdspectrum(8,'z',fs);
0191         fso.zw=zeros(length(fso.aw)-1,1);
0192     end
0193 else
0194     fso=fs;             % use existing structure
0195 end
0196 md=fso.fmd;
0197 if nargin<3
0198     mode=fso.fmd;
0199 end
0200 nsp=length(sp); % original length of speech
0201 if all(mode~='z')
0202     nz=ceil(0.35*fso.ffs); % number of zeros to append
0203     sp=[sp(:);zeros(nz,1)];
0204 else
0205     nz=0;
0206 end
0207 ns=length(sp);
0208 if ns                       % process this speech chunk
0209     % apply the input filters to the speech
0210     if all(md~='0')         % implement the HPF as biquads to avoid rounding errors
0211         [sq,fso.zl(1)]=filter(fso.abl(1,1:2),fso.abl(2,1:2),sp(:),fso.zl(1));       % highpass filter: real pole/zero
0212         [sq,fso.zl(2:3)]=filter(fso.abl(1,3:5),fso.abl(2,3:5),sq(:),fso.zl(2:3));      % highpass filter: biquad 1
0213         [sq,fso.zl(4:5)]=filter(fso.abl(1,6:8),fso.abl(2,6:8),sq(:),fso.zl(4:5));      % highpass filter: biquad 2
0214     else
0215         sq=sp(:);
0216     end
0217     if all(md~='h')
0218         [sq,fso.zh]=filter(fso.bh,fso.ah,sq(:),fso.zh);     % lowpass filter
0219     end
0220     if any(md=='a') || any(md=='i')
0221         [sq,fso.zw]=filter(fso.bw,fso.aw,sq(:),fso.zw);     % weighting filter
0222     end
0223     fso.ns=fso.ns+ns;                               % count the number of speech samples
0224     fso.ss=fso.ss+sum(sq);                          % sum of speech samples
0225     fso.ssq=fso.ssq+sum(sq.*sq);                    % sum of squared speech samples
0226     [s,fso.ze]=filter(1,fso.ae,abs(sq(:)),fso.ze);     % envelope filter
0227     [qf,qe]=log2(s.^2);                             % take efficient log2 function, 2^qe is upper limit of bin
0228     qe(qf==0)=-Inf;                                 % fix zero values
0229     [qe,qk,fso.zx]=maxfilt(qe,1,fso.nh,1,fso.zx);      % apply the 0.2 second hangover
0230     oemax=fso.emax;
0231     fso.emax=max(oemax,max(qe)+1);
0232     if fso.emax==-Inf
0233         fso.kc(1)=fso.kc(1)+ns;
0234     else
0235         qe=min(fso.emax-qe,nbin);   % force in the range 1:nbin. Bin k has 2^(emax-k-1)<=s^2<=2^(emax-k)
0236         wqe=ones(length(qe),1);
0237         % below: could use kc=cumsum(accumarray(qe,wqe,nbin)) but unsure about backwards compatibility
0238         kc=cumsum(full(sparse(qe,wqe,wqe,nbin,1)));     % cumulative occupancy counts
0239         esh=fso.emax-oemax;                             % amount to shift down previous bin counts
0240         if esh<nbin-1                                   % if any of the previous bins are worth keeping
0241             kc(esh+1:nbin-1)=kc(esh+1:nbin-1)+fso.kc(1:nbin-esh-1);
0242             kc(nbin)=kc(nbin)+sum(fso.kc(nbin-esh:nbin));
0243         else
0244             kc(nbin)=kc(nbin)+sum(fso.kc); % otherwise just add all old counts into the last (lowest) bin
0245         end
0246         fso.kc=kc;
0247     end
0248 end
0249 if fso.ns                       % now calculate the output values
0250     if fso.ssq>0
0251         aj=10*log10(fso.ssq*(fso.kc).^(-1));
0252         % equivalent to cj=20*log10(sqrt(2).^(fso.emax-(1:nbin)-1));
0253         cj=10*log10(2)*(fso.emax-(1:nbin)-1);               % lower limit of bin j in dB
0254         mj=aj'-cj-thresh;
0255         %  jj=find(mj*sign(mj(1))<=0); % Find threshold
0256         jj=find(mj(1:end-1)<0 &  mj(2:end)>=0,1);           % find +ve transition through threshold
0257         if isempty(jj)                                      % if we never cross the threshold
0258             if mj(end)<=0                                   % if we end up below if
0259                 jj=length(mj)-1;            % take the threshold to be the bottom of the last (lowest) bin
0260                 jf=1;
0261             else                            % if we are always above it
0262                 jj=1;                       % take the threshold to be the bottom of the first (highest) bin
0263                 jf=0;
0264             end
0265         else
0266             jf=1/(1-mj(jj+1)/mj(jj));       % fractional part of j using linear interpolation
0267         end
0268         lev=aj(jj)+jf*(aj(jj+1)-aj(jj));    % active level in decibels
0269         lp=10.^(lev/10);                    % active level in power
0270         if any(md=='d')                     % 'd' option -> output in dB
0271             lev=[lev 10*log10(fso.ssq/fso.ns)];
0272         else                                % ~'d' option -> output in power
0273             lev=[lp fso.ssq/fso.ns];
0274         end
0275         af=fso.ssq/(fso.ns*lp);
0276     else                        % if all samples are equal to zero
0277         af=0;
0278         if any(md=='d')         % 'd' option -> output in dB
0279             lev=[-Inf -Inf];    % active level is 0 dB
0280         else                    % ~'d' option -> output in power
0281             lev=[0 0];          % active level is 0 power
0282         end
0283     end
0284     if all(md~='l')
0285         lev=lev(1);         % only output the first element of lev unless 'l' option
0286     end
0287 end
0288 if nargout>3
0289     vad=maxfilt(s(1:nsp),1,fso.nh,1);
0290     vad=vad>(sqrt(lp)/10^(thresh/20));
0291 end
0292 if ~nargout
0293     vad=maxfilt(s,1,fso.nh,1);
0294     vad=vad>(sqrt(lp)/10^(thresh/20));
0295     levdb=10*log10(lp);
0296     clf;
0297     subplot(2,2,[1 2]);
0298     tax=(1:ns)/fso.ffs;
0299     plot(tax,sp,'-y',tax,s,'-r',tax,(vad>0)*sqrt(lp),'-b');
0300     xlabel('Time (s)');
0301     title(sprintf('Active Level = %.2g dB, Activity = %.0f%% (ITU-T P.56)',levdb,100*af));
0302     axisenlarge([-1 -1 -1.4 -1.05]);
0303     if nz>0
0304         hold on
0305         ylim=get(gca,'ylim');
0306         plot(tax(end-nz)*[1 1],ylim,':k');
0307         hold off
0308     end
0309     ylabel('Amplitude');
0310     legend('Signal','Smoothed envelope','VAD * Active-Level','Location','SouthEast');
0311     subplot(2,2,4);
0312     plot(cj,repmat(levdb,nbin,1),'k:',cj,aj(:),'-b',cj,cj,'-r',levdb-thresh*ones(1,2),[levdb-thresh levdb],'-r');
0313     xlabel('Threshold (dB)');
0314     ylabel('Active Level (dB)');
0315     legend('Active Level','Speech>Thresh','Threshold','Location','NorthWest');
0316     texthvc(levdb-thresh,levdb-0.5*thresh,sprintf('%.1f dB ',thresh),'rmr');
0317     axisenlarge([-1 -1.05]);
0318     ylim=get(gca,'ylim');
0319     set(gca,'ylim',[levdb-1.2*thresh max(ylim(2),levdb+1.9*thresh)]);
0320     kch=filter([1 -1],1,kc);
0321     subplot(2,2,3);
0322     bar(5*log10(2)+cj(end:-1:1),kch(end:-1:1)*100/kc(end));
0323     set(gca,'xlim',[cj(end) cj(1)+10*log10(2)]);
0324     ylim=get(gca,'ylim');
0325     hold on
0326     plot(lev([1 1]),ylim,'k:',lev([1 1])-thresh,ylim,'r:');
0327     hold off
0328     texthvc(lev(1),ylim(2),sprintf(' Act\n Lev'),'ltk');
0329     texthvc(lev(1)-thresh,ylim(2),sprintf('Threshold '),'rtr');
0330     xlabel('Frame power (dB)')
0331     ylabel('% frames');
0332 elseif any(md=='n') || any(md=='N') % output normalized speech waveform
0333     fsx=fso; % shift along other outputs
0334     fso=af;
0335     af=lev;
0336     if any(md=='n')
0337         sq=sp; % 'n' -> use unfiltered speech
0338     end
0339     if fsx.ns>0 && fsx.ssq>0 % if there has been any non-zero speech
0340         lev=sq(1:nsp)/sqrt(lp);
0341     else
0342         lev=sq(1:nsp);
0343     end
0344 end

Generated on Wed 28-Mar-2018 15:33:24 by m2html © 2003