Home > voicebox > activlev.m

activlev

PURPOSE ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

SYNOPSIS ^

function [lev,af,fso,vad]=activlev(sp,fs,mode)

DESCRIPTION ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

Usage: (1) lev=activlev(s,fs);     % speech level in units of power
       (2) db=activlev(s,fs,'d');  % speech level in dB
       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB

Inputs: sp     is the speech signal (with better than 20dB SNR)
        FS     is the sample frequency in Hz (see also FSO below)
        MODE   is a combination of the following:
               0 - omit high pass filter completely (i.e. include DC)
               3 - high pass filter at 30 Hz instead of 200 Hz (but allows mains hum to pass)
               4 - high pass filter at 40 Hz instead of 200 Hz (but allows mains hum to pass)
               1 - use cheybyshev 1 filter
               2 - use chebyshev 2 filter (default)
               e - use elliptic filter
               h - omit low pass filter at 5.5, 12 or 18 kHz
               w - use wideband filter frequencies: 70 Hz to 12 kHz
               W - use ultra wideband filter frequencies: 30 Hz to 18 kHz
               d - give outputs in dB rather than power
               n - output a normalized speech signal as the first argument
               N - output a normalized filtered speech signal as the first argument
               l - give both active and long-term power levels
               a - include A-weighting filter
               i - include ITU-R-BS.468/ITU-T-J.16 weighting filter

Outputs:
    If the "n" option is specified, a speech signal normalized to 0dB will be given as
    the first output followed by the other outputs.
        LEV    gives the speech level in units of power (or dB if mode='d')
               if mode='l' is specified, LEV is a row vector with the "long term
               level" as its second element (this is just the mean power)
        AF     is the activity factor (or duty cycle) in the range 0 to 1
        FSO    is a column vector of intermediate information that allows
               you to process a speech signal in chunks. Thus:
                       fso=fs; for i=1:inc:nsamp,[lev,fso]=activlev(sp(i:i+inc-1),fso,mode); end
               is equivalent to:
                       lev=activlev(sp(1:nsamp),fs,mode)
               but is much slower. The two methods will not give identical results
               because they will use slightly different thresholds.
        VAD    is a boolean vector the same length as sp that acts as an approximate voice activity detector

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [lev,af,fso,vad]=activlev(sp,fs,mode)
0002 %ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
0003 %
0004 %Usage: (1) lev=activlev(s,fs);     % speech level in units of power
0005 %       (2) db=activlev(s,fs,'d');  % speech level in dB
0006 %       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB
0007 %
0008 %Inputs: sp     is the speech signal (with better than 20dB SNR)
0009 %        FS     is the sample frequency in Hz (see also FSO below)
0010 %        MODE   is a combination of the following:
0011 %               0 - omit high pass filter completely (i.e. include DC)
0012 %               3 - high pass filter at 30 Hz instead of 200 Hz (but allows mains hum to pass)
0013 %               4 - high pass filter at 40 Hz instead of 200 Hz (but allows mains hum to pass)
0014 %               1 - use cheybyshev 1 filter
0015 %               2 - use chebyshev 2 filter (default)
0016 %               e - use elliptic filter
0017 %               h - omit low pass filter at 5.5, 12 or 18 kHz
0018 %               w - use wideband filter frequencies: 70 Hz to 12 kHz
0019 %               W - use ultra wideband filter frequencies: 30 Hz to 18 kHz
0020 %               d - give outputs in dB rather than power
0021 %               n - output a normalized speech signal as the first argument
0022 %               N - output a normalized filtered speech signal as the first argument
0023 %               l - give both active and long-term power levels
0024 %               a - include A-weighting filter
0025 %               i - include ITU-R-BS.468/ITU-T-J.16 weighting filter
0026 %
0027 %Outputs:
0028 %    If the "n" option is specified, a speech signal normalized to 0dB will be given as
0029 %    the first output followed by the other outputs.
0030 %        LEV    gives the speech level in units of power (or dB if mode='d')
0031 %               if mode='l' is specified, LEV is a row vector with the "long term
0032 %               level" as its second element (this is just the mean power)
0033 %        AF     is the activity factor (or duty cycle) in the range 0 to 1
0034 %        FSO    is a column vector of intermediate information that allows
0035 %               you to process a speech signal in chunks. Thus:
0036 %                       fso=fs; for i=1:inc:nsamp,[lev,fso]=activlev(sp(i:i+inc-1),fso,mode); end
0037 %               is equivalent to:
0038 %                       lev=activlev(sp(1:nsamp),fs,mode)
0039 %               but is much slower. The two methods will not give identical results
0040 %               because they will use slightly different thresholds.
0041 %        VAD    is a boolean vector the same length as sp that acts as an approximate voice activity detector
0042 
0043 %For completeness we list here the contents of the FSO structure:
0044 %
0045 %   ffs : sample frequency
0046 %   fmd : mode string
0047 %    nh : hangover time in samples
0048 %    ae : smoothing filter coefs
0049 %    abl: HP filter numerator and denominator coefficient
0050 %    bh : LP filter numerator coefficient
0051 %    ah : LP filter denominator coefficients
0052 %    ze : smoothing filter state
0053 %    zl : HP filter state
0054 %    zh : LP filter state
0055 %    zx : hangover max filter state
0056 %  emax : maximum envelope exponent + 1
0057 %   ssq : signal sum of squares
0058 %    ns : number of signal samples
0059 %    ss : sum of speech samples (not actually used here)
0060 %    kc : cumulative occupancy counts
0061 %    aw : weighting filter denominator
0062 %    bw : weighting filter numerator
0063 %    zw : weighting filter state
0064 %
0065 % This routine implements "Method B" from [1],[2] to calculate the active
0066 % speech level which is defined to be the speech energy divided by the
0067 % duration of speech activity. Speech is designated as "active" based on an
0068 % adaptive threshold applied to the smoothed rectified speech signal. A
0069 % bandpass filter is first applied to the input speech whose -0.25 dB points
0070 % are at 200 Hz & 5.5 kHz by default but this can be changed to 70 Hz & 5.5 kHz
0071 % or to 30 Hz & 18 kHz by specifying the 'w' or 'W' options; these
0072 % correspond respectively to Annexes B and C in [2].
0073 %
0074 % References:
0075 % [1]    ITU-T. Objective measurement of active speech level. Recommendation P.56, Mar. 1993.
0076 % [2]    ITU-T. Objective measurement of active speech level. Recommendation P.56, Dec. 2011.
0077 
0078 %      Copyright (C) Mike Brookes 2008-2016
0079 %      Version: $Id: activlev.m 7336 2016-01-06 11:25:46Z dmb $
0080 %
0081 %   VOICEBOX is a MATLAB toolbox for speech processing.
0082 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0083 %
0084 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0085 %   This program is free software; you can redistribute it and/or modify
0086 %   it under the terms of the GNU General Public License as published by
0087 %   the Free Software Foundation; either version 2 of the License, or
0088 %   (at your option) any later version.
0089 %
0090 %   This program is distributed in the hope that it will be useful,
0091 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0092 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0093 %   GNU General Public License for more details.
0094 %
0095 %   You can obtain a copy of the GNU General Public License from
0096 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0097 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0098 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0099 
0100 persistent nbin thresh c25zp c15zp e5zp
0101 if isempty(nbin)
0102     nbin=20;    % 60 dB range at 3dB per bin
0103     thresh=15.9;    % threshold in dB
0104     % High pass s-domain zeros and poles of filters with passband ripple<0.25dB, stopband<-50dB, w0=1
0105     %    w0=fzero(@ch2,0.5); [c2z,c2p,k]=cheby2(5,50,w0,'high','s');
0106     %    function v=ch2(w); [c2z,c2p,k]=cheby2(5,50,w,'high','s'); v= 20*log10(prod(abs(1i-c2z))/prod(abs(1i-c2p)))+0.25;
0107     c25zp=[0.37843443673309i 0.23388534441447i; -0.20640255179496+0.73942185906851i -0.54036889596392+0.45698784092898i];
0108     c25zp=[[0; -0.66793268833792] c25zp conj(c25zp)];
0109     %       [c1z,c1p,c1k] = cheby1(5,0.25,1,'high','s');
0110     c15zp=[-0.659002835294875+1.195798636925079i -0.123261821596263+0.947463030958881i];
0111     c15zp=[zeros(1,5); -2.288586431066945 c15zp conj(c15zp)];
0112     %      [ez,ep,ek] = ellip(5,0.25,50,1,'high','s')
0113     e5zp=[0.406667680649209i 0.613849362744881i; -0.538736390607201+1.130245082677107i -0.092723126159100+0.958193646330194i];
0114     e5zp=[[0; -1.964538608244084]  e5zp conj(e5zp)];
0115     %    w=linspace(0.2,2,100);
0116     %    figure(1); plot(w,20*log10(abs(freqs(real(poly(c15zp(1,:))),real(poly(c15zp(2,:))),w)))); title('Chebyshev 1');
0117     %    figure(2); plot(w,20*log10(abs(freqs(real(poly(c25zp(1,:))),real(poly(c25zp(2,:))),w)))); title('Chebyshev 2');
0118     %    figure(3); plot(w,20*log10(abs(freqs(real(poly(e5zp(1,:))),real(poly(e5zp(2,:))),w)))); title('Elliptic');
0119 end
0120 
0121 if ~isstruct(fs)                        % no state vector given
0122     if nargin<3
0123         mode=' ';
0124     end
0125     fso.ffs=fs;                           % sample frequency
0126     
0127     ti=1/fs;
0128     g=exp(-ti/0.03);                    % pole position for envelope filter
0129     fso.ae=[1 -2*g g^2]/(1-g)^2;        % envelope filter coefficients (DC gain = 1)
0130     fso.ze=zeros(2,1);
0131     fso.nh=ceil(0.2/ti)+1;              % hangover time in samples
0132     fso.zx=-Inf;                        % initial value for maxfilt()
0133     fso.emax=-Inf;                      % maximum exponent
0134     fso.ns=0;
0135     fso.ssq=0;
0136     fso.ss=0;
0137     fso.kc=zeros(nbin,1);               % cumulative occupancy counts
0138     % s-plane zeros and poles of high pass 5'th order filter -0.25dB at w=1 and -50dB stopband
0139     if any(mode=='1')
0140         szp=c15zp;              % Chebyshev 1
0141     elseif any(mode=='e')
0142         szp=e5zp;               % Elliptic
0143     else
0144         szp=c25zp;              % Chebyshev 2
0145     end
0146     flh=[200 5500];             % default frequency range +- 0.25 dB
0147     if any(mode=='w')
0148         flh=[70 12000];         % super-wideband (Annex B of [2])
0149     elseif any(mode=='W')
0150         flh=[30 18000];         % full band (Annex C of [2])
0151     end
0152     if any(mode=='3')
0153         flh(1)=30;              % force a 30 Hz HPF cutoff
0154     end
0155     if any(mode=='4')
0156         flh(1)=40;              % force a 40 Hz HPF cutoff
0157     end
0158     if any(mode=='r')              % included for backward compatibility
0159         mode=['0h' mode];        % abolish both filters
0160     elseif fs<flh(2)*2.2
0161         mode=['h' mode];           % abolish lowpass filter at low sample rates
0162     end
0163     fso.fmd=mode;                % save mode flags
0164     if all(mode~='0')           % implement the HPF as biquads to avoid rounding errors
0165         zl=2./(1-szp*tan(flh(1)*pi/fs))-1;      % Transform s-domain poles/zeros with bilinear transform
0166         abl=[ones(2,1) -zl(:,1) -2*real(zl(:,2:3))  abs(zl(:,2:3)).^2];     % biquad coefficients
0167         hfg=(abl*[1 -1 0 0 0 0]').*(abl*[1 0 -1 0 1 0]').*(abl*[1 0 0 -1 0 1]');
0168         abl=abl(:,[1 2 1 3 5 1 4 6]);               % reorder into biquads
0169         abl(1,1:2)= abl(1,1:2)*hfg(2)/hfg(1);       % force Nyquist gain to equal 1
0170         fso.abl=abl;
0171         fso.zl=zeros(5,1);                          % space for HPF filter state
0172     end
0173     if all(mode~='h')
0174         zh=2./(szp/tan(flh(2)*pi/fs)-1)+1;     % Transform s-domain poles/zeros with bilinear transform
0175         ah=real(poly(zh(2,:)));
0176         bh=real(poly(zh(1,:)));
0177         fso.bh=bh*sum(ah)/sum(bh);
0178         fso.ah=ah;
0179         fso.zh=zeros(5,1);
0180     end
0181     if any(mode=='a')
0182         [fso.bw,fso.aw]=stdspectrum(2,'z',fs);
0183         fso.zw=zeros(length(fso.aw)-1,1);
0184     elseif any(mode=='i')
0185         [fso.bw,fso.aw]=stdspectrum(8,'z',fs);
0186         fso.zw=zeros(length(fso.aw)-1,1);
0187     end
0188 else
0189     fso=fs;             % use existing structure
0190 end
0191 md=fso.fmd;
0192 ns=length(sp);
0193 if ns                       % process this speech chunk
0194     % apply the input filters to the speech
0195     if all(md~='0')         % implement the HPF as biquads to avoid rounding errors
0196         [sq,fso.zl(1)]=filter(fso.abl(1,1:2),fso.abl(2,1:2),sp(:),fso.zl(1));       % highpass filter: real pole/zero
0197         [sq,fso.zl(2:3)]=filter(fso.abl(1,3:5),fso.abl(2,3:5),sq(:),fso.zl(2:3));      % highpass filter: biquad 1
0198         [sq,fso.zl(4:5)]=filter(fso.abl(1,6:8),fso.abl(2,6:8),sq(:),fso.zl(4:5));      % highpass filter: biquad 2
0199     else
0200         sq=sp(:);
0201     end
0202     if all(md~='h')
0203         [sq,fso.zh]=filter(fso.bh,fso.ah,sq(:),fso.zh);     % lowpass filter
0204     end
0205     if any(md=='a') || any(md=='i')
0206         [sq,fso.zw]=filter(fso.bw,fso.aw,sq(:),fso.zw);     % weighting filter
0207     end
0208     fso.ns=fso.ns+ns;                               % count the number of speech samples
0209     fso.ss=fso.ss+sum(sq);                          % sum of speech samples
0210     fso.ssq=fso.ssq+sum(sq.*sq);                    % sum of squared speech samples
0211     [s,fso.ze]=filter(1,fso.ae,abs(sq(:)),fso.ze);     % envelope filter
0212     [qf,qe]=log2(s.^2);                             % take efficient log2 function, 2^qe is upper limit of bin
0213     qe(qf==0)=-Inf;                                 % fix zero values
0214     [qe,qk,fso.zx]=maxfilt(qe,1,fso.nh,1,fso.zx);      % apply the 0.2 second hangover
0215     oemax=fso.emax;
0216     fso.emax=max(oemax,max(qe)+1);
0217     if fso.emax==-Inf
0218         fso.kc(1)=fso.kc(1)+ns;
0219     else
0220         qe=min(fso.emax-qe,nbin);   % force in the range 1:nbin. Bin k has 2^(emax-k-1)<=s^2<=2^(emax-k)
0221         wqe=ones(length(qe),1);
0222         % below: could use kc=cumsum(accumarray(qe,wqe,nbin)) but unsure about backwards compatibility
0223         kc=cumsum(full(sparse(qe,wqe,wqe,nbin,1)));     % cumulative occupancy counts
0224         esh=fso.emax-oemax;                             % amount to shift down previous bin counts
0225         if esh<nbin-1                                   % if any of the previous bins are worth keeping
0226             kc(esh+1:nbin-1)=kc(esh+1:nbin-1)+fso.kc(1:nbin-esh-1);
0227             kc(nbin)=kc(nbin)+sum(fso.kc(nbin-esh:nbin));
0228         else
0229             kc(nbin)=kc(nbin)+sum(fso.kc); % otherwise just add all old counts into the last (lowest) bin
0230         end
0231         fso.kc=kc;
0232     end
0233 end
0234 if fso.ns                       % now calculate the output values
0235     if fso.ssq>0
0236         aj=10*log10(fso.ssq*(fso.kc).^(-1));
0237         % equivalent to cj=20*log10(sqrt(2).^(fso.emax-(1:nbin)-1));
0238         cj=10*log10(2)*(fso.emax-(1:nbin)-1);               % lower limit of bin j in dB
0239         mj=aj'-cj-thresh;
0240         %  jj=find(mj*sign(mj(1))<=0); % Find threshold
0241         jj=find(mj(1:end-1)<0 &  mj(2:end)>=0,1);           % find +ve transition through threshold
0242         if isempty(jj)                                      % if we never cross the threshold
0243             if mj(end)<=0                                   % if we end up below if
0244                 jj=length(mj)-1;            % take the threshold to be the bottom of the last (lowest) bin
0245                 jf=1;
0246             else                            % if we are always above it
0247                 jj=1;                       % take the threshold to be the bottom of the first (highest) bin
0248                 jf=0;
0249             end
0250         else
0251             jf=1/(1-mj(jj+1)/mj(jj));       % fractional part of j using linear interpolation
0252         end
0253         lev=aj(jj)+jf*(aj(jj+1)-aj(jj));    % active level in decibels
0254         lp=10.^(lev/10);
0255         if any(md=='d')
0256             lev=[lev 10*log10(fso.ssq/fso.ns)];
0257         else
0258             lev=[lp fso.ssq/fso.ns];
0259         end
0260         af=fso.ssq/(fso.ns*lp);
0261     else
0262         af=0;
0263         if all(md~='d')
0264             lev=[0 0];
0265         else
0266             lev=[-200 -200];
0267         end
0268     end
0269     if all(md~='l')
0270         lev=lev(1);         % only output the first element of lev normally
0271     end
0272 end
0273 if nargout>3
0274     vad=maxfilt(s,1,fso.nh,1);
0275     vad=vad>(sqrt(lp)/10^(thresh/20));
0276 end
0277 if ~nargout
0278     vad=maxfilt(s,1,fso.nh,1);
0279     vad=vad>(sqrt(lp)/10^(thresh/20));
0280     levdb=10*log10(lp);
0281     clf;
0282     subplot(2,2,[1 2]);
0283     tax=(1:ns)/fso.ffs;
0284     plot(tax,sp,'-y',tax,s,'-r',tax,(vad>0)*sqrt(lp),'-b');
0285     xlabel('Time (s)');
0286     title(sprintf('Active Level = %.2g dB, Activity = %.0f%% (ITU-T P.56)',levdb,100*af));
0287     axisenlarge([-1 -1 -1.4 -1.05]);
0288     ylabel('Amplitude');
0289     legend('Signal','Smoothed envelope','VAD * Active-Level','Location','SouthEast');
0290     subplot(2,2,4);
0291     plot(cj,repmat(levdb,nbin,1),'k:',cj,aj(:),'-b',cj,cj,'-r',levdb-thresh*ones(1,2),[levdb-thresh levdb],'-r');
0292     xlabel('Threshold (dB)');
0293     ylabel('Active Level (dB)');
0294     legend('Active Level','Speech>Thresh','Threshold','Location','NorthWest');
0295     texthvc(levdb-thresh,levdb-0.5*thresh,sprintf('%.1f dB ',thresh),'rmr');
0296     axisenlarge([-1 -1.05]);
0297     ylim=get(gca,'ylim');
0298     set(gca,'ylim',[levdb-1.2*thresh max(ylim(2),levdb+1.9*thresh)]);
0299     kch=filter([1 -1],1,kc);
0300     subplot(2,2,3);
0301     bar(5*log10(2)+cj(end:-1:1),kch(end:-1:1)*100/kc(end));
0302     set(gca,'xlim',[cj(end) cj(1)+10*log10(2)]);
0303     ylim=get(gca,'ylim');
0304     hold on
0305     plot(lev([1 1]),ylim,'k:',lev([1 1])-thresh,ylim,'r:');
0306     hold off
0307     texthvc(lev(1),ylim(2),sprintf(' Act\n Lev'),'ltk');
0308     texthvc(lev(1)-thresh,ylim(2),sprintf('Threshold '),'rtr');
0309     xlabel('Frame power (dB)')
0310     ylabel('% frames');
0311 elseif any(md=='n') || any(md=='N')
0312     fsx=fso;
0313     fso=af;
0314     af=lev;
0315     if any(md=='n')
0316         sq=sp;
0317     end
0318     if fsx.ns>0 && fsx.ssq>0
0319         lev=sq/sqrt(lp);
0320     else
0321         lev=sq;
0322     end
0323 end

Generated on Sat 23-Jul-2016 15:57:02 by m2html © 2003