Home > voicebox > activlev.m

activlev

PURPOSE ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

SYNOPSIS ^

function [lev,af,fso,vad]=activlev(sp,fs,mode)

DESCRIPTION ^

ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)

Usage: (1) lev=activlev(s,fs);     % speech level in units of power
       (2) db=activlev(s,fs,'d');  % speech level in dB
       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB

Inputs: sp is the speech signal (with better than 20dB SNR)
        FS is the sample frequency in Hz (see also FSO below)
        MODE is a combination of the following:
            r - raw omit input filters (default is 200 Hz to 5.5 kHz)
            0 - no high pass filter (i.e. include DC)
            4 - high pass filter at 40 Hz (but allows mains hum to pass)
            1 - use cheybyshev 1 filter
            2 - use chebyshev 2 filter (default)
            e - use elliptic filter
            h - omit low pass filter at 5.5 kHz
            d - give outputs in dB rather than power
            n - output a normalized speech signal as the first argument
            N - output a normalized filtered speech signal as the first argument
            l - give both active and long-term power levels
            a - include A-weighting filter
            i - include ITU-R-BS.468/ITU-T-J.16 weighting filter

Outputs:
    If the "n" option is specified, a speech signal normalized to 0dB will be given as
    the first output followed by the other outputs.
        LEV gives the speech level in units of power (or dB if mode='d')
            if mode='l' is specified, LEV is a row vector with the "long term
            level" as its second element (this is just the mean power)
        AF  is the activity factor (or duty cycle) in the range 0 to 1
        FSO is a column vector of intermediate information that allows
            you to process a speech signal in chunks. Thus:

            fso=fs; for i=1:inc:nsamp, [lev,fso]=activlev(sp(i:i+inc-1),fso,mode); end

            is equivalent to:                lev=activlev(sp(1:nsamp),fs,mode)

            but is much slower. The two methods will not give identical results
            because they will use slightly different thresholds.
        VAD is a boolean vector the same length as sp that acts as an approximate voice activity detector

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function [lev,af,fso,vad]=activlev(sp,fs,mode)
0002 %ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
0003 %
0004 %Usage: (1) lev=activlev(s,fs);     % speech level in units of power
0005 %       (2) db=activlev(s,fs,'d');  % speech level in dB
0006 %       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB
0007 %
0008 %Inputs: sp is the speech signal (with better than 20dB SNR)
0009 %        FS is the sample frequency in Hz (see also FSO below)
0010 %        MODE is a combination of the following:
0011 %            r - raw omit input filters (default is 200 Hz to 5.5 kHz)
0012 %            0 - no high pass filter (i.e. include DC)
0013 %            4 - high pass filter at 40 Hz (but allows mains hum to pass)
0014 %            1 - use cheybyshev 1 filter
0015 %            2 - use chebyshev 2 filter (default)
0016 %            e - use elliptic filter
0017 %            h - omit low pass filter at 5.5 kHz
0018 %            d - give outputs in dB rather than power
0019 %            n - output a normalized speech signal as the first argument
0020 %            N - output a normalized filtered speech signal as the first argument
0021 %            l - give both active and long-term power levels
0022 %            a - include A-weighting filter
0023 %            i - include ITU-R-BS.468/ITU-T-J.16 weighting filter
0024 %
0025 %Outputs:
0026 %    If the "n" option is specified, a speech signal normalized to 0dB will be given as
0027 %    the first output followed by the other outputs.
0028 %        LEV gives the speech level in units of power (or dB if mode='d')
0029 %            if mode='l' is specified, LEV is a row vector with the "long term
0030 %            level" as its second element (this is just the mean power)
0031 %        AF  is the activity factor (or duty cycle) in the range 0 to 1
0032 %        FSO is a column vector of intermediate information that allows
0033 %            you to process a speech signal in chunks. Thus:
0034 %
0035 %            fso=fs; for i=1:inc:nsamp, [lev,fso]=activlev(sp(i:i+inc-1),fso,mode); end
0036 %
0037 %            is equivalent to:                lev=activlev(sp(1:nsamp),fs,mode)
0038 %
0039 %            but is much slower. The two methods will not give identical results
0040 %            because they will use slightly different thresholds.
0041 %        VAD is a boolean vector the same length as sp that acts as an approximate voice activity detector
0042 
0043 %For completeness we list here the contents of the FSO structure:
0044 %
0045 %   ffs : sample frequency
0046 %   fmd : mode string
0047 %    nh : hangover time in samples
0048 %    ae : smoothing filter coefs
0049 %    bl : 200Hz HP filter numerator
0050 %    al : 200Hz HP filter denominator
0051 %    bh : 5.5kHz LP filter numerator
0052 %    ah : 5.5kHz LP filter denominator
0053 %    ze : smoothing filter state
0054 %    zl : 200Hz HP filter state
0055 %    zh : 5.5kHz LP filter state
0056 %    zx : hangover max filter state
0057 %  emax : maximum envelope exponent + 1
0058 %   ssq : signal sum of squares
0059 %    ns : number of signal samples
0060 %    ss : sum of speech samples (not actually used here)
0061 %    kc : cumulative occupancy counts
0062 %    aw : weighting filter denominator
0063 %    bw : weighting filter numerator
0064 %    zw : weighting filter state
0065 
0066 %      Copyright (C) Mike Brookes 2008-2012
0067 %      Version: $Id: activlev.m 4346 2014-03-17 20:25:55Z dmb $
0068 %
0069 %   VOICEBOX is a MATLAB toolbox for speech processing.
0070 %   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
0071 %
0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0073 %   This program is free software; you can redistribute it and/or modify
0074 %   it under the terms of the GNU General Public License as published by
0075 %   the Free Software Foundation; either version 2 of the License, or
0076 %   (at your option) any later version.
0077 %
0078 %   This program is distributed in the hope that it will be useful,
0079 %   but WITHOUT ANY WARRANTY; without even the implied warranty of
0080 %   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
0081 %   GNU General Public License for more details.
0082 %
0083 %   You can obtain a copy of the GNU General Public License from
0084 %   http://www.gnu.org/copyleft/gpl.html or by writing to
0085 %   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
0087 
0088 persistent nbin thresh c25zp c15zp e5zp
0089 if isempty(nbin)
0090     nbin=20;    % 60 dB range at 3dB per bin
0091     thresh=15.9;    % threshold in dB
0092     % High pass s-domain zeros and poles of filters with passband ripple<0.25dB, stopband<-50dB, w0=1
0093     %    w0=fzero(@ch2,0.5); [c2z,c2p,k]=cheby2(5,50,w0,'high','s');
0094     %    function v=ch2(w); [c2z,c2p,k]=cheby2(5,50,w,'high','s'); v= 20*log10(prod(abs(1i-c2z))/prod(abs(1i-c2p)))+0.25;
0095     c25zp=[0.37843443673309i 0.23388534441447i; -0.20640255179496+0.73942185906851i -0.54036889596392+0.45698784092898i];
0096     c25zp=[[0; -0.66793268833792] c25zp conj(c25zp)];
0097     %       [c1z,c1p,c1k] = cheby1(5,0.25,1,'high','s');
0098     c15zp=[-0.659002835294875+1.195798636925079i -0.123261821596263+0.947463030958881i];
0099     c15zp=[zeros(1,5); -2.288586431066945 c15zp conj(c15zp)];
0100     %      [ez,ep,ek] = ellip(5,0.25,50,1,'high','s')
0101     e5zp=[0.406667680649209i 0.613849362744881i; -0.538736390607201+1.130245082677107i -0.092723126159100+0.958193646330194i];
0102     e5zp=[[0; -1.964538608244084]  e5zp conj(e5zp)];
0103     %    w=linspace(0.2,2,100);
0104     %    figure(1); plot(w,20*log10(abs(freqs(real(poly(c15zp(1,:))),real(poly(c15zp(2,:))),w)))); title('Chebyshev 1');
0105     %    figure(2); plot(w,20*log10(abs(freqs(real(poly(c25zp(1,:))),real(poly(c25zp(2,:))),w)))); title('Chebyshev 2');
0106     %    figure(3); plot(w,20*log10(abs(freqs(real(poly(e5zp(1,:))),real(poly(e5zp(2,:))),w)))); title('Elliptic');
0107 end
0108 
0109 if ~isstruct(fs)                        % no state vector given
0110     if nargin<3
0111         mode=' ';
0112     end
0113     fso.ffs=fs;                           % sample frequency
0114     if any(mode=='r')                   % included for backward compatibility
0115         mode=['0h' mode];               % abolish both filters
0116     elseif fs<14000
0117         mode=['h' mode];               % abolish lowpass filter at low sample rates
0118     end
0119     fso.fmd=mode;                      % save mode flags
0120     ti=1/fs;
0121     g=exp(-ti/0.03);                    % pole position for envelope filter
0122     fso.ae=[1 -2*g g^2]/(1-g)^2;        % envelope filter coefficients (DC gain = 1)
0123     fso.ze=zeros(2,1);
0124     fso.nh=ceil(0.2/ti)+1;              % hangover time in samples
0125     fso.zx=-Inf;                        % initial value for maxfilt()
0126     fso.emax=-Inf;                      % maximum exponent
0127     fso.ns=0;
0128     fso.ssq=0;
0129     fso.ss=0;
0130     fso.kc=zeros(nbin,1);               % cumulative occupancy counts
0131     % s-plane zeros and poles of high pass 5'th order filter -0.25dB at w=1 and -50dB stopband
0132     if any(mode=='1')
0133         szp=c15zp;            % Chebyshev 1
0134     elseif any(mode=='e')
0135         szp=e5zp;             % Elliptic
0136     else
0137         szp=c25zp;            % Chebyshev 2
0138     end
0139     if all(mode~='0')
0140         if any(mode=='4')
0141             fl=40;               % 40 Hz cutoff
0142         else
0143             fl=200;              % 200 Hz cutoff
0144         end
0145         zl=2./(1-szp*tan(fl*pi/fs))-1;      % 200 Hz LF limit
0146         al=real(poly(zl(2,:)));              % high pass filter
0147         bl=real(poly(zl(1,:)));
0148         sw=1-2*rem(0:5,2).';
0149         fso.bl=bl*(al*sw)/(bl*sw);                  % scale to give HF gain of 1
0150         fso.al=al;
0151         fso.zl=zeros(5,1);                   % LF filter state
0152     end
0153     if all(mode~='h')
0154         zh=2./(szp/tan(5500*pi/fs)-1)+1;
0155         ah=real(poly(zh(2,:)));
0156         bh=real(poly(zh(1,:)));
0157         fso.bh=bh*sum(ah)/sum(bh);
0158         fso.ah=ah;
0159         fso.zh=zeros(5,1);
0160     end
0161     if any(mode=='a')
0162         [fso.bw fso.aw]=stdspectrum(2,'z',fs);
0163         fso.zw=zeros(length(fso.aw)-1,1);
0164     elseif any(mode=='i')
0165         [fso.bw fso.aw]=stdspectrum(8,'z',fs);
0166         fso.zw=zeros(length(fso.aw)-1,1);
0167     end
0168 else
0169     fso=fs;             % use existing structure
0170 end
0171 md=fso.fmd;
0172 ns=length(sp);
0173 if ns                       % process this speech chunk
0174     % apply the input filters to the speech
0175     if all(md~='0')
0176         [sq,fso.zl]=filter(fso.bl,fso.al,sp(:),fso.zl);     % highpass filter
0177     else
0178         sq=sp(:);
0179     end
0180     if all(md~='h')
0181         [sq,fso.zh]=filter(fso.bh,fso.ah,sq(:),fso.zh);     % lowpass filter
0182     end
0183     if any(md=='a') || any(md=='i')
0184         [sq,fso.zw]=filter(fso.bw,fso.aw,sq(:),fso.zw);     % weighting filter
0185     end
0186     fso.ns=fso.ns+ns;      % count the number of speech samples
0187     fso.ss=fso.ss+sum(sq);  % sum of speech samples
0188     fso.ssq=fso.ssq+sum(sq.*sq);    % sum of squared speech samples
0189     [s,fso.ze]=filter(1,fso.ae,abs(sq(:)),fso.ze);     % envelope filter
0190     [qf,qe]=log2(s.^2);         % take efficient log2 function, 2^qe is upper limit of bin
0191     qe(qf==0)=-Inf;           % fix zero values
0192     [qe,qk,fso.zx]=maxfilt(qe,1,fso.nh,1,fso.zx);       % apply the 0.2 second hangover
0193     oemax=fso.emax;
0194     fso.emax=max(oemax,max(qe)+1);
0195     if fso.emax==-Inf
0196         fso.kc(1)=fso.kc(1)+ns;
0197     else
0198         qe=min(fso.emax-qe,nbin);   % force in the range 1:nbin. Bin k has 2^(emax-k-1)<=s^2<=2^(emax-k)
0199         wqe=ones(length(qe),1);
0200         % below: could use kc=cumsum(accumarray(qe,wqe,nbin)) but unsure about backwards compatibility
0201         kc=cumsum(full(sparse(qe,wqe,wqe,nbin,1)));    % cumulative occupancy counts
0202         esh=fso.emax-oemax;         % amount to shift down previous bin counts
0203         if esh<nbin-1               % if any of the previous bins are worth keeping
0204             kc(esh+1:nbin-1)=kc(esh+1:nbin-1)+fso.kc(1:nbin-esh-1);
0205             kc(nbin)=kc(nbin)+sum(fso.kc(nbin-esh:nbin));
0206         else
0207             kc(nbin)=kc(nbin)+sum(fso.kc); % otherwise just add all old counts into the last (lowest) bin
0208         end
0209         fso.kc=kc;
0210     end
0211 end
0212 if fso.ns                       % now calculate the output values
0213     if fso.ssq>0
0214         aj=10*log10(fso.ssq*(fso.kc).^(-1));
0215         % equivalent to cj=20*log10(sqrt(2).^(fso.emax-(1:nbin)-1));
0216         cj=10*log10(2)*(fso.emax-(1:nbin)-1);   % lower limit of bin j in dB
0217         mj=aj'-cj-thresh;
0218         %  jj=find(mj*sign(mj(1))<=0); % Find threshold
0219         jj=find(mj(1:end-1)<0 &  mj(2:end)>=0,1); % find +ve transition through threshold
0220         if isempty(jj)  % if we never cross the threshold
0221             if mj(end)<=0 % if we end up below if
0222                 jj=length(mj)-1; % take the threshold to be the bottom of the last (lowest) bin
0223                 jf=1;
0224             else          % if we are always above it
0225                 jj=1;     % take the threshold to be the bottom of the first (highest) bin
0226                 jf=0;
0227             end
0228         else
0229             jf=1/(1-mj(jj+1)/mj(jj));   % fractional part of j using linear interpolation
0230         end
0231         lev=aj(jj)+jf*(aj(jj+1)-aj(jj)); % active level in decibels
0232         lp=10.^(lev/10);
0233         if any(md=='d')
0234             lev=[lev 10*log10(fso.ssq/fso.ns)];
0235         else
0236             lev=[lp fso.ssq/fso.ns];
0237         end
0238         af=fso.ssq/(fso.ns*lp);
0239     else
0240         af=0;
0241         if all(md~='d')
0242             lev=[0 0];
0243         else
0244             lev=[-200 -200];
0245         end
0246     end
0247     if all(md~='l')
0248         lev=lev(1);         % only output the first element of lev normally
0249     end
0250 end
0251 if nargout>3
0252     vad=maxfilt(s,1,fso.nh,1);
0253     vad=vad>(sqrt(lp)/10^(thresh/20));
0254 end
0255 if ~nargout
0256     levdb=10*log10(lp);
0257     clf;
0258     subplot(211);
0259     plot((1:ns)/fso.ffs,[sp s (qe<=jj)*sqrt(lp)]);
0260     xlabel('Time (s)');
0261     title(sprintf('Active Level = %.2g dB, Activity = %.0f%% (ITU-T P.56)',levdb,100*af));
0262     ylabel('Amplitude');
0263     legend('Signal','Smoothed envelope','Active Level','Location','SouthEast');
0264     subplot(212);
0265     plot(cj,repmat(levdb,nbin,1),'k:',cj,[aj(:) cj(:)+thresh cj(:)]);
0266     xlabel('Threshold (dB)');
0267     ylabel('Active Level (dB)');
0268     legend('Active Level','Speech Level',sprintf('Threshold+%.1f dB',thresh),'Threshold','Location','SouthEast');
0269 elseif any(md=='n') || any(md=='N')
0270     fsx=fso;
0271     fso=af;
0272     af=lev;
0273     if any(md=='n')
0274         sq=sp;
0275     end
0276     if fsx.ns>0 && fsx.ssq>0
0277         lev=sq/sqrt(lp);
0278     else
0279         lev=sq;
0280     end
0281 end

Generated on Wed 23-Apr-2014 11:18:50 by m2html © 2003