


ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
Usage: (1) lev=activlev(s,fs); % speech level in units of power
(2) db=activlev(s,fs,'d'); % speech level in dB
(3) s=activlev(s,fs,'n'); % normalize active level to 0 dB
Inputs: sp is the speech signal (with better than 20dB SNR)
FS is the sample frequency in Hz (see also FSO below)
MODE is a combination of the following:
r - raw omit input filters (default is 200 Hz to 5.5 kHz)
0 - no high pass filter (i.e. include DC)
4 - high pass filter at 40 Hz (but allows mains hum to pass)
1 - use cheybyshev 1 filter
2 - use chebyshev 2 filter (default)
e - use elliptic filter
h - omit low pass filter at 5.5 kHz
d - give outputs in dB rather than power
n - output a normalized speech signal as the first argument
N - output a normalized filtered speech signal as the first argument
l - give both active and long-term power levels
a - include A-weighting filter
i - include ITU-R-BS.468/ITU-T-J.16 weighting filter

0001 function [lev,af,fso,vad]=activlev(sp,fs,mode) 0002 %ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE) 0003 % 0004 %Usage: (1) lev=activlev(s,fs); % speech level in units of power 0005 % (2) db=activlev(s,fs,'d'); % speech level in dB 0006 % (3) s=activlev(s,fs,'n'); % normalize active level to 0 dB 0007 % 0008 %Inputs: sp is the speech signal (with better than 20dB SNR) 0009 % FS is the sample frequency in Hz (see also FSO below) 0010 % MODE is a combination of the following: 0011 % r - raw omit input filters (default is 200 Hz to 5.5 kHz) 0012 % 0 - no high pass filter (i.e. include DC) 0013 % 4 - high pass filter at 40 Hz (but allows mains hum to pass) 0014 % 1 - use cheybyshev 1 filter 0015 % 2 - use chebyshev 2 filter (default) 0016 % e - use elliptic filter 0017 % h - omit low pass filter at 5.5 kHz 0018 % d - give outputs in dB rather than power 0019 % n - output a normalized speech signal as the first argument 0020 % N - output a normalized filtered speech signal as the first argument 0021 % l - give both active and long-term power levels 0022 % a - include A-weighting filter 0023 % i - include ITU-R-BS.468/ITU-T-J.16 weighting filter 0024 0025 %Outputs: 0026 % If the "n" option is specified, a speech signal normalized to 0dB will be given as 0027 % the first output followed by the other outputs. 0028 % LEV gives the speech level in units of power (or dB if mode='d') 0029 % if mode='l' is specified, LEV is a row vector with the "long term 0030 % level" as its second element (this is just the mean power) 0031 % AF is the activity factor (or duty cycle) in the range 0 to 1 0032 % FSO is a column vector of intermediate information that allows 0033 % you to process a speech signal in chunks. Thus: 0034 % 0035 % fso=fs; for i=1:inc:nsamp, [lev,fso]=activlev(sp(i:i+inc-1),fso,mode); end 0036 % 0037 % is equivalent to: lev=activlev(sp(1:nsamp),fs,mode) 0038 % 0039 % but is much slower. The two methods will not give identical results 0040 % because they will use slightly different thresholds. 0041 % VAD is a boolean vector the same length as sp that acts as an approximate voice activity detector 0042 0043 %For completeness we list here the contents of the FSO structure: 0044 % 0045 % ffs : sample frequency 0046 % fmd : mode string 0047 % nh : hangover time in samples 0048 % ae : smoothing filter coefs 0049 % bl : 200Hz HP filter numerator 0050 % al : 200Hz HP filter denominator 0051 % bh : 5.5kHz LP filter numerator 0052 % ah : 5.5kHz LP filter denominator 0053 % ze : smoothing filter state 0054 % zl : 200Hz HP filter state 0055 % zh : 5.5kHz LP filter state 0056 % zx : hangover max filter state 0057 % emax : maximum envelope exponent + 1 0058 % ssq : signal sum of squares 0059 % ns : number of signal samples 0060 % ss : sum of speech samples (not actually used here) 0061 % kc : cumulative occupancy counts 0062 % aw : weighting filter denominator 0063 % bw : weighting filter numerator 0064 % zw : weighting filter state 0065 0066 % Copyright (C) Mike Brookes 2008-2012 0067 % Version: $Id: activlev.m 2516 2012-11-19 08:19:45Z dmb $ 0068 % 0069 % VOICEBOX is a MATLAB toolbox for speech processing. 0070 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0071 % 0072 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0073 % This program is free software; you can redistribute it and/or modify 0074 % it under the terms of the GNU General Public License as published by 0075 % the Free Software Foundation; either version 2 of the License, or 0076 % (at your option) any later version. 0077 % 0078 % This program is distributed in the hope that it will be useful, 0079 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0080 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0081 % GNU General Public License for more details. 0082 % 0083 % You can obtain a copy of the GNU General Public License from 0084 % http://www.gnu.org/copyleft/gpl.html or by writing to 0085 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0086 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0087 0088 persistent nbin thresh c25zp c15zp e5zp 0089 if isempty(nbin) 0090 nbin=20; % 60 dB range at 3dB per bin 0091 thresh=15.9; % threshold in dB 0092 % High pass s-domain zeros and poles of filters with passband ripple<0.25dB, stopband<-50dB, w0=1 0093 % w0=fzero(@ch2,0.5); [c2z,c2p,k]=cheby2(5,50,w0,'high','s'); 0094 % function v=ch2(w); [c2z,c2p,k]=cheby2(5,50,w,'high','s'); v= 20*log10(prod(abs(1i-c2z))/prod(abs(1i-c2p)))+0.25; 0095 c25zp=[0.37843443673309i 0.23388534441447i; -0.20640255179496+0.73942185906851i -0.54036889596392+0.45698784092898i]; 0096 c25zp=[[0; -0.66793268833792] c25zp conj(c25zp)]; 0097 % [c1z,c1p,c1k] = cheby1(5,0.25,1,'high','s'); 0098 c15zp=[-0.659002835294875+1.195798636925079i -0.123261821596263+0.947463030958881i]; 0099 c15zp=[zeros(1,5); -2.288586431066945 c15zp conj(c15zp)]; 0100 % [ez,ep,ek] = ellip(5,0.25,50,1,'high','s') 0101 e5zp=[0.406667680649209i 0.613849362744881i; -0.538736390607201+1.130245082677107i -0.092723126159100+0.958193646330194i]; 0102 e5zp=[[0; -1.964538608244084] e5zp conj(e5zp)]; 0103 % w=linspace(0.2,2,100); 0104 % figure(1); plot(w,20*log10(abs(freqs(real(poly(c15zp(1,:))),real(poly(c15zp(2,:))),w)))); title('Chebyshev 1'); 0105 % figure(2); plot(w,20*log10(abs(freqs(real(poly(c25zp(1,:))),real(poly(c25zp(2,:))),w)))); title('Chebyshev 2'); 0106 % figure(3); plot(w,20*log10(abs(freqs(real(poly(e5zp(1,:))),real(poly(e5zp(2,:))),w)))); title('Elliptic'); 0107 end 0108 0109 if ~isstruct(fs) % no state vector given 0110 if nargin<3 0111 mode=' '; 0112 end 0113 fso.ffs=fs; % sample frequency 0114 if any(mode=='r') % included for backward compatibility 0115 mode=['0h' mode]; % abolish both filters 0116 elseif fs<14000 0117 mode=['h' mode]; % abolish lowpass filter at low sample rates 0118 end 0119 fso.fmd=mode; % save mode flags 0120 ti=1/fs; 0121 g=exp(-ti/0.03); % pole position for envelope filter 0122 fso.ae=[1 -2*g g^2]/(1-g)^2; % envelope filter coefficients (DC gain = 1) 0123 fso.ze=zeros(2,1); 0124 fso.nh=ceil(0.2/ti)+1; % hangover time in samples 0125 fso.zx=-Inf; % initial value for maxfilt() 0126 fso.emax=-Inf; % maximum exponent 0127 fso.ns=0; 0128 fso.ssq=0; 0129 fso.ss=0; 0130 fso.kc=zeros(nbin,1); % cumulative occupancy counts 0131 % s-plane zeros and poles of high pass 5'th order filter -0.25dB at w=1 and -50dB stopband 0132 if any(mode=='1') 0133 szp=c15zp; % Chebyshev 1 0134 elseif any(mode=='e') 0135 szp=e5zp; % Elliptic 0136 else 0137 szp=c25zp; % Chebyshev 2 0138 end 0139 if all(mode~='0') 0140 if any(mode=='4') 0141 fl=40; % 40 Hz cutoff 0142 else 0143 fl=200; % 200 Hz cutoff 0144 end 0145 zl=2./(1-szp*tan(fl*pi/fs))-1; % 200 Hz LF limit 0146 al=real(poly(zl(2,:))); % high pass filter 0147 bl=real(poly(zl(1,:))); 0148 sw=1-2*rem(0:5,2).'; 0149 fso.bl=bl*(al*sw)/(bl*sw); % scale to give HF gain of 1 0150 fso.al=al; 0151 fso.zl=zeros(5,1); % LF filter state 0152 end 0153 if all(mode~='h') 0154 zh=2./(szp/tan(5500*pi/fs)-1)+1; 0155 ah=real(poly(zh(2,:))); 0156 bh=real(poly(zh(1,:))); 0157 fso.bh=bh*sum(ah)/sum(bh); 0158 fso.ah=ah; 0159 fso.zh=zeros(5,1); 0160 end 0161 if any(mode=='a') 0162 [fso.bw fso.aw]=stdspectrum(2,'z',fs); 0163 fso.zw=zeros(length(fso.aw)-1,1); 0164 elseif any(mode=='i') 0165 [fso.bw fso.aw]=stdspectrum(8,'z',fs); 0166 fso.zw=zeros(length(fso.aw)-1,1); 0167 end 0168 else 0169 fso=fs; % use existing structure 0170 end 0171 md=fso.fmd; 0172 ns=length(sp); 0173 if ns % process this speech chunk 0174 % apply the input filters to the speech 0175 if all(md~='0') 0176 [sq,fso.zl]=filter(fso.bl,fso.al,sp(:),fso.zl); % highpass filter 0177 else 0178 sq=sp(:); 0179 end 0180 if all(md~='h') 0181 [sq,fso.zh]=filter(fso.bh,fso.ah,sq(:),fso.zh); % lowpass filter 0182 end 0183 if any(md=='a') || any(md=='i') 0184 [sq,fso.zw]=filter(fso.bw,fso.aw,sq(:),fso.zw); % weighting filter 0185 end 0186 fso.ns=fso.ns+ns; % count the number of speech samples 0187 fso.ss=fso.ss+sum(sq); % sum of speech samples 0188 fso.ssq=fso.ssq+sum(sq.*sq); % sum of squared speech samples 0189 [s,fso.ze]=filter(1,fso.ae,abs(sq(:)),fso.ze); % envelope filter 0190 [qf,qe]=log2(s.^2); % take efficient log2 function, 2^qe is upper limit of bin 0191 qe(qf==0)=-Inf; % fix zero values 0192 [qe,qk,fso.zx]=maxfilt(qe,1,fso.nh,1,fso.zx); % apply the 0.2 second hangover 0193 oemax=fso.emax; 0194 fso.emax=max(oemax,max(qe)+1); 0195 if fso.emax==-Inf 0196 fso.kc(1)=fso.kc(1)+ns; 0197 else 0198 qe=min(fso.emax-qe,nbin); % force in the range 1:nbin. Bin k has 2^(emax-k-1)<=s^2<=2^(emax-k) 0199 wqe=ones(length(qe),1); 0200 % below: could use kc=cumsum(accumarray(qe,wqe,nbin)) but unsure about backwards compatibility 0201 kc=cumsum(full(sparse(qe,wqe,wqe,nbin,1))); % cumulative occupancy counts 0202 esh=fso.emax-oemax; % amount to shift down previous bin counts 0203 if esh<nbin-1 % if any of the previous bins are worth keeping 0204 kc(esh+1:nbin-1)=kc(esh+1:nbin-1)+fso.kc(1:nbin-esh-1); 0205 kc(nbin)=kc(nbin)+sum(fso.kc(nbin-esh:nbin)); 0206 else 0207 kc(nbin)=kc(nbin)+sum(fso.kc); % otherwise just add all old counts into the last (lowest) bin 0208 end 0209 fso.kc=kc; 0210 end 0211 end 0212 if fso.ns % now calculate the output values 0213 if fso.ssq>0 0214 aj=10*log10(fso.ssq*(fso.kc).^(-1)); 0215 % equivalent to cj=20*log10(sqrt(2).^(fso.emax-(1:nbin)-1)); 0216 cj=10*log10(2)*(fso.emax-(1:nbin)-1); % lower limit of bin j in dB 0217 mj=aj'-cj-thresh; 0218 % jj=find(mj*sign(mj(1))<=0); % Find threshold 0219 jj=find(mj(1:end-1)<0 & mj(2:end)>=0,1); % find +ve transition through threshold 0220 if isempty(jj) % if we never cross the threshold 0221 if mj(end)<=0 % if we end up below if 0222 jj=length(mj)-1; % take the threshold to be the bottom of the last (lowest) bin 0223 jf=1; 0224 else % if we are always above it 0225 jj=1; % take the threshold to be the bottom of the first (highest) bin 0226 jf=0; 0227 end 0228 else 0229 jf=1/(1-mj(jj+1)/mj(jj)); % fractional part of j using linear interpolation 0230 end 0231 lev=aj(jj)+jf*(aj(jj+1)-aj(jj)); % active level in decibels 0232 lp=10.^(lev/10); 0233 if any(md=='d') 0234 lev=[lev 10*log10(fso.ssq/fso.ns)]; 0235 else 0236 lev=[lp fso.ssq/fso.ns]; 0237 end 0238 af=fso.ssq/(fso.ns*lp); 0239 else 0240 af=0; 0241 if all(md~='d') 0242 lev=[0 0]; 0243 else 0244 lev=[-200 -200]; 0245 end 0246 end 0247 if all(md~='l') 0248 lev=lev(1); % only output the first element of lev normally 0249 end 0250 end 0251 if nargout>3 0252 vad=maxfilt(s,1,fso.nh,1); 0253 vad=vad>(sqrt(lp)/10^(thresh/20)); 0254 end 0255 if ~nargout 0256 levdb=10*log10(lp); 0257 clf; 0258 subplot(211); 0259 plot((1:ns)/fso.ffs,[sp s (qe<=jj)*sqrt(lp)]); 0260 xlabel('Time (s)'); 0261 title(sprintf('Active Level = %.2g dB, Activity = %.0f%% (ITU-T P.56)',levdb,100*af)); 0262 ylabel('Amplitude'); 0263 legend('Signal','Smoothed envelope','Active Level','Location','SouthEast'); 0264 subplot(212); 0265 plot(cj,repmat(levdb,nbin,1),'k:',cj,[aj(:) cj(:)+thresh cj(:)]); 0266 xlabel('Threshold (dB)'); 0267 ylabel('Active Level (dB)'); 0268 legend('Active Level','Speech Level',sprintf('Threshold+%.1f dB',thresh),'Threshold','Location','SouthEast'); 0269 elseif any(md=='n') || any(md=='N') 0270 fsx=fso; 0271 fso=af; 0272 af=lev; 0273 if any(md=='n') 0274 sq=sp; 0275 end 0276 if fsx.ns>0 && fsx.ssq>0 0277 lev=sq/sqrt(lp); 0278 else 0279 lev=sq; 0280 end 0281 end