V_MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH) Simple use: (1) c=v_melcepst(s,fs) % calculate mel cepstrum with 12 coefs, 256 sample frames (2) c=v_melcepst(s,fs,'E0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs Inputs: s speech signal fs sample rate in Hz (default 11025) w mode string (see below) nc number of cepstral coefficients excluding 0'th coefficient [default 12] p number of filters in v_filterbank [default: floor(3*log(fs)) = approx 2.1 per ocatave] n length of frame in samples [default power of 2 < (0.03*fs)] inc frame increment [default n/2] fl low end of the lowest filter as a fraction of fs [default = 0] fh high end of highest filter as a fraction of fs [default = 0.5] w any sensible combination of the following: 'R' rectangular window in time domain 'N' Hanning window in time domain 'M' Hamming window in time domain (default) 't' triangular shaped filters in mel domain (default) 'n' hanning shaped filters in mel domain 'm' hamming shaped filters in mel domain 'p' filters act in the power domain 'a' filters act in the absolute magnitude domain (default) '0' include 0'th order cepstral coefficient 'E' include log energy 'd' include delta coefficients (dc/dt) 'D' include delta-delta coefficients (d^2c/dt^2) 'z' highest and lowest filters taper down to zero (default) 'y' lowest filter remains at 1 down to 0 frequency and highest filter remains at 1 up to nyquist freqency If 'ty' or 'ny' is specified, the total power in the fft is preserved. Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the first element of each row followed by the delta and then the delta-delta coefficients. tc fractional time in samples at the centre of each frame with the first sample being 1.
0001 function [c,tc]=v_melcepst(s,fs,w,nc,p,n,inc,fl,fh) 0002 %V_MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH) 0003 % 0004 % 0005 % Simple use: (1) c=v_melcepst(s,fs) % calculate mel cepstrum with 12 coefs, 256 sample frames 0006 % (2) c=v_melcepst(s,fs,'E0dD') % include log energy, 0th cepstral coef, delta and delta-delta coefs 0007 % 0008 % Inputs: 0009 % s speech signal 0010 % fs sample rate in Hz (default 11025) 0011 % w mode string (see below) 0012 % nc number of cepstral coefficients excluding 0'th coefficient [default 12] 0013 % p number of filters in v_filterbank [default: floor(3*log(fs)) = approx 2.1 per ocatave] 0014 % n length of frame in samples [default power of 2 < (0.03*fs)] 0015 % inc frame increment [default n/2] 0016 % fl low end of the lowest filter as a fraction of fs [default = 0] 0017 % fh high end of highest filter as a fraction of fs [default = 0.5] 0018 % 0019 % w any sensible combination of the following: 0020 % 0021 % 'R' rectangular window in time domain 0022 % 'N' Hanning window in time domain 0023 % 'M' Hamming window in time domain (default) 0024 % 0025 % 't' triangular shaped filters in mel domain (default) 0026 % 'n' hanning shaped filters in mel domain 0027 % 'm' hamming shaped filters in mel domain 0028 % 0029 % 'p' filters act in the power domain 0030 % 'a' filters act in the absolute magnitude domain (default) 0031 % 0032 % '0' include 0'th order cepstral coefficient 0033 % 'E' include log energy 0034 % 'd' include delta coefficients (dc/dt) 0035 % 'D' include delta-delta coefficients (d^2c/dt^2) 0036 % 0037 % 'z' highest and lowest filters taper down to zero (default) 0038 % 'y' lowest filter remains at 1 down to 0 frequency and 0039 % highest filter remains at 1 up to nyquist freqency 0040 % 0041 % If 'ty' or 'ny' is specified, the total power in the fft is preserved. 0042 % 0043 % Outputs: c mel cepstrum output: one frame per row. Log energy, if requested, is the 0044 % first element of each row followed by the delta and then the delta-delta 0045 % coefficients. 0046 % tc fractional time in samples at the centre of each frame 0047 % with the first sample being 1. 0048 % 0049 0050 % BUGS: (1) should have power limit as 1e-16 rather than 1e-6 (or possibly a better way of choosing this) 0051 % and put into VOICEBOX 0052 % (2) get v_rdct to change the data length (properly) instead of doing it explicitly (wrongly) 0053 0054 % Copyright (C) Mike Brookes 1997 0055 % Version: $Id: v_melcepst.m 10865 2018-09-21 17:22:45Z dmb $ 0056 % 0057 % VOICEBOX is a MATLAB toolbox for speech processing. 0058 % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html 0059 % 0060 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0061 % This program is free software; you can redistribute it and/or modify 0062 % it under the terms of the GNU General Public License as published by 0063 % the Free Software Foundation; either version 2 of the License, or 0064 % (at your option) any later version. 0065 % 0066 % This program is distributed in the hope that it will be useful, 0067 % but WITHOUT ANY WARRANTY; without even the implied warranty of 0068 % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 0069 % GNU General Public License for more details. 0070 % 0071 % You can obtain a copy of the GNU General Public License from 0072 % http://www.gnu.org/copyleft/gpl.html or by writing to 0073 % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA. 0074 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 0075 0076 if nargin<2 fs=11025; end 0077 if nargin<3 w='M'; end 0078 if nargin<4 nc=12; end 0079 if nargin<5 p=floor(3*log(fs)); end 0080 if nargin<6 n=pow2(floor(log2(0.03*fs))); end 0081 if nargin<9 0082 fh=0.5; 0083 if nargin<8 0084 fl=0; 0085 if nargin<7 0086 inc=floor(n/2); 0087 end 0088 end 0089 end 0090 0091 if isempty(w) 0092 w='M'; 0093 end 0094 if any(w=='R') 0095 [z,tc]=v_enframe(s,n,inc); 0096 elseif any (w=='N') 0097 % [z,tc]=v_enframe(s,hanning(n),inc); 0098 [z,tc]=enframe(s,0.5-0.5*cos(2*pi*(1:n)'/(n+1)),inc); % Hanning window 0099 else 0100 % [z,tc]=v_enframe(s,hamming(n),inc); 0101 [z,tc]=enframe(s,0.54-0.46*cos(2*pi*(0:n-1)'/(n-1)),inc); % Hamming window 0102 end 0103 f=rfft(z.'); 0104 [m,a,b]=v_melbankm(p,n,fs,fl,fh,w); 0105 pw=f(a:b,:).*conj(f(a:b,:)); 0106 pth=max(pw(:))*1E-20; 0107 if any(w=='p') 0108 y=log(max(m*pw,pth)); 0109 else 0110 ath=sqrt(pth); 0111 y=log(max(m*abs(f(a:b,:)),ath)); 0112 end 0113 c=rdct(y).'; 0114 nf=size(c,1); 0115 nc=nc+1; 0116 if p>nc 0117 c(:,nc+1:end)=[]; 0118 elseif p<nc 0119 c=[c zeros(nf,nc-p)]; 0120 end 0121 if ~any(w=='0') 0122 c(:,1)=[]; 0123 nc=nc-1; 0124 end 0125 if any(w=='E') 0126 c=[log(max(sum(pw),pth)).' c]; 0127 nc=nc+1; 0128 end 0129 0130 % calculate derivative 0131 0132 if any(w=='D') 0133 vf=(4:-1:-4)/60; 0134 af=(1:-1:-1)/2; 0135 ww=ones(5,1); 0136 cx=[c(ww,:); c; c(nf*ww,:)]; 0137 vx=reshape(filter(vf,1,cx(:)),nf+10,nc); 0138 vx(1:8,:)=[]; 0139 ax=reshape(filter(af,1,vx(:)),nf+2,nc); 0140 ax(1:2,:)=[]; 0141 vx([1 nf+2],:)=[]; 0142 if any(w=='d') 0143 c=[c vx ax]; 0144 else 0145 c=[c ax]; 0146 end 0147 elseif any(w=='d') 0148 vf=(4:-1:-4)/60; 0149 ww=ones(4,1); 0150 cx=[c(ww,:); c; c(nf*ww,:)]; 0151 vx=reshape(filter(vf,1,cx(:)),nf+8,nc); 0152 vx(1:8,:)=[]; 0153 c=[c vx]; 0154 end 0155 0156 if nargout<1 0157 [nf,nc]=size(c); 0158 % t=((0:nf-1)*inc+(n-1)/2)/fs; 0159 ci=(1:nc)-any(w=='0')-any(w=='E'); 0160 imh = imagesc(tc/fs,ci,c.'); 0161 axis('xy'); 0162 xlabel('Time (s)'); 0163 ylabel('Mel-cepstrum coefficient'); 0164 map = (0:63)'/63; 0165 colormap([map map map]); 0166 colorbar; 0167 end 0168