distispf

PURPOSE

DISTISPF calculates the Itakura-Saito spectral distance between power spectra D=DISTISPF(PF1,PF2,MODE)

SYNOPSIS

function d=distispf(pf1,pf2,mode)

DESCRIPTION

DISTISPF calculates the Itakura-Saito spectral distance between power spectra D=DISTISPF(PF1,PF2,MODE)

 Inputs: PF1,PF2     Power spectra to be compared. Each row represents a power spectrum: the first
                     and last columns represent the DC and Nyquist terms respectively.
                     PF1 and PF2 must have the same number of columns.

         MODE        Character string selecting the following options:
                         'x'  Calculate the full distance matrix from every row of PF1 to every row of PF2
                         'd'  Calculate only the distance between corresponding rows of PF1 and PF2
                         If MODE is omitted or empty, the default is 'd' when PF1 and PF2 have the
                         same number of rows and 'x' otherwise.
           
 Output: D           If MODE='d' then D is a column vector with the same number of rows as the shorter of PF1 and PF2.
                     If MODE='x' then D is a matrix with one row for each row of PF1 and one column for each row of PF2;
                     D(i,j) is the distance from row i of PF1 to row j of PF2.
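
The two modes and the default can be illustrated with a short sketch. It assumes VOICEBOX (and hence distispf) is on the MATLAB path; the spectra are made-up values chosen only to show the output shapes.

```matlab
% Assumes VOICEBOX (and hence distispf) is on the MATLAB path.
% The spectra below are illustrative values, not real speech data.
pf1 = [1 2 4 2 1; 1 1 1 1 1; 3 2 1 2 3];   % 3 spectra, 5 bins (DC to Nyquist)
pf2 = [1 2 4 2 1; 2 2 2 2 2];              % 2 spectra, same number of columns

dx = distispf(pf1, pf2, 'x');   % every row of pf1 against every row of pf2: 3x2 matrix
dd = distispf(pf1, pf2, 'd');   % corresponding rows only, first min(3,2)=2 rows: 2x1 vector
d0 = distispf(pf1, pf2);        % row counts differ, so the default is 'x'

disp(size(dx));                 % 3 2
disp(size(dd));                 % 2 1
disp(isequal(d0, dx));          % 1: the default matches the explicit 'x' call
disp(dx(1,1));                  % 0: identical spectra are at zero distance
```

With equal row counts the same call without MODE would instead return the row-by-row column vector.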

 The Itakura-Saito spectral distance is the average over positive and negative frequencies of

                      pf1/pf2 - log(pf1/pf2) - 1     =     exp(v) - v - 1         where v=log(pf1/pf2)

 The Itakura-Saito distance is asymmetric: pf1>pf2 contributes more to the distance than pf2>pf1. 
 A symmetrical version is the COSH distance: distchpf(x,y)=(distispf(x,y)+distispf(y,x))/2
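
The asymmetry and the COSH relation can be checked on a single frequency bin. The sketch below evaluates only the per-bin formula given above and does not call VOICEBOX; the helper name is_term is hypothetical and used purely for illustration.

```matlab
% Per-bin Itakura-Saito term as a function of v = log(pf1/pf2).
% is_term is a hypothetical helper name, not part of VOICEBOX.
is_term = @(v) exp(v) - v - 1;

v = log(2);                 % pf1 is twice pf2 in this bin
d_over  = is_term(v);       % pf1 > pf2: 2 - log(2) - 1, about 0.307
d_under = is_term(-v);      % pf1 < pf2: 0.5 + log(2) - 1, about 0.193

% Averaging the two directions gives the per-bin COSH term, cosh(v) - 1.
d_cosh = (d_over + d_under) / 2;
fprintf('%.4f %.4f %.4f %.4f\n', d_over, d_under, d_cosh, cosh(v) - 1);
```

The same factor-of-two mismatch costs more when pf1 is the larger spectrum, and the average of the two directions equals cosh(v)-1, which is 0.25 here.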

SOURCE CODE

function d=distispf(pf1,pf2,mode)
%DISTISPF calculates the Itakura-Saito spectral distance between power spectra D=DISTISPF(PF1,PF2,MODE)
%
% Inputs: PF1,PF2     Power spectra to be compared. Each row represents a power spectrum: the first
%                     and last columns represent the DC and Nyquist terms respectively.
%                     PF1 and PF2 must have the same number of columns.
%
%         MODE        Character string selecting the following options:
%                         'x'  Calculate the full distance matrix from every row of PF1 to every row of PF2
%                         'd'  Calculate only the distance between corresponding rows of PF1 and PF2
%                         If MODE is omitted or empty, the default is 'd' when PF1 and PF2 have the
%                         same number of rows and 'x' otherwise.
%
% Output: D           If MODE='d' then D is a column vector with the same number of rows as the shorter of PF1 and PF2.
%                     If MODE='x' then D is a matrix with one row for each row of PF1 and one column for each row of PF2;
%                     D(i,j) is the distance from row i of PF1 to row j of PF2.
%
% The Itakura-Saito spectral distance is the average over positive and negative frequencies of
%
%                      pf1/pf2 - log(pf1/pf2) - 1     =     exp(v) - v - 1         where v=log(pf1/pf2)
%
% The Itakura-Saito distance is asymmetric: pf1>pf2 contributes more to the distance than pf2>pf1.
% A symmetrical version is the COSH distance: distchpf(x,y)=(distispf(x,y)+distispf(y,x))/2

% The Itakura-Saito distance can also be calculated directly from AR coefficients; providing np is large
% enough, the values of d0 and d1 in the following will be very similar:
%
%         np=255; d0=distisar(ar1,ar2); d1=distispf(lpcar2pf(ar1,np),lpcar2pf(ar2,np))
%

% Ref: A.H.Gray Jr and J.D.Markel, "Distance measures for speech processing", IEEE ASSP-24(5): 380-391, Oct 1976
%      L. Rabiner and B-H Juang, "Fundamentals of Speech Recognition", Section 4.5, Prentice-Hall 1993, ISBN 0-13-015157-2
%      F.Itakura & S.Saito, "A statistical method for estimation of speech spectral density and formant frequencies",
%                            Electronics & Communications in Japan, 53A: 36-43, 1970.


%      Copyright (C) Mike Brookes 1997
%      Version: $Id: distispf.m 713 2011-10-16 14:45:43Z dmb $
%
%   VOICEBOX is a MATLAB toolbox for speech processing.
%   Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%   This program is free software; you can redistribute it and/or modify
%   it under the terms of the GNU General Public License as published by
%   the Free Software Foundation; either version 2 of the License, or
%   (at your option) any later version.
%
%   This program is distributed in the hope that it will be useful,
%   but WITHOUT ANY WARRANTY; without even the implied warranty of
%   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
%   GNU General Public License for more details.
%
%   You can obtain a copy of the GNU General Public License from
%   http://www.gnu.org/copyleft/gpl.html or by writing to
%   Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[nf1,p2]=size(pf1);            % nf1 spectra in PF1, each with p2 bins (DC to Nyquist)
p1=p2-1;
nf2=size(pf2,1);
if nargin<3 || isempty(mode)
   mode='0';                   % no option given: 'd' or 'x' is chosen from the row counts below
end
if any(mode=='d') || (~any(mode=='x') && nf1==nf2)
   % 'd' mode: distance between corresponding rows
   nx=min(nf1,nf2);
   r=pf1(1:nx,:)./pf2(1:nx,:);
   q=r-log(r);
   d=(sum(q(:,2:p1),2)+0.5*(q(:,1)+q(:,p2)))/p1-1;   % DC and Nyquist bins get half weight
else
   % 'x' mode: replicate both inputs to nf1 x nf2 x p2 and compare every pair of rows
   r=permute(pf1(:,:,ones(1,nf2)),[1 3 2])./permute(pf2(:,:,ones(1,nf1)),[3 1 2]);
   q=r-log(r);
   d=(sum(q(:,:,2:p1),3)+0.5*(q(:,:,1)+q(:,:,p2)))/p1-1;
end
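
The half weight given to the first and last columns in the listing above is what makes the one-sided sum an average over positive and negative frequencies: the interior bins each occur twice in the full spectrum, whereas DC and Nyquist occur once. The following sketch, which uses made-up ratio values and does not call VOICEBOX, compares the one-sided expression with a plain mean over the mirrored two-sided spectrum.

```matlab
% Ratio pf1./pf2 for a single frame, DC to Nyquist (illustrative values only).
r = [0.5 1 2 4 1];
q = r - log(r);
p2 = numel(q);
p1 = p2 - 1;

% One-sided form, as in the listing: DC and Nyquist get half weight.
d_one = (sum(q(2:p1)) + 0.5*(q(1) + q(p2))) / p1 - 1;

% Two-sided form: mirror the interior bins to obtain all 2*p1 bins,
% then take an unweighted mean.
q_full = [q, q(p1:-1:2)];
d_two = mean(q_full) - 1;

disp(numel(q_full) == 2*p1);        % 1: the mirrored spectrum has 2*p1 bins
disp(abs(d_one - d_two) < 1e-12);   % 1: both forms give the same distance
```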

Generated on Fri 22-Sep-2017 19:37:38 by m2html © 2003