VOICEBOX: Speech Processing Toolbox for MATLAB
Introduction
VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are
maintained by and mostly written by
Mike Brookes,
Department of Electrical & Electronic
Engineering, Imperial College, Exhibition
Road, London SW7 2BT, UK. Several of the routines require MATLAB V6.5 or above
and need (normally slight) modification to work with earlier versions.
The routines are available as a zip archive
and are released under the terms of the GNU General Public
License.
The routine VOICEBOX.M contains
various installation-dependent parameters which may need to be altered before
using the toolbox. In particular it contains a number of default directory paths
indicating where temporary files should be created, where speech data normally
resides, etc. You can override these defaults by editing voicebox.m directly or,
more conveniently, by setting an environment variable VOICEBOX to the path of an
initializing m-file. See the comments in
voicebox.m for a fuller description.
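As a hedged illustration of the environment-variable route, the lines below set VOICEBOX from inside a MATLAB session using the built-in setenv and getenv functions. The file name my_voicebox_init.m is a hypothetical placeholder. What the initializing m-file should contain is described in the comments of voicebox.m and is not reproduced here.

```matlab
% Hypothetical location of a user-written initializing m-file (an assumption).
initfile = fullfile(tempdir, 'my_voicebox_init.m');

% Set the variable for this MATLAB session and any processes it launches.
setenv('VOICEBOX', initfile);

% Show the value that will be seen when the environment is read.
disp(getenv('VOICEBOX'));
```

A value set this way lasts only for the current session; to make it permanent, set the variable through the operating system instead.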
For reading compressed SPHERE format files, you will need the
SHORTEN program written by Tony Robinson and SoftSound
Limited (www.softsound.com). The path to
the shorten executable must be set in
voicebox.m. Unfortunately, the current version does not work on 64-bit
systems.
MATLAB doesn't really like Unicode fonts; some non-Unicode fonts containing
IPA phonetic symbols developed by SIL are
available here.
 Audio File Input/Output
 Read and write WAV and other speech file formats
 Frequency Scales
 Convert between Hz, Mel, Erb and MIDI frequency scales
 Fourier/DCT/Hartley Transforms
 Various related transforms
 Random Number and Probability Distributions
 Generate random vectors and noise signals
 Vector Distances
 Calculate distances between vector lists
 Speech Analysis
 Active level estimation, Spectrograms
 LPC Analysis of Speech
 Linear Predictive Coding routines
 Speech Synthesis
 Text-to-speech synthesis and glottal waveform models
 Speech Enhancement
 Spectral noise subtraction
 Speech Coding
 PCM coding, Vector quantisation
 Speech Recognition
 Front-end processing for recognition
 Signal Processing
 Miscellaneous signal processing functions
 Information Theory
 Routines for entropy calculation and symbol codes
 Computer Vision
 Routines for 3D rotation
 Printing and Display Functions
 Utilities for printing and graphics
 Voicebox Parameters and System Interface
 Get or set VOICEBOX and WINDOWS system parameters
 Utility Functions
 Miscellaneous utility functions
Audio File Input/Output
Routines are available to read and, in some cases, write a variety of
file formats:
Read      Write      Suffix
readwav   writewav   .wav
These routines allow an arbitrary number of channels and can
deal with linear PCM (any precision up to 32 bits), A-law PCM,
Mu-law PCM and floating-point formats. Large files can be read and
written in small chunks.
readhtk   writehtk   .htk
Read and write waveform and parameter files used by Microsoft's
Hidden Markov Toolkit.
readsfs   -          .sfs
Speech Filing System files from Mark Huckvale at UCL.
readsph   -          .sph
NIST Sphere format files (including TIMIT). Needs
SHORTEN for compressed files.
readaif   -          .aif
AIFF format (Audio Interchange File Format) used by Mac users.
readcnx   -          .cnx
Read Connex database files (from BT).
readau    -          .au
Read AU audio files (from Sun).
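A minimal usage sketch, assuming VOICEBOX is on the MATLAB path and that readwav is called in its simplest form with a file name, returning the samples (one column per channel) and the sample rate. The file name speech.wav is a placeholder. Check the help text of readwav for the optional mode arguments before relying on any particular scaling.

```matlab
% 'speech.wav' is a placeholder file name (an assumption).
[y, fs] = readwav('speech.wav');

% Build a time axis in seconds and plot every channel.
t = (0:size(y,1)-1).' / fs;
plot(t, y);
xlabel('Time (s)');
ylabel('Amplitude');
```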
Frequency Scales
From f     To f       Scale
frq2bark   bark2frq   bark
The bark scale is based on critical bands and masking in
the human ear.
frq2cent   cent2frq   cent
The cent scale is in increments of 0.01 semitones.
frq2erb    erb2frq    erb
The erb scale is based on the equivalent rectangular
bandwidths of the human ear.
frq2mel    mel2frq    mel
The mel scale is based on the human perception of
sinewave pitch.
frq2midi   midi2frq   midi
The MIDI standard specifies a numbering of semitones
with middle C being 60. These routines can use the normal equal tempered scale
or else the Pythagorean scale of just intonation. They can also
output note names in a character format.
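The semitone-based scales above can be sketched in base MATLAB. This is not the code of frq2midi or frq2cent; it assumes equal temperament with A4 = 440 Hz assigned MIDI note 69, which places middle C at 60, and it measures cents only as an interval between two frequencies.

```matlab
% Assumption: equal temperament with A4 = 440 Hz assigned MIDI note 69.
f     = [261.6256 440 880];            % frequencies in Hz (C4, A4, A5)
midi  = 69 + 12*log2(f/440);           % fractional MIDI note numbers
fback = 440 * 2.^((midi - 69)/12);     % inverse mapping back to Hz

% Interval between two frequencies in cents (100 cents per semitone).
cents = 1200*log2(f(3)/f(2));          % A4 to A5 is one octave
```

The toolbox routines may use a different reference point for the absolute cent scale, so consult their help text for the exact definition.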
Fourier/DCT/Hartley Transforms
Forward    Inverse
rfft       irfft
Forward and inverse discrete Fourier transforms on real data.
Only the first half of the conjugate symmetric transform is
generated. For even length data, the inverse routine is
asymptotically twice as fast as the built-in MATLAB routine.
rsfft      -
Forward transform of real, symmetric data to give the first half
only of the real, symmetric transform.
zoomfft    -
Calculate the discrete Fourier transform at an arbitrary set of
linearly spaced frequencies. Can be used to zoom into a subset of
the full frequency range.
rdct       irdct
Forward and inverse discrete cosine transform on real data.
rhartley   rhartley
Hartley transform on real data (forward and inverse transforms
are the same).
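The idea of keeping only the first half of a conjugate symmetric transform can be shown with the built-in fft and ifft. This is an illustrative sketch, not the implementation of rfft or irfft, and it gains no speed: it only demonstrates that bins 0 to floor(n/2) are enough to recover a real signal.

```matlab
x = [1 2 0 -1 3 1 0 2].';          % real test signal, even length n = 8
n = numel(x);

% Forward: keep bins 0 .. floor(n/2) of the full FFT.
X  = fft(x);
Xh = X(1:fix(n/2)+1);

% Inverse: rebuild the discarded bins by conjugate symmetry.
k  = numel(Xh) - (mod(n,2) == 0);  % skip the Nyquist bin when n is even
Xf = [Xh; conj(Xh(k:-1:2))];
xr = real(ifft(Xf));               % matches x to rounding error
```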

Random Number and Probability Distributions
Random Number Generation
randvec 
generates random vectors from Gaussian
or lognormal mixture distributions. 
randiscr 
generates discrete random values with a specified
probability vector 
stdspectrum 
generates noise samples or filter coefficients for a
variety of standard spectra including: A, B, C or BS468 weighting,
USASI noise, POTS spectrum, LTASS, Internal masking noise (from SII
spec) 
randfilt 
generates filtered Gaussian noise without any
startup transients. 
rnsubset 
selects a random subset of k elements from
the numbers 1:n 
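Drawing discrete values from a specified probability vector, as randiscr is described as doing, can be sketched with inverse-CDF sampling in base MATLAB. The method and variable names here are assumptions chosen for illustration and are not taken from the toolbox source.

```matlab
p = [0.5 0.3 0.2];                 % probability vector (sums to 1)
n = 10000;                         % number of samples to draw

c = cumsum(p(:)).' / sum(p);       % cumulative distribution as a row
u = rand(n, 1);                    % uniform samples in (0,1)

% Each sample's value is 1 plus the number of CDF steps it exceeds.
x = 1 + sum(bsxfun(@gt, u, c(1:end-1)), 2);

freq = accumarray(x, 1, [numel(p) 1]).' / n;   % empirical probabilities
```

With enough samples, freq should approach p.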

Probability Density Functions
lognmpdf 
calculates the pdf of a lognormal distribution 
gaussmix 
generates a multivariate Gaussian mixture model (GMM) from
training data 
gaussmixd 
determines marginal and conditional distributions from a GMM and
can be used to perform inference on unobserved variables. 
gaussmixg 
calculates the global mean, covariance matrix and mode of a GMM 
gaussmixm 
estimates the mean and variance of the magnitude of a GMM vector
variate 
gaussmixm_cart 
calculate the CART regression tree used by gaussmixm 
gaussmixk 
calculates the Kullback-Leibler divergence, D(f||g), between two
GMMs 
gaussmixp 
calculates and plots full and marginal log probability and
relative mixture probabilities from a GMM 
gaussmixt 
multiplies two GMMs together 
v_chimv 
approximates the mean and variance of a non-central chi
distribution 
vonmisespdf 
calculate the pdf of the Von Mises (circular normal)
distribution 

Miscellaneous
berk2prob 
convert Berkson matrix to probability 
gausprod 
calculates the product of two Gaussian distributions 
histndim 
calculates an n-dimensional histogram (and plots a
2D one) 
maxgauss 
calculates the mean and variance of the maximum element of a
Gaussian vector 
prob2berk 
convert probability matrix to Berksons 
Vector Distances
disteusq 
calculates the squared Euclidean distance between
all pairs of rows of two matrices. 
distitar 
calculates the Itakura spectral distances between sets of AR
coefficients. 
distitpf 
calculates the Itakura spectral distances between power spectra. 
distisar 
calculates the Itakura-Saito spectral distances between sets of
AR coefficients. 
distispf 
calculates the Itakura-Saito spectral distances between power
spectra. 
distchar 
calculates the COSH spectral distances between sets of AR
coefficients. 
distchpf 
calculates the COSH spectral distances between power spectra. 
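The squared Euclidean distance between all pairs of rows of two matrices, as described for disteusq, can be sketched in base MATLAB using the expansion |x - y|^2 = |x|^2 + |y|^2 - 2 x.y. This shows the computation only; the toolbox routine's options and internals are not reproduced.

```matlab
x = [0 0; 1 0; 0 2];               % three 2-D points, one per row
y = [0 0; 3 4];                    % two 2-D points, one per row

% d(i,j) = squared distance between x(i,:) and y(j,:).
d = bsxfun(@plus, sum(x.^2, 2), sum(y.^2, 2).') - 2*(x*y.');
d = max(d, 0);                     % guard against tiny negative rounding errors
% d = [0 25; 1 20; 4 13]
```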
Speech Analysis
activlev 
calculates the active level of a speech segment according to
ITU-T Recommendation P.56. 
activlevg 
calculates the active level of a speech segment robustly to
added noise 
dypsa 
estimates the glottal closure instants from the speech waveform. 
enframe 
can be used to split a signal up into frames. It can
optionally apply a window to each frame. 
correlogram 
Calculates a 3D correlogram [slowly] 
ewgrpdel 
calculates the energy-weighted group delay waveform. 
fram2wav 
interpolates a sequence of frame-based values into a waveform 
filtbankm 
Transformation matrix for a linear/mel/erb/bark-spaced
filterbank from DFT output 
fxpefac 
PEFAC pitch tracker 
fxrapt 
is an implementation of the RAPT pitch tracker by David Talkin. 
gammabank 
Determine a bank of IIR gammatone filters 
importsii 
calculate the SII importance function 
mos2pesq 
Convert MOS values to PESQ speech quality scores 
overlapadd 
Join frames up using overlap-add processing. Commonly used with
enframe. 
pesq2mos 
Convert PESQ speech quality scores to MOS values 
phon2sone 
Convert signal levels from phons to sones 
psycdigit 
experimental estimation of monotonic/unimodal psychometric
function using TIDIGITS 
psycest 
experimental estimation of monotonic psychometric function 
psycestu 
experimental estimation of unimodal psychometric function 
psychofunc 
calculate psychometric function 
v_sigma 
estimate glottal opening and closure instants from the
laryngograph/EGG waveform 
snrseg 
calculate segmental SNR and global SNR relative to a reference
signal 
sone2phon 
Convert signal levels from sones to phons 
soundspeed 
gives the speed of sound as a function of temperature 
spgrambw 
draws a spectrogram with many options. See
tutorial. 
txalign 
finds the best alignment (in a least squares sense) between two
sets of time markers (e.g. glottal closure instants). 
vadsohn 
voice activity detector 
v_ppmvu 
Calculate the PPM, VU or EBU levels of a signal 
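Splitting a signal into overlapping frames and optionally windowing each one, as enframe is described as doing, can be sketched in base MATLAB. The frame length, hop and Hamming window below are arbitrary choices for illustration, and the code is not taken from the toolbox.

```matlab
x   = (1:10).';                    % toy signal
len = 4;                           % frame length in samples
inc = 2;                           % hop between frame starts

nf  = fix((numel(x) - len + inc) / inc);          % complete frames only
idx = bsxfun(@plus, (0:nf-1).' * inc, 1:len);     % sample index of each entry
w   = 0.54 - 0.46*cos(2*pi*(0:len-1)/(len-1));    % Hamming window (row)

frames   = x(idx);                                % one frame per row
windowed = bsxfun(@times, frames, w);             % window applied per frame
```

Here frames is 4-by-4 and its second row is [3 4 5 6], showing the 50% overlap between successive frames.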
LPC Analysis of Speech
lpcauto & lpccovar 
perform linear predictive coding (LPC) analysis. The
routines relating to LPC are described in more detail on
another page. A large number of
conversion routines are included for changing
the form of the LPC coefficients (e.g. AR coefficients, reflection
coefficients etc.): these are of the form lpcxx2yy where xx and yy
denote the coefficient sets. 
lpcrr2am 
calculates LPC filters for all orders up to a given maximum. 
lpcbwexp 
performs bandwidth expansion on an LPC filter. 
ccwarpf 
performs frequency warping in the complex cepstrum domain. 
lpcifilt 
performs inverse filtering to estimate the glottal waveform from
the speech signal and the LPC coefficients. 
lpcrand 
can be used to generate random, stable filters for testing
purposes. 
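The autocorrelation approach to LPC can be sketched by solving the normal (Yule-Walker) equations directly. This is a textbook illustration under an assumed autocorrelation sequence, not the code of lpcauto, which also handles framing and windowing. The sign convention assumed is that the AR polynomial has a leading 1.

```matlab
% Assumed autocorrelation lags 0..p of an AR(1) process with pole at 0.5.
p = 2;
r = [1 0.5 0.25].';

% Solve the Yule-Walker (normal) equations R*a = r(2:end).
R  = toeplitz(r(1:p));
a  = R \ r(2:p+1);                 % predictor coefficients
ar = [1; -a].';                    % AR polynomial, leading coefficient 1

% Residual (prediction error) energy for this model.
e = r(1) - a.' * r(2:p+1);
```

For this input the second-order model collapses to first order: ar is [1 -0.5 0] and e is 0.75.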
Speech Synthesis
sapisynth 
Text-to-speech synthesis (TTS) of a string or matrix
entries 
glotros 
Calculates the Rosenberg model of the glottal flow
waveform 
glotlf 
Calculates the Liljencrants-Fant model of the glottal flow
waveform 
Speech Enhancement
estnoiseg 
uses an MMSE algorithm to estimate the noise
spectrum from a noisy speech signal that has been divided into
frames. 
estnoisem 
uses a minimum-statistics algorithm to estimate the
noise spectrum from a noisy speech signal that has been divided into
frames. 
specsub 
performs speech enhancement using spectral subtraction 
ssubmmse 
performs speech enhancement using the MMSE or log MMSE criteria 
ssubmmsev 
performs speech enhancement using the MMSE or log MMSE criteria
with VAD-based noise estimate 
Speech Coding
lin2pcma 
converts an audio waveform to 8-bit A-law PCM format 
lin2pcmu 
converts an audio waveform to 8-bit mu-law PCM format 
pcma2lin 
converts 8-bit A-law PCM to a waveform 
pcmu2lin 
converts 8-bit mu-law PCM to a waveform 
kmeanlbg 
vector quantisation using the LBG algorithm 
kmeanhar 
vector quantisation using the K-harmonic means algorithm 
potsband 
calculates a bandpass filter corresponding to the standard
telephone passband. 
v_kmeans 
vector quantisation using the K-means algorithm 
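The principle behind mu-law PCM can be sketched with the continuous companding law followed by uniform 8-bit quantisation. This is an assumption-laden illustration: standard telephony codecs use a segmented approximation with a specific bit layout, and the exact codes produced by lin2pcmu and pcmu2lin are not reproduced here.

```matlab
mu = 255;                                  % usual mu-law parameter
x  = linspace(-1, 1, 9);                   % samples normalised to [-1, 1]

% Compress: logarithmic law that gives small amplitudes more resolution.
y  = sign(x) .* log(1 + mu*abs(x)) / log(1 + mu);

% Quantise the compressed value uniformly, then expand again.
q  = round(y * 127) / 127;
xr = sign(q) .* ((1 + mu).^abs(q) - 1) / mu;
```

The reconstruction error xr - x is small in absolute terms near zero and grows with amplitude, which is the intended trade-off.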
Speech Recognition
melcepst 
implements a mel-cepstrum front end for a recogniser 
melbankm 
constructs a bandpass filterbank with mel-spaced centre
frequencies 
cep2pow 
converts multivariate Gaussian means and covariances from the
log power or cepstral domain to the power domain 
pow2cep 
converts multivariate Gaussian means and covariances from the
power domain to the log power or cepstral domain 
ldatrace 
performs Linear Discriminant Analysis with optional constraints
on the transform matrix 
Signal Processing
ditherq 
adds dither and quantizes a signal 
dlyapsq 
solves the discrete Lyapunov equation using an efficient square
root algorithm 
filterbank 
Apply a bank of IIR filters to a signal 
maxfilt 
performs running maximum filter 
meansqtf 
calculates the output power of a rational filter with a white
noise input 
momfilt 
generate running moments from a signal 
sigalign 
align a clean reference with a noise signal and find optimum
gain 
schmitt 
passes a signal through a Schmitt trigger having hysteresis 
teager 
calculate the Teager energy waveform 
v_findpeaks 
finds the peaks in a signal 
v_windows 
generates window functions 
v_windinfo 
calculate window properties and figures of merit 
zerocros 
finds the zero crossings of a signal with interpolation 
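Finding zero crossings with interpolation, as zerocros is described as doing, can be sketched in base MATLAB with linear interpolation between the two samples either side of each sign change. This is a simplified illustration that ignores samples exactly equal to zero; it is not the toolbox code.

```matlab
x = [1 0.5 -0.5 -1 1];             % toy signal with two sign changes

% Index of the sample just before each sign change (exact zeros ignored).
k = find(x(1:end-1) .* x(2:end) < 0);

% Linear interpolation between samples k and k+1 gives a fractional position.
t = k + x(k) ./ (x(k) - x(k+1));
% t = [2.5 4.5]
```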
Information Theory
huffman 
calculates optimum D-ary symbol code from a
probability mass vector 
entropy 
calculates entropy and conditional entropy for discrete and
continuous distributions 
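For the discrete case, entropy follows directly from the probability mass vector. The base MATLAB sketch below computes it in bits and is only an illustration of the definition; the toolbox routine also covers conditional and continuous cases that are not shown.

```matlab
p  = [0.5 0.25 0.25];              % probability mass vector (sums to 1)
nz = p > 0;                        % zero-probability symbols contribute nothing
H  = -sum(p(nz) .* log2(p(nz)));   % entropy in bits, here 1.5
```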
Computer Vision
imagehomog 
Apply a homography transformation to an image with
bilinear interpolation 
polygonarea 
Calculates the area of a polygon 
polygonwind 
Determines whether points are inside or outside a
polygon 
polygonxline 
Determines where a line crosses a polygon 
qrabs 
Absolute value of a real quaternion 
qrdivide 
divide two real quaternions (or invert one) 
qrdotdiv 
elementwise division of two real quaternion arrays 
qrdotmult 
elementwise multiplication of two real quaternion
arrays 
qrmult 
multiply two real quaternion arrays 
qrpermute 
permute the indices of a quaternion array 
rectifyhomog 
Apply rectifying homographies to a set of cameras to
make their optical axes parallel 
rot2 
converts between the following representations of
rotations: rotation matrix (ro), Euler angles (eu), axis of rotation
(ax), plane of rotation (pl), real quaternion vector (qr), real
quaternion matrix (mr), complex quaternion vector (qc), complex
quaternion matrix (mc). A detailed description is given
here. 
rotqrmean 
Find the average of several rotation quaternions 
rotqrvec 
Apply a quaternion rotation to an array of 3D
vectors 
skew3d 
Convert between vectors and skew symmetric matrices:
3x3 matrix <-> 3x1 vector and 4x4 Plücker matrix <-> 6x1 vector. 
sphrharm 
forward and inverse spherical harmonic transform
using uniform, Gaussian or arbitrary inclination (elevation) grids
and a uniform azimuth grid. 
upolyhedron 
Calculate the vertex coordinates and other
characteristics of a uniform polyhedron 
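The 3x1 vector to 3x3 skew symmetric matrix conversion mentioned for skew3d can be illustrated in base MATLAB. The sign convention below, in which multiplying by the matrix reproduces the cross product, is the common textbook one and is an assumption here; check the help text of skew3d for the convention the toolbox actually uses.

```matlab
a = [1; 2; 3];                     % vector to convert

% Skew symmetric matrix chosen so that S*b equals cross(a, b).
S = [  0    -a(3)   a(2);
      a(3)    0    -a(1);
     -a(2)   a(1)    0   ];

b = [4; 5; 6];
c = S * b;                         % same as cross(a, b), here [-3; 6; -3]

% Inverse mapping: read the vector back out of the matrix.
v = [S(3,2); S(1,3); S(2,1)];
```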
Printing and Display Functions
axisenlarge 
enlarge the axes of a figure slightly 
bitsprec 
rounds values to a precision of n bits 
cblabel 
add a label to the colourbar 
figbolden 
makes the lines on a figure bold, enlarges font
sizes and adjusts colours for printing clearly 
fig2emf 
optionally makes the lines on a figure bold and then
saves in Windows metafile format 
frac2bin 
converts numbers to fixed-point binary strings 
lambda2rgb 
convert wavelength to an RGB or XYZ triplet 
sprintsi 
prints a value with the correct standard SI multiplier (e.g.
2100 prints as 2.1 k) 
texthvc 
add text to plots with specified alignment and colour 
tilefigs 
arrange all figures on the screen 
v_colormap 
set and display colormap information including colormaps that
print well in monochrome 
xticksi 
Label the x-axis tick marks using SI multipliers for large and
small values. Particularly useful for logarithmic plots. 
yticksi 
Label the y-axis tick marks using SI multipliers for large and
small values. Particularly useful for logarithmic plots. 
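Printing a value with an SI multiplier, as in the sprintsi example where 2100 prints as 2.1 k, can be sketched in base MATLAB. The prefix table and formatting below are assumptions made for illustration (with u standing in for micro) and do not reproduce the toolbox routine's options.

```matlab
v = 2100;                                   % value to format (assumed non-zero)
prefixes = 'yzafpnum kMGTPEZY';             % 10^-24 ... 10^24, blank for 10^0

e = floor(log10(abs(v)) / 3);               % exponent in steps of 1000
e = max(-8, min(8, e));                     % stay within the prefix table
s = sprintf('%g %s', v / 10^(3*e), prefixes(e + 9));
% s = '2.1 k'
```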


Voicebox Parameters and System Interface
voicebox 
contains a number of installation-dependent global
parameters and is likely to need editing for each particular setup. 
unixwhich 
searches the WINDOWS system path for an executable (like UNIX
which command) 
winenvar 
Obtains WINDOWS environment variables 
Utility Functions
atan2sc 
arctangent function that returns the sin and cos of
the angle 
bitsprec 
Rounds values to a precision of n bits 
choosenk 
all possible ways of choosing k elements out of the
numbers 1:n without duplications 
choosrnk 
all possible ways of choosing k elements out of the
numbers 1:n with duplications allowed 
dlyapsq 
Solve the discrete Lyapunov equation 
dualdiag 
simultaneously diagonalises two matrices: this is
useful in computing LDA or IMELDA transforms. 
finishat 
Estimate the finishing time of a long loop 
fopenmkd 
Equivalent to FOPEN() but creates any
missing directories/folders 
hostipinfo 
Gives information about computer name and internet
connections 
logsum 
calculates log(sum(exp(x))) without overflow
problems. 
minspane 
Calculates the minimum spanning tree (a.k.a.
shortest spanning tree) of a set of n-dimensional points 
mintrace 
Find a row permutation to minimize the trace of a
matrix 
m2htmlpwd 
Create HTML documentation of MATLAB routines in the
current directory 
nearnonz 
Replace zero elements by the nearest nonzero
elements 
permutes 
all possible permutations of the numbers 1:n 
quadpeak 
find a quadratically-interpolated peak in an
N-dimensional array by fitting a quadratic function to the array
values 
rotation 
generates rotation matrices 
zerotrim 
removes from a matrix any trailing rows
and columns that are all zero. 
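The overflow problem that logsum is described as avoiding can be shown in base MATLAB. The sketch below uses the standard trick of subtracting the maximum before exponentiating; it illustrates the idea for a single vector and is not the toolbox code.

```matlab
x = [-1000 -1001 -1002];           % log-domain values far outside exp() range

naive = log(sum(exp(x)));          % exp underflows to 0, so this is -Inf

% Shift by the maximum so the largest exponent is exp(0) = 1.
m = max(x);
y = m + log(sum(exp(x - m)));      % finite, close to -999.59
```

The same shift protects against overflow when the values are large and positive.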