VOICEBOX: Speech Processing Toolbox for MATLAB
Introduction
VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are
maintained by and mostly written by Mike Brookes, Department of Electrical & Electronic
Engineering, Imperial College,
Exhibition Road, London SW7 2BT, UK. Several of the routines require MATLAB
V6.5 or above and need (normally slight) modification to work with earlier
versions.
The routines are available as a zip archive and are released
under the terms of the GNU General Public License.
The routine VOICEBOX.M contains various
installation-dependent parameters which may need to be altered before using the
toolbox. In particular it contains a number of default directory paths
indicating where temporary files should be created, where speech data normally
resides, etc. See the comments in voicebox.m for
a fuller description.
For reading compressed SPHERE format files, you will need the SHORTEN program written by Tony Robinson and SoftSound
Limited (www.softsound.com). The path to
the shorten executable must be set in voicebox.m.
Please send any comments, suggestions, bug reports, etc. to mike.brookes@ic.ac.uk.
- Audio File Input/Output: Read and write WAV and other speech file formats
- Frequency Scales: Convert between Hz, Mel, Erb and MIDI frequency scales
- Fourier/DCT/Hartley Transforms: Various related transforms
- Random Number and Probability Distributions: Generate random vectors and noise signals
- Vector Distances: Calculate distances between vector lists
- Speech Analysis: Active level estimation, Spectrograms
- LPC Analysis of Speech: Linear Predictive Coding routines
- Speech Synthesis: Glottal waveform models
- Speech Enhancement: Spectral noise subtraction
- Speech Coding: PCM coding, Vector quantisation
- Speech Recognition: Front-end processing for recognition
- Signal Processing: Miscellaneous signal processing functions
- Information Theory: Routines for entropy calculation and symbol codes
- Computer Vision: Routines for 3D rotation
- Printing and Display Functions: Utilities for printing and graphics
- Voicebox Parameters and System Interface: Get or set VOICEBOX and WINDOWS system parameters
- Utility Functions: Miscellaneous utility functions
Audio File Input/Output
Routines are available to read and, in some cases, write a variety of file
formats:
| Read | Write | Suffix | Description |
| readwav | writewav | .wav | These routines allow an arbitrary number of channels and can deal with linear PCM (any precision up to 32 bits), A-law PCM and Mu-law PCM. Large files can be read and written in small chunks. |
| readhtk | writehtk | .htk | Read and write waveform and parameter files used by Microsoft's Hidden Markov Toolkit. |
| readsfs | | .sfs | Speech Filing System files from Mark Huckvale at UCL. |
| readsph | | .sph | NIST Sphere format files (including TIMIT). Needs SHORTEN for compressed files. |
| readaif | | .aif | AIFF format (Audio Interchange File Format) used by Mac users. |
| readcnx | | .cnx | Read Connex database files (from BT). |
| readau | | .au | Read AU audio files (from Sun). |
Frequency Scales
| From f | To f | Scale | Description |
| frq2mel | mel2frq | mel | The mel scale is based on the human perception of sinewave pitch. |
| frq2erb | erb2frq | erb | The erb scale is based on the equivalent rectangular bandwidths of the human ear. |
| frq2bark | bark2frq | bark | The bark scale is based on critical bands and masking in the human ear. |
| frq2midi | midi2frq | midi | The MIDI standard specifies a numbering of semitones with middle C being 60. The routines can use the normal equal tempered scale or else the pythagorean scale of just intonation. They will in addition output note names in a character format. |
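As an illustration of what these conversions compute, here is a minimal Python sketch. It is not the VOICEBOX code: the function names are hypothetical, the mel mapping uses the widely quoted 2595*log10(1 + f/700) form (an assumption; the toolbox routine may use different constants or options), and the MIDI mapping assumes equal temperament with A above middle C (note 69) at 440 Hz, which is consistent with middle C being note 60.

```python
import math

def frq2mel_sketch(frq_hz):
    """Hz -> mel with the common 2595*log10(1 + f/700) mapping (assumed)."""
    return 2595.0 * math.log10(1.0 + frq_hz / 700.0)

def mel2frq_sketch(mel):
    """Inverse of frq2mel_sketch: mel -> Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frq2midi_sketch(frq_hz):
    """Hz -> fractional MIDI note number, equal temperament, A4 = 440 Hz = note 69 (assumed)."""
    return 69.0 + 12.0 * math.log2(frq_hz / 440.0)

def midi2frq_sketch(note):
    """MIDI note number -> Hz under the same assumptions."""
    return 440.0 * 2.0 ** ((note - 69.0) / 12.0)
```

Under these assumptions 1 kHz maps to roughly 1000 mel, and note 60 (middle C) maps to about 261.63 Hz.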
Fourier/DCT/Hartley Transforms
| Forward | Inverse | Description |
| rfft | irfft | Forward and inverse discrete Fourier transforms on real data. Only the first half of the conjugate symmetric transform is generated. For even length data, the inverse routine is asymptotically twice as fast as the built-in MATLAB routine. |
| rsfft | | Forward transform of real, symmetric data to give the first half only of the real, symmetric transform. |
| zoomfft | | Calculate the discrete Fourier transform at an arbitrary set of linearly spaced frequencies. Can be used to zoom into a subset of the full frequency range. |
| rdct | irdct | Forward and inverse discrete cosine transform on real data. |
| rhartley | rhartley | Hartley transform on real data (forward and inverse transforms are the same). |
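The idea behind keeping only the first half of a conjugate symmetric transform can be sketched in Python as follows. This is a direct O(n^2) DFT for illustration only, with hypothetical function names; it says nothing about how rfft and irfft are implemented or how fast they are.

```python
import cmath

def rfft_sketch(x):
    """DFT of real data, returning only bins 0 .. n//2 (the non-redundant half)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def irfft_sketch(spec, n):
    """Inverse of rfft_sketch; n is the original (real) signal length."""
    # Rebuild the discarded bins from conjugate symmetry: X[n-k] = conj(X[k]).
    full = list(spec) + [spec[n - k].conjugate() for k in range(len(spec), n)]
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

The original length n has to be passed to the inverse because an even-length and an odd-length signal can produce the same number of retained bins.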
Random Number and Probability Distributions
Random Number Generation
| randvec | generates random vectors from Gaussian or lognormal mixture distributions. |
| randiscr | generates discrete random values with a specified probability vector |
| stdspectrum | generates noise samples or filter coefficients for a variety of standard spectra including: A, B, C or BS468 weighting, USASI noise, POTS spectrum, LTASS, Internal masking noise (from SII spec) |
| randfilt | generates filtered Gaussian noise without any startup transients. |
| rnsubset | selects a random subset of k elements from the numbers 1:n |
Probability Density Functions
| lognmpdf | calculates the pdf of a lognormal distribution |
| gaussmix | generates a multivariate Gaussian mixture model (GMM) from training data |
| gaussmixp | calculates full and marginal log probability and relative mixture probabilities from a GMM |
| gaussmixd | determines marginal and conditional distributions from a GMM and can be used to perform inference on unobserved variables. |
Miscellaneous
| histndim | calculates an n-dimensional histogram (and plots a 2-D one) |
| gausprod | calculates the product of two Gaussian distributions |
| maxgauss | calculates the mean and variance of the maximum element of a Gaussian vector |
Vector Distances
| disteusq | calculates the squared Euclidean distance between all pairs of rows of two matrices. |
| distitar | calculates the Itakura spectral distances between sets of AR coefficients. |
| distitpf | calculates the Itakura spectral distances between power spectra. |
| distisar | calculates the Itakura-Saito spectral distances between sets of AR coefficients. |
| distispf | calculates the Itakura-Saito spectral distances between power spectra. |
| distchar | calculates the COSH spectral distances between sets of AR coefficients. |
| distchpf | calculates the COSH spectral distances between power spectra. |
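The "all pairs of rows" computation described for disteusq can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch covers only the basic all-pairs case, not any options the toolbox routine may offer.

```python
def disteusq_sketch(x, y):
    """Squared Euclidean distance between every row of x and every row of y.

    x and y are lists of equal-width rows; the result d has
    len(x) rows and len(y) columns, with d[i][j] = |x[i] - y[j]|^2.
    """
    return [
        [sum((a - b) ** 2 for a, b in zip(xr, yr)) for yr in y]
        for xr in x
    ]
```

For example, with rows (0,0) and (1,1) in x and a single row (3,4) in y, the distances are 25 and 13.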
Speech Analysis
| enframe | can be used to split a signal up into frames. It can optionally apply a window to each frame. |
| overlapadd | joins frames up using overlap-add processing. Commonly used with enframe. |
| fram2wav | interpolates a sequence of frame-based values into a waveform |
| ewgrpdel | calculates the energy-weighted group delay waveform. |
| activlev | calculates the active level of a speech segment according to ITU-T recommendation P.56. |
| spgrambw | draws a monochrome spectrogram with a dB scale. |
| txalign | finds the best alignment (in a least squares sense) between two sets of time markers (e.g. glottal closure instants). |
| dypsa | estimates the glottal closure instants from the speech waveform. |
| fxrapt | is an implementation of the RAPT pitch tracker by David Talkin. |
| soundspeed | gives the speed of sound as a function of temperature |
| importsii | calculates the SII importance function |
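The way framing and overlap-add fit together can be sketched in Python as below. This is an assumption-laden illustration rather than the toolbox behaviour: the function names are hypothetical, and the window is a periodic Hann window chosen because, at a hop of half the frame length, overlapping copies sum to one.

```python
import math

def hann_periodic(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def enframe_sketch(x, win, inc):
    """Split x into windowed frames of len(win) samples, advancing by inc."""
    n = len(win)
    return [
        [x[start + i] * win[i] for i in range(n)]
        for start in range(0, len(x) - n + 1, inc)
    ]

def overlapadd_sketch(frames, inc):
    """Sum frames back into one signal, each offset by inc from the previous."""
    n = len(frames[0])
    out = [0.0] * (inc * (len(frames) - 1) + n)
    for j, frame in enumerate(frames):
        for i, v in enumerate(frame):
            out[j * inc + i] += v
    return out
```

With an 8-sample window and a hop of 4, every sample covered by two frames is reconstructed exactly; the first and last half-frames are covered by only one frame and so remain attenuated by the window.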
LPC Analysis of Speech
| lpcauto & lpccovar | perform linear predictive coding (LPC) analysis. The routines relating to LPC are described in more detail on another page. A large number of conversion routines are included for changing the form of the LPC coefficients (e.g. AR coefficients, reflection coefficients etc.): these are of the form lpcxx2yy where xx and yy denote the coefficient sets. |
| lpcrr2am | calculates LPC filters for all orders up to a given maximum. |
| lpcbwexp | performs bandwidth expansion on an LPC filter. |
| ccwarpf | performs frequency warping in the complex cepstrum domain. |
| lpcifilt | performs inverse filtering to estimate the glottal waveform from the speech signal and the LPC coefficients. |
| lpcrand | can be used to generate random, stable filters for testing purposes. |
Speech Synthesis
| glotros | Calculates the Rosenberg model of the glottal flow waveform |
| glotlf | Calculates the Liljencrants-Fant model of the glottal flow waveform |
Speech Enhancement
| estnoisem | uses a minimum-statistics algorithm to estimate the noise spectrum from a noisy speech signal that has been divided into frames. |
| specsub | performs speech enhancement using spectral subtraction |
| ssubmmse | performs speech enhancement using the MMSE or log MMSE criteria |
Speech Coding
| lin2pcma | converts an audio waveform to 8-bit A-law PCM format |
| lin2pcmu | converts an audio waveform to 8-bit mu-law PCM format |
| pcma2lin | converts 8-bit A-law PCM to a waveform |
| pcmu2lin | converts 8-bit mu-law PCM to a waveform |
| kmeans | vector quantisation using the K-means algorithm |
| kmeanlbg | vector quantisation using the LBG algorithm |
| kmeanhar | vector quantisation using the K-harmonic means algorithm |
| potsband | calculates a bandpass filter corresponding to the standard telephone passband. |
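To give a feel for what mu-law companding does, here is a Python sketch of the continuous mu-law curve followed by uniform 8-bit quantisation. It is deliberately not bit-compatible with the segmented G.711 code that 8-bit mu-law PCM normally means, and it should not be read as the behaviour of lin2pcmu or pcmu2lin; the function names and the mapping of codes to 0..255 are assumptions made for the example.

```python
import math

MU = 255.0  # companding constant used for 8-bit mu-law

def mulaw_compress(x):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Exact inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode8(x):
    """Clip, compress, then quantise uniformly to an integer code 0..255."""
    y = mulaw_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) * 127.5))

def decode8(code):
    """Map a code 0..255 back to an approximate sample value."""
    return mulaw_expand(code / 127.5 - 1.0)
```

The point of the logarithmic curve is that quantisation steps are fine near zero and coarse near full scale, so quiet signals keep a usable signal-to-noise ratio with only 8 bits.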
Speech Recognition
| melcepst | implements a mel-cepstrum front end for a recogniser |
| melbankm | constructs a bandpass filterbank with mel-spaced centre frequencies |
| cep2pow | converts multivariate Gaussian means and covariances from the log power or cepstral domain to the power domain |
| pow2cep | converts multivariate Gaussian means and covariances from the power domain to the log power or cepstral domain |
| ldatrace | performs Linear Discriminant Analysis with optional constraints on the transform matrix |
Signal Processing
| findpeaks | finds the peaks in a signal |
| maxfilt | performs running maximum filter |
| meansqtf | calculates the output power of a rational filter with a white noise input |
| windows | generates window functions |
| windinfo | calculates window properties and figures of merit |
| zerocros | finds the zero crossings of a signal with interpolation |
| ditherq | adds dither and quantizes a signal |
| schmitt | passes a signal through a Schmitt trigger having hysteresis |
| dlyapsq | solves the discrete Lyapunov equation using an efficient square root algorithm |
| momfilt | generates running moments from a signal |
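Finding zero crossings "with interpolation" can be illustrated by the following Python sketch, which uses linear interpolation between neighbouring samples. The function name, the 0-based positions and the treatment of exact zeros as non-negative are all assumptions of the example, not a description of zerocros.

```python
def zerocros_sketch(x):
    """Return fractional sample positions where x changes sign.

    A sample equal to zero is treated as non-negative. Positions are
    0-based and found by linear interpolation between x[i] and x[i+1].
    """
    crossings = []
    for i in range(len(x) - 1):
        a, b = x[i], x[i + 1]
        if (a < 0 <= b) or (a >= 0 > b):
            # The straight line from (i, a) to (i+1, b) hits zero at i + a/(a-b).
            crossings.append(i + a / (a - b))
    return crossings
```

For the samples -1, 1, 3, 1, -1 this reports crossings at positions 0.5 and 3.5.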
Information Theory
| huffman | calculates optimum D-ary symbol code from a probability mass vector |
| entropy | calculates entropy and conditional entropy for discrete and continuous distributions |
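The relationship between entropy and an optimum symbol code can be sketched in Python for the simplest case: a discrete distribution and a binary (D = 2) Huffman code. The function names are hypothetical and the sketch returns only code lengths, not the code words; the D-ary and continuous cases mentioned above are not covered.

```python
import heapq
import math

def entropy_bits(p):
    """Shannon entropy in bits of a probability mass vector p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def huffman_lengths(p):
    """Code word lengths of a binary Huffman code for probabilities p."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(q, i, [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1  # merging two subtrees adds one bit to every symbol in them
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return lengths
```

For the probabilities 0.5, 0.25, 0.125, 0.125 the code lengths are 1, 2, 3, 3, so the average code length of 1.75 bits equals the entropy; for probabilities that are not powers of 1/2 the average length exceeds the entropy by less than one bit.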
Computer Vision
| rot--2-- | converts between the following representations of rotations: rotation matrix (ro), euler angles (eu), axis of rotation (ax), plane of rotation (pl), real quaternion vector (qr), real quaternion matrix (mr), complex quaternion vector (qc), complex quaternion matrix (mc). A detailed description is given here. |
| peak2dquad | find a quadratically-interpolated peak in a 2D array by fitting a biquadratic function to the array values |
| polygonarea | Calculates the area of a polygon |
| polygonwind | Determines whether points are inside or outside a polygon |
| polygonxline | Determines where a line crosses a polygon |
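One standard way to compute the area of a polygon from its vertices is the shoelace formula, sketched here in Python. Whether polygonarea uses this method, or returns a signed value, is not stated above; the function name and the sign convention are assumptions of the example.

```python
def polygon_area_sketch(vertices):
    """Signed area of a simple polygon given as a list of (x, y) vertices.

    The result is positive for anticlockwise vertex order and negative
    for clockwise order; take abs() for the unsigned area.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return 0.5 * twice_area
```

A unit square listed anticlockwise gives 1, and the same square listed clockwise gives -1.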
Printing and Display Functions
| figbolden | makes the lines on a figure bold and enlarges font sizes for printing clearly |
| xticksi | Label the x-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots. |
| yticksi | Label the y-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots. |
| sprintsi | prints a value with the correct standard SI multiplier (e.g. 2100 prints as 2.1 k) |
| bitsprec | rounds values to a precision of n bits |
| frac2bin | converts numbers to fixed-point binary strings |
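The SI-multiplier formatting described for sprintsi can be approximated by the following Python sketch. The function name, the default of three significant digits and the use of "u" for micro are assumptions, and values that round up to the next power of 1000 (such as 999.99) are not handled specially.

```python
import math

# SI prefixes keyed by power-of-ten exponent ("u" stands in for micro).
SI_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "u", -3: "m",
    0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E", 21: "Z", 24: "Y",
}

def sprintsi_sketch(value, digits=3):
    """Format value with an SI multiplier, e.g. 2100 -> '2.1 k'."""
    if value == 0:
        return "0"
    # Largest multiple of 3 not exceeding the decimal exponent, clamped to known prefixes.
    exp3 = int(math.floor(math.log10(abs(value)) / 3.0)) * 3
    exp3 = max(-24, min(24, exp3))
    mantissa = value / 10.0 ** exp3
    return f"{mantissa:.{digits}g} {SI_PREFIXES[exp3]}".rstrip()
```

With these choices 2100 formats as "2.1 k", matching the example in the table, and 0.0047 formats as "4.7 m".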
Voicebox Parameters and System Interface
| voicebox | contains a number of installation-dependent global parameters and is likely to need editing for each particular setup. |
| unixwhich | searches the WINDOWS system path for an executable (like the UNIX which command) |
| winenvar | Obtains WINDOWS environment variables |
Utility Functions
| zerotrim | removes from a matrix any trailing rows and columns that are all zero. |
| logsum | calculates log(sum(exp(x))) without overflow problems. |
| dualdiag | simultaneously diagonalises two matrices: this is useful in computing LDA or IMELDA transforms. |
| permutes | all possible permutations of the numbers 1:n |
| choosenk | all possible ways of choosing k elements out of the numbers 1:n without duplications |
| choosrnk | all possible ways of choosing k elements out of the numbers 1:n with duplications allowed |
| rotation | generates rotation matrices |
| skew3d | manipulates 3x3 skew symmetric matrices |
| atan2sc | arctangent function that returns the sin and cos of the angle |
| bitsprec | Rounds values to a precision of n bits |
| dlyapsq | Solve the discrete Lyapunov equation |
| finishat | Estimate the finishing time of a long loop |
| m2htmlpwd | Create HTML documentation of MATLAB routines in the current directory |
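The logsum entry above refers to evaluating log(sum(exp(x))) without overflow. A common way to do this, sketched here in Python with a hypothetical function name, is to factor out the largest element so that every exponent is at most zero; whether the toolbox routine uses exactly this approach is an assumption.

```python
import math

def logsum_sketch(x):
    """Compute log(sum(exp(v) for v in x)) without overflow.

    Uses the identity log(sum(exp(v))) = m + log(sum(exp(v - m)))
    with m = max(x), so no exponent exceeds zero.
    """
    m = max(x)
    if math.isinf(m):
        return m  # all -inf gives -inf; any +inf gives +inf
    return m + math.log(sum(math.exp(v - m) for v in x))
```

A direct evaluation of math.exp(1000.0) raises OverflowError in Python, whereas logsum_sketch([1000.0, 1000.0]) returns 1000 + log 2.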