VOICEBOX is a speech processing toolbox consisting of MATLAB routines that are maintained by, and mostly written by, Mike Brookes, Department of Electrical & Electronic Engineering, Imperial College, Exhibition Road, London SW7 2BT, UK.
The routines are available as a GitHub repository or a zip archive and are made available under the terms of the GNU Public License. To avoid conflicts, all routine names begin with a "v_" prefix. For compatibility with legacy code, aliased versions without the prefix are included but these are likely to be removed in the future (the routine v_voicebox_update.m is included to update legacy code to the new names).
Please send any comments, suggestions, bug reports etc. to mike.brookes@ic.ac.uk.
Routines are available to read and, in some cases, write a variety of file formats:
Read | Write | Suffix | |
v_readwav | v_writewav | .wav | These routines allow an arbitrary number of channels and can deal with linear PCM (any precision up to 32 bits), A-law PCM, Mu-law PCM and floating point formats. Large files can be read and written in small chunks. |
wavread | wavwrite | .wav | Emulations of legacy MATLAB WAV file routines |
v_readhtk | v_writehtk | .htk | Read and write waveform and parameter files used by Microsoft's Hidden Markov Toolkit. |
v_readsfs | | .sfs | Speech Filing System files from Mark Huckvale at UCL. |
v_readsph | | .sph | NIST Sphere format files (including TIMIT). Needs SHORTEN for compressed files. |
v_readaif | | .aif | AIFF format (Audio Interchange File Format) used by Mac users. |
v_readcnx | | .cnx | Read Connex database files (from BT) |
v_readau | | .au | Read AU audio files (from Sun) |
From f | To f | Scale | |
v_frq2bark | v_bark2frq | bark | The bark scale is based on critical bands and masking in the human ear. |
v_frq2cent | v_cent2frq | cent | The cent scale is in increments of 0.01 semitones. |
v_frq2erb | v_erb2frq | erb | The erb scale is based on the equivalent rectangular bandwidths of the human ear. |
v_frq2mel | v_mel2frq | mel | The mel scale is based on the human perception of sinewave pitch. |
v_frq2midi | v_midi2frq | midi | The midi standard specifies a numbering of semitones with middle C being 60. They can use the normal equal tempered scale or else the pythagorean scale of just intonation. They will in addition output note names in a character format. |
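The scale conversions in the table above can be illustrated with a minimal Python sketch. It is not VOICEBOX code: the function names are hypothetical, the mel formula is the widely used 2595·log10(1 + f/700) form (the constants used by v_frq2mel may differ), and the MIDI mapping assumes equal temperament with A4 = 440 Hz as note 69, which places middle C at note 60.

```python
import math

def hz_to_mel(f_hz):
    """Common mel formula (an assumption, not necessarily VOICEBOX's constants)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_to_midi(f_hz, a4_hz=440.0):
    """Equal-tempered MIDI note number, assuming note 69 is A4 at a4_hz."""
    return 69.0 + 12.0 * math.log2(f_hz / a4_hz)

def midi_to_hz(note, a4_hz=440.0):
    """Inverse of hz_to_midi."""
    return a4_hz * 2.0 ** ((note - 69.0) / 12.0)

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (1 cent = 0.01 semitone)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

if __name__ == "__main__":
    print(midi_to_hz(60))             # frequency of middle C under these assumptions
    print(cents_between(440.0, 880.0))  # one octave expressed in cents
```

An octave is 12 semitones, so the interval between 440 Hz and 880 Hz comes out as 1200 cents.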
Forward | Inverse | |
v_rfft | v_irfft | Forward and inverse discrete Fourier transforms on real data. Only the first half of the conjugate symmetric transform is generated. For even length data, the inverse routine is asymptotically twice as fast as the built-in MATLAB routine. |
v_rsfft | | Forward transform of real, symmetric data to give the first half only of the real, symmetric transform. |
v_zoomfft | | Calculate the discrete Fourier transform at an arbitrary set of linearly spaced frequencies. Can be used to zoom into a subset of the full frequency range. |
v_rdct | v_irdct | Forward and inverse discrete cosine transform on real data. |
v_rhartley | v_rhartley | Hartley transform on real data (forward and inverse transforms are the same). |
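The remark that only the first half of the conjugate symmetric transform needs to be generated can be checked with a small sketch. The Python below is an independent illustration using a naive O(n²) DFT, not the algorithm used by v_rfft; all function names are hypothetical.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a sequence (O(n^2), for illustration only)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def half_spectrum(x):
    """Keep bins 0 .. floor(n/2), which fully describe the DFT of real data."""
    return dft(x)[: len(x) // 2 + 1]

def full_from_half(half, n):
    """Rebuild all n bins using the symmetry X[k] = conj(X[n - k]) for real input."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full

if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5]
    rebuilt = full_from_half(half_spectrum(x), len(x))
    print(max(abs(a - b) for a, b in zip(rebuilt, dft(x))))  # rounding-level error
```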
v_randvec | generates random vectors from gaussian or lognormal mixture distributions. |
v_randiscr | generates discrete random values with a specified probability vector |
v_stdspectrum | generates noise samples or filter coefficients for a variety of standard spectra including: A, B, C or BS468 weighting, USASI noise, POTS spectrum, LTASS, Internal masking noise (from SII spec) |
v_randfilt | generates filtered gaussian noise without any startup transients. |
v_rnsubset | selects a random subset of k elements from the numbers 1:n |
v_lognmpdf | calculates the pdf of a lognormal distribution |
v_gaussmix | generates a multivariate Gaussian mixture model (GMM) from training data |
v_gaussmixd | determines marginal and conditional distributions from a GMM and can be used to perform inference on unobserved variables. |
v_gaussmixg | calculates the global mean, covariance matrix and mode of a GMM |
v_gaussmixm | estimates the mean and variance of the magnitude of a GMM vector variate |
v_gaussmixk | calculates the Kullback-Leibler Divergence, D(f||g), between two GMMs |
v_gaussmixp | calculates and plots full and marginal log probability and relative mixture probabilities from a GMM |
v_gaussmixt | multiplies two GMMs together |
v_normcdflog | calculates the log of the normal cdf without underflow issues |
v_chimv | approximates the mean and variance of a non-central chi distribution |
v_vonmisespdf | calculate the pdf of the Von Mises (circular normal) distribution |
v_berk2prob | convert Berkson matrix to probability |
v_gausprod | calculates the product of two gaussian distributions |
v_histndim | calculates an n-dimensional histogram (and plots a 2-D one) |
v_pdfmoments | convert between central moments, raw moments and cumulants |
v_maxgauss | calculates the mean and variance of the maximum element of a gaussian vector |
v_prob2berk | convert probability matrix to Berksons |
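As an illustration of what "the product of two gaussian distributions" means, here is a minimal one-dimensional Python sketch. It relies on the standard identity that the product of two Gaussian pdfs is a scaled Gaussian pdf. It is not the v_gausprod implementation (which may handle the multivariate case), and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Pdf of a univariate normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gaussian_product(m1, v1, m2, v2):
    """Return (scale, mean, var) such that
    N(x; m1, v1) * N(x; m2, v2) == scale * N(x; mean, var) for every x."""
    var = 1.0 / (1.0 / v1 + 1.0 / v2)      # precisions add
    mean = var * (m1 / v1 + m2 / v2)       # precision-weighted mean
    scale = normal_pdf(m1, m2, v1 + v2)    # overlap between the two inputs
    return scale, mean, var

if __name__ == "__main__":
    scale, mean, var = gaussian_product(0.0, 1.0, 2.0, 1.0)
    print(scale, mean, var)
```

The precision (inverse variance) of the product is the sum of the input precisions, so the result is always narrower than either input.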
v_disteusq | calculates the squared euclidean distance between all pairs of rows of two matrices. |
v_distitar | calculates the Itakura spectral distances between sets of AR coefficients. |
v_distitpf | calculates the Itakura spectral distances between power spectra. |
v_distisar | calculates the Itakura-Saito spectral distances between sets of AR coefficients. |
v_distispf | calculates the Itakura-Saito spectral distances between power spectra. |
v_distchar | calculates the COSH spectral distances between sets of AR coefficients. |
v_distchpf | calculates the COSH spectral distances between power spectra. |
v_activlev | calculates the active level of a speech segment according to ITU-T recommendation P.56. |
v_activlevg | calculates the active level of a speech segment robustly to added noise |
v_dypsa | estimates the glottal closure instants from the speech waveform. |
v_enframe | can be used to split a signal up into frames. It can optionally apply a window to each frame. |
v_correlogram | Calculates a 3D correlogram [slowly] |
v_ewgrpdel | calculates the energy-weighted group delay waveform. |
v_fram2wav | interpolates a sequence of frame-based values into a waveform |
v_filtbankm | Transformation matrix for a linear/mel/erb/bark-spaced filterbank from DFT output |
v_fxpefac | PEFAC pitch tracker |
v_fxrapt | is an implementation of the RAPT pitch tracker by David Talkin. |
v_gammabank | Determine a bank of IIR gammatone filters |
v_importsii | calculate the SII importance function |
v_mos2pesq | Convert MOS values to PESQ speech quality scores |
v_overlapadd | Join frames up using overlap-add processing. Commonly used with v_enframe. |
v_pesq2mos | Convert PESQ speech quality scores to MOS values |
v_phon2sone | Convert signal levels from phons to sones |
v_psycdigit | experimental estimation of monotonic/unimodal psychometric function using TIDIGITS |
v_psycest | experimental estimation of monotonic psychometric function |
v_psycestu | experimental estimation of unimodal psychometric function |
v_psychofunc | calculate psychometric function |
v_sigma | estimate glottal opening and closure instants from the laryngograph/EGG waveform |
v_snrseg | calculate segmental SNR and global SNR relative to a reference signal |
v_sone2phon | Convert signal levels from sones to phons |
v_soundspeed | gives the speed of sound as a function of temperature |
v_spgrambw | draws a spectrogram with many options. See tutorial. |
v_txalign | finds the best alignment (in a least squares sense) between two sets of time markers (e.g. glottal closure instants). |
v_vadsohn | voice activity detector |
v_ppmvu | Calculate the PPM, VU or EBU levels of a signal |
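Framing and overlap-add, the operations named in the v_enframe and v_overlapadd entries, can be sketched in a few lines of Python. This is an independent illustration with hypothetical function names and no windowing; it does not reproduce the argument conventions or options of the VOICEBOX routines.

```python
def enframe(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing by hop.
    Trailing samples that do not fill a whole frame are dropped."""
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // hop
    return [list(x[i * hop : i * hop + frame_len]) for i in range(n_frames)]

def overlap_add(frames, hop):
    """Sum frames back into one signal, placing frame i at offset i * hop."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for j, value in enumerate(frame):
            out[i * hop + j] += value
    return out

if __name__ == "__main__":
    frames = enframe(list(range(10)), frame_len=4, hop=2)
    print(frames)
    print(overlap_add(frames, hop=2))
```

With no window applied, samples covered by two frames are counted twice in the output. In practice a window whose overlapped copies sum to a constant is applied to each frame so that overlap-add reconstructs the original signal.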
The routines relating to LPC are described in more detail on another page. A large number of conversion routines are included for changing the form of the LPC coefficients (e.g. AR coefficients, reflection coefficients etc.): these are of the form v_lpcxx2yy where xx and yy denote the coefficient sets.
v_ccwarpf | performs frequency warping in the complex cepstrum domain. |
v_lpcauto | Perform autocorrelation LPC analysis |
v_lpccovar | perform covariance LPC analysis |
v_lpcbwexp | performs bandwidth expansion on an LPC filter. |
v_lpcifilt | performs inverse filtering to estimate the glottal waveform from the speech signal and the LPC coefficients. |
v_lpcrand | can be used to generate random, stable filters for testing purposes. |
v_lpcrr2am | calculates LPC filters for all orders up to a given maximum. |
v_lpcstable | determines filter stability and forces filter stability |
v_sapisynth | Text-to-speech synthesis (TTS) of a string or matrix entries |
v_glotros | Calculates the Rosenberg model of the glottal flow waveform |
v_glotlf | Calculates the Liljencrants-Fant model of the glottal flow waveform |
v_estnoiseg | uses an MMSE algorithm to estimate the noise spectrum from a noisy speech signal that has been divided into frames. |
v_estnoisem | uses a minimum-statistics algorithm to estimate the noise spectrum from a noisy speech signal that has been divided into frames. |
v_specsub | performs speech enhancement using spectral subtraction |
v_spendred | performs speech enhancement and dereverberation |
v_ssubmmse | performs speech enhancement using the MMSE or log MMSE criteria |
v_ssubmmsev | performs speech enhancement using the MMSE or log MMSE criteria with VAD-based noise estimate |
v_lin2pcma | converts an audio waveform to 8-bit A-law PCM format |
v_lin2pcmu | converts an audio waveform to 8-bit mu-law PCM format |
v_pcma2lin | converts 8-bit A-law PCM to a waveform |
v_pcmu2lin | converts 8-bit mu-law PCM to a waveform |
v_kmeanlbg | vector quantisation using the LBG algorithm |
v_kmeanhar | vector quantisation using the K-harmonic means algorithm |
v_potsband | calculates a bandpass filter corresponding to the standard telephone passband. |
v_kmeans | vector quantisation using the K-means algorithm |
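The idea behind mu-law coding can be sketched with the continuous companding law, F(x) = sgn(x)·ln(1 + μ|x|)/ln(1 + μ) with μ = 255, followed by uniform 8-bit quantisation. The Python below is a simplified illustration under that assumption. It does not reproduce the segmented G.711 bit layout or the interfaces of v_lin2pcmu and v_pcmu2lin, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding constant commonly used for 8-bit mu-law

def mulaw_compress(x, mu=MU):
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize_8bit(y):
    """Uniformly quantise a value in [-1, 1] to an integer code 0..255."""
    return min(255, max(0, int(round((y + 1.0) * 127.5))))

def dequantize_8bit(code):
    """Map an integer code 0..255 back to a value in [-1, 1]."""
    return code / 127.5 - 1.0

if __name__ == "__main__":
    x = 0.01  # a quiet sample
    code = quantize_8bit(mulaw_compress(x))
    print(code, mulaw_expand(dequantize_8bit(code)))
```

Companding spends more of the 256 codes on small amplitudes, so quiet samples are reproduced with a smaller absolute error than uniform 8-bit quantisation would give.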
v_melcepst | implements a mel-cepstrum front end for a recogniser |
v_melbankm | constructs a bandpass filterbank with mel-spaced centre frequencies |
v_cep2pow | converts multivariate Gaussian means and covariances from the log power or cepstral domain to the power domain |
v_pow2cep | converts multivariate Gaussian means and covariances from the power domain to the log power or cepstral domain |
v_ldatrace | performs Linear Discriminant Analysis with optional constraints on the transform matrix |
v_convfft | 1-dimensional convolution/correlation using FFT |
v_ditherq | adds dither and quantizes a signal |
v_dlyapsq | solves the discrete Lyapunov equation using an efficient square root algorithm |
v_filterbank | Apply a bank of IIR filters to a signal |
v_maxfilt | performs running maximum filter |
v_meansqtf | calculates the output power of a rational filter with a white noise input |
v_momfilt | generate running moments from a signal |
v_sigalign | align a clean reference with a noisy signal and find optimum gain |
v_schmitt | passes a signal through a Schmitt trigger having hysteresis |
v_teager | calculate the Teager energy waveform |
v_addnoise | add noise to a signal at a chosen SNR |
v_findpeaks | finds the peaks in a signal |
v_windows | generates window functions |
v_windinfo | calculate window properties and figures of merit |
v_zerocros | finds the zero crossings of a signal with interpolation |
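As an example of one of these operations, the discrete Teager energy operator is usually defined as ψ[n] = x[n]² − x[n−1]·x[n+1]. The Python sketch below implements that textbook definition for interior samples only. How v_teager treats the end points is not stated here, so that detail and the function name are assumptions.

```python
import math

def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples.
    The output is two samples shorter than the input."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

if __name__ == "__main__":
    amplitude, omega = 2.0, 0.3  # sinusoid amplitude and frequency in rad/sample
    x = [amplitude * math.sin(omega * n + 0.1) for n in range(50)]
    psi = teager_energy(x)
    # For a pure sinusoid psi is constant and equals A^2 * sin(omega)^2.
    print(psi[0], amplitude ** 2 * math.sin(omega) ** 2)
```

For a pure sinusoid the operator returns a constant that grows with both amplitude and frequency, which is why it is used as an instantaneous energy measure.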
v_huffman | calculates optimum D-ary symbol code from a probability mass vector |
v_entropy | calculates entropy and conditional entropy for discrete and continuous distributions |
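The relationship between the two information theory entries can be illustrated with a short Python sketch that computes the entropy of a discrete distribution and the code lengths of a binary Huffman code. This is an independent illustration restricted to the binary case (D = 2), whereas v_huffman is described as D-ary; the function names are hypothetical.

```python
import heapq
import math

def entropy_bits(p):
    """Shannon entropy in bits of a probability mass vector."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def huffman_code_lengths(p):
    """Code length in bits of each symbol under a binary Huffman code."""
    if len(p) == 1:
        return [1]
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(q, i, [i]) for i, q in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        q1, _, s1 = heapq.heappop(heap)
        q2, _, s2 = heapq.heappop(heap)
        for symbol in s1 + s2:
            lengths[symbol] += 1  # merged symbols move one level deeper
        heapq.heappush(heap, (q1 + q2, counter, s1 + s2))
        counter += 1
    return lengths

if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    lengths = huffman_code_lengths(p)
    average = sum(q * n for q, n in zip(p, lengths))
    print(lengths, average, entropy_bits(p))
```

For this dyadic distribution the average code length equals the entropy. In general the average length of a binary Huffman code lies between H and H + 1 bits.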
v_imagehomog | Apply a homography transformation to an image with bilinear interpolation |
v_polygonarea | Calculates the area of a polygon |
v_polygonwind | Determines whether points are inside or outside a polygon |
v_polygonxline | Determines where a line crosses a polygon |
v_qrabs | Absolute value of a real quaternion |
v_qrdivide | divide two real quaternions (or invert one) |
v_qrdotdiv | elementwise division of two real quaternion arrays |
v_qrdotmult | elementwise multiplication of two real quaternion arrays |
v_qrmult | multiply two real quaternion arrays |
v_qrpermute | permute the indices of a quaternion array |
v_rectifyhomog | Apply rectifying homographies to a set of cameras to make their optical axes parallel |
v_rot--2-- | converts between the following representations of rotations: rotation matrix (ro), euler angles (eu), axis of rotation (ax), plane of rotation (pl), real quaternion vector (qr), real quaternion matrix (mr), complex quaternion vector (qc), complex quaternion matrix (mc). A detailed description is given here. |
v_rotqrmean | Find the average of several rotation quaternions |
v_rotqrvec | Apply a quaternion rotation to an array of 3D vectors |
v_skew3d | Convert between vectors and skew symmetric matrices: 3x3 matrix <-> 3x1 vector and 4x4 Plucker matrix <-> 6x1 vector. |
v_sphrharm | forward and inverse spherical harmonic transform using uniform, Gaussian or arbitrary inclination (elevation) grids and a uniform azimuth grid. |
v_upolyhedron | Calculate the vertex coordinates and other characteristics of a uniform polyhedron |
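Quaternion multiplication and the rotation of a 3D vector by a quaternion, as listed in the table above, can be sketched as follows. The Python below is an independent illustration that stores a quaternion as a (w, x, y, z) tuple and rotates with v' = q·v·q*. That component ordering and the function names are assumptions and are not taken from the VOICEBOX routines.

```python
import math

def quat_multiply(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conjugate(q):
    """Conjugate, which is also the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def quat_from_axis_angle(axis, angle):
    """Unit quaternion for a rotation of angle radians about the given axis."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def rotate_vector(q, v):
    """Rotate the 3D vector v by the unit quaternion q using q * v * conj(q)."""
    p = (0.0, v[0], v[1], v[2])
    r = quat_multiply(quat_multiply(q, p), quat_conjugate(q))
    return r[1:]

if __name__ == "__main__":
    q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
    print(rotate_vector(q, (1.0, 0.0, 0.0)))  # x axis turned a quarter turn about z
```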
v_axisenlarge | enlarge the axes of a figure slightly |
v_bitsprec | rounds values to a precision of n bits |
v_cblabel | add a label to the colourbar |
v_figbolden | makes the lines on a figure bold, enlarges font sizes and adjusts colours for printing clearly |
v_fig2emf | optionally makes the lines on a figure bold and then saves in windows metafile format |
v_fig2pdf | optionally makes the lines on a figure bold and then saves in ps, eps or pdf format |
v_frac2bin | converts numbers to fixed-point binary strings |
v_lambda2rgb | convert wavelength to an RGB or XYZ triplet |
v_sprintcpx | prints the real and imaginary parts of a complex number |
v_sprintsi | prints a value with the correct standard SI multiplier (e.g. 2100 prints as 2.1 k) |
v_texthvc | add text to plots with specified alignment and colour |
v_tilefigs | arrange all figures on the screen |
v_colormap | set and display colormap information including colormaps that print well in monochrome |
v_xticksi | Label the x-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots. |
v_yticksi | Label the y-axis tick marks using SI multipliers for large and small values. Particularly useful for logarithmic plots. |
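The SI multiplier formatting described for v_sprintsi and the tick-labelling routines (e.g. 2100 prints as 2.1 k) can be sketched as below. This Python function is a hypothetical illustration, not the VOICEBOX algorithm: it picks the largest power of 1000 not exceeding the magnitude and does not handle rounding edge cases such as values just below a power of 1000.

```python
import math

# SI prefixes for powers of ten that are multiples of 3 ("u" stands in for micro).
_SI_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "u", -3: "m",
    0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E", 21: "Z", 24: "Y",
}

def format_si(value, digits=3):
    """Format a number as a mantissa followed by an SI multiplier."""
    if value == 0:
        return "0"
    exp3 = 3 * math.floor(math.log10(abs(value)) / 3)
    exp3 = max(-24, min(24, exp3))  # stay within the prefix table
    mantissa = value / 10.0 ** exp3
    return f"{mantissa:.{digits}g} {_SI_PREFIXES[exp3]}".rstrip()

if __name__ == "__main__":
    for v in (2100, 0.0047, 5, -2.5e6):
        print(format_si(v))
```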
v_hostipinfo | Gives information about computer name and internet connections |
v_regexfiles | Recursively find files that match a regular expression pattern |
v_unixwhich | searches the WINDOWS system path for an executable (like UNIX which command) |
v_voicebox | contains a number of installation-dependent global parameters and is likely to need editing for each particular setup. |
v_voicebox_update | update old code to new names by inserting a "v_" prefix where needed |
v_winenvar | Obtains WINDOWS environment variables |
v_atan2sc | arctangent function that returns the sin and cos of the angle |
v_besselratio | calculate the Bessel function ratio: besseli(v+1,x)./besseli(v,x) |
v_bitsprec | Rounds values to a precision of n bits |
v_choosenk | all possible ways of choosing k elements out of the numbers 1:n without duplications |
v_choosrnk | all possible ways of choosing k elements out of the numbers 1:n with duplications allowed |
v_dlyapsq | Solve the discrete Lyapunov equation |
v_dualdiag | simultaneously diagonalises two matrices: this is useful in computing LDA or IMELDA transforms. |
v_finishat | Estimate the finishing time of a long loop |
v_fopenmkd | Equivalent to FOPEN() but creates any missing directories/folders |
v_gammalns | Calculates log(gamma(x)) for signed real-valued x |
v_horizdiff | Estimates the horizontal difference between two functions of x |
v_hypergeom1f1 | Confluent Hypergeometric Function (Kummer's M function) |
v_logsum | calculates log(sum(exp(x))) without overflow problems. |
v_minspane | Calculates the minimum spanning tree (a.k.a. shortest spanning tree) of a set of n-dimensional points |
v_mintrace | Find a row permutation to minimize the trace of a matrix |
v_m2htmlpwd | Create HTML documentation of MATLAB routines in the current directory |
v_nearnonz | Replace zero elements by the nearest non-zero elements |
v_permutes | all possible permutations of the numbers 1:n |
v_quadpeak | find a quadratically-interpolated peak in an N-dimensional array by fitting a quadratic function to the array values |
v_rotation | generates rotation matrices |
v_zerotrim | removes from a matrix any trailing rows and columns that are all zero. |
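The v_logsum entry refers to computing log(sum(exp(x))) without overflow. A common way to do this is to subtract the maximum before exponentiating, as in the minimal Python sketch below. The function name is hypothetical and the sketch does not reproduce v_logsum's optional arguments.

```python
import math

def logsumexp(x):
    """Compute log(sum(exp(v) for v in x)) without overflow by factoring out max(x)."""
    m = max(x)
    if math.isinf(m):
        return m  # all entries are -inf, or at least one is +inf
    return m + math.log(sum(math.exp(v - m) for v in x))

if __name__ == "__main__":
    # math.exp(1000.0) would overflow, but the shifted form is well behaved.
    print(logsumexp([1000.0, 1000.0]))
```

Because the largest term becomes exp(0) = 1 after the shift, the sum can neither overflow nor underflow to zero.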