Speech Processing

Mike Brookes
20 lectures in the Spring Term

Note that this is an old version of the course; the current version is given by Patrick Naylor.

Syllabus

The human vocal and auditory systems. Characteristics of speech signals: phonemes, prosody, IPA notation. Lossless tube model of speech production. Time- and frequency-domain representations of speech; window characteristics and time/frequency resolution tradeoffs. Properties of digital filters: mean log response, resonance gain and bandwidth relations, bandwidth expansion transformation, all-pass filter characteristics. Autocorrelation and covariance linear prediction of speech; optimality criteria in time and frequency domains; alternative LPC parametrisations. Speech coding: PCM, ADPCM, CELP. Speech synthesis: language processing, prosody, diphone and formant synthesis; time-domain pitch and speech modification. Speech recognition: hidden Markov models and associated recognition and training algorithms. Language modelling. Large vocabulary recognition. Acoustic preprocessing for speech recognition.

Lecture List

  1. Overview of Course
  2. Sound Waves in a Tube
  3. Time-Frequency Representation
  4. Characteristics of Filters
  5. Autocorrelation Linear Prediction and Spectral Whitening
  6. Covariance LPC and LPC Parameter Sets
  7. Cepstral Coefficients and Line Spectrum Frequencies
  8. Speech Coding using Uniform and Non-uniform Quantisation
  9. Speech Coding using Adaptive Differential PCM
  10. Code-excited Linear Prediction
  11. Phonetics: Vowels, Consonants and Prosody
  12. Speech Synthesis: Words to Phonemes
  13. Speech Synthesis: Phonemes to Sounds
  14. Introduction to Speech Recognition
  15. Hidden Markov Models and Viterbi Recognition
  16. Hidden Markov Model Training
  17. Continuous Speech Recognition
  18. Language Modelling
  19. Input Processing