Distant Speech Recognition

by: Matthias Christ Woelfel - John McDonough

Author: Matthias Christ Woelfel, John McDonough

Publisher: Wiley-Blackwell (an imprint of John Wiley & Sons Ltd)

List price: £ 86.25

Deastore.com price: € 101.57

Format: Other digital

Publication date: 29 May 2009

Availability: Not available

ISBN: 0470714085 ISBN 13: 9780470714089

The performance of conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This book presents a description of theoretic abstraction and practical issues inherent in the distant ASR problem. It is suitable for those working in speech technology and acoustics.

Complete description

This is a complete overview of distant automatic speech recognition. The performance of conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. While traditional ASR systems underperform on speech captured with far-field sensors, a number of novel techniques within the recognition system, as well as techniques developed in other areas of signal processing, can mitigate the deleterious effects of noise and reverberation and separate speech from overlapping speakers. "Distant Speech Recognition" presents a contemporary and comprehensive description of both theoretic abstraction and practical issues inherent in the distant ASR problem.

This book: covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it; provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems; gives relevant background information in acoustics and filter techniques; explains the extraction and enhancement of classification-relevant speech features; describes maximum likelihood as well as discriminative parameter estimation, and maximum likelihood normalization techniques; discusses the use of multi-microphone configurations for speaker tracking and channel combination; and presents several applications of the methods and technologies described in the book.

There is an accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems. This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students in speech technology, signal processing, acoustics, statistics and artificial intelligence.

General info

Publisher & Imprint: Wiley-Blackwell (an imprint of John Wiley & Sons Ltd)

City: Chichester

Pages: 600

More info: height 250 mm, width 150 mm, weight 666 g

Age recommended: Professional and scholarly

Summary

1 Introduction
1.1 Research and Applications in Academia and Industry
1.2 Challenges in Distant Speech Recognition
1.3 System Evaluation
1.4 Fields of Speech Recognition
1.5 Robust Perception
1.6 Organizations, Conferences and Journals
1.7 Useful Tools, Data Resources and Evaluation Campaigns
1.8 Organization of this Book

2 Acoustics
2.1 Physical Aspect of Sound
2.2 Speech Signals
2.3 Human Perception of Sound
2.4 The Acoustic Environment
2.5 Recording Techniques and Sensor Configuration
2.6 Summary and Further Reading

3 Signal Processing and Filtering Techniques
3.1 Linear Time-Invariant Systems
3.2 The Discrete Fourier Transform
3.3 Short-Time Fourier Transform
3.4 Summary and Further Reading

4 Bayesian Filters
4.1 Sequential Bayesian Estimation
4.2 Wiener Filter
4.3 Kalman Filter and Variations
4.4 Particle Filters
4.5 Summary and Further Reading

5 Speech Feature Extraction
5.1 Short-Time Spectral Analysis
5.2 Perceptually Motivated Representation
5.3 Spectral Estimation and Analysis
5.4 Cepstral Processing
5.5 Comparison between Mel-Frequency, Perceptual LP and Warped MVDR Cepstral Coefficient Front Ends
5.6 Feature Augmentation
5.7 Feature Reduction
5.8 Feature-Space Minimum Phone Error
5.9 Summary and Further Reading

6 Speech Feature Enhancement
6.1 Noise and Reverberation in Various Domains
6.2 Two Principal Approaches
6.3 Direct Speech Feature Enhancement
6.4 Schematics of Indirect Speech Feature Enhancement
6.5 Estimating Additive Distortion
6.6 Estimating Convolutional Distortion
6.7 Distortion Evolution
6.8 Distortion Evaluation
6.9 Distortion Compensation
6.10 Joint Estimation of Additive and Convolutional Distortions
6.11 Observation Uncertainty
6.12 Summary and Further Reading

7 Search: Finding the Best Word Hypothesis
7.1 Fundamentals of Search
7.2 Weighted Finite-State Transducers
7.3 Knowledge Sources
7.4 Fast On-the-Fly Composition
7.5 Word and Lattice Combination
7.6 Summary and Further Reading

8 Hidden Markov Model Parameter Estimation
8.1 Maximum Likelihood Parameter Estimation
8.2 Discriminative Parameter Estimation
8.3 Summary and Further Reading

9 Feature and Model Transformation
9.1 Feature Transformation Techniques
9.2 Model Transformation Techniques
9.3 Acoustic Model Combination
9.4 Summary and Further Reading

10 Speaker Localization and Tracking
10.2 Speaker Tracking with the Kalman Filter
10.3 Tracking Multiple Simultaneous Speakers
10.4 Audio-Visual Speaker Tracking
10.5 Speaker Tracking with the Particle Filter
10.6 Summary and Further Reading

11 Digital Filter Banks
11.1 Uniform Discrete Fourier Transform Filter Banks
11.2 Polyphase Implementation
11.3 Decimation and Expansion
11.4 Noble Identities
11.5 Nyquist(M) Filters
11.6 Filter Bank Design of De Haan et al.
11.7 Filter Bank Design with the Nyquist(M) Criterion
11.8 Quality Assessment of Filter Bank Prototypes
11.9 Summary and Further Reading

12 Blind Source Separation
12.1 Channel Quality and Selection
12.2 Independent Component Analysis
12.3 BSS Algorithms based on Second Order Statistics
12.4 Summary and Further Reading

13 Beamforming
13.1 Beamforming Fundamentals
13.2 Beamforming Performance Measures
13.3 Conventional Beamforming Algorithms
13.4 Recursive Algorithms
13.5 Nonconventional Beamforming Algorithms
13.6 Array Shape Calibration
13.7 Summary and Further Reading

14 Hands On
14.1 Example Room Configurations
14.2 Automatic Speech Recognition Engines
14.3 Word Error Rate
14.4 Single-Channel Feature Enhancement Experiments
14.5 Acoustic Speaker Tracking Experiments
14.6 Audio-Video Speaker Tracking Experiments
14.7 Speaker Tracking Performance vs. Word Error Rate
14.8 Single-Speaker Beamforming Experiments
14.9 Speech Separation Experiments
14.10 Filter Bank Experiments
14.11 Summary and Further Reading

A List of Abbreviations

B Useful Background
B.1 Discrete Cosine Transform
B.2 Matrix Inversion Lemma
B.3 Cholesky Decomposition
B.4 Distance Measures
B.5 Super-Gaussian Probability Density Functions
B.5.1 Generalized Gaussian pdf
B.5.2 Super-Gaussian pdfs with the Meijer G-function
B.6 Entropy
B.7 Relative Entropy
B.8 Transformation Law of Probabilities
B.9 Cascade of Warping Stages
B.10 Taylor Series
B.11 Correlation and Covariance
B.12 Bessel Functions
B.13 Proof of the Nyquist-Shannon Sampling Theorem
B.14 Proof of Equations (11.31-11.32)
B.15 Givens Rotations
B.16 Derivatives with respect to Complex Vectors
B.17 Perpendicular Projection Operators

Bibliography
