We present solutions to two important problems in the deployment of noise-robust large
vocabulary automatic speech recognizers that use the missing data paradigm. The first
problem is the generation of missing data masks. We propose and evaluate a method based on
vector quantization and harmonicity that successfully exploits the characteristics of speech
while requiring only weak assumptions about the noise. The second problem is
computational efficiency. We advocate the use of PROSPECT features and the L-cluster-Mbest
method for Gaussian selection. Together, these methods can achieve a speedup of about a
factor of 6.
Keywords: Large vocabulary continuous speech recognition, missing feature, vector quantization,
harmonicity.