Nayak, Shekhar and Bhati, Saurabhchand and Kodukula, Sri Rama Murty
(2017)
An investigation into instantaneous frequency estimation methods for improved speech recognition features.
In: IEEE Global Conference on Signal and Information Processing (GlobalSIP), 14-16 November 2017, Montreal, QC, Canada.
Full text not available from this repository.
Abstract
Several recent studies have pointed to the importance of the analytic phase of the speech signal in human perception, especially in noisy conditions. However, phase information is still not used in state-of-the-art speech recognition systems. In this paper, we illustrate the importance of the analytic phase of the speech signal for automatic speech recognition. Because computing the analytic phase suffers from the inevitable phase-wrapping problem, we extract features from its time derivative, referred to as the instantaneous frequency (IF). In this work, we highlight the issues involved in IF extraction from speech-like signals and propose suitable modifications for IF extraction from speech signals. We used a deep neural network (DNN) framework to build a speech recognition system using features extracted from the IF of speech signals. The speech recognition system based on IF features delivered a phoneme error rate of 21.8% on the TIMIT database, while the baseline system based on mel-frequency cepstral coefficients (MFCCs) delivered a phoneme error rate of 18.4%. Combining the IF-based and MFCC-based systems using minimum Bayes risk (MBR) decoding provided a relative improvement of 8.7% over the baseline system, illustrating the significance of the analytic phase for speech recognition.
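To make the link between analytic phase and IF concrete, the following is a minimal sketch, assuming NumPy is available. It builds the analytic signal with an FFT (the same one-sided-spectrum construction that `scipy.signal.hilbert` uses) and takes the phase difference between consecutive samples as `angle(z[n] * conj(z[n-1]))`. That difference is already confined to (-π, π], so the wrapped phase never has to be unwrapped. The function names here are illustrative, and this plain estimator is not the authors' method: the abstract notes that IF extraction from speech-like signals raises additional issues and that the paper proposes modifications for them, neither of which is reproduced here.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal of a real 1-D array via the FFT (one-sided spectrum)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0   # keep DC and Nyquist bins unchanged
        h[1:n // 2] = 2.0        # double positive frequencies
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    # Negative-frequency bins stay zero, which removes them.
    return np.fft.ifft(spectrum * h)


def instantaneous_frequency(x: np.ndarray, fs: float) -> np.ndarray:
    """Per-sample IF estimate in Hz (length len(x) - 1), illustrative only."""
    z = analytic_signal(x)
    # Phase increment between consecutive samples. np.angle returns a value in
    # (-pi, pi], so the time derivative is obtained without unwrapping the phase.
    dphi = np.angle(z[1:] * np.conj(z[:-1]))
    return dphi * fs / (2.0 * np.pi)


if __name__ == "__main__":
    fs = 16000.0
    t = np.arange(1600) / fs
    x = np.cos(2 * np.pi * 440.0 * t)   # 440 Hz tone, exactly 44 periods
    print(instantaneous_frequency(x, fs)[:5])
```

For a single stationary tone like the one above, the estimate stays at the tone's frequency. For speech, which contains many components at once, a raw estimate like this fluctuates strongly. That behavior is one reason the abstract singles out IF extraction from speech-like signals as needing care.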