Instantaneous Frequency Features for Noise Robust Speech Recognition

Nayak, Shekhar and Shashank, Dhar B. and Kodukula, Sri Rama Murty et al. (2019) Instantaneous Frequency Features for Noise Robust Speech Recognition. In: 25th National Conference on Communications, NCC, 20 - 23 February 2019, Bangalore, India.

Full text not available from this repository.

Abstract

The analytic phase of the speech signal plays an important role in human speech perception, especially in the presence of noise. Nevertheless, most recent speech recognition systems ignore phase information. In this paper, we illustrate the importance of the analytic phase of the speech signal for noise-robust automatic speech recognition. To avoid the phase-wrapping problem involved in computing the analytic phase, features are extracted from the instantaneous frequency (IF), which is the time derivative of the analytic phase. Deep neural network (DNN) based acoustic models are trained on clean speech using features extracted from the IF of speech signals. The robustness of IF features in combination with mel-frequency cepstral coefficients (MFCCs) was evaluated under varied noisy conditions. System combination of IF features with MFCCs using minimum Bayes risk decoding delivered absolute improvements of up to 13% over MFCC features alone for DNN-based systems under noisy conditions. The impact of combining magnitude-based and phase-based systems on different phonetic classes was also studied under noisy conditions, and the combination was found to model both voiced and unvoiced phonetic classes efficiently.
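To make the relationship between analytic phase and IF concrete, the following is a minimal sketch, not the authors' feature pipeline (the full text is unavailable here). It assumes NumPy, builds the analytic signal with an FFT-based Hilbert transform, and estimates IF from the phase increment between consecutive samples, so no explicit phase unwrapping is needed as long as that increment stays below pi. The function names, the 8 kHz sampling rate, and the 440 Hz test tone are illustrative assumptions.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence (FFT-based Hilbert transform)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def instantaneous_frequency(x, fs):
    """Estimate IF in Hz as a discrete time derivative of the analytic phase."""
    z = analytic_signal(x)
    # angle(z[n+1] * conj(z[n])) is the phase increment in (-pi, pi],
    # which sidesteps unwrapping the phase itself.
    dphi = np.angle(z[1:] * np.conj(z[:-1]))
    return dphi * fs / (2.0 * np.pi)


# Illustrative input (assumption): a pure 440 Hz tone sampled at 8 kHz.
fs = 8000.0
t = np.arange(800) / fs
tone = np.cos(2.0 * np.pi * 440.0 * t)
if_hz = instantaneous_frequency(tone, fs)
print(float(np.median(if_hz)))
```

For a single stationary tone the estimate stays close to the tone frequency. Real speech contains many components, so a practical front end would typically apply such an estimate per sub-band after a filterbank; that step is an assumption here and is not described in the abstract.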


IITH Creators: