Instantaneous frequency filter-bank features for low resource speech recognition using deep recurrent architectures

Nayak, S. and Shiva Kumar, C. and Sri Rama Murty, K. (2021) Instantaneous frequency filter-bank features for low resource speech recognition using deep recurrent architectures. In: 27th National Conference on Communications, NCC 2021, 27 July 2021 through 30 July 2021, Virtual, Kanpur.

Full text not available from this repository.

Abstract

Recurrent neural networks (RNNs) and their variants have achieved significant success in speech recognition. Long short-term memory (LSTM) networks and gated recurrent units (GRUs) are the two most popular variants; they overcome the vanishing gradient problem of RNNs and effectively learn long-term dependencies. Light gated recurrent units (Li-GRUs) are more compact versions of standard GRUs and have been shown to provide better recognition accuracy with significantly faster training. These RNN-inspired architectures invariably use magnitude-based features, and the phase information is generally ignored. We propose to incorporate features derived from the analytic phase of speech signals for speech recognition using these RNN variants. Instantaneous frequency filter-bank (IFFB) features derived from Fourier transform relations performed on par with standard MFCC features for recurrent-unit-based acoustic models, despite being derived from phase information only. Different system combinations of IFFB features with magnitude-based features provided the lowest phone error rate (PER) of 12.9% and showed relative improvements of up to 16.8% over standalone MFCC features on TIMIT phone recognition using a Li-GRU-based architecture. IFFB features significantly outperformed modified group delay coefficient (MGDC) features in all our experiments. © 2021 IEEE.
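As a rough illustration of what "analytic phase" and "instantaneous frequency" mean in the abstract above, the following sketch computes the instantaneous frequency of a signal as the time derivative of its unwrapped analytic phase. It is not the paper's IFFB extraction, which the abstract says is derived from Fourier transform relations and which also involves a filter bank; the function names, the test tone, and the 16 kHz sample rate are all assumptions made for this example.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins are
    doubled, which is the standard discrete Hilbert-transform construction.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0  # keep the DC bin unchanged
    if n % 2 == 0:
        weights[n // 2] = 1.0      # keep the Nyquist bin unchanged
        weights[1:n // 2] = 2.0    # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


def instantaneous_frequency(x, sample_rate):
    """Instantaneous frequency in Hz from the unwrapped analytic phase."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    # A first difference approximates the phase derivative (radians/sample).
    return np.diff(phase) * sample_rate / (2.0 * np.pi)


# Hypothetical test signal: a 1 kHz tone sampled at 16 kHz for 0.1 s.
sample_rate = 16000
t = np.arange(1600) / sample_rate
tone = np.cos(2 * np.pi * 1000 * t)
inst_freq = instantaneous_frequency(tone, sample_rate)
```

For a pure tone the result is a constant track at the tone frequency. Real speech would first be split into sub-bands, which is where a filter bank comes in; that step is left out of this sketch.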

IITH Creators: Sri Rama Murty, K. (ORCiD: UNSPECIFIED)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Feature extraction, Instantaneous frequency, Li-GRU, RNN, Speech recognition
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: Mrs Haseena VKKM
Date Deposited: 26 Apr 2022 06:05
Last Modified: 26 Apr 2022 06:05
URI: http://raiithold.iith.ac.in/id/eprint/9250
Publisher URL: https://ieeexplore.ieee.org/document/9530049