End to End ASR Free Keyword Spotting with Transfer Learning from Speech Synthesis

Bramhendra, Koilakuntla and Kodukula, Sri Rama Murty (2020) End to End ASR Free Keyword Spotting with Transfer Learning from Speech Synthesis. Masters thesis, Indian Institute of Technology Hyderabad.

Text: Mtech_Thesis_TD1591_2019.pdf (2MB)

Abstract

Keyword spotting (KWS) is an important speech application, but it typically requires as much data as an automatic speech recognition (ASR) system, even though the problem is much more specific than ASR. This work attempts to reduce the dependency on transcribed data that comes with building an ASR system. Traditional KWS architectures are built on top of ASR; lattice indexing and keyword-filler models are popular examples of this approach. Although these approaches can be accurate, the former is limited to offline use and the latter suffers from lower accuracy. We propose an improvement to an approach called End-to-End ASR-free Keyword Spotting. This system, inspired by traditional keyword spotting architectures, consists of three modules: an acoustic encoder, a phonetic encoder, and a keyword neural network. The acoustic encoder processes the speech features into a fixed-length representation, and the phonetic encoder likewise produces a fixed-length representation; the two are concatenated to form the input to the keyword neural network, which predicts whether the keyword is present. We propose retaining all of the hidden representations so that the system keeps the temporal resolution needed to locate the query. We also propose pretraining the phonetic encoder to make it aware of the acoustic projection. With these changes, performance improves by 7.1% absolute. In addition, because the system is end to end, it is easy to deploy.
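
The three-module layout described in the abstract can be illustrated with a minimal sketch in plain Python. Everything specific here is an assumption made for illustration and is not taken from the thesis: the layer sizes, the use of phoneme ids as the phonetic encoder's input, mean pooling as the way to obtain a fixed-length vector, and the untrained random weights. The function `utterance_score` shows the pooled baseline, and `frame_scores` shows the idea of keeping every hidden representation so that a score is available per frame.

```python
import math
import random

random.seed(0)  # untrained random weights, for illustration only

FEAT_DIM, PHONE_VOCAB, HID = 13, 40, 8  # hypothetical sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(vec, weights):
    """Compute tanh(W @ vec) for one input vector."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


def mean_pool(vectors):
    """Collapse a variable-length sequence into one fixed-length vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


W_ACOUSTIC = rand_matrix(HID, FEAT_DIM)
PHONE_EMBED = rand_matrix(PHONE_VOCAB, HID)
W_KEYWORD = [random.uniform(-0.1, 0.1) for _ in range(2 * HID)]


def acoustic_encoder(frames):
    """Return one hidden vector per speech frame."""
    return [project(f, W_ACOUSTIC) for f in frames]


def phonetic_encoder(phone_ids):
    """Fixed-length representation of the keyword (assumed phoneme ids)."""
    return mean_pool([PHONE_EMBED[p] for p in phone_ids])


def keyword_network(acoustic_vec, phonetic_vec):
    """Probability that the keyword is present, from the concatenated input."""
    joint = acoustic_vec + phonetic_vec  # list concatenation
    logit = sum(w * x for w, x in zip(W_KEYWORD, joint))
    return 1.0 / (1.0 + math.exp(-logit))


def utterance_score(frames, phone_ids):
    """Baseline: pool the utterance to one vector, then make one decision."""
    pooled = mean_pool(acoustic_encoder(frames))
    return keyword_network(pooled, phonetic_encoder(phone_ids))


def frame_scores(frames, phone_ids):
    """Variant that keeps every hidden vector, giving one score per frame."""
    query = phonetic_encoder(phone_ids)
    return [keyword_network(h, query) for h in acoustic_encoder(frames)]


# Random stand-in for 20 frames of speech features and a 3-phoneme keyword.
frames = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(20)]
keyword = [3, 17, 5]  # hypothetical phoneme ids

print(utterance_score(frames, keyword))
scores = frame_scores(frames, keyword)
print(len(scores), scores.index(max(scores)))  # frame count, best-scoring frame
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how a per-frame score sequence, unlike a single pooled score, leaves room to estimate where in the utterance the query occurs.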


IITH Creators: