Bramhendra, Koilakuntla and Kodukula, Sri Rama Murty
(2020)
End to End ASR Free Keyword Spotting with Transfer Learning from Speech Synthesis.
Masters thesis, Indian Institute of Technology Hyderabad.
Abstract
Keyword spotting (KWS) is an important application in speech processing, but it requires as much
data as Automatic Speech Recognition (ASR), even though the problem is much more specific than ASR.
This work attempts to reduce the dependency on transcribed data that comes with building an ASR
system. Traditional KWS architectures are built on top of ASR; lattice indexing and keyword-filler
models are very popular examples of this approach. Although they give good accuracy, the former is
limited to offline operation and the latter is less accurate. Here we propose an improvement to an
approach called End-to-End ASR-free Keyword Spotting. The system, inspired by traditional keyword
spotting architectures, consists of three modules: an acoustic encoder, a phonetic encoder, and a
keyword neural network. The acoustic encoder processes the speech features and produces a
fixed-length representation; the phonetic encoder likewise produces a fixed-length representation of
the keyword. The two are concatenated to form the input to the keyword neural network, which
predicts whether the keyword is present or not. We propose retaining all the hidden representations
of the acoustic encoder so that temporal resolution is available to identify the location of the
query. We also propose pretraining the phonetic encoder so that it is aware of the acoustic
projection. With these changes, performance improves by 7.1% absolute. In addition, because the
system is end-to-end, it is easy to deploy.
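The three-module layout described above can be illustrated with a small NumPy sketch. It is not the thesis implementation: the abstract does not state layer types, sizes, or how frame-level scores are produced, so the plain tanh recurrent encoders, the dimensions, the random untrained weights, and the choice to score every frame with the same keyword network are all assumptions made for illustration. `utterance_score` follows the fixed-length baseline, and `frame_scores` shows one way of retaining all hidden representations to get temporal resolution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the thesis.
FEAT_DIM, PHONE_COUNT, HIDDEN = 13, 40, 16


def rnn_states(inputs, w_in, w_rec):
    """Run a plain tanh RNN and return every hidden state, shape (T, H)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(x @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Untrained random weights, used only to show shapes and data flow.
params = {
    "ac_in": rng.normal(0, 0.1, (FEAT_DIM, HIDDEN)),   # acoustic encoder
    "ac_rec": rng.normal(0, 0.1, (HIDDEN, HIDDEN)),
    "ph_emb": rng.normal(0, 0.1, (PHONE_COUNT, HIDDEN)),  # phonetic encoder
    "ph_in": rng.normal(0, 0.1, (HIDDEN, HIDDEN)),
    "ph_rec": rng.normal(0, 0.1, (HIDDEN, HIDDEN)),
    "kw_w": rng.normal(0, 0.1, (2 * HIDDEN,)),          # keyword network
    "kw_b": 0.0,
}


def encode_keyword(phone_ids, p):
    """Phonetic encoder: phoneme ids -> fixed-length vector (last state)."""
    embedded = p["ph_emb"][phone_ids]  # (N, H)
    return rnn_states(embedded, p["ph_in"], p["ph_rec"])[-1]


def utterance_score(features, phone_ids, p):
    """Baseline: fixed-length acoustic vector + keyword vector -> one probability."""
    acoustic = rnn_states(features, p["ac_in"], p["ac_rec"])[-1]
    joint = np.concatenate([acoustic, encode_keyword(phone_ids, p)])
    return float(sigmoid(joint @ p["kw_w"] + p["kw_b"]))


def frame_scores(features, phone_ids, p):
    """Variant: keep all acoustic hidden states -> one probability per frame."""
    acoustic = rnn_states(features, p["ac_in"], p["ac_rec"])  # (T, H)
    keyword = encode_keyword(phone_ids, p)
    tiled = np.tile(keyword, (acoustic.shape[0], 1))          # (T, H)
    joint = np.concatenate([acoustic, tiled], axis=1)         # (T, 2H)
    return sigmoid(joint @ p["kw_w"] + p["kw_b"])             # (T,)


features = rng.normal(size=(50, FEAT_DIM))  # 50 frames of dummy speech features
phone_ids = np.array([3, 17, 8, 22])        # dummy phoneme sequence for a keyword
scores = frame_scores(features, phone_ids, params)
print("utterance-level score:", utterance_score(features, phone_ids, params))
print("frame-level scores shape:", scores.shape)
print("highest-scoring frame:", int(np.argmax(scores)))
```

With trained weights, the per-frame scores would indicate where in the utterance the query is most likely to occur, whereas the baseline only yields a single present-or-absent decision.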