Speaker embedding extraction with virtual phonetic information

Sreekanth, S and Rafi, B Shaik Mohammad and Kodukula, Sri Rama Murty et al. (2019) Speaker embedding extraction with virtual phonetic information. In: 7th IEEE Global Conference on Signal and Information Processing, GlobalSIP, 11-14 November 2019, Ottawa, Canada.

Full text not available from this repository.

Abstract

In the recent past, deep neural networks have been successfully employed to extract fixed-dimensional speaker embeddings from the speech signal. The commonly used x-vectors are extracted by projecting the magnitude spectral features extracted from the speech signal onto a speaker-discriminative space. As the x-vectors do not explicitly capture the speaker-specific phonological pronunciation variability, phonetic vectors extracted from an automatic speech recognition (ASR) engine were supplied as auxiliary information to improve the performance of the x-vector system. However, developing an ASR engine requires a huge amount of manually transcribed speech data. In this paper, we propose to transcribe the speech signal in an unsupervised manner with the cluster labels obtained from a mixture of autoencoders (MoA) trained on a large amount of speech data. The unsupervised labels, referred to as virtual phonetic transcriptions, are used to extract the phonetic vectors. The virtual phonetic vectors extracted using MoA are supplied as auxiliary information to the x-vector system. The performance of the proposed system is compared with the state-of-the-art x-vector system on NIST SRE-2010 data. The proposed unsupervised auxiliary information provides a relative improvement of 12.08%, 3.61% and 16.66% over the x-vector system on the core-core, core-10sec and 10sec-10sec conditions, respectively.
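
Note on the reported figures: a relative improvement is usually read as the percentage reduction of a lower-is-better error metric with respect to the baseline. The abstract does not name the metric (equal error rate is common for NIST SRE evaluations, but that is an assumption here), and the baseline and proposed scores are not given in this record. The sketch below only illustrates the arithmetic; the function name `relative_improvement` and the numbers are hypothetical and not taken from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction of a lower-is-better metric relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical error rates for illustration only, NOT values from the paper.
example = relative_improvement(baseline=2.0, proposed=1.5)
print(f"{example:.2f}%")
```

Under this reading, a positive value means the proposed system has a lower error than the x-vector baseline, and zero means no change.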

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Mixture of autoencoders, Speaker recognition, Time-delay neural network, Unsupervised c-vector, X-vector, Indexed in Scopus
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: Team Library
Date Deposited: 27 Feb 2020 07:55
Last Modified: 27 Feb 2020 07:55
URI: http://raiithold.iith.ac.in/id/eprint/7482
Publisher URL: http://doi.org/10.1109/GlobalSIP45357.2019.8969551
