Significance of analytic phase of speech signals in speaker verification

Vijayan, K and Reddy, P R and Kodukula, Sri Rama Murty (2016) Significance of analytic phase of speech signals in speaker verification. Speech Communication, 81. pp. 54-71. ISSN 0167-6393

Full text not available from this repository.

Abstract

The objective of this paper is to establish the importance of the phase of the analytic signal of speech, referred to as the analytic phase, in human perception of speaker identity as well as in automatic speaker verification. Subjective studies are conducted using speech signals with distorted analytic phase, and the resulting difficulties in the human speaker verification task are observed. Motivated by the perceptual studies, we propose a method for feature extraction from the analytic phase of speech signals. As unambiguous computation of the analytic phase is not possible due to the phase wrapping problem, features are extracted from its derivative, i.e., the instantaneous frequency (IF). The IF is computed by exploiting the properties of the Fourier transform, and this strategy is free from the phase wrapping problem. The IF is computed from narrowband components of the speech signal, and a discrete cosine transform is applied to the deviations in IF to pack the information into a smaller number of coefficients, which are referred to as IF cosine coefficients (IFCCs). The nature of the information in the proposed IFCC features is studied using minimal-pair ABX (MP-ABX) tasks and t-distributed stochastic neighbor embedding (t-SNE) visualizations. The performance of IFCC features is evaluated on the NIST 2010 SRE database and compared with mel frequency cepstral coefficients (MFCCs) and frequency domain linear prediction (FDLP) features. All three features, IFCC, FDLP and MFCC, provided competitive speaker verification performance, with average EERs of 2.3%, 2.2% and 2.4%, respectively. The IFCC features are more robust to vocal effort mismatch, and provided relative improvements of 26% and 11% over MFCC and FDLP features, respectively, on the evaluation conditions involving vocal effort mismatch. Since magnitude and phase represent different components of the speech signal, we fuse the evidence from them at the i-vector level of the speaker verification system. It is found that i-vector fusion is considerably better than conventional score fusion. The i-vector fusion of FDLP+IFCC features provided a relative improvement of 36% over the system based on FDLP features alone, while the fusion of MFCC+IFCC provided a relative improvement of 37% over the system based on MFCC alone, illustrating that the proposed IFCC features provide speaker-specific information complementary to the magnitude-based FDLP and MFCC features.
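
The abstract describes computing the IF without phase unwrapping and then applying a DCT to IF deviations. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the standard identity that the IF equals Im(conj(z) z') / (2π |z|²) for an analytic signal z, with the derivative z' obtained from the Fourier differentiation property. The Gaussian band mask, the averaging of IF over the whole signal, the unnormalized DCT-II, and all function and parameter names (`analytic_band`, `instantaneous_frequency`, `dct2`, `ifcc_like`, `centers`, `bw`) are illustrative assumptions; the paper's filter bank and framing may differ.

```python
import numpy as np


def analytic_band(x, fs, fc, bw):
    """Analytic signal of the component of x near fc (Hz).

    Assumption: a one-sided Gaussian mask in the FFT domain stands in
    for the narrowband filter; keeping only positive frequencies
    (doubled) yields the analytic signal of the band directly.
    """
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    mask = 2.0 * np.exp(-0.5 * ((freqs - fc) / bw) ** 2) * (freqs > 0)
    return np.fft.ifft(np.fft.fft(x) * mask)


def instantaneous_frequency(z, fs):
    """IF in Hz of analytic signal z, with no phase unwrapping.

    Uses d/dt arg(z) = Im(conj(z) * z') / |z|^2, where z' comes from
    the Fourier differentiation property (multiply the spectrum by j*omega).
    """
    omega = 2.0 * np.pi * np.fft.fftfreq(len(z), d=1.0 / fs)
    dz = np.fft.ifft(1j * omega * np.fft.fft(z))
    eps = 1e-12  # guards against division by zero in silent regions
    return np.imag(np.conj(z) * dz) / (np.abs(z) ** 2 + eps) / (2.0 * np.pi)


def dct2(v):
    """Unnormalized DCT-II of a 1-D array."""
    k = len(v)
    idx = np.arange(k)
    basis = np.cos(np.pi * np.outer(idx, idx + 0.5) / k)
    return basis @ v


def ifcc_like(x, fs, centers, bw, n_coeffs):
    """DCT of the mean IF deviation from each band centre (illustrative)."""
    dev = np.array([
        np.mean(instantaneous_frequency(analytic_band(x, fs, fc, bw), fs)) - fc
        for fc in centers
    ])
    return dct2(dev)[:n_coeffs]


# Usage: a pure 500 Hz tone that fits the analysis window exactly.
fs = 8000
t = np.arange(800) / fs
x = np.cos(2.0 * np.pi * 500.0 * t)
if_track = instantaneous_frequency(analytic_band(x, fs, 500.0, 100.0), fs)
coeffs = ifcc_like(x, fs, centers=[400.0, 500.0, 600.0], bw=100.0, n_coeffs=2)
```

For the pure tone in the usage lines, the IF track stays at the tone frequency in every band, so the deviations simply reflect the offset between the tone and each band centre. Real speech would normally be processed frame by frame rather than averaged over the whole signal as done here.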

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Article
Additional Information: We would like to thank the anonymous reviewers for their creative suggestions and constructive criticisms, which helped to improve the content of this paper.
Uncontrolled Keywords: Analytic phase; Instantaneous frequency; Feature extraction; MP-ABX tasks; t-SNE visualization; Speaker verification
Subjects: Others > Electricity
Divisions: Department of Electrical Engineering
Depositing User: Team Library
Date Deposited: 28 Mar 2016 04:38
Last Modified: 05 Dec 2017 03:57
URI: http://raiithold.iith.ac.in/id/eprint/2252
Publisher URL: https://doi.org/10.1016/j.specom.2016.02.005
OA policy: http://www.sherpa.ac.uk/romeo/issn/0167-6393/