Multi-Feature Integration for Speaker Embedding Extraction

Sankala, Sreekanth and Rafi B, Shaik Mohammad and Kodukula, Sri Rama Murty (2022) Multi-Feature Integration for Speaker Embedding Extraction. In: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, 23 May 2022 through 27 May 2022, Virtual, Online.


Abstract

Automatic speaker recognition systems have become increasingly accurate with advances in deep learning. However, current systems remain sensitive to their training conditions, so performance degrades sharply even on slightly mismatched test data. Many remedies have been proposed and shown to improve performance, including data augmentation schemes, alternative loss functions, and systems that integrate multiple features. This work focuses on integrating multiple features to improve speaker verification performance. Speaker information is represented differently across feature types, and redundant or irrelevant information, such as noise and channel characteristics, affects the dimensions of each feature in a different manner. In this work, we aim to maximize the speaker information by reconstructing the speaker information extracted from one feature using the other features, while simultaneously minimizing the irrelevant information. Experiments with the multi-feature integration model demonstrate significant performance improvements over the stand-alone models. Moreover, the extracted speaker embeddings are found to be noise-robust. © 2022 IEEE
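The core idea in the abstract, that speaker information shared across feature streams can be reinforced by reconstructing one stream's embedding from another while nuisance components end up in the residual, can be illustrated with a minimal sketch. This is not the paper's model: the feature names, the closed-form least-squares map, and the synthetic data are all assumptions made purely for illustration.

```python
import numpy as np

def cross_reconstruction_loss(emb_a, emb_b):
    """Mean-squared error of the best linear reconstruction of emb_b
    from emb_a (least-squares fit over the batch). A low loss indicates
    the two embedding streams carry shared (speaker) information."""
    # Solve for W minimizing ||emb_a @ W - emb_b||^2 in closed form.
    W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)
    residual = emb_a @ W - emb_b
    return float(np.mean(residual ** 2))

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 4))           # common speaker factor
emb_a = shared @ rng.normal(size=(4, 8))     # embedding from feature stream A
emb_b = shared @ rng.normal(size=(4, 8))     # embedding from feature stream B
noise = rng.normal(size=(200, 8))            # nuisance-only embedding

# Streams that share speaker information reconstruct each other almost
# exactly; an unrelated noise stream cannot.
loss_shared = cross_reconstruction_loss(emb_a, emb_b)
loss_noise = cross_reconstruction_loss(noise, emb_b)
print(loss_shared < loss_noise)  # True
```

In the paper's self-supervised setting the mapping would be learned jointly with the embedding extractors rather than fitted in closed form, but the sketch shows why cross-feature reconstruction favors the speaker component over feature-specific noise.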

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Conference or Workshop Item (Paper)
Additional Information: This work was supported by DST National Mission Interdisciplinary Cyber-Physical Systems (NM-ICPS), Technology Innovation Hub on Autonomous Navigation and Data Acquisition Systems: TiHAN Foundations at Indian Institute of Technology (IIT) Hyderabad
Uncontrolled Keywords: Deep speaker embeddings; Feature fusion; Self supervised learning; Speaker verification
Subjects: Civil Engineering
Divisions: Department of Electrical Engineering
Depositing User: LibTrainee 2021
Date Deposited: 16 Jul 2022 05:46
Last Modified: 16 Jul 2022 05:46
URI: http://raiithold.iith.ac.in/id/eprint/9741
Publisher URL: http://doi.org/10.1109/ICASSP43922.2022.9746318
Related URLs:
