Self-Supervised Phonotactic Representations for Language Identification

Ramesh, G. and Kumar, C. Shiva and Kodukula, Sri Rama Murty (2021) Self-Supervised Phonotactic Representations for Language Identification. In: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, 30 August 2021 through 3 September 2021, Brno.

Text
INTERSPEECH.pdf - Published Version
Restricted to Registered users only

Abstract

Phonotactic constraints characterize the permissible sequences of phonemes in a language and hence form an important cue for the language identification (LID) task. As phonotactic constraints span multiple phonemes, short-term spectral analysis (20-30 ms) alone is not sufficient to capture them. The speech signal has to be analyzed over longer contexts (hundreds of milliseconds) in order to extract features representing the phonotactic constraints. Supervised senone classifiers, aimed at modeling triphone context, have been used for extracting language-specific features for the LID task. However, it is difficult to obtain large amounts of manually labeled data to train the supervised models. In this work, we explore a self-supervised approach to extract long-term contextual features for the LID task. We use the wav2vec architecture to extract contextualized representations from multiple frames of the speech signal. The contextualized representations extracted from the pre-trained wav2vec model are used for the LID task. The performance of the proposed features is evaluated on a dataset containing 7 Indian languages. The proposed self-supervised embeddings achieved a 23% absolute improvement over the acoustic features and a 3% absolute improvement over their supervised counterparts. Copyright © 2021 ISCA.
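
The gap between short-term and long-context analysis described in the abstract can be made concrete with a receptive-field calculation. The sketch below assumes the layer configuration of the original wav2vec model (five encoder convolutions with kernel sizes 10, 8, 4, 4, 4 and strides 5, 4, 2, 2, 2, followed by nine context convolutions with kernel size 3 and stride 1, applied to 16 kHz audio). The abstract does not state which configuration the authors used, so these values and the helper name receptive_field are illustrative assumptions rather than details of the paper.

```python
# Receptive-field arithmetic for a stack of 1-D convolutions.
# ASSUMPTION: layer sizes follow the original wav2vec model; the
# record above does not confirm the configuration used in the paper.

SAMPLE_RATE = 16000  # samples per second (assumed)

# (kernel_size, stride) per layer
ENCODER = [(10, 5), (8, 4), (4, 2), (4, 2), (4, 2)]
CONTEXT = [(3, 1)] * 9


def receptive_field(layers, start_rf=1, start_jump=1):
    """Return (receptive field, hop) in input samples after `layers`.

    `start_rf` and `start_jump` describe the input to the stack, which
    allows one stack to be chained onto the output of another.
    """
    rf, jump = start_rf, start_jump
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra tap widens the view by one hop
        jump *= stride             # striding coarsens the hop
    return rf, jump


# Encoder output frames, then context vectors built on top of them.
enc_rf, enc_hop = receptive_field(ENCODER)
ctx_rf, ctx_hop = receptive_field(CONTEXT, enc_rf, enc_hop)

enc_ms = 1000 * enc_rf / SAMPLE_RATE
ctx_ms = 1000 * ctx_rf / SAMPLE_RATE
hop_ms = 1000 * ctx_hop / SAMPLE_RATE

print(f"encoder frame: {enc_rf} samples ({enc_ms:.1f} ms)")
print(f"context vector: {ctx_rf} samples ({ctx_ms:.1f} ms)")
print(f"hop: {ctx_hop} samples ({hop_ms:.1f} ms)")
```

Under these assumptions each encoder frame covers roughly 29 ms of audio at a 10 ms hop, which is comparable to short-term spectral analysis, while each context vector covers roughly 209 ms, which is the scale of hundreds of milliseconds that the abstract associates with phonotactic constraints.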

IITH Creators: Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Bottleneck features; Deep neural networks; Language identification; Self-supervised representations
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: LibTrainee 2021
Date Deposited: 24 Sep 2022 11:23
Last Modified: 24 Sep 2022 11:23
URI: http://raiithold.iith.ac.in/id/eprint/10694
Publisher URL: http://doi.org/10.21437/Interspeech.2021-1310