Neural Comb Filtering Using Sliding Window Attention Network for Speech Enhancement

Parvathala, Venkatesh and Andhavarapu, Sivaganesh and Pamisetty, Giridhar and Kodukula, Sri Rama Murty (2022) Neural Comb Filtering Using Sliding Window Attention Network for Speech Enhancement. Circuits, Systems, and Signal Processing. ISSN 0278-081X

Full text not available from this repository.

Abstract

In this paper, we demonstrate the significance of restoring the harmonics of the fundamental frequency (pitch) in deep neural network (DNN)-based speech enhancement. The parameters of the DNN can be estimated by minimizing the mask loss, but doing so does not restore the pitch harmonics, especially at higher frequencies. We propose to restore the pitch harmonics in the spectral domain by minimizing a cepstral loss around the pitch peak. Restoring the cepstral pitch peak, in turn, helps restore the pitch harmonics in the enhanced spectrum. The proposed cepstral pitch-peak loss acts as an adaptive comb filter on voiced segments and emphasizes the pitch harmonics in the speech spectrum. The network parameters are estimated using a combination of the mask loss and the cepstral pitch-peak loss. We show that this combination offers the complementary advantage of enhancing both voiced and unvoiced regions. DNN-based methods rely primarily on the network architecture, so prediction accuracy tends to improve as the complexity of the architecture increases. Low-complexity models, however, are essential for real-time processing systems. In this work, we therefore propose a compact model using a sliding-window attention network (SWAN). The SWAN is trained to regress the spectral magnitude mask (SMM) from the noisy speech signal. Our experimental results demonstrate that the proposed approach achieves performance comparable to state-of-the-art noncausal and causal speech enhancement methods at much lower computational complexity. Our three-layer noncausal SWAN achieves a PESQ of 2.99 on the Valentini database with only 10^9 floating-point operations (FLOPs). © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
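The abstract does not give the exact form of the cepstral pitch-peak loss, the SMM target, or how the two losses are weighted, so the following is only a minimal NumPy sketch under stated assumptions. It assumes that the SMM is the unclipped ratio of clean to noisy STFT magnitudes, that the mask loss is a mean squared error, that the pitch-peak loss is a squared error between the real cepstra of the enhanced and clean magnitudes restricted to a small window (the hypothetical parameter `half_width`) around a per-frame pitch lag supplied by an external pitch tracker, and that the two terms are summed with a hypothetical weight `alpha`. All function and parameter names here are illustrative and are not taken from the paper.

```python
import numpy as np

def spectral_magnitude_mask(clean_mag, noisy_mag, eps=1e-8):
    # Assumed SMM target: clean / noisy STFT magnitude, no clipping.
    return clean_mag / (noisy_mag + eps)

def real_cepstrum(mag, eps=1e-8):
    # Real cepstrum per frame: inverse FFT of the log magnitude.
    # mag has shape (frames, n_fft // 2 + 1), i.e. a one-sided spectrum.
    return np.fft.irfft(np.log(mag + eps), axis=-1)

def pitch_peak_loss(est_mag, clean_mag, pitch_lag, voiced, half_width=2):
    # Hypothetical cepstral pitch-peak loss: squared cepstral error only in
    # quefrency bins within +/- half_width of the pitch lag, voiced frames only.
    # Penalizing this region pushes the estimate to restore the harmonic
    # (comb-like) structure that produces the cepstral pitch peak.
    est_c = real_cepstrum(est_mag)
    ref_c = real_cepstrum(clean_mag)
    losses = []
    for t in np.flatnonzero(voiced):
        lo = max(int(pitch_lag[t]) - half_width, 0)
        hi = int(pitch_lag[t]) + half_width + 1
        losses.append(np.mean((est_c[t, lo:hi] - ref_c[t, lo:hi]) ** 2))
    return float(np.mean(losses)) if losses else 0.0

def combined_loss(est_mask, noisy_mag, clean_mag, pitch_lag, voiced, alpha=0.5):
    # Assumed combination: MSE mask loss + alpha * pitch-peak loss.
    target = spectral_magnitude_mask(clean_mag, noisy_mag)
    mask_loss = np.mean((est_mask - target) ** 2)
    est_mag = est_mask * noisy_mag  # enhanced magnitude from the mask
    return float(mask_loss + alpha * pitch_peak_loss(est_mag, clean_mag, pitch_lag, voiced))

# Synthetic check: log|X[k]| = 0.5 * cos(2*pi*k*L/N) has harmonics every N/L
# bins (f0 = fs / L) and a cepstral peak of height 0.25 at quefrency L.
N, L, T = 512, 80, 4
k = np.arange(N // 2 + 1)
clean = np.tile(np.exp(0.5 * np.cos(2 * np.pi * k * L / N)), (T, 1))
noisy = clean + 0.3
lag = np.full(T, L)
voiced = np.array([True, True, False, True])
flat = np.ones_like(clean)  # harmonics removed: no cepstral pitch peak
print(pitch_peak_loss(flat, clean, lag, voiced))
```

In this toy setup, a spectrum whose harmonics have been smoothed away (`flat`) receives a positive pitch-peak loss, while unvoiced frames contribute nothing. This mirrors the abstract's description of the loss acting as a comb filter only on voiced segments and leaving unvoiced regions to the mask loss.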

IITH Creators:
Kodukula, Sri Rama Murty (ORCiD: https://orcid.org/0000-0002-6355-5287)
Item Type: Article
Uncontrolled Keywords: Cepstral pitch peak; Pitch harmonics; Production-related loss; Spectral magnitude mask; Transformer
Subjects: Electrical Engineering
Divisions: Department of Electrical Engineering
Depositing User: LibTrainee 2021
Date Deposited: 16 Aug 2022 07:29
Last Modified: 16 Aug 2022 07:29
URI: http://raiithold.iith.ac.in/id/eprint/10184
Publisher URL: http://doi.org/10.1007/s00034-022-02123-2
OA policy: https://v2.sherpa.ac.uk/id/publication/15622