Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer

Narasimhan, S., Dey, S. and Desarkar, Maunendra Sankar (2022) Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer. In: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, 10 to 15 July 2022, Seattle.

NAACL_2022.pdf - Published Version
Restricted to Registered users only

Abstract

Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is "coarse-grained", i.e. a small fraction of the words in a sentence is enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map "similar" sentences ("similar" defined by low Levenshtein distance/high word overlap) close by in latent space. This definition of "similarity" does not account for the underlying nuances of the constituent words and therefore fails to distinguish sentences with different style-based semantics when mapping latent neighbourhoods. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders), which complete this perturbation model by adding a finely adjustable noise component on the continuous embedding space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs better than similar denoising-inspired baselines on a diverse set of text style transfer tasks, and (c) is capable of fine-grained control of style transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before. © 2022 Association for Computational Linguistics.
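To illustrate the kind of continuous-embedding perturbation the abstract contrasts with discrete token edits, the minimal PyTorch sketch below adds a tunable noise term to token embeddings before they reach the encoder. The abstract does not specify the noise distribution, encoder architecture, or parameter names, so the Gaussian noise choice, the GRU encoder, and the sigma parameter here are illustrative assumptions, not the paper's actual EPAAE implementation.

    # Illustrative sketch only: perturbing continuous token embeddings with a
    # finely adjustable noise scale, as opposed to discrete token-level edits.
    # The Gaussian noise, GRU encoder, and `sigma` name are assumptions.
    import torch
    import torch.nn as nn

    class NoisyEmbeddingEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, sigma=0.1):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.sigma = sigma  # adjustable noise strength

        def forward(self, token_ids):
            emb = self.embed(token_ids)                         # (batch, seq, emb_dim)
            if self.training and self.sigma > 0:
                # Perturb the continuous embedding space, not the discrete tokens.
                emb = emb + self.sigma * torch.randn_like(emb)
            _, h = self.rnn(emb)
            return h[-1]                                        # latent sentence representation

Increasing sigma makes the encoder map larger neighbourhoods of the embedding space to the same latent region, which is one way to realise the "finely adjustable noise component" the abstract describes.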

IITH Creators: Desarkar, Maunendra Sankar (ORCiD: https://orcid.org/0000-0003-1963-7338)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Auto encoders; Coarse-grained; Embeddings; Language generation; Levenshtein distance; Neighbourhood; Noise components; Perturbation approach; Perturbation model; Space geometry
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: LibTrainee 2021
Date Deposited: 03 Oct 2022 10:07
Last Modified: 03 Oct 2022 10:07
URI: http://raiithold.iith.ac.in/id/eprint/10776
Publisher URL:
Related URLs:
