Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer

Narasimhan, S., Dey, S. and Desarkar, Maunendra Sankar (2022) Towards Robust and Semantically Organised Latent Representations for Unsupervised Text Style Transfer. In: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, 10 to 15 July 2022, Seattle.

NAACL_2022.pdf - Published Version
Restricted to Registered users only

Abstract

Recent studies show that auto-encoder based approaches successfully perform language generation, smooth sentence interpolation, and style transfer over unseen attributes using unlabelled datasets in a zero-shot manner. The latent space geometry of such models is organised well enough to perform on datasets where the style is "coarse-grained", i.e. a small fraction of the words in a sentence is enough to determine the overall style label. A recent study uses a discrete token-based perturbation approach to map "similar" sentences ("similar" defined by low Levenshtein distance/high word overlap) close by in latent space. This definition of "similarity" does not account for the underlying nuances of the constituent words and therefore fails to distinguish sentences with different style-based semantics when mapping latent neighbourhoods. We introduce EPAAEs (Embedding Perturbed Adversarial AutoEncoders), which complete this perturbation model by adding a finely adjustable noise component on the continuous embedding space. We empirically show that this (a) produces a better organised latent space that clusters stylistically similar sentences together, (b) performs better than similar denoising-inspired baselines on a diverse set of text style transfer tasks, and (c) is capable of fine-grained control of style transfer strength. We also extend the text style transfer tasks to NLI datasets and show that these more complex definitions of style are learned best by EPAAE. To the best of our knowledge, extending style transfer to NLI tasks has not been explored before. © 2022 Association for Computational Linguistics.
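To illustrate the kind of continuous-embedding perturbation the abstract contrasts with discrete token edits, the minimal PyTorch sketch below adds a tunable noise term to token embeddings before they reach the encoder. The abstract does not specify the noise distribution, encoder architecture, or parameter names, so the Gaussian noise choice, the GRU encoder, and the sigma parameter here are illustrative assumptions, not the paper's actual EPAAE implementation.

    # Illustrative sketch only: perturbing continuous token embeddings with a
    # finely adjustable noise scale, as opposed to discrete token-level edits.
    # The Gaussian noise, GRU encoder, and `sigma` name are assumptions.
    import torch
    import torch.nn as nn

    class NoisyEmbeddingEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim, hidden_dim, sigma=0.1):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.sigma = sigma  # adjustable noise strength

        def forward(self, token_ids):
            emb = self.embed(token_ids)                         # (batch, seq, emb_dim)
            if self.training and self.sigma > 0:
                # Perturb the continuous embedding space, not the discrete tokens.
                emb = emb + self.sigma * torch.randn_like(emb)
            _, h = self.rnn(emb)
            return h[-1]                                        # latent sentence representation

Increasing sigma makes the encoder map larger neighbourhoods of the embedding space to the same latent region, which is one way to realise the "finely adjustable noise component" the abstract describes.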

IITH Creators: Desarkar, Maunendra Sankar (ORCiD: https://orcid.org/0000-0003-1963-7338)
Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Auto encoders; Coarse-grained; Embeddings; Language generation; Levenshtein distance; Neighbourhood; Noise components; Perturbation approach; Perturbation model; Space geometry
Subjects: Computer science
Divisions: Department of Computer Science & Engineering
Depositing User: LibTrainee 2021
Date Deposited: 03 Oct 2022 10:07
Last Modified: 03 Oct 2022 10:07
URI: http://raiithold.iith.ac.in/id/eprint/10776
Publisher URL:
Related URLs:
