David Donahue


2019

Adversarial Decomposition of Text Representation
Alexey Romanov | Anna Rumshisky | Anna Rogers | David Donahue
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

In this paper, we present a method for adversarial decomposition of text representation. This method can be used to decompose a representation of an input sentence into several independent vectors, each responsible for a specific aspect of the input sentence. We evaluate the proposed method on two case studies: the conversion between different social registers and diachronic language change. We show that the proposed method is capable of fine-grained controlled change of these aspects of the input sentence. It also learns a continuous (rather than categorical) representation of the style of the sentence, which is more linguistically realistic. The model uses adversarial-motivational training and includes a special motivational loss, which acts opposite to the discriminator and encourages a better decomposition. Furthermore, we evaluate the obtained meaning embeddings on a downstream task of paraphrase detection and show that they significantly outperform the embeddings of a regular autoencoder.
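
The abstract describes splitting a sentence encoding into a meaning vector and a form (style) vector, with a discriminator pressuring the meaning vector to drop style information and a motivational loss pushing style information into the form vector. Below is a minimal sketch of that training setup, assuming a GRU encoder, toy dimensions, and register labels; all module names and hyperparameters are illustrative, not the authors' implementation.

```python
# Illustrative sketch of adversarial decomposition of a sentence representation.
# Names and dimensions (MEANING_DIM, FORM_DIM, ...) are assumptions for this example.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM = 10_000, 128, 256
MEANING_DIM, FORM_DIM = 192, 64          # how the encoder state is split


class Encoder(nn.Module):
    """Encodes a sentence and splits the final state into meaning and form vectors."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.GRU(EMB_DIM, HID_DIM, batch_first=True)
        self.to_meaning = nn.Linear(HID_DIM, MEANING_DIM)
        self.to_form = nn.Linear(HID_DIM, FORM_DIM)

    def forward(self, tokens):
        _, h = self.rnn(self.emb(tokens))        # h: (1, batch, HID_DIM)
        h = h.squeeze(0)
        return self.to_meaning(h), self.to_form(h)


def classifier(in_dim, num_styles=2):
    """Small MLP used both as adversary (on meaning) and motivator (on form)."""
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, num_styles))


encoder = Encoder()
discriminator = classifier(MEANING_DIM)   # adversary: predict register from meaning
motivator = classifier(FORM_DIM)          # motivator: predict register from form
ce = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB_SIZE, (4, 12))   # toy batch of token ids
style = torch.randint(0, 2, (4,))                # toy register labels

meaning, form = encoder(tokens)

# Discriminator step: learn to recover the register from the meaning vector alone.
d_loss = ce(discriminator(meaning.detach()), style)

# Encoder step: help the motivator read the register from the form vector, while
# pushing the discriminator toward maximum uncertainty on the meaning vector
# (one common way to realize the adversarial pressure described in the abstract).
adv_entropy = torch.distributions.Categorical(logits=discriminator(meaning)).entropy().mean()
enc_loss = ce(motivator(form), style) - adv_entropy
```

In practice the encoder loss would also include a reconstruction term from a decoder that rebuilds the sentence from the concatenated meaning and form vectors; it is omitted here to keep the sketch focused on the adversarial-motivational part.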

2017

HumorHawk at SemEval-2017 Task 6: Mixing Meaning and Sound for Humor Recognition
David Donahue | Alexey Romanov | Anna Rumshisky
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the winning system for SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor. Humor detection has until now been addressed predominantly with feature-based approaches. Our system uses recurrent deep learning methods with dense embeddings to predict humorous tweets from the @midnight show #HashtagWars. To capture both meaning and sound, GloVe embeddings are combined with a novel phonetic representation to serve as input to an LSTM component. Its output is combined with a character-based CNN model and an XGBoost component in an ensemble that achieves 0.675 accuracy on the evaluation data.
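
The core idea is to feed the LSTM a per-token vector that concatenates a semantic embedding (GloVe) with a phonetic feature vector, and to compare two tweets pairwise to decide which is funnier. The sketch below shows that component under assumed dimensions, with random tensors standing in for the GloVe and phonetic lookups; it is not the HumorHawk implementation, and the character CNN and XGBoost ensemble members are omitted.

```python
# Illustrative "meaning + sound" LSTM component: GloVe vectors are concatenated
# with phonetic feature vectors before entering the LSTM. Dimensions and the
# random stand-ins for embedding lookups are assumptions for this example.
import torch
import torch.nn as nn

GLOVE_DIM, PHON_DIM, HID_DIM, MAX_LEN = 200, 50, 128, 30


class HumorLSTM(nn.Module):
    """Scores a tweet from concatenated semantic + phonetic token features."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(GLOVE_DIM + PHON_DIM, HID_DIM, batch_first=True)
        self.score = nn.Linear(HID_DIM, 1)

    def forward(self, glove_vecs, phon_vecs):
        x = torch.cat([glove_vecs, phon_vecs], dim=-1)   # (batch, len, GLOVE+PHON)
        _, (h, _) = self.lstm(x)
        return self.score(h.squeeze(0)).squeeze(-1)      # one humor score per tweet


model = HumorLSTM()

# #HashtagWars is evaluated pairwise: given two tweets for the same hashtag,
# predict which one is funnier. Random tensors stand in for embedding lookups.
tweet_a = (torch.randn(8, MAX_LEN, GLOVE_DIM), torch.randn(8, MAX_LEN, PHON_DIM))
tweet_b = (torch.randn(8, MAX_LEN, GLOVE_DIM), torch.randn(8, MAX_LEN, PHON_DIM))
labels = torch.randint(0, 2, (8,)).float()               # 1 if tweet_a is funnier

logits = model(*tweet_a) - model(*tweet_b)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()
```

In the full system, scores like these would be combined with the character-level CNN output and an XGBoost model over hand-crafted features to form the final ensemble prediction.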