MemeCap: A Dataset for Captioning and Interpreting Memes

EunJeong Hwang, Vered Shwartz


Abstract
Memes are a widely popular tool for web users to express their thoughts using visual metaphors. Understanding memes requires recognizing and interpreting visual metaphors with respect to the text inside or around the meme, often while employing background knowledge and reasoning abilities. We present the task of meme captioning and release a new dataset, MemeCap. Our dataset contains 6.3K memes along with the title of the post containing the meme, the meme captions, the literal image captions, and the visual metaphors. Despite the recent success of vision and language (VL) models on tasks such as image captioning and visual question answering, our extensive experiments using state-of-the-art VL models show that they still struggle with visual metaphors, and perform substantially worse than humans.
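To make the dataset's composition concrete, the sketch below models one MemeCap entry as a simple Python record. The field names and example values are illustrative assumptions derived only from the fields enumerated in the abstract; they are not the released dataset's actual schema.

    # Minimal sketch of one MemeCap entry, assuming the fields listed in the
    # abstract; field names are hypothetical, not the dataset's actual schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class MemeCapEntry:
        image_path: str              # the meme image itself
        post_title: str              # title of the post containing the meme
        meme_captions: List[str]     # interpretations of the meme's intended meaning
        image_caption: str           # literal description of the image content
        visual_metaphors: List[str] = field(default_factory=list)  # annotated metaphors

    # Hypothetical usage with invented values, for illustration only:
    entry = MemeCapEntry(
        image_path="memes/0001.jpg",
        post_title="Me at work on a Monday",
        meme_captions=["The poster jokes that their workload feels overwhelming."],
        image_caption="A cartoon dog sits calmly in a room that is on fire.",
        visual_metaphors=["the fire represents mounting problems"],
    )

Separating the literal image caption from the meme captions mirrors the task setup described in the abstract: a model must first describe what is depicted, then resolve the visual metaphor against the surrounding text to produce the intended interpretation.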
Anthology ID:
2023.emnlp-main.89
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
1433–1445
URL:
https://aclanthology.org/2023.emnlp-main.89
DOI:
10.18653/v1/2023.emnlp-main.89
Bibkey:
hwang-shwartz-2023-memecap
Cite (ACL):
EunJeong Hwang and Vered Shwartz. 2023. MemeCap: A Dataset for Captioning and Interpreting Memes. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1433–1445, Singapore. Association for Computational Linguistics.
Cite (Informal):
MemeCap: A Dataset for Captioning and Interpreting Memes (Hwang & Shwartz, EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.89.pdf
Video:
https://aclanthology.org/2023.emnlp-main.89.mp4