RCLN at SemEval-2023 Task 1: Leveraging Stable Diffusion and Image Captions for Visual WSD

Antonina Mijatovic, Davide Buscaldi, Ekaterina Borisova


Abstract
This paper describes the participation of the RCLN team in the Visual Word Sense Disambiguation task at SemEval 2023. Our approach used CLIP as the base model for matching text to images, supplemented by two additional sources of information: captions generated from the images, and images generated from the prompt text using Stable Diffusion. The results we obtained are not particularly good, but interestingly enough, we were able to improve over the CLIP baseline in Italian simply by resorting to the generated images.
Anthology ID:
2023.semeval-1.301
Volume:
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Giovanni Da San Martino, Harish Tayyar Madabushi, Ritesh Kumar, Elisa Sartori
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
2174–2178
URL:
https://aclanthology.org/2023.semeval-1.301
DOI:
10.18653/v1/2023.semeval-1.301
Cite (ACL):
Antonina Mijatovic, Davide Buscaldi, and Ekaterina Borisova. 2023. RCLN at SemEval-2023 Task 1: Leveraging Stable Diffusion and Image Captions for Visual WSD. In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), pages 2174–2178, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
RCLN at SemEval-2023 Task 1: Leveraging Stable Diffusion and Image Captions for Visual WSD (Mijatovic et al., SemEval 2023)
PDF:
https://aclanthology.org/2023.semeval-1.301.pdf