Dynamic Regularization in UDA for Transformers in Multimodal Classification

Ivonne Monter-Aldana, Adrian Pastor Lopez Monroy, Fernando Sanchez-Vega


Abstract
Multimodal machine learning is a cutting-edge field that explores ways to incorporate information from multiple sources into models. As more multimodal data becomes available, this field has become increasingly relevant. This work focuses on two key challenges in multimodal machine learning. The first is finding efficient ways to combine information from different data types. The second is that one modality (e.g., text) is often stronger and more relevant, which makes it difficult to identify meaningful patterns in the weaker modality (e.g., image). Our approach focuses on exploiting the weaker modality more effectively while dynamically regularizing the loss function. First, we introduce a new two-stream model called Multimodal BERT-ViT, which features a novel intra-CLS token fusion. Second, we devise a dynamic adjustment that balances specialization and generalization during training to avoid overfitting, and we add this adjustment to the Unsupervised Data Augmentation (UDA) framework. We evaluate the effectiveness of these proposals on multi-label movie genre classification using the Moviescope and MM-IMDb datasets. The evaluation shows that our proposal offers substantial benefits while enabling us to harness the weaker modality without compromising the information provided by the stronger one.
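The abstract describes adding a dynamically adjusted regularization term to UDA for multi-label genre classification. The following is a minimal sketch of what such an objective can look like, assuming sigmoid outputs per genre, a supervised binary cross-entropy term on labeled data, and a UDA-style consistency term KL(p_orig || p_aug) between predictions on an unlabeled example and its augmented copy. The per-label Bernoulli KL and the `dynamic_weight` rule (scaling the consistency weight by the validation/training loss ratio) are hypothetical illustrations, not the authors' method. The sketch also omits UDA's training-signal annealing and confidence masking.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits: Sequence[float], targets: Sequence[int]) -> float:
    # Mean binary cross-entropy over genre labels (multi-label setting).
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def bernoulli_kl(p_orig: Sequence[float], p_aug: Sequence[float]) -> float:
    # Mean per-label KL(p_orig || p_aug) between independent Bernoulli outputs.
    # p_orig acts as the fixed target, as in UDA's consistency term
    # (per-label Bernoulli KL is an assumption for the multi-label case).
    eps = 1e-12
    total = 0.0
    for p, q in zip(p_orig, p_aug):
        total += p * math.log((p + eps) / (q + eps))
        total += (1 - p) * math.log((1 - p + eps) / (1 - q + eps))
    return total / len(p_orig)


def dynamic_weight(train_loss: float, val_loss: float,
                   base: float = 1.0, max_weight: float = 5.0) -> float:
    # HYPOTHETICAL schedule: scale the consistency weight by the ratio of
    # validation to training loss, so regularization grows as the model
    # starts to overfit. This is NOT the rule proposed in the paper.
    ratio = val_loss / max(train_loss, 1e-12)
    return min(max(base * ratio, base), max_weight)


def uda_loss(sup_logits: Sequence[float], sup_targets: Sequence[int],
             unsup_logits_orig: Sequence[float],
             unsup_logits_aug: Sequence[float], weight: float) -> float:
    # Supervised BCE on labeled data plus weighted consistency on unlabeled data.
    supervised = bce_with_logits(sup_logits, sup_targets)
    p_orig = [sigmoid(z) for z in unsup_logits_orig]
    p_aug = [sigmoid(z) for z in unsup_logits_aug]
    return supervised + weight * bernoulli_kl(p_orig, p_aug)


if __name__ == "__main__":
    w = dynamic_weight(train_loss=0.2, val_loss=0.5)
    loss = uda_loss([2.0, -1.0], [1, 0], [0.5, 0.3], [0.1, 0.9], weight=w)
    print(f"weight={w:.2f} loss={loss:.4f}")
```

In this sketch, when predictions on the original and augmented unlabeled inputs agree, the consistency term vanishes and only the supervised loss remains; the weight only matters when they disagree.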
Anthology ID:
2023.acl-long.485
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8700–8711
URL:
https://aclanthology.org/2023.acl-long.485
DOI:
10.18653/v1/2023.acl-long.485
Cite (ACL):
Ivonne Monter-Aldana, Adrian Pastor Lopez Monroy, and Fernando Sanchez-Vega. 2023. Dynamic Regularization in UDA for Transformers in Multimodal Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8700–8711, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Dynamic Regularization in UDA for Transformers in Multimodal Classification (Monter-Aldana et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.485.pdf
Video:
https://aclanthology.org/2023.acl-long.485.mp4