HuaSLIM: Human Attention Motivated Shortcut Learning Identification and Mitigation for Large Language models

Yuqi Ren, Deyi Xiong


Abstract
Large language models have made remarkable progress on a variety of NLP tasks. However, they tend to rely on shortcut features that spuriously correlate with labels, which weakens their generalization to out-of-distribution (OOD) samples. In this paper, we propose a human-attention-guided approach to identifying and mitigating shortcut learning, which encourages the LLM-based target model to learn relevant features. We define an attention-based measurement that captures both model and data bias, and we identify shortcut tokens by exploring both human and neural attention. In a self-distillation framework, we mitigate shortcut learning by dynamically adjusting the distillation temperature according to the detected shortcut tokens and the estimated shortcut degree. Additionally, we use human attention as a supervisory signal to constrain large language models to pay more attention to relevant tokens. Experimental results on multiple NLP tasks show that our method effectively identifies shortcut tokens and significantly improves the robustness of large language models on OOD samples without undermining performance on IID data.
Anthology ID:
2023.findings-acl.781
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12350–12365
URL:
https://aclanthology.org/2023.findings-acl.781
DOI:
10.18653/v1/2023.findings-acl.781
Cite (ACL):
Yuqi Ren and Deyi Xiong. 2023. HuaSLIM: Human Attention Motivated Shortcut Learning Identification and Mitigation for Large Language models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 12350–12365, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
HuaSLIM: Human Attention Motivated Shortcut Learning Identification and Mitigation for Large Language models (Ren & Xiong, Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.781.pdf