Relevance-assisted Generation for Robust Zero-shot Retrieval

Jihyuk Kim, Minsoo Kim, Joonsuk Park, Seung-won Hwang


Abstract
Zero-shot retrieval tasks such as the BEIR benchmark reveal out-of-domain generalization as a key weakness of high-performance dense retrievers. As a solution, domain adaptation for dense retrievers has been actively studied. A notable approach synthesizes domain-specific training data by generating pseudo queries (PQ), so that retrievers can be fine-tuned on domain-specific relevance between PQ and documents. Our contribution is to show that key biases can cause sampled PQ to be irrelevant to their documents, which hurts generalization. We propose to preempt the generation of such irrelevant PQ by dividing generation into simpler subtasks: first generating relevance explanations, which then guide PQ generation away from negative generalization. Experimental results show that the proposed approach is more robust to domain shifts, as validated on challenging BEIR zero-shot retrieval tasks.
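The two-step decomposition described in the abstract can be pictured as a small prompting pipeline: generate a relevance explanation for a document, then condition pseudo-query generation on that explanation. The sketch below is only an illustration of this idea based on the abstract; the model choice, prompts, and function names are assumptions, not the authors' implementation.

```python
# Minimal sketch of relevance-assisted pseudo-query generation (assumptions:
# model, prompts, and helper names are illustrative, not the paper's pipeline).
from transformers import pipeline

# Any instruction-following seq2seq model can stand in here (assumption).
generator = pipeline("text2text-generation", model="google/flan-t5-base")


def generate_relevance_explanation(document: str) -> str:
    """Subtask 1: explain what information in the document a query could target."""
    prompt = (
        "Explain briefly what information in the following passage a search "
        f"query would be relevant to:\n{document}"
    )
    return generator(prompt, max_new_tokens=64)[0]["generated_text"]


def generate_pseudo_query(document: str, explanation: str) -> str:
    """Subtask 2: generate a pseudo query guided by the relevance explanation."""
    prompt = (
        "Write a search query that is answered by the passage, focusing on the "
        f"relevant content described below.\nPassage: {document}\n"
        f"Relevance: {explanation}"
    )
    return generator(prompt, max_new_tokens=32)[0]["generated_text"]


# The resulting (pseudo query, document) pairs would then serve as
# domain-specific fine-tuning data for a dense retriever.
doc = "BEIR is a heterogeneous benchmark for zero-shot evaluation of retrieval models."
explanation = generate_relevance_explanation(doc)
print(generate_pseudo_query(doc, explanation))
```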
Anthology ID:
2023.emnlp-industry.67
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
December
Year:
2023
Address:
Singapore
Editors:
Mingxuan Wang, Imed Zitouni
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
723–731
URL:
https://aclanthology.org/2023.emnlp-industry.67
DOI:
10.18653/v1/2023.emnlp-industry.67
Cite (ACL):
Jihyuk Kim, Minsoo Kim, Joonsuk Park, and Seung-won Hwang. 2023. Relevance-assisted Generation for Robust Zero-shot Retrieval. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 723–731, Singapore. Association for Computational Linguistics.
Cite (Informal):
Relevance-assisted Generation for Robust Zero-shot Retrieval (Kim et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-industry.67.pdf
Video:
https://aclanthology.org/2023.emnlp-industry.67.mp4