trlX: A Framework for Large Scale Open Source RLHF

Louis Castricato


Abstract
Reinforcement learning from human feedback (RLHF) uses human feedback to better align large language models with human preferences via online optimization against a learned reward model. Current RLHF paradigms rely on Proximal Policy Optimization (PPO), which quickly becomes challenging to implement and scale up to large architectures. To address this difficulty we created the trlX library as a feature-complete open-source framework for RLHF fine-tuning of models up to and exceeding 70 billion parameters. We implemented support for multiple types of distributed training, including distributed data parallelism, model sharding, and tensor, sequential, and pipeline parallelism.
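To make the abstract concrete, below is a minimal sketch of how an RLHF fine-tuning run might look with trlX, assuming the public trlx Python API (trlx.train with a reward function and a list of prompts, as described in the project README); the toy length-based reward stands in for a learned reward model and is purely illustrative, not taken from the paper.

```python
# Minimal illustrative sketch of PPO-style RLHF with trlX (assumed public API).
import trlx

def reward_fn(samples, **kwargs):
    # Toy stand-in for a learned reward model: prefer longer completions.
    return [float(len(s)) for s in samples]

trainer = trlx.train(
    "gpt2",                              # base model to fine-tune with PPO
    reward_fn=reward_fn,                 # scores generated samples online
    prompts=["Summarize: ..."] * 64,     # training prompts (placeholder text)
    eval_prompts=["Summarize: ..."] * 8, # held-out prompts for evaluation
)
```

In an actual run, the reward function would call a trained preference/reward model, and distributed settings (data parallel, sharded, tensor/sequential/pipeline parallel) would be selected through the library's configuration rather than shown here.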
Anthology ID:
2023.nlposs-1.27
Volume:
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Liling Tan, Dmitrijs Milajevs, Geeticka Chauhan, Jeremy Gwinnup, Elijah Rippeth
Venues:
NLPOSS | WS
Publisher:
Association for Computational Linguistics
Pages:
246–246
URL:
https://aclanthology.org/2023.nlposs-1.27
DOI:
10.18653/v1/2023.nlposs-1.27
Cite (ACL):
Louis Castricato. 2023. trlX: A Framework for Large Scale Open Source RLHF. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 246–246, Singapore. Association for Computational Linguistics.
Cite (Informal):
trlX: A Framework for Large Scale Open Source RLHF (Castricato, NLPOSS-WS 2023)
PDF:
https://aclanthology.org/2023.nlposs-1.27.pdf