Arno Candel


2023

H2O Open Ecosystem for State-of-the-art Large Language Models
Arno Candel | Jon McKinney | Philipp Singer | Pascal Pfeiffer | Maximilian Jeblick | Chun Ming Lee | Marcos Conde
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Large Language Models (LLMs) represent a revolution in AI. However, they also pose significant risks, such as the presence of biased, private, copyrighted, or harmful text. For this reason, we need open, transparent, and safe solutions. We introduce a complete open-source ecosystem for developing and testing LLMs. The goal of this project is to boost open alternatives to closed-source approaches. We release h2oGPT, a family of fine-tuned LLMs from 7 to 70 billion parameters. We also introduce H2O LLM Studio, a framework and no-code GUI designed for efficient fine-tuning, evaluation, and deployment of LLMs using the most recent state-of-the-art techniques. Our code and models are licensed under the fully permissive Apache 2.0 license. We believe open-source language models help to boost AI development and make it more accessible and trustworthy. Our demo is available at: https://gpt.h2o.ai/