TRL

training library

Hugging Face library for training LLMs with reinforcement learning and preference-optimization methods

Pros and Cons

Pros

  • + Integrates with Hugging Face Transformers
  • + Supports PPO, DPO, and ORPO
  • + Well documented
  • + Actively maintained
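PPO, one of the methods listed above, updates the policy by optimizing a clipped surrogate objective. The sketch below is a minimal, dependency-free illustration of that per-action objective, not TRL's PPO trainer code. The function name ppo_clipped_loss and the clip_eps value of 0.2 are illustrative assumptions, and the advantage is assumed to be computed elsewhere (for example from reward-model scores in an RLHF loop).

```python
import math


def ppo_clipped_loss(
    logp_new: float,
    logp_old: float,
    advantage: float,
    clip_eps: float = 0.2,  # illustrative clipping range, an assumption
) -> float:
    """Per-action PPO clipped surrogate loss (to be minimized)."""
    # Probability ratio between the current policy and the policy that sampled the action
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - clip_eps, 1 + clip_eps] so one update cannot move the policy too far
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps) * advantage
    # Take the pessimistic (smaller) objective and negate it to get a loss
    return -min(unclipped, clipped)


# The ratio is about 2.72, so it is clipped to 1.2 and the loss is about -1.2
print(ppo_clipped_loss(logp_new=0.0, logp_old=-1.0, advantage=1.0))
```

Taking the minimum of the clipped and unclipped terms means the clip only ever removes incentive to move further: with a positive advantage the gain is capped, while with a negative advantage the full penalty still applies.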

Cons

  • - Tied to the Hugging Face ecosystem
  • - Steep learning curve for those new to RL

Use Cases

  • RLHF training
  • DPO training
  • Model alignment
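DPO training, listed above, aligns a model directly on preference pairs (a chosen and a rejected response to the same prompt) without a separate reward model or RL loop. The sketch below is a minimal, dependency-free illustration of the per-example DPO loss, not TRL's implementation. The function names and the beta value of 0.1 are illustrative assumptions, and each log-probability is assumed to be the summed token log-probability of a full response under either the trained policy or a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, an assumption
) -> float:
    """Per-example DPO loss from summed response log-probabilities."""
    # How much more the policy favors each response than the frozen reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin between the chosen and rejected log-ratios
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): below log(2) once the policy prefers the chosen response
    return -log_sigmoid(margin)


# Policy and reference agree on both responses: margin is 0, loss is log(2)
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# Policy shifted probability toward the chosen response: loss drops below log(2)
print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

Because the loss depends only on log-ratios against the reference model, beta controls how far the policy may drift from that reference while fitting the preferences.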