TRL
training library
Hugging Face library for training LLMs with reinforcement learning
Supported languages
Pros and Cons
Advantages
- + Transformers integration
- + Support for PPO, DPO, ORPO
- + Well documented
- + Actively maintained
Disadvantages
- - Tied to the Hugging Face ecosystem
- - Steep learning curve for RL concepts
Use Cases
- RLHF training
- DPO training
- Model alignment
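
For readers unfamiliar with DPO, the sketch below shows the per-example objective that DPO training minimizes, written in plain Python. It is not TRL's API: the function name dpo_loss, its parameter names, and the default beta=0.1 are assumptions made for illustration. Each argument is the summed log-probability of a full response under either the policy being trained or a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Hypothetical helper for illustration only; not part of TRL.
    """
    # How much more (or less) likely each response is under the policy
    # than under the frozen reference model, in log space.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a
    # numerically stable way for both signs of the margin.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: loss drops.
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)
```

The loss starts at log(2) when the policy matches the reference and decreases as the policy raises the chosen response's likelihood relative to the rejected one. A real training run would compute these log-probabilities from model outputs and average the loss over a batch.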