vLLM
Inference tool
High-performance LLM inference and serving engine
Supported languages: Python (library API); any language via the OpenAI-compatible HTTP API
Pros and Cons
Pros
- Very high serving throughput
- PagedAttention (paged KV-cache management that reduces memory fragmentation)
- Continuous batching of incoming requests
- OpenAI-compatible API server
Cons
- Inference only (no training or fine-tuning)
- GPU required for practical performance
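Because the server speaks the OpenAI chat-completions protocol, clients need no vLLM-specific SDK. A minimal sketch of building a request payload for a locally running server (the URL, model name, and prompt are placeholder assumptions, not values from this document):

```python
import json

# Assumed local endpoint exposed by the vLLM OpenAI-compatible server.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    vLLM accepts the same JSON body as the OpenAI API, so this dict can be
    POSTed to VLLM_URL with any HTTP client.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


payload = build_chat_request("my-model", "Summarize PagedAttention in one line.")
print(json.dumps(payload, indent=2))
```

The same payload works against the official OpenAI endpoint, which is what makes drop-in migration to a self-hosted vLLM server possible.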
Use Cases
- LLM serving
- Inference at scale
- Backing model APIs
- Production deployments
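As a sketch of the production-serving use case above (assuming vLLM is installed and a CUDA GPU is available; the model name and port are placeholders):

```shell
# Launch an OpenAI-compatible server on port 8000.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Query it over the standard chat-completions route.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello"}]}'
```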