
llama.cpp

LLM library

Pure-CPU LLM inference in C++



llama.cpp is a C/C++ implementation for inference of Llama and compatible models. It runs LLMs entirely on the CPU, with no GPU required, and supports aggressive quantization as well as architecture-specific optimizations (AVX on x86, NEON on ARM).

Concepts

gguf-format, quantization, cpu-optimization, simd, memory-mapping, batched-inference

Pros and Cons

Advantages

  • + Works without GPU
  • + Extremely efficient on CPU
  • + Quantization down to 2-bit
  • + Cross-platform (Linux, Mac, Windows)
  • + Optimized Apple Silicon support
  • + Foundation for many popular tools

Disadvantages

  • - Slower than GPU inference
  • - Requires model conversion to GGUF
  • - Low-level API
  • - Not for training, inference only
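The GGUF conversion noted above is a two-step workflow: convert the original checkpoint to GGUF, then quantize. A typical sequence might look like the following (paths are placeholders; the script and binary ship with the llama.cpp repository):

```shell
# Convert a Hugging Face checkpoint to a full/half-precision GGUF file.
python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf

# Quantize the GGUF file, e.g. to the 4-bit Q4_K_M scheme.
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```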

Use Cases

  • LLMs on laptops without GPU
  • Deployment on CPU servers
  • Local desktop applications
  • Model development and testing
  • Edge computing with language models

Related Technologies