
QLoRA (Quantized LoRA)

technique

LoRA with quantization for maximum memory efficiency


QLoRA combines the LoRA technique with 4-bit quantization of the frozen base model, enabling fine-tuning of very large language models on limited hardware. Using the NormalFloat4 (NF4) data type and Double Quantization, it drastically reduces memory usage while preserving the quality of the original model.
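A typical QLoRA setup with the Hugging Face stack (transformers, peft, bitsandbytes) looks roughly like the sketch below. The model name, target modules, and LoRA hyperparameters are illustrative placeholders, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with Double Quantization, as described in the QLoRA paper
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,     # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Example model id; substitute any causal LM you have access to
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Small trainable LoRA adapters on top of the frozen 4-bit base
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Only the LoRA adapter weights are trained; the quantized base model stays frozen, which is where the memory savings come from.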

Concepts

4-bit-quantization · normalfloat4 · double-quantization · paged-optimizers · memory-efficient-training · gradient-checkpointing
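The core ideas of blockwise 4-bit quantization and Double Quantization can be illustrated with a toy NumPy sketch. This uses simple absmax scaling with 15 signed levels for clarity, not the actual NF4 codebook:

```python
import numpy as np

def quantize_4bit_blockwise(weights, block_size=64):
    """Toy blockwise 4-bit absmax quantization (illustrative, not real NF4)."""
    w = weights.reshape(-1, block_size)
    scales = np.abs(w).max(axis=1, keepdims=True)   # one float scale per block
    q = np.round(w / scales * 7).astype(np.int8)    # values in [-7, 7]
    return q, scales.squeeze(1)

def dequantize_4bit_blockwise(q, scales):
    return (q.astype(np.float32) / 7) * scales[:, None]

def double_quantize_scales(scales, group_size=256):
    """Double Quantization: the per-block float32 scales are themselves
    quantized to 8 bits, keeping one float32 constant per group of scales.
    Per the QLoRA paper this shrinks scale overhead from 32/64 = 0.5
    to roughly 8/64 + 32/(64*256) ≈ 0.127 bits per parameter."""
    s = scales.reshape(-1, group_size)
    s_max = np.abs(s).max(axis=1, keepdims=True)
    q_scales = np.round(s / s_max * 127).astype(np.int8)
    return q_scales, s_max.squeeze(1)

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 64)).astype(np.float32)

q, scales = quantize_4bit_blockwise(w.ravel(), block_size=64)
w_hat = dequantize_4bit_blockwise(q, scales).reshape(w.shape)
err = np.abs(w - w_hat).mean()                      # small reconstruction error

q_scales, s_max = double_quantize_scales(scales)
```

Real NF4 instead uses 16 quantile-spaced levels tuned for normally distributed weights, but the blockwise scale-and-round structure is the same.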

Pros and Cons

Advantages

  • + Fine-tune 65B+ models on a single 48GB GPU
  • + Reduces memory up to 4x more than standard LoRA
  • + Closely matches full fine-tuning quality (the QLoRA paper reports parity with 16-bit fine-tuning on its benchmarks)
  • + Democratizes access to large LLM fine-tuning
  • + Compatible with most popular models
  • + Reasonable training time
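The 65B-on-48GB claim can be checked with rough back-of-envelope arithmetic, using the ~0.127 bits/parameter Double Quantization overhead from the QLoRA paper and ignoring activations, LoRA gradients, and optimizer state (which gradient checkpointing and paged optimizers keep small):

```python
# Illustrative memory math for a 65B-parameter model.
params = 65e9

fp16_weights_gb = params * 2 / 2**30           # 16-bit weights: ~121 GiB
nf4_weights_gb = params * 0.5 / 2**30          # 4-bit weights:  ~30 GiB
dq_overhead_gb = params * 0.127 / 8 / 2**30    # DQ scale metadata: ~1 GiB

quantized_total_gb = nf4_weights_gb + dq_overhead_gb
```

So the 16-bit weights alone (~121 GiB) would need multiple GPUs, while the 4-bit base model (~31 GiB) leaves headroom on a single 48 GB card for adapters, activations, and optimizer state.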

Disadvantages

  • - Slightly slower inference due to quantization
  • - Additional configuration complexity
  • - Some models don't quantize well
  • - Requires specific libraries (bitsandbytes)

Use Cases

  • Fine-tuning 70B models on consumer GPUs
  • Training on laptops with GPU
  • Rapid experimentation with large models
  • Creating custom models on a budget
  • Academic research with limited resources