Stack Explorer

LoRA (Low-Rank Adaptation)

Technique

Efficient fine-tuning with low-rank matrices


LoRA is a parameter-efficient fine-tuning technique that adapts large language models without training all of their parameters. It works by freezing the original model weights and injecting trainable low-rank matrices into each transformer layer, drastically reducing memory requirements and training time.
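The core idea can be sketched in a few lines. In this minimal NumPy illustration (dimensions, rank, and scaling are hypothetical, not taken from any particular model), the pretrained weight W stays frozen while two small factors A and B learn an update ΔW = BA, scaled by alpha/r:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64   # hypothetical layer dimensions
r, alpha = 8, 16       # LoRA rank and scaling factor (illustrative values)

# Frozen pretrained weight: never updated during fine-tuning
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors. B starts at zero, so before any
# training the adapted layer matches the pretrained one exactly.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
base = x @ W.T
adapted = lora_forward(x)

# With B = 0 the adapter contributes nothing yet
print(np.allclose(base, adapted))  # True
```

During training, gradients flow only into A and B; W is left untouched, which is what preserves the original model's knowledge.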

Concepts

low-rank-decomposition, adapters, parameter-efficient-fine-tuning, matrix-factorization, frozen-weights, trainable-parameters, rank-selection

Pros and Cons

Advantages

  • + Reduces memory usage by up to 10x compared to full fine-tuning
  • + Significantly faster training
  • + Enables fine-tuning on consumer hardware (8-16GB GPUs)
  • + Small adapter models easy to share (MB vs GB)
  • + Multiple adapters can be loaded dynamically
  • + Preserves original model knowledge
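The parameter savings behind the memory and adapter-size advantages above are easy to quantify. For a single d x d projection, full fine-tuning trains d^2 weights, while LoRA trains only the r x d and d x r factors. The hidden size here is a hypothetical value chosen for illustration:

```python
# Trainable-parameter comparison for one d x d projection matrix
d = 4096            # hypothetical hidden size (roughly a 7B-class model)
r = 8               # LoRA rank

full_ft = d * d                 # full fine-tuning updates every weight
lora = r * d + d * r            # only A (r x d) and B (d x r) are trained

print(full_ft, lora, full_ft / lora)  # 16777216 65536 256.0
```

Since only the small factors need gradients and optimizer state, the saved adapter is megabytes rather than the gigabytes a full checkpoint would occupy.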

Disadvantages

  • - May not capture very complex changes in model behavior
  • - Requires careful hyperparameter selection (rank, alpha)
  • - Quality may be slightly lower than full fine-tuning in extreme cases
  • - Not all frameworks support it equally well
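One property worth noting alongside the dynamic-adapter advantage above: a trained adapter can be folded into the base weight, so serving the adapted model adds no inference overhead. A small NumPy sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8   # hypothetical dimensions and scaling

W = rng.normal(size=(d, d))   # frozen base weight
A = rng.normal(size=(r, d))   # trained LoRA factors
B = rng.normal(size=(d, r))

# Merging folds the adapter into the base weight, so inference
# runs one matmul per layer, exactly like the original model
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(3, d))
two_branch = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
merged = x @ W_merged.T
print(np.allclose(two_branch, merged))  # True
```

Keeping adapters unmerged allows hot-swapping between specializations of one base model; merging trades that flexibility for zero-overhead inference.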

Use Cases

  • Adapting LLMs to specific domains (legal, medical, technical)
  • Fine-tuning with limited computational resources
  • Creating multiple specialized versions of a base model
  • Model personalization for specific tasks
  • Rapid experimentation with different configurations