Stack Explorer

Knowledge Distillation

technique

Transfer knowledge from large to small models

Knowledge Distillation is a model compression technique in which a small model (the student) learns to mimic the behavior of a large model (the teacher). It makes it possible to build efficient models that retain much of the performance of far larger ones.

Concepts

teacher-student-learning, soft-labels, temperature-scaling, logit-matching, feature-matching, compression-ratio
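The soft-labels, temperature-scaling, and logit-matching concepts listed above can be sketched in a few lines. This is a minimal illustration for a classification setting, not a production training loop; the function names and the temperature value are illustrative, and the T² scaling follows the common convention for keeping gradient magnitudes comparable across temperatures.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: higher T produces a softer, more
    uniform distribution, exposing the teacher's 'dark knowledge'
    about relative class similarities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution
    (the soft labels) and the student's, i.e. logit matching.
    Scaled by T**2 so the loss magnitude stays comparable as T varies."""
    p = softmax(teacher_logits, temperature)  # soft labels from teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

When the student reproduces the teacher's logits exactly, the loss is zero; any mismatch yields a positive penalty. In practice this term is usually combined with a standard cross-entropy loss on the ground-truth (hard) labels, weighted by a mixing coefficient.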

Pros and Cons

Advantages

  • + Small models with large model performance
  • + Drastic reduction in inference costs
  • + Enables edge and mobile deployment
  • + Lower latency in production
  • + Preserves specific teacher capabilities
  • + Well-established and documented technique

Disadvantages

  • - Requires access to teacher model
  • - Complex training process
  • - Doesn't transfer all knowledge
  • - Needs large datasets for good transfer
  • - Student rarely surpasses the teacher on the distilled task

Use Cases

  • Creating lightweight LLM versions for production
  • Models for mobile and IoT devices
  • API cost reduction
  • Specialization of general models
  • Domain-specific model creation