Knowledge Distillation
Transfer knowledge from large to small models
Knowledge Distillation is a model compression technique in which a small model (the student) learns to mimic the behavior of a large model (the teacher). It makes it possible to build efficient models that retain much of the performance of far larger ones.
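The core idea can be sketched as a loss function: the student is trained against the teacher's temperature-softened output distribution (soft labels) in addition to the ground-truth label. This is a minimal, stdlib-only sketch; the function names, the default temperature, and the `alpha` weighting are illustrative choices, not a canonical implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling: higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Soft-label distillation loss (illustrative): an alpha-weighted sum of
    a KL term (student mimics the teacher's softened outputs) and the usual
    hard-label cross-entropy on the ground-truth label."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient scale comparable across T
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student))
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label term remains, which is one quick sanity check for an implementation like this.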
Concepts
teacher-student-learning, soft-labels, temperature-scaling, logit-matching, feature-matching, compression-ratio
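Temperature scaling, one of the concepts above, is easy to see numerically: dividing logits by a temperature T > 1 before the softmax flattens the output distribution, exposing the teacher's relative confidence across wrong classes. A small sketch (the logit values are made up for illustration):

```python
import math

def softened(logits, temperature):
    # Temperature-scaled softmax: divide logits by T before normalizing
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

teacher_logits = [5.0, 2.0, 1.0]
print(softened(teacher_logits, 1.0))  # near one-hot
print(softened(teacher_logits, 4.0))  # softer: relative class similarities visible
```

The softened distribution is what the student matches during distillation; the near-one-hot T=1 output carries little more information than the hard label itself.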
Pros and Cons
Pros
- Small models that retain much of the large model's performance
- Drastic reduction in inference costs
- Enables edge and mobile deployment
- Lower latency in production
- Preserves specific teacher capabilities
- Well-established and documented technique
Cons
- Requires access to the teacher model
- More complex training process
- Doesn't transfer all of the teacher's knowledge
- Needs large datasets for good transfer
- The student rarely surpasses the teacher
Use Cases
- Creating lightweight LLM versions for production
- Models for mobile and IoT devices
- API cost reduction
- Specialization of general models
- Domain-specific model creation