Knowledge Distillation
Transfer knowledge from large to small models
Knowledge Distillation is a model compression technique in which a small model (the student) learns to mimic the behavior of a large model (the teacher). It makes it possible to build efficient models that retain much of the performance of far larger ones.
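The core idea can be sketched as a loss function: the student is trained against the teacher's temperature-softened output distribution (soft labels) in addition to the ground-truth label. This is a minimal, stdlib-only sketch; the function names, the default temperature, and the `alpha` weighting are illustrative choices, not a canonical implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling: higher T gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Soft-label distillation loss (illustrative): an alpha-weighted sum of
    a KL term (student mimics the teacher's softened outputs) and the usual
    hard-label cross-entropy on the ground-truth label."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); the T^2 factor keeps gradient scale comparable across T
    kl = sum(t * (math.log(t) - math.log(s))
             for t, s in zip(p_teacher, p_student))
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the hard-label term remains, which is one quick sanity check for an implementation like this.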
Concepts
teacher-student-learning, soft-labels, temperature-scaling, logit-matching, feature-matching, compression-ratio
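Temperature scaling, one of the concepts above, is easy to see numerically: dividing logits by a temperature T > 1 before the softmax flattens the output distribution, exposing the teacher's relative confidence across wrong classes. A small sketch (the logit values are made up for illustration):

```python
import math

def softened(logits, temperature):
    # Temperature-scaled softmax: divide logits by T before normalizing
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

teacher_logits = [5.0, 2.0, 1.0]
print(softened(teacher_logits, 1.0))  # near one-hot
print(softened(teacher_logits, 4.0))  # softer: relative class similarities visible
```

The softened distribution is what the student matches during distillation; the near-one-hot T=1 output carries little more information than the hard label itself.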
Pros and Cons
Pros
- Small models that retain much of the large model's performance
- Drastic reduction in inference costs
- Enables edge and mobile deployment
- Lower latency in production
- Preserves specific teacher capabilities
- Well-established and documented technique
Cons
- Requires access to the teacher model
- More complex training process
- Doesn't transfer all of the teacher's knowledge
- Needs large datasets for good transfer
- The student rarely surpasses the teacher
Use Cases
- Creating lightweight LLM versions for production
- Models for mobile and IoT devices
- API cost reduction
- Specialization of general models
- Domain-specific model creation