Making a small AI learn from a big AI, like a little kid learning from a wise teacher to become smart without getting as big.
A technique where a large, powerful AI model teaches a smaller model to give similar answers, so you get good results on cheaper hardware.
A training technique where a smaller "student" model learns to mimic the outputs of a larger "teacher" model, achieving competitive performance with fewer parameters.
A model compression technique where soft probability distributions from a teacher model provide richer training signal than hard labels, enabling the student to approximate the teacher's performance at a fraction of the compute.
Transfer of learned representations from a high-capacity teacher to a compact student by matching temperature-softened output distributions (Hinton distillation), intermediate features, or attention patterns, trading capacity for inference efficiency while preserving most of the teacher's generalization.
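As a rough illustration of the soft-target idea in the definitions above, here is a minimal NumPy sketch of a Hinton-style distillation loss. The function names and the `temperature` and `alpha` parameters are assumptions chosen for this example, not part of any particular library; the temperature-squared scaling of the soft term follows the usual formulation of Hinton distillation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    alpha weights the soft term (hypothetical default, tune per task).
    """
    eps = 1e-12  # guards against log(0)

    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=-1)
    soft_loss = (temperature ** 2) * kl.mean()

    # Hard term: ordinary cross-entropy against the ground-truth labels.
    p_hard = softmax(student_logits, 1.0)
    hard_loss = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()

    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example with made-up logits for a batch of two 3-class examples.
teacher = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
student = np.array([[1.5, 0.8, 0.2], [0.4, 1.9, 0.6]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

A higher temperature flattens both distributions, so the student also sees how the teacher ranks the incorrect classes; that ranking is the richer signal that hard labels alone do not carry.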